author    | Liang Qi <liang.qi@qt.io> | 2017-03-06 14:33:16 +0100
committer | Liang Qi <liang.qi@qt.io> | 2017-03-06 14:33:34 +0100
commit    | f2dbc67c2b032a5f27d0224e020fb6dfcd3fd142 (patch)
tree      | c5c195998f538fd7b9aa1382ced53caf2dec3926 /src/3rdparty/libwebp
parent    | d2306d74850986692c02b70df0d7a6a6e933d0dc (diff)
parent    | 03e507c8f3816053257b3c351d79c494b6f9174d (diff)
download  | qtimageformats-f2dbc67c2b032a5f27d0224e020fb6dfcd3fd142.tar.gz
Merge remote-tracking branch 'origin/5.8' into 5.9
Change-Id: I9cf7f04769944935d7b836453c7982839857a909
Diffstat (limited to 'src/3rdparty/libwebp')
71 files changed, 4164 insertions, 1724 deletions
diff --git a/src/3rdparty/libwebp/AUTHORS b/src/3rdparty/libwebp/AUTHORS
index ea6e21f..0f382da 100644
--- a/src/3rdparty/libwebp/AUTHORS
+++ b/src/3rdparty/libwebp/AUTHORS
@@ -10,10 +10,13 @@ Contributors:
 - Lode Vandevenne (lode at google dot com)
 - Lou Quillio (louquillio at google dot com)
 - Mans Rullgard (mans at mansr dot com)
+- Marcin Kowalczyk (qrczak at google dot com)
 - Martin Olsson (mnemo at minimum dot se)
 - Mikołaj Zalewski (mikolajz at google dot com)
 - Mislav Bradac (mislavm at google dot com)
+- Nico Weber (thakis at chromium dot org)
 - Noel Chromium (noel at chromium dot org)
+- Parag Salasakar (img dot mips1 at gmail dot com)
 - Pascal Massimino (pascal dot massimino at gmail dot com)
 - Paweł Hajdan, Jr (phajdan dot jr at chromium dot org)
 - Pierre Joye (pierre dot php at gmail dot com)
diff --git a/src/3rdparty/libwebp/ChangeLog b/src/3rdparty/libwebp/ChangeLog
index 2f8def2..99fb3c0 100644
--- a/src/3rdparty/libwebp/ChangeLog
+++ b/src/3rdparty/libwebp/ChangeLog
@@ -1,3 +1,179 @@
+deb54d9 Clarify the expected 'config' lifespan in WebPIDecode()
+c7e2d24 update ChangeLog (tag: v0.5.1-rc5)
+c7eb06f Fix corner case in CostManagerInit.
+ab7937a gif2webp: normalize the number of .'s in the help message
+3cdec84 vwebp: normalize the number of .'s in the help message
+bdf6241 cwebp: normalize the number of .'s in the help message
+06a38c7 fix rescaling bug: alpha plane wasn't filled with 0xff
+319e37b Improve lossless compression.
+447adbc 'our bug tracker' -> 'the bug tracker'
+97b9e64 normalize the number of .'s in the help message
+bb50bf4 pngdec,ReadFunc: throw an error on invalid read
+38063af decode.h,WebPGetInfo: normalize function comment
+9e8e1b7 Inline GetResidual for speed.
+7d58d1b Speed-up uniform-region processing.
+23e29cb Merge "Fix a boundary case in BackwardReferencesHashChainDistanceOnly." into 0.5.1
+0bb23b2 free -> WebPSafeFree()
+e7b9177 Merge "DecodeImageData(): change the incorrect assert" into 0.5.1
+2abfa54 DecodeImageData(): change the incorrect assert
+5a48fcd Merge "configure: test for -Wfloat-conversion"
+0174d18 Fix a boundary case in BackwardReferencesHashChainDistanceOnly.
+6a9c262 Merge "Added MSA optimized transform functions"
+cfbcc5e Make sure to consider small distances in LZ77.
+5e60c42 Added MSA optimized transform functions
+3dc28d7 configure: test for -Wfloat-conversion
+f2a0946 add some asserts to delimit the perimeter of CostManager's operation
+9a583c6 fix invalid-write bug for alpha-decoding
+f66512d make gradlew executable
+6fda58f backward_references: quiet double->int warning
+a48cc9d Merge "Fix a compression regression for images with long uniform regions." into 0.5.1
+cc2720c Merge "Revert an LZ77 boundary constant." into 0.5.1
+059aab4 Fix a compression regression for images with long uniform regions.
+b0c7e49 Check more backward matches with higher quality.
+a361151 Revert an LZ77 boundary constant.
+8190374 README: fix typo
+7551db4 update NEWS
+0fb2269 bump version to 0.5.1
+f453761 update AUTHORS & .mailmap
+3259571 Refactor GetColorPalette method.
+1df5e26 avoid using tmp histogram in PreparePair()
+7685123 fix comment typos
+a246b92 Speedup backward references.
+76d73f1 Merge "CostManager: introduce a free-list of ~10 intervals"
+eab39d8 CostManager: introduce a free-list of ~10 intervals
+4c59aac Merge "mips msa webp configuration"
+043c33f Merge "Improve speed and compression in backward reference for lossless."
+71be9b8 Merge "clarify variable names in HistogramRemap()"
+0ba7fd7 Improve speed and compression in backward reference for lossless.
+0481d42 CostManager: cache one interval and re-use it when possible
+41b7e6b Merge "histogram: fix bin calculation"
+96c3d62 histogram: fix bin calculation
+fe9e31e clarify variable names in HistogramRemap()
+ce3c824 disable near-lossless quantization if palette is used
+e11da08 mips msa webp configuration
+5f8f998 mux: Presence of unknown chunks should trigger VP8X chunk output.
+cadec0b Merge "Sync mips32 and dsp_r2 YUV->RGB code with C verison"
+d963775 Compute the hash chain once and for all for lossless compression.
+50a4866 Sync mips32 and dsp_r2 YUV->RGB code with C verison
+eee788e Merge "introduce a common signature for all image reader function"
+d77b877 introduce a common signature for all image reader function
+ca8d951 remove some obsolete TODOs
+ae2a722 collect all decoding utilities from examples/ in libexampledec.a
+0b8ae85 Merge "Move DitherCombine8x8 to dsp/dec.c"
+77cad88 Merge "ReadWebP: avoid conversion to ARGB if final format is YUVA"
+ab8d669 ReadWebP: avoid conversion to ARGB if final format is YUVA
+f8b7ce9 Merge "test pointer to NULL explicitly"
+5df6f21 test pointer to NULL explicitly
+77f21c9 Move DitherCombine8x8 to dsp/dec.c
+c9e6d86 Add gradle support
+c65f41e Revert "Add gradle support"
+bf731ed Add gradle support
+08333b8 WebPAnimEncoder: Detect when canvas is modified, restore only when needed.
+0209d7e Merge "speed-up MapToPalette() with binary search"
+fdd29a3 speed-up MapToPalette() with binary search
+cf4a651 Revert "Refactor GetColorPalette method."
+0a27aca Merge changes Idfa8ce83,I19adc9c4
+f25c440 WebPAnimEncoder: Restore original canvas between multiple encodes.
+169004b Refactor GetColorPalette method.
+576362a VP8LDoFillBitWindow: support big-endian in fast path
+ac49e4e bit_reader.c: s/VP8L_USE_UNALIGNED_LOAD/VP8L_USE_FAST_LOAD/
+d39ceb5 VP8LDoFillBitWindow: remove stale TODO
+2ec2de1 Merge "Speed-up BackwardReferencesHashChainDistanceOnly."
+3e023c1 Speed-up BackwardReferencesHashChainDistanceOnly.
+f2e1efb Improve near lossless compression when a prediction filter is used.
+e15afbc dsp.h: fix ubsan macro name
+e53c9cc dsp.h: add WEBP_UBSAN_IGNORE_UNSIGNED_OVERFLOW
+af81fdb utils.h: quiet -fsanitize=undefined warnings
+ea0be35 dsp.h: remove utils.h include
+cd276ae utils/*.c: ../utils/utils.h -> ./utils.h
+c892713 utils/Makefile.am: add some missing headers
+ea24e02 Merge "dsp.h: add WEBP_UBSAN_IGNORE_UNDEF"
+369e264 dsp.h: add WEBP_UBSAN_IGNORE_UNDEF
+0d020a7 Merge "add runtime NEON detection"
+5ee2136 Merge "add VP8LAddPixels() to lossless.h"
+47435a6 add VP8LAddPixels() to lossless.h
+8fa6ac6 remove two ubsan warnings
+74fb56f add runtime NEON detection
+4154a83 MIPS update to new Unfilter API
+c80b9fc Merge "cherry-pick decoder fix for 64-bit android devices"
+6235147 cherry-pick decoder fix for 64-bit android devices
+d41b8c4 configure: test for -Wformat-* w/-Wformat present
+5f95589 Fix WEBP_ALIGN in case the argument is a pointer to a type larger than a byte.
+2309fd5 replace num_parts_ by num_parts_minus_one_ (unsigned)
+9629f4b SimplifySegments: quiet -Warray-bounds warning
+de47492 Merge "update the Unfilter API in dsp to process one row independently"
+2102ccd update the Unfilter API in dsp to process one row independently
+e3912d5 WebPAnimEncoder: Restore canvas before evaluating blending possibility.
+6e12e1e WebPAnimEncoder: Fix for single-frame optimization.
+602f344 Merge changes I1d03acac,Ifcb64219
+95ecccf only apply color-mapping for alpha on the cropped area
+47dd070 anim_diff: Add an experimental option for max inter-frame diff.
+aa809cf only allocate alpha_plane_ up to crop_bottom row
+31f2b8d WebPAnimEncoder: FlattenSimilarPixels(): look for similar
+774dfbd perform alpha filtering within the decoding loop
+a4cae68 lossless decoding: only process decoded row up to last_row
+238cdcd Only call WebPDequantizeLevels() on cropped area
+cf6c713 alpha: preparatory cleanup
+b95ac0a Merge "VP8GetHeaders(): initialize VP8Io with sane value for crop/scale dimensions"
+8923139 VP8GetHeaders(): initialize VP8Io with sane value for crop/scale dimensions
+5828e19 use_8b_decode -> use_8b_decode_
+8dca024 fix bug in alpha.c that was triggering a memory error in incremental mode
+9a950c5 WebPAnimEncoder: Disable filtering when blending is used with lossy encoding.
+eb42390 WebPAnimEncoder: choose max diff for framerect based on quality.
+ff0a94b WebPAnimEncoder lossy: ignore small pixel differences for frame rectangles.
+f804008 gif2webp: Remove the 'prev_to_prev_canvas' buffer.
+6d8c07d Merge "WebPDequantizeLevels(): use stride in CountLevels()"
+d96fe5e WebPDequantizeLevels(): use stride in CountLevels()
+ec1b240 WebPPictureImport*: check output pointer
+c076876 Merge "Revert "Re-enable encoding of alpha plane with color cache for next release.""
+41f14bc WebPPictureImport*: check src pointer
+64eed38 Pass stride parameter to WebPDequantizeLevels()
+97934e2 Revert "Re-enable encoding of alpha plane with color cache for next release."
+e88c4ca fix -m 2 mode-cost evaluation (causing partition0 overflow)
+4562e83 Merge "add extra meaning to WebPDecBuffer::is_external_memory"
+abdb109 add extra meaning to WebPDecBuffer::is_external_memory
+875aec7 enc_neon,cosmetics: break long comment
+71e856c GetMBSSIM,cosmetics: fix alignment
+a90edff fix missing 'extern' for SSIM function in dsp/
+423ecaf move some SSIM-accumulation function for dsp/
+f08e662 Merge "Fix FindClosestDiscretized in near lossless:"
+0d40cc5 enc_neon,Disto4x4: remove an unnecessary transpose
+e8feb20 Fix FindClosestDiscretized in near lossless:
+8200643 anim_util: quiet static analysis warning
+a6f23c4 Merge "AnimEncoder: Support progress hook and user data."
+a519377 Merge "Near lossless feature: fix some comments."
+da98d31 AnimEncoder: Support progress hook and user data.
+3335713 Near lossless feature: fix some comments.
+0beed01 cosmetics: fix indent after 2f5e898
+6753f35 Merge "FTransformWHT optimization."
+6583bb1 Improve SSE4.1 implementation of TTransform.
+7561d0c FTransformWHT optimization.
+7ccdb73 fix indentation after patch #328220
+6ec0d2a clarify the logic of the error path when decoding fails.
+8aa352b Merge "Remove an unnecessary transposition in TTransform."
+db86088 Merge "remove useless #include"
+9960c31 Remove an unnecessary transposition in TTransform.
+6e36b51 Small speedup in FTransform.
+9dbd4aa Merge "fix C and SIMD flags completion."
+e60853e Add missing common_sse2.h file to makefile.unix
+696eb2b fix C and SIMD flags completion.
+2b4fe33 Merge "fix multiple allocation for transform buffer"
+2f5e898 fix multiple allocation for transform buffer
+bf2b4f1 Regroup common SSE code + optimization.
+4ed650a force "-pass 6" if -psnr or -size is used but -pass isn't.
+3ef1ce9 yuv_sse2: fix -Wconstant-conversion warning
+a7a03e9 Merge changes I4852d18f,I51ccb85d
+5e122bd gif2webp: set enc_options.verbose = 0 w/-quiet
+ab3c258 anim_encode,DefaultEncoderOptions: init verbose
+8f0dee7 Merge "configure: fix builtin detection w/-Werror"
+4a7b85a cmake: fix builtin detection w/-Werror
+b74657f configure: fix builtin detection w/-Werror
+3661b98 Add a CMakeLists.txt
+75f4af4 remove useless #include
+6c1d763 avoid Yoda style for comparison
+8ce975a SSE optimization for vector mismatch.
+7db5383 Merge tag 'v0.5.0'
+37f0494 update ChangeLog (tag: v0.5.0-rc1, tag: v0.5.0, origin/0.5.0, 0.5.0)
 7e7b6cc faster rgb565/rgb4444/argb output
 4c7f565 update NEWS
 1f62b6b update AUTHORS
@@ -140,7 +316,7 @@ f7c507a Merge "remove unnecessary #include "yuv.h""
 7861578 for ReadXXXX() image-readers, use the value of pic->use_argb
 14e4043 remove unnecessary #include "yuv.h"
 469ba2c vwebp: fix incorrect clipping w/NO_BLEND
-4b9186b update issue tracker url (master)
+4b9186b update issue tracker url
 d64d376 change WEBP_ALIGN_CST value to 31
 f717b82 vp8l.c, cosmetics: fix indent after 95509f9
 927ccdc Merge "fix alignment of allocated memory in AllocateTransformBuffer"
diff --git a/src/3rdparty/libwebp/NEWS b/src/3rdparty/libwebp/NEWS
index a72f179..30554bf 100644
--- a/src/3rdparty/libwebp/NEWS
+++ b/src/3rdparty/libwebp/NEWS
@@ -1,3 +1,17 @@
+- 6/14/2016: version 0.5.1
+  This is a binary compatible release.
+  * miscellaneous bug fixes (issues #280, #289)
+  * reverted alpha plane encoding with color cache for compatibility with
+    libwebp 0.4.0->0.4.3 (issues #291, #298)
+  * lossless encoding performance improvements
+  * memory reduction in both lossless encoding and decoding
+  * force mux output to be in the extended format (VP8X) when undefined chunks
+    are present (issue #294)
+  * gradle, cmake build support
+  * workaround for compiler bug causing 64-bit decode failures on android
+    devices using clang-3.8 in the r11c NDK
+  * various WebPAnimEncoder improvements
+
 - 12/17/2015: version 0.5.0
   * miscellaneous bug & build fixes (issues #234, #258, #274, #275, #278)
   * encoder & decoder speed-ups on x86/ARM/MIPS for lossy & lossless
diff --git a/src/3rdparty/libwebp/README b/src/3rdparty/libwebp/README
index 381b927..90f8f10 100644
--- a/src/3rdparty/libwebp/README
+++ b/src/3rdparty/libwebp/README
@@ -4,7 +4,7 @@
           \__\__/\____/\_____/__/ ____  ___
                 / _/ /    \    \ /  _ \/ _/
                /  \_/   / /   \ \   __/  \__
-               \____/____/\_____/_____/____/v0.5.0
+               \____/____/\_____/_____/____/v0.5.1
 
 Description:
 ============
@@ -20,7 +20,7 @@
 https://chromium.googlesource.com/webm/libwebp
 It is released under the same license as the WebM project.
 See http://www.webmproject.org/license/software/ or the
-file "COPYING" file for details. An additional intellectual
+"COPYING" file for details. An additional intellectual
 property rights grant can be found in the file PATENTS.
 
 Building:
@@ -84,6 +84,73 @@ be installed independently using a minor modification in the
 corresponding Makefile.am configure files (see comments there).
 See './configure --help' for more options.
 
+Building for MIPS Linux:
+------------------------
+MIPS Linux toolchain stable available releases can be found at:
+https://community.imgtec.com/developers/mips/tools/codescape-mips-sdk/available-releases/
+
+# Add toolchain to PATH
+export PATH=$PATH:/path/to/toolchain/bin
+
+# 32-bit build for mips32r5 (p5600)
+HOST=mips-mti-linux-gnu
+MIPS_CFLAGS="-O3 -mips32r5 -mabi=32 -mtune=p5600 -mmsa -mfp64 \
+  -msched-weight -mload-store-pairs -fPIE"
+MIPS_LDFLAGS="-mips32r5 -mabi=32 -mmsa -mfp64 -pie"
+
+# 64-bit build for mips64r6 (i6400)
+HOST=mips-img-linux-gnu
+MIPS_CFLAGS="-O3 -mips64r6 -mabi=64 -mtune=i6400 -mmsa -mfp64 \
+  -msched-weight -mload-store-pairs -fPIE"
+MIPS_LDFLAGS="-mips64r6 -mabi=64 -mmsa -mfp64 -pie"
+
+./configure --host=${HOST} --build=`config.guess` \
+  CC="${HOST}-gcc -EL" \
+  CFLAGS="$MIPS_CFLAGS" \
+  LDFLAGS="$MIPS_LDFLAGS"
+make
+make install
+
+CMake:
+------
+The support for CMake is minimal: it only helps you compile libwebp, cwebp and
+dwebp.
+
+Prerequisites:
+A compiler (e.g., gcc with autotools) and CMake.
+On a Debian-like system the following should install everything you need for a
+minimal build:
+$ sudo apt-get install build-essential cmake
+
+When building from git sources, you will need to run cmake to generate the
+configure script.
+
+mkdir build && cd build && cmake ../
+make
+make install
+
+If you also want cwebp or dwebp, you will need to enable them through CMake:
+
+cmake -DWEBP_BUILD_CWEBP=ON -DWEBP_BUILD_DWEBP=ON ../
+
+or through your favorite interface (like ccmake or cmake-qt-gui).
+
+Gradle:
+-------
+The support for Gradle is minimal: it only helps you compile libwebp, cwebp and
+dwebp and webpmux_example.
+
+Prerequisites:
+A compiler (e.g., gcc with autotools) and gradle.
+On a Debian-like system the following should install everything you need for a
+minimal build:
+$ sudo apt-get install build-essential gradle
+
+When building from git sources, you will need to run the Gradle wrapper with the
+appropriate target, e.g. :
+
+./gradlew buildAllExecutables
+
 SWIG bindings:
 --------------
@@ -151,8 +218,8 @@
 If input size (-s) for an image is not specified, it is
 assumed to be a PNG, JPEG, TIFF or WebP file.
 Options:
-  -h / -help ............ short help
-  -H / -longhelp ........ long help
+  -h / -help ............. short help
+  -H / -longhelp ......... long help
   -q <float> ............. quality factor (0:small..100:big)
   -alpha_q <int> ......... transparency-compression quality (0..100)
   -preset <string> ....... preset setting, one of:
@@ -274,7 +341,7 @@
   -yuv ......... save the raw YUV samples in flat layout
 
 Other options are:
-  -version .... print version number and exit
+  -version ..... print version number and exit
   -nofancy ..... don't use the fancy YUV420 upscaler
   -nofilter .... disable in-loop filtering
   -nodither .... disable dithering
@@ -286,8 +353,8 @@
   -flip ........ flip the output vertically
   -alpha ....... only save the alpha plane
   -incremental . use incremental decoding (useful for tests)
-  -h ....... this help message
-  -v ....... verbose (e.g. print encoding/decoding times)
+  -h ........... this help message
+  -v ........... verbose (e.g. print encoding/decoding times)
   -quiet ....... quiet mode, don't print anything
   -noasm ....... disable all assembly optimizations
@@ -303,7 +370,7 @@
 Usage: vwebp in_file [options]
 Decodes the WebP image file and visualize it using OpenGL
 Options are:
-  -version .... print version number and exit
+  -version ..... print version number and exit
   -noicc ....... don't use the icc profile if present
   -nofancy ..... don't use the fancy YUV420 upscaler
   -nofilter .... disable in-loop filtering
@@ -311,7 +378,7 @@ Options are:
   -noalphadither disable alpha plane dithering
   -mt .......... use multi-threading
   -info ........ print info
-  -h ....... this help message
+  -h ........... this help message
 
 Keyboard shortcuts:
   'c' ................ toggle use of color profile
@@ -353,7 +420,7 @@ vwebp.
 Usage: gif2webp [options] gif_file -o webp_file
 Options:
-  -h / -help ............ this help
+  -h / -help ............. this help
   -lossy ................. encode image using lossy compression
   -mixed ................. for each frame in the image, pick lossy or lossless
                            compression heuristically
@@ -637,7 +704,7 @@
 an otherwise too-large picture. Some CPU can be saved too, incidentally.
 Bugs:
 =====
-Please report all bugs to our issue tracker:
+Please report all bugs to the issue tracker:
 https://bugs.chromium.org/p/webp
 Patches welcome! See this page to get started:
 http://www.webmproject.org/code/contribute/submitting-patches/
diff --git a/src/3rdparty/libwebp/qt_attribution.json b/src/3rdparty/libwebp/qt_attribution.json
index 6084e7a..825cfea 100644
--- a/src/3rdparty/libwebp/qt_attribution.json
+++ b/src/3rdparty/libwebp/qt_attribution.json
@@ -6,7 +6,7 @@
     "Description": "WebP is a new image format that provides lossless and lossy compression for images on the web.",
     "Homepage": "https://developers.google.com/speed/webp/",
-    "Version": "0.5.0",
+    "Version": "0.5.1",
     "License": "BSD 3-clause \"New\" or \"Revised\" License",
     "LicenseId": "BSD-3-Clause",
     "LicenseFile": "COPYING",
diff --git a/src/3rdparty/libwebp/src/dec/alpha.c b/src/3rdparty/libwebp/src/dec/alpha.c
index 52216fc..028eb3d 100644
--- a/src/3rdparty/libwebp/src/dec/alpha.c
+++ b/src/3rdparty/libwebp/src/dec/alpha.c
@@ -23,12 +23,14 @@
 //------------------------------------------------------------------------------
 // ALPHDecoder object.
 
-ALPHDecoder* ALPHNew(void) {
+// Allocates a new alpha decoder instance.
+static ALPHDecoder* ALPHNew(void) { ALPHDecoder* const dec = (ALPHDecoder*)WebPSafeCalloc(1ULL, sizeof(*dec)); return dec; } -void ALPHDelete(ALPHDecoder* const dec) { +// Clears and deallocates an alpha decoder instance. +static void ALPHDelete(ALPHDecoder* const dec) { if (dec != NULL) { VP8LDelete(dec->vp8l_dec_); dec->vp8l_dec_ = NULL; @@ -44,17 +46,21 @@ void ALPHDelete(ALPHDecoder* const dec) { // Returns false in case of error in alpha header (data too short, invalid // compression method or filter, error in lossless header data etc). static int ALPHInit(ALPHDecoder* const dec, const uint8_t* data, - size_t data_size, int width, int height, uint8_t* output) { + size_t data_size, const VP8Io* const src_io, + uint8_t* output) { int ok = 0; const uint8_t* const alpha_data = data + ALPHA_HEADER_LEN; const size_t alpha_data_size = data_size - ALPHA_HEADER_LEN; int rsrv; + VP8Io* const io = &dec->io_; - assert(width > 0 && height > 0); - assert(data != NULL && output != NULL); + assert(data != NULL && output != NULL && src_io != NULL); - dec->width_ = width; - dec->height_ = height; + VP8FiltersInit(); + dec->output_ = output; + dec->width_ = src_io->width; + dec->height_ = src_io->height; + assert(dec->width_ > 0 && dec->height_ > 0); if (data_size <= ALPHA_HEADER_LEN) { return 0; @@ -72,14 +78,28 @@ static int ALPHInit(ALPHDecoder* const dec, const uint8_t* data, return 0; } + // Copy the necessary parameters from src_io to io + VP8InitIo(io); + WebPInitCustomIo(NULL, io); + io->opaque = dec; + io->width = src_io->width; + io->height = src_io->height; + + io->use_cropping = src_io->use_cropping; + io->crop_left = src_io->crop_left; + io->crop_right = src_io->crop_right; + io->crop_top = src_io->crop_top; + io->crop_bottom = src_io->crop_bottom; + // No need to copy the scaling parameters. 
+ if (dec->method_ == ALPHA_NO_COMPRESSION) { const size_t alpha_decoded_size = dec->width_ * dec->height_; ok = (alpha_data_size >= alpha_decoded_size); } else { assert(dec->method_ == ALPHA_LOSSLESS_COMPRESSION); - ok = VP8LDecodeAlphaHeader(dec, alpha_data, alpha_data_size, output); + ok = VP8LDecodeAlphaHeader(dec, alpha_data, alpha_data_size); } - VP8FiltersInit(); + return ok; } @@ -90,15 +110,30 @@ static int ALPHInit(ALPHDecoder* const dec, const uint8_t* data, static int ALPHDecode(VP8Decoder* const dec, int row, int num_rows) { ALPHDecoder* const alph_dec = dec->alph_dec_; const int width = alph_dec->width_; - const int height = alph_dec->height_; - WebPUnfilterFunc unfilter_func = WebPUnfilters[alph_dec->filter_]; - uint8_t* const output = dec->alpha_plane_; + const int height = alph_dec->io_.crop_bottom; if (alph_dec->method_ == ALPHA_NO_COMPRESSION) { - const size_t offset = row * width; - const size_t num_pixels = num_rows * width; - assert(dec->alpha_data_size_ >= ALPHA_HEADER_LEN + offset + num_pixels); - memcpy(dec->alpha_plane_ + offset, - dec->alpha_data_ + ALPHA_HEADER_LEN + offset, num_pixels); + int y; + const uint8_t* prev_line = dec->alpha_prev_line_; + const uint8_t* deltas = dec->alpha_data_ + ALPHA_HEADER_LEN + row * width; + uint8_t* dst = dec->alpha_plane_ + row * width; + assert(deltas <= &dec->alpha_data_[dec->alpha_data_size_]); + if (alph_dec->filter_ != WEBP_FILTER_NONE) { + assert(WebPUnfilters[alph_dec->filter_] != NULL); + for (y = 0; y < num_rows; ++y) { + WebPUnfilters[alph_dec->filter_](prev_line, deltas, dst, width); + prev_line = dst; + dst += width; + deltas += width; + } + } else { + for (y = 0; y < num_rows; ++y) { + memcpy(dst, deltas, width * sizeof(*dst)); + prev_line = dst; + dst += width; + deltas += width; + } + } + dec->alpha_prev_line_ = prev_line; } else { // alph_dec->method_ == ALPHA_LOSSLESS_COMPRESSION assert(alph_dec->vp8l_dec_ != NULL); if (!VP8LDecodeAlphaImageStream(alph_dec, row + num_rows)) { @@ 
-106,62 +141,92 @@ static int ALPHDecode(VP8Decoder* const dec, int row, int num_rows) { } } - if (unfilter_func != NULL) { - unfilter_func(width, height, width, row, num_rows, output); + if (row + num_rows >= height) { + dec->is_alpha_decoded_ = 1; } + return 1; +} - if (row + num_rows == dec->pic_hdr_.height_) { - dec->is_alpha_decoded_ = 1; +static int AllocateAlphaPlane(VP8Decoder* const dec, const VP8Io* const io) { + const int stride = io->width; + const int height = io->crop_bottom; + const uint64_t alpha_size = (uint64_t)stride * height; + assert(dec->alpha_plane_mem_ == NULL); + dec->alpha_plane_mem_ = + (uint8_t*)WebPSafeMalloc(alpha_size, sizeof(*dec->alpha_plane_)); + if (dec->alpha_plane_mem_ == NULL) { + return 0; } + dec->alpha_plane_ = dec->alpha_plane_mem_; + dec->alpha_prev_line_ = NULL; return 1; } +void WebPDeallocateAlphaMemory(VP8Decoder* const dec) { + assert(dec != NULL); + WebPSafeFree(dec->alpha_plane_mem_); + dec->alpha_plane_mem_ = NULL; + dec->alpha_plane_ = NULL; + ALPHDelete(dec->alph_dec_); + dec->alph_dec_ = NULL; +} + //------------------------------------------------------------------------------ // Main entry point. const uint8_t* VP8DecompressAlphaRows(VP8Decoder* const dec, + const VP8Io* const io, int row, int num_rows) { - const int width = dec->pic_hdr_.width_; - const int height = dec->pic_hdr_.height_; + const int width = io->width; + const int height = io->crop_bottom; + + assert(dec != NULL && io != NULL); if (row < 0 || num_rows <= 0 || row + num_rows > height) { return NULL; // sanity check. } - if (row == 0) { - // Initialize decoding. 
- assert(dec->alpha_plane_ != NULL); - dec->alph_dec_ = ALPHNew(); - if (dec->alph_dec_ == NULL) return NULL; - if (!ALPHInit(dec->alph_dec_, dec->alpha_data_, dec->alpha_data_size_, - width, height, dec->alpha_plane_)) { - ALPHDelete(dec->alph_dec_); - dec->alph_dec_ = NULL; - return NULL; - } - // if we allowed use of alpha dithering, check whether it's needed at all - if (dec->alph_dec_->pre_processing_ != ALPHA_PREPROCESSED_LEVELS) { - dec->alpha_dithering_ = 0; // disable dithering - } else { - num_rows = height; // decode everything in one pass + if (!dec->is_alpha_decoded_) { + if (dec->alph_dec_ == NULL) { // Initialize decoder. + dec->alph_dec_ = ALPHNew(); + if (dec->alph_dec_ == NULL) return NULL; + if (!AllocateAlphaPlane(dec, io)) goto Error; + if (!ALPHInit(dec->alph_dec_, dec->alpha_data_, dec->alpha_data_size_, + io, dec->alpha_plane_)) { + goto Error; + } + // if we allowed use of alpha dithering, check whether it's needed at all + if (dec->alph_dec_->pre_processing_ != ALPHA_PREPROCESSED_LEVELS) { + dec->alpha_dithering_ = 0; // disable dithering + } else { + num_rows = height - row; // decode everything in one pass + } } - } - if (!dec->is_alpha_decoded_) { - int ok = 0; assert(dec->alph_dec_ != NULL); - ok = ALPHDecode(dec, row, num_rows); - if (ok && dec->alpha_dithering_ > 0) { - ok = WebPDequantizeLevels(dec->alpha_plane_, width, height, - dec->alpha_dithering_); - } - if (!ok || dec->is_alpha_decoded_) { + assert(row + num_rows <= height); + if (!ALPHDecode(dec, row, num_rows)) goto Error; + + if (dec->is_alpha_decoded_) { // finished? ALPHDelete(dec->alph_dec_); dec->alph_dec_ = NULL; + if (dec->alpha_dithering_ > 0) { + uint8_t* const alpha = dec->alpha_plane_ + io->crop_top * width + + io->crop_left; + if (!WebPDequantizeLevels(alpha, + io->crop_right - io->crop_left, + io->crop_bottom - io->crop_top, + width, dec->alpha_dithering_)) { + goto Error; + } + } } - if (!ok) return NULL; // Error. 
} // Return a pointer to the current decoded row. return dec->alpha_plane_ + row * width; + + Error: + WebPDeallocateAlphaMemory(dec); + return NULL; } diff --git a/src/3rdparty/libwebp/src/dec/alphai.h b/src/3rdparty/libwebp/src/dec/alphai.h index 5fa230c..69dd7c0 100644 --- a/src/3rdparty/libwebp/src/dec/alphai.h +++ b/src/3rdparty/libwebp/src/dec/alphai.h @@ -32,19 +32,18 @@ struct ALPHDecoder { int pre_processing_; struct VP8LDecoder* vp8l_dec_; VP8Io io_; - int use_8b_decode; // Although alpha channel requires only 1 byte per - // pixel, sometimes VP8LDecoder may need to allocate - // 4 bytes per pixel internally during decode. + int use_8b_decode_; // Although alpha channel requires only 1 byte per + // pixel, sometimes VP8LDecoder may need to allocate + // 4 bytes per pixel internally during decode. + uint8_t* output_; + const uint8_t* prev_line_; // last output row (or NULL) }; //------------------------------------------------------------------------------ // internal functions. Not public. -// Allocates a new alpha decoder instance. -ALPHDecoder* ALPHNew(void); - -// Clears and deallocates an alpha decoder instance. 
-void ALPHDelete(ALPHDecoder* const dec); +// Deallocate memory associated to dec->alpha_plane_ decoding +void WebPDeallocateAlphaMemory(VP8Decoder* const dec); //------------------------------------------------------------------------------ diff --git a/src/3rdparty/libwebp/src/dec/buffer.c b/src/3rdparty/libwebp/src/dec/buffer.c index 9ed2b3f..547e69b 100644 --- a/src/3rdparty/libwebp/src/dec/buffer.c +++ b/src/3rdparty/libwebp/src/dec/buffer.c @@ -92,7 +92,7 @@ static VP8StatusCode AllocateBuffer(WebPDecBuffer* const buffer) { return VP8_STATUS_INVALID_PARAM; } - if (!buffer->is_external_memory && buffer->private_memory == NULL) { + if (buffer->is_external_memory <= 0 && buffer->private_memory == NULL) { uint8_t* output; int uv_stride = 0, a_stride = 0; uint64_t uv_size = 0, a_size = 0, total_size; @@ -227,7 +227,7 @@ int WebPInitDecBufferInternal(WebPDecBuffer* buffer, int version) { void WebPFreeDecBuffer(WebPDecBuffer* buffer) { if (buffer != NULL) { - if (!buffer->is_external_memory) { + if (buffer->is_external_memory <= 0) { WebPSafeFree(buffer->private_memory); } buffer->private_memory = NULL; @@ -256,5 +256,45 @@ void WebPGrabDecBuffer(WebPDecBuffer* const src, WebPDecBuffer* const dst) { } } -//------------------------------------------------------------------------------ +VP8StatusCode WebPCopyDecBufferPixels(const WebPDecBuffer* const src_buf, + WebPDecBuffer* const dst_buf) { + assert(src_buf != NULL && dst_buf != NULL); + assert(src_buf->colorspace == dst_buf->colorspace); + + dst_buf->width = src_buf->width; + dst_buf->height = src_buf->height; + if (CheckDecBuffer(dst_buf) != VP8_STATUS_OK) { + return VP8_STATUS_INVALID_PARAM; + } + if (WebPIsRGBMode(src_buf->colorspace)) { + const WebPRGBABuffer* const src = &src_buf->u.RGBA; + const WebPRGBABuffer* const dst = &dst_buf->u.RGBA; + WebPCopyPlane(src->rgba, src->stride, dst->rgba, dst->stride, + src_buf->width * kModeBpp[src_buf->colorspace], + src_buf->height); + } else { + const WebPYUVABuffer* 
const src = &src_buf->u.YUVA; + const WebPYUVABuffer* const dst = &dst_buf->u.YUVA; + WebPCopyPlane(src->y, src->y_stride, dst->y, dst->y_stride, + src_buf->width, src_buf->height); + WebPCopyPlane(src->u, src->u_stride, dst->u, dst->u_stride, + (src_buf->width + 1) / 2, (src_buf->height + 1) / 2); + WebPCopyPlane(src->v, src->v_stride, dst->v, dst->v_stride, + (src_buf->width + 1) / 2, (src_buf->height + 1) / 2); + if (WebPIsAlphaMode(src_buf->colorspace)) { + WebPCopyPlane(src->a, src->a_stride, dst->a, dst->a_stride, + src_buf->width, src_buf->height); + } + } + return VP8_STATUS_OK; +} +int WebPAvoidSlowMemory(const WebPDecBuffer* const output, + const WebPBitstreamFeatures* const features) { + assert(output != NULL); + return (output->is_external_memory >= 2) && + WebPIsPremultipliedMode(output->colorspace) && + (features != NULL && features->has_alpha); +} + +//------------------------------------------------------------------------------ diff --git a/src/3rdparty/libwebp/src/dec/frame.c b/src/3rdparty/libwebp/src/dec/frame.c index b882133..22d291d 100644 --- a/src/3rdparty/libwebp/src/dec/frame.c +++ b/src/3rdparty/libwebp/src/dec/frame.c @@ -316,6 +316,9 @@ static void PrecomputeFilterStrengths(VP8Decoder* const dec) { //------------------------------------------------------------------------------ // Dithering +// minimal amp that will provide a non-zero dithering effect +#define MIN_DITHER_AMP 4 + #define DITHER_AMP_TAB_SIZE 12 static const int kQuantToDitherAmp[DITHER_AMP_TAB_SIZE] = { // roughly, it's dqm->uv_mat_[1] @@ -356,27 +359,14 @@ void VP8InitDithering(const WebPDecoderOptions* const options, } } -// minimal amp that will provide a non-zero dithering effect -#define MIN_DITHER_AMP 4 -#define DITHER_DESCALE 4 -#define DITHER_DESCALE_ROUNDER (1 << (DITHER_DESCALE - 1)) -#define DITHER_AMP_BITS 8 -#define DITHER_AMP_CENTER (1 << DITHER_AMP_BITS) - +// Convert to range: [-2,2] for dither=50, [-4,4] for dither=100 static void Dither8x8(VP8Random* 
const rg, uint8_t* dst, int bps, int amp) { - int i, j; - for (j = 0; j < 8; ++j) { - for (i = 0; i < 8; ++i) { - // TODO: could be made faster with SSE2 - const int bits = - VP8RandomBits2(rg, DITHER_AMP_BITS + 1, amp) - DITHER_AMP_CENTER; - // Convert to range: [-2,2] for dither=50, [-4,4] for dither=100 - const int delta = (bits + DITHER_DESCALE_ROUNDER) >> DITHER_DESCALE; - const int v = (int)dst[i] + delta; - dst[i] = (v < 0) ? 0 : (v > 255) ? 255u : (uint8_t)v; - } - dst += bps; + uint8_t dither[64]; + int i; + for (i = 0; i < 8 * 8; ++i) { + dither[i] = VP8RandomBits2(rg, VP8_DITHER_AMP_BITS + 1, amp); } + VP8DitherCombine8x8(dither, dst, bps); } static void DitherRow(VP8Decoder* const dec) { @@ -462,7 +452,7 @@ static int FinishRow(VP8Decoder* const dec, VP8Io* const io) { if (dec->alpha_data_ != NULL && y_start < y_end) { // TODO(skal): testing presence of alpha with dec->alpha_data_ is not a // good idea. - io->a = VP8DecompressAlphaRows(dec, y_start, y_end - y_start); + io->a = VP8DecompressAlphaRows(dec, io, y_start, y_end - y_start); if (io->a == NULL) { return VP8SetError(dec, VP8_STATUS_BITSTREAM_ERROR, "Could not decode alpha data."); diff --git a/src/3rdparty/libwebp/src/dec/idec.c b/src/3rdparty/libwebp/src/dec/idec.c index e0cf0c9..8de1319 100644 --- a/src/3rdparty/libwebp/src/dec/idec.c +++ b/src/3rdparty/libwebp/src/dec/idec.c @@ -70,7 +70,9 @@ struct WebPIDecoder { VP8Io io_; MemBuffer mem_; // input memory buffer. - WebPDecBuffer output_; // output buffer (when no external one is supplied) + WebPDecBuffer output_; // output buffer (when no external one is supplied, + // or if the external one has slow-memory) + WebPDecBuffer* final_output_; // Slow-memory output to copy to eventually. size_t chunk_size_; // Compressed VP8/VP8L size extracted from Header. 
int last_mb_y_; // last row reached for intra-mode decoding @@ -118,9 +120,9 @@ static void DoRemap(WebPIDecoder* const idec, ptrdiff_t offset) { if (idec->dec_ != NULL) { if (!idec->is_lossless_) { VP8Decoder* const dec = (VP8Decoder*)idec->dec_; - const int last_part = dec->num_parts_ - 1; + const uint32_t last_part = dec->num_parts_minus_one_; if (offset != 0) { - int p; + uint32_t p; for (p = 0; p <= last_part; ++p) { VP8RemapBitReader(dec->parts_ + p, offset); } @@ -132,7 +134,6 @@ static void DoRemap(WebPIDecoder* const idec, ptrdiff_t offset) { } { const uint8_t* const last_start = dec->parts_[last_part].buf_; - assert(last_part >= 0); VP8BitReaderSetBuffer(&dec->parts_[last_part], last_start, mem->buf_ + mem->end_ - last_start); } @@ -249,10 +250,16 @@ static VP8StatusCode FinishDecoding(WebPIDecoder* const idec) { idec->state_ = STATE_DONE; if (options != NULL && options->flip) { - return WebPFlipBuffer(output); - } else { - return VP8_STATUS_OK; + const VP8StatusCode status = WebPFlipBuffer(output); + if (status != VP8_STATUS_OK) return status; + } + if (idec->final_output_ != NULL) { + WebPCopyDecBufferPixels(output, idec->final_output_); // do the slow-copy + WebPFreeDecBuffer(&idec->output_); + *output = *idec->final_output_; + idec->final_output_ = NULL; } + return VP8_STATUS_OK; } //------------------------------------------------------------------------------ @@ -457,19 +464,20 @@ static VP8StatusCode DecodeRemaining(WebPIDecoder* const idec) { } for (; dec->mb_x_ < dec->mb_w_; ++dec->mb_x_) { VP8BitReader* const token_br = - &dec->parts_[dec->mb_y_ & (dec->num_parts_ - 1)]; + &dec->parts_[dec->mb_y_ & dec->num_parts_minus_one_]; MBContext context; SaveContext(dec, token_br, &context); if (!VP8DecodeMB(dec, token_br)) { // We shouldn't fail when MAX_MB data was available - if (dec->num_parts_ == 1 && MemDataSize(&idec->mem_) > MAX_MB_SIZE) { + if (dec->num_parts_minus_one_ == 0 && + MemDataSize(&idec->mem_) > MAX_MB_SIZE) { return IDecError(idec, 
VP8_STATUS_BITSTREAM_ERROR); } RestoreContext(&context, dec, token_br); return VP8_STATUS_SUSPENDED; } // Release buffer only if there is only one partition - if (dec->num_parts_ == 1) { + if (dec->num_parts_minus_one_ == 0) { idec->mem_.start_ = token_br->buf_ - idec->mem_.buf_; assert(idec->mem_.start_ <= idec->mem_.end_); } @@ -575,9 +583,10 @@ static VP8StatusCode IDecode(WebPIDecoder* idec) { } //------------------------------------------------------------------------------ -// Public functions +// Internal constructor -WebPIDecoder* WebPINewDecoder(WebPDecBuffer* output_buffer) { +static WebPIDecoder* NewDecoder(WebPDecBuffer* const output_buffer, + const WebPBitstreamFeatures* const features) { WebPIDecoder* idec = (WebPIDecoder*)WebPSafeCalloc(1ULL, sizeof(*idec)); if (idec == NULL) { return NULL; @@ -593,25 +602,46 @@ WebPIDecoder* WebPINewDecoder(WebPDecBuffer* output_buffer) { VP8InitIo(&idec->io_); WebPResetDecParams(&idec->params_); - idec->params_.output = (output_buffer != NULL) ? output_buffer - : &idec->output_; + if (output_buffer == NULL || WebPAvoidSlowMemory(output_buffer, features)) { + idec->params_.output = &idec->output_; + idec->final_output_ = output_buffer; + if (output_buffer != NULL) { + idec->params_.output->colorspace = output_buffer->colorspace; + } + } else { + idec->params_.output = output_buffer; + idec->final_output_ = NULL; + } WebPInitCustomIo(&idec->params_, &idec->io_); // Plug the I/O functions. return idec; } +//------------------------------------------------------------------------------ +// Public functions + +WebPIDecoder* WebPINewDecoder(WebPDecBuffer* output_buffer) { + return NewDecoder(output_buffer, NULL); +} + WebPIDecoder* WebPIDecode(const uint8_t* data, size_t data_size, WebPDecoderConfig* config) { WebPIDecoder* idec; + WebPBitstreamFeatures tmp_features; + WebPBitstreamFeatures* const features = + (config == NULL) ? 
&tmp_features : &config->input; + memset(&tmp_features, 0, sizeof(tmp_features)); // Parse the bitstream's features, if requested: - if (data != NULL && data_size > 0 && config != NULL) { - if (WebPGetFeatures(data, data_size, &config->input) != VP8_STATUS_OK) { + if (data != NULL && data_size > 0) { + if (WebPGetFeatures(data, data_size, features) != VP8_STATUS_OK) { return NULL; } } + // Create an instance of the incremental decoder - idec = WebPINewDecoder(config ? &config->output : NULL); + idec = (config != NULL) ? NewDecoder(&config->output, features) + : NewDecoder(NULL, features); if (idec == NULL) { return NULL; } @@ -645,11 +675,11 @@ void WebPIDelete(WebPIDecoder* idec) { WebPIDecoder* WebPINewRGB(WEBP_CSP_MODE mode, uint8_t* output_buffer, size_t output_buffer_size, int output_stride) { - const int is_external_memory = (output_buffer != NULL); + const int is_external_memory = (output_buffer != NULL) ? 1 : 0; WebPIDecoder* idec; if (mode >= MODE_YUV) return NULL; - if (!is_external_memory) { // Overwrite parameters to sane values. + if (is_external_memory == 0) { // Overwrite parameters to sane values. output_buffer_size = 0; output_stride = 0; } else { // A buffer was passed. Validate the other params. @@ -671,11 +701,11 @@ WebPIDecoder* WebPINewYUVA(uint8_t* luma, size_t luma_size, int luma_stride, uint8_t* u, size_t u_size, int u_stride, uint8_t* v, size_t v_size, int v_stride, uint8_t* a, size_t a_size, int a_stride) { - const int is_external_memory = (luma != NULL); + const int is_external_memory = (luma != NULL) ? 1 : 0; WebPIDecoder* idec; WEBP_CSP_MODE colorspace; - if (!is_external_memory) { // Overwrite parameters to sane values. + if (is_external_memory == 0) { // Overwrite parameters to sane values. 
luma_size = u_size = v_size = a_size = 0; luma_stride = u_stride = v_stride = a_stride = 0; u = v = a = NULL; @@ -783,6 +813,9 @@ static const WebPDecBuffer* GetOutputBuffer(const WebPIDecoder* const idec) { if (idec->state_ <= STATE_VP8_PARTS0) { return NULL; } + if (idec->final_output_ != NULL) { + return NULL; // not yet slow-copied + } return idec->params_.output; } @@ -792,7 +825,7 @@ const WebPDecBuffer* WebPIDecodedArea(const WebPIDecoder* idec, const WebPDecBuffer* const src = GetOutputBuffer(idec); if (left != NULL) *left = 0; if (top != NULL) *top = 0; - if (src) { + if (src != NULL) { if (width != NULL) *width = src->width; if (height != NULL) *height = idec->params_.last_y; } else { diff --git a/src/3rdparty/libwebp/src/dec/io.c b/src/3rdparty/libwebp/src/dec/io.c index 13e469a..8d5c43f 100644 --- a/src/3rdparty/libwebp/src/dec/io.c +++ b/src/3rdparty/libwebp/src/dec/io.c @@ -119,6 +119,14 @@ static int EmitFancyRGB(const VP8Io* const io, WebPDecParams* const p) { //------------------------------------------------------------------------------ +static void FillAlphaPlane(uint8_t* dst, int w, int h, int stride) { + int j; + for (j = 0; j < h; ++j) { + memset(dst, 0xff, w * sizeof(*dst)); + dst += stride; + } +} + static int EmitAlphaYUV(const VP8Io* const io, WebPDecParams* const p, int expected_num_lines_out) { const uint8_t* alpha = io->a; @@ -137,10 +145,7 @@ static int EmitAlphaYUV(const VP8Io* const io, WebPDecParams* const p, } } else if (buf->a != NULL) { // the user requested alpha, but there is none, set it to opaque. 
- for (j = 0; j < mb_h; ++j) { - memset(dst, 0xff, mb_w * sizeof(*dst)); - dst += buf->a_stride; - } + FillAlphaPlane(dst, mb_w, mb_h, buf->a_stride); } return 0; } @@ -269,8 +274,8 @@ static int EmitRescaledYUV(const VP8Io* const io, WebPDecParams* const p) { static int EmitRescaledAlphaYUV(const VP8Io* const io, WebPDecParams* const p, int expected_num_lines_out) { + const WebPYUVABuffer* const buf = &p->output->u.YUVA; if (io->a != NULL) { - const WebPYUVABuffer* const buf = &p->output->u.YUVA; uint8_t* dst_y = buf->y + p->last_y * buf->y_stride; const uint8_t* src_a = buf->a + p->last_y * buf->a_stride; const int num_lines_out = Rescale(io->a, io->width, io->mb_h, &p->scaler_a); @@ -280,6 +285,11 @@ static int EmitRescaledAlphaYUV(const VP8Io* const io, WebPDecParams* const p, WebPMultRows(dst_y, buf->y_stride, src_a, buf->a_stride, p->scaler_a.dst_width, num_lines_out, 1); } + } else if (buf->a != NULL) { + // the user requested alpha, but there is none, set it to opaque. + assert(p->last_y + expected_num_lines_out <= io->scaled_height); + FillAlphaPlane(buf->a + p->last_y * buf->a_stride, + io->scaled_width, expected_num_lines_out, buf->a_stride); } return 0; } diff --git a/src/3rdparty/libwebp/src/dec/vp8.c b/src/3rdparty/libwebp/src/dec/vp8.c index d89eb1c..336680c 100644 --- a/src/3rdparty/libwebp/src/dec/vp8.c +++ b/src/3rdparty/libwebp/src/dec/vp8.c @@ -50,7 +50,7 @@ VP8Decoder* VP8New(void) { SetOk(dec); WebPGetWorkerInterface()->Init(&dec->worker_); dec->ready_ = 0; - dec->num_parts_ = 1; + dec->num_parts_minus_one_ = 0; } return dec; } @@ -194,8 +194,8 @@ static VP8StatusCode ParsePartitions(VP8Decoder* const dec, size_t last_part; size_t p; - dec->num_parts_ = 1 << VP8GetValue(br, 2); - last_part = dec->num_parts_ - 1; + dec->num_parts_minus_one_ = (1 << VP8GetValue(br, 2)) - 1; + last_part = dec->num_parts_minus_one_; if (size < 3 * last_part) { // we can't even read the sizes with sz[]! That's a failure. 
return VP8_STATUS_NOT_ENOUGH_DATA; @@ -303,15 +303,22 @@ int VP8GetHeaders(VP8Decoder* const dec, VP8Io* const io) { dec->mb_w_ = (pic_hdr->width_ + 15) >> 4; dec->mb_h_ = (pic_hdr->height_ + 15) >> 4; + // Setup default output area (can be later modified during io->setup()) io->width = pic_hdr->width_; io->height = pic_hdr->height_; - io->use_scaling = 0; + // IMPORTANT! use some sane dimensions in crop_* and scaled_* fields. + // So they can be used interchangeably without always testing for + // 'use_cropping'. io->use_cropping = 0; io->crop_top = 0; io->crop_left = 0; io->crop_right = io->width; io->crop_bottom = io->height; + io->use_scaling = 0; + io->scaled_width = io->width; + io->scaled_height = io->height; + io->mb_w = io->width; // sanity check io->mb_h = io->height; // ditto @@ -579,7 +586,7 @@ static int ParseFrame(VP8Decoder* const dec, VP8Io* io) { for (dec->mb_y_ = 0; dec->mb_y_ < dec->br_mb_y_; ++dec->mb_y_) { // Parse bitstream for this row. VP8BitReader* const token_br = - &dec->parts_[dec->mb_y_ & (dec->num_parts_ - 1)]; + &dec->parts_[dec->mb_y_ & dec->num_parts_minus_one_]; if (!VP8ParseIntraModeRow(&dec->br_, dec)) { return VP8SetError(dec, VP8_STATUS_NOT_ENOUGH_DATA, "Premature end-of-partition0 encountered."); @@ -649,8 +656,7 @@ void VP8Clear(VP8Decoder* const dec) { return; } WebPGetWorkerInterface()->End(&dec->worker_); - ALPHDelete(dec->alph_dec_); - dec->alph_dec_ = NULL; + WebPDeallocateAlphaMemory(dec); WebPSafeFree(dec->mem_); dec->mem_ = NULL; dec->mem_size_ = 0; @@ -659,4 +665,3 @@ void VP8Clear(VP8Decoder* const dec) { } //------------------------------------------------------------------------------ - diff --git a/src/3rdparty/libwebp/src/dec/vp8i.h b/src/3rdparty/libwebp/src/dec/vp8i.h index 0104f25..00da02b 100644 --- a/src/3rdparty/libwebp/src/dec/vp8i.h +++ b/src/3rdparty/libwebp/src/dec/vp8i.h @@ -32,7 +32,7 @@ extern "C" { // version numbers #define DEC_MAJ_VERSION 0 #define DEC_MIN_VERSION 5 -#define DEC_REV_VERSION 0 
+#define DEC_REV_VERSION 1 // YUV-cache parameters. Cache is 32-bytes wide (= one cacheline). // Constraints are: We need to store one 16x16 block of luma samples (y), @@ -209,8 +209,8 @@ struct VP8Decoder { int tl_mb_x_, tl_mb_y_; // top-left MB that must be in-loop filtered int br_mb_x_, br_mb_y_; // last bottom-right MB that must be decoded - // number of partitions. - int num_parts_; + // number of partitions minus one. + uint32_t num_parts_minus_one_; // per-partition boolean decoders. VP8BitReader parts_[MAX_NUM_PARTITIONS]; @@ -258,9 +258,11 @@ struct VP8Decoder { struct ALPHDecoder* alph_dec_; // alpha-plane decoder object const uint8_t* alpha_data_; // compressed alpha data (if present) size_t alpha_data_size_; - int is_alpha_decoded_; // true if alpha_data_ is decoded in alpha_plane_ - uint8_t* alpha_plane_; // output. Persistent, contains the whole data. - int alpha_dithering_; // derived from decoding options (0=off, 100=full). + int is_alpha_decoded_; // true if alpha_data_ is decoded in alpha_plane_ + uint8_t* alpha_plane_mem_; // memory allocated for alpha_plane_ + uint8_t* alpha_plane_; // output. Persistent, contains the whole data. 
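
Replacing `num_parts_` with `num_parts_minus_one_` works because the partition count read from the bitstream is always a power of two (`1 << VP8GetValue(br, 2)`, i.e. 1, 2, 4 or 8), so the stored value doubles as a bit mask. The row-to-partition mapping in isolation (a sketch, not the library API):

```c
#include <stdint.h>

/* With num_parts in {1, 2, 4, 8}, 'num_parts - 1' is an all-ones mask,
 * and 'mb_y & mask' equals 'mb_y % num_parts' without a division. */
static uint32_t PartitionForRow(uint32_t mb_y,
                                uint32_t num_parts_minus_one) {
  return mb_y & num_parts_minus_one;
}
```
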
+ const uint8_t* alpha_prev_line_; // last decoded alpha row (or NULL) + int alpha_dithering_; // derived from decoding options (0=off, 100=full) }; //------------------------------------------------------------------------------ @@ -306,6 +308,7 @@ int VP8DecodeMB(VP8Decoder* const dec, VP8BitReader* const token_br); // in alpha.c const uint8_t* VP8DecompressAlphaRows(VP8Decoder* const dec, + const VP8Io* const io, int row, int num_rows); //------------------------------------------------------------------------------ diff --git a/src/3rdparty/libwebp/src/dec/vp8l.c b/src/3rdparty/libwebp/src/dec/vp8l.c index a76ad6a..cb2e317 100644 --- a/src/3rdparty/libwebp/src/dec/vp8l.c +++ b/src/3rdparty/libwebp/src/dec/vp8l.c @@ -714,34 +714,22 @@ static void ApplyInverseTransforms(VP8LDecoder* const dec, int num_rows, } } -// Special method for paletted alpha data. -static void ApplyInverseTransformsAlpha(VP8LDecoder* const dec, int num_rows, - const uint8_t* const rows) { - const int start_row = dec->last_row_; - const int end_row = start_row + num_rows; - const uint8_t* rows_in = rows; - uint8_t* rows_out = (uint8_t*)dec->io_->opaque + dec->io_->width * start_row; - VP8LTransform* const transform = &dec->transforms_[0]; - assert(dec->next_transform_ == 1); - assert(transform->type_ == COLOR_INDEXING_TRANSFORM); - VP8LColorIndexInverseTransformAlpha(transform, start_row, end_row, rows_in, - rows_out); -} - // Processes (transforms, scales & color-converts) the rows decoded after the // last call. static void ProcessRows(VP8LDecoder* const dec, int row) { const uint32_t* const rows = dec->pixels_ + dec->width_ * dec->last_row_; const int num_rows = row - dec->last_row_; - if (num_rows <= 0) return; // Nothing to be done. - ApplyInverseTransforms(dec, num_rows, rows); - - // Emit output. - { + assert(row <= dec->io_->crop_bottom); + // We can't process more than NUM_ARGB_CACHE_ROWS at a time (that's the size + // of argb_cache_), but we currently don't need more than that. 
+ assert(num_rows <= NUM_ARGB_CACHE_ROWS); + if (num_rows > 0) { // Emit output. VP8Io* const io = dec->io_; uint8_t* rows_data = (uint8_t*)dec->argb_cache_; const int in_stride = io->width * sizeof(uint32_t); // in unit of RGBA + + ApplyInverseTransforms(dec, num_rows, rows); if (!SetCropWindow(io, dec->last_row_, row, &rows_data, in_stride)) { // Nothing to output (this time). } else { @@ -786,14 +774,46 @@ static int Is8bOptimizable(const VP8LMetadata* const hdr) { return 1; } -static void ExtractPalettedAlphaRows(VP8LDecoder* const dec, int row) { - const int num_rows = row - dec->last_row_; - const uint8_t* const in = - (uint8_t*)dec->pixels_ + dec->width_ * dec->last_row_; - if (num_rows > 0) { - ApplyInverseTransformsAlpha(dec, num_rows, in); +static void AlphaApplyFilter(ALPHDecoder* const alph_dec, + int first_row, int last_row, + uint8_t* out, int stride) { + if (alph_dec->filter_ != WEBP_FILTER_NONE) { + int y; + const uint8_t* prev_line = alph_dec->prev_line_; + assert(WebPUnfilters[alph_dec->filter_] != NULL); + for (y = first_row; y < last_row; ++y) { + WebPUnfilters[alph_dec->filter_](prev_line, out, out, stride); + prev_line = out; + out += stride; + } + alph_dec->prev_line_ = prev_line; } - dec->last_row_ = dec->last_out_row_ = row; +} + +static void ExtractPalettedAlphaRows(VP8LDecoder* const dec, int last_row) { + // For vertical and gradient filtering, we need to decode the part above the + // crop_top row, in order to have the correct spatial predictors. + ALPHDecoder* const alph_dec = (ALPHDecoder*)dec->io_->opaque; + const int top_row = + (alph_dec->filter_ == WEBP_FILTER_NONE || + alph_dec->filter_ == WEBP_FILTER_HORIZONTAL) ? dec->io_->crop_top + : dec->last_row_; + const int first_row = (dec->last_row_ < top_row) ? top_row : dec->last_row_; + assert(last_row <= dec->io_->crop_bottom); + if (last_row > first_row) { + // Special method for paletted alpha data. We only process the cropped area. 
+ const int width = dec->io_->width; + uint8_t* out = alph_dec->output_ + width * first_row; + const uint8_t* const in = + (uint8_t*)dec->pixels_ + dec->width_ * first_row; + VP8LTransform* const transform = &dec->transforms_[0]; + assert(dec->next_transform_ == 1); + assert(transform->type_ == COLOR_INDEXING_TRANSFORM); + VP8LColorIndexInverseTransformAlpha(transform, first_row, last_row, + in, out); + AlphaApplyFilter(alph_dec, first_row, last_row, out, width); + } + dec->last_row_ = dec->last_out_row_ = last_row; } //------------------------------------------------------------------------------ @@ -922,14 +942,14 @@ static int DecodeAlphaData(VP8LDecoder* const dec, uint8_t* const data, int col = dec->last_pixel_ % width; VP8LBitReader* const br = &dec->br_; VP8LMetadata* const hdr = &dec->hdr_; - const HTreeGroup* htree_group = GetHtreeGroupForPos(hdr, col, row); int pos = dec->last_pixel_; // current position const int end = width * height; // End of data const int last = width * last_row; // Last pixel to decode const int len_code_limit = NUM_LITERAL_CODES + NUM_LENGTH_CODES; const int mask = hdr->huffman_mask_; - assert(htree_group != NULL); - assert(pos < end); + const HTreeGroup* htree_group = + (pos < last) ? 
GetHtreeGroupForPos(hdr, col, row) : NULL; + assert(pos <= end); assert(last_row <= height); assert(Is8bOptimizable(hdr)); @@ -939,6 +959,7 @@ static int DecodeAlphaData(VP8LDecoder* const dec, uint8_t* const data, if ((col & mask) == 0) { htree_group = GetHtreeGroupForPos(hdr, col, row); } + assert(htree_group != NULL); VP8LFillBitWindow(br); code = ReadSymbol(htree_group->htrees[GREEN], br); if (code < NUM_LITERAL_CODES) { // Literal @@ -948,7 +969,7 @@ static int DecodeAlphaData(VP8LDecoder* const dec, uint8_t* const data, if (col >= width) { col = 0; ++row; - if (row % NUM_ARGB_CACHE_ROWS == 0) { + if (row <= last_row && (row % NUM_ARGB_CACHE_ROWS == 0)) { ExtractPalettedAlphaRows(dec, row); } } @@ -971,7 +992,7 @@ static int DecodeAlphaData(VP8LDecoder* const dec, uint8_t* const data, while (col >= width) { col -= width; ++row; - if (row % NUM_ARGB_CACHE_ROWS == 0) { + if (row <= last_row && (row % NUM_ARGB_CACHE_ROWS == 0)) { ExtractPalettedAlphaRows(dec, row); } } @@ -985,7 +1006,7 @@ static int DecodeAlphaData(VP8LDecoder* const dec, uint8_t* const data, assert(br->eos_ == VP8LIsEndOfStream(br)); } // Process the remaining rows corresponding to last row-block. - ExtractPalettedAlphaRows(dec, row); + ExtractPalettedAlphaRows(dec, row > last_row ? last_row : row); End: if (!ok || (br->eos_ && pos < end)) { @@ -1025,7 +1046,6 @@ static int DecodeImageData(VP8LDecoder* const dec, uint32_t* const data, int col = dec->last_pixel_ % width; VP8LBitReader* const br = &dec->br_; VP8LMetadata* const hdr = &dec->hdr_; - HTreeGroup* htree_group = GetHtreeGroupForPos(hdr, col, row); uint32_t* src = data + dec->last_pixel_; uint32_t* last_cached = src; uint32_t* const src_end = data + width * height; // End of data @@ -1036,8 +1056,9 @@ static int DecodeImageData(VP8LDecoder* const dec, uint32_t* const data, VP8LColorCache* const color_cache = (hdr->color_cache_size_ > 0) ? 
&hdr->color_cache_ : NULL; const int mask = hdr->huffman_mask_; - assert(htree_group != NULL); - assert(src < src_end); + const HTreeGroup* htree_group = + (src < src_last) ? GetHtreeGroupForPos(hdr, col, row) : NULL; + assert(dec->last_row_ < last_row); assert(src_last <= src_end); while (src < src_last) { @@ -1049,7 +1070,10 @@ static int DecodeImageData(VP8LDecoder* const dec, uint32_t* const data, // Only update when changing tile. Note we could use this test: // if "((((prev_col ^ col) | prev_row ^ row)) > mask)" -> tile changed // but that's actually slower and needs storing the previous col/row. - if ((col & mask) == 0) htree_group = GetHtreeGroupForPos(hdr, col, row); + if ((col & mask) == 0) { + htree_group = GetHtreeGroupForPos(hdr, col, row); + } + assert(htree_group != NULL); if (htree_group->is_trivial_code) { *src = htree_group->literal_arb; goto AdvanceByOne; @@ -1080,8 +1104,10 @@ static int DecodeImageData(VP8LDecoder* const dec, uint32_t* const data, if (col >= width) { col = 0; ++row; - if ((row % NUM_ARGB_CACHE_ROWS == 0) && (process_func != NULL)) { - process_func(dec, row); + if (process_func != NULL) { + if (row <= last_row && (row % NUM_ARGB_CACHE_ROWS == 0)) { + process_func(dec, row); + } } if (color_cache != NULL) { while (last_cached < src) { @@ -1108,8 +1134,10 @@ static int DecodeImageData(VP8LDecoder* const dec, uint32_t* const data, while (col >= width) { col -= width; ++row; - if ((row % NUM_ARGB_CACHE_ROWS == 0) && (process_func != NULL)) { - process_func(dec, row); + if (process_func != NULL) { + if (row <= last_row && (row % NUM_ARGB_CACHE_ROWS == 0)) { + process_func(dec, row); + } } } // Because of the check done above (before 'src' was incremented by @@ -1140,7 +1168,7 @@ static int DecodeImageData(VP8LDecoder* const dec, uint32_t* const data, } else if (!br->eos_) { // Process the remaining rows corresponding to last row-block. if (process_func != NULL) { - process_func(dec, row); + process_func(dec, row > last_row ? 
last_row : row); } dec->status_ = VP8_STATUS_OK; dec->last_pixel_ = (int)(src - data); // end-of-scan marker @@ -1438,46 +1466,51 @@ static int AllocateInternalBuffers8b(VP8LDecoder* const dec) { //------------------------------------------------------------------------------ // Special row-processing that only stores the alpha data. -static void ExtractAlphaRows(VP8LDecoder* const dec, int row) { - const int num_rows = row - dec->last_row_; - const uint32_t* const in = dec->pixels_ + dec->width_ * dec->last_row_; - - if (num_rows <= 0) return; // Nothing to be done. - ApplyInverseTransforms(dec, num_rows, in); - - // Extract alpha (which is stored in the green plane). - { +static void ExtractAlphaRows(VP8LDecoder* const dec, int last_row) { + int cur_row = dec->last_row_; + int num_rows = last_row - cur_row; + const uint32_t* in = dec->pixels_ + dec->width_ * cur_row; + + assert(last_row <= dec->io_->crop_bottom); + while (num_rows > 0) { + const int num_rows_to_process = + (num_rows > NUM_ARGB_CACHE_ROWS) ? NUM_ARGB_CACHE_ROWS : num_rows; + // Extract alpha (which is stored in the green plane). 
+ ALPHDecoder* const alph_dec = (ALPHDecoder*)dec->io_->opaque; + uint8_t* const output = alph_dec->output_; const int width = dec->io_->width; // the final width (!= dec->width_) - const int cache_pixs = width * num_rows; - uint8_t* const dst = (uint8_t*)dec->io_->opaque + width * dec->last_row_; + const int cache_pixs = width * num_rows_to_process; + uint8_t* const dst = output + width * cur_row; const uint32_t* const src = dec->argb_cache_; int i; + ApplyInverseTransforms(dec, num_rows_to_process, in); for (i = 0; i < cache_pixs; ++i) dst[i] = (src[i] >> 8) & 0xff; - } - dec->last_row_ = dec->last_out_row_ = row; + AlphaApplyFilter(alph_dec, + cur_row, cur_row + num_rows_to_process, dst, width); + num_rows -= num_rows_to_process; + in += num_rows_to_process * dec->width_; + cur_row += num_rows_to_process; + } + assert(cur_row == last_row); + dec->last_row_ = dec->last_out_row_ = last_row; } int VP8LDecodeAlphaHeader(ALPHDecoder* const alph_dec, - const uint8_t* const data, size_t data_size, - uint8_t* const output) { + const uint8_t* const data, size_t data_size) { int ok = 0; - VP8LDecoder* dec; - VP8Io* io; + VP8LDecoder* dec = VP8LNew(); + + if (dec == NULL) return 0; + assert(alph_dec != NULL); - alph_dec->vp8l_dec_ = VP8LNew(); - if (alph_dec->vp8l_dec_ == NULL) return 0; - dec = alph_dec->vp8l_dec_; + alph_dec->vp8l_dec_ = dec; dec->width_ = alph_dec->width_; dec->height_ = alph_dec->height_; dec->io_ = &alph_dec->io_; - io = dec->io_; - - VP8InitIo(io); - WebPInitCustomIo(NULL, io); // Just a sanity Init. io won't be used. 
- io->opaque = output; - io->width = alph_dec->width_; - io->height = alph_dec->height_; + dec->io_->opaque = alph_dec; + dec->io_->width = alph_dec->width_; + dec->io_->height = alph_dec->height_; dec->status_ = VP8_STATUS_OK; VP8LInitBitReader(&dec->br_, data, data_size); @@ -1492,11 +1525,11 @@ int VP8LDecodeAlphaHeader(ALPHDecoder* const alph_dec, if (dec->next_transform_ == 1 && dec->transforms_[0].type_ == COLOR_INDEXING_TRANSFORM && Is8bOptimizable(&dec->hdr_)) { - alph_dec->use_8b_decode = 1; + alph_dec->use_8b_decode_ = 1; ok = AllocateInternalBuffers8b(dec); } else { // Allocate internal buffers (note that dec->width_ may have changed here). - alph_dec->use_8b_decode = 0; + alph_dec->use_8b_decode_ = 0; ok = AllocateInternalBuffers32b(dec, alph_dec->width_); } @@ -1515,12 +1548,12 @@ int VP8LDecodeAlphaImageStream(ALPHDecoder* const alph_dec, int last_row) { assert(dec != NULL); assert(last_row <= dec->height_); - if (dec->last_pixel_ == dec->width_ * dec->height_) { + if (dec->last_row_ >= last_row) { return 1; // done } // Decode (with special row processing). - return alph_dec->use_8b_decode ? + return alph_dec->use_8b_decode_ ? DecodeAlphaData(dec, (uint8_t*)dec->pixels_, dec->width_, dec->height_, last_row) : DecodeImageData(dec, dec->pixels_, dec->width_, dec->height_, @@ -1611,7 +1644,7 @@ int VP8LDecodeImage(VP8LDecoder* const dec) { // Decode. if (!DecodeImageData(dec, dec->pixels_, dec->width_, dec->height_, - dec->height_, ProcessRows)) { + io->crop_bottom, ProcessRows)) { goto Err; } diff --git a/src/3rdparty/libwebp/src/dec/vp8li.h b/src/3rdparty/libwebp/src/dec/vp8li.h index 8886e47..9313bdc 100644 --- a/src/3rdparty/libwebp/src/dec/vp8li.h +++ b/src/3rdparty/libwebp/src/dec/vp8li.h @@ -100,8 +100,7 @@ struct ALPHDecoder; // Defined in dec/alphai.h. // Decodes image header for alpha data stored using lossless compression. // Returns false in case of error. 
int VP8LDecodeAlphaHeader(struct ALPHDecoder* const alph_dec, - const uint8_t* const data, size_t data_size, - uint8_t* const output); + const uint8_t* const data, size_t data_size); // Decodes *at least* 'last_row' rows of alpha. If some of the initial rows are // already decoded in previous call(s), it will resume decoding from where it diff --git a/src/3rdparty/libwebp/src/dec/webp.c b/src/3rdparty/libwebp/src/dec/webp.c index 952178f..d0b912f 100644 --- a/src/3rdparty/libwebp/src/dec/webp.c +++ b/src/3rdparty/libwebp/src/dec/webp.c @@ -415,7 +415,8 @@ static VP8StatusCode ParseHeadersInternal(const uint8_t* data, } VP8StatusCode WebPParseHeaders(WebPHeaderStructure* const headers) { - VP8StatusCode status; + // status is marked volatile as a workaround for a clang-3.8 (aarch64) bug + volatile VP8StatusCode status; int has_animation = 0; assert(headers != NULL); // fill out headers, ignore width/height/has_alpha. @@ -512,10 +513,12 @@ static VP8StatusCode DecodeInto(const uint8_t* const data, size_t data_size, if (status != VP8_STATUS_OK) { WebPFreeDecBuffer(params->output); - } - - if (params->options != NULL && params->options->flip) { - status = WebPFlipBuffer(params->output); + } else { + if (params->options != NULL && params->options->flip) { + // This restores the original stride values if options->flip was used + // during the call to WebPAllocateDecBuffer above. + status = WebPFlipBuffer(params->output); + } } return status; } @@ -758,9 +761,24 @@ VP8StatusCode WebPDecode(const uint8_t* data, size_t data_size, } WebPResetDecParams(¶ms); - params.output = &config->output; params.options = &config->options; - status = DecodeInto(data, data_size, ¶ms); + params.output = &config->output; + if (WebPAvoidSlowMemory(params.output, &config->input)) { + // decoding to slow memory: use a temporary in-mem buffer to decode into. 
+ WebPDecBuffer in_mem_buffer; + WebPInitDecBuffer(&in_mem_buffer); + in_mem_buffer.colorspace = config->output.colorspace; + in_mem_buffer.width = config->input.width; + in_mem_buffer.height = config->input.height; + params.output = &in_mem_buffer; + status = DecodeInto(data, data_size, ¶ms); + if (status == VP8_STATUS_OK) { // do the slow-copy + status = WebPCopyDecBufferPixels(&in_mem_buffer, &config->output); + } + WebPFreeDecBuffer(&in_mem_buffer); + } else { + status = DecodeInto(data, data_size, ¶ms); + } return status; } @@ -809,7 +827,7 @@ int WebPIoInitFromOptions(const WebPDecoderOptions* const options, } // Filter - io->bypass_filtering = options && options->bypass_filtering; + io->bypass_filtering = (options != NULL) && options->bypass_filtering; // Fancy upsampler #ifdef FANCY_UPSAMPLING @@ -826,4 +844,3 @@ int WebPIoInitFromOptions(const WebPDecoderOptions* const options, } //------------------------------------------------------------------------------ - diff --git a/src/3rdparty/libwebp/src/dec/webpi.h b/src/3rdparty/libwebp/src/dec/webpi.h index c75a2e4..991b194 100644 --- a/src/3rdparty/libwebp/src/dec/webpi.h +++ b/src/3rdparty/libwebp/src/dec/webpi.h @@ -45,11 +45,20 @@ struct WebPDecParams { OutputFunc emit; // output RGB or YUV samples OutputAlphaFunc emit_alpha; // output alpha channel OutputRowFunc emit_alpha_row; // output one line of rescaled alpha values + + WebPDecBuffer* final_output; // In case the user supplied a slow-memory + // output, we decode image in temporary buffer + // (this::output) and copy it here. + WebPDecBuffer tmp_buffer; // this::output will point to this one in case + // of slow memory. }; // Should be called first, before any use of the WebPDecParams object. 
void WebPResetDecParams(WebPDecParams* const params); +// Delete all memory (after an error occurred, for instance) +void WebPFreeDecParams(WebPDecParams* const params); + //------------------------------------------------------------------------------ // Header parsing helpers @@ -107,13 +116,23 @@ VP8StatusCode WebPAllocateDecBuffer(int width, int height, VP8StatusCode WebPFlipBuffer(WebPDecBuffer* const buffer); // Copy 'src' into 'dst' buffer, making sure 'dst' is not marked as owner of the -// memory (still held by 'src'). +// memory (still held by 'src'). No pixels are copied. void WebPCopyDecBuffer(const WebPDecBuffer* const src, WebPDecBuffer* const dst); // Copy and transfer ownership from src to dst (beware of parameter order!) void WebPGrabDecBuffer(WebPDecBuffer* const src, WebPDecBuffer* const dst); +// Copy pixels from 'src' into a *preallocated* 'dst' buffer. Returns +// VP8_STATUS_INVALID_PARAM if the 'dst' is not set up correctly for the copy. +VP8StatusCode WebPCopyDecBufferPixels(const WebPDecBuffer* const src, + WebPDecBuffer* const dst); + +// Returns true if decoding will be slow with the current configuration +// and bitstream features. +int WebPAvoidSlowMemory(const WebPDecBuffer* const output, + const WebPBitstreamFeatures* const features); + //------------------------------------------------------------------------------ #ifdef __cplusplus diff --git a/src/3rdparty/libwebp/src/dsp/common_sse2.h b/src/3rdparty/libwebp/src/dsp/common_sse2.h new file mode 100644 index 0000000..7cea13f --- /dev/null +++ b/src/3rdparty/libwebp/src/dsp/common_sse2.h @@ -0,0 +1,109 @@ +// Copyright 2016 Google Inc. All Rights Reserved. +// +// Use of this source code is governed by a BSD-style license +// that can be found in the COPYING file in the root of the source +// tree. An additional intellectual property rights grant can be found +// in the file PATENTS. All contributing project authors may +// be found in the AUTHORS file in the root of the source tree. 
+// ----------------------------------------------------------------------------- +// +// SSE2 code common to several files. +// +// Author: Vincent Rabaud (vrabaud@google.com) + +#ifndef WEBP_DSP_COMMON_SSE2_H_ +#define WEBP_DSP_COMMON_SSE2_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#if defined(WEBP_USE_SSE2) + +#include <emmintrin.h> + +//------------------------------------------------------------------------------ +// Quite useful macro for debugging. Left here for convenience. + +#if 0 +#include <stdio.h> +static WEBP_INLINE void PrintReg(const __m128i r, const char* const name, + int size) { + int n; + union { + __m128i r; + uint8_t i8[16]; + uint16_t i16[8]; + uint32_t i32[4]; + uint64_t i64[2]; + } tmp; + tmp.r = r; + fprintf(stderr, "%s\t: ", name); + if (size == 8) { + for (n = 0; n < 16; ++n) fprintf(stderr, "%.2x ", tmp.i8[n]); + } else if (size == 16) { + for (n = 0; n < 8; ++n) fprintf(stderr, "%.4x ", tmp.i16[n]); + } else if (size == 32) { + for (n = 0; n < 4; ++n) fprintf(stderr, "%.8x ", tmp.i32[n]); + } else { + for (n = 0; n < 2; ++n) fprintf(stderr, "%.16lx ", tmp.i64[n]); + } + fprintf(stderr, "\n"); +} +#endif + +//------------------------------------------------------------------------------ +// Math functions. + +// Return the sum of all the 8b in the register. +static WEBP_INLINE int VP8HorizontalAdd8b(const __m128i* const a) { + const __m128i zero = _mm_setzero_si128(); + const __m128i sad8x2 = _mm_sad_epu8(*a, zero); + // sum the two sads: sad8x2[0:1] + sad8x2[8:9] + const __m128i sum = _mm_add_epi32(sad8x2, _mm_shuffle_epi32(sad8x2, 2)); + return _mm_cvtsi128_si32(sum); +} + +// Transpose two 4x4 16b matrices horizontally stored in registers. +static WEBP_INLINE void VP8Transpose_2_4x4_16b( + const __m128i* const in0, const __m128i* const in1, + const __m128i* const in2, const __m128i* const in3, __m128i* const out0, + __m128i* const out1, __m128i* const out2, __m128i* const out3) { + // Transpose the two 4x4. 
+ // a00 a01 a02 a03 b00 b01 b02 b03 + // a10 a11 a12 a13 b10 b11 b12 b13 + // a20 a21 a22 a23 b20 b21 b22 b23 + // a30 a31 a32 a33 b30 b31 b32 b33 + const __m128i transpose0_0 = _mm_unpacklo_epi16(*in0, *in1); + const __m128i transpose0_1 = _mm_unpacklo_epi16(*in2, *in3); + const __m128i transpose0_2 = _mm_unpackhi_epi16(*in0, *in1); + const __m128i transpose0_3 = _mm_unpackhi_epi16(*in2, *in3); + // a00 a10 a01 a11 a02 a12 a03 a13 + // a20 a30 a21 a31 a22 a32 a23 a33 + // b00 b10 b01 b11 b02 b12 b03 b13 + // b20 b30 b21 b31 b22 b32 b23 b33 + const __m128i transpose1_0 = _mm_unpacklo_epi32(transpose0_0, transpose0_1); + const __m128i transpose1_1 = _mm_unpacklo_epi32(transpose0_2, transpose0_3); + const __m128i transpose1_2 = _mm_unpackhi_epi32(transpose0_0, transpose0_1); + const __m128i transpose1_3 = _mm_unpackhi_epi32(transpose0_2, transpose0_3); + // a00 a10 a20 a30 a01 a11 a21 a31 + // b00 b10 b20 b30 b01 b11 b21 b31 + // a02 a12 a22 a32 a03 a13 a23 a33 + // b02 b12 b22 b32 b03 b13 b23 b33 + *out0 = _mm_unpacklo_epi64(transpose1_0, transpose1_1); + *out1 = _mm_unpackhi_epi64(transpose1_0, transpose1_1); + *out2 = _mm_unpacklo_epi64(transpose1_2, transpose1_3); + *out3 = _mm_unpackhi_epi64(transpose1_2, transpose1_3); + // a00 a10 a20 a30 b00 b10 b20 b30 + // a01 a11 a21 a31 b01 b11 b21 b31 + // a02 a12 a22 a32 b02 b12 b22 b32 + // a03 a13 a23 a33 b03 b13 b23 b33 +} + +#endif // WEBP_USE_SSE2 + +#ifdef __cplusplus +} // extern "C" +#endif + +#endif // WEBP_DSP_COMMON_SSE2_H_ diff --git a/src/3rdparty/libwebp/src/dsp/cpu.c b/src/3rdparty/libwebp/src/dsp/cpu.c index 8844cb4..cbb08db 100644 --- a/src/3rdparty/libwebp/src/dsp/cpu.c +++ b/src/3rdparty/libwebp/src/dsp/cpu.c @@ -13,6 +13,11 @@ #include "./dsp.h" +#if defined(WEBP_HAVE_NEON_RTCD) +#include <stdio.h> +#include <string.h> +#endif + #if defined(WEBP_ANDROID_NEON) #include <cpu-features.h> #endif @@ -142,13 +147,33 @@ VP8CPUInfo VP8GetCPUInfo = AndroidCPUInfo; // define a dummy function to enable turning
off NEON at runtime by setting // VP8DecGetCPUInfo = NULL static int armCPUInfo(CPUFeature feature) { - (void)feature; + if (feature != kNEON) return 0; +#if defined(__linux__) && defined(WEBP_HAVE_NEON_RTCD) + { + int has_neon = 0; + char line[200]; + FILE* const cpuinfo = fopen("/proc/cpuinfo", "r"); + if (cpuinfo == NULL) return 0; + while (fgets(line, sizeof(line), cpuinfo)) { + if (!strncmp(line, "Features", 8)) { + if (strstr(line, " neon ") != NULL) { + has_neon = 1; + break; + } + } + } + fclose(cpuinfo); + return has_neon; + } +#else return 1; +#endif } VP8CPUInfo VP8GetCPUInfo = armCPUInfo; -#elif defined(WEBP_USE_MIPS32) || defined(WEBP_USE_MIPS_DSP_R2) +#elif defined(WEBP_USE_MIPS32) || defined(WEBP_USE_MIPS_DSP_R2) || \ + defined(WEBP_USE_MSA) static int mipsCPUInfo(CPUFeature feature) { - if ((feature == kMIPS32) || (feature == kMIPSdspR2)) { + if ((feature == kMIPS32) || (feature == kMIPSdspR2) || (feature == kMSA)) { return 1; } else { return 0; diff --git a/src/3rdparty/libwebp/src/dsp/dec.c b/src/3rdparty/libwebp/src/dsp/dec.c index a787206..e92d693 100644 --- a/src/3rdparty/libwebp/src/dsp/dec.c +++ b/src/3rdparty/libwebp/src/dsp/dec.c @@ -13,6 +13,7 @@ #include "./dsp.h" #include "../dec/vp8i.h" +#include "../utils/utils.h" //------------------------------------------------------------------------------ @@ -654,6 +655,23 @@ static void HFilter8i(uint8_t* u, uint8_t* v, int stride, //------------------------------------------------------------------------------ +static void DitherCombine8x8(const uint8_t* dither, uint8_t* dst, + int dst_stride) { + int i, j; + for (j = 0; j < 8; ++j) { + for (i = 0; i < 8; ++i) { + const int delta0 = dither[i] - VP8_DITHER_AMP_CENTER; + const int delta1 = + (delta0 + VP8_DITHER_DESCALE_ROUNDER) >> VP8_DITHER_DESCALE; + dst[i] = clip_8b((int)dst[i] + delta1); + } + dst += dst_stride; + dither += 8; + } +} + +//------------------------------------------------------------------------------ + VP8DecIdct2 
VP8Transform; VP8DecIdct VP8TransformAC3; VP8DecIdct VP8TransformUV; @@ -673,11 +691,15 @@ VP8SimpleFilterFunc VP8SimpleHFilter16; VP8SimpleFilterFunc VP8SimpleVFilter16i; VP8SimpleFilterFunc VP8SimpleHFilter16i; +void (*VP8DitherCombine8x8)(const uint8_t* dither, uint8_t* dst, + int dst_stride); + extern void VP8DspInitSSE2(void); extern void VP8DspInitSSE41(void); extern void VP8DspInitNEON(void); extern void VP8DspInitMIPS32(void); extern void VP8DspInitMIPSdspR2(void); +extern void VP8DspInitMSA(void); static volatile VP8CPUInfo dec_last_cpuinfo_used = (VP8CPUInfo)&dec_last_cpuinfo_used; @@ -734,6 +756,8 @@ WEBP_TSAN_IGNORE_FUNCTION void VP8DspInit(void) { VP8PredChroma8[5] = DC8uvNoLeft; VP8PredChroma8[6] = DC8uvNoTopLeft; + VP8DitherCombine8x8 = DitherCombine8x8; + // If defined, use CPUInfo() to overwrite some pointers with faster versions. if (VP8GetCPUInfo != NULL) { #if defined(WEBP_USE_SSE2) @@ -761,6 +785,11 @@ WEBP_TSAN_IGNORE_FUNCTION void VP8DspInit(void) { VP8DspInitMIPSdspR2(); } #endif +#if defined(WEBP_USE_MSA) + if (VP8GetCPUInfo(kMSA)) { + VP8DspInitMSA(); + } +#endif } dec_last_cpuinfo_used = VP8GetCPUInfo; } diff --git a/src/3rdparty/libwebp/src/dsp/dec_msa.c b/src/3rdparty/libwebp/src/dsp/dec_msa.c new file mode 100644 index 0000000..f76055c --- /dev/null +++ b/src/3rdparty/libwebp/src/dsp/dec_msa.c @@ -0,0 +1,172 @@ +// Copyright 2016 Google Inc. All Rights Reserved. +// +// Use of this source code is governed by a BSD-style license +// that can be found in the COPYING file in the root of the source +// tree. An additional intellectual property rights grant can be found +// in the file PATENTS. All contributing project authors may +// be found in the AUTHORS file in the root of the source tree. 
+// ----------------------------------------------------------------------------- +// +// MSA version of dsp functions +// +// Author(s): Prashant Patil (prashant.patil@imgtec.com) + + +#include "./dsp.h" + +#if defined(WEBP_USE_MSA) + +#include "./msa_macro.h" + +//------------------------------------------------------------------------------ +// Transforms + +#define IDCT_1D_W(in0, in1, in2, in3, out0, out1, out2, out3) { \ + v4i32 a1_m, b1_m, c1_m, d1_m; \ + v4i32 c_tmp1_m, c_tmp2_m, d_tmp1_m, d_tmp2_m; \ + const v4i32 cospi8sqrt2minus1 = __msa_fill_w(20091); \ + const v4i32 sinpi8sqrt2 = __msa_fill_w(35468); \ + \ + a1_m = in0 + in2; \ + b1_m = in0 - in2; \ + c_tmp1_m = (in1 * sinpi8sqrt2) >> 16; \ + c_tmp2_m = in3 + ((in3 * cospi8sqrt2minus1) >> 16); \ + c1_m = c_tmp1_m - c_tmp2_m; \ + d_tmp1_m = in1 + ((in1 * cospi8sqrt2minus1) >> 16); \ + d_tmp2_m = (in3 * sinpi8sqrt2) >> 16; \ + d1_m = d_tmp1_m + d_tmp2_m; \ + BUTTERFLY_4(a1_m, b1_m, c1_m, d1_m, out0, out1, out2, out3); \ +} +#define MULT1(a) ((((a) * 20091) >> 16) + (a)) +#define MULT2(a) (((a) * 35468) >> 16) + +static void TransformOne(const int16_t* in, uint8_t* dst) { + v8i16 input0, input1; + v4i32 in0, in1, in2, in3, hz0, hz1, hz2, hz3, vt0, vt1, vt2, vt3; + v4i32 res0, res1, res2, res3; + const v16i8 zero = { 0 }; + v16i8 dest0, dest1, dest2, dest3; + + LD_SH2(in, 8, input0, input1); + UNPCK_SH_SW(input0, in0, in1); + UNPCK_SH_SW(input1, in2, in3); + IDCT_1D_W(in0, in1, in2, in3, hz0, hz1, hz2, hz3); + TRANSPOSE4x4_SW_SW(hz0, hz1, hz2, hz3, hz0, hz1, hz2, hz3); + IDCT_1D_W(hz0, hz1, hz2, hz3, vt0, vt1, vt2, vt3); + SRARI_W4_SW(vt0, vt1, vt2, vt3, 3); + TRANSPOSE4x4_SW_SW(vt0, vt1, vt2, vt3, vt0, vt1, vt2, vt3); + LD_SB4(dst, BPS, dest0, dest1, dest2, dest3); + ILVR_B4_SW(zero, dest0, zero, dest1, zero, dest2, zero, dest3, + res0, res1, res2, res3); + ILVR_H4_SW(zero, res0, zero, res1, zero, res2, zero, res3, + res0, res1, res2, res3); + ADD4(res0, vt0, res1, vt1, res2, vt2, res3, vt3, res0, res1, 
res2, res3); + CLIP_SW4_0_255(res0, res1, res2, res3); + PCKEV_B2_SW(res0, res1, res2, res3, vt0, vt1); + res0 = (v4i32)__msa_pckev_b((v16i8)vt0, (v16i8)vt1); + ST4x4_UB(res0, res0, 3, 2, 1, 0, dst, BPS); +} + +static void TransformTwo(const int16_t* in, uint8_t* dst, int do_two) { + TransformOne(in, dst); + if (do_two) { + TransformOne(in + 16, dst + 4); + } +} + +static void TransformWHT(const int16_t* in, int16_t* out) { + v8i16 input0, input1; + const v8i16 mask0 = { 0, 1, 2, 3, 8, 9, 10, 11 }; + const v8i16 mask1 = { 4, 5, 6, 7, 12, 13, 14, 15 }; + const v8i16 mask2 = { 0, 4, 8, 12, 1, 5, 9, 13 }; + const v8i16 mask3 = { 3, 7, 11, 15, 2, 6, 10, 14 }; + v8i16 tmp0, tmp1, tmp2, tmp3; + v8i16 out0, out1; + + LD_SH2(in, 8, input0, input1); + input1 = SLDI_SH(input1, input1, 8); + tmp0 = input0 + input1; + tmp1 = input0 - input1; + VSHF_H2_SH(tmp0, tmp1, tmp0, tmp1, mask0, mask1, tmp2, tmp3); + out0 = tmp2 + tmp3; + out1 = tmp2 - tmp3; + VSHF_H2_SH(out0, out1, out0, out1, mask2, mask3, input0, input1); + tmp0 = input0 + input1; + tmp1 = input0 - input1; + VSHF_H2_SH(tmp0, tmp1, tmp0, tmp1, mask0, mask1, tmp2, tmp3); + tmp0 = tmp2 + tmp3; + tmp1 = tmp2 - tmp3; + ADDVI_H2_SH(tmp0, 3, tmp1, 3, out0, out1); + SRAI_H2_SH(out0, out1, 3); + out[0] = __msa_copy_s_h(out0, 0); + out[16] = __msa_copy_s_h(out0, 4); + out[32] = __msa_copy_s_h(out1, 0); + out[48] = __msa_copy_s_h(out1, 4); + out[64] = __msa_copy_s_h(out0, 1); + out[80] = __msa_copy_s_h(out0, 5); + out[96] = __msa_copy_s_h(out1, 1); + out[112] = __msa_copy_s_h(out1, 5); + out[128] = __msa_copy_s_h(out0, 2); + out[144] = __msa_copy_s_h(out0, 6); + out[160] = __msa_copy_s_h(out1, 2); + out[176] = __msa_copy_s_h(out1, 6); + out[192] = __msa_copy_s_h(out0, 3); + out[208] = __msa_copy_s_h(out0, 7); + out[224] = __msa_copy_s_h(out1, 3); + out[240] = __msa_copy_s_h(out1, 7); +} + +static void TransformDC(const int16_t* in, uint8_t* dst) { + const int DC = (in[0] + 4) >> 3; + const v8i16 tmp0 = __msa_fill_h(DC); + 
ADDBLK_ST4x4_UB(tmp0, tmp0, tmp0, tmp0, dst, BPS); +} + +static void TransformAC3(const int16_t* in, uint8_t* dst) { + const int a = in[0] + 4; + const int c4 = MULT2(in[4]); + const int d4 = MULT1(in[4]); + const int in2 = MULT2(in[1]); + const int in3 = MULT1(in[1]); + v4i32 tmp0 = { 0 }; + v4i32 out0 = __msa_fill_w(a + d4); + v4i32 out1 = __msa_fill_w(a + c4); + v4i32 out2 = __msa_fill_w(a - c4); + v4i32 out3 = __msa_fill_w(a - d4); + v4i32 res0, res1, res2, res3; + const v4i32 zero = { 0 }; + v16u8 dest0, dest1, dest2, dest3; + + INSERT_W4_SW(in3, in2, -in2, -in3, tmp0); + ADD4(out0, tmp0, out1, tmp0, out2, tmp0, out3, tmp0, + out0, out1, out2, out3); + SRAI_W4_SW(out0, out1, out2, out3, 3); + LD_UB4(dst, BPS, dest0, dest1, dest2, dest3); + ILVR_B4_SW(zero, dest0, zero, dest1, zero, dest2, zero, dest3, + res0, res1, res2, res3); + ILVR_H4_SW(zero, res0, zero, res1, zero, res2, zero, res3, + res0, res1, res2, res3); + ADD4(res0, out0, res1, out1, res2, out2, res3, out3, res0, res1, res2, res3); + CLIP_SW4_0_255(res0, res1, res2, res3); + PCKEV_B2_SW(res0, res1, res2, res3, out0, out1); + res0 = (v4i32)__msa_pckev_b((v16i8)out0, (v16i8)out1); + ST4x4_UB(res0, res0, 3, 2, 1, 0, dst, BPS); +} + +//------------------------------------------------------------------------------ +// Entry point + +extern void VP8DspInitMSA(void); + +WEBP_TSAN_IGNORE_FUNCTION void VP8DspInitMSA(void) { + VP8TransformWHT = TransformWHT; + VP8Transform = TransformTwo; + VP8TransformDC = TransformDC; + VP8TransformAC3 = TransformAC3; +} + +#else // !WEBP_USE_MSA + +WEBP_DSP_INIT_STUB(VP8DspInitMSA) + +#endif // WEBP_USE_MSA diff --git a/src/3rdparty/libwebp/src/dsp/dec_sse2.c b/src/3rdparty/libwebp/src/dsp/dec_sse2.c index 935bf02..f0a8ddc 100644 --- a/src/3rdparty/libwebp/src/dsp/dec_sse2.c +++ b/src/3rdparty/libwebp/src/dsp/dec_sse2.c @@ -21,7 +21,9 @@ // #define USE_TRANSFORM_AC3 #include <emmintrin.h> +#include "./common_sse2.h" #include "../dec/vp8i.h" +#include "../utils/utils.h" 
//------------------------------------------------------------------------------ // Transforms (Paragraph 14.4) @@ -102,34 +104,7 @@ static void Transform(const int16_t* in, uint8_t* dst, int do_two) { const __m128i tmp3 = _mm_sub_epi16(a, d); // Transpose the two 4x4. - // a00 a01 a02 a03 b00 b01 b02 b03 - // a10 a11 a12 a13 b10 b11 b12 b13 - // a20 a21 a22 a23 b20 b21 b22 b23 - // a30 a31 a32 a33 b30 b31 b32 b33 - const __m128i transpose0_0 = _mm_unpacklo_epi16(tmp0, tmp1); - const __m128i transpose0_1 = _mm_unpacklo_epi16(tmp2, tmp3); - const __m128i transpose0_2 = _mm_unpackhi_epi16(tmp0, tmp1); - const __m128i transpose0_3 = _mm_unpackhi_epi16(tmp2, tmp3); - // a00 a10 a01 a11 a02 a12 a03 a13 - // a20 a30 a21 a31 a22 a32 a23 a33 - // b00 b10 b01 b11 b02 b12 b03 b13 - // b20 b30 b21 b31 b22 b32 b23 b33 - const __m128i transpose1_0 = _mm_unpacklo_epi32(transpose0_0, transpose0_1); - const __m128i transpose1_1 = _mm_unpacklo_epi32(transpose0_2, transpose0_3); - const __m128i transpose1_2 = _mm_unpackhi_epi32(transpose0_0, transpose0_1); - const __m128i transpose1_3 = _mm_unpackhi_epi32(transpose0_2, transpose0_3); - // a00 a10 a20 a30 a01 a11 a21 a31 - // b00 b10 b20 b30 b01 b11 b21 b31 - // a02 a12 a22 a32 a03 a13 a23 a33 - // b02 b12 a22 b32 b03 b13 b23 b33 - T0 = _mm_unpacklo_epi64(transpose1_0, transpose1_1); - T1 = _mm_unpackhi_epi64(transpose1_0, transpose1_1); - T2 = _mm_unpacklo_epi64(transpose1_2, transpose1_3); - T3 = _mm_unpackhi_epi64(transpose1_2, transpose1_3); - // a00 a10 a20 a30 b00 b10 b20 b30 - // a01 a11 a21 a31 b01 b11 b21 b31 - // a02 a12 a22 a32 b02 b12 b22 b32 - // a03 a13 a23 a33 b03 b13 b23 b33 + VP8Transpose_2_4x4_16b(&tmp0, &tmp1, &tmp2, &tmp3, &T0, &T1, &T2, &T3); } // Horizontal pass and subsequent transpose. @@ -164,34 +139,8 @@ static void Transform(const int16_t* in, uint8_t* dst, int do_two) { const __m128i shifted3 = _mm_srai_epi16(tmp3, 3); // Transpose the two 4x4. 
- // a00 a01 a02 a03 b00 b01 b02 b03 - // a10 a11 a12 a13 b10 b11 b12 b13 - // a20 a21 a22 a23 b20 b21 b22 b23 - // a30 a31 a32 a33 b30 b31 b32 b33 - const __m128i transpose0_0 = _mm_unpacklo_epi16(shifted0, shifted1); - const __m128i transpose0_1 = _mm_unpacklo_epi16(shifted2, shifted3); - const __m128i transpose0_2 = _mm_unpackhi_epi16(shifted0, shifted1); - const __m128i transpose0_3 = _mm_unpackhi_epi16(shifted2, shifted3); - // a00 a10 a01 a11 a02 a12 a03 a13 - // a20 a30 a21 a31 a22 a32 a23 a33 - // b00 b10 b01 b11 b02 b12 b03 b13 - // b20 b30 b21 b31 b22 b32 b23 b33 - const __m128i transpose1_0 = _mm_unpacklo_epi32(transpose0_0, transpose0_1); - const __m128i transpose1_1 = _mm_unpacklo_epi32(transpose0_2, transpose0_3); - const __m128i transpose1_2 = _mm_unpackhi_epi32(transpose0_0, transpose0_1); - const __m128i transpose1_3 = _mm_unpackhi_epi32(transpose0_2, transpose0_3); - // a00 a10 a20 a30 a01 a11 a21 a31 - // b00 b10 b20 b30 b01 b11 b21 b31 - // a02 a12 a22 a32 a03 a13 a23 a33 - // b02 b12 a22 b32 b03 b13 b23 b33 - T0 = _mm_unpacklo_epi64(transpose1_0, transpose1_1); - T1 = _mm_unpackhi_epi64(transpose1_0, transpose1_1); - T2 = _mm_unpacklo_epi64(transpose1_2, transpose1_3); - T3 = _mm_unpackhi_epi64(transpose1_2, transpose1_3); - // a00 a10 a20 a30 b00 b10 b20 b30 - // a01 a11 a21 a31 b01 b11 b21 b31 - // a02 a12 a22 a32 b02 b12 b22 b32 - // a03 a13 a23 a33 b03 b13 b23 b33 + VP8Transpose_2_4x4_16b(&shifted0, &shifted1, &shifted2, &shifted3, &T0, &T1, + &T2, &T3); } // Add inverse transform to 'dst' and store. 
diff --git a/src/3rdparty/libwebp/src/dsp/dec_sse41.c b/src/3rdparty/libwebp/src/dsp/dec_sse41.c index 224c6f8..8d6aed1 100644 --- a/src/3rdparty/libwebp/src/dsp/dec_sse41.c +++ b/src/3rdparty/libwebp/src/dsp/dec_sse41.c @@ -17,6 +17,7 @@ #include <smmintrin.h> #include "../dec/vp8i.h" +#include "../utils/utils.h" static void HE16(uint8_t* dst) { // horizontal int j; diff --git a/src/3rdparty/libwebp/src/dsp/dsp.h b/src/3rdparty/libwebp/src/dsp/dsp.h index 95f1ce0..1faac27 100644 --- a/src/3rdparty/libwebp/src/dsp/dsp.h +++ b/src/3rdparty/libwebp/src/dsp/dsp.h @@ -14,8 +14,11 @@ #ifndef WEBP_DSP_DSP_H_ #define WEBP_DSP_DSP_H_ +#ifdef HAVE_CONFIG_H +#include "../webp/config.h" +#endif + #include "../webp/types.h" -#include "../utils/utils.h" #ifdef __cplusplus extern "C" { @@ -72,7 +75,8 @@ extern "C" { // The intrinsics currently cause compiler errors with arm-nacl-gcc and the // inline assembly would need to be modified for use with Native Client. #if (defined(__ARM_NEON__) || defined(WEBP_ANDROID_NEON) || \ - defined(__aarch64__)) && !defined(__native_client__) + defined(__aarch64__) || defined(WEBP_HAVE_NEON)) && \ + !defined(__native_client__) #define WEBP_USE_NEON #endif @@ -92,6 +96,10 @@ extern "C" { #endif #endif +#if defined(__mips_msa) && defined(__mips_isa_rev) && (__mips_isa_rev >= 5) +#define WEBP_USE_MSA +#endif + // This macro prevents thread_sanitizer from reporting known concurrent writes. #define WEBP_TSAN_IGNORE_FUNCTION #if defined(__has_feature) @@ -101,6 +109,27 @@ extern "C" { #endif #endif +#define WEBP_UBSAN_IGNORE_UNDEF +#define WEBP_UBSAN_IGNORE_UNSIGNED_OVERFLOW +#if !defined(WEBP_FORCE_ALIGNED) && defined(__clang__) && \ + defined(__has_attribute) +#if __has_attribute(no_sanitize) +// This macro prevents the undefined behavior sanitizer from reporting +// failures. This is only meant to silence unaligned loads on platforms that +// are known to support them. 
+#undef WEBP_UBSAN_IGNORE_UNDEF +#define WEBP_UBSAN_IGNORE_UNDEF \ + __attribute__((no_sanitize("undefined"))) + +// This macro prevents the undefined behavior sanitizer from reporting +// failures related to unsigned integer overflows. This is only meant to +// silence cases where this well defined behavior is expected. +#undef WEBP_UBSAN_IGNORE_UNSIGNED_OVERFLOW +#define WEBP_UBSAN_IGNORE_UNSIGNED_OVERFLOW \ + __attribute__((no_sanitize("unsigned-integer-overflow"))) +#endif +#endif + typedef enum { kSSE2, kSSE3, @@ -109,7 +138,8 @@ typedef enum { kAVX2, kNEON, kMIPS32, - kMIPSdspR2 + kMIPSdspR2, + kMSA } CPUFeature; // returns true if the CPU supports the feature. typedef int (*VP8CPUInfo)(CPUFeature feature); @@ -151,6 +181,8 @@ typedef int (*VP8Metric)(const uint8_t* pix, const uint8_t* ref); extern VP8Metric VP8SSE16x16, VP8SSE16x8, VP8SSE8x8, VP8SSE4x4; typedef int (*VP8WMetric)(const uint8_t* pix, const uint8_t* ref, const uint16_t* const weights); +// The weights for VP8TDisto4x4 and VP8TDisto16x16 contain a row-major +// 4 by 4 symmetric matrix. extern VP8WMetric VP8TDisto4x4, VP8TDisto16x16; typedef void (*VP8BlockCopy)(const uint8_t* src, uint8_t* dst); @@ -214,6 +246,35 @@ extern VP8GetResidualCostFunc VP8GetResidualCost; void VP8EncDspCostInit(void); //------------------------------------------------------------------------------ +// SSIM utils + +// struct for accumulating statistical moments +typedef struct { + double w; // sum(w_i) : sum of weights + double xm, ym; // sum(w_i * x_i), sum(w_i * y_i) + double xxm, xym, yym; // sum(w_i * x_i * x_i), etc. 
+} VP8DistoStats; + +#define VP8_SSIM_KERNEL 3 // total size of the kernel: 2 * VP8_SSIM_KERNEL + 1 +typedef void (*VP8SSIMAccumulateClippedFunc)(const uint8_t* src1, int stride1, + const uint8_t* src2, int stride2, + int xo, int yo, // center position + int W, int H, // plane dimension + VP8DistoStats* const stats); + +// This version is called with the guarantee that you can load 8 bytes and +// 8 rows at offset src1 and src2 +typedef void (*VP8SSIMAccumulateFunc)(const uint8_t* src1, int stride1, + const uint8_t* src2, int stride2, + VP8DistoStats* const stats); + +extern VP8SSIMAccumulateFunc VP8SSIMAccumulate; // unclipped / unchecked +extern VP8SSIMAccumulateClippedFunc VP8SSIMAccumulateClipped; // with clipping + +// must be called before using any of the above directly +void VP8SSIMDspInit(void); + +//------------------------------------------------------------------------------ // Decoding typedef void (*VP8DecIdct)(const int16_t* coeffs, uint8_t* dst); @@ -265,6 +326,15 @@ extern VP8LumaFilterFunc VP8HFilter16i; extern VP8ChromaFilterFunc VP8VFilter8i; // filtering u and v altogether extern VP8ChromaFilterFunc VP8HFilter8i; +// Dithering. Combines dithering values (centered around 128) with dst[], +// according to: dst[] = clip(dst[] + (((dither[]-128) + 8) >> 4)) +#define VP8_DITHER_DESCALE 4 +#define VP8_DITHER_DESCALE_ROUNDER (1 << (VP8_DITHER_DESCALE - 1)) +#define VP8_DITHER_AMP_BITS 7 +#define VP8_DITHER_AMP_CENTER (1 << VP8_DITHER_AMP_BITS) +extern void (*VP8DitherCombine8x8)(const uint8_t* dither, uint8_t* dst, + int dst_stride); + // must be called before anything using the above void VP8DspInit(void); @@ -472,8 +542,10 @@ typedef enum { // Filter types. typedef void (*WebPFilterFunc)(const uint8_t* in, int width, int height, int stride, uint8_t* out); -typedef void (*WebPUnfilterFunc)(int width, int height, int stride, - int row, int num_rows, uint8_t* data); +// In-place un-filtering. +// Warning!
'prev_line' pointer can be equal to 'cur_line' or 'preds'. +typedef void (*WebPUnfilterFunc)(const uint8_t* prev_line, const uint8_t* preds, + uint8_t* cur_line, int width); // Filter the given data using the given predictor. // 'in' corresponds to a 2-dimensional pixel array of size (stride * height) diff --git a/src/3rdparty/libwebp/src/dsp/enc.c b/src/3rdparty/libwebp/src/dsp/enc.c index 8899d50..f639f55 100644 --- a/src/3rdparty/libwebp/src/dsp/enc.c +++ b/src/3rdparty/libwebp/src/dsp/enc.c @@ -69,7 +69,7 @@ static void CollectHistogram(const uint8_t* ref, const uint8_t* pred, // Convert coefficients to bin. for (k = 0; k < 16; ++k) { - const int v = abs(out[k]) >> 3; // TODO(skal): add rounding? + const int v = abs(out[k]) >> 3; const int clipped_value = clip_max(v, MAX_COEFF_THRESH); ++distribution[clipped_value]; } @@ -559,6 +559,7 @@ static int SSE4x4(const uint8_t* a, const uint8_t* b) { // Hadamard transform // Returns the weighted sum of the absolute value of transformed coefficients. +// w[] contains a row-major 4 by 4 symmetric matrix. 
static int TTransform(const uint8_t* in, const uint16_t* w) { int sum = 0; int tmp[16]; @@ -636,7 +637,7 @@ static int QuantizeBlock(int16_t in[16], int16_t out[16], int level = QUANTDIV(coeff, iQ, B); if (level > MAX_LEVEL) level = MAX_LEVEL; if (sign) level = -level; - in[j] = level * Q; + in[j] = level * (int)Q; out[n] = level; if (level) last = n; } else { @@ -670,7 +671,7 @@ static int QuantizeBlockWHT(int16_t in[16], int16_t out[16], int level = QUANTDIV(coeff, iQ, B); if (level > MAX_LEVEL) level = MAX_LEVEL; if (sign) level = -level; - in[j] = level * Q; + in[j] = level * (int)Q; out[n] = level; if (level) last = n; } else { @@ -702,6 +703,68 @@ static void Copy16x8(const uint8_t* src, uint8_t* dst) { } //------------------------------------------------------------------------------ + +static void SSIMAccumulateClipped(const uint8_t* src1, int stride1, + const uint8_t* src2, int stride2, + int xo, int yo, int W, int H, + VP8DistoStats* const stats) { + const int ymin = (yo - VP8_SSIM_KERNEL < 0) ? 0 : yo - VP8_SSIM_KERNEL; + const int ymax = (yo + VP8_SSIM_KERNEL > H - 1) ? H - 1 + : yo + VP8_SSIM_KERNEL; + const int xmin = (xo - VP8_SSIM_KERNEL < 0) ? 0 : xo - VP8_SSIM_KERNEL; + const int xmax = (xo + VP8_SSIM_KERNEL > W - 1) ? 
W - 1 + : xo + VP8_SSIM_KERNEL; + int x, y; + src1 += ymin * stride1; + src2 += ymin * stride2; + for (y = ymin; y <= ymax; ++y, src1 += stride1, src2 += stride2) { + for (x = xmin; x <= xmax; ++x) { + const int s1 = src1[x]; + const int s2 = src2[x]; + stats->w += 1; + stats->xm += s1; + stats->ym += s2; + stats->xxm += s1 * s1; + stats->xym += s1 * s2; + stats->yym += s2 * s2; + } + } +} + +static void SSIMAccumulate(const uint8_t* src1, int stride1, + const uint8_t* src2, int stride2, + VP8DistoStats* const stats) { + int x, y; + for (y = 0; y <= 2 * VP8_SSIM_KERNEL; ++y, src1 += stride1, src2 += stride2) { + for (x = 0; x <= 2 * VP8_SSIM_KERNEL; ++x) { + const int s1 = src1[x]; + const int s2 = src2[x]; + stats->w += 1; + stats->xm += s1; + stats->ym += s2; + stats->xxm += s1 * s1; + stats->xym += s1 * s2; + stats->yym += s2 * s2; + } + } +} + +VP8SSIMAccumulateFunc VP8SSIMAccumulate; +VP8SSIMAccumulateClippedFunc VP8SSIMAccumulateClipped; + +static volatile VP8CPUInfo ssim_last_cpuinfo_used = + (VP8CPUInfo)&ssim_last_cpuinfo_used; + +WEBP_TSAN_IGNORE_FUNCTION void VP8SSIMDspInit(void) { + if (ssim_last_cpuinfo_used == VP8GetCPUInfo) return; + + VP8SSIMAccumulate = SSIMAccumulate; + VP8SSIMAccumulateClipped = SSIMAccumulateClipped; + + ssim_last_cpuinfo_used = VP8GetCPUInfo; +} + +//------------------------------------------------------------------------------ // Initialization // Speed-critical function pointers. We have to initialize them to the default diff --git a/src/3rdparty/libwebp/src/dsp/enc_mips_dsp_r2.c b/src/3rdparty/libwebp/src/dsp/enc_mips_dsp_r2.c index 7c814fa..7ab96f6 100644 --- a/src/3rdparty/libwebp/src/dsp/enc_mips_dsp_r2.c +++ b/src/3rdparty/libwebp/src/dsp/enc_mips_dsp_r2.c @@ -1393,8 +1393,6 @@ static void FTransformWHT(const int16_t* in, int16_t* out) { "absq_s.ph %[temp1], %[temp1] \n\t" \ "absq_s.ph %[temp2], %[temp2] \n\t" \ "absq_s.ph %[temp3], %[temp3] \n\t" \ - /* TODO(skal): add rounding ? 
shra_r.ph : shra.ph */ \ - /* for following 4 instructions */ \ "shra.ph %[temp0], %[temp0], 3 \n\t" \ "shra.ph %[temp1], %[temp1], 3 \n\t" \ "shra.ph %[temp2], %[temp2], 3 \n\t" \ diff --git a/src/3rdparty/libwebp/src/dsp/enc_neon.c b/src/3rdparty/libwebp/src/dsp/enc_neon.c index c2aef58..46f6bf9 100644 --- a/src/3rdparty/libwebp/src/dsp/enc_neon.c +++ b/src/3rdparty/libwebp/src/dsp/enc_neon.c @@ -560,21 +560,6 @@ static void FTransformWHT(const int16_t* src, int16_t* out) { // a 26ae, b 26ae // a 37bf, b 37bf // -static WEBP_INLINE uint8x8x4_t DistoTranspose4x4U8(uint8x8x4_t d4_in) { - const uint8x8x2_t d2_tmp0 = vtrn_u8(d4_in.val[0], d4_in.val[1]); - const uint8x8x2_t d2_tmp1 = vtrn_u8(d4_in.val[2], d4_in.val[3]); - const uint16x4x2_t d2_tmp2 = vtrn_u16(vreinterpret_u16_u8(d2_tmp0.val[0]), - vreinterpret_u16_u8(d2_tmp1.val[0])); - const uint16x4x2_t d2_tmp3 = vtrn_u16(vreinterpret_u16_u8(d2_tmp0.val[1]), - vreinterpret_u16_u8(d2_tmp1.val[1])); - - d4_in.val[0] = vreinterpret_u8_u16(d2_tmp2.val[0]); - d4_in.val[2] = vreinterpret_u8_u16(d2_tmp2.val[1]); - d4_in.val[1] = vreinterpret_u8_u16(d2_tmp3.val[0]); - d4_in.val[3] = vreinterpret_u8_u16(d2_tmp3.val[1]); - return d4_in; -} - static WEBP_INLINE int16x8x4_t DistoTranspose4x4S16(int16x8x4_t q4_in) { const int16x8x2_t q2_tmp0 = vtrnq_s16(q4_in.val[0], q4_in.val[1]); const int16x8x2_t q2_tmp1 = vtrnq_s16(q4_in.val[2], q4_in.val[3]); @@ -589,41 +574,40 @@ static WEBP_INLINE int16x8x4_t DistoTranspose4x4S16(int16x8x4_t q4_in) { return q4_in; } -static WEBP_INLINE int16x8x4_t DistoHorizontalPass(const uint8x8x4_t d4_in) { +static WEBP_INLINE int16x8x4_t DistoHorizontalPass(const int16x8x4_t q4_in) { // {a0, a1} = {in[0] + in[2], in[1] + in[3]} // {a3, a2} = {in[0] - in[2], in[1] - in[3]} - const int16x8_t q_a0 = vreinterpretq_s16_u16(vaddl_u8(d4_in.val[0], - d4_in.val[2])); - const int16x8_t q_a1 = vreinterpretq_s16_u16(vaddl_u8(d4_in.val[1], - d4_in.val[3])); - const int16x8_t q_a3 = 
vreinterpretq_s16_u16(vsubl_u8(d4_in.val[0], - d4_in.val[2])); - const int16x8_t q_a2 = vreinterpretq_s16_u16(vsubl_u8(d4_in.val[1], - d4_in.val[3])); + const int16x8_t q_a0 = vaddq_s16(q4_in.val[0], q4_in.val[2]); + const int16x8_t q_a1 = vaddq_s16(q4_in.val[1], q4_in.val[3]); + const int16x8_t q_a3 = vsubq_s16(q4_in.val[0], q4_in.val[2]); + const int16x8_t q_a2 = vsubq_s16(q4_in.val[1], q4_in.val[3]); int16x8x4_t q4_out; // tmp[0] = a0 + a1 // tmp[1] = a3 + a2 // tmp[2] = a3 - a2 // tmp[3] = a0 - a1 INIT_VECTOR4(q4_out, - vaddq_s16(q_a0, q_a1), vaddq_s16(q_a3, q_a2), - vsubq_s16(q_a3, q_a2), vsubq_s16(q_a0, q_a1)); + vabsq_s16(vaddq_s16(q_a0, q_a1)), + vabsq_s16(vaddq_s16(q_a3, q_a2)), + vabdq_s16(q_a3, q_a2), vabdq_s16(q_a0, q_a1)); return q4_out; } -static WEBP_INLINE int16x8x4_t DistoVerticalPass(int16x8x4_t q4_in) { - const int16x8_t q_a0 = vaddq_s16(q4_in.val[0], q4_in.val[2]); - const int16x8_t q_a1 = vaddq_s16(q4_in.val[1], q4_in.val[3]); - const int16x8_t q_a2 = vsubq_s16(q4_in.val[1], q4_in.val[3]); - const int16x8_t q_a3 = vsubq_s16(q4_in.val[0], q4_in.val[2]); +static WEBP_INLINE int16x8x4_t DistoVerticalPass(const uint8x8x4_t q4_in) { + const int16x8_t q_a0 = vreinterpretq_s16_u16(vaddl_u8(q4_in.val[0], + q4_in.val[2])); + const int16x8_t q_a1 = vreinterpretq_s16_u16(vaddl_u8(q4_in.val[1], + q4_in.val[3])); + const int16x8_t q_a2 = vreinterpretq_s16_u16(vsubl_u8(q4_in.val[1], + q4_in.val[3])); + const int16x8_t q_a3 = vreinterpretq_s16_u16(vsubl_u8(q4_in.val[0], + q4_in.val[2])); + int16x8x4_t q4_out; - q4_in.val[0] = vaddq_s16(q_a0, q_a1); - q4_in.val[1] = vaddq_s16(q_a3, q_a2); - q4_in.val[2] = vabdq_s16(q_a3, q_a2); - q4_in.val[3] = vabdq_s16(q_a0, q_a1); - q4_in.val[0] = vabsq_s16(q4_in.val[0]); - q4_in.val[1] = vabsq_s16(q4_in.val[1]); - return q4_in; + INIT_VECTOR4(q4_out, + vaddq_s16(q_a0, q_a1), vaddq_s16(q_a3, q_a2), + vsubq_s16(q_a3, q_a2), vsubq_s16(q_a0, q_a1)); + return q4_out; } static WEBP_INLINE int16x4x4_t DistoLoadW(const uint16_t* 
w) { @@ -667,6 +651,7 @@ static WEBP_INLINE int32x2_t DistoSum(const int16x8x4_t q4_in, // Hadamard transform // Returns the weighted sum of the absolute value of transformed coefficients. +// w[] contains a row-major 4 by 4 symmetric matrix. static int Disto4x4(const uint8_t* const a, const uint8_t* const b, const uint16_t* const w) { uint32x2_t d_in_ab_0123 = vdup_n_u32(0); @@ -691,18 +676,19 @@ static int Disto4x4(const uint8_t* const a, const uint8_t* const b, vreinterpret_u8_u32(d_in_ab_cdef)); { - // horizontal pass - const uint8x8x4_t d4_t = DistoTranspose4x4U8(d4_in); - const int16x8x4_t q4_h = DistoHorizontalPass(d4_t); + // Vertical pass first to avoid a transpose (vertical and horizontal passes + // are commutative because w/kWeightY is symmetric) and subsequent + // transpose. + const int16x8x4_t q4_v = DistoVerticalPass(d4_in); const int16x4x4_t d4_w = DistoLoadW(w); - // vertical pass - const int16x8x4_t q4_t = DistoTranspose4x4S16(q4_h); - const int16x8x4_t q4_v = DistoVerticalPass(q4_t); - int32x2_t d_sum = DistoSum(q4_v, d4_w); + // horizontal pass + const int16x8x4_t q4_t = DistoTranspose4x4S16(q4_v); + const int16x8x4_t q4_h = DistoHorizontalPass(q4_t); + int32x2_t d_sum = DistoSum(q4_h, d4_w); // abs(sum2 - sum1) >> 5 d_sum = vabs_s32(d_sum); - d_sum = vshr_n_s32(d_sum, 5); + d_sum = vshr_n_s32(d_sum, 5); return vget_lane_s32(d_sum, 0); } } diff --git a/src/3rdparty/libwebp/src/dsp/enc_sse2.c b/src/3rdparty/libwebp/src/dsp/enc_sse2.c index 2333d2b..4a2e3ce 100644 --- a/src/3rdparty/libwebp/src/dsp/enc_sse2.c +++ b/src/3rdparty/libwebp/src/dsp/enc_sse2.c @@ -17,39 +17,11 @@ #include <stdlib.h> // for abs() #include <emmintrin.h> +#include "./common_sse2.h" #include "../enc/cost.h" #include "../enc/vp8enci.h" //------------------------------------------------------------------------------ -// Quite useful macro for debugging. Left here for convenience. 
- -#if 0 -#include <stdio.h> -static void PrintReg(const __m128i r, const char* const name, int size) { - int n; - union { - __m128i r; - uint8_t i8[16]; - uint16_t i16[8]; - uint32_t i32[4]; - uint64_t i64[2]; - } tmp; - tmp.r = r; - fprintf(stderr, "%s\t: ", name); - if (size == 8) { - for (n = 0; n < 16; ++n) fprintf(stderr, "%.2x ", tmp.i8[n]); - } else if (size == 16) { - for (n = 0; n < 8; ++n) fprintf(stderr, "%.4x ", tmp.i16[n]); - } else if (size == 32) { - for (n = 0; n < 4; ++n) fprintf(stderr, "%.8x ", tmp.i32[n]); - } else { - for (n = 0; n < 2; ++n) fprintf(stderr, "%.16lx ", tmp.i64[n]); - } - fprintf(stderr, "\n"); -} -#endif - -//------------------------------------------------------------------------------ // Transforms (Paragraph 14.4) // Does one or two inverse transforms. @@ -131,34 +103,7 @@ static void ITransform(const uint8_t* ref, const int16_t* in, uint8_t* dst, const __m128i tmp3 = _mm_sub_epi16(a, d); // Transpose the two 4x4. - // a00 a01 a02 a03 b00 b01 b02 b03 - // a10 a11 a12 a13 b10 b11 b12 b13 - // a20 a21 a22 a23 b20 b21 b22 b23 - // a30 a31 a32 a33 b30 b31 b32 b33 - const __m128i transpose0_0 = _mm_unpacklo_epi16(tmp0, tmp1); - const __m128i transpose0_1 = _mm_unpacklo_epi16(tmp2, tmp3); - const __m128i transpose0_2 = _mm_unpackhi_epi16(tmp0, tmp1); - const __m128i transpose0_3 = _mm_unpackhi_epi16(tmp2, tmp3); - // a00 a10 a01 a11 a02 a12 a03 a13 - // a20 a30 a21 a31 a22 a32 a23 a33 - // b00 b10 b01 b11 b02 b12 b03 b13 - // b20 b30 b21 b31 b22 b32 b23 b33 - const __m128i transpose1_0 = _mm_unpacklo_epi32(transpose0_0, transpose0_1); - const __m128i transpose1_1 = _mm_unpacklo_epi32(transpose0_2, transpose0_3); - const __m128i transpose1_2 = _mm_unpackhi_epi32(transpose0_0, transpose0_1); - const __m128i transpose1_3 = _mm_unpackhi_epi32(transpose0_2, transpose0_3); - // a00 a10 a20 a30 a01 a11 a21 a31 - // b00 b10 b20 b30 b01 b11 b21 b31 - // a02 a12 a22 a32 a03 a13 a23 a33 - // b02 b12 a22 b32 b03 b13 b23 b33 - T0 = 
_mm_unpacklo_epi64(transpose1_0, transpose1_1); - T1 = _mm_unpackhi_epi64(transpose1_0, transpose1_1); - T2 = _mm_unpacklo_epi64(transpose1_2, transpose1_3); - T3 = _mm_unpackhi_epi64(transpose1_2, transpose1_3); - // a00 a10 a20 a30 b00 b10 b20 b30 - // a01 a11 a21 a31 b01 b11 b21 b31 - // a02 a12 a22 a32 b02 b12 b22 b32 - // a03 a13 a23 a33 b03 b13 b23 b33 + VP8Transpose_2_4x4_16b(&tmp0, &tmp1, &tmp2, &tmp3, &T0, &T1, &T2, &T3); } // Horizontal pass and subsequent transpose. @@ -193,34 +138,8 @@ static void ITransform(const uint8_t* ref, const int16_t* in, uint8_t* dst, const __m128i shifted3 = _mm_srai_epi16(tmp3, 3); // Transpose the two 4x4. - // a00 a01 a02 a03 b00 b01 b02 b03 - // a10 a11 a12 a13 b10 b11 b12 b13 - // a20 a21 a22 a23 b20 b21 b22 b23 - // a30 a31 a32 a33 b30 b31 b32 b33 - const __m128i transpose0_0 = _mm_unpacklo_epi16(shifted0, shifted1); - const __m128i transpose0_1 = _mm_unpacklo_epi16(shifted2, shifted3); - const __m128i transpose0_2 = _mm_unpackhi_epi16(shifted0, shifted1); - const __m128i transpose0_3 = _mm_unpackhi_epi16(shifted2, shifted3); - // a00 a10 a01 a11 a02 a12 a03 a13 - // a20 a30 a21 a31 a22 a32 a23 a33 - // b00 b10 b01 b11 b02 b12 b03 b13 - // b20 b30 b21 b31 b22 b32 b23 b33 - const __m128i transpose1_0 = _mm_unpacklo_epi32(transpose0_0, transpose0_1); - const __m128i transpose1_1 = _mm_unpacklo_epi32(transpose0_2, transpose0_3); - const __m128i transpose1_2 = _mm_unpackhi_epi32(transpose0_0, transpose0_1); - const __m128i transpose1_3 = _mm_unpackhi_epi32(transpose0_2, transpose0_3); - // a00 a10 a20 a30 a01 a11 a21 a31 - // b00 b10 b20 b30 b01 b11 b21 b31 - // a02 a12 a22 a32 a03 a13 a23 a33 - // b02 b12 a22 b32 b03 b13 b23 b33 - T0 = _mm_unpacklo_epi64(transpose1_0, transpose1_1); - T1 = _mm_unpackhi_epi64(transpose1_0, transpose1_1); - T2 = _mm_unpacklo_epi64(transpose1_2, transpose1_3); - T3 = _mm_unpackhi_epi64(transpose1_2, transpose1_3); - // a00 a10 a20 a30 b00 b10 b20 b30 - // a01 a11 a21 a31 b01 b11 b21 b31 - // 
a02 a12 a22 a32 b02 b12 b22 b32 - // a03 a13 a23 a33 b03 b13 b23 b33 + VP8Transpose_2_4x4_16b(&shifted0, &shifted1, &shifted2, &shifted3, &T0, &T1, + &T2, &T3); } // Add inverse transform to 'ref' and store. @@ -373,42 +292,42 @@ static void FTransformPass2(const __m128i* const v01, const __m128i* const v32, static void FTransform(const uint8_t* src, const uint8_t* ref, int16_t* out) { const __m128i zero = _mm_setzero_si128(); - - // Load src and convert to 16b. + // Load src. const __m128i src0 = _mm_loadl_epi64((const __m128i*)&src[0 * BPS]); const __m128i src1 = _mm_loadl_epi64((const __m128i*)&src[1 * BPS]); const __m128i src2 = _mm_loadl_epi64((const __m128i*)&src[2 * BPS]); const __m128i src3 = _mm_loadl_epi64((const __m128i*)&src[3 * BPS]); - const __m128i src_0 = _mm_unpacklo_epi8(src0, zero); - const __m128i src_1 = _mm_unpacklo_epi8(src1, zero); - const __m128i src_2 = _mm_unpacklo_epi8(src2, zero); - const __m128i src_3 = _mm_unpacklo_epi8(src3, zero); - // Load ref and convert to 16b. + // 00 01 02 03 * + // 10 11 12 13 * + // 20 21 22 23 * + // 30 31 32 33 * + // Shuffle. + const __m128i src_0 = _mm_unpacklo_epi16(src0, src1); + const __m128i src_1 = _mm_unpacklo_epi16(src2, src3); + // 00 01 10 11 02 03 12 13 * * ... + // 20 21 30 31 22 22 32 33 * * ... + + // Load ref. const __m128i ref0 = _mm_loadl_epi64((const __m128i*)&ref[0 * BPS]); const __m128i ref1 = _mm_loadl_epi64((const __m128i*)&ref[1 * BPS]); const __m128i ref2 = _mm_loadl_epi64((const __m128i*)&ref[2 * BPS]); const __m128i ref3 = _mm_loadl_epi64((const __m128i*)&ref[3 * BPS]); - const __m128i ref_0 = _mm_unpacklo_epi8(ref0, zero); - const __m128i ref_1 = _mm_unpacklo_epi8(ref1, zero); - const __m128i ref_2 = _mm_unpacklo_epi8(ref2, zero); - const __m128i ref_3 = _mm_unpacklo_epi8(ref3, zero); - // Compute difference. 
-> 00 01 02 03 00 00 00 00 - const __m128i diff0 = _mm_sub_epi16(src_0, ref_0); - const __m128i diff1 = _mm_sub_epi16(src_1, ref_1); - const __m128i diff2 = _mm_sub_epi16(src_2, ref_2); - const __m128i diff3 = _mm_sub_epi16(src_3, ref_3); - - // Unpack and shuffle - // 00 01 02 03 0 0 0 0 - // 10 11 12 13 0 0 0 0 - // 20 21 22 23 0 0 0 0 - // 30 31 32 33 0 0 0 0 - const __m128i shuf01 = _mm_unpacklo_epi32(diff0, diff1); - const __m128i shuf23 = _mm_unpacklo_epi32(diff2, diff3); + const __m128i ref_0 = _mm_unpacklo_epi16(ref0, ref1); + const __m128i ref_1 = _mm_unpacklo_epi16(ref2, ref3); + + // Convert both to 16 bit. + const __m128i src_0_16b = _mm_unpacklo_epi8(src_0, zero); + const __m128i src_1_16b = _mm_unpacklo_epi8(src_1, zero); + const __m128i ref_0_16b = _mm_unpacklo_epi8(ref_0, zero); + const __m128i ref_1_16b = _mm_unpacklo_epi8(ref_1, zero); + + // Compute the difference. + const __m128i row01 = _mm_sub_epi16(src_0_16b, ref_0_16b); + const __m128i row23 = _mm_sub_epi16(src_1_16b, ref_1_16b); __m128i v01, v32; // First pass - FTransformPass1(&shuf01, &shuf23, &v01, &v32); + FTransformPass1(&row01, &row23, &v01, &v32); // Second pass FTransformPass2(&v01, &v32, out); @@ -463,8 +382,7 @@ static void FTransform2(const uint8_t* src, const uint8_t* ref, int16_t* out) { } static void FTransformWHTRow(const int16_t* const in, __m128i* const out) { - const __m128i kMult1 = _mm_set_epi16(0, 0, 0, 0, 1, 1, 1, 1); - const __m128i kMult2 = _mm_set_epi16(0, 0, 0, 0, -1, 1, -1, 1); + const __m128i kMult = _mm_set_epi16(-1, 1, -1, 1, 1, 1, 1, 1); const __m128i src0 = _mm_loadl_epi64((__m128i*)&in[0 * 16]); const __m128i src1 = _mm_loadl_epi64((__m128i*)&in[1 * 16]); const __m128i src2 = _mm_loadl_epi64((__m128i*)&in[2 * 16]); @@ -473,33 +391,38 @@ static void FTransformWHTRow(const int16_t* const in, __m128i* const out) { const __m128i A23 = _mm_unpacklo_epi16(src2, src3); // A2 A3 | ... const __m128i B0 = _mm_adds_epi16(A01, A23); // a0 | a1 | ... 
const __m128i B1 = _mm_subs_epi16(A01, A23); // a3 | a2 | ... - const __m128i C0 = _mm_unpacklo_epi32(B0, B1); // a0 | a1 | a3 | a2 - const __m128i C1 = _mm_unpacklo_epi32(B1, B0); // a3 | a2 | a0 | a1 - const __m128i D0 = _mm_madd_epi16(C0, kMult1); // out0, out1 - const __m128i D1 = _mm_madd_epi16(C1, kMult2); // out2, out3 - *out = _mm_unpacklo_epi64(D0, D1); + const __m128i C0 = _mm_unpacklo_epi32(B0, B1); // a0 | a1 | a3 | a2 | ... + const __m128i C1 = _mm_unpacklo_epi32(B1, B0); // a3 | a2 | a0 | a1 | ... + const __m128i D = _mm_unpacklo_epi64(C0, C1); // a0 a1 a3 a2 a3 a2 a0 a1 + *out = _mm_madd_epi16(D, kMult); } static void FTransformWHT(const int16_t* in, int16_t* out) { + // Input is 12b signed. __m128i row0, row1, row2, row3; + // Rows are 14b signed. FTransformWHTRow(in + 0 * 64, &row0); FTransformWHTRow(in + 1 * 64, &row1); FTransformWHTRow(in + 2 * 64, &row2); FTransformWHTRow(in + 3 * 64, &row3); { + // The a* are 15b signed. const __m128i a0 = _mm_add_epi32(row0, row2); const __m128i a1 = _mm_add_epi32(row1, row3); const __m128i a2 = _mm_sub_epi32(row1, row3); const __m128i a3 = _mm_sub_epi32(row0, row2); - const __m128i b0 = _mm_srai_epi32(_mm_add_epi32(a0, a1), 1); - const __m128i b1 = _mm_srai_epi32(_mm_add_epi32(a3, a2), 1); - const __m128i b2 = _mm_srai_epi32(_mm_sub_epi32(a3, a2), 1); - const __m128i b3 = _mm_srai_epi32(_mm_sub_epi32(a0, a1), 1); - const __m128i out0 = _mm_packs_epi32(b0, b1); - const __m128i out1 = _mm_packs_epi32(b2, b3); - _mm_storeu_si128((__m128i*)&out[0], out0); - _mm_storeu_si128((__m128i*)&out[8], out1); + const __m128i a0a3 = _mm_packs_epi32(a0, a3); + const __m128i a1a2 = _mm_packs_epi32(a1, a2); + + // The b* are 16b signed. 
+ const __m128i b0b1 = _mm_add_epi16(a0a3, a1a2); + const __m128i b3b2 = _mm_sub_epi16(a0a3, a1a2); + const __m128i tmp_b2b3 = _mm_unpackhi_epi64(b3b2, b3b2); + const __m128i b2b3 = _mm_unpacklo_epi64(tmp_b2b3, b3b2); + + _mm_storeu_si128((__m128i*)&out[0], _mm_srai_epi16(b0b1, 1)); + _mm_storeu_si128((__m128i*)&out[8], _mm_srai_epi16(b2b3, 1)); } } @@ -692,12 +615,10 @@ static WEBP_INLINE void TrueMotion(uint8_t* dst, const uint8_t* left, static WEBP_INLINE void DC8uv(uint8_t* dst, const uint8_t* left, const uint8_t* top) { - const __m128i zero = _mm_setzero_si128(); const __m128i top_values = _mm_loadl_epi64((const __m128i*)top); const __m128i left_values = _mm_loadl_epi64((const __m128i*)left); - const __m128i sum_top = _mm_sad_epu8(top_values, zero); - const __m128i sum_left = _mm_sad_epu8(left_values, zero); - const int DC = _mm_cvtsi128_si32(sum_top) + _mm_cvtsi128_si32(sum_left) + 8; + const __m128i combined = _mm_unpacklo_epi64(top_values, left_values); + const int DC = VP8HorizontalAdd8b(&combined) + 8; Put8x8uv(DC >> 4, dst); } @@ -735,27 +656,16 @@ static WEBP_INLINE void DC8uvMode(uint8_t* dst, const uint8_t* left, static WEBP_INLINE void DC16(uint8_t* dst, const uint8_t* left, const uint8_t* top) { - const __m128i zero = _mm_setzero_si128(); const __m128i top_row = _mm_load_si128((const __m128i*)top); const __m128i left_row = _mm_load_si128((const __m128i*)left); - const __m128i sad8x2 = _mm_sad_epu8(top_row, zero); - // sum the two sads: sad8x2[0:1] + sad8x2[8:9] - const __m128i sum_top = _mm_add_epi16(sad8x2, _mm_shuffle_epi32(sad8x2, 2)); - const __m128i sad8x2_left = _mm_sad_epu8(left_row, zero); - // sum the two sads: sad8x2[0:1] + sad8x2[8:9] - const __m128i sum_left = - _mm_add_epi16(sad8x2_left, _mm_shuffle_epi32(sad8x2_left, 2)); - const int DC = _mm_cvtsi128_si32(sum_top) + _mm_cvtsi128_si32(sum_left) + 16; + const int DC = + VP8HorizontalAdd8b(&top_row) + VP8HorizontalAdd8b(&left_row) + 16; Put16(DC >> 5, dst); } static WEBP_INLINE void 
DC16NoLeft(uint8_t* dst, const uint8_t* top) { - const __m128i zero = _mm_setzero_si128(); const __m128i top_row = _mm_load_si128((const __m128i*)top); - const __m128i sad8x2 = _mm_sad_epu8(top_row, zero); - // sum the two sads: sad8x2[0:1] + sad8x2[8:9] - const __m128i sum = _mm_add_epi16(sad8x2, _mm_shuffle_epi32(sad8x2, 2)); - const int DC = _mm_cvtsi128_si32(sum) + 8; + const int DC = VP8HorizontalAdd8b(&top_row) + 8; Put16(DC >> 4, dst); } @@ -1142,15 +1052,15 @@ static int SSE4x4(const uint8_t* a, const uint8_t* b) { // reconstructed samples. // Hadamard transform -// Returns the difference between the weighted sum of the absolute value of -// transformed coefficients. +// Returns the weighted sum of the absolute value of transformed coefficients. +// w[] contains a row-major 4 by 4 symmetric matrix. static int TTransform(const uint8_t* inA, const uint8_t* inB, const uint16_t* const w) { int32_t sum[4]; __m128i tmp_0, tmp_1, tmp_2, tmp_3; const __m128i zero = _mm_setzero_si128(); - // Load, combine and transpose inputs. + // Load and combine inputs. { const __m128i inA_0 = _mm_loadl_epi64((const __m128i*)&inA[BPS * 0]); const __m128i inA_1 = _mm_loadl_epi64((const __m128i*)&inA[BPS * 1]); @@ -1162,37 +1072,22 @@ static int TTransform(const uint8_t* inA, const uint8_t* inB, const __m128i inB_3 = _mm_loadl_epi64((const __m128i*)&inB[BPS * 3]); // Combine inA and inB (we'll do two transforms in parallel). - const __m128i inAB_0 = _mm_unpacklo_epi8(inA_0, inB_0); - const __m128i inAB_1 = _mm_unpacklo_epi8(inA_1, inB_1); - const __m128i inAB_2 = _mm_unpacklo_epi8(inA_2, inB_2); - const __m128i inAB_3 = _mm_unpacklo_epi8(inA_3, inB_3); - // a00 b00 a01 b01 a02 b03 a03 b03 0 0 0 0 0 0 0 0 - // a10 b10 a11 b11 a12 b12 a13 b13 0 0 0 0 0 0 0 0 - // a20 b20 a21 b21 a22 b22 a23 b23 0 0 0 0 0 0 0 0 - // a30 b30 a31 b31 a32 b32 a33 b33 0 0 0 0 0 0 0 0 - - // Transpose the two 4x4, discarding the filling zeroes. 
- const __m128i transpose0_0 = _mm_unpacklo_epi8(inAB_0, inAB_2); - const __m128i transpose0_1 = _mm_unpacklo_epi8(inAB_1, inAB_3); - // a00 a20 b00 b20 a01 a21 b01 b21 a02 a22 b02 b22 a03 a23 b03 b23 - // a10 a30 b10 b30 a11 a31 b11 b31 a12 a32 b12 b32 a13 a33 b13 b33 - const __m128i transpose1_0 = _mm_unpacklo_epi8(transpose0_0, transpose0_1); - const __m128i transpose1_1 = _mm_unpackhi_epi8(transpose0_0, transpose0_1); - // a00 a10 a20 a30 b00 b10 b20 b30 a01 a11 a21 a31 b01 b11 b21 b31 - // a02 a12 a22 a32 b02 b12 b22 b32 a03 a13 a23 a33 b03 b13 b23 b33 - - // Convert to 16b. - tmp_0 = _mm_unpacklo_epi8(transpose1_0, zero); - tmp_1 = _mm_unpackhi_epi8(transpose1_0, zero); - tmp_2 = _mm_unpacklo_epi8(transpose1_1, zero); - tmp_3 = _mm_unpackhi_epi8(transpose1_1, zero); - // a00 a10 a20 a30 b00 b10 b20 b30 - // a01 a11 a21 a31 b01 b11 b21 b31 - // a02 a12 a22 a32 b02 b12 b22 b32 - // a03 a13 a23 a33 b03 b13 b23 b33 + const __m128i inAB_0 = _mm_unpacklo_epi32(inA_0, inB_0); + const __m128i inAB_1 = _mm_unpacklo_epi32(inA_1, inB_1); + const __m128i inAB_2 = _mm_unpacklo_epi32(inA_2, inB_2); + const __m128i inAB_3 = _mm_unpacklo_epi32(inA_3, inB_3); + tmp_0 = _mm_unpacklo_epi8(inAB_0, zero); + tmp_1 = _mm_unpacklo_epi8(inAB_1, zero); + tmp_2 = _mm_unpacklo_epi8(inAB_2, zero); + tmp_3 = _mm_unpacklo_epi8(inAB_3, zero); + // a00 a01 a02 a03 b00 b01 b02 b03 + // a10 a11 a12 a13 b10 b11 b12 b13 + // a20 a21 a22 a23 b20 b21 b22 b23 + // a30 a31 a32 a33 b30 b31 b32 b33 } - // Horizontal pass and subsequent transpose. + // Vertical pass first to avoid a transpose (vertical and horizontal passes + // are commutative because w/kWeightY is symmetric) and subsequent transpose. { // Calculate a and b (two 4x4 at once). const __m128i a0 = _mm_add_epi16(tmp_0, tmp_2); @@ -1209,33 +1104,10 @@ static int TTransform(const uint8_t* inA, const uint8_t* inB, // a30 a31 a32 a33 b30 b31 b32 b33 // Transpose the two 4x4. 
- const __m128i transpose0_0 = _mm_unpacklo_epi16(b0, b1); - const __m128i transpose0_1 = _mm_unpacklo_epi16(b2, b3); - const __m128i transpose0_2 = _mm_unpackhi_epi16(b0, b1); - const __m128i transpose0_3 = _mm_unpackhi_epi16(b2, b3); - // a00 a10 a01 a11 a02 a12 a03 a13 - // a20 a30 a21 a31 a22 a32 a23 a33 - // b00 b10 b01 b11 b02 b12 b03 b13 - // b20 b30 b21 b31 b22 b32 b23 b33 - const __m128i transpose1_0 = _mm_unpacklo_epi32(transpose0_0, transpose0_1); - const __m128i transpose1_1 = _mm_unpacklo_epi32(transpose0_2, transpose0_3); - const __m128i transpose1_2 = _mm_unpackhi_epi32(transpose0_0, transpose0_1); - const __m128i transpose1_3 = _mm_unpackhi_epi32(transpose0_2, transpose0_3); - // a00 a10 a20 a30 a01 a11 a21 a31 - // b00 b10 b20 b30 b01 b11 b21 b31 - // a02 a12 a22 a32 a03 a13 a23 a33 - // b02 b12 a22 b32 b03 b13 b23 b33 - tmp_0 = _mm_unpacklo_epi64(transpose1_0, transpose1_1); - tmp_1 = _mm_unpackhi_epi64(transpose1_0, transpose1_1); - tmp_2 = _mm_unpacklo_epi64(transpose1_2, transpose1_3); - tmp_3 = _mm_unpackhi_epi64(transpose1_2, transpose1_3); - // a00 a10 a20 a30 b00 b10 b20 b30 - // a01 a11 a21 a31 b01 b11 b21 b31 - // a02 a12 a22 a32 b02 b12 b22 b32 - // a03 a13 a23 a33 b03 b13 b23 b33 + VP8Transpose_2_4x4_16b(&b0, &b1, &b2, &b3, &tmp_0, &tmp_1, &tmp_2, &tmp_3); } - // Vertical pass and difference of weighted sums. + // Horizontal pass and difference of weighted sums. { // Load all inputs. 
const __m128i w_0 = _mm_loadu_si128((const __m128i*)&w[0]); diff --git a/src/3rdparty/libwebp/src/dsp/enc_sse41.c b/src/3rdparty/libwebp/src/dsp/enc_sse41.c index 65c01ae..a178390 100644 --- a/src/3rdparty/libwebp/src/dsp/enc_sse41.c +++ b/src/3rdparty/libwebp/src/dsp/enc_sse41.c @@ -17,6 +17,7 @@ #include <smmintrin.h> #include <stdlib.h> // for abs() +#include "./common_sse2.h" #include "../enc/vp8enci.h" //------------------------------------------------------------------------------ @@ -67,55 +68,45 @@ static void CollectHistogram(const uint8_t* ref, const uint8_t* pred, // reconstructed samples. // Hadamard transform -// Returns the difference between the weighted sum of the absolute value of -// transformed coefficients. +// Returns the weighted sum of the absolute value of transformed coefficients. +// w[] contains a row-major 4 by 4 symmetric matrix. static int TTransform(const uint8_t* inA, const uint8_t* inB, const uint16_t* const w) { + int32_t sum[4]; __m128i tmp_0, tmp_1, tmp_2, tmp_3; - // Load, combine and transpose inputs. + // Load and combine inputs. { - const __m128i inA_0 = _mm_loadl_epi64((const __m128i*)&inA[BPS * 0]); - const __m128i inA_1 = _mm_loadl_epi64((const __m128i*)&inA[BPS * 1]); - const __m128i inA_2 = _mm_loadl_epi64((const __m128i*)&inA[BPS * 2]); + const __m128i inA_0 = _mm_loadu_si128((const __m128i*)&inA[BPS * 0]); + const __m128i inA_1 = _mm_loadu_si128((const __m128i*)&inA[BPS * 1]); + const __m128i inA_2 = _mm_loadu_si128((const __m128i*)&inA[BPS * 2]); + // In SSE4.1, with gcc 4.8 at least (maybe other versions), + // _mm_loadu_si128 is faster than _mm_loadl_epi64. But for the last lump + // of inA and inB, _mm_loadl_epi64 is still used not to have an out of + // bound read. 
const __m128i inA_3 = _mm_loadl_epi64((const __m128i*)&inA[BPS * 3]); - const __m128i inB_0 = _mm_loadl_epi64((const __m128i*)&inB[BPS * 0]); - const __m128i inB_1 = _mm_loadl_epi64((const __m128i*)&inB[BPS * 1]); - const __m128i inB_2 = _mm_loadl_epi64((const __m128i*)&inB[BPS * 2]); + const __m128i inB_0 = _mm_loadu_si128((const __m128i*)&inB[BPS * 0]); + const __m128i inB_1 = _mm_loadu_si128((const __m128i*)&inB[BPS * 1]); + const __m128i inB_2 = _mm_loadu_si128((const __m128i*)&inB[BPS * 2]); const __m128i inB_3 = _mm_loadl_epi64((const __m128i*)&inB[BPS * 3]); // Combine inA and inB (we'll do two transforms in parallel). - const __m128i inAB_0 = _mm_unpacklo_epi8(inA_0, inB_0); - const __m128i inAB_1 = _mm_unpacklo_epi8(inA_1, inB_1); - const __m128i inAB_2 = _mm_unpacklo_epi8(inA_2, inB_2); - const __m128i inAB_3 = _mm_unpacklo_epi8(inA_3, inB_3); - // a00 b00 a01 b01 a02 b03 a03 b03 0 0 0 0 0 0 0 0 - // a10 b10 a11 b11 a12 b12 a13 b13 0 0 0 0 0 0 0 0 - // a20 b20 a21 b21 a22 b22 a23 b23 0 0 0 0 0 0 0 0 - // a30 b30 a31 b31 a32 b32 a33 b33 0 0 0 0 0 0 0 0 - - // Transpose the two 4x4, discarding the filling zeroes. - const __m128i transpose0_0 = _mm_unpacklo_epi8(inAB_0, inAB_2); - const __m128i transpose0_1 = _mm_unpacklo_epi8(inAB_1, inAB_3); - // a00 a20 b00 b20 a01 a21 b01 b21 a02 a22 b02 b22 a03 a23 b03 b23 - // a10 a30 b10 b30 a11 a31 b11 b31 a12 a32 b12 b32 a13 a33 b13 b33 - const __m128i transpose1_0 = _mm_unpacklo_epi8(transpose0_0, transpose0_1); - const __m128i transpose1_1 = _mm_unpackhi_epi8(transpose0_0, transpose0_1); - // a00 a10 a20 a30 b00 b10 b20 b30 a01 a11 a21 a31 b01 b11 b21 b31 - // a02 a12 a22 a32 b02 b12 b22 b32 a03 a13 a23 a33 b03 b13 b23 b33 - - // Convert to 16b. 
- tmp_0 = _mm_cvtepu8_epi16(transpose1_0); - tmp_1 = _mm_cvtepu8_epi16(_mm_srli_si128(transpose1_0, 8)); - tmp_2 = _mm_cvtepu8_epi16(transpose1_1); - tmp_3 = _mm_cvtepu8_epi16(_mm_srli_si128(transpose1_1, 8)); - // a00 a10 a20 a30 b00 b10 b20 b30 - // a01 a11 a21 a31 b01 b11 b21 b31 - // a02 a12 a22 a32 b02 b12 b22 b32 - // a03 a13 a23 a33 b03 b13 b23 b33 + const __m128i inAB_0 = _mm_unpacklo_epi32(inA_0, inB_0); + const __m128i inAB_1 = _mm_unpacklo_epi32(inA_1, inB_1); + const __m128i inAB_2 = _mm_unpacklo_epi32(inA_2, inB_2); + const __m128i inAB_3 = _mm_unpacklo_epi32(inA_3, inB_3); + tmp_0 = _mm_cvtepu8_epi16(inAB_0); + tmp_1 = _mm_cvtepu8_epi16(inAB_1); + tmp_2 = _mm_cvtepu8_epi16(inAB_2); + tmp_3 = _mm_cvtepu8_epi16(inAB_3); + // a00 a01 a02 a03 b00 b01 b02 b03 + // a10 a11 a12 a13 b10 b11 b12 b13 + // a20 a21 a22 a23 b20 b21 b22 b23 + // a30 a31 a32 a33 b30 b31 b32 b33 } - // Horizontal pass and subsequent transpose. + // Vertical pass first to avoid a transpose (vertical and horizontal passes + // are commutative because w/kWeightY is symmetric) and subsequent transpose. { // Calculate a and b (two 4x4 at once). const __m128i a0 = _mm_add_epi16(tmp_0, tmp_2); @@ -132,33 +123,10 @@ static int TTransform(const uint8_t* inA, const uint8_t* inB, // a30 a31 a32 a33 b30 b31 b32 b33 // Transpose the two 4x4. 
- const __m128i transpose0_0 = _mm_unpacklo_epi16(b0, b1); - const __m128i transpose0_1 = _mm_unpacklo_epi16(b2, b3); - const __m128i transpose0_2 = _mm_unpackhi_epi16(b0, b1); - const __m128i transpose0_3 = _mm_unpackhi_epi16(b2, b3); - // a00 a10 a01 a11 a02 a12 a03 a13 - // a20 a30 a21 a31 a22 a32 a23 a33 - // b00 b10 b01 b11 b02 b12 b03 b13 - // b20 b30 b21 b31 b22 b32 b23 b33 - const __m128i transpose1_0 = _mm_unpacklo_epi32(transpose0_0, transpose0_1); - const __m128i transpose1_1 = _mm_unpacklo_epi32(transpose0_2, transpose0_3); - const __m128i transpose1_2 = _mm_unpackhi_epi32(transpose0_0, transpose0_1); - const __m128i transpose1_3 = _mm_unpackhi_epi32(transpose0_2, transpose0_3); - // a00 a10 a20 a30 a01 a11 a21 a31 - // b00 b10 b20 b30 b01 b11 b21 b31 - // a02 a12 a22 a32 a03 a13 a23 a33 - // b02 b12 a22 b32 b03 b13 b23 b33 - tmp_0 = _mm_unpacklo_epi64(transpose1_0, transpose1_1); - tmp_1 = _mm_unpackhi_epi64(transpose1_0, transpose1_1); - tmp_2 = _mm_unpacklo_epi64(transpose1_2, transpose1_3); - tmp_3 = _mm_unpackhi_epi64(transpose1_2, transpose1_3); - // a00 a10 a20 a30 b00 b10 b20 b30 - // a01 a11 a21 a31 b01 b11 b21 b31 - // a02 a12 a22 a32 b02 b12 b22 b32 - // a03 a13 a23 a33 b03 b13 b23 b33 + VP8Transpose_2_4x4_16b(&b0, &b1, &b2, &b3, &tmp_0, &tmp_1, &tmp_2, &tmp_3); } - // Vertical pass and difference of weighted sums. + // Horizontal pass and difference of weighted sums. { // Load all inputs. 
const __m128i w_0 = _mm_loadu_si128((const __m128i*)&w[0]); @@ -195,11 +163,9 @@ static int TTransform(const uint8_t* inA, const uint8_t* inB, // difference of weighted sums A_b2 = _mm_sub_epi32(A_b0, B_b0); - // cascading summation of the differences - B_b0 = _mm_hadd_epi32(A_b2, A_b2); - B_b2 = _mm_hadd_epi32(B_b0, B_b0); - return _mm_cvtsi128_si32(B_b2); + _mm_storeu_si128((__m128i*)&sum[0], A_b2); } + return sum[0] + sum[1] + sum[2] + sum[3]; } static int Disto4x4(const uint8_t* const a, const uint8_t* const b, diff --git a/src/3rdparty/libwebp/src/dsp/filters.c b/src/3rdparty/libwebp/src/dsp/filters.c index 5c30f2e..9f04faf 100644 --- a/src/3rdparty/libwebp/src/dsp/filters.c +++ b/src/3rdparty/libwebp/src/dsp/filters.c @@ -184,19 +184,40 @@ static void GradientFilter(const uint8_t* data, int width, int height, //------------------------------------------------------------------------------ -static void VerticalUnfilter(int width, int height, int stride, int row, - int num_rows, uint8_t* data) { - DoVerticalFilter(data, width, height, stride, row, num_rows, 1, data); +static void HorizontalUnfilter(const uint8_t* prev, const uint8_t* in, + uint8_t* out, int width) { + uint8_t pred = (prev == NULL) ? 
0 : prev[0]; + int i; + for (i = 0; i < width; ++i) { + out[i] = pred + in[i]; + pred = out[i]; + } } -static void HorizontalUnfilter(int width, int height, int stride, int row, - int num_rows, uint8_t* data) { - DoHorizontalFilter(data, width, height, stride, row, num_rows, 1, data); +static void VerticalUnfilter(const uint8_t* prev, const uint8_t* in, + uint8_t* out, int width) { + if (prev == NULL) { + HorizontalUnfilter(NULL, in, out, width); + } else { + int i; + for (i = 0; i < width; ++i) out[i] = prev[i] + in[i]; + } } -static void GradientUnfilter(int width, int height, int stride, int row, - int num_rows, uint8_t* data) { - DoGradientFilter(data, width, height, stride, row, num_rows, 1, data); +static void GradientUnfilter(const uint8_t* prev, const uint8_t* in, + uint8_t* out, int width) { + if (prev == NULL) { + HorizontalUnfilter(NULL, in, out, width); + } else { + uint8_t top = prev[0], top_left = top, left = top; + int i; + for (i = 0; i < width; ++i) { + top = prev[i]; // need to read this first, in case prev==out + left = in[i] + GradientPredictor(left, top, top_left); + top_left = top; + out[i] = left; + } + } } //------------------------------------------------------------------------------ diff --git a/src/3rdparty/libwebp/src/dsp/filters_mips_dsp_r2.c b/src/3rdparty/libwebp/src/dsp/filters_mips_dsp_r2.c index 8134af5..1d82e3c 100644 --- a/src/3rdparty/libwebp/src/dsp/filters_mips_dsp_r2.c +++ b/src/3rdparty/libwebp/src/dsp/filters_mips_dsp_r2.c @@ -33,10 +33,6 @@ assert(row >= 0 && num_rows > 0 && row + num_rows <= height); \ (void)height; // Silence unused warning. 
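The filters.c hunk above replaces the whole-image unfilter entry points with per-row functions taking `(prev, in, out, width)`, where `prev == NULL` marks the first row. A self-contained sketch of two of them, with loop bodies taken from the diff; the clamped `GradientPredictor` body is modeled on libwebp's filters.c and is reproduced here as an assumption:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamped gradient predictor: clip(left + top - top_left) to [0, 255],
 * modeled on the helper in libwebp's filters.c. */
static int GradientPredictor(uint8_t a, uint8_t b, uint8_t c) {
  const int g = a + b - c;
  return ((g & ~0xff) == 0) ? g : (g < 0) ? 0 : 255;
}

/* Per-row unfilters matching the new (prev, in, out, width) signature. */
void HorizontalUnfilter(const uint8_t* prev, const uint8_t* in,
                        uint8_t* out, int width) {
  uint8_t pred = (prev == NULL) ? 0 : prev[0];
  int i;
  for (i = 0; i < width; ++i) {
    out[i] = (uint8_t)(pred + in[i]);
    pred = out[i];
  }
}

void GradientUnfilter(const uint8_t* prev, const uint8_t* in,
                      uint8_t* out, int width) {
  if (prev == NULL) {
    HorizontalUnfilter(NULL, in, out, width);
  } else {
    uint8_t top = prev[0], top_left = top, left = top;
    int i;
    for (i = 0; i < width; ++i) {
      top = prev[i];  /* read before writing out[i]: prev may alias out */
      left = (uint8_t)(in[i] + GradientPredictor(left, top, top_left));
      top_left = top;
      out[i] = left;
    }
  }
}
```

Note the ordering comment carried over from the diff: `prev[i]` must be read before `out[i]` is written, because in-place decoding can make `prev` alias the output buffer.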
-// if INVERSE -// preds == &dst[-1] == &src[-1] -// else -// preds == &src[-1] != &dst[-1] #define DO_PREDICT_LINE(SRC, DST, LENGTH, INVERSE) do { \ const uint8_t* psrc = (uint8_t*)(SRC); \ uint8_t* pdst = (uint8_t*)(DST); \ @@ -45,27 +41,28 @@ __asm__ volatile ( \ ".set push \n\t" \ ".set noreorder \n\t" \ - "srl %[temp0], %[length], 0x2 \n\t" \ + "srl %[temp0], %[length], 2 \n\t" \ "beqz %[temp0], 4f \n\t" \ - " andi %[temp6], %[length], 0x3 \n\t" \ + " andi %[temp6], %[length], 3 \n\t" \ ".if " #INVERSE " \n\t" \ - "lbu %[temp1], -1(%[src]) \n\t" \ "1: \n\t" \ + "lbu %[temp1], -1(%[dst]) \n\t" \ "lbu %[temp2], 0(%[src]) \n\t" \ "lbu %[temp3], 1(%[src]) \n\t" \ "lbu %[temp4], 2(%[src]) \n\t" \ "lbu %[temp5], 3(%[src]) \n\t" \ + "addu %[temp1], %[temp1], %[temp2] \n\t" \ + "addu %[temp2], %[temp1], %[temp3] \n\t" \ + "addu %[temp3], %[temp2], %[temp4] \n\t" \ + "addu %[temp4], %[temp3], %[temp5] \n\t" \ + "sb %[temp1], 0(%[dst]) \n\t" \ + "sb %[temp2], 1(%[dst]) \n\t" \ + "sb %[temp3], 2(%[dst]) \n\t" \ + "sb %[temp4], 3(%[dst]) \n\t" \ "addiu %[src], %[src], 4 \n\t" \ "addiu %[temp0], %[temp0], -1 \n\t" \ - "addu %[temp2], %[temp2], %[temp1] \n\t" \ - "addu %[temp3], %[temp3], %[temp2] \n\t" \ - "addu %[temp4], %[temp4], %[temp3] \n\t" \ - "addu %[temp1], %[temp5], %[temp4] \n\t" \ - "sb %[temp2], -4(%[src]) \n\t" \ - "sb %[temp3], -3(%[src]) \n\t" \ - "sb %[temp4], -2(%[src]) \n\t" \ "bnez %[temp0], 1b \n\t" \ - " sb %[temp1], -1(%[src]) \n\t" \ + " addiu %[dst], %[dst], 4 \n\t" \ ".else \n\t" \ "1: \n\t" \ "ulw %[temp1], -1(%[src]) \n\t" \ @@ -81,16 +78,16 @@ "beqz %[temp6], 3f \n\t" \ " nop \n\t" \ "2: \n\t" \ - "lbu %[temp1], -1(%[src]) \n\t" \ "lbu %[temp2], 0(%[src]) \n\t" \ - "addiu %[src], %[src], 1 \n\t" \ ".if " #INVERSE " \n\t" \ + "lbu %[temp1], -1(%[dst]) \n\t" \ "addu %[temp3], %[temp1], %[temp2] \n\t" \ - "sb %[temp3], -1(%[src]) \n\t" \ ".else \n\t" \ + "lbu %[temp1], -1(%[src]) \n\t" \ "subu %[temp3], %[temp1], %[temp2] \n\t" \ - "sb %[temp3], 
0(%[dst]) \n\t" \ ".endif \n\t" \ + "addiu %[src], %[src], 1 \n\t" \ + "sb %[temp3], 0(%[dst]) \n\t" \ "addiu %[temp6], %[temp6], -1 \n\t" \ "bnez %[temp6], 2b \n\t" \ " addiu %[dst], %[dst], 1 \n\t" \ @@ -105,12 +102,8 @@ } while (0) static WEBP_INLINE void PredictLine(const uint8_t* src, uint8_t* dst, - int length, int inverse) { - if (inverse) { - DO_PREDICT_LINE(src, dst, length, 1); - } else { - DO_PREDICT_LINE(src, dst, length, 0); - } + int length) { + DO_PREDICT_LINE(src, dst, length, 0); } #define DO_PREDICT_LINE_VERTICAL(SRC, PRED, DST, LENGTH, INVERSE) do { \ @@ -172,16 +165,12 @@ static WEBP_INLINE void PredictLine(const uint8_t* src, uint8_t* dst, ); \ } while (0) -#define PREDICT_LINE_ONE_PASS(SRC, PRED, DST, INVERSE) do { \ +#define PREDICT_LINE_ONE_PASS(SRC, PRED, DST) do { \ int temp1, temp2, temp3; \ __asm__ volatile ( \ "lbu %[temp1], 0(%[src]) \n\t" \ "lbu %[temp2], 0(%[pred]) \n\t" \ - ".if " #INVERSE " \n\t" \ - "addu %[temp3], %[temp1], %[temp2] \n\t" \ - ".else \n\t" \ "subu %[temp3], %[temp1], %[temp2] \n\t" \ - ".endif \n\t" \ "sb %[temp3], 0(%[dst]) \n\t" \ : [temp1]"=&r"(temp1), [temp2]"=&r"(temp2), [temp3]"=&r"(temp3) \ : [pred]"r"((PRED)), [dst]"r"((DST)), [src]"r"((SRC)) \ @@ -192,10 +181,10 @@ static WEBP_INLINE void PredictLine(const uint8_t* src, uint8_t* dst, //------------------------------------------------------------------------------ // Horizontal filter. 
-#define FILTER_LINE_BY_LINE(INVERSE) do { \ +#define FILTER_LINE_BY_LINE do { \ while (row < last_row) { \ - PREDICT_LINE_ONE_PASS(in, preds - stride, out, INVERSE); \ - DO_PREDICT_LINE(in + 1, out + 1, width - 1, INVERSE); \ + PREDICT_LINE_ONE_PASS(in, preds - stride, out); \ + DO_PREDICT_LINE(in + 1, out + 1, width - 1, 0); \ ++row; \ preds += stride; \ in += stride; \ @@ -206,19 +195,19 @@ static WEBP_INLINE void PredictLine(const uint8_t* src, uint8_t* dst, static WEBP_INLINE void DoHorizontalFilter(const uint8_t* in, int width, int height, int stride, int row, int num_rows, - int inverse, uint8_t* out) { + uint8_t* out) { const uint8_t* preds; const size_t start_offset = row * stride; const int last_row = row + num_rows; SANITY_CHECK(in, out); in += start_offset; out += start_offset; - preds = inverse ? out : in; + preds = in; if (row == 0) { // Leftmost pixel is the same as input for topmost scanline. out[0] = in[0]; - PredictLine(in + 1, out + 1, width - 1, inverse); + PredictLine(in + 1, out + 1, width - 1); row = 1; preds += stride; in += stride; @@ -226,31 +215,21 @@ static WEBP_INLINE void DoHorizontalFilter(const uint8_t* in, } // Filter line-by-line. - if (inverse) { - FILTER_LINE_BY_LINE(1); - } else { - FILTER_LINE_BY_LINE(0); - } + FILTER_LINE_BY_LINE; } - #undef FILTER_LINE_BY_LINE static void HorizontalFilter(const uint8_t* data, int width, int height, int stride, uint8_t* filtered_data) { - DoHorizontalFilter(data, width, height, stride, 0, height, 0, filtered_data); -} - -static void HorizontalUnfilter(int width, int height, int stride, int row, - int num_rows, uint8_t* data) { - DoHorizontalFilter(data, width, height, stride, row, num_rows, 1, data); + DoHorizontalFilter(data, width, height, stride, 0, height, filtered_data); } //------------------------------------------------------------------------------ // Vertical filter. 
-#define FILTER_LINE_BY_LINE(INVERSE) do { \ +#define FILTER_LINE_BY_LINE do { \ while (row < last_row) { \ - DO_PREDICT_LINE_VERTICAL(in, preds, out, width, INVERSE); \ + DO_PREDICT_LINE_VERTICAL(in, preds, out, width, 0); \ ++row; \ preds += stride; \ in += stride; \ @@ -260,21 +239,20 @@ static void HorizontalUnfilter(int width, int height, int stride, int row, static WEBP_INLINE void DoVerticalFilter(const uint8_t* in, int width, int height, int stride, - int row, int num_rows, - int inverse, uint8_t* out) { + int row, int num_rows, uint8_t* out) { const uint8_t* preds; const size_t start_offset = row * stride; const int last_row = row + num_rows; SANITY_CHECK(in, out); in += start_offset; out += start_offset; - preds = inverse ? out : in; + preds = in; if (row == 0) { // Very first top-left pixel is copied. out[0] = in[0]; // Rest of top scan-line is left-predicted. - PredictLine(in + 1, out + 1, width - 1, inverse); + PredictLine(in + 1, out + 1, width - 1); row = 1; in += stride; out += stride; @@ -284,24 +262,13 @@ static WEBP_INLINE void DoVerticalFilter(const uint8_t* in, } // Filter line-by-line. 
- if (inverse) { - FILTER_LINE_BY_LINE(1); - } else { - FILTER_LINE_BY_LINE(0); - } + FILTER_LINE_BY_LINE; } - #undef FILTER_LINE_BY_LINE -#undef DO_PREDICT_LINE_VERTICAL static void VerticalFilter(const uint8_t* data, int width, int height, int stride, uint8_t* filtered_data) { - DoVerticalFilter(data, width, height, stride, 0, height, 0, filtered_data); -} - -static void VerticalUnfilter(int width, int height, int stride, int row, - int num_rows, uint8_t* data) { - DoVerticalFilter(data, width, height, stride, row, num_rows, 1, data); + DoVerticalFilter(data, width, height, stride, 0, height, filtered_data); } //------------------------------------------------------------------------------ @@ -321,10 +288,10 @@ static WEBP_INLINE int GradientPredictor(uint8_t a, uint8_t b, uint8_t c) { return temp0; } -#define FILTER_LINE_BY_LINE(INVERSE, PREDS, OPERATION) do { \ +#define FILTER_LINE_BY_LINE(PREDS, OPERATION) do { \ while (row < last_row) { \ int w; \ - PREDICT_LINE_ONE_PASS(in, PREDS - stride, out, INVERSE); \ + PREDICT_LINE_ONE_PASS(in, PREDS - stride, out); \ for (w = 1; w < width; ++w) { \ const int pred = GradientPredictor(PREDS[w - 1], \ PREDS[w - stride], \ @@ -339,20 +306,19 @@ static WEBP_INLINE int GradientPredictor(uint8_t a, uint8_t b, uint8_t c) { static WEBP_INLINE void DoGradientFilter(const uint8_t* in, int width, int height, int stride, - int row, int num_rows, - int inverse, uint8_t* out) { + int row, int num_rows, uint8_t* out) { const uint8_t* preds; const size_t start_offset = row * stride; const int last_row = row + num_rows; SANITY_CHECK(in, out); in += start_offset; out += start_offset; - preds = inverse ? out : in; + preds = in; // left prediction for top scan-line if (row == 0) { out[0] = in[0]; - PredictLine(in + 1, out + 1, width - 1, inverse); + PredictLine(in + 1, out + 1, width - 1); row = 1; preds += stride; in += stride; @@ -360,25 +326,49 @@ static WEBP_INLINE void DoGradientFilter(const uint8_t* in, } // Filter line-by-line. 
- if (inverse) { - FILTER_LINE_BY_LINE(1, out, +); - } else { - FILTER_LINE_BY_LINE(0, in, -); - } + FILTER_LINE_BY_LINE(in, -); } - #undef FILTER_LINE_BY_LINE static void GradientFilter(const uint8_t* data, int width, int height, int stride, uint8_t* filtered_data) { - DoGradientFilter(data, width, height, stride, 0, height, 0, filtered_data); + DoGradientFilter(data, width, height, stride, 0, height, filtered_data); } -static void GradientUnfilter(int width, int height, int stride, int row, - int num_rows, uint8_t* data) { - DoGradientFilter(data, width, height, stride, row, num_rows, 1, data); +//------------------------------------------------------------------------------ + +static void HorizontalUnfilter(const uint8_t* prev, const uint8_t* in, + uint8_t* out, int width) { + out[0] = in[0] + (prev == NULL ? 0 : prev[0]); + DO_PREDICT_LINE(in + 1, out + 1, width - 1, 1); } +static void VerticalUnfilter(const uint8_t* prev, const uint8_t* in, + uint8_t* out, int width) { + if (prev == NULL) { + HorizontalUnfilter(NULL, in, out, width); + } else { + DO_PREDICT_LINE_VERTICAL(in, prev, out, width, 1); + } +} + +static void GradientUnfilter(const uint8_t* prev, const uint8_t* in, + uint8_t* out, int width) { + if (prev == NULL) { + HorizontalUnfilter(NULL, in, out, width); + } else { + uint8_t top = prev[0], top_left = top, left = top; + int i; + for (i = 0; i < width; ++i) { + top = prev[i]; // need to read this first, in case prev==dst + left = in[i] + GradientPredictor(left, top, top_left); + top_left = top; + out[i] = left; + } + } +} + +#undef DO_PREDICT_LINE_VERTICAL #undef PREDICT_LINE_ONE_PASS #undef DO_PREDICT_LINE #undef SANITY_CHECK @@ -389,13 +379,13 @@ static void GradientUnfilter(int width, int height, int stride, int row, extern void VP8FiltersInitMIPSdspR2(void); WEBP_TSAN_IGNORE_FUNCTION void VP8FiltersInitMIPSdspR2(void) { - WebPFilters[WEBP_FILTER_HORIZONTAL] = HorizontalFilter; - WebPFilters[WEBP_FILTER_VERTICAL] = VerticalFilter; - 
WebPFilters[WEBP_FILTER_GRADIENT] = GradientFilter; - WebPUnfilters[WEBP_FILTER_HORIZONTAL] = HorizontalUnfilter; WebPUnfilters[WEBP_FILTER_VERTICAL] = VerticalUnfilter; WebPUnfilters[WEBP_FILTER_GRADIENT] = GradientUnfilter; + + WebPFilters[WEBP_FILTER_HORIZONTAL] = HorizontalFilter; + WebPFilters[WEBP_FILTER_VERTICAL] = VerticalFilter; + WebPFilters[WEBP_FILTER_GRADIENT] = GradientFilter; } #else // !WEBP_USE_MIPS_DSP_R2 diff --git a/src/3rdparty/libwebp/src/dsp/filters_sse2.c b/src/3rdparty/libwebp/src/dsp/filters_sse2.c index bf93342..67f7799 100644 --- a/src/3rdparty/libwebp/src/dsp/filters_sse2.c +++ b/src/3rdparty/libwebp/src/dsp/filters_sse2.c @@ -33,82 +33,39 @@ (void)height; // Silence unused warning. static void PredictLineTop(const uint8_t* src, const uint8_t* pred, - uint8_t* dst, int length, int inverse) { + uint8_t* dst, int length) { int i; const int max_pos = length & ~31; assert(length >= 0); - if (inverse) { - for (i = 0; i < max_pos; i += 32) { - const __m128i A0 = _mm_loadu_si128((const __m128i*)&src[i + 0]); - const __m128i A1 = _mm_loadu_si128((const __m128i*)&src[i + 16]); - const __m128i B0 = _mm_loadu_si128((const __m128i*)&pred[i + 0]); - const __m128i B1 = _mm_loadu_si128((const __m128i*)&pred[i + 16]); - const __m128i C0 = _mm_add_epi8(A0, B0); - const __m128i C1 = _mm_add_epi8(A1, B1); - _mm_storeu_si128((__m128i*)&dst[i + 0], C0); - _mm_storeu_si128((__m128i*)&dst[i + 16], C1); - } - for (; i < length; ++i) dst[i] = src[i] + pred[i]; - } else { - for (i = 0; i < max_pos; i += 32) { - const __m128i A0 = _mm_loadu_si128((const __m128i*)&src[i + 0]); - const __m128i A1 = _mm_loadu_si128((const __m128i*)&src[i + 16]); - const __m128i B0 = _mm_loadu_si128((const __m128i*)&pred[i + 0]); - const __m128i B1 = _mm_loadu_si128((const __m128i*)&pred[i + 16]); - const __m128i C0 = _mm_sub_epi8(A0, B0); - const __m128i C1 = _mm_sub_epi8(A1, B1); - _mm_storeu_si128((__m128i*)&dst[i + 0], C0); - _mm_storeu_si128((__m128i*)&dst[i + 16], C1); - } - 
for (; i < length; ++i) dst[i] = src[i] - pred[i]; + for (i = 0; i < max_pos; i += 32) { + const __m128i A0 = _mm_loadu_si128((const __m128i*)&src[i + 0]); + const __m128i A1 = _mm_loadu_si128((const __m128i*)&src[i + 16]); + const __m128i B0 = _mm_loadu_si128((const __m128i*)&pred[i + 0]); + const __m128i B1 = _mm_loadu_si128((const __m128i*)&pred[i + 16]); + const __m128i C0 = _mm_sub_epi8(A0, B0); + const __m128i C1 = _mm_sub_epi8(A1, B1); + _mm_storeu_si128((__m128i*)&dst[i + 0], C0); + _mm_storeu_si128((__m128i*)&dst[i + 16], C1); } + for (; i < length; ++i) dst[i] = src[i] - pred[i]; } // Special case for left-based prediction (when preds==dst-1 or preds==src-1). -static void PredictLineLeft(const uint8_t* src, uint8_t* dst, int length, - int inverse) { +static void PredictLineLeft(const uint8_t* src, uint8_t* dst, int length) { int i; - if (length <= 0) return; - if (inverse) { - const int max_pos = length & ~7; - __m128i last = _mm_set_epi32(0, 0, 0, dst[-1]); - for (i = 0; i < max_pos; i += 8) { - const __m128i A0 = _mm_loadl_epi64((const __m128i*)(src + i)); - const __m128i A1 = _mm_add_epi8(A0, last); - const __m128i A2 = _mm_slli_si128(A1, 1); - const __m128i A3 = _mm_add_epi8(A1, A2); - const __m128i A4 = _mm_slli_si128(A3, 2); - const __m128i A5 = _mm_add_epi8(A3, A4); - const __m128i A6 = _mm_slli_si128(A5, 4); - const __m128i A7 = _mm_add_epi8(A5, A6); - _mm_storel_epi64((__m128i*)(dst + i), A7); - last = _mm_srli_epi64(A7, 56); - } - for (; i < length; ++i) dst[i] = src[i] + dst[i - 1]; - } else { - const int max_pos = length & ~31; - for (i = 0; i < max_pos; i += 32) { - const __m128i A0 = _mm_loadu_si128((const __m128i*)(src + i + 0 )); - const __m128i B0 = _mm_loadu_si128((const __m128i*)(src + i + 0 - 1)); - const __m128i A1 = _mm_loadu_si128((const __m128i*)(src + i + 16 )); - const __m128i B1 = _mm_loadu_si128((const __m128i*)(src + i + 16 - 1)); - const __m128i C0 = _mm_sub_epi8(A0, B0); - const __m128i C1 = _mm_sub_epi8(A1, B1); - 
_mm_storeu_si128((__m128i*)(dst + i + 0), C0); - _mm_storeu_si128((__m128i*)(dst + i + 16), C1); - } - for (; i < length; ++i) dst[i] = src[i] - src[i - 1]; - } -} - -static void PredictLineC(const uint8_t* src, const uint8_t* pred, - uint8_t* dst, int length, int inverse) { - int i; - if (inverse) { - for (i = 0; i < length; ++i) dst[i] = src[i] + pred[i]; - } else { - for (i = 0; i < length; ++i) dst[i] = src[i] - pred[i]; + const int max_pos = length & ~31; + assert(length >= 0); + for (i = 0; i < max_pos; i += 32) { + const __m128i A0 = _mm_loadu_si128((const __m128i*)(src + i + 0 )); + const __m128i B0 = _mm_loadu_si128((const __m128i*)(src + i + 0 - 1)); + const __m128i A1 = _mm_loadu_si128((const __m128i*)(src + i + 16 )); + const __m128i B1 = _mm_loadu_si128((const __m128i*)(src + i + 16 - 1)); + const __m128i C0 = _mm_sub_epi8(A0, B0); + const __m128i C1 = _mm_sub_epi8(A1, B1); + _mm_storeu_si128((__m128i*)(dst + i + 0), C0); + _mm_storeu_si128((__m128i*)(dst + i + 16), C1); } + for (; i < length; ++i) dst[i] = src[i] - src[i - 1]; } //------------------------------------------------------------------------------ @@ -117,21 +74,18 @@ static void PredictLineC(const uint8_t* src, const uint8_t* pred, static WEBP_INLINE void DoHorizontalFilter(const uint8_t* in, int width, int height, int stride, int row, int num_rows, - int inverse, uint8_t* out) { - const uint8_t* preds; + uint8_t* out) { const size_t start_offset = row * stride; const int last_row = row + num_rows; SANITY_CHECK(in, out); in += start_offset; out += start_offset; - preds = inverse ? out : in; if (row == 0) { // Leftmost pixel is the same as input for topmost scanline. out[0] = in[0]; - PredictLineLeft(in + 1, out + 1, width - 1, inverse); + PredictLineLeft(in + 1, out + 1, width - 1); row = 1; - preds += stride; in += stride; out += stride; } @@ -139,10 +93,9 @@ static WEBP_INLINE void DoHorizontalFilter(const uint8_t* in, // Filter line-by-line. 
while (row < last_row) { // Leftmost pixel is predicted from above. - PredictLineC(in, preds - stride, out, 1, inverse); - PredictLineLeft(in + 1, out + 1, width - 1, inverse); + out[0] = in[0] - in[-stride]; + PredictLineLeft(in + 1, out + 1, width - 1); ++row; - preds += stride; in += stride; out += stride; } @@ -153,34 +106,27 @@ static WEBP_INLINE void DoHorizontalFilter(const uint8_t* in, static WEBP_INLINE void DoVerticalFilter(const uint8_t* in, int width, int height, int stride, - int row, int num_rows, - int inverse, uint8_t* out) { - const uint8_t* preds; + int row, int num_rows, uint8_t* out) { const size_t start_offset = row * stride; const int last_row = row + num_rows; SANITY_CHECK(in, out); in += start_offset; out += start_offset; - preds = inverse ? out : in; if (row == 0) { // Very first top-left pixel is copied. out[0] = in[0]; // Rest of top scan-line is left-predicted. - PredictLineLeft(in + 1, out + 1, width - 1, inverse); + PredictLineLeft(in + 1, out + 1, width - 1); row = 1; in += stride; out += stride; - } else { - // We are starting from in-between. Make sure 'preds' points to prev row. - preds -= stride; } // Filter line-by-line. 
while (row < last_row) { - PredictLineTop(in, preds, out, width, inverse); + PredictLineTop(in, in - stride, out, width); ++row; - preds += stride; in += stride; out += stride; } @@ -219,49 +165,10 @@ static void GradientPredictDirect(const uint8_t* const row, } } -static void GradientPredictInverse(const uint8_t* const in, - const uint8_t* const top, - uint8_t* const row, int length) { - if (length > 0) { - int i; - const int max_pos = length & ~7; - const __m128i zero = _mm_setzero_si128(); - __m128i A = _mm_set_epi32(0, 0, 0, row[-1]); // left sample - for (i = 0; i < max_pos; i += 8) { - const __m128i tmp0 = _mm_loadl_epi64((const __m128i*)&top[i]); - const __m128i tmp1 = _mm_loadl_epi64((const __m128i*)&top[i - 1]); - const __m128i B = _mm_unpacklo_epi8(tmp0, zero); - const __m128i C = _mm_unpacklo_epi8(tmp1, zero); - const __m128i tmp2 = _mm_loadl_epi64((const __m128i*)&in[i]); - const __m128i D = _mm_unpacklo_epi8(tmp2, zero); // base input - const __m128i E = _mm_sub_epi16(B, C); // unclipped gradient basis B - C - __m128i out = zero; // accumulator for output - __m128i mask_hi = _mm_set_epi32(0, 0, 0, 0xff); - int k = 8; - while (1) { - const __m128i tmp3 = _mm_add_epi16(A, E); // delta = A + B - C - const __m128i tmp4 = _mm_min_epi16(tmp3, mask_hi); - const __m128i tmp5 = _mm_max_epi16(tmp4, zero); // clipped delta - const __m128i tmp6 = _mm_add_epi16(tmp5, D); // add to in[] values - A = _mm_and_si128(tmp6, mask_hi); // 1-complement clip - out = _mm_or_si128(out, A); // accumulate output - if (--k == 0) break; - A = _mm_slli_si128(A, 2); // rotate left sample - mask_hi = _mm_slli_si128(mask_hi, 2); // rotate mask - } - A = _mm_srli_si128(A, 14); // prepare left sample for next iteration - _mm_storel_epi64((__m128i*)&row[i], _mm_packus_epi16(out, zero)); - } - for (; i < length; ++i) { - row[i] = in[i] + GradientPredictorC(row[i - 1], top[i], top[i - 1]); - } - } -} - static WEBP_INLINE void DoGradientFilter(const uint8_t* in, int width, int height, int 
stride, int row, int num_rows, - int inverse, uint8_t* out) { + uint8_t* out) { const size_t start_offset = row * stride; const int last_row = row + num_rows; SANITY_CHECK(in, out); @@ -271,7 +178,7 @@ static WEBP_INLINE void DoGradientFilter(const uint8_t* in, // left prediction for top scan-line if (row == 0) { out[0] = in[0]; - PredictLineLeft(in + 1, out + 1, width - 1, inverse); + PredictLineLeft(in + 1, out + 1, width - 1); row = 1; in += stride; out += stride; @@ -279,13 +186,8 @@ static WEBP_INLINE void DoGradientFilter(const uint8_t* in, // Filter line-by-line. while (row < last_row) { - if (inverse) { - PredictLineC(in, out - stride, out, 1, inverse); // predict from above - GradientPredictInverse(in + 1, out + 1 - stride, out + 1, width - 1); - } else { - PredictLineC(in, in - stride, out, 1, inverse); - GradientPredictDirect(in + 1, in + 1 - stride, out + 1, width - 1); - } + out[0] = in[0] - in[-stride]; + GradientPredictDirect(in + 1, in + 1 - stride, out + 1, width - 1); ++row; in += stride; out += stride; @@ -298,36 +200,112 @@ static WEBP_INLINE void DoGradientFilter(const uint8_t* in, static void HorizontalFilter(const uint8_t* data, int width, int height, int stride, uint8_t* filtered_data) { - DoHorizontalFilter(data, width, height, stride, 0, height, 0, filtered_data); + DoHorizontalFilter(data, width, height, stride, 0, height, filtered_data); } static void VerticalFilter(const uint8_t* data, int width, int height, int stride, uint8_t* filtered_data) { - DoVerticalFilter(data, width, height, stride, 0, height, 0, filtered_data); + DoVerticalFilter(data, width, height, stride, 0, height, filtered_data); } - static void GradientFilter(const uint8_t* data, int width, int height, int stride, uint8_t* filtered_data) { - DoGradientFilter(data, width, height, stride, 0, height, 0, filtered_data); + DoGradientFilter(data, width, height, stride, 0, height, filtered_data); } - 
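The hunks above drop the `inverse` flag from the forward filters; the commit reintroduces inversion as standalone per-row routines with a `(prev, in, out, width)` signature, as in the plain-C `GradientUnfilter` shown earlier. A minimal scalar sketch of that contract for the gradient mode follows; `GradientFilterRow`/`GradientUnfilterRow`/`GradientRoundTripOK` are hypothetical helper names for illustration, not part of the patch, but the predictor and the unfilter loop mirror the patch's own logic:

```c
#include <stdint.h>
#include <string.h>

/* Clipped gradient predictor, as in GradientPredictor() in the patch:
 * clamp(left + top - top_left) to [0, 255]. */
static int GradientPredictor(uint8_t a, uint8_t b, uint8_t c) {
  const int g = a + b - c;
  return ((g & ~0xff) == 0) ? g : (g < 0) ? 0 : 255;
}

/* Forward filter for one interior row: residuals mod 256
 * (hypothetical helper mirroring DoGradientFilter's inner loop). */
static void GradientFilterRow(const uint8_t* prev, const uint8_t* in,
                              uint8_t* out, int width) {
  int i;
  out[0] = (uint8_t)(in[0] - prev[0]);  /* leftmost pixel: predict from above */
  for (i = 1; i < width; ++i) {
    const int pred = GradientPredictor(in[i - 1], prev[i], prev[i - 1]);
    out[i] = (uint8_t)(in[i] - pred);
  }
}

/* Inverse, following the shape of the new GradientUnfilter(prev, in, out, w):
 * 'top' is read before writing, in case prev aliases the output row. */
static void GradientUnfilterRow(const uint8_t* prev, const uint8_t* in,
                                uint8_t* out, int width) {
  int i;
  uint8_t top = prev[0], top_left = top, left = top;
  for (i = 0; i < width; ++i) {
    top = prev[i];
    left = (uint8_t)(in[i] + GradientPredictor(left, top, top_left));
    top_left = top;
    out[i] = left;
  }
}

/* Round trip on one row: unfilter(filter(row)) == row. */
static int GradientRoundTripOK(void) {
  const uint8_t prev[5] = {10, 200, 30, 255, 0};
  const uint8_t row[5]  = {90, 7, 250, 1, 128};
  uint8_t res[5], rec[5];
  GradientFilterRow(prev, row, res, 5);
  GradientUnfilterRow(prev, res, rec, 5);
  return memcmp(rec, row, sizeof(row)) == 0;
}
```

The round trip holds because the unfilter's running `left` is exactly the reconstructed `in[i - 1]` the forward pass predicted from, so each prediction is recomputed with identical arguments and the mod-256 subtraction cancels.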
//------------------------------------------------------------------------------ +// Inverse transforms -static void VerticalUnfilter(int width, int height, int stride, int row, - int num_rows, uint8_t* data) { - DoVerticalFilter(data, width, height, stride, row, num_rows, 1, data); +static void HorizontalUnfilter(const uint8_t* prev, const uint8_t* in, + uint8_t* out, int width) { + int i; + __m128i last; + out[0] = in[0] + (prev == NULL ? 0 : prev[0]); + if (width <= 1) return; + last = _mm_set_epi32(0, 0, 0, out[0]); + for (i = 1; i + 8 <= width; i += 8) { + const __m128i A0 = _mm_loadl_epi64((const __m128i*)(in + i)); + const __m128i A1 = _mm_add_epi8(A0, last); + const __m128i A2 = _mm_slli_si128(A1, 1); + const __m128i A3 = _mm_add_epi8(A1, A2); + const __m128i A4 = _mm_slli_si128(A3, 2); + const __m128i A5 = _mm_add_epi8(A3, A4); + const __m128i A6 = _mm_slli_si128(A5, 4); + const __m128i A7 = _mm_add_epi8(A5, A6); + _mm_storel_epi64((__m128i*)(out + i), A7); + last = _mm_srli_epi64(A7, 56); + } + for (; i < width; ++i) out[i] = in[i] + out[i - 1]; } -static void HorizontalUnfilter(int width, int height, int stride, int row, - int num_rows, uint8_t* data) { - DoHorizontalFilter(data, width, height, stride, row, num_rows, 1, data); +static void VerticalUnfilter(const uint8_t* prev, const uint8_t* in, + uint8_t* out, int width) { + if (prev == NULL) { + HorizontalUnfilter(NULL, in, out, width); + } else { + int i; + const int max_pos = width & ~31; + assert(width >= 0); + for (i = 0; i < max_pos; i += 32) { + const __m128i A0 = _mm_loadu_si128((const __m128i*)&in[i + 0]); + const __m128i A1 = _mm_loadu_si128((const __m128i*)&in[i + 16]); + const __m128i B0 = _mm_loadu_si128((const __m128i*)&prev[i + 0]); + const __m128i B1 = _mm_loadu_si128((const __m128i*)&prev[i + 16]); + const __m128i C0 = _mm_add_epi8(A0, B0); + const __m128i C1 = _mm_add_epi8(A1, B1); + _mm_storeu_si128((__m128i*)&out[i + 0], C0); + _mm_storeu_si128((__m128i*)&out[i + 16], C1); + } + for 
(; i < width; ++i) out[i] = in[i] + prev[i]; + } } -static void GradientUnfilter(int width, int height, int stride, int row, - int num_rows, uint8_t* data) { - DoGradientFilter(data, width, height, stride, row, num_rows, 1, data); +static void GradientPredictInverse(const uint8_t* const in, + const uint8_t* const top, + uint8_t* const row, int length) { + if (length > 0) { + int i; + const int max_pos = length & ~7; + const __m128i zero = _mm_setzero_si128(); + __m128i A = _mm_set_epi32(0, 0, 0, row[-1]); // left sample + for (i = 0; i < max_pos; i += 8) { + const __m128i tmp0 = _mm_loadl_epi64((const __m128i*)&top[i]); + const __m128i tmp1 = _mm_loadl_epi64((const __m128i*)&top[i - 1]); + const __m128i B = _mm_unpacklo_epi8(tmp0, zero); + const __m128i C = _mm_unpacklo_epi8(tmp1, zero); + const __m128i D = _mm_loadl_epi64((const __m128i*)&in[i]); // base input + const __m128i E = _mm_sub_epi16(B, C); // unclipped gradient basis B - C + __m128i out = zero; // accumulator for output + __m128i mask_hi = _mm_set_epi32(0, 0, 0, 0xff); + int k = 8; + while (1) { + const __m128i tmp3 = _mm_add_epi16(A, E); // delta = A + B - C + const __m128i tmp4 = _mm_packus_epi16(tmp3, zero); // saturate delta + const __m128i tmp5 = _mm_add_epi8(tmp4, D); // add to in[] + A = _mm_and_si128(tmp5, mask_hi); // 1-complement clip + out = _mm_or_si128(out, A); // accumulate output + if (--k == 0) break; + A = _mm_slli_si128(A, 1); // rotate left sample + mask_hi = _mm_slli_si128(mask_hi, 1); // rotate mask + A = _mm_unpacklo_epi8(A, zero); // convert 8b->16b + } + A = _mm_srli_si128(A, 7); // prepare left sample for next iteration + _mm_storel_epi64((__m128i*)&row[i], out); + } + for (; i < length; ++i) { + row[i] = in[i] + GradientPredictorC(row[i - 1], top[i], top[i - 1]); + } + } +} + +static void GradientUnfilter(const uint8_t* prev, const uint8_t* in, + uint8_t* out, int width) { + if (prev == NULL) { + HorizontalUnfilter(NULL, in, out, width); + } else { + out[0] = in[0] + prev[0]; 
// predict from above + GradientPredictInverse(in + 1, prev + 1, out + 1, width - 1); + } } //------------------------------------------------------------------------------ diff --git a/src/3rdparty/libwebp/src/dsp/lossless.c b/src/3rdparty/libwebp/src/dsp/lossless.c index 71ae9d4..af913ef 100644 --- a/src/3rdparty/libwebp/src/dsp/lossless.c +++ b/src/3rdparty/libwebp/src/dsp/lossless.c @@ -28,9 +28,7 @@ // In-place sum of each component with mod 256. static WEBP_INLINE void AddPixelsEq(uint32_t* a, uint32_t b) { - const uint32_t alpha_and_green = (*a & 0xff00ff00u) + (b & 0xff00ff00u); - const uint32_t red_and_blue = (*a & 0x00ff00ffu) + (b & 0x00ff00ffu); - *a = (alpha_and_green & 0xff00ff00u) | (red_and_blue & 0x00ff00ffu); + *a = VP8LAddPixels(*a, b); } static WEBP_INLINE uint32_t Average2(uint32_t a0, uint32_t a1) { diff --git a/src/3rdparty/libwebp/src/dsp/lossless.h b/src/3rdparty/libwebp/src/dsp/lossless.h index e063bdd..9f0d7a2 100644 --- a/src/3rdparty/libwebp/src/dsp/lossless.h +++ b/src/3rdparty/libwebp/src/dsp/lossless.h @@ -158,7 +158,8 @@ void VP8LCollectColorBlueTransforms_C(const uint32_t* argb, int stride, void VP8LResidualImage(int width, int height, int bits, int low_effort, uint32_t* const argb, uint32_t* const argb_scratch, - uint32_t* const image, int exact); + uint32_t* const image, int near_lossless, int exact, + int used_subtract_green); void VP8LColorSpaceTransform(int width, int height, int bits, int quality, uint32_t* const argb, uint32_t* image); @@ -172,6 +173,17 @@ static WEBP_INLINE uint32_t VP8LSubSampleSize(uint32_t size, return (size + (1 << sampling_bits) - 1) >> sampling_bits; } +// Converts near lossless quality into max number of bits shaved off. 
+static WEBP_INLINE int VP8LNearLosslessBits(int near_lossless_quality) {
+  //    100 -> 0
+  // 80..99 -> 1
+  // 60..79 -> 2
+  // 40..59 -> 3
+  // 20..39 -> 4
+  //  0..19 -> 5
+  return 5 - near_lossless_quality / 20;
+}
+
 // -----------------------------------------------------------------------------
 // Faster logarithm for integers. Small values use a look-up table.
@@ -262,6 +274,11 @@ extern VP8LHistogramAddFunc VP8LHistogramAdd;
 // -----------------------------------------------------------------------------
 // PrefixEncode()
+typedef int (*VP8LVectorMismatchFunc)(const uint32_t* const array1,
+                                      const uint32_t* const array2, int length);
+// Returns the first index where array1 and array2 are different.
+extern VP8LVectorMismatchFunc VP8LVectorMismatch;
+
 static WEBP_INLINE int VP8LBitsLog2Ceiling(uint32_t n) {
   const int log_floor = BitsLog2Floor(n);
   if (n == (n & ~(n - 1)))  // zero or a power of two.
@@ -324,7 +341,14 @@ static WEBP_INLINE void VP8LPrefixEncode(int distance, int* const code,
   }
 }
-// In-place difference of each component with mod 256.
+// Sum of each component, mod 256.
+static WEBP_INLINE uint32_t VP8LAddPixels(uint32_t a, uint32_t b) {
+  const uint32_t alpha_and_green = (a & 0xff00ff00u) + (b & 0xff00ff00u);
+  const uint32_t red_and_blue = (a & 0x00ff00ffu) + (b & 0x00ff00ffu);
+  return (alpha_and_green & 0xff00ff00u) | (red_and_blue & 0x00ff00ffu);
+}
+
+// Difference of each component, mod 256.
 static WEBP_INLINE uint32_t VP8LSubPixels(uint32_t a, uint32_t b) {
   const uint32_t alpha_and_green =
       0x00ff00ffu + (a & 0xff00ff00u) - (b & 0xff00ff00u);
diff --git a/src/3rdparty/libwebp/src/dsp/lossless_enc.c b/src/3rdparty/libwebp/src/dsp/lossless_enc.c
index 2eafa3d..256f6f5 100644
--- a/src/3rdparty/libwebp/src/dsp/lossless_enc.c
+++ b/src/3rdparty/libwebp/src/dsp/lossless_enc.c
@@ -382,6 +382,7 @@ static float FastLog2Slow(uint32_t v) {
 // Mostly used to reduce code size + readability
 static WEBP_INLINE int GetMin(int a, int b) { return (a > b) ?
b : a; } +static WEBP_INLINE int GetMax(int a, int b) { return (a < b) ? b : a; } //------------------------------------------------------------------------------ // Methods to calculate Entropy (Shannon). @@ -551,18 +552,204 @@ static WEBP_INLINE uint32_t Predict(VP8LPredictorFunc pred_func, } } +static int MaxDiffBetweenPixels(uint32_t p1, uint32_t p2) { + const int diff_a = abs((int)(p1 >> 24) - (int)(p2 >> 24)); + const int diff_r = abs((int)((p1 >> 16) & 0xff) - (int)((p2 >> 16) & 0xff)); + const int diff_g = abs((int)((p1 >> 8) & 0xff) - (int)((p2 >> 8) & 0xff)); + const int diff_b = abs((int)(p1 & 0xff) - (int)(p2 & 0xff)); + return GetMax(GetMax(diff_a, diff_r), GetMax(diff_g, diff_b)); +} + +static int MaxDiffAroundPixel(uint32_t current, uint32_t up, uint32_t down, + uint32_t left, uint32_t right) { + const int diff_up = MaxDiffBetweenPixels(current, up); + const int diff_down = MaxDiffBetweenPixels(current, down); + const int diff_left = MaxDiffBetweenPixels(current, left); + const int diff_right = MaxDiffBetweenPixels(current, right); + return GetMax(GetMax(diff_up, diff_down), GetMax(diff_left, diff_right)); +} + +static uint32_t AddGreenToBlueAndRed(uint32_t argb) { + const uint32_t green = (argb >> 8) & 0xff; + uint32_t red_blue = argb & 0x00ff00ffu; + red_blue += (green << 16) | green; + red_blue &= 0x00ff00ffu; + return (argb & 0xff00ff00u) | red_blue; +} + +static void MaxDiffsForRow(int width, int stride, const uint32_t* const argb, + uint8_t* const max_diffs, int used_subtract_green) { + uint32_t current, up, down, left, right; + int x; + if (width <= 2) return; + current = argb[0]; + right = argb[1]; + if (used_subtract_green) { + current = AddGreenToBlueAndRed(current); + right = AddGreenToBlueAndRed(right); + } + // max_diffs[0] and max_diffs[width - 1] are never used. 
+ for (x = 1; x < width - 1; ++x) { + up = argb[-stride + x]; + down = argb[stride + x]; + left = current; + current = right; + right = argb[x + 1]; + if (used_subtract_green) { + up = AddGreenToBlueAndRed(up); + down = AddGreenToBlueAndRed(down); + right = AddGreenToBlueAndRed(right); + } + max_diffs[x] = MaxDiffAroundPixel(current, up, down, left, right); + } +} + +// Quantize the difference between the actual component value and its prediction +// to a multiple of quantization, working modulo 256, taking care not to cross +// a boundary (inclusive upper limit). +static uint8_t NearLosslessComponent(uint8_t value, uint8_t predict, + uint8_t boundary, int quantization) { + const int residual = (value - predict) & 0xff; + const int boundary_residual = (boundary - predict) & 0xff; + const int lower = residual & ~(quantization - 1); + const int upper = lower + quantization; + // Resolve ties towards a value closer to the prediction (i.e. towards lower + // if value comes after prediction and towards upper otherwise). + const int bias = ((boundary - value) & 0xff) < boundary_residual; + if (residual - lower < upper - residual + bias) { + // lower is closer to residual than upper. + if (residual > boundary_residual && lower <= boundary_residual) { + // Halve quantization step to avoid crossing boundary. This midpoint is + // on the same side of boundary as residual because midpoint >= residual + // (since lower is closer than upper) and residual is above the boundary. + return lower + (quantization >> 1); + } + return lower; + } else { + // upper is closer to residual than lower. + if (residual <= boundary_residual && upper > boundary_residual) { + // Halve quantization step to avoid crossing boundary. This midpoint is + // on the same side of boundary as residual because midpoint <= residual + // (since upper is closer than lower) and residual is below the boundary. 
+ return lower + (quantization >> 1); + } + return upper & 0xff; + } +} + +// Quantize every component of the difference between the actual pixel value and +// its prediction to a multiple of a quantization (a power of 2, not larger than +// max_quantization which is a power of 2, smaller than max_diff). Take care if +// value and predict have undergone subtract green, which means that red and +// blue are represented as offsets from green. +static uint32_t NearLossless(uint32_t value, uint32_t predict, + int max_quantization, int max_diff, + int used_subtract_green) { + int quantization; + uint8_t new_green = 0; + uint8_t green_diff = 0; + uint8_t a, r, g, b; + if (max_diff <= 2) { + return VP8LSubPixels(value, predict); + } + quantization = max_quantization; + while (quantization >= max_diff) { + quantization >>= 1; + } + if ((value >> 24) == 0 || (value >> 24) == 0xff) { + // Preserve transparency of fully transparent or fully opaque pixels. + a = ((value >> 24) - (predict >> 24)) & 0xff; + } else { + a = NearLosslessComponent(value >> 24, predict >> 24, 0xff, quantization); + } + g = NearLosslessComponent((value >> 8) & 0xff, (predict >> 8) & 0xff, 0xff, + quantization); + if (used_subtract_green) { + // The green offset will be added to red and blue components during decoding + // to obtain the actual red and blue values. + new_green = ((predict >> 8) + g) & 0xff; + // The amount by which green has been adjusted during quantization. It is + // subtracted from red and blue for compensation, to avoid accumulating two + // quantization errors in them. 
+ green_diff = (new_green - (value >> 8)) & 0xff; + } + r = NearLosslessComponent(((value >> 16) - green_diff) & 0xff, + (predict >> 16) & 0xff, 0xff - new_green, + quantization); + b = NearLosslessComponent((value - green_diff) & 0xff, predict & 0xff, + 0xff - new_green, quantization); + return ((uint32_t)a << 24) | ((uint32_t)r << 16) | ((uint32_t)g << 8) | b; +} + +// Returns the difference between the pixel and its prediction. In case of a +// lossy encoding, updates the source image to avoid propagating the deviation +// further to pixels which depend on the current pixel for their predictions. +static WEBP_INLINE uint32_t GetResidual(int width, int height, + uint32_t* const upper_row, + uint32_t* const current_row, + const uint8_t* const max_diffs, + int mode, VP8LPredictorFunc pred_func, + int x, int y, int max_quantization, + int exact, int used_subtract_green) { + const uint32_t predict = Predict(pred_func, x, y, current_row, upper_row); + uint32_t residual; + if (max_quantization == 1 || mode == 0 || y == 0 || y == height - 1 || + x == 0 || x == width - 1) { + residual = VP8LSubPixels(current_row[x], predict); + } else { + residual = NearLossless(current_row[x], predict, max_quantization, + max_diffs[x], used_subtract_green); + // Update the source image. + current_row[x] = VP8LAddPixels(predict, residual); + // x is never 0 here so we do not need to update upper_row like below. + } + if (!exact && (current_row[x] & kMaskAlpha) == 0) { + // If alpha is 0, cleanup RGB. We can choose the RGB values of the residual + // for best compression. The prediction of alpha itself can be non-zero and + // must be kept though. We choose RGB of the residual to be 0. + residual &= kMaskAlpha; + // Update the source image. + current_row[x] = predict & ~kMaskAlpha; + // The prediction for the rightmost pixel in a row uses the leftmost pixel + // in that row as its top-right context pixel. 
Hence if we change the + // leftmost pixel of current_row, the corresponding change must be applied + // to upper_row as well where top-right context is being read from. + if (x == 0 && y != 0) upper_row[width] = current_row[0]; + } + return residual; +} + // Returns best predictor and updates the accumulated histogram. +// If max_quantization > 1, assumes that near lossless processing will be +// applied, quantizing residuals to multiples of quantization levels up to +// max_quantization (the actual quantization level depends on smoothness near +// the given pixel). static int GetBestPredictorForTile(int width, int height, int tile_x, int tile_y, int bits, int accumulated[4][256], - const uint32_t* const argb_scratch, - int exact) { + uint32_t* const argb_scratch, + const uint32_t* const argb, + int max_quantization, + int exact, int used_subtract_green) { const int kNumPredModes = 14; - const int col_start = tile_x << bits; - const int row_start = tile_y << bits; + const int start_x = tile_x << bits; + const int start_y = tile_y << bits; const int tile_size = 1 << bits; - const int max_y = GetMin(tile_size, height - row_start); - const int max_x = GetMin(tile_size, width - col_start); + const int max_y = GetMin(tile_size, height - start_y); + const int max_x = GetMin(tile_size, width - start_x); + // Whether there exist columns just outside the tile. + const int have_left = (start_x > 0); + const int have_right = (max_x < width - start_x); + // Position and size of the strip covering the tile and adjacent columns if + // they exist. + const int context_start_x = start_x - have_left; + const int context_width = max_x + have_left + have_right; + // The width of upper_row and current_row is one pixel larger than image width + // to allow the top right pixel to point to the leftmost pixel of the next row + // when at the right edge. 
+  uint32_t* upper_row = argb_scratch;
+  uint32_t* current_row = upper_row + width + 1;
+  uint8_t* const max_diffs = (uint8_t*)(current_row + width + 1);
   float best_diff = MAX_DIFF_COST;
   int best_mode = 0;
   int mode;
@@ -571,28 +758,46 @@ static int GetBestPredictorForTile(int width, int height,
   // Need pointers to be able to swap arrays.
   int (*histo_argb)[256] = histo_stack_1;
   int (*best_histo)[256] = histo_stack_2;
-  int i, j;
+
   for (mode = 0; mode < kNumPredModes; ++mode) {
-    const uint32_t* current_row = argb_scratch;
     const VP8LPredictorFunc pred_func = VP8LPredictors[mode];
     float cur_diff;
-    int y;
+    int relative_y;
     memset(histo_argb, 0, sizeof(histo_stack_1));
-    for (y = 0; y < max_y; ++y) {
-      int x;
-      const int row = row_start + y;
-      const uint32_t* const upper_row = current_row;
-      current_row = upper_row + width;
-      for (x = 0; x < max_x; ++x) {
-        const int col = col_start + x;
-        const uint32_t predict =
-            Predict(pred_func, col, row, current_row, upper_row);
-        uint32_t residual = VP8LSubPixels(current_row[col], predict);
-        if (!exact && (current_row[col] & kMaskAlpha) == 0) {
-          residual &= kMaskAlpha;  // See CopyTileWithPrediction.
-        }
-        UpdateHisto(histo_argb, residual);
+    if (start_y > 0) {
+      // Read the row above the tile which will become the first upper_row.
+      // Include a pixel to the left if it exists; include a pixel to the right
+      // in all cases (wrapping to the leftmost pixel of the next row if it does
+      // not exist).
+      memcpy(current_row + context_start_x,
+             argb + (start_y - 1) * width + context_start_x,
+             sizeof(*argb) * (max_x + have_left + 1));
+    }
+    for (relative_y = 0; relative_y < max_y; ++relative_y) {
+      const int y = start_y + relative_y;
+      int relative_x;
+      uint32_t* tmp = upper_row;
+      upper_row = current_row;
+      current_row = tmp;
+      // Read current_row. Include a pixel to the left if it exists; include a
+      // pixel to the right in all cases except at the bottom right corner of
+      // the image (wrapping to the leftmost pixel of the next row if it does
+      // not exist in the current row).
+      memcpy(current_row + context_start_x,
+             argb + y * width + context_start_x,
+             sizeof(*argb) * (max_x + have_left + (y + 1 < height)));
+      if (max_quantization > 1 && y >= 1 && y + 1 < height) {
+        MaxDiffsForRow(context_width, width, argb + y * width + context_start_x,
+                       max_diffs + context_start_x, used_subtract_green);
+      }
+
+      for (relative_x = 0; relative_x < max_x; ++relative_x) {
+        const int x = start_x + relative_x;
+        UpdateHisto(histo_argb,
                    GetResidual(width, height, upper_row, current_row,
                                max_diffs, mode, pred_func, x, y,
                                max_quantization, exact, used_subtract_green));
       }
     }
     cur_diff = PredictionCostSpatialHistogram(
@@ -615,71 +820,82 @@ static int GetBestPredictorForTile(int width, int height,
   return best_mode;
 }

+// Converts pixels of the image to residuals with respect to predictions.
+// If max_quantization > 1, applies near lossless processing, quantizing
+// residuals to multiples of quantization levels up to max_quantization
+// (the actual quantization level depends on smoothness near the given pixel).
 static void CopyImageWithPrediction(int width, int height,
                                     int bits, uint32_t* const modes,
                                     uint32_t* const argb_scratch,
                                     uint32_t* const argb,
-                                    int low_effort, int exact) {
+                                    int low_effort, int max_quantization,
+                                    int exact, int used_subtract_green) {
   const int tiles_per_row = VP8LSubSampleSize(width, bits);
   const int mask = (1 << bits) - 1;
-  // The row size is one pixel longer to allow the top right pixel to point to
-  // the leftmost pixel of the next row when at the right edge.
-  uint32_t* current_row = argb_scratch;
-  uint32_t* upper_row = argb_scratch + width + 1;
+  // The width of upper_row and current_row is one pixel larger than image width
+  // to allow the top right pixel to point to the leftmost pixel of the next row
+  // when at the right edge.
+  uint32_t* upper_row = argb_scratch;
+  uint32_t* current_row = upper_row + width + 1;
+  uint8_t* current_max_diffs = (uint8_t*)(current_row + width + 1);
+  uint8_t* lower_max_diffs = current_max_diffs + width;
   int y;
-  VP8LPredictorFunc pred_func =
-      low_effort ? VP8LPredictors[kPredLowEffort] : NULL;
+  int mode = 0;
+  VP8LPredictorFunc pred_func = NULL;
   for (y = 0; y < height; ++y) {
     int x;
-    uint32_t* tmp = upper_row;
+    uint32_t* const tmp32 = upper_row;
     upper_row = current_row;
-    current_row = tmp;
-    memcpy(current_row, argb + y * width, sizeof(*current_row) * width);
-    current_row[width] = (y + 1 < height) ? argb[(y + 1) * width] : ARGB_BLACK;
+    current_row = tmp32;
+    memcpy(current_row, argb + y * width,
+           sizeof(*argb) * (width + (y + 1 < height)));
     if (low_effort) {
       for (x = 0; x < width; ++x) {
-        const uint32_t predict =
-            Predict(pred_func, x, y, current_row, upper_row);
+        const uint32_t predict = Predict(VP8LPredictors[kPredLowEffort], x, y,
                                         current_row, upper_row);
        argb[y * width + x] = VP8LSubPixels(current_row[x], predict);
      }
    } else {
+      if (max_quantization > 1) {
+        // Compute max_diffs for the lower row now, because that needs the
+        // contents of argb for the current row, which we will overwrite with
+        // residuals before proceeding with the next row.
+        uint8_t* const tmp8 = current_max_diffs;
+        current_max_diffs = lower_max_diffs;
+        lower_max_diffs = tmp8;
+        if (y + 2 < height) {
+          MaxDiffsForRow(width, width, argb + (y + 1) * width, lower_max_diffs,
+                         used_subtract_green);
+        }
+      }
       for (x = 0; x < width; ++x) {
-        uint32_t predict, residual;
         if ((x & mask) == 0) {
-          const int mode =
-              (modes[(y >> bits) * tiles_per_row + (x >> bits)] >> 8) & 0xff;
+          mode = (modes[(y >> bits) * tiles_per_row + (x >> bits)] >> 8) & 0xff;
           pred_func = VP8LPredictors[mode];
         }
-        predict = Predict(pred_func, x, y, current_row, upper_row);
-        residual = VP8LSubPixels(current_row[x], predict);
-        if (!exact && (current_row[x] & kMaskAlpha) == 0) {
-          // If alpha is 0, cleanup RGB. We can choose the RGB values of the
-          // residual for best compression. The prediction of alpha itself can
-          // be non-zero and must be kept though. We choose RGB of the residual
-          // to be 0.
-          residual &= kMaskAlpha;
-          // Update input image so that next predictions use correct RGB value.
-          current_row[x] = predict & ~kMaskAlpha;
-          if (x == 0 && y != 0) upper_row[width] = current_row[x];
-        }
-        argb[y * width + x] = residual;
+        argb[y * width + x] = GetResidual(
+            width, height, upper_row, current_row, current_max_diffs, mode,
+            pred_func, x, y, max_quantization, exact, used_subtract_green);
       }
     }
   }
 }

+// Finds the best predictor for each tile, and converts the image to residuals
+// with respect to predictions. If near_lossless_quality < 100, applies
+// near lossless processing, shaving off more bits of residuals for lower
+// qualities.
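As the hunk below shows, the maximum quantization step is derived from the near-lossless quality as `1 << VP8LNearLosslessBits(quality)`. A hedged sketch of that mapping (the `5 - quality / 20` breakpoints mirror what libwebp appears to use, but treat them as this example's assumption):

```c
#include <assert.h>

/* Sketch of the quality -> quantization-step mapping used by the
 * near-lossless mode. Quality 100 means 0 bits shaved (lossless);
 * lower qualities allow a coarser power-of-two step. */
static int NearLosslessBits(int near_lossless_quality) {
  return 5 - near_lossless_quality / 20;  /* quality 100 -> 0 bits */
}

static int MaxQuantization(int near_lossless_quality) {
  return 1 << NearLosslessBits(near_lossless_quality);
}
```

So `max_quantization == 1` at quality 100 reduces the whole path to plain lossless prediction, which is why the same code serves both modes.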
 void VP8LResidualImage(int width, int height, int bits, int low_effort,
                        uint32_t* const argb, uint32_t* const argb_scratch,
-                       uint32_t* const image, int exact) {
-  const int max_tile_size = 1 << bits;
+                       uint32_t* const image, int near_lossless_quality,
+                       int exact, int used_subtract_green) {
   const int tiles_per_row = VP8LSubSampleSize(width, bits);
   const int tiles_per_col = VP8LSubSampleSize(height, bits);
-  uint32_t* const upper_row = argb_scratch;
-  uint32_t* const current_tile_rows = argb_scratch + width;
   int tile_y;
   int histo[4][256];
+  const int max_quantization = 1 << VP8LNearLosslessBits(near_lossless_quality);
   if (low_effort) {
     int i;
     for (i = 0; i < tiles_per_row * tiles_per_col; ++i) {
@@ -688,26 +904,19 @@ void VP8LResidualImage(int width, int height, int bits, int low_effort,
   } else {
     memset(histo, 0, sizeof(histo));
     for (tile_y = 0; tile_y < tiles_per_col; ++tile_y) {
-      const int tile_y_offset = tile_y * max_tile_size;
-      const int this_tile_height =
-          (tile_y < tiles_per_col - 1) ? max_tile_size : height - tile_y_offset;
       int tile_x;
-      if (tile_y > 0) {
-        memcpy(upper_row, current_tile_rows + (max_tile_size - 1) * width,
-               width * sizeof(*upper_row));
-      }
-      memcpy(current_tile_rows, &argb[tile_y_offset * width],
-             this_tile_height * width * sizeof(*current_tile_rows));
       for (tile_x = 0; tile_x < tiles_per_row; ++tile_x) {
         const int pred = GetBestPredictorForTile(width, height, tile_x, tile_y,
-            bits, (int (*)[256])histo, argb_scratch, exact);
+            bits, histo, argb_scratch, argb, max_quantization, exact,
+            used_subtract_green);
         image[tile_y * tiles_per_row + tile_x] = ARGB_BLACK | (pred << 8);
       }
     }
   }
-  CopyImageWithPrediction(width, height, bits,
-                          image, argb_scratch, argb, low_effort, exact);
+  CopyImageWithPrediction(width, height, bits, image, argb_scratch, argb,
                          low_effort, max_quantization, exact,
                          used_subtract_green);
 }

 void VP8LSubtractGreenFromBlueAndRed_C(uint32_t* argb_data, int num_pixels) {
@@ -1053,6 +1262,17 @@ void VP8LColorSpaceTransform(int width, int height, int bits, int quality,
 }

 //------------------------------------------------------------------------------
+
+static int VectorMismatch(const uint32_t* const array1,
+                          const uint32_t* const array2, int length) {
+  int match_len = 0;
+
+  while (match_len < length && array1[match_len] == array2[match_len]) {
+    ++match_len;
+  }
+  return match_len;
+}
+
 // Bundles multiple (1, 2, 4 or 8) pixels into a single pixel.
void VP8LBundleColorMap(const uint8_t* const row, int width, int xbits, uint32_t* const dst) { @@ -1149,6 +1369,8 @@ GetEntropyUnrefinedHelperFunc VP8LGetEntropyUnrefinedHelper; VP8LHistogramAddFunc VP8LHistogramAdd; +VP8LVectorMismatchFunc VP8LVectorMismatch; + extern void VP8LEncDspInitSSE2(void); extern void VP8LEncDspInitSSE41(void); extern void VP8LEncDspInitNEON(void); @@ -1181,6 +1403,8 @@ WEBP_TSAN_IGNORE_FUNCTION void VP8LEncDspInit(void) { VP8LHistogramAdd = HistogramAdd; + VP8LVectorMismatch = VectorMismatch; + // If defined, use CPUInfo() to overwrite some pointers with faster versions. if (VP8GetCPUInfo != NULL) { #if defined(WEBP_USE_SSE2) diff --git a/src/3rdparty/libwebp/src/dsp/lossless_enc_sse2.c b/src/3rdparty/libwebp/src/dsp/lossless_enc_sse2.c index e8c9834..7c894e7 100644 --- a/src/3rdparty/libwebp/src/dsp/lossless_enc_sse2.c +++ b/src/3rdparty/libwebp/src/dsp/lossless_enc_sse2.c @@ -325,6 +325,57 @@ static float CombinedShannonEntropy(const int X[256], const int Y[256]) { #undef ANALYZE_XY //------------------------------------------------------------------------------ + +static int VectorMismatch(const uint32_t* const array1, + const uint32_t* const array2, int length) { + int match_len; + + if (length >= 12) { + __m128i A0 = _mm_loadu_si128((const __m128i*)&array1[0]); + __m128i A1 = _mm_loadu_si128((const __m128i*)&array2[0]); + match_len = 0; + do { + // Loop unrolling and early load both provide a speedup of 10% for the + // current function. Also, max_limit can be MAX_LENGTH=4096 at most. 
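The new `VP8LVectorMismatch` hook (whose portable version appears above, and whose SSE2 version follows) returns the length of the common prefix of two pixel arrays. This standalone copy, renamed to avoid clashing with the library symbol, shows the contract the SIMD variants must preserve:

```c
#include <assert.h>
#include <stdint.h>

/* Standalone copy of the portable VectorMismatch: returns how many
 * leading elements of the two arrays compare equal (the match length). */
static int PlainVectorMismatch(const uint32_t* const array1,
                               const uint32_t* const array2, int length) {
  int match_len = 0;
  while (match_len < length && array1[match_len] == array2[match_len]) {
    ++match_len;
  }
  return match_len;
}
```

The SSE2 version below compares four pixels at a time with `_mm_cmpeq_epi32`/`_mm_movemask_epi8` and only falls back to this scalar loop to locate the exact mismatch position inside the last block.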
+ const __m128i cmpA = _mm_cmpeq_epi32(A0, A1); + const __m128i B0 = + _mm_loadu_si128((const __m128i*)&array1[match_len + 4]); + const __m128i B1 = + _mm_loadu_si128((const __m128i*)&array2[match_len + 4]); + if (_mm_movemask_epi8(cmpA) != 0xffff) break; + match_len += 4; + + { + const __m128i cmpB = _mm_cmpeq_epi32(B0, B1); + A0 = _mm_loadu_si128((const __m128i*)&array1[match_len + 4]); + A1 = _mm_loadu_si128((const __m128i*)&array2[match_len + 4]); + if (_mm_movemask_epi8(cmpB) != 0xffff) break; + match_len += 4; + } + } while (match_len + 12 < length); + } else { + match_len = 0; + // Unroll the potential first two loops. + if (length >= 4 && + _mm_movemask_epi8(_mm_cmpeq_epi32( + _mm_loadu_si128((const __m128i*)&array1[0]), + _mm_loadu_si128((const __m128i*)&array2[0]))) == 0xffff) { + match_len = 4; + if (length >= 8 && + _mm_movemask_epi8(_mm_cmpeq_epi32( + _mm_loadu_si128((const __m128i*)&array1[4]), + _mm_loadu_si128((const __m128i*)&array2[4]))) == 0xffff) + match_len = 8; + } + } + + while (match_len < length && array1[match_len] == array2[match_len]) { + ++match_len; + } + return match_len; +} + +//------------------------------------------------------------------------------ // Entry point extern void VP8LEncDspInitSSE2(void); @@ -336,6 +387,7 @@ WEBP_TSAN_IGNORE_FUNCTION void VP8LEncDspInitSSE2(void) { VP8LCollectColorRedTransforms = CollectColorRedTransforms; VP8LHistogramAdd = HistogramAdd; VP8LCombinedShannonEntropy = CombinedShannonEntropy; + VP8LVectorMismatch = VectorMismatch; } #else // !WEBP_USE_SSE2 diff --git a/src/3rdparty/libwebp/src/dsp/msa_macro.h b/src/3rdparty/libwebp/src/dsp/msa_macro.h new file mode 100644 index 0000000..5c707f4 --- /dev/null +++ b/src/3rdparty/libwebp/src/dsp/msa_macro.h @@ -0,0 +1,555 @@ +// Copyright 2016 Google Inc. All Rights Reserved. +// +// Use of this source code is governed by a BSD-style license +// that can be found in the COPYING file in the root of the source +// tree. 
An additional intellectual property rights grant can be found +// in the file PATENTS. All contributing project authors may +// be found in the AUTHORS file in the root of the source tree. +// ----------------------------------------------------------------------------- +// +// MSA common macros +// +// Author(s): Prashant Patil (prashant.patil@imgtec.com) + +#ifndef WEBP_DSP_MSA_MACRO_H_ +#define WEBP_DSP_MSA_MACRO_H_ + +#include <stdint.h> +#include <msa.h> + +#if defined(__clang__) + #define CLANG_BUILD +#endif + +#ifdef CLANG_BUILD + #define ADDVI_H(a, b) __msa_addvi_h((v8i16)a, b) + #define SRAI_H(a, b) __msa_srai_h((v8i16)a, b) + #define SRAI_W(a, b) __msa_srai_w((v4i32)a, b) +#else + #define ADDVI_H(a, b) (a + b) + #define SRAI_H(a, b) (a >> b) + #define SRAI_W(a, b) (a >> b) +#endif + +#define LD_B(RTYPE, psrc) *((RTYPE*)(psrc)) +#define LD_UB(...) LD_B(v16u8, __VA_ARGS__) +#define LD_SB(...) LD_B(v16i8, __VA_ARGS__) + +#define LD_H(RTYPE, psrc) *((RTYPE*)(psrc)) +#define LD_UH(...) LD_H(v8u16, __VA_ARGS__) +#define LD_SH(...) LD_H(v8i16, __VA_ARGS__) + +#define LD_W(RTYPE, psrc) *((RTYPE*)(psrc)) +#define LD_UW(...) LD_W(v4u32, __VA_ARGS__) +#define LD_SW(...) LD_W(v4i32, __VA_ARGS__) + +#define ST_B(RTYPE, in, pdst) *((RTYPE*)(pdst)) = in +#define ST_UB(...) ST_B(v16u8, __VA_ARGS__) +#define ST_SB(...) ST_B(v16i8, __VA_ARGS__) + +#define ST_H(RTYPE, in, pdst) *((RTYPE*)(pdst)) = in +#define ST_UH(...) ST_H(v8u16, __VA_ARGS__) +#define ST_SH(...) ST_H(v8i16, __VA_ARGS__) + +#define ST_W(RTYPE, in, pdst) *((RTYPE*)(pdst)) = in +#define ST_UW(...) ST_W(v4u32, __VA_ARGS__) +#define ST_SW(...) 
ST_W(v4i32, __VA_ARGS__) + +#define MSA_LOAD_FUNC(TYPE, INSTR, FUNC_NAME) \ + static inline TYPE FUNC_NAME(const void* const psrc) { \ + const uint8_t* const psrc_m = (const uint8_t*)psrc; \ + TYPE val_m; \ + asm volatile ( \ + "" #INSTR " %[val_m], %[psrc_m] \n\t" \ + : [val_m] "=r" (val_m) \ + : [psrc_m] "m" (*psrc_m)); \ + return val_m; \ + } + +#define MSA_LOAD(psrc, FUNC_NAME) FUNC_NAME(psrc) + +#define MSA_STORE_FUNC(TYPE, INSTR, FUNC_NAME) \ + static inline void FUNC_NAME(TYPE val, void* const pdst) { \ + uint8_t* const pdst_m = (uint8_t*)pdst; \ + TYPE val_m = val; \ + asm volatile ( \ + " " #INSTR " %[val_m], %[pdst_m] \n\t" \ + : [pdst_m] "=m" (*pdst_m) \ + : [val_m] "r" (val_m)); \ + } + +#define MSA_STORE(val, pdst, FUNC_NAME) FUNC_NAME(val, pdst) + +#if (__mips_isa_rev >= 6) + MSA_LOAD_FUNC(uint16_t, lh, msa_lh); + #define LH(psrc) MSA_LOAD(psrc, msa_lh) + MSA_LOAD_FUNC(uint32_t, lw, msa_lw); + #define LW(psrc) MSA_LOAD(psrc, msa_lw) + #if (__mips == 64) + MSA_LOAD_FUNC(uint64_t, ld, msa_ld); + #define LD(psrc) MSA_LOAD(psrc, msa_ld) + #else // !(__mips == 64) + #define LD(psrc) ((((uint64_t)MSA_LOAD(psrc + 4, msa_lw)) << 32) | \ + MSA_LOAD(psrc, msa_lw)) + #endif // (__mips == 64) + + MSA_STORE_FUNC(uint16_t, sh, msa_sh); + #define SH(val, pdst) MSA_STORE(val, pdst, msa_sh) + MSA_STORE_FUNC(uint32_t, sw, msa_sw); + #define SW(val, pdst) MSA_STORE(val, pdst, msa_sw) + MSA_STORE_FUNC(uint64_t, sd, msa_sd); + #define SD(val, pdst) MSA_STORE(val, pdst, msa_sd) +#else // !(__mips_isa_rev >= 6) + MSA_LOAD_FUNC(uint16_t, ulh, msa_ulh); + #define LH(psrc) MSA_LOAD(psrc, msa_ulh) + MSA_LOAD_FUNC(uint32_t, ulw, msa_ulw); + #define LW(psrc) MSA_LOAD(psrc, msa_ulw) + #if (__mips == 64) + MSA_LOAD_FUNC(uint64_t, uld, msa_uld); + #define LD(psrc) MSA_LOAD(psrc, msa_uld) + #else // !(__mips == 64) + #define LD(psrc) ((((uint64_t)MSA_LOAD(psrc + 4, msa_ulw)) << 32) | \ + MSA_LOAD(psrc, msa_ulw)) + #endif // (__mips == 64) + + MSA_STORE_FUNC(uint16_t, ush, msa_ush); + 
#define SH(val, pdst) MSA_STORE(val, pdst, msa_ush) + MSA_STORE_FUNC(uint32_t, usw, msa_usw); + #define SW(val, pdst) MSA_STORE(val, pdst, msa_usw) + #define SD(val, pdst) { \ + uint8_t* const pdst_sd_m = (uint8_t*)(pdst); \ + const uint32_t val0_m = (uint32_t)(val & 0x00000000FFFFFFFF); \ + const uint32_t val1_m = (uint32_t)((val >> 32) & 0x00000000FFFFFFFF); \ + SW(val0_m, pdst_sd_m); \ + SW(val1_m, pdst_sd_m + 4); \ + } +#endif // (__mips_isa_rev >= 6) + +/* Description : Load 4 words with stride + * Arguments : Inputs - psrc, stride + * Outputs - out0, out1, out2, out3 + * Details : Load word in 'out0' from (psrc) + * Load word in 'out1' from (psrc + stride) + * Load word in 'out2' from (psrc + 2 * stride) + * Load word in 'out3' from (psrc + 3 * stride) + */ +#define LW4(psrc, stride, out0, out1, out2, out3) { \ + const uint8_t* ptmp = (const uint8_t*)psrc; \ + out0 = LW(ptmp); \ + ptmp += stride; \ + out1 = LW(ptmp); \ + ptmp += stride; \ + out2 = LW(ptmp); \ + ptmp += stride; \ + out3 = LW(ptmp); \ +} + +/* Description : Store 4 words with stride + * Arguments : Inputs - in0, in1, in2, in3, pdst, stride + * Details : Store word from 'in0' to (pdst) + * Store word from 'in1' to (pdst + stride) + * Store word from 'in2' to (pdst + 2 * stride) + * Store word from 'in3' to (pdst + 3 * stride) + */ +#define SW4(in0, in1, in2, in3, pdst, stride) { \ + uint8_t* ptmp = (uint8_t*)pdst; \ + SW(in0, ptmp); \ + ptmp += stride; \ + SW(in1, ptmp); \ + ptmp += stride; \ + SW(in2, ptmp); \ + ptmp += stride; \ + SW(in3, ptmp); \ +} + +/* Description : Load vectors with 16 byte elements with stride + * Arguments : Inputs - psrc, stride + * Outputs - out0, out1 + * Return Type - as per RTYPE + * Details : Load 16 byte elements in 'out0' from (psrc) + * Load 16 byte elements in 'out1' from (psrc + stride) + */ +#define LD_B2(RTYPE, psrc, stride, out0, out1) { \ + out0 = LD_B(RTYPE, psrc); \ + out1 = LD_B(RTYPE, psrc + stride); \ +} +#define LD_UB2(...) 
LD_B2(v16u8, __VA_ARGS__) +#define LD_SB2(...) LD_B2(v16i8, __VA_ARGS__) + +#define LD_B4(RTYPE, psrc, stride, out0, out1, out2, out3) { \ + LD_B2(RTYPE, psrc, stride, out0, out1); \ + LD_B2(RTYPE, psrc + 2 * stride , stride, out2, out3); \ +} +#define LD_UB4(...) LD_B4(v16u8, __VA_ARGS__) +#define LD_SB4(...) LD_B4(v16i8, __VA_ARGS__) + +/* Description : Load vectors with 8 halfword elements with stride + * Arguments : Inputs - psrc, stride + * Outputs - out0, out1 + * Details : Load 8 halfword elements in 'out0' from (psrc) + * Load 8 halfword elements in 'out1' from (psrc + stride) + */ +#define LD_H2(RTYPE, psrc, stride, out0, out1) { \ + out0 = LD_H(RTYPE, psrc); \ + out1 = LD_H(RTYPE, psrc + stride); \ +} +#define LD_UH2(...) LD_H2(v8u16, __VA_ARGS__) +#define LD_SH2(...) LD_H2(v8i16, __VA_ARGS__) + +/* Description : Store 4x4 byte block to destination memory from input vector + * Arguments : Inputs - in0, in1, pdst, stride + * Details : 'Idx0' word element from input vector 'in0' is copied to the + * GP register and stored to (pdst) + * 'Idx1' word element from input vector 'in0' is copied to the + * GP register and stored to (pdst + stride) + * 'Idx2' word element from input vector 'in0' is copied to the + * GP register and stored to (pdst + 2 * stride) + * 'Idx3' word element from input vector 'in0' is copied to the + * GP register and stored to (pdst + 3 * stride) + */ +#define ST4x4_UB(in0, in1, idx0, idx1, idx2, idx3, pdst, stride) { \ + uint8_t* const pblk_4x4_m = (uint8_t*)pdst; \ + const uint32_t out0_m = __msa_copy_s_w((v4i32)in0, idx0); \ + const uint32_t out1_m = __msa_copy_s_w((v4i32)in0, idx1); \ + const uint32_t out2_m = __msa_copy_s_w((v4i32)in1, idx2); \ + const uint32_t out3_m = __msa_copy_s_w((v4i32)in1, idx3); \ + SW4(out0_m, out1_m, out2_m, out3_m, pblk_4x4_m, stride); \ +} + +/* Description : Immediate number of elements to slide + * Arguments : Inputs - in0, in1, slide_val + * Outputs - out + * Return Type - as per RTYPE + * Details : 
Byte elements from 'in1' vector are slid into 'in0' by + * value specified in the 'slide_val' + */ +#define SLDI_B(RTYPE, in0, in1, slide_val) \ + (RTYPE)__msa_sldi_b((v16i8)in0, (v16i8)in1, slide_val) \ + +#define SLDI_UB(...) SLDI_B(v16u8, __VA_ARGS__) +#define SLDI_SB(...) SLDI_B(v16i8, __VA_ARGS__) +#define SLDI_SH(...) SLDI_B(v8i16, __VA_ARGS__) + +/* Description : Shuffle halfword vector elements as per mask vector + * Arguments : Inputs - in0, in1, in2, in3, mask0, mask1 + * Outputs - out0, out1 + * Return Type - as per RTYPE + * Details : halfword elements from 'in0' & 'in1' are copied selectively to + * 'out0' as per control vector 'mask0' + */ +#define VSHF_H2(RTYPE, in0, in1, in2, in3, mask0, mask1, out0, out1) { \ + out0 = (RTYPE)__msa_vshf_h((v8i16)mask0, (v8i16)in1, (v8i16)in0); \ + out1 = (RTYPE)__msa_vshf_h((v8i16)mask1, (v8i16)in3, (v8i16)in2); \ +} +#define VSHF_H2_UH(...) VSHF_H2(v8u16, __VA_ARGS__) +#define VSHF_H2_SH(...) VSHF_H2(v8i16, __VA_ARGS__) + +/* Description : Clips all signed halfword elements of input vector + * between 0 & 255 + * Arguments : Input/output - val + * Return Type - signed halfword + */ +#define CLIP_SH_0_255(val) { \ + const v8i16 max_m = __msa_ldi_h(255); \ + val = __msa_maxi_s_h((v8i16)val, 0); \ + val = __msa_min_s_h(max_m, (v8i16)val); \ +} +#define CLIP_SH2_0_255(in0, in1) { \ + CLIP_SH_0_255(in0); \ + CLIP_SH_0_255(in1); \ +} + +/* Description : Clips all signed word elements of input vector + * between 0 & 255 + * Arguments : Input/output - val + * Return Type - signed word + */ +#define CLIP_SW_0_255(val) { \ + const v4i32 max_m = __msa_ldi_w(255); \ + val = __msa_maxi_s_w((v4i32)val, 0); \ + val = __msa_min_s_w(max_m, (v4i32)val); \ +} +#define CLIP_SW4_0_255(in0, in1, in2, in3) { \ + CLIP_SW_0_255(in0); \ + CLIP_SW_0_255(in1); \ + CLIP_SW_0_255(in2); \ + CLIP_SW_0_255(in3); \ +} + +/* Description : Set element n input vector to GPR value + * Arguments : Inputs - in0, in1, in2, in3 + * Output - out + * Return 
Type - as per RTYPE + * Details : Set element 0 in vector 'out' to value specified in 'in0' + */ +#define INSERT_W2(RTYPE, in0, in1, out) { \ + out = (RTYPE)__msa_insert_w((v4i32)out, 0, in0); \ + out = (RTYPE)__msa_insert_w((v4i32)out, 1, in1); \ +} +#define INSERT_W2_UB(...) INSERT_W2(v16u8, __VA_ARGS__) +#define INSERT_W2_SB(...) INSERT_W2(v16i8, __VA_ARGS__) + +#define INSERT_W4(RTYPE, in0, in1, in2, in3, out) { \ + out = (RTYPE)__msa_insert_w((v4i32)out, 0, in0); \ + out = (RTYPE)__msa_insert_w((v4i32)out, 1, in1); \ + out = (RTYPE)__msa_insert_w((v4i32)out, 2, in2); \ + out = (RTYPE)__msa_insert_w((v4i32)out, 3, in3); \ +} +#define INSERT_W4_UB(...) INSERT_W4(v16u8, __VA_ARGS__) +#define INSERT_W4_SB(...) INSERT_W4(v16i8, __VA_ARGS__) +#define INSERT_W4_SW(...) INSERT_W4(v4i32, __VA_ARGS__) + +/* Description : Interleave right half of byte elements from vectors + * Arguments : Inputs - in0, in1, in2, in3 + * Outputs - out0, out1 + * Return Type - as per RTYPE + * Details : Right half of byte elements of 'in0' and 'in1' are interleaved + * and written to out0. + */ +#define ILVR_B2(RTYPE, in0, in1, in2, in3, out0, out1) { \ + out0 = (RTYPE)__msa_ilvr_b((v16i8)in0, (v16i8)in1); \ + out1 = (RTYPE)__msa_ilvr_b((v16i8)in2, (v16i8)in3); \ +} +#define ILVR_B2_UB(...) ILVR_B2(v16u8, __VA_ARGS__) +#define ILVR_B2_SB(...) ILVR_B2(v16i8, __VA_ARGS__) +#define ILVR_B2_UH(...) ILVR_B2(v8u16, __VA_ARGS__) +#define ILVR_B2_SH(...) ILVR_B2(v8i16, __VA_ARGS__) +#define ILVR_B2_SW(...) ILVR_B2(v4i32, __VA_ARGS__) + +#define ILVR_B4(RTYPE, in0, in1, in2, in3, in4, in5, in6, in7, \ + out0, out1, out2, out3) { \ + ILVR_B2(RTYPE, in0, in1, in2, in3, out0, out1); \ + ILVR_B2(RTYPE, in4, in5, in6, in7, out2, out3); \ +} +#define ILVR_B4_UB(...) ILVR_B4(v16u8, __VA_ARGS__) +#define ILVR_B4_SB(...) ILVR_B4(v16i8, __VA_ARGS__) +#define ILVR_B4_UH(...) ILVR_B4(v8u16, __VA_ARGS__) +#define ILVR_B4_SH(...) ILVR_B4(v8i16, __VA_ARGS__) +#define ILVR_B4_SW(...) 
ILVR_B4(v4i32, __VA_ARGS__) + +/* Description : Interleave right half of halfword elements from vectors + * Arguments : Inputs - in0, in1, in2, in3 + * Outputs - out0, out1 + * Return Type - as per RTYPE + * Details : Right half of halfword elements of 'in0' and 'in1' are + * interleaved and written to 'out0'. + */ +#define ILVR_H2(RTYPE, in0, in1, in2, in3, out0, out1) { \ + out0 = (RTYPE)__msa_ilvr_h((v8i16)in0, (v8i16)in1); \ + out1 = (RTYPE)__msa_ilvr_h((v8i16)in2, (v8i16)in3); \ +} +#define ILVR_H2_UB(...) ILVR_H2(v16u8, __VA_ARGS__) +#define ILVR_H2_SH(...) ILVR_H2(v8i16, __VA_ARGS__) +#define ILVR_H2_SW(...) ILVR_H2(v4i32, __VA_ARGS__) + +#define ILVR_H4(RTYPE, in0, in1, in2, in3, in4, in5, in6, in7, \ + out0, out1, out2, out3) { \ + ILVR_H2(RTYPE, in0, in1, in2, in3, out0, out1); \ + ILVR_H2(RTYPE, in4, in5, in6, in7, out2, out3); \ +} +#define ILVR_H4_UB(...) ILVR_H4(v16u8, __VA_ARGS__) +#define ILVR_H4_SH(...) ILVR_H4(v8i16, __VA_ARGS__) +#define ILVR_H4_SW(...) ILVR_H4(v4i32, __VA_ARGS__) + +/* Description : Interleave right half of double word elements from vectors + * Arguments : Inputs - in0, in1, in2, in3 + * Outputs - out0, out1 + * Return Type - as per RTYPE + * Details : Right half of double word elements of 'in0' and 'in1' are + * interleaved and written to 'out0'. + */ +#define ILVR_D2(RTYPE, in0, in1, in2, in3, out0, out1) { \ + out0 = (RTYPE)__msa_ilvr_d((v2i64)in0, (v2i64)in1); \ + out1 = (RTYPE)__msa_ilvr_d((v2i64)in2, (v2i64)in3); \ +} +#define ILVR_D2_UB(...) ILVR_D2(v16u8, __VA_ARGS__) +#define ILVR_D2_SB(...) ILVR_D2(v16i8, __VA_ARGS__) +#define ILVR_D2_SH(...) ILVR_D2(v8i16, __VA_ARGS__) + +#define ILVRL_H2(RTYPE, in0, in1, out0, out1) { \ + out0 = (RTYPE)__msa_ilvr_h((v8i16)in0, (v8i16)in1); \ + out1 = (RTYPE)__msa_ilvl_h((v8i16)in0, (v8i16)in1); \ +} +#define ILVRL_H2_UB(...) ILVRL_H2(v16u8, __VA_ARGS__) +#define ILVRL_H2_SB(...) ILVRL_H2(v16i8, __VA_ARGS__) +#define ILVRL_H2_SH(...) 
ILVRL_H2(v8i16, __VA_ARGS__) +#define ILVRL_H2_SW(...) ILVRL_H2(v4i32, __VA_ARGS__) +#define ILVRL_H2_UW(...) ILVRL_H2(v4u32, __VA_ARGS__) + +#define ILVRL_W2(RTYPE, in0, in1, out0, out1) { \ + out0 = (RTYPE)__msa_ilvr_w((v4i32)in0, (v4i32)in1); \ + out1 = (RTYPE)__msa_ilvl_w((v4i32)in0, (v4i32)in1); \ +} +#define ILVRL_W2_UB(...) ILVRL_W2(v16u8, __VA_ARGS__) +#define ILVRL_W2_SH(...) ILVRL_W2(v8i16, __VA_ARGS__) +#define ILVRL_W2_SW(...) ILVRL_W2(v4i32, __VA_ARGS__) + +/* Description : Pack even byte elements of vector pairs + * Arguments : Inputs - in0, in1, in2, in3 + * Outputs - out0, out1 + * Return Type - as per RTYPE + * Details : Even byte elements of 'in0' are copied to the left half of + * 'out0' & even byte elements of 'in1' are copied to the right + * half of 'out0'. + */ +#define PCKEV_B2(RTYPE, in0, in1, in2, in3, out0, out1) { \ + out0 = (RTYPE)__msa_pckev_b((v16i8)in0, (v16i8)in1); \ + out1 = (RTYPE)__msa_pckev_b((v16i8)in2, (v16i8)in3); \ +} +#define PCKEV_B2_SB(...) PCKEV_B2(v16i8, __VA_ARGS__) +#define PCKEV_B2_UB(...) PCKEV_B2(v16u8, __VA_ARGS__) +#define PCKEV_B2_SH(...) PCKEV_B2(v8i16, __VA_ARGS__) +#define PCKEV_B2_SW(...) PCKEV_B2(v4i32, __VA_ARGS__) + +/* Description : Arithmetic immediate shift right all elements of word vector + * Arguments : Inputs - in0, in1, shift + * Outputs - in place operation + * Return Type - as per input vector RTYPE + * Details : Each element of vector 'in0' is right shifted by 'shift' and + * the result is written in-place. 'shift' is a GP variable. + */ +#define SRAI_W2(RTYPE, in0, in1, shift_val) { \ + in0 = (RTYPE)SRAI_W(in0, shift_val); \ + in1 = (RTYPE)SRAI_W(in1, shift_val); \ +} +#define SRAI_W2_SW(...) SRAI_W2(v4i32, __VA_ARGS__) +#define SRAI_W2_UW(...) SRAI_W2(v4u32, __VA_ARGS__) + +#define SRAI_W4(RTYPE, in0, in1, in2, in3, shift_val) { \ + SRAI_W2(RTYPE, in0, in1, shift_val); \ + SRAI_W2(RTYPE, in2, in3, shift_val); \ +} +#define SRAI_W4_SW(...) SRAI_W4(v4i32, __VA_ARGS__) +#define SRAI_W4_UW(...) 
SRAI_W4(v4u32, __VA_ARGS__) + +/* Description : Arithmetic shift right all elements of half-word vector + * Arguments : Inputs - in0, in1, shift + * Outputs - in place operation + * Return Type - as per input vector RTYPE + * Details : Each element of vector 'in0' is right shifted by 'shift' and + * the result is written in-place. 'shift' is a GP variable. + */ +#define SRAI_H2(RTYPE, in0, in1, shift_val) { \ + in0 = (RTYPE)SRAI_H(in0, shift_val); \ + in1 = (RTYPE)SRAI_H(in1, shift_val); \ +} +#define SRAI_H2_SH(...) SRAI_H2(v8i16, __VA_ARGS__) +#define SRAI_H2_UH(...) SRAI_H2(v8u16, __VA_ARGS__) + +/* Description : Arithmetic rounded shift right all elements of word vector + * Arguments : Inputs - in0, in1, shift + * Outputs - in place operation + * Return Type - as per input vector RTYPE + * Details : Each element of vector 'in0' is right shifted by 'shift' and + * the result is written in-place. 'shift' is a GP variable. + */ +#define SRARI_W2(RTYPE, in0, in1, shift) { \ + in0 = (RTYPE)__msa_srari_w((v4i32)in0, shift); \ + in1 = (RTYPE)__msa_srari_w((v4i32)in1, shift); \ +} +#define SRARI_W2_SW(...) SRARI_W2(v4i32, __VA_ARGS__) + +#define SRARI_W4(RTYPE, in0, in1, in2, in3, shift) { \ + SRARI_W2(RTYPE, in0, in1, shift); \ + SRARI_W2(RTYPE, in2, in3, shift); \ +} +#define SRARI_W4_SH(...) SRARI_W4(v8i16, __VA_ARGS__) +#define SRARI_W4_UW(...) SRARI_W4(v4u32, __VA_ARGS__) +#define SRARI_W4_SW(...) SRARI_W4(v4i32, __VA_ARGS__) + +/* Description : Addition of 2 pairs of half-word vectors + * Arguments : Inputs - in0, in1, in2, in3 + * Outputs - out0, out1 + * Details : Each element in 'in0' is added to 'in1' and result is written + * to 'out0'. + */ +#define ADDVI_H2(RTYPE, in0, in1, in2, in3, out0, out1) { \ + out0 = (RTYPE)ADDVI_H(in0, in1); \ + out1 = (RTYPE)ADDVI_H(in2, in3); \ +} +#define ADDVI_H2_SH(...) ADDVI_H2(v8i16, __VA_ARGS__) +#define ADDVI_H2_UH(...) 
ADDVI_H2(v8u16, __VA_ARGS__) + +/* Description : Addition of 2 pairs of vectors + * Arguments : Inputs - in0, in1, in2, in3 + * Outputs - out0, out1 + * Details : Each element in 'in0' is added to 'in1' and result is written + * to 'out0'. + */ +#define ADD2(in0, in1, in2, in3, out0, out1) { \ + out0 = in0 + in1; \ + out1 = in2 + in3; \ +} +#define ADD4(in0, in1, in2, in3, in4, in5, in6, in7, \ + out0, out1, out2, out3) { \ + ADD2(in0, in1, in2, in3, out0, out1); \ + ADD2(in4, in5, in6, in7, out2, out3); \ +} + +/* Description : Sign extend halfword elements from input vector and return + * the result in pair of vectors + * Arguments : Input - in (halfword vector) + * Outputs - out0, out1 (sign extended word vectors) + * Return Type - signed word + * Details : Sign bit of halfword elements from input vector 'in' is + * extracted and interleaved right with same vector 'in0' to + * generate 4 signed word elements in 'out0' + * Then interleaved left with same vector 'in0' to + * generate 4 signed word elements in 'out1' + */ +#define UNPCK_SH_SW(in, out0, out1) { \ + const v8i16 tmp_m = __msa_clti_s_h((v8i16)in, 0); \ + ILVRL_H2_SW(tmp_m, in, out0, out1); \ +} + +/* Description : Butterfly of 4 input vectors + * Arguments : Inputs - in0, in1, in2, in3 + * Outputs - out0, out1, out2, out3 + * Details : Butterfly operation + */ +#define BUTTERFLY_4(in0, in1, in2, in3, out0, out1, out2, out3) { \ + out0 = in0 + in3; \ + out1 = in1 + in2; \ + out2 = in1 - in2; \ + out3 = in0 - in3; \ +} + +/* Description : Transpose 4x4 block with word elements in vectors + * Arguments : Inputs - in0, in1, in2, in3 + * Outputs - out0, out1, out2, out3 + * Return Type - as per RTYPE + */ +#define TRANSPOSE4x4_W(RTYPE, in0, in1, in2, in3, out0, out1, out2, out3) { \ + v4i32 s0_m, s1_m, s2_m, s3_m; \ + ILVRL_W2_SW(in1, in0, s0_m, s1_m); \ + ILVRL_W2_SW(in3, in2, s2_m, s3_m); \ + out0 = (RTYPE)__msa_ilvr_d((v2i64)s2_m, (v2i64)s0_m); \ + out1 = (RTYPE)__msa_ilvl_d((v2i64)s2_m, (v2i64)s0_m); \ 
+ out2 = (RTYPE)__msa_ilvr_d((v2i64)s3_m, (v2i64)s1_m); \ + out3 = (RTYPE)__msa_ilvl_d((v2i64)s3_m, (v2i64)s1_m); \ +} +#define TRANSPOSE4x4_SW_SW(...) TRANSPOSE4x4_W(v4i32, __VA_ARGS__) + +/* Description : Add block 4x4 + * Arguments : Inputs - in0, in1, in2, in3, pdst, stride + * Details : Least significant 4 bytes from each input vector are added to + * the destination bytes, clipped between 0-255 and stored. + */ +#define ADDBLK_ST4x4_UB(in0, in1, in2, in3, pdst, stride) { \ + uint32_t src0_m, src1_m, src2_m, src3_m; \ + v8i16 inp0_m, inp1_m, res0_m, res1_m; \ + v16i8 dst0_m = { 0 }; \ + v16i8 dst1_m = { 0 }; \ + const v16i8 zero_m = { 0 }; \ + ILVR_D2_SH(in1, in0, in3, in2, inp0_m, inp1_m); \ + LW4(pdst, stride, src0_m, src1_m, src2_m, src3_m); \ + INSERT_W2_SB(src0_m, src1_m, dst0_m); \ + INSERT_W2_SB(src2_m, src3_m, dst1_m); \ + ILVR_B2_SH(zero_m, dst0_m, zero_m, dst1_m, res0_m, res1_m); \ + ADD2(res0_m, inp0_m, res1_m, inp1_m, res0_m, res1_m); \ + CLIP_SH2_0_255(res0_m, res1_m); \ + PCKEV_B2_SB(res0_m, res0_m, res1_m, res1_m, dst0_m, dst1_m); \ + ST4x4_UB(dst0_m, dst1_m, 0, 1, 0, 1, pdst, stride); \ +} + +#endif /* WEBP_DSP_MSA_MACRO_H_ */ diff --git a/src/3rdparty/libwebp/src/dsp/rescaler_sse2.c b/src/3rdparty/libwebp/src/dsp/rescaler_sse2.c index 5ea4ddb..5b97028 100644 --- a/src/3rdparty/libwebp/src/dsp/rescaler_sse2.c +++ b/src/3rdparty/libwebp/src/dsp/rescaler_sse2.c @@ -18,6 +18,7 @@ #include <assert.h> #include "../utils/rescaler.h" +#include "../utils/utils.h" //------------------------------------------------------------------------------ // Implementations of critical functions ImportRow / ExportRow diff --git a/src/3rdparty/libwebp/src/dsp/upsampling_mips_dsp_r2.c b/src/3rdparty/libwebp/src/dsp/upsampling_mips_dsp_r2.c index d4ccbe0..ed2eb74 100644 --- a/src/3rdparty/libwebp/src/dsp/upsampling_mips_dsp_r2.c +++ b/src/3rdparty/libwebp/src/dsp/upsampling_mips_dsp_r2.c @@ -14,9 +14,7 @@ #include "./dsp.h" -// Code is disabled for now, in favor of 
the plain-C version -// TODO(djordje.pesut): adapt the code to reflect the C-version. -#if 0 // defined(WEBP_USE_MIPS_DSP_R2) +#if defined(WEBP_USE_MIPS_DSP_R2) #include <assert.h> #include "./yuv.h" @@ -24,21 +22,21 @@ #if !defined(WEBP_YUV_USE_TABLE) #define YUV_TO_RGB(Y, U, V, R, G, B) do { \ - const int t1 = kYScale * Y; \ - const int t2 = kVToG * V; \ - R = kVToR * V; \ - G = kUToG * U; \ - B = kUToB * U; \ + const int t1 = MultHi(Y, 19077); \ + const int t2 = MultHi(V, 13320); \ + R = MultHi(V, 26149); \ + G = MultHi(U, 6419); \ + B = MultHi(U, 33050); \ R = t1 + R; \ G = t1 - G; \ B = t1 + B; \ - R = R + kRCst; \ - G = G - t2 + kGCst; \ - B = B + kBCst; \ + R = R - 14234; \ + G = G - t2 + 8708; \ + B = B - 17685; \ __asm__ volatile ( \ - "shll_s.w %[" #R "], %[" #R "], 9 \n\t" \ - "shll_s.w %[" #G "], %[" #G "], 9 \n\t" \ - "shll_s.w %[" #B "], %[" #B "], 9 \n\t" \ + "shll_s.w %[" #R "], %[" #R "], 17 \n\t" \ + "shll_s.w %[" #G "], %[" #G "], 17 \n\t" \ + "shll_s.w %[" #B "], %[" #B "], 17 \n\t" \ "precrqu_s.qb.ph %[" #R "], %[" #R "], $zero \n\t" \ "precrqu_s.qb.ph %[" #G "], %[" #G "], $zero \n\t" \ "precrqu_s.qb.ph %[" #B "], %[" #B "], $zero \n\t" \ @@ -279,6 +277,6 @@ WEBP_DSP_INIT_STUB(WebPInitYUV444ConvertersMIPSdspR2) #endif // WEBP_USE_MIPS_DSP_R2 -#if 1 // !(defined(FANCY_UPSAMPLING) && defined(WEBP_USE_MIPS_DSP_R2)) +#if !(defined(FANCY_UPSAMPLING) && defined(WEBP_USE_MIPS_DSP_R2)) WEBP_DSP_INIT_STUB(WebPInitUpsamplersMIPSdspR2) #endif diff --git a/src/3rdparty/libwebp/src/dsp/yuv_mips32.c b/src/3rdparty/libwebp/src/dsp/yuv_mips32.c index b8fe512..e61aac5 100644 --- a/src/3rdparty/libwebp/src/dsp/yuv_mips32.c +++ b/src/3rdparty/libwebp/src/dsp/yuv_mips32.c @@ -14,8 +14,7 @@ #include "./dsp.h" -// Code is disabled for now, in favor of the plain-C version -#if 0 // defined(WEBP_USE_MIPS32) +#if defined(WEBP_USE_MIPS32) #include "./yuv.h" @@ -29,19 +28,19 @@ static void FUNC_NAME(const uint8_t* y, \ int i, r, g, b; \ int temp0, temp1, temp2, temp3, 
temp4; \ for (i = 0; i < (len >> 1); i++) { \ - temp1 = kVToR * v[0]; \ - temp3 = kVToG * v[0]; \ - temp2 = kUToG * u[0]; \ - temp4 = kUToB * u[0]; \ - temp0 = kYScale * y[0]; \ - temp1 += kRCst; \ - temp3 -= kGCst; \ + temp1 = MultHi(v[0], 26149); \ + temp3 = MultHi(v[0], 13320); \ + temp2 = MultHi(u[0], 6419); \ + temp4 = MultHi(u[0], 33050); \ + temp0 = MultHi(y[0], 19077); \ + temp1 -= 14234; \ + temp3 -= 8708; \ temp2 += temp3; \ - temp4 += kBCst; \ + temp4 -= 17685; \ r = VP8Clip8(temp0 + temp1); \ g = VP8Clip8(temp0 - temp2); \ b = VP8Clip8(temp0 + temp4); \ - temp0 = kYScale * y[1]; \ + temp0 = MultHi(y[1], 19077); \ dst[R] = r; \ dst[G] = g; \ dst[B] = b; \ @@ -59,15 +58,15 @@ static void FUNC_NAME(const uint8_t* y, \ dst += 2 * XSTEP; \ } \ if (len & 1) { \ - temp1 = kVToR * v[0]; \ - temp3 = kVToG * v[0]; \ - temp2 = kUToG * u[0]; \ - temp4 = kUToB * u[0]; \ - temp0 = kYScale * y[0]; \ - temp1 += kRCst; \ - temp3 -= kGCst; \ + temp1 = MultHi(v[0], 26149); \ + temp3 = MultHi(v[0], 13320); \ + temp2 = MultHi(u[0], 6419); \ + temp4 = MultHi(u[0], 33050); \ + temp0 = MultHi(y[0], 19077); \ + temp1 -= 14234; \ + temp3 -= 8708; \ temp2 += temp3; \ - temp4 += kBCst; \ + temp4 -= 17685; \ r = VP8Clip8(temp0 + temp1); \ g = VP8Clip8(temp0 - temp2); \ b = VP8Clip8(temp0 + temp4); \ diff --git a/src/3rdparty/libwebp/src/dsp/yuv_mips_dsp_r2.c b/src/3rdparty/libwebp/src/dsp/yuv_mips_dsp_r2.c index dea0fdb..1720d41 100644 --- a/src/3rdparty/libwebp/src/dsp/yuv_mips_dsp_r2.c +++ b/src/3rdparty/libwebp/src/dsp/yuv_mips_dsp_r2.c @@ -14,8 +14,7 @@ #include "./dsp.h" -// Code is disabled for now, in favor of the plain-C version -#if 0 // defined(WEBP_USE_MIPS_DSP_R2) +#if defined(WEBP_USE_MIPS_DSP_R2) #include "./yuv.h" @@ -31,10 +30,10 @@ "mul %[temp2], %[t_con_3], %[temp4] \n\t" \ "mul %[temp4], %[t_con_4], %[temp4] \n\t" \ "mul %[temp0], %[t_con_5], %[temp0] \n\t" \ - "addu %[temp1], %[temp1], %[t_con_6] \n\t" \ + "subu %[temp1], %[temp1], %[t_con_6] \n\t" \ "subu 
%[temp3], %[temp3], %[t_con_7] \n\t" \ "addu %[temp2], %[temp2], %[temp3] \n\t" \ - "addu %[temp4], %[temp4], %[t_con_8] \n\t" \ + "subu %[temp4], %[temp4], %[t_con_8] \n\t" \ #define ROW_FUNC_PART_2(R, G, B, K) \ "addu %[temp5], %[temp0], %[temp1] \n\t" \ @@ -43,12 +42,12 @@ ".if " #K " \n\t" \ "lbu %[temp0], 1(%[y]) \n\t" \ ".endif \n\t" \ - "shll_s.w %[temp5], %[temp5], 9 \n\t" \ - "shll_s.w %[temp6], %[temp6], 9 \n\t" \ + "shll_s.w %[temp5], %[temp5], 17 \n\t" \ + "shll_s.w %[temp6], %[temp6], 17 \n\t" \ ".if " #K " \n\t" \ "mul %[temp0], %[t_con_5], %[temp0] \n\t" \ ".endif \n\t" \ - "shll_s.w %[temp7], %[temp7], 9 \n\t" \ + "shll_s.w %[temp7], %[temp7], 17 \n\t" \ "precrqu_s.qb.ph %[temp5], %[temp5], $zero \n\t" \ "precrqu_s.qb.ph %[temp6], %[temp6], $zero \n\t" \ "precrqu_s.qb.ph %[temp7], %[temp7], $zero \n\t" \ @@ -75,14 +74,14 @@ static void FUNC_NAME(const uint8_t* y, \ uint8_t* dst, int len) { \ int i; \ uint32_t temp0, temp1, temp2, temp3, temp4, temp5, temp6, temp7; \ - const int t_con_1 = kVToR; \ - const int t_con_2 = kVToG; \ - const int t_con_3 = kUToG; \ - const int t_con_4 = kUToB; \ - const int t_con_5 = kYScale; \ - const int t_con_6 = kRCst; \ - const int t_con_7 = kGCst; \ - const int t_con_8 = kBCst; \ + const int t_con_1 = 26149; \ + const int t_con_2 = 13320; \ + const int t_con_3 = 6419; \ + const int t_con_4 = 33050; \ + const int t_con_5 = 19077; \ + const int t_con_6 = 14234; \ + const int t_con_7 = 8708; \ + const int t_con_8 = 17685; \ for (i = 0; i < (len >> 1); i++) { \ __asm__ volatile ( \ ROW_FUNC_PART_1() \ diff --git a/src/3rdparty/libwebp/src/dsp/yuv_sse2.c b/src/3rdparty/libwebp/src/dsp/yuv_sse2.c index f72fe32..e19bddf 100644 --- a/src/3rdparty/libwebp/src/dsp/yuv_sse2.c +++ b/src/3rdparty/libwebp/src/dsp/yuv_sse2.c @@ -33,7 +33,8 @@ static void ConvertYUV444ToRGB(const __m128i* const Y0, const __m128i k19077 = _mm_set1_epi16(19077); const __m128i k26149 = _mm_set1_epi16(26149); const __m128i k14234 = _mm_set1_epi16(14234); 
- const __m128i k33050 = _mm_set1_epi16(33050); + // 33050 doesn't fit in a signed short: only use this with unsigned arithmetic + const __m128i k33050 = _mm_set1_epi16((short)33050); const __m128i k17685 = _mm_set1_epi16(17685); const __m128i k6419 = _mm_set1_epi16(6419); const __m128i k13320 = _mm_set1_epi16(13320); diff --git a/src/3rdparty/libwebp/src/enc/alpha.c b/src/3rdparty/libwebp/src/enc/alpha.c index 3c970b0..03e3ad0 100644 --- a/src/3rdparty/libwebp/src/enc/alpha.c +++ b/src/3rdparty/libwebp/src/enc/alpha.c @@ -79,7 +79,11 @@ static int EncodeLossless(const uint8_t* const data, int width, int height, config.quality = 8.f * effort_level; assert(config.quality >= 0 && config.quality <= 100.f); - ok = (VP8LEncodeStream(&config, &picture, bw) == VP8_ENC_OK); + // TODO(urvang): Temporary fix to avoid generating images that trigger + // a decoder bug related to alpha with color cache. + // See: https://code.google.com/p/webp/issues/detail?id=239 + // Need to re-enable this later. + ok = (VP8LEncodeStream(&config, &picture, bw, 0 /*use_cache*/) == VP8_ENC_OK); WebPPictureFree(&picture); ok = ok && !bw->error_; if (!ok) { @@ -118,7 +122,6 @@ static int EncodeAlphaInternal(const uint8_t* const data, int width, int height, assert(method >= ALPHA_NO_COMPRESSION); assert(method <= ALPHA_LOSSLESS_COMPRESSION); assert(sizeof(header) == ALPHA_HEADER_LEN); - // TODO(skal): have a common function and #define's to validate alpha params. filter_func = WebPFilters[filter]; if (filter_func != NULL) { diff --git a/src/3rdparty/libwebp/src/enc/backward_references.c b/src/3rdparty/libwebp/src/enc/backward_references.c index c39437d..136a24a 100644 --- a/src/3rdparty/libwebp/src/enc/backward_references.c +++ b/src/3rdparty/libwebp/src/enc/backward_references.c @@ -27,11 +27,19 @@ #define MAX_ENTROPY (1e30f) // 1M window (4M bytes) minus 120 special codes for short distances. 
-#define WINDOW_SIZE ((1 << 20) - 120) +#define WINDOW_SIZE_BITS 20 +#define WINDOW_SIZE ((1 << WINDOW_SIZE_BITS) - 120) // Bounds for the match length. #define MIN_LENGTH 2 -#define MAX_LENGTH 4096 +// If you change this, you need MAX_LENGTH_BITS + WINDOW_SIZE_BITS <= 32 as it +// is used in VP8LHashChain. +#define MAX_LENGTH_BITS 12 +// We want the max value to be attainable and stored in MAX_LENGTH_BITS bits. +#define MAX_LENGTH ((1 << MAX_LENGTH_BITS) - 1) +#if MAX_LENGTH_BITS + WINDOW_SIZE_BITS > 32 +#error "MAX_LENGTH_BITS + WINDOW_SIZE_BITS > 32" +#endif // ----------------------------------------------------------------------------- @@ -57,32 +65,19 @@ static int DistanceToPlaneCode(int xsize, int dist) { return dist + 120; } -// Returns the exact index where array1 and array2 are different if this -// index is strictly superior to best_len_match. Otherwise, it returns 0. +// Returns the exact index where array1 and array2 are different. For an index +// inferior or equal to best_len_match, the return value just has to be strictly +// inferior to best_len_match. The current behavior is to return 0 if this index +// is best_len_match, and the index itself otherwise. // If no two elements are the same, it returns max_limit. static WEBP_INLINE int FindMatchLength(const uint32_t* const array1, const uint32_t* const array2, - int best_len_match, - int max_limit) { - int match_len; - + int best_len_match, int max_limit) { // Before 'expensive' linear match, check if the two arrays match at the // current best length index. if (array1[best_len_match] != array2[best_len_match]) return 0; -#if defined(WEBP_USE_SSE2) - // Check if anything is different up to best_len_match excluded. - // memcmp seems to be slower on ARM so it is disabled for now. 
- if (memcmp(array1, array2, best_len_match * sizeof(*array1))) return 0; - match_len = best_len_match + 1; -#else - match_len = 0; -#endif - - while (match_len < max_limit && array1[match_len] == array2[match_len]) { - ++match_len; - } - return match_len; + return VP8LVectorMismatch(array1, array2, max_limit); } // ----------------------------------------------------------------------------- @@ -194,31 +189,24 @@ int VP8LBackwardRefsCopy(const VP8LBackwardRefs* const src, // ----------------------------------------------------------------------------- // Hash chains -// initialize as empty -static void HashChainReset(VP8LHashChain* const p) { - assert(p != NULL); - // Set the int32_t arrays to -1. - memset(p->chain_, 0xff, p->size_ * sizeof(*p->chain_)); - memset(p->hash_to_first_index_, 0xff, - HASH_SIZE * sizeof(*p->hash_to_first_index_)); -} - int VP8LHashChainInit(VP8LHashChain* const p, int size) { assert(p->size_ == 0); - assert(p->chain_ == NULL); + assert(p->offset_length_ == NULL); assert(size > 0); - p->chain_ = (int*)WebPSafeMalloc(size, sizeof(*p->chain_)); - if (p->chain_ == NULL) return 0; + p->offset_length_ = + (uint32_t*)WebPSafeMalloc(size, sizeof(*p->offset_length_)); + if (p->offset_length_ == NULL) return 0; p->size_ = size; - HashChainReset(p); + return 1; } void VP8LHashChainClear(VP8LHashChain* const p) { assert(p != NULL); - WebPSafeFree(p->chain_); + WebPSafeFree(p->offset_length_); + p->size_ = 0; - p->chain_ = NULL; + p->offset_length_ = NULL; } // ----------------------------------------------------------------------------- @@ -234,18 +222,10 @@ static WEBP_INLINE uint32_t GetPixPairHash64(const uint32_t* const argb) { return key; } -// Insertion of two pixels at a time. 
-static void HashChainInsert(VP8LHashChain* const p, - const uint32_t* const argb, int pos) { - const uint32_t hash_code = GetPixPairHash64(argb); - p->chain_[pos] = p->hash_to_first_index_[hash_code]; - p->hash_to_first_index_[hash_code] = pos; -} - // Returns the maximum number of hash chain lookups to do for a -// given compression quality. Return value in range [6, 86]. -static int GetMaxItersForQuality(int quality, int low_effort) { - return (low_effort ? 6 : 8) + (quality * quality) / 128; +// given compression quality. Return value in range [8, 86]. +static int GetMaxItersForQuality(int quality) { + return 8 + (quality * quality) / 128; } static int GetWindowSizeForHashChain(int quality, int xsize) { @@ -261,63 +241,120 @@ static WEBP_INLINE int MaxFindCopyLength(int len) { return (len < MAX_LENGTH) ? len : MAX_LENGTH; } -static void HashChainFindOffset(const VP8LHashChain* const p, int base_position, - const uint32_t* const argb, int len, - int window_size, int* const distance_ptr) { - const uint32_t* const argb_start = argb + base_position; - const int min_pos = - (base_position > window_size) ? base_position - window_size : 0; +int VP8LHashChainFill(VP8LHashChain* const p, int quality, + const uint32_t* const argb, int xsize, int ysize) { + const int size = xsize * ysize; + const int iter_max = GetMaxItersForQuality(quality); + const int iter_min = iter_max - quality / 10; + const uint32_t window_size = GetWindowSizeForHashChain(quality, xsize); int pos; - assert(len <= MAX_LENGTH); - for (pos = p->hash_to_first_index_[GetPixPairHash64(argb_start)]; - pos >= min_pos; - pos = p->chain_[pos]) { - const int curr_length = - FindMatchLength(argb + pos, argb_start, len - 1, len); - if (curr_length == len) break; + uint32_t base_position; + int32_t* hash_to_first_index; + // Temporarily use the p->offset_length_ as a hash chain. 
+ int32_t* chain = (int32_t*)p->offset_length_; + assert(p->size_ != 0); + assert(p->offset_length_ != NULL); + + hash_to_first_index = + (int32_t*)WebPSafeMalloc(HASH_SIZE, sizeof(*hash_to_first_index)); + if (hash_to_first_index == NULL) return 0; + + // Set the int32_t array to -1. + memset(hash_to_first_index, 0xff, HASH_SIZE * sizeof(*hash_to_first_index)); + // Fill the chain linking pixels with the same hash. + for (pos = 0; pos < size - 1; ++pos) { + const uint32_t hash_code = GetPixPairHash64(argb + pos); + chain[pos] = hash_to_first_index[hash_code]; + hash_to_first_index[hash_code] = pos; } - *distance_ptr = base_position - pos; -} - -static int HashChainFindCopy(const VP8LHashChain* const p, - int base_position, - const uint32_t* const argb, int max_len, - int window_size, int iter_max, - int* const distance_ptr, - int* const length_ptr) { - const uint32_t* const argb_start = argb + base_position; - int iter = iter_max; - int best_length = 0; - int best_distance = 0; - const int min_pos = - (base_position > window_size) ? base_position - window_size : 0; - int pos; - int length_max = 256; - if (max_len < length_max) { - length_max = max_len; - } - for (pos = p->hash_to_first_index_[GetPixPairHash64(argb_start)]; - pos >= min_pos; - pos = p->chain_[pos]) { - int curr_length; - int distance; - if (--iter < 0) { - break; + WebPSafeFree(hash_to_first_index); + + // Find the best match interval at each pixel, defined by an offset to the + // pixel and a length. The right-most pixel cannot match anything to the right + // (hence a best length of 0) and the left-most pixel nothing to the left + // (hence an offset of 0). + p->offset_length_[0] = p->offset_length_[size - 1] = 0; + for (base_position = size - 2 < 0 ? 
0 : size - 2; base_position > 0;) { + const int max_len = MaxFindCopyLength(size - 1 - base_position); + const uint32_t* const argb_start = argb + base_position; + int iter = iter_max; + int best_length = 0; + uint32_t best_distance = 0; + const int min_pos = + (base_position > window_size) ? base_position - window_size : 0; + const int length_max = (max_len < 256) ? max_len : 256; + uint32_t max_base_position; + + for (pos = chain[base_position]; pos >= min_pos; pos = chain[pos]) { + int curr_length; + if (--iter < 0) { + break; + } + assert(base_position > (uint32_t)pos); + + curr_length = + FindMatchLength(argb + pos, argb_start, best_length, max_len); + if (best_length < curr_length) { + best_length = curr_length; + best_distance = base_position - pos; + // Stop if we have reached the maximum length. Otherwise, make sure + // we have executed a minimum number of iterations depending on the + // quality. + if ((best_length == MAX_LENGTH) || + (curr_length >= length_max && iter < iter_min)) { + break; + } + } } - - curr_length = FindMatchLength(argb + pos, argb_start, best_length, max_len); - if (best_length < curr_length) { - distance = base_position - pos; - best_length = curr_length; - best_distance = distance; - if (curr_length >= length_max) { + // We have the best match but in case the two intervals continue matching + // to the left, we have the best matches for the left-extended pixels. + max_base_position = base_position; + while (1) { + assert(best_length <= MAX_LENGTH); + assert(best_distance <= WINDOW_SIZE); + p->offset_length_[base_position] = + (best_distance << MAX_LENGTH_BITS) | (uint32_t)best_length; + --base_position; + // Stop if we don't have a match or if we are out of bounds. + if (best_distance == 0 || base_position == 0) break; + // Stop if we cannot extend the matching intervals to the left. 
+ if (base_position < best_distance || + argb[base_position - best_distance] != argb[base_position]) { break; } + // Stop if we are matching at its limit because there could be a closer + // matching interval with the same maximum length. Then again, if the + // matching interval is as close as possible (best_distance == 1), we will + // never find anything better so let's continue. + if (best_length == MAX_LENGTH && best_distance != 1 && + base_position + MAX_LENGTH < max_base_position) { + break; + } + if (best_length < MAX_LENGTH) { + ++best_length; + max_base_position = base_position; + } } } - *distance_ptr = best_distance; - *length_ptr = best_length; - return (best_length >= MIN_LENGTH); + return 1; +} + +static WEBP_INLINE int HashChainFindOffset(const VP8LHashChain* const p, + const int base_position) { + return p->offset_length_[base_position] >> MAX_LENGTH_BITS; +} + +static WEBP_INLINE int HashChainFindLength(const VP8LHashChain* const p, + const int base_position) { + return p->offset_length_[base_position] & ((1U << MAX_LENGTH_BITS) - 1); +} + +static WEBP_INLINE void HashChainFindCopy(const VP8LHashChain* const p, + int base_position, + int* const offset_ptr, + int* const length_ptr) { + *offset_ptr = HashChainFindOffset(p, base_position); + *length_ptr = HashChainFindLength(p, base_position); } static WEBP_INLINE void AddSingleLiteral(uint32_t pixel, int use_color_cache, @@ -384,84 +421,62 @@ static int BackwardReferencesRle(int xsize, int ysize, static int BackwardReferencesLz77(int xsize, int ysize, const uint32_t* const argb, int cache_bits, - int quality, int low_effort, - VP8LHashChain* const hash_chain, + const VP8LHashChain* const hash_chain, VP8LBackwardRefs* const refs) { int i; + int i_last_check = -1; int ok = 0; int cc_init = 0; const int use_color_cache = (cache_bits > 0); const int pix_count = xsize * ysize; VP8LColorCache hashers; - int iter_max = GetMaxItersForQuality(quality, low_effort); - const int window_size = 
GetWindowSizeForHashChain(quality, xsize); - int min_matches = 32; if (use_color_cache) { cc_init = VP8LColorCacheInit(&hashers, cache_bits); if (!cc_init) goto Error; } ClearBackwardRefs(refs); - HashChainReset(hash_chain); - for (i = 0; i < pix_count - 2; ) { + for (i = 0; i < pix_count;) { // Alternative#1: Code the pixels starting at 'i' using backward reference. int offset = 0; int len = 0; - const int max_len = MaxFindCopyLength(pix_count - i); - HashChainFindCopy(hash_chain, i, argb, max_len, window_size, - iter_max, &offset, &len); - if (len > MIN_LENGTH || (len == MIN_LENGTH && offset <= 512)) { - int offset2 = 0; - int len2 = 0; - int k; - min_matches = 8; - HashChainInsert(hash_chain, &argb[i], i); - if ((len < (max_len >> 2)) && !low_effort) { - // Evaluate Alternative#2: Insert the pixel at 'i' as literal, and code - // the pixels starting at 'i + 1' using backward reference. - HashChainFindCopy(hash_chain, i + 1, argb, max_len - 1, - window_size, iter_max, &offset2, - &len2); - if (len2 > len + 1) { - AddSingleLiteral(argb[i], use_color_cache, &hashers, refs); - i++; // Backward reference to be done for next pixel. - len = len2; - offset = offset2; + int j; + HashChainFindCopy(hash_chain, i, &offset, &len); + if (len > MIN_LENGTH + 1) { + const int len_ini = len; + int max_reach = 0; + assert(i + len < pix_count); + // Only start from what we have not checked already. + i_last_check = (i > i_last_check) ? i : i_last_check; + // We know the best match for the current pixel but we try to find the + // best matches for the current pixel AND the next one combined. + // The naive method would use the intervals: + // [i,i+len) + [i+len, length of best match at i+len) + // while we check if we can use: + // [i,j) (where j<=i+len) + [j, length of best match at j) + for (j = i_last_check + 1; j <= i + len_ini; ++j) { + const int len_j = HashChainFindLength(hash_chain, j); + const int reach = + j + (len_j > MIN_LENGTH + 1 ? len_j : 1); // 1 for single literal. 
+ if (reach > max_reach) { + len = j - i; + max_reach = reach; } } - BackwardRefsCursorAdd(refs, PixOrCopyCreateCopy(offset, len)); - if (use_color_cache) { - for (k = 0; k < len; ++k) { - VP8LColorCacheInsert(&hashers, argb[i + k]); - } - } - // Add to the hash_chain (but cannot add the last pixel). - if (offset >= 3 && offset != xsize) { - const int last = (len < pix_count - 1 - i) ? len : pix_count - 1 - i; - for (k = 2; k < last - 8; k += 2) { - HashChainInsert(hash_chain, &argb[i + k], i + k); - } - for (; k < last; ++k) { - HashChainInsert(hash_chain, &argb[i + k], i + k); - } - } - i += len; } else { + len = 1; + } + // Go with literal or backward reference. + assert(len > 0); + if (len == 1) { AddSingleLiteral(argb[i], use_color_cache, &hashers, refs); - HashChainInsert(hash_chain, &argb[i], i); - ++i; - --min_matches; - if (min_matches <= 0) { - AddSingleLiteral(argb[i], use_color_cache, &hashers, refs); - HashChainInsert(hash_chain, &argb[i], i); - ++i; + } else { + BackwardRefsCursorAdd(refs, PixOrCopyCreateCopy(offset, len)); + if (use_color_cache) { + for (j = i; j < i + len; ++j) VP8LColorCacheInsert(&hashers, argb[j]); } } - } - while (i < pix_count) { - // Handle the last pixel(s). 
- AddSingleLiteral(argb[i], use_color_cache, &hashers, refs); - ++i; + i += len; } ok = !refs->error_; @@ -482,7 +497,7 @@ typedef struct { static int BackwardReferencesTraceBackwards( int xsize, int ysize, const uint32_t* const argb, int quality, - int cache_bits, VP8LHashChain* const hash_chain, + int cache_bits, const VP8LHashChain* const hash_chain, VP8LBackwardRefs* const refs); static void ConvertPopulationCountTableToBitEstimates( @@ -558,16 +573,14 @@ static WEBP_INLINE double GetDistanceCost(const CostModel* const m, return m->distance_[code] + extra_bits; } -static void AddSingleLiteralWithCostModel( - const uint32_t* const argb, VP8LHashChain* const hash_chain, - VP8LColorCache* const hashers, const CostModel* const cost_model, int idx, - int is_last, int use_color_cache, double prev_cost, float* const cost, - uint16_t* const dist_array) { +static void AddSingleLiteralWithCostModel(const uint32_t* const argb, + VP8LColorCache* const hashers, + const CostModel* const cost_model, + int idx, int use_color_cache, + double prev_cost, float* const cost, + uint16_t* const dist_array) { double cost_val = prev_cost; const uint32_t color = argb[0]; - if (!is_last) { - HashChainInsert(hash_chain, argb, idx); - } if (use_color_cache && VP8LColorCacheContains(hashers, color)) { const double mul0 = 0.68; const int ix = VP8LColorCacheGetIndex(hashers, color); @@ -583,30 +596,598 @@ static void AddSingleLiteralWithCostModel( } } +// ----------------------------------------------------------------------------- +// CostManager and interval handling + +// Empirical value to avoid high memory consumption but good for performance. +#define COST_CACHE_INTERVAL_SIZE_MAX 100 + +// To perform backward reference every pixel at index index_ is considered and +// the cost for the MAX_LENGTH following pixels computed. 
Those following pixels +// at index index_ + k (k from 0 to MAX_LENGTH) have a cost of: +// distance_cost_ at index_ + GetLengthCost(cost_model, k) +// (named cost) (named cached cost) +// and the minimum value is kept. GetLengthCost(cost_model, k) is cached in an +// array of size MAX_LENGTH. +// Instead of performing MAX_LENGTH comparisons per pixel, we keep track of the +// minimal values using intervals, for which lower_ and upper_ bounds are kept. +// An interval is defined by the index_ of the pixel that generated it and +// is only useful in a range of indices from start_ to end_ (exclusive), i.e. +// it contains the minimum value for pixels between start_ and end_. +// Intervals are stored in a linked list and ordered by start_. When a new +// interval has a better minimum, old intervals are split or removed. +typedef struct CostInterval CostInterval; +struct CostInterval { + double lower_; + double upper_; + int start_; + int end_; + double distance_cost_; + int index_; + CostInterval* previous_; + CostInterval* next_; +}; + +// The GetLengthCost(cost_model, k) part of the costs is also bounded for +// efficiency in a set of intervals of a different type. +// If those intervals are small enough, they are not used for comparison and +// written into the costs right away. +typedef struct { + double lower_; // Lower bound of the interval. + double upper_; // Upper bound of the interval. + int start_; + int end_; // Exclusive. + int do_write_; // If !=0, the interval is saved to cost instead of being kept + // for comparison. +} CostCacheInterval; + +// This structure is in charge of managing intervals and costs. +// It caches the different CostCacheInterval, caches the different +// GetLengthCost(cost_model, k) in cost_cache_ and the CostInterval's (whose +// count_ is limited by COST_CACHE_INTERVAL_SIZE_MAX). +#define COST_MANAGER_MAX_FREE_LIST 10 +typedef struct { + CostInterval* head_; + int count_; // The number of stored intervals. 
+ CostCacheInterval* cache_intervals_; + size_t cache_intervals_size_; + double cost_cache_[MAX_LENGTH]; // Contains the GetLengthCost(cost_model, k). + double min_cost_cache_; // The minimum value in cost_cache_[1:]. + double max_cost_cache_; // The maximum value in cost_cache_[1:]. + float* costs_; + uint16_t* dist_array_; + // Most of the time, we only need a few intervals -> use a free-list, to avoid + // fragmentation with small allocs in most common cases. + CostInterval intervals_[COST_MANAGER_MAX_FREE_LIST]; + CostInterval* free_intervals_; + // These are regularly malloc'd remains. Note: this list can't grow larger + // than COST_CACHE_INTERVAL_SIZE_MAX - COST_MANAGER_MAX_FREE_LIST entries. + CostInterval* recycled_intervals_; + // Buffer used in BackwardReferencesHashChainDistanceOnly to store the ends + // of the intervals that may have impacted the cost at a pixel. + int* interval_ends_; + int interval_ends_size_; +} CostManager; + +static int IsCostCacheIntervalWritable(int start, int end) { + // 100 is the length for which we consider an interval for comparison, and not + // for writing. + // The first intervals are very small and go in increasing size. This constant + // helps merge them into one big interval (up to index 150/200 usually, from + // which point intervals start getting much bigger). + // This value is empirical.
+ return (end - start + 1 < 100); +} + +static void CostIntervalAddToFreeList(CostManager* const manager, + CostInterval* const interval) { + interval->next_ = manager->free_intervals_; + manager->free_intervals_ = interval; +} + +static int CostIntervalIsInFreeList(const CostManager* const manager, + const CostInterval* const interval) { + return (interval >= &manager->intervals_[0] && + interval <= &manager->intervals_[COST_MANAGER_MAX_FREE_LIST - 1]); +} + +static void CostManagerInitFreeList(CostManager* const manager) { + int i; + manager->free_intervals_ = NULL; + for (i = 0; i < COST_MANAGER_MAX_FREE_LIST; ++i) { + CostIntervalAddToFreeList(manager, &manager->intervals_[i]); + } +} + +static void DeleteIntervalList(CostManager* const manager, + const CostInterval* interval) { + while (interval != NULL) { + const CostInterval* const next = interval->next_; + if (!CostIntervalIsInFreeList(manager, interval)) { + WebPSafeFree((void*)interval); + } // else: do nothing + interval = next; + } +} + +static void CostManagerClear(CostManager* const manager) { + if (manager == NULL) return; + + WebPSafeFree(manager->costs_); + WebPSafeFree(manager->cache_intervals_); + WebPSafeFree(manager->interval_ends_); + + // Clear the interval lists. + DeleteIntervalList(manager, manager->head_); + manager->head_ = NULL; + DeleteIntervalList(manager, manager->recycled_intervals_); + manager->recycled_intervals_ = NULL; + + // Reset pointers, count_ and cache_intervals_size_. + memset(manager, 0, sizeof(*manager)); + CostManagerInitFreeList(manager); +} + +static int CostManagerInit(CostManager* const manager, + uint16_t* const dist_array, int pix_count, + const CostModel* const cost_model) { + int i; + const int cost_cache_size = (pix_count > MAX_LENGTH) ? MAX_LENGTH : pix_count; + // This constant is tied to the cost_model we use. + // Empirically, differences between intervals are usually more than 1.
+ const double min_cost_diff = 0.1; + + manager->costs_ = NULL; + manager->cache_intervals_ = NULL; + manager->interval_ends_ = NULL; + manager->head_ = NULL; + manager->recycled_intervals_ = NULL; + manager->count_ = 0; + manager->dist_array_ = dist_array; + CostManagerInitFreeList(manager); + + // Fill in the cost_cache_. + manager->cache_intervals_size_ = 1; + manager->cost_cache_[0] = 0; + for (i = 1; i < cost_cache_size; ++i) { + manager->cost_cache_[i] = GetLengthCost(cost_model, i); + // Get an approximation of the number of bound intervals. + if (fabs(manager->cost_cache_[i] - manager->cost_cache_[i - 1]) > + min_cost_diff) { + ++manager->cache_intervals_size_; + } + // Compute the minimum of cost_cache_. + if (i == 1) { + manager->min_cost_cache_ = manager->cost_cache_[1]; + manager->max_cost_cache_ = manager->cost_cache_[1]; + } else if (manager->cost_cache_[i] < manager->min_cost_cache_) { + manager->min_cost_cache_ = manager->cost_cache_[i]; + } else if (manager->cost_cache_[i] > manager->max_cost_cache_) { + manager->max_cost_cache_ = manager->cost_cache_[i]; + } + } + + // With the current cost models, we have 15 intervals, so we are safe by + // setting a maximum of COST_CACHE_INTERVAL_SIZE_MAX. + if (manager->cache_intervals_size_ > COST_CACHE_INTERVAL_SIZE_MAX) { + manager->cache_intervals_size_ = COST_CACHE_INTERVAL_SIZE_MAX; + } + manager->cache_intervals_ = (CostCacheInterval*)WebPSafeMalloc( + manager->cache_intervals_size_, sizeof(*manager->cache_intervals_)); + if (manager->cache_intervals_ == NULL) { + CostManagerClear(manager); + return 0; + } + + // Fill in the cache_intervals_. 
+ { + double cost_prev = -1e38f; // improbably low initial value + CostCacheInterval* prev = NULL; + CostCacheInterval* cur = manager->cache_intervals_; + const CostCacheInterval* const end = + manager->cache_intervals_ + manager->cache_intervals_size_; + + // Consecutive values in cost_cache_ are compared and if a big enough + // difference is found, a new interval is created and bounded. + for (i = 0; i < cost_cache_size; ++i) { + const double cost_val = manager->cost_cache_[i]; + if (i == 0 || + (fabs(cost_val - cost_prev) > min_cost_diff && cur + 1 < end)) { + if (i > 1) { + const int is_writable = + IsCostCacheIntervalWritable(cur->start_, cur->end_); + // Merge with the previous interval if both are writable. + if (is_writable && cur != manager->cache_intervals_ && + prev->do_write_) { + // Update the previous interval. + prev->end_ = cur->end_; + if (cur->lower_ < prev->lower_) { + prev->lower_ = cur->lower_; + } else if (cur->upper_ > prev->upper_) { + prev->upper_ = cur->upper_; + } + } else { + cur->do_write_ = is_writable; + prev = cur; + ++cur; + } + } + // Initialize an interval. + cur->start_ = i; + cur->do_write_ = 0; + cur->lower_ = cost_val; + cur->upper_ = cost_val; + } else { + // Update the current interval bounds. + if (cost_val < cur->lower_) { + cur->lower_ = cost_val; + } else if (cost_val > cur->upper_) { + cur->upper_ = cost_val; + } + } + cur->end_ = i + 1; + cost_prev = cost_val; + } + manager->cache_intervals_size_ = cur + 1 - manager->cache_intervals_; + } + + manager->costs_ = (float*)WebPSafeMalloc(pix_count, sizeof(*manager->costs_)); + if (manager->costs_ == NULL) { + CostManagerClear(manager); + return 0; + } + // Set the initial costs_ high for every pixel as we will keep the minimum. + for (i = 0; i < pix_count; ++i) manager->costs_[i] = 1e38f; + + // The cost at a pixel is influenced by the cost intervals from previous pixels.
+  // Let us take the specific case where the offset is the same (which actually
+  // happens a lot in case of uniform regions).
+  // pixel i contributes to j>i a cost of: offset cost + cost_cache_[j-i]
+  // pixel i+1 contributes to j>i a cost of: 2*offset cost + cost_cache_[j-i-1]
+  // pixel i+2 contributes to j>i a cost of: 3*offset cost + cost_cache_[j-i-2]
+  // and so on.
+  // A pixel i influences the following length(j) < MAX_LENGTH pixels. What is
+  // the value of j such that pixel i + j cannot influence any of those pixels?
+  // This value is such that:
+  //     max of cost_cache_ < j*offset cost + min of cost_cache_
+  // (pixel i + j 's cost cannot beat the worst cost given by pixel i).
+  // This value will be used to optimize the cost computation in
+  // BackwardReferencesHashChainDistanceOnly.
+  {
+    // The offset cost is computed in GetDistanceCost and has a minimum value of
+    // the minimum in cost_model->distance_. The case where the offset cost is 0
+    // will be dealt with differently later so we are only interested in the
+    // minimum non-zero offset cost.
+    double offset_cost_min = 0.;
+    int size;
+    for (i = 0; i < NUM_DISTANCE_CODES; ++i) {
+      if (cost_model->distance_[i] != 0) {
+        if (offset_cost_min == 0.) {
+          offset_cost_min = cost_model->distance_[i];
+        } else if (cost_model->distance_[i] < offset_cost_min) {
+          offset_cost_min = cost_model->distance_[i];
+        }
+      }
+    }
+    // In case all the cost_model->distance_ is 0, the next non-zero cost we
+    // can have is from the extra bit in GetDistanceCost, hence 1.
+    if (offset_cost_min < 1.) offset_cost_min = 1.;
+
+    size = 1 + (int)ceil((manager->max_cost_cache_ - manager->min_cost_cache_) /
+                         offset_cost_min);
+    // Empirically, we usually end up with a value below 100.
+    if (size > MAX_LENGTH) size = MAX_LENGTH;
+
+    manager->interval_ends_ =
+        (int*)WebPSafeMalloc(size, sizeof(*manager->interval_ends_));
+    if (manager->interval_ends_ == NULL) {
+      CostManagerClear(manager);
+      return 0;
+    }
+    manager->interval_ends_size_ = size;
+  }
+
+  return 1;
+}
+
+// Given the distance_cost for pixel 'index', update the cost at pixel 'i' if it
+// is smaller than the previously computed value.
+static WEBP_INLINE void UpdateCost(CostManager* const manager, int i, int index,
+                                   double distance_cost) {
+  int k = i - index;
+  double cost_tmp;
+  assert(k >= 0 && k < MAX_LENGTH);
+  cost_tmp = distance_cost + manager->cost_cache_[k];
+
+  if (manager->costs_[i] > cost_tmp) {
+    manager->costs_[i] = (float)cost_tmp;
+    manager->dist_array_[i] = k + 1;
+  }
+}
+
+// Given the distance_cost for pixel 'index', update the cost for all the pixels
+// between 'start' (inclusive) and 'end' (exclusive).
+static WEBP_INLINE void UpdateCostPerInterval(CostManager* const manager,
+                                              int start, int end, int index,
+                                              double distance_cost) {
+  int i;
+  for (i = start; i < end; ++i) UpdateCost(manager, i, index, distance_cost);
+}
+
+// Given two intervals, make 'prev' be the previous one of 'next' in 'manager'.
+static WEBP_INLINE void ConnectIntervals(CostManager* const manager,
+                                         CostInterval* const prev,
+                                         CostInterval* const next) {
+  if (prev != NULL) {
+    prev->next_ = next;
+  } else {
+    manager->head_ = next;
+  }
+
+  if (next != NULL) next->previous_ = prev;
+}
+
+// Pop an interval in the manager.
+static WEBP_INLINE void PopInterval(CostManager* const manager,
+                                    CostInterval* const interval) {
+  CostInterval* next;
+
+  if (interval == NULL) return;
+
+  next = interval->next_;  // read only after the NULL check above
+  ConnectIntervals(manager, interval->previous_, next);
+  if (CostIntervalIsInFreeList(manager, interval)) {
+    CostIntervalAddToFreeList(manager, interval);
+  } else {  // recycle regularly malloc'd intervals too
+    interval->next_ = manager->recycled_intervals_;
+    manager->recycled_intervals_ = interval;
+  }
+  --manager->count_;
+  assert(manager->count_ >= 0);
+}
+
+// Update the cost at index i by going over all the stored intervals that
+// overlap with i.
+static WEBP_INLINE void UpdateCostPerIndex(CostManager* const manager, int i) {
+  CostInterval* current = manager->head_;
+
+  while (current != NULL && current->start_ <= i) {
+    if (current->end_ <= i) {
+      // We have an outdated interval, remove it.
+      CostInterval* const next = current->next_;
+      PopInterval(manager, current);
+      current = next;
+    } else {
+      UpdateCost(manager, i, current->index_, current->distance_cost_);
+      current = current->next_;
+    }
+  }
+}
+
+// Given a current orphan interval and its previous interval before it was
+// orphaned (which can be NULL), set it at the right place in the list
+// of intervals using the start_ ordering and the previous interval as a hint.
+static WEBP_INLINE void PositionOrphanInterval(CostManager* const manager,
+                                               CostInterval* const current,
+                                               CostInterval* previous) {
+  assert(current != NULL);
+
+  if (previous == NULL) previous = manager->head_;
+  while (previous != NULL && current->start_ < previous->start_) {
+    previous = previous->previous_;
+  }
+  while (previous != NULL && previous->next_ != NULL &&
+         previous->next_->start_ < current->start_) {
+    previous = previous->next_;
+  }
+
+  if (previous != NULL) {
+    ConnectIntervals(manager, current, previous->next_);
+  } else {
+    ConnectIntervals(manager, current, manager->head_);
+  }
+  ConnectIntervals(manager, previous, current);
+}
+
+// Insert an interval in the list contained in the manager by starting at
+// interval_in as a hint. The intervals are sorted by start_ value.
+static WEBP_INLINE void InsertInterval(CostManager* const manager,
+                                       CostInterval* const interval_in,
+                                       double distance_cost, double lower,
+                                       double upper, int index, int start,
+                                       int end) {
+  CostInterval* interval_new;
+
+  if (IsCostCacheIntervalWritable(start, end) ||
+      manager->count_ >= COST_CACHE_INTERVAL_SIZE_MAX) {
+    // Write down the interval if it is too small.
+    UpdateCostPerInterval(manager, start, end, index, distance_cost);
+    return;
+  }
+  if (manager->free_intervals_ != NULL) {
+    interval_new = manager->free_intervals_;
+    manager->free_intervals_ = interval_new->next_;
+  } else if (manager->recycled_intervals_ != NULL) {
+    interval_new = manager->recycled_intervals_;
+    manager->recycled_intervals_ = interval_new->next_;
+  } else {  // malloc for good
+    interval_new = (CostInterval*)WebPSafeMalloc(1, sizeof(*interval_new));
+    if (interval_new == NULL) {
+      // Write down the interval if we cannot create it.
+      UpdateCostPerInterval(manager, start, end, index, distance_cost);
+      return;
+    }
+  }
+
+  interval_new->distance_cost_ = distance_cost;
+  interval_new->lower_ = lower;
+  interval_new->upper_ = upper;
+  interval_new->index_ = index;
+  interval_new->start_ = start;
+  interval_new->end_ = end;
+  PositionOrphanInterval(manager, interval_new, interval_in);
+
+  ++manager->count_;
+}
+
+// When an interval has its start_ or end_ modified, it needs to be
+// repositioned in the linked list.
+static WEBP_INLINE void RepositionInterval(CostManager* const manager,
+                                           CostInterval* const interval) {
+  if (IsCostCacheIntervalWritable(interval->start_, interval->end_)) {
+    // Maybe interval has been resized and is small enough to be removed.
+    UpdateCostPerInterval(manager, interval->start_, interval->end_,
+                          interval->index_, interval->distance_cost_);
+    PopInterval(manager, interval);
+    return;
+  }
+
+  // Early exit if interval is at the right spot.
+  if ((interval->previous_ == NULL ||
+       interval->previous_->start_ <= interval->start_) &&
+      (interval->next_ == NULL ||
+       interval->start_ <= interval->next_->start_)) {
+    return;
+  }
+
+  ConnectIntervals(manager, interval->previous_, interval->next_);
+  PositionOrphanInterval(manager, interval, interval->previous_);
+}
+
+// Given a new cost interval defined by its start at index, its last value and
+// distance_cost, add its contributions to the previous intervals and costs.
+// If handling the interval or one of its subintervals becomes too heavy, its
+// contribution is added to the costs right away.
+static WEBP_INLINE void PushInterval(CostManager* const manager,
+                                     double distance_cost, int index,
+                                     int last) {
+  size_t i;
+  CostInterval* interval = manager->head_;
+  CostInterval* interval_next;
+  const CostCacheInterval* const cost_cache_intervals =
+      manager->cache_intervals_;
+
+  for (i = 0; i < manager->cache_intervals_size_ &&
+              cost_cache_intervals[i].start_ < last;
+       ++i) {
+    // Define the intersection of the ith interval with the new one.
+    int start = index + cost_cache_intervals[i].start_;
+    const int end = index + (cost_cache_intervals[i].end_ > last
+                                 ? last
+                                 : cost_cache_intervals[i].end_);
+    const double lower_in = cost_cache_intervals[i].lower_;
+    const double upper_in = cost_cache_intervals[i].upper_;
+    const double lower_full_in = distance_cost + lower_in;
+    const double upper_full_in = distance_cost + upper_in;
+
+    if (cost_cache_intervals[i].do_write_) {
+      UpdateCostPerInterval(manager, start, end, index, distance_cost);
+      continue;
+    }
+
+    for (; interval != NULL && interval->start_ < end && start < end;
+         interval = interval_next) {
+      const double lower_full_interval =
+          interval->distance_cost_ + interval->lower_;
+      const double upper_full_interval =
+          interval->distance_cost_ + interval->upper_;
+
+      interval_next = interval->next_;
+
+      // Make sure we have some overlap.
+      if (start >= interval->end_) continue;
+
+      if (lower_full_in >= upper_full_interval) {
+        // When intervals are represented, the lower, the better.
+        // [**********************************************************]
+        // start                                                    end
+        //                   [----------------------------------]
+        //                   interval->start_       interval->end_
+        // If we are worse than what we already have, add whatever we have so
+        // far up to interval.
+        const int start_new = interval->end_;
+        InsertInterval(manager, interval, distance_cost, lower_in, upper_in,
+                       index, start, interval->start_);
+        start = start_new;
+        continue;
+      }
+
+      // We know the two intervals intersect.
+      if (upper_full_in >= lower_full_interval) {
+        // There is no clear cut on which is best, so let's keep both.
+        // [*********[*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*]***********]
+        // start     interval->start_     interval->end_        end
+        // OR
+        // [*********[*-*-*-*-*-*-*-*-*-*-*-]----------------------]
+        // start     interval->start_     end       interval->end_
+        const int end_new = (interval->end_ <= end) ? interval->end_ : end;
+        InsertInterval(manager, interval, distance_cost, lower_in, upper_in,
+                       index, start, end_new);
+        start = end_new;
+      } else if (start <= interval->start_ && interval->end_ <= end) {
+        //                  [----------------------------------]
+        //                  interval->start_       interval->end_
+        // [**************************************************************]
+        // start                                                        end
+        // We can safely remove the old interval as it is fully included.
+        PopInterval(manager, interval);
+      } else {
+        if (interval->start_ <= start && end <= interval->end_) {
+          // [--------------------------------------------------------------]
+          // interval->start_                                 interval->end_
+          //                     [*****************************]
+          //                     start                       end
+          // We have to split the old interval as it fully contains the
+          // new one.
+          const int end_original = interval->end_;
+          interval->end_ = start;
+          InsertInterval(manager, interval, interval->distance_cost_,
+                         interval->lower_, interval->upper_, interval->index_,
+                         end, end_original);
+        } else if (interval->start_ < start) {
+          // [------------------------------------]
+          // interval->start_       interval->end_
+          //                [*****************************]
+          //                start                       end
+          interval->end_ = start;
+        } else {
+          //              [------------------------------------]
+          //              interval->start_       interval->end_
+          // [*****************************]
+          // start                       end
+          interval->start_ = end;
+        }
+
+        // The interval has been modified, we need to reposition it or
+        // write it.
+        RepositionInterval(manager, interval);
+      }
+    }
+    // Insert the remaining interval from start to end.
+    InsertInterval(manager, interval, distance_cost, lower_in, upper_in, index,
+                   start, end);
+  }
+}
+
 static int BackwardReferencesHashChainDistanceOnly(
-    int xsize, int ysize, const uint32_t* const argb,
-    int quality, int cache_bits, VP8LHashChain* const hash_chain,
+    int xsize, int ysize, const uint32_t* const argb, int quality,
+    int cache_bits, const VP8LHashChain* const hash_chain,
     VP8LBackwardRefs* const refs, uint16_t* const dist_array) {
   int i;
   int ok = 0;
   int cc_init = 0;
   const int pix_count = xsize * ysize;
   const int use_color_cache = (cache_bits > 0);
-  float* const cost =
-      (float*)WebPSafeMalloc(pix_count, sizeof(*cost));
   const size_t literal_array_size = sizeof(double) *
       (NUM_LITERAL_CODES + NUM_LENGTH_CODES +
        ((cache_bits > 0) ? (1 << cache_bits) : 0));
   const size_t cost_model_size = sizeof(CostModel) + literal_array_size;
   CostModel* const cost_model =
-      (CostModel*)WebPSafeMalloc(1ULL, cost_model_size);
+      (CostModel*)WebPSafeCalloc(1ULL, cost_model_size);
   VP8LColorCache hashers;
   const int skip_length = 32 + quality;
   const int skip_min_distance_code = 2;
-  int iter_max = GetMaxItersForQuality(quality, 0);
-  const int window_size = GetWindowSizeForHashChain(quality, xsize);
+  CostManager* cost_manager =
+      (CostManager*)WebPSafeMalloc(1ULL, sizeof(*cost_manager));
 
-  if (cost == NULL || cost_model == NULL) goto Error;
+  if (cost_model == NULL || cost_manager == NULL) goto Error;
 
   cost_model->literal_ = (double*)(cost_model + 1);
   if (use_color_cache) {
@@ -618,34 +1199,91 @@ static int BackwardReferencesHashChainDistanceOnly(
     goto Error;
   }
 
-  for (i = 0; i < pix_count; ++i) cost[i] = 1e38f;
+  if (!CostManagerInit(cost_manager, dist_array, pix_count, cost_model)) {
+    goto Error;
+  }
 
   // We loop one pixel at a time, but store all currently best points to
   // non-processed locations from this point.
   dist_array[0] = 0;
-  HashChainReset(hash_chain);
   // Add first pixel as literal.
-  AddSingleLiteralWithCostModel(argb + 0, hash_chain, &hashers, cost_model, 0,
-                                0, use_color_cache, 0.0, cost, dist_array);
+  AddSingleLiteralWithCostModel(argb + 0, &hashers, cost_model, 0,
+                                use_color_cache, 0.0, cost_manager->costs_,
+                                dist_array);
+
   for (i = 1; i < pix_count - 1; ++i) {
-    int offset = 0;
-    int len = 0;
-    double prev_cost = cost[i - 1];
-    const int max_len = MaxFindCopyLength(pix_count - i);
-    HashChainFindCopy(hash_chain, i, argb, max_len, window_size,
-                      iter_max, &offset, &len);
+    int offset = 0, len = 0;
+    double prev_cost = cost_manager->costs_[i - 1];
+    HashChainFindCopy(hash_chain, i, &offset, &len);
     if (len >= MIN_LENGTH) {
       const int code = DistanceToPlaneCode(xsize, offset);
-      const double distance_cost =
-          prev_cost + GetDistanceCost(cost_model, code);
-      int k;
-      for (k = 1; k < len; ++k) {
-        const double cost_val = distance_cost + GetLengthCost(cost_model, k);
-        if (cost[i + k] > cost_val) {
-          cost[i + k] = (float)cost_val;
-          dist_array[i + k] = k + 1;
+      const double offset_cost = GetDistanceCost(cost_model, code);
+      const int first_i = i;
+      int j_max = 0, interval_ends_index = 0;
+      const int is_offset_zero = (offset_cost == 0.);
+
+      if (!is_offset_zero) {
+        j_max = (int)ceil(
+            (cost_manager->max_cost_cache_ - cost_manager->min_cost_cache_) /
+            offset_cost);
+        if (j_max < 1) {
+          j_max = 1;
+        } else if (j_max > cost_manager->interval_ends_size_ - 1) {
+          // This could only happen in the case of MAX_LENGTH.
+          j_max = cost_manager->interval_ends_size_ - 1;
+        }
+      }  // else j_max is unused anyway.
+
+      // Instead of considering all contributions from a pixel i by calling:
+      //   PushInterval(cost_manager, prev_cost + offset_cost, i, len);
+      // we optimize these contributions in case offset_cost stays the same for
+      // consecutive pixels. This describes a set of pixels similar to a
+      // previous set (e.g. constant color regions).
+      for (; i < pix_count - 1; ++i) {
+        int offset_next, len_next;
+        prev_cost = cost_manager->costs_[i - 1];
+
+        if (is_offset_zero) {
+          // No optimization can be made so we just push all of the
+          // contributions from i.
+          PushInterval(cost_manager, prev_cost, i, len);
+        } else {
+          // j_max is chosen as the smallest j such that:
+          //   max of cost_cache_ < j*offset cost + min of cost_cache_
+          // Therefore, the pixel influenced by i-j_max, cannot be influenced
+          // by i. Only the costs after the end of what i contributed need to
+          // be updated. cost_manager->interval_ends_ is a circular buffer
+          // that stores those ends.
+          const double distance_cost = prev_cost + offset_cost;
+          int j = cost_manager->interval_ends_[interval_ends_index];
+          if (i - first_i <= j_max ||
+              !IsCostCacheIntervalWritable(j, i + len)) {
+            PushInterval(cost_manager, distance_cost, i, len);
+          } else {
+            for (; j < i + len; ++j) {
+              UpdateCost(cost_manager, j, i, distance_cost);
+            }
+          }
+          // Store the new end in the circular buffer.
+          assert(interval_ends_index < cost_manager->interval_ends_size_);
+          cost_manager->interval_ends_[interval_ends_index] = i + len;
+          if (++interval_ends_index > j_max) interval_ends_index = 0;
+        }
+
+        // Check whether i is the last pixel to consider, as it is handled
+        // differently.
+        if (i + 1 >= pix_count - 1) break;
+        HashChainFindCopy(hash_chain, i + 1, &offset_next, &len_next);
+        if (offset_next != offset) break;
+        len = len_next;
+        UpdateCostPerIndex(cost_manager, i);
+        AddSingleLiteralWithCostModel(argb + i, &hashers, cost_model, i,
+                                      use_color_cache, prev_cost,
+                                      cost_manager->costs_, dist_array);
       }
+
+      // Submit the last pixel.
+      UpdateCostPerIndex(cost_manager, i + 1);
+
       // This if is for speedup only. It roughly doubles the speed, and
       // makes compression worse by .1 %.
       if (len >= skip_length && code <= skip_min_distance_code) {
@@ -653,53 +1291,55 @@ static int BackwardReferencesHashChainDistanceOnly(
        // Long copy for short distances, let's skip the middle
        // lookups for better copies.
        // 1) insert the hashes.
         if (use_color_cache) {
+          int k;
           for (k = 0; k < len; ++k) {
             VP8LColorCacheInsert(&hashers, argb[i + k]);
           }
         }
-        // 2) Add to the hash_chain (but cannot add the last pixel)
+        // 2) jump.
         {
-          const int last = (len + i < pix_count - 1) ? len + i
-                                                     : pix_count - 1;
-          for (k = i; k < last; ++k) {
-            HashChainInsert(hash_chain, &argb[k], k);
-          }
+          const int i_next = i + len - 1;  // for loop does ++i, thus -1 here.
+          for (; i <= i_next; ++i) UpdateCostPerIndex(cost_manager, i + 1);
+          i = i_next;
         }
-        // 3) jump.
-        i += len - 1;  // for loop does ++i, thus -1 here.
         goto next_symbol;
       }
-      if (len != MIN_LENGTH) {
+      if (len > MIN_LENGTH) {
         int code_min_length;
         double cost_total;
-        HashChainFindOffset(hash_chain, i, argb, MIN_LENGTH, window_size,
-                            &offset);
+        offset = HashChainFindOffset(hash_chain, i);
         code_min_length = DistanceToPlaneCode(xsize, offset);
         cost_total = prev_cost +
                      GetDistanceCost(cost_model, code_min_length) +
                      GetLengthCost(cost_model, 1);
-        if (cost[i + 1] > cost_total) {
-          cost[i + 1] = (float)cost_total;
+        if (cost_manager->costs_[i + 1] > cost_total) {
+          cost_manager->costs_[i + 1] = (float)cost_total;
           dist_array[i + 1] = 2;
         }
       }
+    } else {  // len < MIN_LENGTH
+      UpdateCostPerIndex(cost_manager, i + 1);
     }
-    AddSingleLiteralWithCostModel(argb + i, hash_chain, &hashers, cost_model, i,
-                                  0, use_color_cache, prev_cost, cost,
-                                  dist_array);
+
+    AddSingleLiteralWithCostModel(argb + i, &hashers, cost_model, i,
+                                  use_color_cache, prev_cost,
+                                  cost_manager->costs_, dist_array);
+
  next_symbol: ;
   }
   // Handle the last pixel.
   if (i == (pix_count - 1)) {
-    AddSingleLiteralWithCostModel(argb + i, hash_chain, &hashers, cost_model, i,
-                                  1, use_color_cache, cost[pix_count - 2], cost,
-                                  dist_array);
+    AddSingleLiteralWithCostModel(
+        argb + i, &hashers, cost_model, i, use_color_cache,
+        cost_manager->costs_[pix_count - 2], cost_manager->costs_, dist_array);
   }
+
   ok = !refs->error_;
 Error:
   if (cc_init) VP8LColorCacheClear(&hashers);
+  CostManagerClear(cost_manager);
   WebPSafeFree(cost_model);
-  WebPSafeFree(cost);
+  WebPSafeFree(cost_manager);
   return ok;
 }
@@ -723,18 +1363,14 @@ static void TraceBackwards(uint16_t* const dist_array,
 }
 
 static int BackwardReferencesHashChainFollowChosenPath(
-    int xsize, int ysize, const uint32_t* const argb,
-    int quality, int cache_bits,
+    const uint32_t* const argb, int cache_bits,
     const uint16_t* const chosen_path, int chosen_path_size,
-    VP8LHashChain* const hash_chain,
-    VP8LBackwardRefs* const refs) {
-  const int pix_count = xsize * ysize;
+    const VP8LHashChain* const hash_chain, VP8LBackwardRefs* const refs) {
   const int use_color_cache = (cache_bits > 0);
   int ix;
   int i = 0;
   int ok = 0;
   int cc_init = 0;
-  const int window_size = GetWindowSizeForHashChain(quality, xsize);
   VP8LColorCache hashers;
 
   if (use_color_cache) {
@@ -743,25 +1379,17 @@ static int BackwardReferencesHashChainFollowChosenPath(
   }
 
   ClearBackwardRefs(refs);
-  HashChainReset(hash_chain);
   for (ix = 0; ix < chosen_path_size; ++ix) {
-    int offset = 0;
     const int len = chosen_path[ix];
     if (len != 1) {
       int k;
-      HashChainFindOffset(hash_chain, i, argb, len, window_size, &offset);
+      const int offset = HashChainFindOffset(hash_chain, i);
       BackwardRefsCursorAdd(refs, PixOrCopyCreateCopy(offset, len));
       if (use_color_cache) {
         for (k = 0; k < len; ++k) {
           VP8LColorCacheInsert(&hashers, argb[i + k]);
         }
       }
-      {
-        const int last = (len < pix_count - 1 - i) ? len
-                                                   : pix_count - 1 - i;
-        for (k = 0; k < last; ++k) {
-          HashChainInsert(hash_chain, &argb[i + k], i + k);
-        }
-      }
       i += len;
     } else {
       PixOrCopy v;
@@ -774,9 +1402,6 @@ static int BackwardReferencesHashChainFollowChosenPath(
         v = PixOrCopyCreateLiteral(argb[i]);
       }
       BackwardRefsCursorAdd(refs, v);
-      if (i + 1 < pix_count) {
-        HashChainInsert(hash_chain, &argb[i], i);
-      }
       ++i;
     }
   }
@@ -787,11 +1412,10 @@ static int BackwardReferencesHashChainFollowChosenPath(
 }
 
 // Returns 1 on success.
-static int BackwardReferencesTraceBackwards(int xsize, int ysize,
-                                            const uint32_t* const argb,
-                                            int quality, int cache_bits,
-                                            VP8LHashChain* const hash_chain,
-                                            VP8LBackwardRefs* const refs) {
+static int BackwardReferencesTraceBackwards(
+    int xsize, int ysize, const uint32_t* const argb, int quality,
+    int cache_bits, const VP8LHashChain* const hash_chain,
+    VP8LBackwardRefs* const refs) {
   int ok = 0;
   const int dist_array_size = xsize * ysize;
   uint16_t* chosen_path = NULL;
@@ -808,8 +1432,7 @@ static int BackwardReferencesTraceBackwards(int xsize, int ysize,
   }
   TraceBackwards(dist_array, dist_array_size, &chosen_path, &chosen_path_size);
   if (!BackwardReferencesHashChainFollowChosenPath(
-          xsize, ysize, argb, quality, cache_bits, chosen_path,
-          chosen_path_size, hash_chain, refs)) {
+          argb, cache_bits, chosen_path, chosen_path_size, hash_chain, refs)) {
     goto Error;
   }
   ok = 1;
@@ -897,7 +1520,7 @@ static double ComputeCacheEntropy(const uint32_t* argb,
 // Returns 0 in case of memory error.
 static int CalculateBestCacheSize(const uint32_t* const argb,
                                   int xsize, int ysize, int quality,
-                                  VP8LHashChain* const hash_chain,
+                                  const VP8LHashChain* const hash_chain,
                                   VP8LBackwardRefs* const refs,
                                   int* const lz77_computed,
                                   int* const best_cache_bits) {
@@ -917,8 +1540,8 @@ static int CalculateBestCacheSize(const uint32_t* const argb,
     // Local color cache is disabled.
     return 1;
   }
-  if (!BackwardReferencesLz77(xsize, ysize, argb, cache_bits_low, quality, 0,
-                              hash_chain, refs)) {
+  if (!BackwardReferencesLz77(xsize, ysize, argb, cache_bits_low, hash_chain,
+                              refs)) {
     return 0;
   }
   // Do a binary search to find the optimal entropy for cache_bits.
@@ -983,13 +1606,12 @@ static int BackwardRefsWithLocalCache(const uint32_t* const argb,
 }
 
 static VP8LBackwardRefs* GetBackwardReferencesLowEffort(
-    int width, int height, const uint32_t* const argb, int quality,
-    int* const cache_bits, VP8LHashChain* const hash_chain,
+    int width, int height, const uint32_t* const argb,
+    int* const cache_bits, const VP8LHashChain* const hash_chain,
     VP8LBackwardRefs refs_array[2]) {
   VP8LBackwardRefs* refs_lz77 = &refs_array[0];
   *cache_bits = 0;
-  if (!BackwardReferencesLz77(width, height, argb, 0, quality,
-                              1 /* Low effort. */, hash_chain, refs_lz77)) {
+  if (!BackwardReferencesLz77(width, height, argb, 0, hash_chain, refs_lz77)) {
     return NULL;
   }
   BackwardReferences2DLocality(width, refs_lz77);
@@ -998,7 +1620,7 @@ static VP8LBackwardRefs* GetBackwardReferencesLowEffort(
 
 static VP8LBackwardRefs* GetBackwardReferences(
     int width, int height, const uint32_t* const argb, int quality,
-    int* const cache_bits, VP8LHashChain* const hash_chain,
+    int* const cache_bits, const VP8LHashChain* const hash_chain,
     VP8LBackwardRefs refs_array[2]) {
   int lz77_is_useful;
   int lz77_computed;
@@ -1021,8 +1643,8 @@ static VP8LBackwardRefs* GetBackwardReferences(
       }
     }
   } else {
-    if (!BackwardReferencesLz77(width, height, argb, *cache_bits, quality,
-                                0 /* Low effort. */, hash_chain, refs_lz77)) {
+    if (!BackwardReferencesLz77(width, height, argb, *cache_bits, hash_chain,
+                                refs_lz77)) {
       goto Error;
     }
   }
@@ -1081,11 +1703,11 @@ static VP8LBackwardRefs* GetBackwardReferences(
 
 VP8LBackwardRefs* VP8LGetBackwardReferences(
     int width, int height, const uint32_t* const argb, int quality,
-    int low_effort, int* const cache_bits, VP8LHashChain* const hash_chain,
-    VP8LBackwardRefs refs_array[2]) {
+    int low_effort, int* const cache_bits,
+    const VP8LHashChain* const hash_chain, VP8LBackwardRefs refs_array[2]) {
   if (low_effort) {
-    return GetBackwardReferencesLowEffort(width, height, argb, quality,
-                                          cache_bits, hash_chain, refs_array);
+    return GetBackwardReferencesLowEffort(width, height, argb, cache_bits,
+                                          hash_chain, refs_array);
   } else {
     return GetBackwardReferences(width, height, argb, quality, cache_bits,
                                  hash_chain, refs_array);
diff --git a/src/3rdparty/libwebp/src/enc/backward_references.h b/src/3rdparty/libwebp/src/enc/backward_references.h
index daa084d..0cadb11 100644
--- a/src/3rdparty/libwebp/src/enc/backward_references.h
+++ b/src/3rdparty/libwebp/src/enc/backward_references.h
@@ -115,11 +115,12 @@ static WEBP_INLINE uint32_t PixOrCopyDistance(const PixOrCopy* const p) {
 typedef struct VP8LHashChain VP8LHashChain;
 struct VP8LHashChain {
-  // Stores the most recently added position with the given hash value.
-  int32_t hash_to_first_index_[HASH_SIZE];
-  // chain_[pos] stores the previous position with the same hash value
-  // for every pixel in the image.
-  int32_t* chain_;
+  // The 20 most significant bits contain the offset at which the best match
+  // is found. These 20 bits are the limit defined by GetWindowSizeForHashChain
+  // (through WINDOW_SIZE = 1<<20).
+  // The lower 12 bits contain the length of the match. The 12 bit limit is
+  // defined in MaxFindCopyLength with MAX_LENGTH=4096.
+  uint32_t* offset_length_;
   // This is the maximum size of the hash_chain that can be constructed.
// Typically this is the pixel count (width x height) for a given image. int size_; @@ -127,6 +128,9 @@ struct VP8LHashChain { // Must be called first, to set size. int VP8LHashChainInit(VP8LHashChain* const p, int size); +// Pre-compute the best matches for argb. +int VP8LHashChainFill(VP8LHashChain* const p, int quality, + const uint32_t* const argb, int xsize, int ysize); void VP8LHashChainClear(VP8LHashChain* const p); // release memory // ----------------------------------------------------------------------------- @@ -192,8 +196,8 @@ static WEBP_INLINE void VP8LRefsCursorNext(VP8LRefsCursor* const c) { // refs[0] or refs[1]. VP8LBackwardRefs* VP8LGetBackwardReferences( int width, int height, const uint32_t* const argb, int quality, - int low_effort, int* const cache_bits, VP8LHashChain* const hash_chain, - VP8LBackwardRefs refs[2]); + int low_effort, int* const cache_bits, + const VP8LHashChain* const hash_chain, VP8LBackwardRefs refs[2]); #ifdef __cplusplus } diff --git a/src/3rdparty/libwebp/src/enc/filter.c b/src/3rdparty/libwebp/src/enc/filter.c index 41813cf..e8ea8b4 100644 --- a/src/3rdparty/libwebp/src/enc/filter.c +++ b/src/3rdparty/libwebp/src/enc/filter.c @@ -107,10 +107,9 @@ static void DoFilter(const VP8EncIterator* const it, int level) { //------------------------------------------------------------------------------ // SSIM metric -enum { KERNEL = 3 }; static const double kMinValue = 1.e-10; // minimal threshold -void VP8SSIMAddStats(const DistoStats* const src, DistoStats* const dst) { +void VP8SSIMAddStats(const VP8DistoStats* const src, VP8DistoStats* const dst) { dst->w += src->w; dst->xm += src->xm; dst->ym += src->ym; @@ -119,32 +118,7 @@ void VP8SSIMAddStats(const DistoStats* const src, DistoStats* const dst) { dst->yym += src->yym; } -static void VP8SSIMAccumulate(const uint8_t* src1, int stride1, - const uint8_t* src2, int stride2, - int xo, int yo, int W, int H, - DistoStats* const stats) { - const int ymin = (yo - KERNEL < 0) ? 
0 : yo - KERNEL; - const int ymax = (yo + KERNEL > H - 1) ? H - 1 : yo + KERNEL; - const int xmin = (xo - KERNEL < 0) ? 0 : xo - KERNEL; - const int xmax = (xo + KERNEL > W - 1) ? W - 1 : xo + KERNEL; - int x, y; - src1 += ymin * stride1; - src2 += ymin * stride2; - for (y = ymin; y <= ymax; ++y, src1 += stride1, src2 += stride2) { - for (x = xmin; x <= xmax; ++x) { - const int s1 = src1[x]; - const int s2 = src2[x]; - stats->w += 1; - stats->xm += s1; - stats->ym += s2; - stats->xxm += s1 * s1; - stats->xym += s1 * s2; - stats->yym += s2 * s2; - } - } -} - -double VP8SSIMGet(const DistoStats* const stats) { +double VP8SSIMGet(const VP8DistoStats* const stats) { const double xmxm = stats->xm * stats->xm; const double ymym = stats->ym * stats->ym; const double xmym = stats->xm * stats->ym; @@ -165,7 +139,7 @@ double VP8SSIMGet(const DistoStats* const stats) { return (fden != 0.) ? fnum / fden : kMinValue; } -double VP8SSIMGetSquaredError(const DistoStats* const s) { +double VP8SSIMGetSquaredError(const VP8DistoStats* const s) { if (s->w > 0.) { const double iw2 = 1. / (s->w * s->w); const double sxx = s->xxm * s->w - s->xm * s->xm; @@ -177,34 +151,66 @@ double VP8SSIMGetSquaredError(const DistoStats* const s) { return kMinValue; } +#define LIMIT(A, M) ((A) > (M) ? 
(M) : (A)) +static void VP8SSIMAccumulateRow(const uint8_t* src1, int stride1, + const uint8_t* src2, int stride2, + int y, int W, int H, + VP8DistoStats* const stats) { + int x = 0; + const int w0 = LIMIT(VP8_SSIM_KERNEL, W); + for (x = 0; x < w0; ++x) { + VP8SSIMAccumulateClipped(src1, stride1, src2, stride2, x, y, W, H, stats); + } + for (; x <= W - 8 + VP8_SSIM_KERNEL; ++x) { + VP8SSIMAccumulate( + src1 + (y - VP8_SSIM_KERNEL) * stride1 + (x - VP8_SSIM_KERNEL), stride1, + src2 + (y - VP8_SSIM_KERNEL) * stride2 + (x - VP8_SSIM_KERNEL), stride2, + stats); + } + for (; x < W; ++x) { + VP8SSIMAccumulateClipped(src1, stride1, src2, stride2, x, y, W, H, stats); + } +} + void VP8SSIMAccumulatePlane(const uint8_t* src1, int stride1, const uint8_t* src2, int stride2, - int W, int H, DistoStats* const stats) { + int W, int H, VP8DistoStats* const stats) { int x, y; - for (y = 0; y < H; ++y) { + const int h0 = LIMIT(VP8_SSIM_KERNEL, H); + const int h1 = LIMIT(VP8_SSIM_KERNEL, H - VP8_SSIM_KERNEL); + for (y = 0; y < h0; ++y) { + for (x = 0; x < W; ++x) { + VP8SSIMAccumulateClipped(src1, stride1, src2, stride2, x, y, W, H, stats); + } + } + for (; y < h1; ++y) { + VP8SSIMAccumulateRow(src1, stride1, src2, stride2, y, W, H, stats); + } + for (; y < H; ++y) { for (x = 0; x < W; ++x) { - VP8SSIMAccumulate(src1, stride1, src2, stride2, x, y, W, H, stats); + VP8SSIMAccumulateClipped(src1, stride1, src2, stride2, x, y, W, H, stats); } } } +#undef LIMIT static double GetMBSSIM(const uint8_t* yuv1, const uint8_t* yuv2) { int x, y; - DistoStats s = { .0, .0, .0, .0, .0, .0 }; + VP8DistoStats s = { .0, .0, .0, .0, .0, .0 }; // compute SSIM in a 10 x 10 window - for (x = 3; x < 13; x++) { - for (y = 3; y < 13; y++) { - VP8SSIMAccumulate(yuv1 + Y_OFF_ENC, BPS, yuv2 + Y_OFF_ENC, BPS, - x, y, 16, 16, &s); + for (y = VP8_SSIM_KERNEL; y < 16 - VP8_SSIM_KERNEL; y++) { + for (x = VP8_SSIM_KERNEL; x < 16 - VP8_SSIM_KERNEL; x++) { + VP8SSIMAccumulateClipped(yuv1 + Y_OFF_ENC, BPS, yuv2 + 
Y_OFF_ENC, BPS, + x, y, 16, 16, &s); } } for (x = 1; x < 7; x++) { for (y = 1; y < 7; y++) { - VP8SSIMAccumulate(yuv1 + U_OFF_ENC, BPS, yuv2 + U_OFF_ENC, BPS, - x, y, 8, 8, &s); - VP8SSIMAccumulate(yuv1 + V_OFF_ENC, BPS, yuv2 + V_OFF_ENC, BPS, - x, y, 8, 8, &s); + VP8SSIMAccumulateClipped(yuv1 + U_OFF_ENC, BPS, yuv2 + U_OFF_ENC, BPS, + x, y, 8, 8, &s); + VP8SSIMAccumulateClipped(yuv1 + V_OFF_ENC, BPS, yuv2 + V_OFF_ENC, BPS, + x, y, 8, 8, &s); } } return VP8SSIMGet(&s); @@ -222,6 +228,7 @@ void VP8InitFilter(VP8EncIterator* const it) { (*it->lf_stats_)[s][i] = 0; } } + VP8SSIMDspInit(); } } diff --git a/src/3rdparty/libwebp/src/enc/histogram.c b/src/3rdparty/libwebp/src/enc/histogram.c index 869882d..395372b 100644 --- a/src/3rdparty/libwebp/src/enc/histogram.c +++ b/src/3rdparty/libwebp/src/enc/histogram.c @@ -386,29 +386,27 @@ static void UpdateHistogramCost(VP8LHistogram* const h) { } static int GetBinIdForEntropy(double min, double max, double val) { - const double range = max - min + 1e-6; - const double delta = val - min; - return (int)(NUM_PARTITIONS * delta / range); + const double range = max - min; + if (range > 0.) 
{ + const double delta = val - min; + return (int)((NUM_PARTITIONS - 1e-6) * delta / range); + } else { + return 0; + } } -static int GetHistoBinIndexLowEffort( - const VP8LHistogram* const h, const DominantCostRange* const c) { - const int bin_id = GetBinIdForEntropy(c->literal_min_, c->literal_max_, - h->literal_cost_); +static int GetHistoBinIndex(const VP8LHistogram* const h, + const DominantCostRange* const c, int low_effort) { + int bin_id = GetBinIdForEntropy(c->literal_min_, c->literal_max_, + h->literal_cost_); assert(bin_id < NUM_PARTITIONS); - return bin_id; -} - -static int GetHistoBinIndex( - const VP8LHistogram* const h, const DominantCostRange* const c) { - const int bin_id = - GetBinIdForEntropy(c->blue_min_, c->blue_max_, h->blue_cost_) + - NUM_PARTITIONS * GetBinIdForEntropy(c->red_min_, c->red_max_, - h->red_cost_) + - NUM_PARTITIONS * NUM_PARTITIONS * GetBinIdForEntropy(c->literal_min_, - c->literal_max_, - h->literal_cost_); - assert(bin_id < BIN_SIZE); + if (!low_effort) { + bin_id = bin_id * NUM_PARTITIONS + + GetBinIdForEntropy(c->red_min_, c->red_max_, h->red_cost_); + bin_id = bin_id * NUM_PARTITIONS + + GetBinIdForEntropy(c->blue_min_, c->blue_max_, h->blue_cost_); + assert(bin_id < BIN_SIZE); + } return bin_id; } @@ -469,16 +467,13 @@ static void HistogramAnalyzeEntropyBin(VP8LHistogramSet* const image_histo, // bin-hash histograms on three of the dominant (literal, red and blue) // symbol costs. for (i = 0; i < histo_size; ++i) { - int num_histos; - VP8LHistogram* const histo = histograms[i]; - const int16_t bin_id = low_effort ? - (int16_t)GetHistoBinIndexLowEffort(histo, &cost_range) : - (int16_t)GetHistoBinIndex(histo, &cost_range); + const VP8LHistogram* const histo = histograms[i]; + const int bin_id = GetHistoBinIndex(histo, &cost_range, low_effort); const int bin_offset = bin_id * bin_depth; // bin_map[n][0] for every bin 'n' maintains the counter for the number of // histograms in that bin. 
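The reworked `GetBinIdForEntropy` above maps a cost value into one of `NUM_PARTITIONS` bins while guaranteeing the returned index stays strictly below `NUM_PARTITIONS`, even when `val == max` (the `NUM_PARTITIONS - 1e-6` factor) or when the range is degenerate. A standalone sketch of the new logic; `NUM_PARTITIONS` is assumed here to be 4, its value in libwebp's histogram.c:

```c
#include <assert.h>

#define NUM_PARTITIONS 4  /* assumed; matches libwebp's histogram.c */

/* Maps 'val' in [min, max] to a bin index in [0, NUM_PARTITIONS - 1].
 * The (NUM_PARTITIONS - 1e-6) factor keeps val == max from producing an
 * out-of-range index, and a degenerate range (max <= min) maps to bin 0. */
static int GetBinIdForEntropy(double min, double max, double val) {
  const double range = max - min;
  if (range > 0.) {
    const double delta = val - min;
    return (int)((NUM_PARTITIONS - 1e-6) * delta / range);
  } else {
    return 0;
  }
}
```

With this form, `GetHistoBinIndex` can pack three such indices into one bin id by repeated multiplication with `NUM_PARTITIONS` without ever overflowing `BIN_SIZE`.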
// Get and increment the num_histos in that bin. - num_histos = ++bin_map[bin_offset]; + const int num_histos = ++bin_map[bin_offset]; assert(bin_offset + num_histos < bin_depth * BIN_SIZE); // Add histogram i'th index at num_histos (last) position in the bin_map. bin_map[bin_offset + num_histos] = i; @@ -636,8 +631,11 @@ static void UpdateQueueFront(HistoQueue* const histo_queue) { // ----------------------------------------------------------------------------- static void PreparePair(VP8LHistogram** histograms, int idx1, int idx2, - HistogramPair* const pair, - VP8LHistogram* const histos) { + HistogramPair* const pair) { + VP8LHistogram* h1; + VP8LHistogram* h2; + double sum_cost; + if (idx1 > idx2) { const int tmp = idx2; idx2 = idx1; @@ -645,15 +643,17 @@ static void PreparePair(VP8LHistogram** histograms, int idx1, int idx2, } pair->idx1 = idx1; pair->idx2 = idx2; - pair->cost_diff = - HistogramAddEval(histograms[idx1], histograms[idx2], histos, 0); - pair->cost_combo = histos->bit_cost_; + h1 = histograms[idx1]; + h2 = histograms[idx2]; + sum_cost = h1->bit_cost_ + h2->bit_cost_; + pair->cost_combo = 0.; + GetCombinedHistogramEntropy(h1, h2, sum_cost, &pair->cost_combo); + pair->cost_diff = pair->cost_combo - sum_cost; } // Combines histograms by continuously choosing the one with the highest cost // reduction. -static int HistogramCombineGreedy(VP8LHistogramSet* const image_histo, - VP8LHistogram* const histos) { +static int HistogramCombineGreedy(VP8LHistogramSet* const image_histo) { int ok = 0; int image_histo_size = image_histo->size; int i, j; @@ -672,8 +672,7 @@ static int HistogramCombineGreedy(VP8LHistogramSet* const image_histo, clusters[i] = i; for (j = i + 1; j < image_histo_size; ++j) { // Initialize positions array. 
- PreparePair(histograms, i, j, &histo_queue.queue[histo_queue.size], - histos); + PreparePair(histograms, i, j, &histo_queue.queue[histo_queue.size]); UpdateQueueFront(&histo_queue); } } @@ -715,7 +714,7 @@ static int HistogramCombineGreedy(VP8LHistogramSet* const image_histo, for (i = 0; i < image_histo_size; ++i) { if (clusters[i] != idx1) { PreparePair(histograms, idx1, clusters[i], - &histo_queue.queue[histo_queue.size], histos); + &histo_queue.queue[histo_queue.size]); UpdateQueueFront(&histo_queue); } } @@ -736,11 +735,10 @@ static int HistogramCombineGreedy(VP8LHistogramSet* const image_histo, return ok; } -static VP8LHistogram* HistogramCombineStochastic( - VP8LHistogramSet* const image_histo, - VP8LHistogram* tmp_histo, - VP8LHistogram* best_combo, - int quality, int min_cluster_size) { +static void HistogramCombineStochastic(VP8LHistogramSet* const image_histo, + VP8LHistogram* tmp_histo, + VP8LHistogram* best_combo, + int quality, int min_cluster_size) { int iter; uint32_t seed = 0; int tries_with_no_success = 0; @@ -800,7 +798,6 @@ static VP8LHistogram* HistogramCombineStochastic( } } image_histo->size = image_histo_size; - return best_combo; } // ----------------------------------------------------------------------------- @@ -808,24 +805,23 @@ static VP8LHistogram* HistogramCombineStochastic( // Find the best 'out' histogram for each of the 'in' histograms. // Note: we assume that out[]->bit_cost_ is already up-to-date. 
-static void HistogramRemap(const VP8LHistogramSet* const orig_histo, - const VP8LHistogramSet* const image_histo, +static void HistogramRemap(const VP8LHistogramSet* const in, + const VP8LHistogramSet* const out, uint16_t* const symbols) { int i; - VP8LHistogram** const orig_histograms = orig_histo->histograms; - VP8LHistogram** const histograms = image_histo->histograms; - const int orig_histo_size = orig_histo->size; - const int image_histo_size = image_histo->size; - if (image_histo_size > 1) { - for (i = 0; i < orig_histo_size; ++i) { + VP8LHistogram** const in_histo = in->histograms; + VP8LHistogram** const out_histo = out->histograms; + const int in_size = in->size; + const int out_size = out->size; + if (out_size > 1) { + for (i = 0; i < in_size; ++i) { int best_out = 0; - double best_bits = - HistogramAddThresh(histograms[0], orig_histograms[i], MAX_COST); + double best_bits = MAX_COST; int k; - for (k = 1; k < image_histo_size; ++k) { + for (k = 0; k < out_size; ++k) { const double cur_bits = - HistogramAddThresh(histograms[k], orig_histograms[i], best_bits); - if (cur_bits < best_bits) { + HistogramAddThresh(out_histo[k], in_histo[i], best_bits); + if (k == 0 || cur_bits < best_bits) { best_bits = cur_bits; best_out = k; } @@ -833,20 +829,20 @@ static void HistogramRemap(const VP8LHistogramSet* const orig_histo, symbols[i] = best_out; } } else { - assert(image_histo_size == 1); - for (i = 0; i < orig_histo_size; ++i) { + assert(out_size == 1); + for (i = 0; i < in_size; ++i) { symbols[i] = 0; } } // Recompute each out based on raw and symbols. 
-  for (i = 0; i < image_histo_size; ++i) {
-    HistogramClear(histograms[i]);
+  for (i = 0; i < out_size; ++i) {
+    HistogramClear(out_histo[i]);
   }
-  for (i = 0; i < orig_histo_size; ++i) {
+  for (i = 0; i < in_size; ++i) {
     const int idx = symbols[i];
-    VP8LHistogramAdd(orig_histograms[i], histograms[idx], histograms[idx]);
+    VP8LHistogramAdd(in_histo[i], out_histo[idx], out_histo[idx]);
   }
 }
@@ -920,11 +916,10 @@ int VP8LGetHistoImageSymbols(int xsize, int ysize,
     const float x = quality / 100.f;
     // cubic ramp between 1 and MAX_HISTO_GREEDY:
     const int threshold_size = (int)(1 + (x * x * x) * (MAX_HISTO_GREEDY - 1));
-    cur_combo = HistogramCombineStochastic(image_histo,
-                                           tmp_histos->histograms[0],
-                                           cur_combo, quality, threshold_size);
+    HistogramCombineStochastic(image_histo, tmp_histos->histograms[0],
+                               cur_combo, quality, threshold_size);
     if ((image_histo->size <= threshold_size) &&
-        !HistogramCombineGreedy(image_histo, cur_combo)) {
+        !HistogramCombineGreedy(image_histo)) {
       goto Error;
     }
   }
diff --git a/src/3rdparty/libwebp/src/enc/near_lossless.c b/src/3rdparty/libwebp/src/enc/near_lossless.c
index 9bc0f0e..f4ab91f 100644
--- a/src/3rdparty/libwebp/src/enc/near_lossless.c
+++ b/src/3rdparty/libwebp/src/enc/near_lossless.c
@@ -14,6 +14,7 @@
 // Author: Jyrki Alakuijala (jyrki@google.com)
 // Converted to C by Aleksander Kramarz (akramarz@google.com)

+#include <assert.h>
 #include <stdlib.h>

 #include "../dsp/lossless.h"
@@ -23,42 +24,14 @@
 #define MIN_DIM_FOR_NEAR_LOSSLESS 64
 #define MAX_LIMIT_BITS 5

-// Computes quantized pixel value and distance from original value.
-static void GetValAndDistance(int a, int initial, int bits,
-                              int* const val, int* const distance) {
-  const int mask = ~((1 << bits) - 1);
-  *val = (initial & mask) | (initial >> (8 - bits));
-  *distance = 2 * abs(a - *val);
-}
-
-// Clamps the value to range [0, 255].
-static int Clamp8b(int val) {
-  const int min_val = 0;
-  const int max_val = 0xff;
-  return (val < min_val) ? min_val : (val > max_val) ? max_val : val;
-}
-
-// Quantizes values {a, a+(1<<bits), a-(1<<bits)} and returns the nearest one.
+// Quantizes the value up or down to a multiple of 1<<bits (or to 255),
+// choosing the closer one, resolving ties using bankers' rounding.
 static int FindClosestDiscretized(int a, int bits) {
-  int best_val = a, i;
-  int min_distance = 256;
-
-  for (i = -1; i <= 1; ++i) {
-    int candidate, distance;
-    const int val = Clamp8b(a + i * (1 << bits));
-    GetValAndDistance(a, val, bits, &candidate, &distance);
-    if (i != 0) {
-      ++distance;
-    }
-    // Smallest distance but favor i == 0 over i == -1 and i == 1
-    // since that keeps the overall intensity more constant in the
-    // images.
-    if (distance < min_distance) {
-      min_distance = distance;
-      best_val = candidate;
-    }
-  }
-  return best_val;
+  const int mask = (1 << bits) - 1;
+  const int biased = a + (mask >> 1) + ((a >> bits) & 1);
+  assert(bits > 0);
+  if (biased > 0xff) return 0xff;
+  return biased & ~mask;
 }

 // Applies FindClosestDiscretized to all channels of pixel.
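The replacement `FindClosestDiscretized` above trades the old three-candidate search for straight bit arithmetic: it rounds `a` to the nearest multiple of `1 << bits`, breaks exact ties toward the even multiple (bankers' rounding) via the `((a >> bits) & 1)` term, and saturates at 255. A self-contained copy for experimentation:

```c
#include <assert.h>

/* Rounds 'a' (0..255) to the nearest multiple of 1 << bits, saturating at
 * 255; exact ties go to the even multiple (bankers' rounding). */
static int FindClosestDiscretized(int a, int bits) {
  const int mask = (1 << bits) - 1;
  const int biased = a + (mask >> 1) + ((a >> bits) & 1);
  assert(bits > 0);
  if (biased > 0xff) return 0xff;
  return biased & ~mask;
}
```

For example, with `bits == 2` the step is 4: 5 rounds down to 4, the tie at 6 rounds up to 8 (the even multiple of the step), the tie at 2 rounds down to 0, and 255 saturates instead of wrapping past 0xff.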
@@ -124,22 +97,11 @@ static void NearLossless(int xsize, int ysize, uint32_t* argb, } } -static int QualityToLimitBits(int quality) { - // quality mapping: - // 0..19 -> 5 - // 0..39 -> 4 - // 0..59 -> 3 - // 0..79 -> 2 - // 0..99 -> 1 - // 100 -> 0 - return MAX_LIMIT_BITS - quality / 20; -} - int VP8ApplyNearLossless(int xsize, int ysize, uint32_t* argb, int quality) { int i; uint32_t* const copy_buffer = (uint32_t*)WebPSafeMalloc(xsize * 3, sizeof(*copy_buffer)); - const int limit_bits = QualityToLimitBits(quality); + const int limit_bits = VP8LNearLosslessBits(quality); assert(argb != NULL); assert(limit_bits >= 0); assert(limit_bits <= MAX_LIMIT_BITS); diff --git a/src/3rdparty/libwebp/src/enc/picture.c b/src/3rdparty/libwebp/src/enc/picture.c index 26679a7..d9befbc 100644 --- a/src/3rdparty/libwebp/src/enc/picture.c +++ b/src/3rdparty/libwebp/src/enc/picture.c @@ -237,6 +237,8 @@ static size_t Encode(const uint8_t* rgba, int width, int height, int stride, WebPMemoryWriter wrt; int ok; + if (output == NULL) return 0; + if (!WebPConfigPreset(&config, WEBP_PRESET_DEFAULT, quality_factor) || !WebPPictureInit(&pic)) { return 0; // shouldn't happen, except if system installation is broken diff --git a/src/3rdparty/libwebp/src/enc/picture_csp.c b/src/3rdparty/libwebp/src/enc/picture_csp.c index 0ef5f9e..607a624 100644 --- a/src/3rdparty/libwebp/src/enc/picture_csp.c +++ b/src/3rdparty/libwebp/src/enc/picture_csp.c @@ -1125,32 +1125,44 @@ static int Import(WebPPicture* const picture, int WebPPictureImportRGB(WebPPicture* picture, const uint8_t* rgb, int rgb_stride) { - return (picture != NULL) ? Import(picture, rgb, rgb_stride, 3, 0, 0) : 0; + return (picture != NULL && rgb != NULL) + ? Import(picture, rgb, rgb_stride, 3, 0, 0) + : 0; } int WebPPictureImportBGR(WebPPicture* picture, const uint8_t* rgb, int rgb_stride) { - return (picture != NULL) ? Import(picture, rgb, rgb_stride, 3, 1, 0) : 0; + return (picture != NULL && rgb != NULL) + ? 
Import(picture, rgb, rgb_stride, 3, 1, 0) + : 0; } int WebPPictureImportRGBA(WebPPicture* picture, const uint8_t* rgba, int rgba_stride) { - return (picture != NULL) ? Import(picture, rgba, rgba_stride, 4, 0, 1) : 0; + return (picture != NULL && rgba != NULL) + ? Import(picture, rgba, rgba_stride, 4, 0, 1) + : 0; } int WebPPictureImportBGRA(WebPPicture* picture, const uint8_t* rgba, int rgba_stride) { - return (picture != NULL) ? Import(picture, rgba, rgba_stride, 4, 1, 1) : 0; + return (picture != NULL && rgba != NULL) + ? Import(picture, rgba, rgba_stride, 4, 1, 1) + : 0; } int WebPPictureImportRGBX(WebPPicture* picture, const uint8_t* rgba, int rgba_stride) { - return (picture != NULL) ? Import(picture, rgba, rgba_stride, 4, 0, 0) : 0; + return (picture != NULL && rgba != NULL) + ? Import(picture, rgba, rgba_stride, 4, 0, 0) + : 0; } int WebPPictureImportBGRX(WebPPicture* picture, const uint8_t* rgba, int rgba_stride) { - return (picture != NULL) ? Import(picture, rgba, rgba_stride, 4, 1, 0) : 0; + return (picture != NULL && rgba != NULL) + ? 
Import(picture, rgba, rgba_stride, 4, 1, 0) + : 0; } //------------------------------------------------------------------------------ diff --git a/src/3rdparty/libwebp/src/enc/picture_psnr.c b/src/3rdparty/libwebp/src/enc/picture_psnr.c index 40214ef..81ab1b5 100644 --- a/src/3rdparty/libwebp/src/enc/picture_psnr.c +++ b/src/3rdparty/libwebp/src/enc/picture_psnr.c @@ -27,7 +27,7 @@ static void AccumulateLSIM(const uint8_t* src, int src_stride, const uint8_t* ref, int ref_stride, - int w, int h, DistoStats* stats) { + int w, int h, VP8DistoStats* stats) { int x, y; double total_sse = 0.; for (y = 0; y < h; ++y) { @@ -71,11 +71,13 @@ static float GetPSNR(const double v) { int WebPPictureDistortion(const WebPPicture* src, const WebPPicture* ref, int type, float result[5]) { - DistoStats stats[5]; + VP8DistoStats stats[5]; int w, h; memset(stats, 0, sizeof(stats)); + VP8SSIMDspInit(); + if (src == NULL || ref == NULL || src->width != ref->width || src->height != ref->height || src->use_argb != ref->use_argb || result == NULL) { diff --git a/src/3rdparty/libwebp/src/enc/quant.c b/src/3rdparty/libwebp/src/enc/quant.c index dd6885a..549ad26 100644 --- a/src/3rdparty/libwebp/src/enc/quant.c +++ b/src/3rdparty/libwebp/src/enc/quant.c @@ -30,8 +30,6 @@ #define SNS_TO_DQ 0.9 // Scaling constant between the sns value and the QP // power-law modulation. Must be strictly less than 1. 
-#define I4_PENALTY 14000 // Rate-penalty for quick i4/i16 decision - // number of non-zero coeffs below which we consider the block very flat // (and apply a penalty to complex predictions) #define FLATNESS_LIMIT_I16 10 // I16 mode @@ -236,6 +234,8 @@ static int ExpandMatrix(VP8Matrix* const m, int type) { return (sum + 8) >> 4; } +static void CheckLambdaValue(int* const v) { if (*v < 1) *v = 1; } + static void SetupMatrices(VP8Encoder* enc) { int i; const int tlambda_scale = @@ -245,7 +245,7 @@ static void SetupMatrices(VP8Encoder* enc) { for (i = 0; i < num_segments; ++i) { VP8SegmentInfo* const m = &enc->dqm_[i]; const int q = m->quant_; - int q4, q16, quv; + int q_i4, q_i16, q_uv; m->y1_.q_[0] = kDcTable[clip(q + enc->dq_y1_dc_, 0, 127)]; m->y1_.q_[1] = kAcTable[clip(q, 0, 127)]; @@ -255,21 +255,33 @@ static void SetupMatrices(VP8Encoder* enc) { m->uv_.q_[0] = kDcTable[clip(q + enc->dq_uv_dc_, 0, 117)]; m->uv_.q_[1] = kAcTable[clip(q + enc->dq_uv_ac_, 0, 127)]; - q4 = ExpandMatrix(&m->y1_, 0); - q16 = ExpandMatrix(&m->y2_, 1); - quv = ExpandMatrix(&m->uv_, 2); - - m->lambda_i4_ = (3 * q4 * q4) >> 7; - m->lambda_i16_ = (3 * q16 * q16); - m->lambda_uv_ = (3 * quv * quv) >> 6; - m->lambda_mode_ = (1 * q4 * q4) >> 7; - m->lambda_trellis_i4_ = (7 * q4 * q4) >> 3; - m->lambda_trellis_i16_ = (q16 * q16) >> 2; - m->lambda_trellis_uv_ = (quv *quv) << 1; - m->tlambda_ = (tlambda_scale * q4) >> 5; + q_i4 = ExpandMatrix(&m->y1_, 0); + q_i16 = ExpandMatrix(&m->y2_, 1); + q_uv = ExpandMatrix(&m->uv_, 2); + + m->lambda_i4_ = (3 * q_i4 * q_i4) >> 7; + m->lambda_i16_ = (3 * q_i16 * q_i16); + m->lambda_uv_ = (3 * q_uv * q_uv) >> 6; + m->lambda_mode_ = (1 * q_i4 * q_i4) >> 7; + m->lambda_trellis_i4_ = (7 * q_i4 * q_i4) >> 3; + m->lambda_trellis_i16_ = (q_i16 * q_i16) >> 2; + m->lambda_trellis_uv_ = (q_uv * q_uv) << 1; + m->tlambda_ = (tlambda_scale * q_i4) >> 5; + + // none of these constants should be < 1 + CheckLambdaValue(&m->lambda_i4_); + CheckLambdaValue(&m->lambda_i16_); 
+ CheckLambdaValue(&m->lambda_uv_); + CheckLambdaValue(&m->lambda_mode_); + CheckLambdaValue(&m->lambda_trellis_i4_); + CheckLambdaValue(&m->lambda_trellis_i16_); + CheckLambdaValue(&m->lambda_trellis_uv_); + CheckLambdaValue(&m->tlambda_); m->min_disto_ = 10 * m->y1_.q_[0]; // quantization-aware min disto m->max_edge_ = 0; + + m->i4_penalty_ = 1000 * q_i4 * q_i4; } } @@ -348,7 +360,12 @@ static int SegmentsAreEquivalent(const VP8SegmentInfo* const S1, static void SimplifySegments(VP8Encoder* const enc) { int map[NUM_MB_SEGMENTS] = { 0, 1, 2, 3 }; - const int num_segments = enc->segment_hdr_.num_segments_; + // 'num_segments_' is previously validated and <= NUM_MB_SEGMENTS, but an + // explicit check is needed to avoid a spurious warning about 'i' exceeding + // array bounds of 'dqm_' with some compilers (noticed with gcc-4.9). + const int num_segments = (enc->segment_hdr_.num_segments_ < NUM_MB_SEGMENTS) + ? enc->segment_hdr_.num_segments_ + : NUM_MB_SEGMENTS; int num_final_segments = 1; int s1, s2; for (s1 = 1; s1 < num_segments; ++s1) { // find similar segments @@ -1127,19 +1144,29 @@ static void RefineUsingDistortion(VP8EncIterator* const it, int try_both_modes, int refine_uv_mode, VP8ModeScore* const rd) { score_t best_score = MAX_COST; - score_t score_i4 = (score_t)I4_PENALTY; - int16_t tmp_levels[16][16]; - uint8_t modes_i4[16]; int nz = 0; int mode; int is_i16 = try_both_modes || (it->mb_->type_ == 1); + const VP8SegmentInfo* const dqm = &it->enc_->dqm_[it->mb_->segment_]; + // Some empiric constants, of approximate order of magnitude. 
+ const int lambda_d_i16 = 106; + const int lambda_d_i4 = 11; + const int lambda_d_uv = 120; + score_t score_i4 = dqm->i4_penalty_; + score_t i4_bit_sum = 0; + const score_t bit_limit = it->enc_->mb_header_limit_; + if (is_i16) { // First, evaluate Intra16 distortion int best_mode = -1; const uint8_t* const src = it->yuv_in_ + Y_OFF_ENC; for (mode = 0; mode < NUM_PRED_MODES; ++mode) { const uint8_t* const ref = it->yuv_p_ + VP8I16ModeOffsets[mode]; - const score_t score = VP8SSE16x16(src, ref); + const score_t score = VP8SSE16x16(src, ref) * RD_DISTO_MULT + + VP8FixedCostsI16[mode] * lambda_d_i16; + if (mode > 0 && VP8FixedCostsI16[mode] > bit_limit) { + continue; + } if (score < best_score) { best_mode = mode; best_score = score; @@ -1159,25 +1186,28 @@ static void RefineUsingDistortion(VP8EncIterator* const it, int best_i4_mode = -1; score_t best_i4_score = MAX_COST; const uint8_t* const src = it->yuv_in_ + Y_OFF_ENC + VP8Scan[it->i4_]; + const uint16_t* const mode_costs = GetCostModeI4(it, rd->modes_i4); VP8MakeIntra4Preds(it); for (mode = 0; mode < NUM_BMODES; ++mode) { const uint8_t* const ref = it->yuv_p_ + VP8I4ModeOffsets[mode]; - const score_t score = VP8SSE4x4(src, ref); + const score_t score = VP8SSE4x4(src, ref) * RD_DISTO_MULT + + mode_costs[mode] * lambda_d_i4; if (score < best_i4_score) { best_i4_mode = mode; best_i4_score = score; } } - modes_i4[it->i4_] = best_i4_mode; + i4_bit_sum += mode_costs[best_i4_mode]; + rd->modes_i4[it->i4_] = best_i4_mode; score_i4 += best_i4_score; - if (score_i4 >= best_score) { + if (score_i4 >= best_score || i4_bit_sum > bit_limit) { // Intra4 won't be better than Intra16. Bail out and pick Intra16. 
is_i16 = 1; break; } else { // reconstruct partial block inside yuv_out2_ buffer uint8_t* const tmp_dst = it->yuv_out2_ + Y_OFF_ENC + VP8Scan[it->i4_]; - nz |= ReconstructIntra4(it, tmp_levels[it->i4_], + nz |= ReconstructIntra4(it, rd->y_ac_levels[it->i4_], src, tmp_dst, best_i4_mode) << it->i4_; } } while (VP8IteratorRotateI4(it, it->yuv_out2_ + Y_OFF_ENC)); @@ -1185,8 +1215,7 @@ static void RefineUsingDistortion(VP8EncIterator* const it, // Final reconstruction, depending on which mode is selected. if (!is_i16) { - VP8SetIntra4Mode(it, modes_i4); - memcpy(rd->y_ac_levels, tmp_levels, sizeof(tmp_levels)); + VP8SetIntra4Mode(it, rd->modes_i4); SwapOut(it); best_score = score_i4; } else { @@ -1200,7 +1229,8 @@ static void RefineUsingDistortion(VP8EncIterator* const it, const uint8_t* const src = it->yuv_in_ + U_OFF_ENC; for (mode = 0; mode < NUM_PRED_MODES; ++mode) { const uint8_t* const ref = it->yuv_p_ + VP8UVModeOffsets[mode]; - const score_t score = VP8SSE16x8(src, ref); + const score_t score = VP8SSE16x8(src, ref) * RD_DISTO_MULT + + VP8FixedCostsUV[mode] * lambda_d_uv; if (score < best_uv_score) { best_mode = mode; best_uv_score = score; diff --git a/src/3rdparty/libwebp/src/enc/vp8enci.h b/src/3rdparty/libwebp/src/enc/vp8enci.h index b2cc8d1..c1fbd76 100644 --- a/src/3rdparty/libwebp/src/enc/vp8enci.h +++ b/src/3rdparty/libwebp/src/enc/vp8enci.h @@ -22,10 +22,6 @@ #include "../utils/utils.h" #include "../webp/encode.h" -#ifdef WEBP_EXPERIMENTAL_FEATURES -#include "./vp8li.h" -#endif // WEBP_EXPERIMENTAL_FEATURES - #ifdef __cplusplus extern "C" { #endif @@ -36,7 +32,7 @@ extern "C" { // version numbers #define ENC_MAJ_VERSION 0 #define ENC_MIN_VERSION 5 -#define ENC_REV_VERSION 0 +#define ENC_REV_VERSION 1 enum { MAX_LF_LEVELS = 64, // Maximum loop filter level MAX_VARIABLE_LEVEL = 67, // last (inclusive) level with variable cost @@ -200,6 +196,9 @@ typedef struct { int lambda_i16_, lambda_i4_, lambda_uv_; int lambda_mode_, lambda_trellis_, tlambda_; int 
lambda_trellis_i16_, lambda_trellis_i4_, lambda_trellis_uv_; + + // lambda values for distortion-based evaluation + score_t i4_penalty_; // penalty for using Intra4 } VP8SegmentInfo; // Handy transient struct to accumulate score and info during RD-optimization @@ -395,6 +394,7 @@ struct VP8Encoder { int method_; // 0=fastest, 6=best/slowest. VP8RDLevel rd_opt_level_; // Deduced from method_. int max_i4_header_bits_; // partition #0 safeness factor + int mb_header_limit_; // rough limit for header bits per MB int thread_level_; // derived from config->thread_level int do_search_; // derived from config->target_XXX int use_tokens_; // if true, use token buffer @@ -477,17 +477,12 @@ int VP8EncFinishAlpha(VP8Encoder* const enc); // finalize compressed data int VP8EncDeleteAlpha(VP8Encoder* const enc); // delete compressed data // in filter.c - -// SSIM utils -typedef struct { - double w, xm, ym, xxm, xym, yym; -} DistoStats; -void VP8SSIMAddStats(const DistoStats* const src, DistoStats* const dst); +void VP8SSIMAddStats(const VP8DistoStats* const src, VP8DistoStats* const dst); void VP8SSIMAccumulatePlane(const uint8_t* src1, int stride1, const uint8_t* src2, int stride2, - int W, int H, DistoStats* const stats); -double VP8SSIMGet(const DistoStats* const stats); -double VP8SSIMGetSquaredError(const DistoStats* const stats); + int W, int H, VP8DistoStats* const stats); +double VP8SSIMGet(const VP8DistoStats* const stats); +double VP8SSIMGetSquaredError(const VP8DistoStats* const stats); // autofilter void VP8InitFilter(VP8EncIterator* const it); diff --git a/src/3rdparty/libwebp/src/enc/vp8l.c b/src/3rdparty/libwebp/src/enc/vp8l.c index db94e78..c16e256 100644 --- a/src/3rdparty/libwebp/src/enc/vp8l.c +++ b/src/3rdparty/libwebp/src/enc/vp8l.c @@ -126,54 +126,8 @@ static int AnalyzeAndCreatePalette(const WebPPicture* const pic, int low_effort, uint32_t palette[MAX_PALETTE_SIZE], int* const palette_size) { - int i, x, y, key; - int num_colors = 0; - uint8_t 
in_use[MAX_PALETTE_SIZE * 4] = { 0 }; - uint32_t colors[MAX_PALETTE_SIZE * 4]; - static const uint32_t kHashMul = 0x1e35a7bd; - const uint32_t* argb = pic->argb; - const int width = pic->width; - const int height = pic->height; - uint32_t last_pix = ~argb[0]; // so we're sure that last_pix != argb[0] - - for (y = 0; y < height; ++y) { - for (x = 0; x < width; ++x) { - if (argb[x] == last_pix) { - continue; - } - last_pix = argb[x]; - key = (kHashMul * last_pix) >> PALETTE_KEY_RIGHT_SHIFT; - while (1) { - if (!in_use[key]) { - colors[key] = last_pix; - in_use[key] = 1; - ++num_colors; - if (num_colors > MAX_PALETTE_SIZE) { - return 0; - } - break; - } else if (colors[key] == last_pix) { - // The color is already there. - break; - } else { - // Some other color sits there. - // Do linear conflict resolution. - ++key; - key &= (MAX_PALETTE_SIZE * 4 - 1); // key mask for 1K buffer. - } - } - } - argb += pic->argb_stride; - } - - // TODO(skal): could we reuse in_use[] to speed up EncodePalette()? - num_colors = 0; - for (i = 0; i < (int)(sizeof(in_use) / sizeof(in_use[0])); ++i) { - if (in_use[i]) { - palette[num_colors] = colors[i]; - ++num_colors; - } - } + const int num_colors = WebPGetColorPalette(pic, palette); + if (num_colors > MAX_PALETTE_SIZE) return 0; *palette_size = num_colors; qsort(palette, num_colors, sizeof(*palette), PaletteCompareColorsForQsort); if (!low_effort && PaletteHasNonMonotonousDeltas(palette, num_colors)) { @@ -336,7 +290,7 @@ static int AnalyzeEntropy(const uint32_t* argb, } } } - free(histo); + WebPSafeFree(histo); return 1; } else { return 0; @@ -761,6 +715,10 @@ static WebPEncodingError EncodeImageNoHuffman(VP8LBitWriter* const bw, } // Calculate backward references from ARGB image. 
+ if (VP8LHashChainFill(hash_chain, quality, argb, width, height) == 0) { + err = VP8_ENC_ERROR_OUT_OF_MEMORY; + goto Error; + } refs = VP8LGetBackwardReferences(width, height, argb, quality, 0, &cache_bits, hash_chain, refs_array); if (refs == NULL) { @@ -824,7 +782,8 @@ static WebPEncodingError EncodeImageInternal(VP8LBitWriter* const bw, VP8LHashChain* const hash_chain, VP8LBackwardRefs refs_array[2], int width, int height, int quality, - int low_effort, int* cache_bits, + int low_effort, + int use_cache, int* cache_bits, int histogram_bits, size_t init_byte_position, int* const hdr_size, @@ -856,10 +815,14 @@ static WebPEncodingError EncodeImageInternal(VP8LBitWriter* const bw, goto Error; } - *cache_bits = MAX_COLOR_CACHE_BITS; + *cache_bits = use_cache ? MAX_COLOR_CACHE_BITS : 0; // 'best_refs' is the reference to the best backward refs and points to one // of refs_array[0] or refs_array[1]. // Calculate backward references from ARGB image. + if (VP8LHashChainFill(hash_chain, quality, argb, width, height) == 0) { + err = VP8_ENC_ERROR_OUT_OF_MEMORY; + goto Error; + } best_refs = VP8LGetBackwardReferences(width, height, argb, quality, low_effort, cache_bits, hash_chain, refs_array); @@ -1007,14 +970,19 @@ static void ApplySubtractGreen(VP8LEncoder* const enc, int width, int height, static WebPEncodingError ApplyPredictFilter(const VP8LEncoder* const enc, int width, int height, int quality, int low_effort, + int used_subtract_green, VP8LBitWriter* const bw) { const int pred_bits = enc->transform_bits_; const int transform_width = VP8LSubSampleSize(width, pred_bits); const int transform_height = VP8LSubSampleSize(height, pred_bits); + // we disable near-lossless quantization if palette is used. + const int near_lossless_strength = enc->use_palette_ ? 
100 + : enc->config_->near_lossless; VP8LResidualImage(width, height, pred_bits, low_effort, enc->argb_, enc->argb_scratch_, enc->transform_data_, - enc->config_->exact); + near_lossless_strength, enc->config_->exact, + used_subtract_green); VP8LPutBits(bw, TRANSFORM_PRESENT, 1); VP8LPutBits(bw, PREDICTOR_TRANSFORM, 2); assert(pred_bits >= 2); @@ -1114,6 +1082,12 @@ static WebPEncodingError WriteImage(const WebPPicture* const pic, // ----------------------------------------------------------------------------- +static void ClearTransformBuffer(VP8LEncoder* const enc) { + WebPSafeFree(enc->transform_mem_); + enc->transform_mem_ = NULL; + enc->transform_mem_size_ = 0; +} + // Allocates the memory for argb (W x H) buffer, 2 rows of context for // prediction and transform data. // Flags influencing the memory allocated: @@ -1122,43 +1096,48 @@ static WebPEncodingError WriteImage(const WebPPicture* const pic, static WebPEncodingError AllocateTransformBuffer(VP8LEncoder* const enc, int width, int height) { WebPEncodingError err = VP8_ENC_OK; - if (enc->argb_ == NULL) { - const int tile_size = 1 << enc->transform_bits_; - const uint64_t image_size = width * height; - // Ensure enough size for tiles, as well as for two scanlines and two - // extra pixels for CopyImageWithPrediction. - const uint64_t argb_scratch_size = - enc->use_predict_ ? tile_size * width + width + 2 : 0; - const int transform_data_size = - (enc->use_predict_ || enc->use_cross_color_) - ? VP8LSubSampleSize(width, enc->transform_bits_) * - VP8LSubSampleSize(height, enc->transform_bits_) - : 0; - const uint64_t total_size = - image_size + WEBP_ALIGN_CST + - argb_scratch_size + WEBP_ALIGN_CST + - (uint64_t)transform_data_size; - uint32_t* mem = (uint32_t*)WebPSafeMalloc(total_size, sizeof(*mem)); + const uint64_t image_size = width * height; + // VP8LResidualImage needs room for 2 scanlines of uint32 pixels with an extra + // pixel in each, plus 2 regular scanlines of bytes. 
+ // TODO(skal): Clean up by using arithmetic in bytes instead of words. + const uint64_t argb_scratch_size = + enc->use_predict_ + ? (width + 1) * 2 + + (width * 2 + sizeof(uint32_t) - 1) / sizeof(uint32_t) + : 0; + const uint64_t transform_data_size = + (enc->use_predict_ || enc->use_cross_color_) + ? VP8LSubSampleSize(width, enc->transform_bits_) * + VP8LSubSampleSize(height, enc->transform_bits_) + : 0; + const uint64_t max_alignment_in_words = + (WEBP_ALIGN_CST + sizeof(uint32_t) - 1) / sizeof(uint32_t); + const uint64_t mem_size = + image_size + max_alignment_in_words + + argb_scratch_size + max_alignment_in_words + + transform_data_size; + uint32_t* mem = enc->transform_mem_; + if (mem == NULL || mem_size > enc->transform_mem_size_) { + ClearTransformBuffer(enc); + mem = (uint32_t*)WebPSafeMalloc(mem_size, sizeof(*mem)); if (mem == NULL) { err = VP8_ENC_ERROR_OUT_OF_MEMORY; goto Error; } - enc->argb_ = mem; - mem = (uint32_t*)WEBP_ALIGN(mem + image_size); - enc->argb_scratch_ = mem; - mem = (uint32_t*)WEBP_ALIGN(mem + argb_scratch_size); - enc->transform_data_ = mem; - enc->current_width_ = width; + enc->transform_mem_ = mem; + enc->transform_mem_size_ = (size_t)mem_size; } + enc->argb_ = mem; + mem = (uint32_t*)WEBP_ALIGN(mem + image_size); + enc->argb_scratch_ = mem; + mem = (uint32_t*)WEBP_ALIGN(mem + argb_scratch_size); + enc->transform_data_ = mem; + + enc->current_width_ = width; Error: return err; } -static void ClearTransformBuffer(VP8LEncoder* const enc) { - WebPSafeFree(enc->argb_); - enc->argb_ = NULL; -} - static WebPEncodingError MakeInputImageCopy(VP8LEncoder* const enc) { WebPEncodingError err = VP8_ENC_OK; const WebPPicture* const picture = enc->pic_; @@ -1178,8 +1157,35 @@ static WebPEncodingError MakeInputImageCopy(VP8LEncoder* const enc) { // ----------------------------------------------------------------------------- -static void MapToPalette(const uint32_t palette[], int num_colors, +static int SearchColor(const uint32_t sorted[], 
uint32_t color, int hi) { + int low = 0; + if (sorted[low] == color) return low; // loop invariant: sorted[low] != color + while (1) { + const int mid = (low + hi) >> 1; + if (sorted[mid] == color) { + return mid; + } else if (sorted[mid] < color) { + low = mid; + } else { + hi = mid; + } + } +} + +// Sort palette in increasing order and prepare an inverse mapping array. +static void PrepareMapToPalette(const uint32_t palette[], int num_colors, + uint32_t sorted[], int idx_map[]) { + int i; + memcpy(sorted, palette, num_colors * sizeof(*sorted)); + qsort(sorted, num_colors, sizeof(*sorted), PaletteCompareColorsForQsort); + for (i = 0; i < num_colors; ++i) { + idx_map[SearchColor(sorted, palette[i], num_colors)] = i; + } +} + +static void MapToPalette(const uint32_t sorted_palette[], int num_colors, uint32_t* const last_pix, int* const last_idx, + const int idx_map[], const uint32_t* src, uint8_t* dst, int width) { int x; int prev_idx = *last_idx; @@ -1187,14 +1193,8 @@ static void MapToPalette(const uint32_t palette[], int num_colors, for (x = 0; x < width; ++x) { const uint32_t pix = src[x]; if (pix != prev_pix) { - int i; - for (i = 0; i < num_colors; ++i) { - if (pix == palette[i]) { - prev_idx = i; - prev_pix = pix; - break; - } - } + prev_idx = idx_map[SearchColor(sorted_palette, pix, num_colors)]; + prev_pix = pix; } dst[x] = prev_idx; } @@ -1241,11 +1241,16 @@ static WebPEncodingError ApplyPalette(const uint32_t* src, uint32_t src_stride, } } else { // Use 1 pixel cache for ARGB pixels. 
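The new `SearchColor`/`PrepareMapToPalette` pair above replaces `MapToPalette`'s linear palette scan with a binary search over a sorted copy of the palette, plus an `idx_map` that translates a sorted position back to the original palette index. A minimal sketch of the same scheme; the plain numeric comparator below stands in for libwebp's `PaletteCompareColorsForQsort`:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PALETTE_SIZE 256

/* Simple ascending uint32_t comparator (placeholder for the real
 * PaletteCompareColorsForQsort). */
static int CompareColors(const void* p1, const void* p2) {
  const uint32_t a = *(const uint32_t*)p1;
  const uint32_t b = *(const uint32_t*)p2;
  return (a < b) ? -1 : (a > b) ? 1 : 0;
}

/* Binary search; assumes 'color' is present in sorted[0..hi-1]. */
static int SearchColor(const uint32_t sorted[], uint32_t color, int hi) {
  int low = 0;
  if (sorted[low] == color) return low;  // loop invariant: sorted[low] != color
  while (1) {
    const int mid = (low + hi) >> 1;
    if (sorted[mid] == color) {
      return mid;
    } else if (sorted[mid] < color) {
      low = mid;
    } else {
      hi = mid;
    }
  }
}

/* Sorts the palette and records, for each sorted slot, the original index. */
static void PrepareMapToPalette(const uint32_t palette[], int num_colors,
                                uint32_t sorted[], int idx_map[]) {
  int i;
  memcpy(sorted, palette, num_colors * sizeof(*sorted));
  qsort(sorted, num_colors, sizeof(*sorted), CompareColors);
  for (i = 0; i < num_colors; ++i) {
    idx_map[SearchColor(sorted, palette[i], num_colors)] = i;
  }
}
```

Mapping a pixel then becomes `idx_map[SearchColor(sorted, pix, num_colors)]`: O(log n) per distinct pixel instead of the O(n) scan the diff removes.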
- uint32_t last_pix = palette[0]; - int last_idx = 0; + uint32_t last_pix; + int last_idx; + uint32_t sorted[MAX_PALETTE_SIZE]; + int idx_map[MAX_PALETTE_SIZE]; + PrepareMapToPalette(palette, palette_size, sorted, idx_map); + last_pix = palette[0]; + last_idx = 0; for (y = 0; y < height; ++y) { - MapToPalette(palette, palette_size, &last_pix, &last_idx, - src, tmp_row, width); + MapToPalette(sorted, palette_size, &last_pix, &last_idx, + idx_map, src, tmp_row, width); VP8LBundleColorMap(tmp_row, width, xbits, dst); src += src_stride; dst += dst_stride; @@ -1378,7 +1383,7 @@ static void VP8LEncoderDelete(VP8LEncoder* enc) { WebPEncodingError VP8LEncodeStream(const WebPConfig* const config, const WebPPicture* const picture, - VP8LBitWriter* const bw) { + VP8LBitWriter* const bw, int use_cache) { WebPEncodingError err = VP8_ENC_OK; const int quality = (int)config->quality; const int low_effort = (config->method == 0); @@ -1405,7 +1410,8 @@ WebPEncodingError VP8LEncodeStream(const WebPConfig* const config, } // Apply near-lossless preprocessing. - use_near_lossless = !enc->use_palette_ && (config->near_lossless < 100); + use_near_lossless = + (config->near_lossless < 100) && !enc->use_palette_ && !enc->use_predict_; if (use_near_lossless) { if (!VP8ApplyNearLossless(width, height, picture->argb, config->near_lossless)) { @@ -1457,7 +1463,7 @@ WebPEncodingError VP8LEncodeStream(const WebPConfig* const config, if (enc->use_predict_) { err = ApplyPredictFilter(enc, enc->current_width_, height, quality, - low_effort, bw); + low_effort, enc->use_subtract_green_, bw); if (err != VP8_ENC_OK) goto Error; } @@ -1474,8 +1480,8 @@ WebPEncodingError VP8LEncodeStream(const WebPConfig* const config, // Encode and write the transformed image. 
err = EncodeImageInternal(bw, enc->argb_, &enc->hash_chain_, enc->refs_, enc->current_width_, height, quality, low_effort, - &enc->cache_bits_, enc->histo_bits_, byte_position, - &hdr_size, &data_size); + use_cache, &enc->cache_bits_, enc->histo_bits_, + byte_position, &hdr_size, &data_size); if (err != VP8_ENC_OK) goto Error; if (picture->stats != NULL) { @@ -1560,7 +1566,7 @@ int VP8LEncodeImage(const WebPConfig* const config, if (!WebPReportProgress(picture, 5, &percent)) goto UserAbort; // Encode main image stream. - err = VP8LEncodeStream(config, picture, &bw); + err = VP8LEncodeStream(config, picture, &bw, 1 /*use_cache*/); if (err != VP8_ENC_OK) goto Error; // TODO(skal): have a fine-grained progress report in VP8LEncodeStream(). diff --git a/src/3rdparty/libwebp/src/enc/vp8li.h b/src/3rdparty/libwebp/src/enc/vp8li.h index 6b6db12..371e276 100644 --- a/src/3rdparty/libwebp/src/enc/vp8li.h +++ b/src/3rdparty/libwebp/src/enc/vp8li.h @@ -25,14 +25,17 @@ extern "C" { #endif typedef struct { - const WebPConfig* config_; // user configuration and parameters - const WebPPicture* pic_; // input picture. + const WebPConfig* config_; // user configuration and parameters + const WebPPicture* pic_; // input picture. - uint32_t* argb_; // Transformed argb image data. - uint32_t* argb_scratch_; // Scratch memory for argb rows - // (used for prediction). - uint32_t* transform_data_; // Scratch memory for transform data. - int current_width_; // Corresponds to packed image width. + uint32_t* argb_; // Transformed argb image data. + uint32_t* argb_scratch_; // Scratch memory for argb rows + // (used for prediction). + uint32_t* transform_data_; // Scratch memory for transform data. + uint32_t* transform_mem_; // Currently allocated memory. + size_t transform_mem_size_; // Currently allocated memory size. + + int current_width_; // Corresponds to packed image width. // Encoding parameters derived from quality parameter. 
int histo_bits_; @@ -64,9 +67,10 @@ int VP8LEncodeImage(const WebPConfig* const config, const WebPPicture* const picture); // Encodes the main image stream using the supplied bit writer. +// If 'use_cache' is false, disables the use of color cache. WebPEncodingError VP8LEncodeStream(const WebPConfig* const config, const WebPPicture* const picture, - VP8LBitWriter* const bw); + VP8LBitWriter* const bw, int use_cache); //------------------------------------------------------------------------------ diff --git a/src/3rdparty/libwebp/src/enc/webpenc.c b/src/3rdparty/libwebp/src/enc/webpenc.c index fece736..a7d04ea 100644 --- a/src/3rdparty/libwebp/src/enc/webpenc.c +++ b/src/3rdparty/libwebp/src/enc/webpenc.c @@ -105,6 +105,10 @@ static void MapConfigToTools(VP8Encoder* const enc) { 256 * 16 * 16 * // upper bound: up to 16bit per 4x4 block (limit * limit) / (100 * 100); // ... modulated with a quadratic curve. + // partition0 = 512k max. + enc->mb_header_limit_ = + (score_t)256 * 510 * 8 * 1024 / (enc->mb_w_ * enc->mb_h_); + enc->thread_level_ = config->thread_level; enc->do_search_ = (config->target_size > 0 || config->target_PSNR > 0); diff --git a/src/3rdparty/libwebp/src/mux/anim_encode.c b/src/3rdparty/libwebp/src/mux/anim_encode.c index fa86eaa..53e2906 100644 --- a/src/3rdparty/libwebp/src/mux/anim_encode.c +++ b/src/3rdparty/libwebp/src/mux/anim_encode.c @@ -12,7 +12,9 @@ #include <assert.h> #include <limits.h> +#include <math.h> // for pow() #include <stdio.h> +#include <stdlib.h> // for abs() #include "../utils/utils.h" #include "../webp/decode.h" @@ -49,8 +51,10 @@ struct WebPAnimEncoder { FrameRect prev_rect_; // Previous WebP frame rectangle. WebPConfig last_config_; // Cached in case a re-encode is needed. - WebPConfig last_config2_; // 2nd cached config; only valid if - // 'options_.allow_mixed' is true. 
+ WebPConfig last_config_reversed_; // If 'last_config_' uses lossless, then + // this config uses lossy and vice versa; + // only valid if 'options_.allow_mixed' + // is true. WebPPicture* curr_canvas_; // Only pointer; we don't own memory. @@ -173,6 +177,7 @@ static void DefaultEncoderOptions(WebPAnimEncoderOptions* const enc_options) { enc_options->minimize_size = 0; DisableKeyframes(enc_options); enc_options->allow_mixed = 0; + enc_options->verbose = 0; } int WebPAnimEncoderOptionsInitInternal(WebPAnimEncoderOptions* enc_options, @@ -185,7 +190,8 @@ int WebPAnimEncoderOptionsInitInternal(WebPAnimEncoderOptions* enc_options, return 1; } -#define TRANSPARENT_COLOR 0x00ffffff +// This starting value is more fit to WebPCleanupTransparentAreaLossless(). +#define TRANSPARENT_COLOR 0x00000000 static void ClearRectangle(WebPPicture* const picture, int left, int top, int width, int height) { @@ -338,11 +344,16 @@ static EncodedFrame* GetFrame(const WebPAnimEncoder* const enc, return &enc->encoded_frames_[enc->start_ + position]; } -// Returns true if 'length' number of pixels in 'src' and 'dst' are identical, +typedef int (*ComparePixelsFunc)(const uint32_t*, int, const uint32_t*, int, + int, int); + +// Returns true if 'length' number of pixels in 'src' and 'dst' are equal, // assuming the given step sizes between pixels. -static WEBP_INLINE int ComparePixels(const uint32_t* src, int src_step, - const uint32_t* dst, int dst_step, - int length) { +// 'max_allowed_diff' is unused and only there to allow function pointer use. +static WEBP_INLINE int ComparePixelsLossless(const uint32_t* src, int src_step, + const uint32_t* dst, int dst_step, + int length, int max_allowed_diff) { + (void)max_allowed_diff; assert(length > 0); while (length-- > 0) { if (*src != *dst) { @@ -354,15 +365,62 @@ static WEBP_INLINE int ComparePixels(const uint32_t* src, int src_step, return 1; } +// Helper to check if each channel in 'src' and 'dst' is at most off by +// 'max_allowed_diff'. 
+static WEBP_INLINE int PixelsAreSimilar(uint32_t src, uint32_t dst, + int max_allowed_diff) { + const int src_a = (src >> 24) & 0xff; + const int src_r = (src >> 16) & 0xff; + const int src_g = (src >> 8) & 0xff; + const int src_b = (src >> 0) & 0xff; + const int dst_a = (dst >> 24) & 0xff; + const int dst_r = (dst >> 16) & 0xff; + const int dst_g = (dst >> 8) & 0xff; + const int dst_b = (dst >> 0) & 0xff; + + return (abs(src_r * src_a - dst_r * dst_a) <= (max_allowed_diff * 255)) && + (abs(src_g * src_a - dst_g * dst_a) <= (max_allowed_diff * 255)) && + (abs(src_b * src_a - dst_b * dst_a) <= (max_allowed_diff * 255)) && + (abs(src_a - dst_a) <= max_allowed_diff); +} + +// Returns true if 'length' number of pixels in 'src' and 'dst' are within an +// error bound, assuming the given step sizes between pixels. +static WEBP_INLINE int ComparePixelsLossy(const uint32_t* src, int src_step, + const uint32_t* dst, int dst_step, + int length, int max_allowed_diff) { + assert(length > 0); + while (length-- > 0) { + if (!PixelsAreSimilar(*src, *dst, max_allowed_diff)) { + return 0; + } + src += src_step; + dst += dst_step; + } + return 1; +} + static int IsEmptyRect(const FrameRect* const rect) { return (rect->width_ == 0) || (rect->height_ == 0); } +static int QualityToMaxDiff(float quality) { + const double val = pow(quality / 100., 0.5); + const double max_diff = 31 * (1 - val) + 1 * val; + return (int)(max_diff + 0.5); +} + // Assumes that an initial valid guess of change rectangle 'rect' is passed. static void MinimizeChangeRectangle(const WebPPicture* const src, const WebPPicture* const dst, - FrameRect* const rect) { + FrameRect* const rect, + int is_lossless, float quality) { int i, j; + const ComparePixelsFunc compare_pixels = + is_lossless ? ComparePixelsLossless : ComparePixelsLossy; + const int max_allowed_diff_lossy = QualityToMaxDiff(quality); + const int max_allowed_diff = is_lossless ? 0 : max_allowed_diff_lossy; + // Sanity checks. 
assert(src->width == dst->width && src->height == dst->height); assert(rect->x_offset_ + rect->width_ <= dst->width); @@ -374,8 +432,8 @@ static void MinimizeChangeRectangle(const WebPPicture* const src, &src->argb[rect->y_offset_ * src->argb_stride + i]; const uint32_t* const dst_argb = &dst->argb[rect->y_offset_ * dst->argb_stride + i]; - if (ComparePixels(src_argb, src->argb_stride, dst_argb, dst->argb_stride, - rect->height_)) { + if (compare_pixels(src_argb, src->argb_stride, dst_argb, dst->argb_stride, + rect->height_, max_allowed_diff)) { --rect->width_; // Redundant column. ++rect->x_offset_; } else { @@ -390,8 +448,8 @@ static void MinimizeChangeRectangle(const WebPPicture* const src, &src->argb[rect->y_offset_ * src->argb_stride + i]; const uint32_t* const dst_argb = &dst->argb[rect->y_offset_ * dst->argb_stride + i]; - if (ComparePixels(src_argb, src->argb_stride, dst_argb, dst->argb_stride, - rect->height_)) { + if (compare_pixels(src_argb, src->argb_stride, dst_argb, dst->argb_stride, + rect->height_, max_allowed_diff)) { --rect->width_; // Redundant column. } else { break; @@ -405,7 +463,8 @@ static void MinimizeChangeRectangle(const WebPPicture* const src, &src->argb[j * src->argb_stride + rect->x_offset_]; const uint32_t* const dst_argb = &dst->argb[j * dst->argb_stride + rect->x_offset_]; - if (ComparePixels(src_argb, 1, dst_argb, 1, rect->width_)) { + if (compare_pixels(src_argb, 1, dst_argb, 1, rect->width_, + max_allowed_diff)) { --rect->height_; // Redundant row. ++rect->y_offset_; } else { @@ -420,7 +479,8 @@ static void MinimizeChangeRectangle(const WebPPicture* const src, &src->argb[j * src->argb_stride + rect->x_offset_]; const uint32_t* const dst_argb = &dst->argb[j * dst->argb_stride + rect->x_offset_]; - if (ComparePixels(src_argb, 1, dst_argb, 1, rect->width_)) { + if (compare_pixels(src_argb, 1, dst_argb, 1, rect->width_, + max_allowed_diff)) { --rect->height_; // Redundant row. 
} else { break; @@ -445,20 +505,46 @@ static WEBP_INLINE void SnapToEvenOffsets(FrameRect* const rect) { rect->y_offset_ &= ~1; } +typedef struct { + int should_try_; // Should try this set of parameters. + int empty_rect_allowed_; // Frame with empty rectangle can be skipped. + FrameRect rect_ll_; // Frame rectangle for lossless compression. + WebPPicture sub_frame_ll_; // Sub-frame pic for lossless compression. + FrameRect rect_lossy_; // Frame rectangle for lossy compression. + // Could be smaller than rect_ll_ as pixels + // with small diffs can be ignored. + WebPPicture sub_frame_lossy_; // Sub-frame pic for lossless compression. +} SubFrameParams; + +static int SubFrameParamsInit(SubFrameParams* const params, + int should_try, int empty_rect_allowed) { + params->should_try_ = should_try; + params->empty_rect_allowed_ = empty_rect_allowed; + if (!WebPPictureInit(¶ms->sub_frame_ll_) || + !WebPPictureInit(¶ms->sub_frame_lossy_)) { + return 0; + } + return 1; +} + +static void SubFrameParamsFree(SubFrameParams* const params) { + WebPPictureFree(¶ms->sub_frame_ll_); + WebPPictureFree(¶ms->sub_frame_lossy_); +} + // Given previous and current canvas, picks the optimal rectangle for the -// current frame. The initial guess for 'rect' will be the full canvas. +// current frame based on 'is_lossless' and other parameters. Assumes that the +// initial guess 'rect' is valid. static int GetSubRect(const WebPPicture* const prev_canvas, const WebPPicture* const curr_canvas, int is_key_frame, int is_first_frame, int empty_rect_allowed, - FrameRect* const rect, WebPPicture* const sub_frame) { - rect->x_offset_ = 0; - rect->y_offset_ = 0; - rect->width_ = curr_canvas->width; - rect->height_ = curr_canvas->height; + int is_lossless, float quality, FrameRect* const rect, + WebPPicture* const sub_frame) { if (!is_key_frame || is_first_frame) { // Optimize frame rectangle. 
// Note: This behaves as expected for first frame, as 'prev_canvas' is // initialized to a fully transparent canvas in the beginning. - MinimizeChangeRectangle(prev_canvas, curr_canvas, rect); + MinimizeChangeRectangle(prev_canvas, curr_canvas, rect, + is_lossless, quality); } if (IsEmptyRect(rect)) { @@ -477,6 +563,29 @@ static int GetSubRect(const WebPPicture* const prev_canvas, rect->width_, rect->height_, sub_frame); } +// Picks optimal frame rectangle for both lossless and lossy compression. The +// initial guess for frame rectangles will be the full canvas. +static int GetSubRects(const WebPPicture* const prev_canvas, + const WebPPicture* const curr_canvas, int is_key_frame, + int is_first_frame, float quality, + SubFrameParams* const params) { + // Lossless frame rectangle. + params->rect_ll_.x_offset_ = 0; + params->rect_ll_.y_offset_ = 0; + params->rect_ll_.width_ = curr_canvas->width; + params->rect_ll_.height_ = curr_canvas->height; + if (!GetSubRect(prev_canvas, curr_canvas, is_key_frame, is_first_frame, + params->empty_rect_allowed_, 1, quality, + ¶ms->rect_ll_, ¶ms->sub_frame_ll_)) { + return 0; + } + // Lossy frame rectangle. + params->rect_lossy_ = params->rect_ll_; // seed with lossless rect. 
+ return GetSubRect(prev_canvas, curr_canvas, is_key_frame, is_first_frame, + params->empty_rect_allowed_, 0, quality, + ¶ms->rect_lossy_, ¶ms->sub_frame_lossy_); +} + static void DisposeFrameRectangle(int dispose_method, const FrameRect* const rect, WebPPicture* const curr_canvas) { @@ -490,9 +599,9 @@ static uint32_t RectArea(const FrameRect* const rect) { return (uint32_t)rect->width_ * rect->height_; } -static int IsBlendingPossible(const WebPPicture* const src, - const WebPPicture* const dst, - const FrameRect* const rect) { +static int IsLosslessBlendingPossible(const WebPPicture* const src, + const WebPPicture* const dst, + const FrameRect* const rect) { int i, j; assert(src->width == dst->width && src->height == dst->height); assert(rect->x_offset_ + rect->width_ <= dst->width); @@ -512,88 +621,66 @@ static int IsBlendingPossible(const WebPPicture* const src, return 1; } -#define MIN_COLORS_LOSSY 31 // Don't try lossy below this threshold. -#define MAX_COLORS_LOSSLESS 194 // Don't try lossless above this threshold. -#define MAX_COLOR_COUNT 256 // Power of 2 greater than MAX_COLORS_LOSSLESS. -#define HASH_SIZE (MAX_COLOR_COUNT * 4) -#define HASH_RIGHT_SHIFT 22 // 32 - log2(HASH_SIZE). - -// TODO(urvang): Also used in enc/vp8l.c. Move to utils. -// If the number of colors in the 'pic' is at least MAX_COLOR_COUNT, return -// MAX_COLOR_COUNT. Otherwise, return the exact number of colors in the 'pic'. 
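`GetSubRects` above seeds both rectangles with the full canvas and lets `MinimizeChangeRectangle` trim rows and columns whose pixels all match the previous canvas. The shrinking logic can be sketched on flat arrays as follows; this is a simplified exact-match (lossless-path) version with hypothetical names, not the libwebp code, which additionally threads a per-pixel comparator for the lossy path:

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int x, y, w, h; } Rect;

/* Shrink 'rect' (initially the full canvas) to the bounding box of pixels
 * that differ between 'prev' and 'curr'. Trims from the left, right, top
 * and bottom in turn, stopping at the first row/column with a change. */
static void MinimizeRectSketch(const uint32_t* prev, const uint32_t* curr,
                               int stride, Rect* rect) {
  while (rect->w > 0) {  /* left */
    int j, same = 1;
    for (j = rect->y; j < rect->y + rect->h; ++j) {
      if (prev[j * stride + rect->x] != curr[j * stride + rect->x]) { same = 0; break; }
    }
    if (!same) break;
    ++rect->x; --rect->w;
  }
  while (rect->w > 0) {  /* right */
    int j, same = 1;
    const int i = rect->x + rect->w - 1;
    for (j = rect->y; j < rect->y + rect->h; ++j) {
      if (prev[j * stride + i] != curr[j * stride + i]) { same = 0; break; }
    }
    if (!same) break;
    --rect->w;
  }
  while (rect->h > 0) {  /* top */
    int i, same = 1;
    for (i = rect->x; i < rect->x + rect->w; ++i) {
      if (prev[rect->y * stride + i] != curr[rect->y * stride + i]) { same = 0; break; }
    }
    if (!same) break;
    ++rect->y; --rect->h;
  }
  while (rect->h > 0) {  /* bottom */
    int i, same = 1;
    const int j = rect->y + rect->h - 1;
    for (i = rect->x; i < rect->x + rect->w; ++i) {
      if (prev[j * stride + i] != curr[j * stride + i]) { same = 0; break; }
    }
    if (!same) break;
    --rect->h;
  }
}
```

With a lossy comparator the same loops may trim more aggressively, which is why `rect_lossy_` can end up smaller than `rect_ll_`.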
-static int GetColorCount(const WebPPicture* const pic) { - int x, y; - int num_colors = 0; - uint8_t in_use[HASH_SIZE] = { 0 }; - uint32_t colors[HASH_SIZE]; - static const uint32_t kHashMul = 0x1e35a7bd; - const uint32_t* argb = pic->argb; - const int width = pic->width; - const int height = pic->height; - uint32_t last_pix = ~argb[0]; // so we're sure that last_pix != argb[0] - - for (y = 0; y < height; ++y) { - for (x = 0; x < width; ++x) { - int key; - if (argb[x] == last_pix) { - continue; - } - last_pix = argb[x]; - key = (kHashMul * last_pix) >> HASH_RIGHT_SHIFT; - while (1) { - if (!in_use[key]) { - colors[key] = last_pix; - in_use[key] = 1; - ++num_colors; - if (num_colors >= MAX_COLOR_COUNT) { - return MAX_COLOR_COUNT; // Exact count not needed. - } - break; - } else if (colors[key] == last_pix) { - break; // The color is already there. - } else { - // Some other color sits here, so do linear conflict resolution. - ++key; - key &= (HASH_SIZE - 1); // Key mask. - } +static int IsLossyBlendingPossible(const WebPPicture* const src, + const WebPPicture* const dst, + const FrameRect* const rect, + float quality) { + const int max_allowed_diff_lossy = QualityToMaxDiff(quality); + int i, j; + assert(src->width == dst->width && src->height == dst->height); + assert(rect->x_offset_ + rect->width_ <= dst->width); + assert(rect->y_offset_ + rect->height_ <= dst->height); + for (j = rect->y_offset_; j < rect->y_offset_ + rect->height_; ++j) { + for (i = rect->x_offset_; i < rect->x_offset_ + rect->width_; ++i) { + const uint32_t src_pixel = src->argb[j * src->argb_stride + i]; + const uint32_t dst_pixel = dst->argb[j * dst->argb_stride + i]; + const uint32_t dst_alpha = dst_pixel >> 24; + if (dst_alpha != 0xff && + !PixelsAreSimilar(src_pixel, dst_pixel, max_allowed_diff_lossy)) { + // In this case, if we use blending, we can't attain the desired + // 'dst_pixel' value for this pixel. So, blending is not possible. 
+ return 0; } } - argb += pic->argb_stride; } - return num_colors; + return 1; } -#undef MAX_COLOR_COUNT -#undef HASH_SIZE -#undef HASH_RIGHT_SHIFT - // For pixels in 'rect', replace those pixels in 'dst' that are same as 'src' by // transparent pixels. -static void IncreaseTransparency(const WebPPicture* const src, - const FrameRect* const rect, - WebPPicture* const dst) { +// Returns true if at least one pixel gets modified. +static int IncreaseTransparency(const WebPPicture* const src, + const FrameRect* const rect, + WebPPicture* const dst) { int i, j; + int modified = 0; assert(src != NULL && dst != NULL && rect != NULL); assert(src->width == dst->width && src->height == dst->height); for (j = rect->y_offset_; j < rect->y_offset_ + rect->height_; ++j) { const uint32_t* const psrc = src->argb + j * src->argb_stride; uint32_t* const pdst = dst->argb + j * dst->argb_stride; for (i = rect->x_offset_; i < rect->x_offset_ + rect->width_; ++i) { - if (psrc[i] == pdst[i]) { + if (psrc[i] == pdst[i] && pdst[i] != TRANSPARENT_COLOR) { pdst[i] = TRANSPARENT_COLOR; + modified = 1; } } } + return modified; } #undef TRANSPARENT_COLOR // Replace similar blocks of pixels by a 'see-through' transparent block // with uniform average color. -static void FlattenSimilarBlocks(const WebPPicture* const src, - const FrameRect* const rect, - WebPPicture* const dst) { +// Assumes lossy compression is being used. +// Returns true if at least one pixel gets modified. 
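When blending is possible, pixels identical to the previous canvas can be made fully transparent so the lossless sub-frame compresses to large uniform runs. A flat-array sketch of the `IncreaseTransparency` logic above, including the new "modified" return flag that the caller now uses to set `curr_canvas_copy_modified_` (the helper name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define TRANSPARENT 0x00000000u

/* Replace pixels in 'dst' that equal the corresponding 'src' pixel by a
 * transparent pixel. Returns 1 if at least one pixel actually changed. */
static int IncreaseTransparencySketch(const uint32_t* src, uint32_t* dst,
                                      int num_pixels) {
  int i, modified = 0;
  for (i = 0; i < num_pixels; ++i) {
    if (src[i] == dst[i] && dst[i] != TRANSPARENT) {
      dst[i] = TRANSPARENT;
      modified = 1;
    }
  }
  return modified;
}
```

The `dst[i] != TRANSPARENT` guard matches the diff's added condition: rewriting an already-transparent pixel would report a modification that never happened, forcing a needless canvas copy on the next `CopyCurrentCanvas`.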
+static int FlattenSimilarBlocks(const WebPPicture* const src, + const FrameRect* const rect, + WebPPicture* const dst, float quality) { + const int max_allowed_diff_lossy = QualityToMaxDiff(quality); int i, j; + int modified = 0; const int block_size = 8; const int y_start = (rect->y_offset_ + block_size) & ~(block_size - 1); const int y_end = (rect->y_offset_ + rect->height_) & ~(block_size - 1); @@ -615,11 +702,12 @@ static void FlattenSimilarBlocks(const WebPPicture* const src, const uint32_t src_pixel = psrc[x + y * src->argb_stride]; const int alpha = src_pixel >> 24; if (alpha == 0xff && - src_pixel == pdst[x + y * dst->argb_stride]) { - ++cnt; - avg_r += (src_pixel >> 16) & 0xff; - avg_g += (src_pixel >> 8) & 0xff; - avg_b += (src_pixel >> 0) & 0xff; + PixelsAreSimilar(src_pixel, pdst[x + y * dst->argb_stride], + max_allowed_diff_lossy)) { + ++cnt; + avg_r += (src_pixel >> 16) & 0xff; + avg_g += (src_pixel >> 8) & 0xff; + avg_b += (src_pixel >> 0) & 0xff; } } } @@ -635,9 +723,11 @@ static void FlattenSimilarBlocks(const WebPPicture* const src, pdst[x + y * dst->argb_stride] = color; } } + modified = 1; } } } + return modified; } static int EncodeFrame(const WebPConfig* const config, WebPPicture* const pic, @@ -662,9 +752,10 @@ typedef struct { // Generates a candidate encoded frame given a picture and metadata. static WebPEncodingError EncodeCandidate(WebPPicture* const sub_frame, const FrameRect* const rect, - const WebPConfig* const config, + const WebPConfig* const encoder_config, int use_blending, Candidate* const candidate) { + WebPConfig config = *encoder_config; WebPEncodingError error_code = VP8_ENC_OK; assert(candidate != NULL); memset(candidate, 0, sizeof(*candidate)); @@ -682,7 +773,13 @@ static WebPEncodingError EncodeCandidate(WebPPicture* const sub_frame, // Encode picture. 
WebPMemoryWriterInit(&candidate->mem_); - if (!EncodeFrame(config, sub_frame, &candidate->mem_)) { + if (!config.lossless && use_blending) { + // Disable filtering to avoid blockiness in reconstructed frames at the + // time of decoding. + config.autofilter = 0; + config.filter_strength = 0; + } + if (!EncodeFrame(&config, sub_frame, &candidate->mem_)) { error_code = sub_frame->error_code; goto Err; } @@ -698,6 +795,8 @@ static WebPEncodingError EncodeCandidate(WebPPicture* const sub_frame, static void CopyCurrentCanvas(WebPAnimEncoder* const enc) { if (enc->curr_canvas_copy_modified_) { WebPCopyPixels(enc->curr_canvas_, &enc->curr_canvas_copy_); + enc->curr_canvas_copy_.progress_hook = enc->curr_canvas_->progress_hook; + enc->curr_canvas_copy_.user_data = enc->curr_canvas_->user_data; enc->curr_canvas_copy_modified_ = 0; } } @@ -710,12 +809,15 @@ enum { CANDIDATE_COUNT }; -// Generates candidates for a given dispose method given pre-filled 'rect' -// and 'sub_frame'. +#define MIN_COLORS_LOSSY 31 // Don't try lossy below this threshold. +#define MAX_COLORS_LOSSLESS 194 // Don't try lossless above this threshold. + +// Generates candidates for a given dispose method given pre-filled sub-frame +// 'params'. static WebPEncodingError GenerateCandidates( WebPAnimEncoder* const enc, Candidate candidates[CANDIDATE_COUNT], WebPMuxAnimDispose dispose_method, int is_lossless, int is_key_frame, - const FrameRect* const rect, WebPPicture* sub_frame, + SubFrameParams* const params, const WebPConfig* const config_ll, const WebPConfig* const config_lossy) { WebPEncodingError error_code = VP8_ENC_OK; const int is_dispose_none = (dispose_method == WEBP_MUX_DISPOSE_NONE); @@ -727,16 +829,24 @@ static WebPEncodingError GenerateCandidates( WebPPicture* const curr_canvas = &enc->curr_canvas_copy_; const WebPPicture* const prev_canvas = is_dispose_none ? 
&enc->prev_canvas_ : &enc->prev_canvas_disposed_; - const int use_blending = + int use_blending_ll; + int use_blending_lossy; + + CopyCurrentCanvas(enc); + use_blending_ll = + !is_key_frame && + IsLosslessBlendingPossible(prev_canvas, curr_canvas, ¶ms->rect_ll_); + use_blending_lossy = !is_key_frame && - IsBlendingPossible(prev_canvas, curr_canvas, rect); + IsLossyBlendingPossible(prev_canvas, curr_canvas, ¶ms->rect_lossy_, + config_lossy->quality); // Pick candidates to be tried. if (!enc->options_.allow_mixed) { candidate_ll->evaluate_ = is_lossless; candidate_lossy->evaluate_ = !is_lossless; } else { // Use a heuristic for trying lossless and/or lossy compression. - const int num_colors = GetColorCount(sub_frame); + const int num_colors = WebPGetColorPalette(¶ms->sub_frame_ll_, NULL); candidate_ll->evaluate_ = (num_colors < MAX_COLORS_LOSSLESS); candidate_lossy->evaluate_ = (num_colors >= MIN_COLORS_LOSSY); } @@ -744,23 +854,26 @@ static WebPEncodingError GenerateCandidates( // Generate candidates. 
if (candidate_ll->evaluate_) { CopyCurrentCanvas(enc); - if (use_blending) { - IncreaseTransparency(prev_canvas, rect, curr_canvas); - enc->curr_canvas_copy_modified_ = 1; + if (use_blending_ll) { + enc->curr_canvas_copy_modified_ = + IncreaseTransparency(prev_canvas, ¶ms->rect_ll_, curr_canvas); } - error_code = EncodeCandidate(sub_frame, rect, config_ll, use_blending, - candidate_ll); + error_code = EncodeCandidate(¶ms->sub_frame_ll_, ¶ms->rect_ll_, + config_ll, use_blending_ll, candidate_ll); if (error_code != VP8_ENC_OK) return error_code; } if (candidate_lossy->evaluate_) { CopyCurrentCanvas(enc); - if (use_blending) { - FlattenSimilarBlocks(prev_canvas, rect, curr_canvas); - enc->curr_canvas_copy_modified_ = 1; + if (use_blending_lossy) { + enc->curr_canvas_copy_modified_ = + FlattenSimilarBlocks(prev_canvas, ¶ms->rect_lossy_, curr_canvas, + config_lossy->quality); } - error_code = EncodeCandidate(sub_frame, rect, config_lossy, use_blending, - candidate_lossy); + error_code = + EncodeCandidate(¶ms->sub_frame_lossy_, ¶ms->rect_lossy_, + config_lossy, use_blending_lossy, candidate_lossy); if (error_code != VP8_ENC_OK) return error_code; + enc->curr_canvas_copy_modified_ = 1; } return error_code; } @@ -918,13 +1031,16 @@ static WebPEncodingError SetFrame(WebPAnimEncoder* const enc, const int is_lossless = config->lossless; const int is_first_frame = enc->is_first_frame_; - int try_dispose_none = 1; // Default. - FrameRect rect_none; - WebPPicture sub_frame_none; // First frame cannot be skipped as there is no 'previous frame' to merge it // to. So, empty rectangle is not allowed for the first frame. const int empty_rect_allowed_none = !is_first_frame; + // Even if there is exact pixel match between 'disposed previous canvas' and + // 'current canvas', we can't skip current frame, as there may not be exact + // pixel match between 'previous canvas' and 'current canvas'. So, we don't + // allow empty rectangle in this case. 
+ const int empty_rect_allowed_bg = 0; + // If current frame is a key-frame, dispose method of previous frame doesn't // matter, so we don't try dispose to background. // Also, if key-frame insertion is on, and previous frame could be picked as @@ -933,19 +1049,20 @@ static WebPEncodingError SetFrame(WebPAnimEncoder* const enc, // background. const int dispose_bg_possible = !is_key_frame && !enc->prev_candidate_undecided_; - int try_dispose_bg = 0; // Default. - FrameRect rect_bg; - WebPPicture sub_frame_bg; + + SubFrameParams dispose_none_params; + SubFrameParams dispose_bg_params; WebPConfig config_ll = *config; WebPConfig config_lossy = *config; config_ll.lossless = 1; config_lossy.lossless = 0; enc->last_config_ = *config; - enc->last_config2_ = config->lossless ? config_lossy : config_ll; + enc->last_config_reversed_ = config->lossless ? config_lossy : config_ll; *frame_skipped = 0; - if (!WebPPictureInit(&sub_frame_none) || !WebPPictureInit(&sub_frame_bg)) { + if (!SubFrameParamsInit(&dispose_none_params, 1, empty_rect_allowed_none) || + !SubFrameParamsInit(&dispose_bg_params, 0, empty_rect_allowed_bg)) { return VP8_ENC_ERROR_INVALID_CONFIGURATION; } @@ -954,10 +1071,14 @@ static WebPEncodingError SetFrame(WebPAnimEncoder* const enc, } // Change-rectangle assuming previous frame was DISPOSE_NONE. - GetSubRect(prev_canvas, curr_canvas, is_key_frame, is_first_frame, - empty_rect_allowed_none, &rect_none, &sub_frame_none); + if (!GetSubRects(prev_canvas, curr_canvas, is_key_frame, is_first_frame, + config_lossy.quality, &dispose_none_params)) { + error_code = VP8_ENC_ERROR_INVALID_CONFIGURATION; + goto Err; + } - if (IsEmptyRect(&rect_none)) { + if ((is_lossless && IsEmptyRect(&dispose_none_params.rect_ll_)) || + (!is_lossless && IsEmptyRect(&dispose_none_params.rect_lossy_))) { // Don't encode the frame at all. Instead, the duration of the previous // frame will be increased later. 
assert(empty_rect_allowed_none); @@ -971,36 +1092,43 @@ static WebPEncodingError SetFrame(WebPAnimEncoder* const enc, WebPCopyPixels(prev_canvas, prev_canvas_disposed); DisposeFrameRectangle(WEBP_MUX_DISPOSE_BACKGROUND, &enc->prev_rect_, prev_canvas_disposed); - // Even if there is exact pixel match between 'disposed previous canvas' and - // 'current canvas', we can't skip current frame, as there may not be exact - // pixel match between 'previous canvas' and 'current canvas'. So, we don't - // allow empty rectangle in this case. - GetSubRect(prev_canvas_disposed, curr_canvas, is_key_frame, is_first_frame, - 0 /* empty_rect_allowed */, &rect_bg, &sub_frame_bg); - assert(!IsEmptyRect(&rect_bg)); + + if (!GetSubRects(prev_canvas_disposed, curr_canvas, is_key_frame, + is_first_frame, config_lossy.quality, + &dispose_bg_params)) { + error_code = VP8_ENC_ERROR_INVALID_CONFIGURATION; + goto Err; + } + assert(!IsEmptyRect(&dispose_bg_params.rect_ll_)); + assert(!IsEmptyRect(&dispose_bg_params.rect_lossy_)); if (enc->options_.minimize_size) { // Try both dispose methods. - try_dispose_bg = 1; - try_dispose_none = 1; - } else if (RectArea(&rect_bg) < RectArea(&rect_none)) { - try_dispose_bg = 1; // Pick DISPOSE_BACKGROUND. - try_dispose_none = 0; + dispose_bg_params.should_try_ = 1; + dispose_none_params.should_try_ = 1; + } else if ((is_lossless && + RectArea(&dispose_bg_params.rect_ll_) < + RectArea(&dispose_none_params.rect_ll_)) || + (!is_lossless && + RectArea(&dispose_bg_params.rect_lossy_) < + RectArea(&dispose_none_params.rect_lossy_))) { + dispose_bg_params.should_try_ = 1; // Pick DISPOSE_BACKGROUND. 
+ dispose_none_params.should_try_ = 0; } } - if (try_dispose_none) { + if (dispose_none_params.should_try_) { error_code = GenerateCandidates( enc, candidates, WEBP_MUX_DISPOSE_NONE, is_lossless, is_key_frame, - &rect_none, &sub_frame_none, &config_ll, &config_lossy); + &dispose_none_params, &config_ll, &config_lossy); if (error_code != VP8_ENC_OK) goto Err; } - if (try_dispose_bg) { + if (dispose_bg_params.should_try_) { assert(!enc->is_first_frame_); assert(dispose_bg_possible); error_code = GenerateCandidates( enc, candidates, WEBP_MUX_DISPOSE_BACKGROUND, is_lossless, is_key_frame, - &rect_bg, &sub_frame_bg, &config_ll, &config_lossy); + &dispose_bg_params, &config_ll, &config_lossy); if (error_code != VP8_ENC_OK) goto Err; } @@ -1016,8 +1144,8 @@ static WebPEncodingError SetFrame(WebPAnimEncoder* const enc, } End: - WebPPictureFree(&sub_frame_none); - WebPPictureFree(&sub_frame_bg); + SubFrameParamsFree(&dispose_none_params); + SubFrameParamsFree(&dispose_bg_params); return error_code; } @@ -1163,6 +1291,7 @@ static int FlushFrames(WebPAnimEncoder* const enc) { int WebPAnimEncoderAdd(WebPAnimEncoder* enc, WebPPicture* frame, int timestamp, const WebPConfig* encoder_config) { WebPConfig config; + int ok; if (enc == NULL) { return 0; @@ -1212,6 +1341,10 @@ int WebPAnimEncoderAdd(WebPAnimEncoder* enc, WebPPicture* frame, int timestamp, } if (encoder_config != NULL) { + if (!WebPValidateConfig(encoder_config)) { + MarkError(enc, "ERROR adding frame: Invalid WebPConfig"); + return 0; + } config = *encoder_config; } else { WebPConfigInit(&config); @@ -1222,17 +1355,14 @@ int WebPAnimEncoderAdd(WebPAnimEncoder* enc, WebPPicture* frame, int timestamp, assert(enc->curr_canvas_copy_modified_ == 1); CopyCurrentCanvas(enc); - if (!CacheFrame(enc, &config)) { - return 0; - } + ok = CacheFrame(enc, &config) && FlushFrames(enc); - if (!FlushFrames(enc)) { - return 0; - } enc->curr_canvas_ = NULL; enc->curr_canvas_copy_modified_ = 1; - enc->prev_timestamp_ = timestamp; - 
return 1; + if (ok) { + enc->prev_timestamp_ = timestamp; + } + return ok; } // ----------------------------------------------------------------------------- @@ -1278,7 +1408,7 @@ static int FrameToFullCanvas(WebPAnimEncoder* const enc, GetEncodedData(&mem1, full_image); if (enc->options_.allow_mixed) { - if (!EncodeFrame(&enc->last_config_, canvas_buf, &mem2)) goto Err; + if (!EncodeFrame(&enc->last_config_reversed_, canvas_buf, &mem2)) goto Err; if (mem2.size < mem1.size) { GetEncodedData(&mem2, full_image); WebPMemoryWriterClear(&mem1); diff --git a/src/3rdparty/libwebp/src/mux/muxedit.c b/src/3rdparty/libwebp/src/mux/muxedit.c index b27663f..9bbed42 100644 --- a/src/3rdparty/libwebp/src/mux/muxedit.c +++ b/src/3rdparty/libwebp/src/mux/muxedit.c @@ -558,8 +558,8 @@ static WebPMuxError CreateVP8XChunk(WebPMux* const mux) { height = mux->canvas_height_; } - if (flags == 0) { - // For Simple Image, VP8X chunk should not be added. + if (flags == 0 && mux->unknown_ == NULL) { + // For simple file format, VP8X chunk should not be added. return WEBP_MUX_OK; } diff --git a/src/3rdparty/libwebp/src/mux/muxi.h b/src/3rdparty/libwebp/src/mux/muxi.h index 5e8ba2e..d4d5cba 100644 --- a/src/3rdparty/libwebp/src/mux/muxi.h +++ b/src/3rdparty/libwebp/src/mux/muxi.h @@ -28,7 +28,7 @@ extern "C" { #define MUX_MAJ_VERSION 0 #define MUX_MIN_VERSION 3 -#define MUX_REV_VERSION 0 +#define MUX_REV_VERSION 1 // Chunk object. 
 typedef struct WebPChunk WebPChunk;
diff --git a/src/3rdparty/libwebp/src/utils/bit_reader.c b/src/3rdparty/libwebp/src/utils/bit_reader.c
index 45198e1..50ffb74 100644
--- a/src/3rdparty/libwebp/src/utils/bit_reader.c
+++ b/src/3rdparty/libwebp/src/utils/bit_reader.c
@@ -16,6 +16,7 @@
 #endif

 #include "./bit_reader_inl.h"
+#include "../utils/utils.h"

 //------------------------------------------------------------------------------
 // VP8BitReader
@@ -119,11 +120,10 @@ int32_t VP8GetSignedValue(VP8BitReader* const br, int bits) {
 #define VP8L_LOG8_WBITS 4  // Number of bytes needed to store VP8L_WBITS bits.

-#if !defined(WEBP_FORCE_ALIGNED) && \
-    (defined(__arm__) || defined(_M_ARM) || defined(__aarch64__) || \
-     defined(__i386__) || defined(_M_IX86) || \
-     defined(__x86_64__) || defined(_M_X64))
-#define VP8L_USE_UNALIGNED_LOAD
+#if defined(__arm__) || defined(_M_ARM) || defined(__aarch64__) || \
+    defined(__i386__) || defined(_M_IX86) || \
+    defined(__x86_64__) || defined(_M_X64)
+#define VP8L_USE_FAST_LOAD
 #endif

 static const uint32_t kBitMask[VP8L_MAX_NUM_BIT_READ + 1] = {
@@ -191,15 +191,11 @@ static void ShiftBytes(VP8LBitReader* const br) {

 void VP8LDoFillBitWindow(VP8LBitReader* const br) {
  assert(br->bit_pos_ >= VP8L_WBITS);
-  // TODO(jzern): given the fixed read size it may be possible to force
-  // alignment in this block.
-#if defined(VP8L_USE_UNALIGNED_LOAD)
+#if defined(VP8L_USE_FAST_LOAD)
  if (br->pos_ + sizeof(br->val_) < br->len_) {
    br->val_ >>= VP8L_WBITS;
    br->bit_pos_ -= VP8L_WBITS;
-    // The expression below needs a little-endian arch to work correctly.
-    // This gives a large speedup for decoding speed.
-    br->val_ |= (vp8l_val_t)WebPMemToUint32(br->buf_ + br->pos_) <<
+    br->val_ |= (vp8l_val_t)HToLE32(WebPMemToUint32(br->buf_ + br->pos_)) <<
                (VP8L_LBITS - VP8L_WBITS);
    br->pos_ += VP8L_LOG8_WBITS;
    return;
diff --git a/src/3rdparty/libwebp/src/utils/bit_reader_inl.h b/src/3rdparty/libwebp/src/utils/bit_reader_inl.h
index 3721570..99ed313 100644
--- a/src/3rdparty/libwebp/src/utils/bit_reader_inl.h
+++ b/src/3rdparty/libwebp/src/utils/bit_reader_inl.h
@@ -55,7 +55,8 @@ void VP8LoadFinalBytes(VP8BitReader* const br);
 // Inlined critical functions

 // makes sure br->value_ has at least BITS bits worth of data
-static WEBP_INLINE void VP8LoadNewBytes(VP8BitReader* const br) {
+static WEBP_UBSAN_IGNORE_UNDEF WEBP_INLINE
+void VP8LoadNewBytes(VP8BitReader* const br) {
  assert(br != NULL && br->buf_ != NULL);
  // Read 'BITS' bits at a time if possible.
  if (br->buf_ < br->buf_max_) {
diff --git a/src/3rdparty/libwebp/src/utils/color_cache.c b/src/3rdparty/libwebp/src/utils/color_cache.c
index f9ff4b5..c34b2e7 100644
--- a/src/3rdparty/libwebp/src/utils/color_cache.c
+++ b/src/3rdparty/libwebp/src/utils/color_cache.c
@@ -15,7 +15,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include "./color_cache.h"
-#include "../utils/utils.h"
+#include "./utils.h"

 //------------------------------------------------------------------------------
 // VP8LColorCache.
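The bit-reader change above wraps the raw 32-bit load in HToLE32(...) so the bit window fills correctly on big-endian hosts as well (the old code only worked on little-endian targets). A minimal sketch of the underlying idea, with an illustrative helper name that is not part of libwebp: assemble the four bytes explicitly so the result is the little-endian interpretation regardless of host byte order, and without any unaligned pointer cast.

```c
#include <stdint.h>

/* Return the value encoded by the 4 bytes at 'p' when interpreted as a
 * little-endian uint32_t, independent of host endianness. Building the
 * value byte by byte also avoids the undefined behavior of casting an
 * arbitrary byte pointer to uint32_t* (which the UBSAN annotations in
 * this diff are there to silence on the fast path). */
static uint32_t load_le32(const uint8_t* p) {
  return (uint32_t)p[0] |
         ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) |
         ((uint32_t)p[3] << 24);
}
```

On little-endian hardware a compiler typically folds this into a single load, which is why the dedicated fast path with HToLE32 is only a performance refinement, not a behavior change there.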
diff --git a/src/3rdparty/libwebp/src/utils/huffman.c b/src/3rdparty/libwebp/src/utils/huffman.c
index d57376a..36e5502 100644
--- a/src/3rdparty/libwebp/src/utils/huffman.c
+++ b/src/3rdparty/libwebp/src/utils/huffman.c
@@ -15,7 +15,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include "./huffman.h"
-#include "../utils/utils.h"
+#include "./utils.h"
 #include "../webp/format_constants.h"

 // Huffman data read via DecodeImageStream is represented in two (red and green)
diff --git a/src/3rdparty/libwebp/src/utils/huffman_encode.c b/src/3rdparty/libwebp/src/utils/huffman_encode.c
index 6421c2b..4e5ef6b 100644
--- a/src/3rdparty/libwebp/src/utils/huffman_encode.c
+++ b/src/3rdparty/libwebp/src/utils/huffman_encode.c
@@ -15,7 +15,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include "./huffman_encode.h"
-#include "../utils/utils.h"
+#include "./utils.h"
 #include "../webp/format_constants.h"

 // -----------------------------------------------------------------------------
diff --git a/src/3rdparty/libwebp/src/utils/quant_levels_dec.c b/src/3rdparty/libwebp/src/utils/quant_levels_dec.c
index 5b8b8b4..ee0a3fe 100644
--- a/src/3rdparty/libwebp/src/utils/quant_levels_dec.c
+++ b/src/3rdparty/libwebp/src/utils/quant_levels_dec.c
@@ -44,6 +44,7 @@ static const uint8_t kOrderedDither[DSIZE][DSIZE] = {

 typedef struct {
  int width_, height_;  // dimension
+  int stride_;          // stride in bytes
  int row_;             // current input row being processed
  uint8_t* src_;        // input pointer
  uint8_t* dst_;        // output pointer
@@ -99,7 +100,7 @@ static void VFilter(SmoothParams* const p) {
  // We replicate edges, as it's somewhat easier as a boundary condition.
  // That's why we don't update the 'src' pointer on top/bottom area:
  if (p->row_ >= 0 && p->row_ < p->height_ - 1) {
-    p->src_ += p->width_;
+    p->src_ += p->stride_;
  }
 }
@@ -149,7 +150,7 @@ static void ApplyFilter(SmoothParams* const p) {
 #endif
    }
  }
-  p->dst_ += w;  // advance output pointer
+  p->dst_ += p->stride_;  // advance output pointer
 }

 //------------------------------------------------------------------------------
@@ -178,17 +179,20 @@ static void InitCorrectionLUT(int16_t* const lut, int min_dist) {
  lut[0] = 0;
 }

-static void CountLevels(const uint8_t* const data, int size,
-                        SmoothParams* const p) {
-  int i, last_level;
+static void CountLevels(SmoothParams* const p) {
+  int i, j, last_level;
  uint8_t used_levels[256] = { 0 };
+  const uint8_t* data = p->src_;
  p->min_ = 255;
  p->max_ = 0;
-  for (i = 0; i < size; ++i) {
-    const int v = data[i];
-    if (v < p->min_) p->min_ = v;
-    if (v > p->max_) p->max_ = v;
-    used_levels[v] = 1;
+  for (j = 0; j < p->height_; ++j) {
+    for (i = 0; i < p->width_; ++i) {
+      const int v = data[i];
+      if (v < p->min_) p->min_ = v;
+      if (v > p->max_) p->max_ = v;
+      used_levels[v] = 1;
+    }
+    data += p->stride_;
  }

  // Compute the mininum distance between two non-zero levels.
  p->min_level_dist_ = p->max_ - p->min_;
@@ -208,7 +212,7 @@ static void CountLevels(const uint8_t* const data, int size,
 }

 // Initialize all params.
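The CountLevels rewrite above is the core of this stride fix: the old code scanned `width * height` contiguous bytes, which is wrong when rows are padded; the new code walks row by row and advances by `stride_`. A hedged sketch of the same traversal pattern (hypothetical helper, not libwebp code): compute the min and max of an 8-bit plane whose rows start `stride` bytes apart.

```c
#include <stdint.h>

/* Scan a width x height 8-bit plane whose rows start 'stride' bytes apart
 * (stride >= width). Bytes between 'width' and 'stride' are row padding
 * and must be skipped -- treating the plane as width*height contiguous
 * bytes would read them and corrupt the statistics. */
static void plane_min_max(const uint8_t* data, int width, int height,
                          int stride, int* min, int* max) {
  int x, y;
  *min = 255;
  *max = 0;
  for (y = 0; y < height; ++y) {
    for (x = 0; x < width; ++x) {
      const int v = data[x];
      if (v < *min) *min = v;
      if (v > *max) *max = v;
    }
    data += stride;  /* advance by stride, not by width */
  }
}
```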
-static int InitParams(uint8_t* const data, int width, int height,
+static int InitParams(uint8_t* const data, int width, int height, int stride,
                      int radius, SmoothParams* const p) {
  const int R = 2 * radius + 1;  // total size of the kernel
@@ -233,6 +237,7 @@ static int InitParams(uint8_t* const data, int width, int height,

  p->width_ = width;
  p->height_ = height;
+  p->stride_ = stride;
  p->src_ = data;
  p->dst_ = data;
  p->radius_ = radius;
@@ -240,7 +245,7 @@ static int InitParams(uint8_t* const data, int width, int height,
  p->row_ = -radius;

  // analyze the input distribution so we can best-fit the threshold
-  CountLevels(data, width * height, p);
+  CountLevels(p);

  // correction table
  p->correction_ = ((int16_t*)mem) + LUT_SIZE;
@@ -253,7 +258,7 @@ static void CleanupParams(SmoothParams* const p) {
  WebPSafeFree(p->mem_);
 }

-int WebPDequantizeLevels(uint8_t* const data, int width, int height,
+int WebPDequantizeLevels(uint8_t* const data, int width, int height, int stride,
                         int strength) {
  const int radius = 4 * strength / 100;
  if (strength < 0 || strength > 100) return 0;
@@ -261,7 +266,7 @@ int WebPDequantizeLevels(uint8_t* const data, int width, int height,
  if (radius > 0) {
    SmoothParams p;
    memset(&p, 0, sizeof(p));
-    if (!InitParams(data, width, height, radius, &p)) return 0;
+    if (!InitParams(data, width, height, stride, radius, &p)) return 0;
    if (p.num_levels_ > 2) {
      for (; p.row_ < p.height_; ++p.row_) {
        VFilter(&p);  // accumulate average of input
diff --git a/src/3rdparty/libwebp/src/utils/quant_levels_dec.h b/src/3rdparty/libwebp/src/utils/quant_levels_dec.h
index 9aab068..59a1349 100644
--- a/src/3rdparty/libwebp/src/utils/quant_levels_dec.h
+++ b/src/3rdparty/libwebp/src/utils/quant_levels_dec.h
@@ -21,11 +21,11 @@ extern "C" {
 #endif

 // Apply post-processing to input 'data' of size 'width'x'height' assuming that
-// the source was quantized to a reduced number of levels.
+// the source was quantized to a reduced number of levels. 'stride' is in bytes.
 // Strength is in [0..100] and controls the amount of dithering applied.
 // Returns false in case of error (data is NULL, invalid parameters,
 // malloc failure, ...).
-int WebPDequantizeLevels(uint8_t* const data, int width, int height,
+int WebPDequantizeLevels(uint8_t* const data, int width, int height, int stride,
                         int strength);

 #ifdef __cplusplus
diff --git a/src/3rdparty/libwebp/src/utils/utils.c b/src/3rdparty/libwebp/src/utils/utils.c
index d8e3093..2602ca3 100644
--- a/src/3rdparty/libwebp/src/utils/utils.c
+++ b/src/3rdparty/libwebp/src/utils/utils.c
@@ -15,6 +15,7 @@
 #include <string.h>  // for memcpy()
 #include "../webp/decode.h"
 #include "../webp/encode.h"
+#include "../webp/format_constants.h"  // for MAX_PALETTE_SIZE
 #include "./utils.h"

 // If PRINT_MEM_INFO is defined, extra info (like total memory used, number of
@@ -237,3 +238,68 @@ void WebPCopyPixels(const WebPPicture* const src, WebPPicture* const dst) {
 }

 //------------------------------------------------------------------------------
+
+#define MAX_COLOR_COUNT         MAX_PALETTE_SIZE
+#define COLOR_HASH_SIZE         (MAX_COLOR_COUNT * 4)
+#define COLOR_HASH_RIGHT_SHIFT  22  // 32 - log2(COLOR_HASH_SIZE).
+
+int WebPGetColorPalette(const WebPPicture* const pic, uint32_t* const palette) {
+  int i;
+  int x, y;
+  int num_colors = 0;
+  uint8_t in_use[COLOR_HASH_SIZE] = { 0 };
+  uint32_t colors[COLOR_HASH_SIZE];
+  static const uint32_t kHashMul = 0x1e35a7bdU;
+  const uint32_t* argb = pic->argb;
+  const int width = pic->width;
+  const int height = pic->height;
+  uint32_t last_pix = ~argb[0];  // so we're sure that last_pix != argb[0]
+  assert(pic != NULL);
+  assert(pic->use_argb);
+
+  for (y = 0; y < height; ++y) {
+    for (x = 0; x < width; ++x) {
+      int key;
+      if (argb[x] == last_pix) {
+        continue;
+      }
+      last_pix = argb[x];
+      key = (kHashMul * last_pix) >> COLOR_HASH_RIGHT_SHIFT;
+      while (1) {
+        if (!in_use[key]) {
+          colors[key] = last_pix;
+          in_use[key] = 1;
+          ++num_colors;
+          if (num_colors > MAX_COLOR_COUNT) {
+            return MAX_COLOR_COUNT + 1;  // Exact count not needed.
+          }
+          break;
+        } else if (colors[key] == last_pix) {
+          break;  // The color is already there.
+        } else {
+          // Some other color sits here, so do linear conflict resolution.
+          ++key;
+          key &= (COLOR_HASH_SIZE - 1);  // Key mask.
+        }
+      }
+    }
+    argb += pic->argb_stride;
+  }
+
+  if (palette != NULL) {  // Fill the colors into palette.
+    num_colors = 0;
+    for (i = 0; i < COLOR_HASH_SIZE; ++i) {
+      if (in_use[i]) {
+        palette[num_colors] = colors[i];
+        ++num_colors;
+      }
+    }
+  }
+  return num_colors;
+}
+
+#undef MAX_COLOR_COUNT
+#undef COLOR_HASH_SIZE
+#undef COLOR_HASH_RIGHT_SHIFT
+
+//------------------------------------------------------------------------------
diff --git a/src/3rdparty/libwebp/src/utils/utils.h b/src/3rdparty/libwebp/src/utils/utils.h
index f506d66..e0a8112 100644
--- a/src/3rdparty/libwebp/src/utils/utils.h
+++ b/src/3rdparty/libwebp/src/utils/utils.h
@@ -21,6 +21,7 @@

 #include <assert.h>

+#include "../dsp/dsp.h"
 #include "../webp/types.h"

 #ifdef __cplusplus
@@ -51,7 +52,7 @@ WEBP_EXTERN(void) WebPSafeFree(void* const ptr);
 // Alignment

 #define WEBP_ALIGN_CST 31
-#define WEBP_ALIGN(PTR) ((uintptr_t)((PTR) + WEBP_ALIGN_CST) & ~WEBP_ALIGN_CST)
+#define WEBP_ALIGN(PTR) (((uintptr_t)(PTR) + WEBP_ALIGN_CST) & ~WEBP_ALIGN_CST)

 #if defined(WEBP_FORCE_ALIGNED)
 #include <string.h>
@@ -65,10 +66,12 @@ static WEBP_INLINE void WebPUint32ToMem(uint8_t* const ptr, uint32_t val) {
  memcpy(ptr, &val, sizeof(val));
 }
 #else
-static WEBP_INLINE uint32_t WebPMemToUint32(const uint8_t* const ptr) {
+static WEBP_UBSAN_IGNORE_UNDEF WEBP_INLINE
+uint32_t WebPMemToUint32(const uint8_t* const ptr) {
  return *(const uint32_t*)ptr;
 }
-static WEBP_INLINE void WebPUint32ToMem(uint8_t* const ptr, uint32_t val) {
+static WEBP_UBSAN_IGNORE_UNDEF WEBP_INLINE
+void WebPUint32ToMem(uint8_t* const ptr, uint32_t val) {
  *(uint32_t*)ptr = val;
 }
 #endif
@@ -158,6 +161,19 @@ WEBP_EXTERN(void) WebPCopyPixels(const struct WebPPicture* const src,
                                 struct WebPPicture* const dst);

 //------------------------------------------------------------------------------
+// Unique colors.
+
+// Returns count of unique colors in 'pic', assuming pic->use_argb is true.
+// If the unique color count is more than MAX_COLOR_COUNT, returns
+// MAX_COLOR_COUNT+1.
+// If 'palette' is not NULL and number of unique colors is less than or equal to
+// MAX_COLOR_COUNT, also outputs the actual unique colors into 'palette'.
+// Note: 'palette' is assumed to be an array already allocated with at least
+// MAX_COLOR_COUNT elements.
+WEBP_EXTERN(int) WebPGetColorPalette(const struct WebPPicture* const pic,
+                                     uint32_t* const palette);
+
+//------------------------------------------------------------------------------

 #ifdef __cplusplus
 }    // extern "C"
diff --git a/src/3rdparty/libwebp/src/webp/config.h b/src/3rdparty/libwebp/src/webp/config.h
index 4ea0737..118ac38 100644
--- a/src/3rdparty/libwebp/src/webp/config.h
+++ b/src/3rdparty/libwebp/src/webp/config.h
@@ -79,7 +79,7 @@
 #define PACKAGE_NAME "libwebp"

 /* Define to the full name and version of this package. */
-#define PACKAGE_STRING "libwebp 0.5.0"
+#define PACKAGE_STRING "libwebp 0.5.1"

 /* Define to the one symbol short name of this package. */
 #define PACKAGE_TARNAME "libwebp"
@@ -88,7 +88,7 @@
 #define PACKAGE_URL "http://developers.google.com/speed/webp"

 /* Define to the version of this package. */
-#define PACKAGE_VERSION "0.5.0"
+#define PACKAGE_VERSION "0.5.1"

 /* Define to necessary symbol if this constant uses a non-standard name on your
    system. */
@@ -98,7 +98,7 @@
 /* #undef STDC_HEADERS */

 /* Version number of package */
-#define VERSION "0.5.0"
+#define VERSION "0.5.1"

 /* Enable experimental code */
 /* #undef WEBP_EXPERIMENTAL_FEATURES */
@@ -118,12 +118,21 @@
 /* Set to 1 if JPEG library is installed */
 /* #undef WEBP_HAVE_JPEG */

+/* Set to 1 if NEON is supported */
+/* #undef WEBP_HAVE_NEON */
+
+/* Set to 1 if runtime detection of NEON is enabled */
+/* #undef WEBP_HAVE_NEON_RTCD */
+
 /* Set to 1 if PNG library is installed */
 /* #undef WEBP_HAVE_PNG */

 /* Set to 1 if SSE2 is supported */
 /* #undef WEBP_HAVE_SSE2 */

+/* Set to 1 if SSE4.1 is supported */
+/* #undef WEBP_HAVE_SSE41 */
+
 /* Set to 1 if TIFF library is installed */
 /* #undef WEBP_HAVE_TIFF */

@@ -132,6 +141,15 @@
 /* Define WORDS_BIGENDIAN to 1 if your processor stores words with the most
    significant byte first (like Motorola and SPARC, unlike Intel). */
+/* #if defined AC_APPLE_UNIVERSAL_BUILD */
+/* # if defined __BIG_ENDIAN__ */
+/* #  define WORDS_BIGENDIAN 1 */
+/* # endif */
+/* #else */
+/* # ifndef WORDS_BIGENDIAN */
+/* /* # undef WORDS_BIGENDIAN */
+/* # endif */
+/* #endif */
 #if (Q_BYTE_ORDER == Q_BIG_ENDIAN)
 #define WORDS_BIGENDIAN 1
 #endif
diff --git a/src/3rdparty/libwebp/src/webp/decode.h b/src/3rdparty/libwebp/src/webp/decode.h
index 143e4fb..7a3bed9 100644
--- a/src/3rdparty/libwebp/src/webp/decode.h
+++ b/src/3rdparty/libwebp/src/webp/decode.h
@@ -39,8 +39,8 @@ typedef struct WebPDecoderConfig WebPDecoderConfig;
 WEBP_EXTERN(int) WebPGetDecoderVersion(void);

 // Retrieve basic header information: width, height.
-// This function will also validate the header and return 0 in
-// case of formatting error.
+// This function will also validate the header, returning true on success,
+// false otherwise. '*width' and '*height' are only valid on successful return.
 // Pointers 'width' and 'height' can be passed NULL if deemed irrelevant.
 WEBP_EXTERN(int) WebPGetInfo(const uint8_t* data, size_t data_size,
                              int* width, int* height);
@@ -197,7 +197,10 @@ struct WebPYUVABuffer {              // view as YUVA
 struct WebPDecBuffer {
  WEBP_CSP_MODE colorspace;  // Colorspace.
  int width, height;         // Dimensions.
-  int is_external_memory;    // If true, 'internal_memory' pointer is not used.
+  int is_external_memory;    // If non-zero, 'internal_memory' pointer is not
+                             // used. If value is '2' or more, the external
+                             // memory is considered 'slow' and multiple
+                             // read/write will be avoided.
  union {
    WebPRGBABuffer RGBA;
    WebPYUVABuffer YUVA;
@@ -205,7 +208,7 @@ struct WebPDecBuffer {
  uint32_t pad[4];           // padding for later use

  uint8_t* private_memory;   // Internally allocated memory (only when
-                             // is_external_memory is false). Should not be used
+                             // is_external_memory is 0). Should not be used
                             // externally, but accessed via the buffer union.
 };

@@ -269,7 +272,7 @@ typedef enum VP8StatusCode {
 // that of the returned WebPIDecoder object.
 // The supplied 'output_buffer' content MUST NOT be changed between calls to
 // WebPIAppend() or WebPIUpdate() unless 'output_buffer.is_external_memory' is
-// set to 1. In such a case, it is allowed to modify the pointers, size and
+// not set to 0. In such a case, it is allowed to modify the pointers, size and
 // stride of output_buffer.u.RGBA or output_buffer.u.YUVA, provided they remain
 // within valid bounds.
 // All other fields of WebPDecBuffer MUST remain constant between calls.
@@ -468,16 +471,18 @@ static WEBP_INLINE int WebPInitDecoderConfig(WebPDecoderConfig* config) {
 // parameter, in which case the features will be parsed and stored into
 // config->input. Otherwise, 'data' can be NULL and no parsing will occur.
 // Note that 'config' can be NULL too, in which case a default configuration
-// is used.
+// is used. If 'config' is not NULL, it must outlive the WebPIDecoder object
+// as some references to its fields will be used. No internal copy of 'config'
+// is made.
 // The return WebPIDecoder object must always be deleted calling WebPIDelete().
 // Returns NULL in case of error (and config->status will then reflect
-// the error condition).
+// the error condition, if available).
 WEBP_EXTERN(WebPIDecoder*) WebPIDecode(const uint8_t* data, size_t data_size,
                                        WebPDecoderConfig* config);

 // Non-incremental version. This version decodes the full data at once, taking
 // 'config' into account. Returns decoding status (which should be VP8_STATUS_OK
-// if the decoding was successful).
+// if the decoding was successful). Note that 'config' cannot be NULL.
 WEBP_EXTERN(VP8StatusCode) WebPDecode(const uint8_t* data, size_t data_size,
                                       WebPDecoderConfig* config);
diff --git a/src/3rdparty/libwebp/src/webp/encode.h b/src/3rdparty/libwebp/src/webp/encode.h
index c382ea7..9291b71 100644
--- a/src/3rdparty/libwebp/src/webp/encode.h
+++ b/src/3rdparty/libwebp/src/webp/encode.h
@@ -134,8 +134,8 @@ struct WebPConfig {
  int thread_level;       // If non-zero, try and use multi-threaded encoding.
  int low_memory;         // If set, reduce memory usage (but increase CPU use).

-  int near_lossless;      // Near lossless encoding [0 = off(default) .. 100].
-                          // This feature is experimental.
+  int near_lossless;      // Near lossless encoding [0 = max loss .. 100 = off
+                          // (default)].
  int exact;              // if non-zero, preserve the exact RGB values under
                          // transparent area. Otherwise, discard this invisible
                          // RGB information for better compression. The default
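Among the utils.h changes in this diff, the WEBP_ALIGN fix is easy to miss: the old macro added WEBP_ALIGN_CST to the pointer *before* casting, so for any non-char pointer the offset was scaled by the pointee size; the corrected form casts to uintptr_t first and then rounds up. A small illustration of the corrected round-up computation, using a plain integer address and an illustrative helper name (not libwebp's API):

```c
#include <stdint.h>

#define ALIGN_CST 31  /* 32-byte alignment, as with libwebp's WEBP_ALIGN_CST */

/* Round 'addr' up to the next multiple of 32. If 'addr' is already
 * 32-byte aligned, it is returned unchanged. This mirrors the fixed
 * WEBP_ALIGN: cast to an integer first, then add and mask. */
static uintptr_t align_up_32(uintptr_t addr) {
  return (addr + ALIGN_CST) & ~(uintptr_t)ALIGN_CST;
}
```

The mask trick works because 32 is a power of two: adding 31 pushes any non-aligned address past the next multiple of 32, and clearing the low 5 bits snaps it back down onto that multiple.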