path: root/libavutil/mips
Commit message | Author | Age | Files | Lines
* mips: fix build fail on MIPS R6 | Junxian Zhu | 2023-03-26 | 1 | -3/+3
  Add a macro definition to avoid build failures caused by incompatible assembler code on MIPS R6.
  Signed-off-by: Junxian Zhu <zhujunxian@oss.cipunited.com>
  Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avutil/mips: Use $at as MMI macro temporary register | Jiaxun Yang | 2021-07-28 | 1 | -42/+66
  Some functions exceeded the limit of 30 inline assembly register operands when using the LOONGSON2 version of the MMI macros. We can avoid that by taking $at, the register reserved for the assembler, as a temporary register. As none of the instructions used in these macros is a pseudo-instruction, it is safe to use $at here.
  Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
  Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avutil/mips: Use MMI_{L, S}QC1 macro in {SAVE, RECOVER}_REG | Jiaxun Yang | 2021-07-28 | 1 | -15/+17
  {SAVE,RECOVER}_REG will be available for Loongson2 again. Also add a comment about the magic.
  Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
  Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* libavcodec/mips: Fix build errors reported by clang | Jin Bo | 2021-06-03 | 1 | -0/+8
  Clang is stricter about the types of asm operands: float or double variables should use constraint 'f', and integer variables should use constraint 'r'.
  Signed-off-by: Jin Bo <jinbo@loongson.cn>
  Reviewed-by: yinshiyou-hf@loongson.cn
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* mips: Fix potential illegal instruction error. | Shiyou Yin | 2021-05-07 | 1 | -37/+0
  MSA2 optimizations are attached to MSA macros in generic_macros_msa.h. It is difficult to do a runtime check for them. Removing this part of the code makes it more robust. H264 1080p decoding: 5.13x==>5.12x.
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* Include attributes.h directly | Andreas Rheinhardt | 2021-04-19 | 1 | -0/+2
  Some files currently rely on libavutil/cpu.h to include it for them; yet said file won't include it any more after the currently deprecated functions are removed, so include attributes.h directly.
  Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
* lavu: move LOCAL_ALIGNED from internal.h to mem_internal.h | Anton Khirnov | 2021-01-01 | 1 | -0/+2
  That is a more appropriate place for it.
* avutil/mips/generic_macros_msa: Fix prob that 'ulw' and 'uld' unsupported by clang. | Shiyou Yin | 2020-07-30 | 1 | -6/+8
  GCC supports these two synthesized instructions, but clang does not yet. Use machine instructions instead to accommodate the clang compiler.
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
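For context, 'ulw' and 'uld' perform unaligned 32-bit and 64-bit loads. The following is a minimal portable sketch of the same effect in plain C, not the machine-instruction sequence the commit uses; the helper names are hypothetical and do not exist in FFmpeg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 32 bits from an address that may be unaligned.
 * memcpy lets the compiler choose a safe access sequence for the target. */
static inline uint32_t sketch_load_u32_unaligned(const void *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Hypothetical helper: the 64-bit counterpart. */
static inline uint64_t sketch_load_u64_unaligned(const void *p)
{
    uint64_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof(v));
    return v;
}
```

Calling `sketch_load_u32_unaligned(buf + 1)` on a byte buffer returns the four bytes starting at the odd address in native byte order, without relying on any assembler pseudo-instruction.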
* libavutil: Detect MMI and MSA flags for MIPS | Jiaxun Yang | 2020-07-23 | 3 | -1/+163
  Add MMI & MSA runtime detection for MIPS. Basically there are two code paths. For systems that natively support the CPUCFG instruction, or whose kernel emulates it, we sense this feature from HWCAP and report the flags according to the values read from CPUCFG. For systems that have no CPUCFG (or do not export it in HWCAP), we parse /proc/cpuinfo instead.
  Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
  Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
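The /proc/cpuinfo fallback path can be sketched roughly as below. This is an illustration under assumptions, not the code from the commit: the flag constants and function name are hypothetical, and the "ASEs implemented" key with the "loongson-mmi" and "msa" tokens is assumed to match what Linux on MIPS prints. The sketch takes the file contents as a string so it can be exercised without a MIPS machine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits, for illustration only. */
#define SKETCH_FLAG_MMI 0x1
#define SKETCH_FLAG_MSA 0x2

/* Scan cpuinfo-style text for the "ASEs implemented" line and derive flags
 * from the tokens on that line (assumed format, see lead-in). */
static int sketch_flags_from_cpuinfo(const char *text)
{
    static const char key[] = "ASEs implemented";
    int flags = 0;

    assert(text != NULL);
    while (*text) {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) : strlen(text);

        if (len >= sizeof(key) - 1 && !strncmp(text, key, sizeof(key) - 1)) {
            char line[256];
            size_t n = len < sizeof(line) - 1 ? len : sizeof(line) - 1;

            /* Copy the line so strstr cannot match text on later lines. */
            memcpy(line, text, n);
            line[n] = '\0';
            if (strstr(line, " loongson-mmi"))
                flags |= SKETCH_FLAG_MMI;
            if (strstr(line, " msa"))
                flags |= SKETCH_FLAG_MSA;
            break;
        }
        text += len;
        if (*text == '\n')
            text++;
    }
    return flags;
}
```

A real implementation would first consult HWCAP (for example via `getauxval(AT_HWCAP)` on Linux) and only fall back to reading /proc/cpuinfo when CPUCFG is not advertised.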
* libavutils: Add parse_r helper for MIPS | Jiaxun Yang | 2020-07-23 | 1 | -0/+42
  That helper, taken from kernel code, allows us to inline newer instructions (not implemented by the assembler) in an elegant manner.
  Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
  Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: msa optimizations for vc1dsp | gxw | 2019-10-30 | 1 | -0/+3
  Performance of WMV3 decoding has sped up from 3.66x to 5.23x, tested on 3A4000.
  Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avutil/mips: refactor msa SLDI_Bn_0 and SLDI_Bn macros. | gxw | 2019-09-16 | 1 | -49/+31
  Changing details as follows:
  1. The previous order of parameters is irregular and difficult to understand. Adjust the order of the parameters according to the rule: (RTYPE, input registers, input mask/input index/..., output registers). Most of the existing msa macros follow the rule.
  2. Remove the redundant macro SLDI_Bn_0 and use SLDI_Bn instead.
  Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avutil/mips: remove redundant code in TRANSPOSE16x8_UB_UB. | Shiyou Yin | 2019-08-15 | 1 | -2/+0
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avutil/mips: refine msa macros CLIP_*. | gxw | 2019-08-13 | 1 | -70/+49
  Changing details as follows:
  1. Remove the local variable 'out_m' in 'CLIP_SH' and store the result in the source vector.
  2. Refine the implementation of the macros 'CLIP_SH_0_255' and 'CLIP_SW_0_255'.
     Performance of VP8 decoding has sped up about 1.1% (from 7.03x to 7.11x).
     Performance of H264 decoding has sped up about 0.5% (from 4.35x to 4.37x).
     Performance of Theora decoding has sped up about 0.7% (from 5.79x to 5.83x).
  3. Remove the redundant macros 'CLIP_SH/Wn_0_255_MAX_SATU' and use 'CLIP_SH/Wn_0_255' instead, because there is no difference in the effect of these two macros.
  Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
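What a 0..255 clip on signed half-words computes can be modelled in scalar C as below. This is a sketch of the intended result only, assuming the macro clamps each lane to [0, 255] and writes back in place; it is not the MSA implementation, and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp one signed 16-bit value to the range [0, 255]. */
static inline int16_t sketch_clip_0_255(int16_t x)
{
    if (x < 0)
        return 0;
    return x > 255 ? 255 : x;
}

/* Scalar model of clipping a vector of 8 half-words in place, mirroring
 * the commit's point that the result is stored in the source vector. */
static void sketch_clip_sh_0_255(int16_t v[8])
{
    assert(v != NULL);
    for (int i = 0; i < 8; i++)
        v[i] = sketch_clip_0_255(v[i]);
}
```

Writing the result back into the input avoids a separate temporary, which is the scalar analogue of dropping the 'out_m' local variable.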
* avutil/mips: Avoid instruction exception caused by gssqc1/gslqc1. | Shiyou Yin | 2019-08-02 | 1 | -1/+1
  Ensure the addresses accessed by gssqc1/gslqc1 are 16-byte aligned.
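The 16-byte alignment requirement can be checked, or satisfied by rounding a pointer up, as in this minimal sketch. The helper names are hypothetical; the commit itself only changes how the accessed buffer is aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return nonzero if p is a multiple of 16 bytes. */
static inline int sketch_is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

/* Round p up to the next 16-byte boundary (p itself if already aligned).
 * The caller must make sure the buffer has at least 15 spare bytes. */
static inline void *sketch_align_up16(void *p)
{
    assert(p != NULL);
    return (void *)(((uintptr_t)p + 15) & ~(uintptr_t)15);
}
```

A debug build could assert `sketch_is_aligned16(ptr)` before any 128-bit load or store that requires alignment.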
* avutil/mips: refactor msa load and store macros. | Shiyou Yin | 2019-07-19 | 1 | -198/+114
  Replace STnxm_UB and LDnxm_SH with new macros ST_{H/W/D}{1/2/4/8}. The old macros are difficult to use because they don't follow the same parameter passing rules.
  Changing details as follows:
  1. remove LD4x4_SH.
  2. replace ST2x4_UB with ST_H4.
  3. replace ST4x2_UB with ST_W2.
  4. replace ST4x4_UB with ST_W4.
  5. replace ST4x8_UB with ST_W8.
  6. replace ST6x4_UB with ST_W2 and ST_H2.
  7. replace ST8x1_UB with ST_D1.
  8. replace ST8x2_UB with ST_D2.
  9. replace ST8x4_UB with ST_D4.
  10. replace ST8x8_UB with ST_D8.
  11. replace ST12x4_UB with ST_D4 and ST_W4.
  Example of a new macro: ST_H4(in, idx0, idx1, idx2, idx3, pdst, stride)
  ST_H4 stores four half-word elements in vector 'in' to pdst with stride.
  About the macro name:
  1) 'ST' means store operation.
  2) 'H/W/D' means the type of vector element is 'half-word/word/double-word'.
  3) Number '1/2/4/8' means how many elements will be stored.
  About the macro parameters:
  1) 'in0, in1...' 128-bit vectors.
  2) 'idx0, idx1...' element indexes.
  3) 'pdst' destination pointer to store to.
  4) 'stride' stride of each store operation.
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
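The ST_H4 semantics described above can be modelled in scalar C as follows. This is a sketch only: the struct stands in for a 128-bit MSA vector, the function name is hypothetical, and the stride is assumed to be in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-in for a 128-bit vector viewed as 8 half-words. */
typedef struct { uint16_t h[8]; } sketch_v8u16;

/* Model of ST_H4(in, idx0, idx1, idx2, idx3, pdst, stride): store the
 * half-word elements idx0..idx3 of 'in' to pdst, pdst + stride, and so on.
 * Stride is assumed to be a byte offset between consecutive stores. */
static void sketch_st_h4(const sketch_v8u16 *in,
                         int idx0, int idx1, int idx2, int idx3,
                         uint8_t *pdst, ptrdiff_t stride)
{
    const int idx[4] = { idx0, idx1, idx2, idx3 };

    for (int i = 0; i < 4; i++) {
        assert(idx[i] >= 0 && idx[i] < 8);
        /* memcpy keeps the store valid even when pdst is not 2-byte aligned. */
        memcpy(pdst + i * stride, &in->h[idx[i]], sizeof(uint16_t));
    }
}
```

Under the same model, ST_W2 would store two 32-bit elements and ST_D4 four 64-bit elements, which is how the name encodes element type and count.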
* avutil/mips: optimize UNPCK&SAD macros with MSA2.0 instruction. | Shiyou Yin | 2019-07-10 | 1 | -3/+39
  Loongson 3A4000 and 2k1000 support MSA2.0. This patch optimizes SAD_UB2_UH, UNPCK_R_SH_SW, UNPCK_SB_SH and UNPCK_SH_SW with MSA2.0 instructions.
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: [loongson] mmi optimizations for VP9 put and avg functions | gxw | 2019-02-27 | 1 | -0/+15
  VP9 decoding speed improved about 60.5% (from 38fps to 61fps, tested on loongson 3A3000).
  Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: [loongson] optimize put_hevc_qpel_hv_8 with mmi. | Shiyou Yin | 2019-01-22 | 1 | -0/+9
  Optimize put_hevc_qpel_hv_8 with mmi in the case width=4/8/12/16/24/32/48/64. This optimization improved HEVC decoding performance by 11% (1.81x to 2.01x, tested on loongson 3A3000).
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avutil/mips: [loongson] simplify macro TRANSPOSE_4H and TRANSPOSE_8B | Shiyou Yin | 2018-09-09 | 1 | -31/+53
  Simplify macro TRANSPOSE_4H in mmiutils.h and add TRANSPOSE_8B as a common macro.
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: [loongson] optimize vp8 decoding in vp8dsp. | gxw | 2018-09-09 | 1 | -0/+28
  Optimize vp8 loop filter with mmi, four functions optimized:
  1. ff_vp8_h_loop_filter8uv_mmi.
  2. ff_vp8_v_loop_filter8uv_mmi.
  3. ff_vp8_h_loop_filter16_mmi.
  4. ff_vp8_v_loop_filter16_mmi.
  VP8 decoding speed improved about 50% (from 73fps to 110fps, tested on loongson 3A3000).
  Signed-off-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: [loongson] reoptimize simple idct with mmi. | Shiyou Yin | 2018-09-02 | 1 | -0/+49
  Performance of mpeg4 decoding improved about 23% (from 128fps to 158fps, tested on loongson 3A3000).
  Reoptimized the following functions with mmi:
  1. ff_simple_idct_put_8_mmi
  2. ff_simple_idct_add_8_mmi
  3. ff_simple_idct_8_mmi
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: Improve hevc bi weighted hv mc msa functions | Kaustubh Raste | 2017-10-25 | 1 | -0/+35
  Use immediate unsigned saturation for clip to max saving one vector register.
  Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
  Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: Improve avc bi-weighted mc msa functions | Kaustubh Raste | 2017-10-10 | 1 | -0/+4
  Replace generic with block size specific function.
  Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
  Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: Improve avc weighted mc msa functions | Kaustubh Raste | 2017-09-27 | 1 | -0/+36
  Replace generic with block size specific function.
  Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
  Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: Improve hevc uni-w copy mc msa functions | Kaustubh Raste | 2017-09-24 | 1 | -0/+30
  Load the specific destination bytes instead of MSA load and pack. Pack the data to half word before clipping. Use immediate unsigned saturation for clip to max saving one vector register.
  Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
  Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: Improve hevc sao band filter msa functions | Kaustubh Raste | 2017-09-15 | 1 | -0/+1
  Preload data in band filter 0-8 for better pipeline parallelization.
  Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
  Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: Improve vp9 mc msa functions | Kaustubh Raste | 2017-09-08 | 1 | -13/+11
  Load the specific destination bytes instead of MSA load and pack.
  Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
  Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* libavcodec/mips: Optimize avc idct 4x4 for msa | Kaustubh Raste | 2017-07-25 | 1 | -0/+18
  Removed memset call and improved performance.
  Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
  Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* libavutil/mips: Updated msa generic macros | Kaustubh Raste | 2017-07-21 | 1 | -384/+245
  Reduced msa load-store code. Removed inline asm of GP load-store for 64 bit. Updated variable names in GP load-store macros for naming consistency. Corrected macro descriptions.
  Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
  Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avutil/mips: loongson add mmi utils header file | Zhou Xiaoyong | 2016-10-23 | 1 | -0/+241
  1. mmiutils.h defines MMI_ load/store macros for loongson2e/2f/3a
  2. mmiutils.h defines some mmi assembly macros
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avutil/mips/generic_macros_msa: rename macro variable which causes segfault for mips r6 | Shivraj Patil | 2016-10-05 | 1 | -6/+6
  Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avutil/mips: header asmdefs.h add some PTR_ macros for loongson | ZhouXiaoyong | 2016-05-14 | 1 | -0/+12
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* mips: add support for R6 | Vicente Olivert Riera | 2016-03-09 | 1 | -0/+4
  Understanding the mips32r6 and mips64r6 ISAs in the configure script is not enough. In order to have full support for MIPS R6 in FFmpeg we need to be able to build it, and for that we need to make sure we don't use incompatible assembler code which makes the build fail. Ifdefing the offending code is sufficient to fix the problem.
  Signed-off-by: Vicente Olivert Riera <Vincent.Riera@imgtec.com>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* all: Make header guard names consistent | Timothy Gu | 2016-01-31 | 1 | -3/+3
* avcodec/mips: MSA (MIPS-SIMD-Arch) optimizations for VP9 lpf functions | Shivraj Patil | 2015-07-23 | 1 | -0/+3
  Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
  Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
  Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* avcodec/mips: MSA (MIPS-SIMD-Arch) optimizations for idctdsp functions | Shivraj Patil | 2015-07-07 | 1 | -0/+37
  This patch adds MSA (MIPS-SIMD-Arch) optimizations for idctdsp functions in new files idctdsp_msa.c and simple_idct_msa.c
  Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* avcodec/mips: MSA (MIPS-SIMD-Arch) optimizations for me_cmp functions | Shivraj Patil | 2015-07-06 | 1 | -0/+59
  This patch adds MSA (MIPS-SIMD-Arch) optimizations for me_cmp functions in new file me_cmp_msa.c
  Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* avcodec/mips: MSA (MIPS-SIMD-Arch) optimizations for mpegvideoencdsp functions | Shivraj Patil | 2015-07-06 | 1 | -0/+34
  This patch adds MSA (MIPS-SIMD-Arch) optimizations for mpegvideoencdsp functions in new file mpegvideoencdsp_msa.c
  Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* avcodec/mips: MSA (MIPS-SIMD-Arch) optimizations for mpegvideo functions | Shivraj Patil | 2015-07-01 | 1 | -0/+94
  This patch adds MSA (MIPS-SIMD-Arch) optimizations for mpegvideo functions in new file mpegvideo_msa.c
  Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* avcodec/mips: MSA (MIPS-SIMD-Arch) optimizations for pixblock functions | Shivraj Patil | 2015-06-29 | 1 | -0/+8
  This patch adds MSA (MIPS-SIMD-Arch) optimizations for pixblock functions in new file pixblockdsp_msa.c
  Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h
  Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* avcodec/mips: MSA (MIPS-SIMD-Arch) optimizations for hpel functions | Shivraj Patil | 2015-06-19 | 1 | -0/+162
  This patch adds MSA (MIPS-SIMD-Arch) optimizations for hpel functions in new file hpeldsp_msa.c
  Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h
  Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* avcodec/mips: MSA (MIPS-SIMD-Arch) optimizations for qpel functions | Shivraj Patil | 2015-06-18 | 1 | -0/+21
  This patch adds MSA (MIPS-SIMD-Arch) optimizations for qpel functions in new file qpeldsp_msa.c
  Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h
  Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* avcodec/mips: MSA (MIPS-SIMD-Arch) optimizations for AVC qpel functions | Shivraj Patil | 2015-06-13 | 1 | -0/+124
  This patch adds MSA (MIPS-SIMD-Arch) optimizations for AVC qpel functions in new file h264qpel_msa.c
  Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h
  Added const to local static array.
  Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* avcodec/mips: MSA (MIPS-SIMD-Arch) optimizations for AVC idct functions | Shivraj Patil | 2015-06-11 | 1 | -0/+96
  This patch adds MSA (MIPS-SIMD-Arch) optimizations for AVC idct functions in new file h264idct_msa.c
  Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h
  Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* avcodec/mips: MSA (MIPS-SIMD-Arch) optimizations for AVC intra prediction functions | Shivraj Patil | 2015-06-11 | 1 | -0/+11
  This patch adds MSA (MIPS-SIMD-Arch) optimizations for AVC intra prediction functions in new file h264pred_msa.c
  Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h
  Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* avcodec/mips: MSA (MIPS-SIMD-Arch) optimizations for AVC chroma mc functions | Shivraj Patil | 2015-06-11 | 1 | -0/+56
  This patch adds MSA (MIPS-SIMD-Arch) optimizations for AVC chroma mc functions in new file h264chroma_msa.c
  Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h
  Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* avcodec/mips: MSA (MIPS-SIMD-Arch) optimizations for HEVC intra prediction functions | Shivraj Patil | 2015-06-10 | 1 | -0/+46
  This patch adds MSA (MIPS-SIMD-Arch) optimizations for HEVC intra prediction functions in new file hevcpred_msa.c
  Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h
  Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* avcodec/mips: MSA (MIPS-SIMD-Arch) optimizations for HEVC loop filter and sao functions | Shivraj Patil | 2015-06-10 | 1 | -1/+110
  This patch adds MSA (MIPS-SIMD-Arch) optimizations for HEVC loop filter and sao functions in new file hevc_lpf_sao_msa.c
  Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h
  In this patch, in comparison with the previous patch, duplicated C functions are removed.
  Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* avcodec/mips: MSA (MIPS-SIMD-Arch) optimizations for HEVC idct functions | Shivraj Patil | 2015-06-04 | 1 | -0/+195
  This patch adds MSA (MIPS-SIMD-Arch) optimizations for HEVC idct functions in new file hevc_idct_msa.c
  Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h
  Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
  Signed-off-by: Michael Niedermayer <michaelni@gmx.at>