mozjpeg

Author	SHA1	Message	Date
DRC	00d48d7e8c	Merge branch 'master' into dev	2020-02-17 18:14:10 -06:00
DRC	035262a18d	MIPS DSPr2: Work around various 'make test' errors Referring to #408, this commit #ifdefs DSPr2 SIMD functions that only work on little endian processors, and it completely excludes jsimd_h2v1_downsample_dspr2() and jsimd_h2v2_downsample_dspr2(). The latter two functions fail with the TJBench tiling regression tests, most likely because the implementation of the functions predates those tests.	2020-02-17 18:13:31 -06:00
DRC	ed7cab47d9	MIPS DSPr2: Fix compiler warning with -mdspr2 If -mdspr2 is passed to the compiler, __mips_dsp will be defined, and __mips_dsp_rev will be >= 2, so parse_proc_cpuinfo() will not be used.	2020-02-17 16:35:00 -06:00
DRC	42d679b9fc	MIPS SIMD: Always honor JSIMD_FORCE* env vars Previously, these environment variables were not honored unless a 74K CPU was detected, but this detection doesn't work properly with QEMU's user mode emulation. With all other CPU types, libjpeg-turbo honors JSIMD_FORCE* regardless of CPU detection.	2020-02-17 15:19:32 -06:00
DRC	b34c85ea4a	Merge branch 'master' into dev	2019-12-31 01:20:12 -06:00
DRC	166e34213e	simd/arm64/jsimd_neon.S: Fix checkstyle issue	2019-12-31 01:10:30 -06:00
DRC	c4675d62e8	Merge branch 'master' into dev	2019-12-31 00:58:42 -06:00
DRC	b542e4c8e9	ARMv8 SIMD: Support execute-only memory (XOM) Move constants out of the .text section in simd/arm64/jsimd_neon.S and into a .rodata section. This ensures that the ARMv8 NEON SIMD extensions are compatible with memory layouts that are marked execute-only (and thus unreadable.) Based on: `88f3ca7664` Closes #318	2019-12-20 14:24:10 -06:00
DRC	81b8c0eed5	Loongson MMI: Merge with MIPS64/add auto-detection Modern Loongson processors are MIPS64-compatible, and MMI instructions are now supported in the mainline of GCC. Thus, this commit adds compile-time and run-time auto-detection of MMI instructions and moves the MMI SIMD extensions for libjpeg-turbo from simd/loongson/ to simd/mips64/. That will allow MMI and MSA instructions to co-exist in the same build once #377 has been integrated. Based on: `82953ddd61` Closes #383	2019-12-17 14:35:49 -06:00
mayeut	e821464f79	ARM64 NEON SIMD impl. of prog. Huffman encoding This commit adds ARM64 NEON optimizations for the encode_mcu_AC_first() and encode_mcu_AC_refine() functions used in progressive Huffman encoding. Compression speedups for the typical set of five libjpeg-turbo test images (https://libjpeg-turbo.org/About/Performance): Cortex-A53: 23.8-39.2% (avg. 32.2%) Cortex-A72: 26.8-41.1% (avg. 33.5%) Apple A7: 29.7-45.9% (avg. 39.6%) Closes #229	2019-12-10 00:21:57 -06:00
DRC	9c6f79e919	Fix formatting issues detected by checkstyle	2019-11-14 12:16:38 -06:00
DRC	f60b6dd36f	Remove vestigial jpeg_nbits_table.inc Not needed since `087c29e07f`	2019-11-12 17:42:39 -06:00
DRC	713c451f58	Enable SSE2 progressive Huffman encoder for x32 Referring to #289, I'm not sure where I arrived at the conclusion that the SSE2 progressive Huffman encoder doesn't provide any speedup for x32. Upon re-testing, I discovered it to be about 50% faster than the C encoder. This commit also re-purposes one of the CI tests (specifically, the jpeg-7 API/ABI test) so that it tests x32 as well.	2019-11-08 16:03:38 -06:00
DRC	cbf0fcc8b7	i386 SSE2 Huffman: Fix pointer arithmetic issue Splitting the pointer arithmetic in GET_SYM() into a separate add and sub instruction was an attempt to work around an error ("invalid operand type") that occurred when assembling the file with NASM. However, this created a link error on macOS ("ld: illegal text-relocation to '_jconst_huff_encode_one_block' in simd/CMakeFiles/simd.dir/i386/jchuff-sse2.asm.o from '_jsimd_huff_encode_one_block_sse2' in simd/CMakeFiles/simd.dir/i386/jchuff-sse2.asm.o for architecture i386") and also changed the alignment of the code in ways that might have affected the previous benchmark results (which took a great deal of time to obtain.) Ultimately, the path of least resistance is just to require NASM 2.13 or later.	2019-11-05 15:56:28 -06:00
DRC	bbedb4b564	Merge branch 'master' into dev	2019-11-05 15:43:21 -06:00
DRC	cf54623b08	Mac: Support hiding SIMD fct symbols w/ NASM 2.14+ (NASM 2.14+ now supports the private_extern section directive, which was previously only available with YASM.)	2019-11-05 15:41:59 -06:00
DRC	087c29e07f	Optimize Huffman encoding This commit improves the C and SSE2 Huffman encoding implementations in the following ways: - Avoid using xmm8-xmm15 in the x86-64 SSE2 implementation. There is no actual need to use those registers, and avoiding them produces a cleaner WIN64 function entry/exit-- as well as shorter code, since REX prefixes can be avoided (this is helpful on certain CPUs, such as Intel Atom, for which instruction fetch and decoding can be a bottleneck.) - Optimize register usage so that fewer REX prefixes and register-register moves are needed. - Use the bit counter to store the number of free bits in the bit buffer rather than the number of bits in the bit buffer. This changes the method for inserting a code into the bit buffer to: (put_buffer |= code << (free_bits -= code_size)); As a result: * Only one bit counter needs to stay in a register (we just keep it in cl.) * The bit buffer contents are already properly aligned to be written out (after a byte swap.) * Adjusting the free bits counter and checking if the bit buffer is full can be combined into a single operation. * We can wait to flush the bit buffer until the buffer is actually full and not just in danger of becoming full. Thus, eight bytes can be flushed at a time. - Speed is quite sensitive to the alignment of branch target labels, so insert some padding and remove branches from the flush code. (Flushing this way isn't actually faster when compared to using branches, but the branchless code doesn't need extra alignment and is thus smaller.) - Speculatively write out the bit buffer as a single 8-byte write, falling back to a byte-by-byte write only if there are any 0xFF bytes in the bit buffer that need to be encoded as 0xFF 0x00. - Use MMX registers for the 32-bit implementation (so the bit buffer can be 64 bits wide.) - Slightly reduce overall function code size. - Eliminate or combine a few SSE instructions. - Make some minor improvements to instruction scheduling. - Adjust flush_bits() in jchuff.c to handle cases in which the bit buffer has less than 7 free bits (apparently that couldn't happen before.) Based on: `947a09defa` `262ebb6b81` `6e9a091221` See change log for performance claims. Closes #292	2019-11-04 19:04:05 -06:00
DRC	d92ae5df0c	Merge branch 'master' into dev	2019-11-04 18:50:45 -06:00
DRC	6902cdb177	Build: Don't require ASM_NASM if !REQUIRE_SIMD The build system is supposed to fall back to a non-SIMD build if WITH_SIMD==1 but REQUIRE_SIMD==0. Based on: `972df912d0` Closes #384	2019-10-29 12:08:40 -05:00
DRC	95f4d6ef8b	Merge branch 'master' into dev	2019-10-24 02:13:23 -05:00
DRC	3a32d199df	x86 SIMD: Consistify capitalization of NASM types byte, word, dword, qword, oword, and yword are all assembler keywords, so it makes sense to use lowercase for these so as not to mistake them for macros or constants.	2019-10-17 20:02:20 -05:00
DRC	9a51a87af3	x86 SIMD: Remove obsolete [TAB8] comments With apologies to Richard Hendricks, our assembly code no longer uses tabs.	2019-10-17 14:11:35 -05:00
DRC	8ef53b102f	Merge branch 'master' into dev	2019-08-14 22:08:59 -05:00
DRC	a81a8c137b	SSE2 SIMD: Fix prog Huffman enc. error if Sl%16==0 (regression introduced by `5b177b3cab`) The SSE2 implementation of progressive Huffman encoding performed extraneous iterations when the scan length was a multiple of 16. Based on: `bb7f1ef983` Fixes #335 Closes #367	2019-08-14 22:01:30 -05:00
DRC	7fbfe29c65	Merge branch 'master' into dev	2019-07-18 15:18:27 -05:00
DRC	f37b7c1f96	Build: Fix build/install with Xcode IDE Closes #355	2019-07-02 11:28:26 -05:00
DRC	f36d531553	Merge branch 'master' into dev	2019-04-23 14:54:23 -05:00
Chris Blume	aa9db61677	x86 SIMD: Check for CPUID leaf 07H before using According to Intel's manual [1], "If a value entered for CPUID.EAX is higher than the maximum input value for basic or extended function for that processor then the data for the highest basic information leaf is returned." Right now, libjpeg-turbo doesn't first check that leaf 07H is supported before attempting to use it, so the ostensible AVX2 bit (Bit 05) of the CPUID result might actually be Bit 05 from a lower leaf. That bit might be set, even if the CPU doesn't support AVX2. This commit modifies the x86 and x86-64 SIMD feature detection code so that it first checks whether CPUID leaf 07H is supported before attempting to use it to check for AVX2 instruction support. DRC: This commit should fix https://bugzilla.mozilla.org/show_bug.cgi?id=1520760 However, I have not personally been able to reproduce that issue, despite using a Nehalem (pre-AVX2) CPU on which the maximum CPUID leaf has been limited via a BIOS setting. Closes #348 [1] "Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 2 (2A, 2B, 2C & 2D): Instruction Set Reference, A-Z", https://software.intel.com/sites/default/files/managed/a4/60/325383-sdm-vol-2abcd.pdf, page 3-192.	2019-04-16 17:07:28 -05:00
DRC	afbe48c290	MMI: Support 32-bit Loongson architectures	2019-02-27 13:36:48 -06:00
DRC	98ff5507d8	MMI: Fix bug in jsimd_h2v1_merged_upsample_mmi() ... that occurred when ((image width) & 1) != 0.	2019-02-27 13:36:48 -06:00
DRC	3ca6dba96e	Merge branch 'master' into dev	2019-02-17 09:33:57 -06:00
Chris Blume	b46af82cc1	ARMv7 NEON: #ifdef unused funcs/vars w/ -mfpu=neon When simd/arm/jsimd.c is compiled with __ARM_NEON__ defined (which will be the case if -mfpu=neon is passed to the compiler), the parse_proc_cpuinfo() and check_feature() functions and the bufsize variable are unused and thus need to be #ifdef'ed out in order to avoid compiler warnings. Note that the bufsize variable was already #ifdef'ed out on Linux but not on Android due to lack of parentheses (&& takes precedence over ||.) Closes #331	2019-02-14 08:53:49 -06:00
DRC	bdec995839	MMI: Fix unaligned decomp. perf. for 32-bit PFs (Oversight from `db84125fcb`)	2019-02-01 01:16:13 -06:00
DRC	fa905fbf7b	MMI: Use unaligned stores w/ merged upsampling ... when necessary. This was an oversight from `2f9e7c84d1`	2019-02-01 01:03:32 -06:00
DRC	9aada25ced	Merge branch 'master' into dev	2019-02-01 01:02:55 -06:00
DRC	e2442e0707	MMI: Fix unaligned comp. perf. for 32-bit PFs also (Oversight from `1c2d3cfaaf`)	2019-02-01 00:59:58 -06:00
DRC	73fd604161	MMI: Fix formatting issue detected by checkstyle	2019-02-01 00:24:09 -06:00
DRC	2f9e7c84d1	Loongson MMI h2v1 and h2v2 merged upsampling Based on: `e8f5cee5aa`	2019-01-31 23:18:48 -06:00
DRC	3c7199ff06	Loongson MMI h2v1 fancy upsampling Based on: `e8f5cee5aa`	2019-01-31 17:01:01 -06:00
DRC	73b98acd8b	Loongson MMI RGB-to-Grayscale conversion Based on: `e8f5cee5aa`	2019-01-31 16:44:55 -06:00
DRC	bb0d170288	Improve readability of Loongson MMI code We have more than eight registers to work with, as well as three-operand intrinsics, so there's no need for the implementation to be such a literal port of the MMX code.	2019-01-31 16:44:48 -06:00
DRC	db84125fcb	MMI: Use aligned store instructions when possible This improves decompression performance by 2-5%.	2019-01-31 15:30:58 -06:00
DRC	ae4221f905	Loongson MMI fast forward/inverse DCT Based on: `32a9ca222d`	2019-01-31 15:30:58 -06:00
DRC	674343ab14	Merge branch 'master' into dev	2019-01-31 15:30:25 -06:00
DRC	1c2d3cfaaf	MMI: Fix comp. perf. issue w/ unaligned image rows Using ldc1 with a non-64-bit-aligned memory location causes as much as a 10x slow-down in overall compression performance.	2019-01-31 15:30:05 -06:00
DRC	01e3032354	Eliminate support for compilers w/o unsigned char libjpeg-turbo has never really supported such compilers, since (AFAIK) they are non-existent on any modern computing platform and thus impossible for us to test. (Also, the TurboJPEG API would break without unsigned chars.) Furthermore, the unified CMake-based build system introduced in 2.0 always defines HAVE_UNSIGNED_CHAR, so retaining other code paths is pointless. Eliminating support for compilers without unsigned char eliminates the need for the GETJSAMPLE() macro, which improves the readability of many parts of the code as well as improving the performance of writing Targa and Windows BMP files. Fixes #317	2019-01-23 15:12:26 -06:00
DRC	2cc4f93c88	Merge branch 'master' into dev	2018-11-12 14:40:19 -06:00
DRC	d5f281b734	SIMD: Fix c000001d exception on Win 7 w/o SP1 Apparently Windows 7 without SP1 has O/S support for XSAVE but not for YMM registers, and this exposed a bug in our usage of xgetbv. The test instruction will set ZF only if none of the bits match between the two operands, so in effect, we were enabling AVX2 instructions if the O/S supported XSAVE and the CPU supported AVX2 but the O/S only supported XMM registers. This bug was not exposed on, for instance, Windows XP or RHEL 5 because those O/S's do not support XSAVE. Fixes #288	2018-09-28 16:23:14 -05:00
DRC	133e4af070	Add x32 ABI support on Linux The x32 ABI is similar to the x86-64 ABI but uses 32-bit pointers. (Refer to https://sites.google.com/site/x32abi) Based on: `8da8fc5213` `1e33dfea80` `24ffea78da` `dedcf76753` `d04228a7b5` `b4ad38316a` Closes #274	2018-09-05 17:10:06 -05:00
Rosen Penev	4f943644e5	Enable DSPr2 SIMD extensions if CPU type is mipsel The DSPr2 extensions have been verified to work with little endian MIPS. Whether or not CMAKE_SYSTEM_PROCESSOR is set to "mips" or "mipsel" in a little endian MIPS environment seems to be inconsistent, but our build system needs to handle both cases.	2018-09-04 21:17:58 -05:00

410 Commits
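
Commit 42d679b9fc changes MIPS SIMD initialization so that the JSIMD_FORCE* overrides apply even when 74K detection fails. The C sketch below shows that ordering. It assumes the variables are named JSIMD_FORCEDSPR2 and JSIMD_FORCENONE and take effect only when set to "1". The function init_simd_support() and the SIMD_DSPR2 flag value are hypothetical, and this is not the libjpeg-turbo source.

```c
#include <stddef.h>
#include <string.h>

#define SIMD_DSPR2 0x1u /* hypothetical flag value */

/* Returns 1 if an environment variable's value is exactly "1". */
static int is_one(const char *value) {
  return value != NULL && strcmp(value, "1") == 0;
}

/*
 * Hypothetical sketch of the post-fix ordering: CPU detection runs first,
 * then the JSIMD_FORCE* overrides are applied unconditionally, so they still
 * work when detection fails (for example under QEMU user-mode emulation).
 * The caller passes getenv() results so the logic is easy to test.
 */
static unsigned int init_simd_support(int detected_74k,
                                      const char *force_dspr2,
                                      const char *force_none) {
  unsigned int simd_support = detected_74k ? SIMD_DSPR2 : 0;

  if (is_one(force_dspr2))
    simd_support = SIMD_DSPR2;
  if (is_one(force_none))
    simd_support = 0;
  return simd_support;
}
```

A caller would invoke it as `init_simd_support(detected, getenv("JSIMD_FORCEDSPR2"), getenv("JSIMD_FORCENONE"))`. Before the fix, the overrides were honored only when a 74K CPU was detected. That is equivalent to nesting both checks inside the `detected_74k` branch.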
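
Commit 087c29e07f describes two ideas:

- Track the number of free bits rather than used bits, so that inserting a code becomes `put_buffer |= code << (free_bits -= code_size)`.
- Write a full 64-bit buffer speculatively, falling back to byte-by-byte output only when a 0xFF byte needs a 0x00 stuffed after it.

The C sketch below illustrates both under its own assumptions. The BitWriter type and the bw_init(), put_bits(), and flush_full() functions are hypothetical, and codes are limited to 16 bits. The "has zero byte" bit trick used to detect 0xFF is a swapped-in technique, not necessarily the one the commit uses.

```c
#include <stdint.h>

/* Hypothetical bit writer: bits are packed MSB-first into a 64-bit buffer. */
typedef struct {
  uint64_t put_buffer;  /* pending bits, left-aligned */
  int free_bits;        /* unused low-order bits left in put_buffer (1..64) */
  unsigned char *out;   /* output cursor */
} BitWriter;

static void bw_init(BitWriter *bw, unsigned char *out) {
  bw->put_buffer = 0;
  bw->free_bits = 64;
  bw->out = out;
}

/* Writes all 8 bytes of a full buffer, most significant byte first. */
static void flush_full(BitWriter *bw) {
  uint64_t buf = bw->put_buffer;
  uint64_t inv = ~buf;
  /* "Has zero byte" test on ~buf: nonzero iff some byte of buf is 0xFF. */
  int has_ff =
      ((inv - 0x0101010101010101ULL) & ~inv & 0x8080808080808080ULL) != 0;
  int i;

  if (!has_ff) {
    /* Fast path: no stuffing needed (an optimized encoder could use a
       single byte-swapped 8-byte store here). */
    for (i = 0; i < 8; i++)
      *bw->out++ = (unsigned char)(buf >> (56 - 8 * i));
  } else {
    /* Slow path: byte by byte, emitting 0x00 after every 0xFF. */
    for (i = 0; i < 8; i++) {
      unsigned char byte = (unsigned char)(buf >> (56 - 8 * i));
      *bw->out++ = byte;
      if (byte == 0xFF)
        *bw->out++ = 0x00;
    }
  }
  bw->put_buffer = 0;
  bw->free_bits = 64;
}

/* Appends code_size bits of code; assumes 1 <= code_size <= 16 and that
   code fits in code_size bits (JPEG Huffman codes are at most 16 bits). */
static void put_bits(BitWriter *bw, uint32_t code, int code_size) {
  if (code_size > bw->free_bits) {
    /* Split the code: its high bits fill the buffer, the rest start anew. */
    int spill = code_size - bw->free_bits;
    bw->put_buffer |= (uint64_t)(code >> spill);
    flush_full(bw);
    code &= (1u << spill) - 1;
    code_size = spill;
  }
  /* The insertion step from the commit message: one subtraction both
     updates the free-bit count and yields the shift amount. */
  bw->put_buffer |= (uint64_t)code << (bw->free_bits -= code_size);
  if (bw->free_bits == 0)
    flush_full(bw);  /* flush only when the buffer is actually full */
}
```

A complete encoder would also pad and emit the partially filled buffer at the end of the scan. That step is omitted here.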
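
Commit aa9db61677 adds a check that CPUID leaf 07H exists before its AVX2 bit (EBX bit 5) is trusted. The logic reduces to the small C function below. Its name, cpu_reports_avx2(), is hypothetical. Its two inputs are assumed to come from CPUID leaf 0 (EAX, the maximum basic leaf) and from leaf 07H with ECX=0 (EBX). With GCC or Clang on x86, `__get_cpuid_max()` and `__cpuid_count()` from `<cpuid.h>` can supply those values. This covers CPU support only; the O/S must also enable YMM state, which is checked separately with XGETBV.

```c
#include <stdint.h>

/*
 * Hypothetical helper. If EAX=07H exceeds the CPU's maximum basic leaf,
 * CPUID returns data for the highest basic leaf instead, so EBX bit 5 would
 * describe something unrelated to AVX2 and must be ignored.
 */
static int cpu_reports_avx2(uint32_t max_basic_leaf, uint32_t leaf7_ebx) {
  if (max_basic_leaf < 0x07)
    return 0;                          /* leaf 07H not supported */
  return (int)((leaf7_ebx >> 5) & 1);  /* CPUID.(EAX=07H,ECX=0):EBX[5] */
}
```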
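
Commit b46af82cc1 notes that a missing pair of parentheses left bufsize compiled on Android. The reason is that && takes precedence over || in `#if` expressions, just as in ordinary C. The macros below are hypothetical stand-ins for the platform and NEON tests, modeling an Android build with NEON enabled. They do not reproduce the actual condition from simd/arm/jsimd.c.

```c
/* Hypothetical flags: an Android build compiled with NEON (-mfpu=neon). */
#define HAVE_NEON 1
#define ON_LINUX 0
#define ON_ANDROID 1

/* Parsed as (!HAVE_NEON && ON_LINUX) || ON_ANDROID, so Android slips through. */
#if !HAVE_NEON && ON_LINUX || ON_ANDROID
#define UNPAREN_KEEPS_CODE 1
#else
#define UNPAREN_KEEPS_CODE 0
#endif

/* Intended meaning: compile the code only when NEON is not guaranteed. */
#if !HAVE_NEON && (ON_LINUX || ON_ANDROID)
#define PAREN_KEEPS_CODE 1
#else
#define PAREN_KEEPS_CODE 0
#endif
```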
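
Commit 01e3032354 removes GETJSAMPLE(), which existed because a plain char may be signed. In the IJG-derived jmorecfg.h, the macro was defined roughly as `((int)(value))` when unsigned char was available and as `((int)(value) & MAXJSAMPLE)` otherwise. The C sketch below puts both variants side by side under hypothetical names to show why the mask was needed. Here MAXJSAMPLE mirrors the libjpeg constant for 8-bit samples.

```c
#define MAXJSAMPLE 255

/* With unsigned char samples, reading a sample is a plain widening cast. */
typedef unsigned char sample_unsigned_t;            /* hypothetical name */
#define GETJSAMPLE_UNSIGNED(value) ((int)(value))   /* hypothetical name */

/*
 * With plain char, which may be signed, stored values 128..255 would read
 * back as negative numbers after sign extension, so the sign bits had to be
 * masked off on every read.
 */
typedef char sample_plain_t;                                 /* hypothetical */
#define GETJSAMPLE_PLAIN_CHAR(value) ((int)(value) & MAXJSAMPLE)
```

Converting 200 to a signed char is implementation-defined. On two's-complement targets it yields -56, which the mask turns back into 200. With unsigned char the macro is a no-op cast, so call sites can simply read the sample directly.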
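
Commit d5f281b734 explains that testing XCR0 against the XMM and YMM state bits with the `test` instruction rejected AVX2 only when neither bit was set. The C functions below model that mistake and the corrected check. The function and macro names are hypothetical. The bit positions follow the architectural XCR0 layout: bit 1 is SSE (XMM) state and bit 2 is AVX (upper YMM) state. The XCR0 value itself would come from XGETBV with ECX=0, which is not shown.

```c
#include <stdint.h>

#define XCR0_SSE_STATE (1u << 1) /* XMM registers saved/restored by the O/S */
#define XCR0_AVX_STATE (1u << 2) /* upper halves of the YMM registers */
#define XCR0_YMM_MASK (XCR0_SSE_STATE | XCR0_AVX_STATE)

/* Models "test + jz": AVX2 is rejected only if NEITHER bit is set. */
static int os_supports_ymm_buggy(uint64_t xcr0) {
  return (xcr0 & XCR0_YMM_MASK) != 0;
}

/* Corrected check: the O/S must enable BOTH XMM and YMM state. */
static int os_supports_ymm(uint64_t xcr0) {
  return (xcr0 & XCR0_YMM_MASK) == XCR0_YMM_MASK;
}
```

Consider an O/S that, as the commit describes for Windows 7 without SP1, supports XSAVE but not YMM registers. It would report XCR0 with bit 1 set and bit 2 clear (for example 0x3). The buggy check accepts that value and the corrected one rejects it.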