mozjpeg

Author	SHA1	Message	Date
Jonathan Wright	eb14189caa	Fix Neon SIMD build issues with Visual Studio - Use the _M_ARM and _M_ARM64 macros provided by Visual Studio for compile-time detection of Arm builds, since __arm__ and __aarch64__ are only present in GNU-compatible compilers. - Neon/intrinsics: Use the _CountLeadingZeros() and _CountLeadingZeros64() intrinsics provided by Visual Studio, since __builtin_clz() and __builtin_clzl() are only present in GNU-compatible compilers. - Neon/intrinsics: Since Visual Studio does not support static vector initialization, replace static initialization of Neon vectors with the appropriate intrinsics. Compared to the static initialization approach, this produces identical assembly code with both GCC and Clang. - Neon/intrinsics: Since Visual Studio does not support inline assembly code, provide alternative code paths for Visual Studio whenever inline assembly is used. - Build: Set FLOATTEST appropriately for AArch64 Visual Studio builds (Visual Studio does not emit fused multiply-add [FMA] instructions by default for such builds.) - Neon/intrinsics: Move temporary buffer allocation outside of nested loops. Since Visual Studio configures Arm builds with a relatively small amount of stack memory, attempting to allocate those buffers within the inner loops caused a stack overflow. Closes #461 Closes #475	2020-11-24 21:13:16 -06:00
DRC	91dd3b23ad	ChangeLog: macOS Armv8/x86-64 univ. binary support	2020-11-24 20:31:59 -06:00
DRC	7e0d94d3a7	Merge branch 'master' into dev	2020-11-24 20:31:51 -06:00
DRC	1c839761cf	Force Git to treat testorig.ppm as a binary file Otherwise, because the file begins with an ASCII header, Git will erroneously treat it as an ASCII file, and if Git for Windows is configured with default options (specifically, "Checkout windows-style, commit Unix-style line endings"), it will add carriage return characters to all of the "linefeed" characters in the PPM file, thus corrupting it and causing libjpeg-turbo's regression tests to fail.	2020-11-24 18:54:44 -06:00
DRC	6d91e950c8	Use 5x5 win & 9 AC coeffs when smoothing DC scans ... of progressive images. Based on: `be8d36d13b` `9d528f278e` `85f36f0765` `63a4d39e38` `51336a6ad5` Closes #459 Closes #474	2020-11-24 18:21:42 -06:00
DRC	d523435e18	Travis: Use Xcode 12.2 for all iOS & macOS builds There doesn't seem to be any performance or compatibility downside to this, and it has the advantages of simplicity and consistency between the PR and official builds.	2020-11-19 19:30:51 -06:00
DRC	1ac83cd636	Travis: The Mac build log is now log-macos.txt (oversight from `f7a10a61e3`)	2020-11-18 18:16:12 -06:00
DRC	0ba70b6a13	Build: Support macOS Armv8/x86-64 univ. binaries - Rename IOS_ARMV8_BUILD to ARMV8_BUILD. - Rename install_ios() to install_subbuild() in makemacpkg. - Wordsmith the build instructions accordingly. - Use xcode12.2 image in Travis CI.	2020-11-18 17:40:44 -06:00
DRC	e417033d84	Merge branch 'master' into dev	2020-11-18 14:13:54 -06:00
DRC	6d2e8837b4	jpeg_skip_scanlines(): Avoid NULL + 0 UBSan error This error occurs at the call to (cinfo->cconvert->color_convert)() in sep_upsample() whenever cinfo->upsample->need_context_rows == TRUE (i.e. whenever h2v2 or h1v2 fancy upsampling is used.) The error is innocuous, since (cinfo->cconvert->color_convert)() points to a dummy function (noop_convert()) in that case. Fixes #470	2020-11-18 13:33:47 -06:00
DRC	f7c5489244	Travis: Add /opt/local/bin to PATH for Mac build (oversight from previous commit) macports-ci does this, and it's necessary in order for the build script to find md5sum.	2020-11-18 10:11:21 -06:00
DRC	f7a10a61e3	Build: "OS X"/"OSX" = "macOS"/"MACOS" There are no supported versions of "OS X" anymore. The operating system has been named "macOS" since 10.12 Sierra, which was released four years ago.	2020-11-17 13:53:33 -06:00
DRC	d111d9ff7a	Merge branch 'master' into dev	2020-11-17 11:54:20 -06:00
DRC	10ba6ed336	Travis: Install MacPorts without using macports-ci	2020-11-16 17:38:06 -06:00
DRC	292d78e786	Merge branch 'master' into dev	2020-11-16 15:28:02 -06:00
DRC	88bf1d1678	Build: Set FLOATTEST more intelligently The "32bit" vs. "64bit" floating point test results actually have nothing to do with the FPU. That was a fallacious assumption based on the observation that, with multiple CPU types, 32-bit and 64-bit builds produce different floating point test results. It seems that this is, in fact, due to differing compiler behavior-- more specifically, whether fused multiply-add (FMA) instructions are used to combine multiple floating point operations into a single instruction ("floating point expression contraction".) GCC does this by default if the target supports FMA instructions, which PowerPC and AArch64 targets both do. Fixes #468	2020-11-16 15:19:42 -06:00
DRC	8f8305981b	Merge branch 'master' into dev	2020-11-13 15:21:26 -06:00
DRC	42f7c78fe3	BUILDING.md: Use min. iOS v8 in iOS Armv8 example This is necessary in order to enable thread-local storage.	2020-11-13 15:18:35 -06:00
DRC	33859880e9	Neon: Auto-detect compiler intrinsics completeness This allows the Neon intrinsics code to be built successfully (albeit likely with reduced run-time performance) with Xcode 5.0-6.2 (iOS/AArch64) and Android NDK < r19 (AArch32). Note that Xcode 5.0-6.2 will not build the Armv8 GAS code without gas-preprocessor.pl, and no version of Xcode will build the Armv7 GAS code without gas-preprocessor.pl, so we always use the full Neon intrinsics implementation by default with macOS and iOS builds. Auto-detecting the completeness of the compiler's set of Neon intrinsics also allows us to more intelligently set the default value of NEON_INTRINSICS, based on the values of HAVE_VLD1. This is a reasonable, albeit imperfect, proxy for whether a compiler has a full and optimal set of Neon intrinsics. Specific notes: - 64-bit RGB-to-YCbCr color conversion does not use any of the intrinsics in question, regresses with GCC - 64-bit accurate integer forward DCT uses vld1_s16_x3(), regresses with GCC - 64-bit Huffman encoding uses vld1q_u8_x4(), regresses with GCC - 64-bit YCbCr-to-RGB color conversion does not use any of the intrinsics in question, regresses with GCC - 64-bit accurate integer inverse DCT uses vld1_s16_x3(), regresses with GCC - 64-bit 4x4 inverse DCT uses vld1_s16_x3(). I did not test this algorithm in isolation, so it may in fact regress with GCC, but the regression may be hidden by the speedup from the new SIMD-accelerated upsampling algorithms. - 32-bit RGB-to-YCbCr color conversion: uses vld1_u16_x2(), regresses with GCC - 32-bit accurate integer forward DCT uses vld1_s16_x3(), regression irrelevant because there was no previous implementation - 32-bit accurate integer inverse DCT uses vld1_s16_x3(), regresses with GCC - 32-bit fast integer inverse DCT does not use any of the intrinsics in question, regresses with GCC - 32-bit 4x4 inverse DCT uses vld1_s16_x3(). I did not test this algorithm in isolation, so it may in fact regress with GCC, but the regression may be hidden by the speedup from the new SIMD-accelerated upsampling algorithms. Presumably when GCC includes a full and optimal set of Neon intrinsics, the HAVE_VLD1 tests will pass, and the full Neon intrinsics implementation will be enabled automatically.	2020-11-13 15:16:34 -06:00
DRC	3e9e7c7055	Fix build if WITH_12BIT==1 && WITH_JPEG(7\|8)==1 Fixes #466	2020-11-11 17:54:06 -06:00
DRC	bbd8089297	Neon: Finalize intrinsics implementation - Remove gas-preprocessor.pl. None of the compilers that can build the new intrinsics implementation require gas-preprocessor.pl (tested with Xcode and with Clang 3.9+ for Linux.) - Document that Xcode 6.3.x or later is now required for iOS builds (older versions of Xcode do not have a full set of Neon intrinsics.) - Add a change log entry. - Do not enable the ASM CMake language unless NEON_INTRINSICS is false. - Add a Clang/Arm64 test to .travis.yml in order to test the new intrinsics implementation. Closes #455	2020-11-10 19:58:28 -06:00
Martyn Jacques	141f26ff6d	Neon: Intrinsics impl. of 2x2 and 4x4 scaled IDCTs The previous AArch32 and AArch64 GAS implementations have been removed, since the intrinsics implementations provide the same or better performance.	2020-11-10 19:09:09 -06:00
Jonathan Wright	4574f01f43	Neon: Intrinsics impl. of h2v1 & h2v2 plain upsamp There was no previous GAS implementation. NOTE: This doesn't produce much of a speedup when using -O3, because -O3 already enables Neon autovectorization, which works well for the scalar C implementation of plain upsampling. However, the Neon SIMD implementation will benefit other optimization levels.	2020-11-10 19:09:09 -06:00
Jonathan Wright	ba52a3de32	Neon: Intrinsics impl of h2v1 & h2v2 merged upsamp There was no previous GAS implementation. This commit also reverts `40557b2301` and `7723d7f7d0`. `7723d7f7d0` was only necessary because there was no Neon implementation of merged upsampling/color conversion, and `40557b2301` was only necessary because of `7723d7f7d0`.	2020-11-10 19:09:09 -06:00
Jonathan Wright	240ba417aa	Neon: Intrinsics impl. of prog. Huffman encoding The previous AArch64 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance. There was no previous AArch32 GAS implementation.	2020-11-10 19:09:09 -06:00
Jonathan Wright	ed581cd935	Neon: Intrinsics impl. of accurate int inverse DCT The previous AArch32 and AArch64 GAS implementations are retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable.	2020-11-10 19:09:09 -06:00
Jonathan Wright	2c6b68e283	Neon: Intrinsics impl. of fast integer Inverse DCT The previous AArch32 GAS implementation is retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable. The previous AArch64 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance.	2020-11-10 19:09:09 -06:00
Jonathan Wright	2acfb93c94	Neon: Intrinsics impl. of h1v2 fancy upsampling There was no previous GAS implementation.	2020-11-10 19:09:09 -06:00
Jonathan Wright	975307775c	Neon: Intrinsics impl. of h2v1 & h2v2 fancy upsamp The previous AArch32 GAS implementation of h2v1 fancy upsampling has been removed, since the intrinsics implementation provides the same or better performance. There was no previous GAS implementation of h2v2 fancy upsampling, and there was no previous AArch64 GAS implementation of h2v1 fancy upsampling.	2020-11-10 19:09:09 -06:00
Jonathan Wright	5dbd39323c	Neon: Intrinsics implementation of YCbCr->RGB565 The previous AArch64 GAS implementation is retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable. The previous AArch32 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance.	2020-11-10 19:09:09 -06:00
Jonathan Wright	0f35cd68f2	Neon: Intrinsics implementation of YCbCr->RGB The previous AArch64 GAS implementation is retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable. The previous AArch32 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance.	2020-11-10 19:09:09 -06:00
Jonathan Wright	f3c3f01d23	Neon: Intrinsics impl. of Huffman encoding The previous AArch64 GAS implementation is retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable. The previous AArch32 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance.	2020-11-10 19:09:09 -06:00
Jonathan Wright	d0004de5dd	Neon: Intrinsics impl. of accurate int forward DCT The previous AArch64 GAS implementation is retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using the new NEON_INTRINSICS CMake variable. There was no previous AArch32 GAS implementation.	2020-11-10 19:09:09 -06:00
Jonathan Wright	3d84668d42	Neon: Intrinsics impl. of fast integer forward DCT The previous AArch32 and AArch64 GAS implementations have been removed, since the intrinsics implementation provides the same or better performance.	2020-11-10 19:09:09 -06:00
Jonathan Wright	951d3677eb	Neon: Intrinsics impl. of int sample conv./quant. The previous AArch32 and AArch64 GAS implementations have been removed, since the intrinsics implementation provides the same or better performance.	2020-11-10 19:09:09 -06:00
Jonathan Wright	366168aa7d	Neon: Intrinsics impl. of h2v1 & h2v2 downsampling The previous AArch64 GAS implementation has been removed, since the intrinsics implementation provides the same or better performance. There was no previous AArch32 GAS implementation.	2020-11-10 19:09:09 -06:00
Jonathan Wright	f73b1dbc60	Neon: Intrinsics implementation of RGB->Grayscale There was no previous GAS implementation.	2020-11-10 19:09:09 -06:00
Jonathan Wright	4f2216b435	Neon: Intrinsics implementation of RGB->YCbCr The previous AArch32 and AArch64 GAS implementations are retained by default when using GCC, in order to avoid a performance regression. The intrinsics implementation can be forced on or off using a new NEON_INTRINSICS CMake variable.	2020-11-10 19:09:05 -06:00
DRC	0efc4858d4	Merge branch 'master' into dev	2020-11-09 19:02:28 -06:00
DRC	02227e48a9	Travis: Combine PPC/Arm tests with jpeg-7/8 tests There is no reason not to, since the jpeg-7 and jpeg-8 API/ABI tests do not exercise the SIMD extensions any differently than the other tests.	2020-11-09 18:20:41 -06:00
DRC	c7dd191271	Merge branch 'master' into dev	2020-11-08 15:15:02 -06:00
DRC	40557b2301	Build: Fix test failures w/ Arm Neon SIMD exts Regression caused by `a46c111d9f` Because of `7723d7f7d0`, which was introduced in libjpeg-turbo 1.5.1 in response to #81, merged upsampling/color conversion is disabled on platforms that have SIMD-accelerated YCbCr -> RGB color conversion but not SIMD-accelerated merged upsampling/color conversion. This was intended to improve performance with the Neon SIMD extensions, since those are the only SIMD extensions for which those circumstances apply. Under normal circumstances, the separate "plain" (non-fancy) upsampling and color conversion routines will produce bitwise-identical output to the merged upsampling/color conversion routines, but that is not the case when skipping scanlines starting at an odd-numbered scanline. The modified test introduced in `a46c111d9f` does precisely that in order to validate the fixes introduced in `9120a24743` and `a46c111d9f`. Because of `7723d7f7d0`, the segfault fixed in `9120a24743` and `a46c111d9f` didn't affect the Neon SIMD extensions, so this commit effectively reverts the test modifications in `a46c111d9f` when using those SIMD extensions. We can get rid of this hack, as well as `7723d7f7d0`, once a Neon implementation of merged upsampling/color conversion is available.	2020-11-08 14:57:01 -06:00
DRC	a524b9b06b	Travis: Regression-test Armv8 and PPC SIMD exts Currently this only tests the 64-bit code paths, but it's better than nothing.	2020-11-06 17:24:16 -06:00
DRC	7c1a1789d2	Merge branch 'master' into dev	2020-11-05 16:04:55 -06:00
DRC	6e632af9f6	Demote "fast" [I]DCT algorithms to legacy status - Refer to the "slow" [I]DCT algorithms as "accurate" instead, since they are not slow under libjpeg-turbo. - Adjust documentation claims to reflect the fact that the "slow" and "fast" algorithms produce about the same performance on AVX2-equipped CPUs (because of the dual-lane nature of AVX2, it was not possible to accelerate the "fast" algorithm beyond what was achievable with SSE2.) Also adjust the claims to reflect the fact that the "fast" algorithm tends to be ~5-15% faster than the "slow" algorithm on non-AVX2-equipped CPUs, regardless of the use of the libjpeg-turbo SIMD extensions. - Indicate the legacy status of the "fast" and float algorithms in the documentation and cjpeg/djpeg usage info. - Remove obsolete paragraph in the djpeg man page that suggested that the float algorithm could be faster than the "fast" algorithm on some CPUs.	2020-11-05 15:59:31 -06:00
DRC	cd342acf7f	Merge branch 'master' into dev	2020-10-27 16:45:23 -05:00
DRC	c3bfbde21d	jpegtran.c: "subarea" = "region" It is our convention to use the term "region" when referring to crop specs, since this is more consistent with the terminology used by the rest of the image processing community.	2020-10-27 15:45:24 -05:00
DRC	a8656d703d	jpegtran.1: Minor formatting tweak	2020-10-27 15:45:24 -05:00
DRC	9ecb67c219	transupp.c: Code formatting tweaks	2020-10-27 15:45:24 -05:00
DRC	53c685b7f4	cdjpeg.h: Remove unused function stub enable_signal_catcher() was only needed with libjpeg's temp. file memory manager (jmemname.c), which libjpeg-turbo has never supported.	2020-10-27 15:45:24 -05:00
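
Commit `88bf1d1678` above attributes the differing floating point test results to fused multiply-add (FMA) contraction rather than to the FPU. The following is a minimal sketch of that effect, not code from the repository: it emulates a fused operation in Python by computing the exact result with `Fraction` and rounding once, and compares that with the ordinary two-step multiply and add. The function names and the input values are illustrative assumptions chosen to make the difference visible.

```python
from fractions import Fraction


def mul_add_separate(a: float, b: float, c: float) -> float:
    """Multiply, round to double, then add and round again (two roundings)."""
    product = a * b          # rounded to the nearest double here
    return product + c       # rounded a second time here


def mul_add_fused(a: float, b: float, c: float) -> float:
    """Emulate FMA: compute a*b + c exactly, then round once to double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)      # single, correctly rounded conversion


# Hypothetical inputs: a*b is exactly 1 - 2**-60, which is not representable
# as a double and rounds to 1.0 when the product is stored on its own.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

separate = mul_add_separate(a, b, c)
fused = mul_add_fused(a, b, c)

print("separate multiply and add:", separate)
print("emulated fused multiply-add:", fused)
```

The two results differ even though the inputs are identical, which is the kind of discrepancy the commit message describes between builds that do and do not contract floating point expressions.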

2337 Commits
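
Commit `1c839761cf` in the listing explains that converting linefeeds to CRLF corrupts a PPM file, because the pixel data that follows the ASCII header is binary. The sketch below illustrates that effect on a tiny, made-up binary PPM; it imitates the outcome of the line-ending conversion with a plain byte replacement and is not Git's actual implementation. The helper name `to_crlf` and the pixel values are assumptions for illustration.

```python
def to_crlf(data: bytes) -> bytes:
    """Imitate a checkout that rewrites every LF as CRLF."""
    return data.replace(b"\n", b"\r\n")


# A 2x1 binary (P6) PPM: ASCII header followed by raw RGB bytes.
header = b"P6\n2 1\n255\n"
# Two RGB pixels; two of the sample values happen to equal 0x0A (linefeed).
pixels = bytes([0x0A, 0x14, 0x1E, 0x28, 0x0A, 0x32])

original = header + pixels
converted = to_crlf(original)

# Every 0x0A byte, in the header and in the pixel data, gains a 0x0D.
extra = len(converted) - len(original)

print("original size:", len(original))
print("converted size:", len(converted))
print("bytes inserted:", extra)
```

The inserted bytes shift and lengthen the pixel payload, so the image no longer matches its declared dimensions, which is consistent with the regression test failures the commit message reports.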