The primary purpose of this is to encourage adoption of libjpeg-turbo in
downstream Windows projects that forbid the use of "deprecated"
functions. libjpeg-turbo's usage of those functions was not actually
unsafe, because:
- libjpeg-turbo always checks the return value of fopen() and ensures
that a NULL filename can never be passed to it.
- libjpeg-turbo always checks the return value of getenv() and never
passes a NULL argument to it.
- The sprintf() calls in format_message() (jerror.c) could never
overflow the destination string buffer or leave it unterminated as
long as the buffer was at least JMSG_LENGTH_MAX bytes in length, as
instructed. (Regardless, this commit replaces those calls with
snprintf() calls.)
- libjpeg-turbo never uses sscanf() to read strings or multi-byte
character arrays.
- Because of b7d6e84d6a, wrjpgcom
explicitly checks the bounds of the source and destination strings
before calling strcat() and strcpy().
- libjpeg-turbo always ensures that the destination string is
terminated when using strncpy().
(548490fe5e made this explicit.)
Regarding thread safety:
Technically speaking, getenv() is not thread-safe, because the returned
pointer may be invalidated if another thread sets the same environment
variable between the time that the first thread calls getenv() and the
time that that thread uses the return value. In practice, however, this
could only occur with libjpeg-turbo if:
(1) A multithreaded calling application used the deprecated and
undocumented TJFLAG_FORCEMMX/TJFLAG_FORCESSE/TJFLAG_FORCESSE2 flags in
the TurboJPEG API or set one of the corresponding environment variables
(which are only intended for testing purposes.) Since the TurboJPEG API
library only ever passed string constants to putenv(), the only inherent
risk (i.e. the only risk introduced by the library and not the calling
application) was that the SIMD extensions may have read an incorrect
value from one of the aforementioned environment variables.
or
(2) A multithreaded calling application modified the value of the
JPEGMEM environment variable in one thread while another thread was
reading the value of that environment variable (in the body of
jpeg_create_compress() or jpeg_create_decompress().) Given that the
libjpeg API provides a thread-safe way for applications to modify the
default memory limit without using the JPEGMEM environment variable,
direct modification of that environment variable by calling applications
is not supported.
Microsoft's implementation of getenv_s() does not claim to be
thread-safe either, so this commit uses getenv_s() solely to mollify
Visual Studio. New inline functions and macros (GETENV_S() and
PUTENV_S) wrap getenv_s()/_putenv_s() when building for Visual Studio
and getenv()/setenv() otherwise, but GETENV_S()/PUTENV_S() provide no
advantages over getenv()/setenv() other than parameter validation. They
are implemented solely for convenience.
Technically speaking, strerror() is not thread-safe, because the
returned pointer may be invalidated if another thread changes the locale
and/or calls strerror() between the time that the first thread calls
strerror() and the time that that thread uses the return value. In
practice, however, this could only occur with libjpeg-turbo if a
multithreaded calling application encountered a file I/O error in
tjLoadImage() or tjSaveImage(). Since both of those functions
immediately copy the string returned from strerror() into a thread-local
buffer, the risk is minimal, and the worst case would involve an
incorrect error string being reported to the calling application.
Regardless, this commit uses strerror_s() in the TurboJPEG API library
when building for Visual Studio. Note that strerror_r() could have been
used on Un*x systems, but it would have been necessary to handle both
the POSIX and GNU implementations of that function and perform
widespread compatibility testing. Such is left as an exercise for
another day.
Fixes#568
libjpeg-turbo has never supported non-ANSI C compilers. Per the spec,
ANSI C compilers must have locale.h, stddef.h, stdlib.h, memset(),
memcpy(), unsigned char, and unsigned short. They must also handle
undefined structures.
(regression introduced by aa7459050d)
cjpeg sets cinfo.in_color_space to JCS_RGB as an "arbitrary guess."
Since tjLoadImage() never uses JCS_RGB, the PGM reader should treat
JCS_RGB the same as JCS_UNKNOWN.
Fixes#566
Referring to https://cmake.org/cmake/help/latest/policy/CMP0065.html,
CMake 3.3 and earlier automatically added compiler/linker flags such as
-rdynamic/-export-dynamic, which caused symbols to be exported from
executables. The primary purpose of this is to allow plugins loaded via
dlopen() to access symbols from the calling program. libjpeg-turbo
does not need this functionality, and enabling it needlessly increases
the size of the libjpeg-turbo executables.
Setting CMP0065 to NEW when using CMake 3.4 and later prevents CMake
from automatically adding the aforementioned compiler/linker flags
unless the ENABLE_EXPORTS property is set for a target (or the
CMAKE_ENABLE_EXPORTS variable is set, which causes ENABLE_EXPORTS to be
set for all targets.)
Closes#554
In theory, all objects that will be included in a Un*x shared library
must be built using PIC. In practice, most compilers don't require PIC
to be explicitly specified for jsimd_none.o, either because the compiler
automatically enables PIC in all cases (Ubuntu) or because the size of
the generated object is too small. But some rare compilers do require
PIC to be explicitly specified for jsimd_none.o.
Fixes#520
This commit integrates OSS-Fuzz targets directly into the libjpeg-turbo
source tree, thus obsoleting and improving code coverage relative to
Google's OSS-Fuzz target for libjpeg-turbo (previously available here:
https://github.com/google/oss-fuzz).
I hope to eventually create fuzz targets for the BMP, GIF, and PPM
readers as well, which would allow for fuzz-testing compression, but
since those readers all require an input file, it is unclear how to
build an efficient fuzzer around them. It doesn't make sense to
fuzz-test compression in isolation, because compression can't accept
arbitrary input data.
(regression introduced by 16bd984557)
This implements the same fix for
jsimd_encode_mcu_AC_refine_prepare_sse2() that
a81a8c137b implemented for
jsimd_encode_mcu_AC_first_prepare_sse2().
Based on:
1a59587397eb176a91d8Fixes#509Closes#510
We don't officially support i386 or PowerPC Mac builds of libjpeg-turbo
anymore, but they still work (bearing in mind that PowerPC builds
require GCC v4.0 in Xcode 3.2.6, and i386 builds require Xcode 9.x or
earlier.) Referring to #495, apparently MacPorts needs this
functionality.
When configuring a Visual Studio IDE build and passing -A arm64 to
CMake, CMAKE_SYSTEM_PROCESSOR will be amd64, so we should set CPU_TYPE
based on the value of CMAKE_GENERATOR_PLATFORM rather than the value of
CMAKE_SYSTEM_PROCESSOR.
- Use the _M_ARM and _M_ARM64 macros provided by Visual Studio for
compile-time detection of Arm builds, since __arm__ and __aarch64__
are only present in GNU-compatible compilers.
- Neon/intrinsics: Use the _CountLeadingZeros() and
_CountLeadingZeros64() intrinsics provided by Visual Studio, since
__builtin_clz() and __builtin_clzl() are only present in
GNU-compatible compilers.
- Neon/intrinsics: Since Visual Studio does not support static vector
initialization, replace static initialization of Neon vectors with the
appropriate intrinsics. Compared to the static initialization
approach, this produces identical assembly code with both GCC and
Clang.
- Neon/intrinsics: Since Visual Studio does not support inline assembly
code, provide alternative code paths for Visual Studio whenever inline
assembly is used.
- Build: Set FLOATTEST appropriately for AArch64 Visual Studio builds
(Visual Studio does not emit fused multiply-add [FMA] instructions by
default for such builds.)
- Neon/intrinsics: Move temporary buffer allocation outside of nested
loops. Since Visual Studio configures Arm builds with a relatively
small amount of stack memory, attempting to allocate those buffers
within the inner loops caused a stack overflow.
Closes#461Closes#475
The "32bit" vs. "64bit" floating point test results actually have
nothing to do with the FPU. That was a fallacious assumption based on
the observation that, with multiple CPU types, 32-bit and 64-bit builds
produce different floating point test results. It seems that this is,
in fact, due to differing compiler behavior-- more specifically, whether
fused multiply-add (FMA) instructions are used to combine multiple
floating point operations into a single instruction ("floating point
expression contraction".) GCC does this by default if the target
supports FMA instructions, which PowerPC and AArch64 targets both do.
Fixes#468
This allows the Neon intrinsics code to be built successfully (albeit
likely with reduced run-time performance) with Xcode 5.0-6.2
(iOS/AArch64) and Android NDK < r19 (AArch32). Note that Xcode 5.0-6.2
will not build the Armv8 GAS code without gas-preprocessor.pl, and no
version of Xcode will build the Armv7 GAS code without
gas-preprocessor.pl, so we always use the full Neon intrinsics
implementation by default with macOS and iOS builds.
Auto-detecting the completeness of the compiler's set of Neon intrinsics
also allows us to more intelligently set the default value of
NEON_INTRINSICS, based on the values of HAVE_VLD1*. This is a
reasonable, albeit imperfect, proxy for whether a compiler has a full
and optimal set of Neon intrinsics. Specific notes:
- 64-bit RGB-to-YCbCr color conversion
does not use any of the intrinsics in question, regresses with GCC
- 64-bit accurate integer forward DCT
uses vld1_s16_x3(), regresses with GCC
- 64-bit Huffman encoding
uses vld1q_u8_x4(), regresses with GCC
- 64-bit YCbCr-to-RGB color conversion
does not use any of the intrinsics in question, regresses with GCC
- 64-bit accurate integer inverse DCT
uses vld1_s16_x3(), regresses with GCC
- 64-bit 4x4 inverse DCT
uses vld1_s16_x3(). I did not test this algorithm in isolation, so
it may in fact regress with GCC, but the regression may be hidden by
the speedup from the new SIMD-accelerated upsampling algorithms.
- 32-bit RGB-to-YCbCr color conversion:
uses vld1_u16_x2(), regresses with GCC
- 32-bit accurate integer forward DCT
uses vld1_s16_x3(), regression irrelevant because there was no
previous implementation
- 32-bit accurate integer inverse DCT
uses vld1_s16_x3(), regresses with GCC
- 32-bit fast integer inverse DCT
does not use any of the intrinsics in question, regresses with GCC
- 32-bit 4x4 inverse DCT
uses vld1_s16_x3(). I did not test this algorithm in isolation, so
it may in fact regress with GCC, but the regression may be hidden by
the speedup from the new SIMD-accelerated upsampling algorithms.
Presumably when GCC includes a full and optimal set of Neon intrinsics,
the HAVE_VLD1* tests will pass, and the full Neon intrinsics
implementation will be enabled automatically.
There was no previous GAS implementation.
This commit also reverts 40557b2301 and
7723d7f7d0.
7723d7f7d0 was only necessary because
there was no Neon implementation of merged upsampling/color conversion,
and 40557b2301 was only necessary because
of 7723d7f7d0.
The previous AArch32 and AArch64 GAS implementations are retained by
default when using GCC, in order to avoid a performance regression. The
intrinsics implementation can be forced on or off using a new
NEON_INTRINSICS CMake variable.
Regression caused by
a46c111d9f
Because of 7723d7f7d0, which was
introduced in libjpeg-turbo 1.5.1 in response to #81, merged upsampling/
color conversion is disabled on platforms that have SIMD-accelerated
YCbCr -> RGB color conversion but not SIMD-accelerated merged
upsampling/color conversion. This was intended to improve performance
with the Neon SIMD extensions, since those are the only SIMD extensions
for which those circumstances apply. Under normal circumstances, the
separate "plain" (non-fancy) upsampling and color conversion routines
will produce bitwise-identical output to the merged upsampling/color
conversion routines, but that is not the case when skipping scanlines
starting at an odd-numbered scanline. The modified test introduced in
a46c111d9f does precisely that in order to
validate the fixes introduced in
9120a24743 and
a46c111d9f.
Because of 7723d7f7d0, the segfault fixed
in 9120a24743 and
a46c111d9f didn't affect the Neon SIMD
extensions, so this commit effectively reverts the test modifications in
a46c111d9f when using those SIMD
extensions. We can get rid of this hack, as well as
7723d7f7d0, once a Neon implementation of
merged upsampling/color conversion is available.
- Set CPU_TYPE=arm if performing a 32-bit build on an AArch64 system.
This eliminates the need to use a CMake toolchain file.
- Set RPMARCH=armv7hl if building on a 32-bit Arm system with an FPU.
- Set RPMARCH=armv7hl and DEBARCH=armhf if performing a 32-bit build
using a gnueabihf toolchain.
- If performing a 32-bit Arm build, generate a 32-bit supplementary DEB
package for AArch64 systems.
- Introduce a partial image decompression regression test script that
validates the correctness of jpeg_skip_scanlines() and
jpeg_crop_scanlines() for a variety of cropping regions and libjpeg
settings.
This regression test catches the following issues:
#182, fixed in 5bc43c7821#237, fixed in 6e95c08649794f5018608f37250026a45ead2db8
#244, fixed in 398c1e9acc#441, fully fixed in this commit
It does not catch the following issues:
#194, fixed in 773040f9d9#244 (additional segfault), fixed in
9120a24743
- Modify the libjpeg-turbo regression test suite (make test) so that it
checks for the issue reported in #441 (segfault in
jpeg_skip_scanlines() when used with 4:2:0 merged upsampling/color
conversion.)
- Fix issues in jpeg_skip_scanlines() that caused incorrect output with
h2v2 (4:2:0) merged upsampling/color conversion. The previous commit
fixed the segfault reported in #441, but that was a symptom of a
larger problem. Because merged 4:2:0 upsampling uses a "spare row"
buffer, it is necessary to allow the upsampler to run when skipping
rows (fancy 4:2:0 upsampling, which uses context rows, also requires
this.) Otherwise, if skipping starts at an odd-numbered row, the
output image will be incorrect.
- Throw an error if jpeg_skip_scanlines() is called with two-pass color
quantization enabled. With two-pass color quantization, the first
pass occurs within jpeg_start_decompress(), so subsequent calls to
jpeg_skip_scanlines() interfere with the multipass state and prevent
the second pass from occurring during subsequent calls to
jpeg_read_scanlines().
This CMake variable is intended to define a wrapper program for
executing cross-compiled executables. However, CTest doesn't use
CMAKE_CROSSCOMPILING_EMULATOR, because it isn't obvious which tests
should be executed with the wrapper and which tests are scripts that
don't need it. This commit manually prepends
${CMAKE_CROSSCOMPILING_EMULATOR} to all unit test command lines that
execute a program built by the libjpeg-turbo build system. Thus, one
can set CMAKE_CROSSCOMPILING_EMULATOR in a CMake toolchain file to (for
instance) "qemu-{architecture} {qemu_arguments}") in order to execute
all eligible unit tests using QEMU.
If the TurboJPEG instance passed to tjDecodeYUV[Planes]() was previously
used to decompress a progressive JPEG image, then we need to disable the
progressive decompression parameters in the underlying libjpeg instance
before calling jinit_master_decompress().
This commit also modifies the build system so that the "tjtest" target
will test for this issue, and it corrects a previous oversight in the
build system whereby tjbenchtest did not test progressive
compression/decompression unless WITH_JAVA was true.
(regression introduced by 5b177b3cab)
The SSE2 implementation of progressive Huffman encoding performed
extraneous iterations when the scan length was a multiple of 16.
Based on:
bb7f1ef983Fixes#335Closes#367