- Integrate
c07bba2730
so that the default man directory is <CMAKE_INSTALL_PREFIX>/share/man
on FreeBSD systems.
- For good measure, integrate
f835f189ae
so that the default info directory is
<CMAKE_INSTALL_PREFIX>/share/info on FreeBSD systems, even though we
don't use that directory.
- Automatically set the CMake variable type to PATH for any
user-specified CMAKE_INSTALL_*DIR variables.
Addresses concerns raised in #326, #346, and #648
Closes #648
libjpeg-turbo always includes Adobe APP14 markers in the lossless JPEG
images that it generates, but some compressors (e.g. accusoft PICTools
Medical) do not.
Fixes #743
ef9a4e05ba (libjpeg-turbo 1.4.x), which
was based on
https://bug815473.bmoattachments.org/attachment.cgi?id=692126
(https://bugzilla.mozilla.org/show_bug.cgi?id=815473), modified the C
baseline Huffman encoder so that it precomputes jpeg_nbits_table, in
order to facilitate sharing the table among multiple processes.
However, libjpeg-turbo never shared the table, and because the table was
implemented as a static array, f3a8684cd1
(libjpeg-turbo 1.5.x) and 37bae1a0e9
(libjpeg-turbo 2.0.x) each introduced a duplicate copy of the table for
(respectively) the SSE2 baseline Huffman encoder and the C progressive
Huffman encoder.
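For context, jpeg_nbits_table maps a value to its bit length (the number of
bits needed to represent its magnitude), which the Huffman encoders need for
every nonzero coefficient or difference value. Below is a minimal sketch of
the two strategies referenced here and in the items below: a clz intrinsic
versus a precomputed lookup table. The macro and function names are
illustrative, not the actual libjpeg-turbo code:

  #include <stdint.h>

  #ifdef USE_CLZ_INTRINSIC
  /* GCC/Clang builtin: 32 minus the number of leading zeros is the bit
     length.  __builtin_clz(0) is undefined, so the caller must guarantee a
     nonzero argument. */
  #define JPEG_NBITS_NONZERO(x)  (32 - __builtin_clz(x))
  #else
  /* Table-driven fallback: jpeg_nbits_table[v] holds the bit length of v
     for v in [0, 65535].  The real table is a precomputed constant; this
     sketch fills it at startup. */
  static uint8_t jpeg_nbits_table[65536];

  static void init_nbits_table(void)
  {
    int nbits = 0;
    for (int v = 0; v < 65536; v++) {
      if (v >= (1 << nbits)) nbits++;   /* bit length grows at powers of 2 */
      jpeg_nbits_table[v] = (uint8_t)nbits;
    }
  }
  #define JPEG_NBITS_NONZERO(x)  (jpeg_nbits_table[x])
  #endif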
This commit does the following:
- Move the duplicated code in jchuff.c and jcphuff.c, originally
introduced in 0cfc4c17b7 and
37bae1a0e9, into a header
(jpeg_nbits.h).
- Credit the co-author of 0cfc4c17b7.
(Refer to https://sourceforge.net/p/libjpeg-turbo/patches/57).
- Modify the SSE2 baseline Huffman encoder so that the C Huffman
encoders can share its definition of jpeg_nbits_table.
- Move the definition of jpeg_nbits_table into a C source file
(jpeg_nbits.c) rather than a header, and define the table only if
USE_CLZ_INTRINSIC is undefined and the SSE2 baseline Huffman encoder
will not be built.
- Apply hidden symbol visibility to the shared definition of
jpeg_nbits_table, if the compiler supports the necessary attribute.
(In practice, only Visual C++ doesn't.)
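A minimal sketch of the visibility mechanism mentioned in the last item;
this is an assumed illustration of the attribute, not the actual
libjpeg-turbo macro:

  /* Hide a shared table from the shared library's exported symbol list.
     GCC and Clang support the attribute; Visual C++ does not, so the macro
     expands to nothing there. */
  #if defined(__GNUC__) || defined(__clang__)
  #define HIDDEN  __attribute__((visibility("hidden")))
  #else
  #define HIDDEN
  #endif

  extern HIDDEN const unsigned char jpeg_nbits_table[65536];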
Closes #114
See also:
https://bugzilla.mozilla.org/show_bug.cgi?id=1501523
- Clarify that lossless JPEG is slower than and doesn't compress as well
as lossy JPEG. (That should be obvious, because "lossy" literally
means that data is thrown away.)
- Re-generate TurboJPEG C API documentation using Doxygen 1.9.8.
- Clarify that setting the data_precision field in jpeg_compress_struct
to 16 requires lossless mode.
- Explain what the predictor selection value actually does. (Refer to
Recommendation ITU-T T.81 (1992) | ISO/IEC 10918-1:1994, Section
H.1.2.1.)
TJPARAM_MAXPIXELS was previously hidden and used only for fuzz testing,
but it is potentially useful for calling applications as well,
particularly if they want to guard against excessive memory consumption
by the tj3LoadImage*() functions. The parameter has also been extended
to decompression and lossless transformation functions/methods, mainly
as a convenience. (It was already possible for calling applications to
impose their own JPEG image size limits by reading the JPEG header prior
to decompressing or transforming the image.)
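A hedged usage sketch, assuming the TurboJPEG 3 entry points behave as
documented (tj3Init(), tj3Set(), tj3LoadImage8()); the pixel cap and error
handling are illustrative:

  #include <stdio.h>
  #include <turbojpeg.h>

  int load_capped(const char *filename)
  {
    tjhandle handle = tj3Init(TJINIT_COMPRESS);
    int width, height, pixelFormat = TJPF_UNKNOWN;
    unsigned char *buf;

    if (!handle) return -1;
    /* Refuse images larger than ~16 megapixels to bound memory use. */
    tj3Set(handle, TJPARAM_MAXPIXELS, 16 * 1048576);
    buf = tj3LoadImage8(handle, filename, &width, 1, &height, &pixelFormat);
    if (!buf) {
      fprintf(stderr, "%s\n", tj3GetErrorStr(handle));
      tj3Destroy(handle);
      return -1;
    }
    /* ... compress buf ... */
    tj3Free(buf);
    tj3Destroy(handle);
    return 0;
  }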
Unfortunately, most of the GitHub security advisories filed against
libjpeg-turbo thus far have been the result of non-exploitable API
abuses triggered by randomly-generated test programs. Those advisories
were accompanied by wild claims of denial of service, with no
demonstrable or even probable exploit that might cause such a DoS
(assuming a service even existed that used the API in question.)
Security advisories remain private
unless accepted, and I cannot accept them if they do not describe an
actual security issue. Thus, it's best to steer most users toward
regular bug reports.
Because of the TurboJPEG 3 API overhaul, the legacy decompression and
lossless transformation functions now wrap the new TurboJPEG 3
functions. For performance reasons, we don't want to read the JPEG
header more than once during the same operation, so the wrapped
functions do not read the header if it has already been read by a
wrapper function. Initially the TurboJPEG 3 functions used a state
variable to track whether the header had already been read, but
b94041390c made this more robust by using
the libjpeg global decompression state instead. If a wrapper function
has already read the JPEG header successfully, then the global
decompression state will be DSTATE_READY, and the logic introduced in
b94041390c will prevent the header from
being read again.
A subtle issue arises because tj3DecompressHeader() does not call
jpeg_abort_decompress() if jpeg_read_header() fails. (That is arguably
a bug, but it has existed since the very first implementation of the
function.) Depending on the nature of the failure, this can cause
tj3DecompressHeader() to return an error code and leave the libjpeg
global decompression state set to DSTATE_INHEADER. If a misbehaved
application ignored the error and subsequently called a TurboJPEG
decompression or lossless transformation function, then the function
would fail to read the JPEG header because the global decompression
state was greater than DSTATE_START. In the case of the decompression
functions, this was innocuous, because jpeg_calc_output_dimensions()
and jpeg_start_decompress() both sanity check the global decompression
state. However, it was possible for a misbehaved application to call
tj3DecompressHeader() with junk data, ignore the return value, and pass
the same junk data into tj3Transform(). Because tj3DecompressHeader()
left the global decompression state set to DSTATE_INHEADER,
tj3Transform() failed to detect the junk data (because it didn't try to
read the JPEG header), and it called jtransform_request_workspace() with
dinfo->image_width and dinfo->image_height still initialized to 0.
Because jtransform_request_workspace() does not sanity check the
decompression state, a division-by-zero error occurred with certain
combinations of transform options in which TJXOPT_TRIM or TJXOPT_CROP
was specified. However, it should be noted that TJXOPT_TRIM and
TJXOPT_CROP cannot be expected to work properly without foreknowledge of
the JPEG source image dimensions, which cannot be gained except by
calling tj3DecompressHeader() successfully. Thus, a calling application
is inviting trouble if it does not check the return value of
tj3DecompressHeader() and sanity check the JPEG source image dimensions
before calling tj3Transform(). This commit softens the failure, but the
failure is still due to improper API usage.
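For reference, a sketch of the usage pattern described above as proper,
assuming the TurboJPEG 3 signatures for tj3DecompressHeader(), tj3Get(), and
tj3Transform(); it is illustrative rather than a verbatim excerpt:

  #include <string.h>
  #include <turbojpeg.h>

  int transform_safely(tjhandle handle, const unsigned char *jpegBuf,
                       size_t jpegSize, unsigned char **dstBuf,
                       size_t *dstSize)
  {
    tjtransform xform;

    memset(&xform, 0, sizeof(xform));
    xform.op = TJXOP_NONE;
    xform.options = TJXOPT_TRIM;

    if (tj3DecompressHeader(handle, jpegBuf, jpegSize) < 0)
      return -1;                      /* junk data: stop here */
    if (tj3Get(handle, TJPARAM_JPEGWIDTH) < 1 ||
        tj3Get(handle, TJPARAM_JPEGHEIGHT) < 1)
      return -1;                      /* dimensions are not usable */
    return tj3Transform(handle, jpegBuf, jpegSize, 1, dstBuf, dstSize,
                        &xform);
  }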
(regression introduced by 300a344d65)
300a344d65 fixed the recompression code
path but also broke the pure decompression code path, because the fix
caused the TurboJPEG decompression instance to be destroyed before
tj3SaveImage() could use it. Furthermore, the fix in
300a344d65 prevented pixel density
information from being transferred from the input image to the output
image when recompressing. This commit does the following:
- Modify tjexample.c so that a single TurboJPEG instance is initialized
for lossless transformation and shared by all code paths. In addition
to fixing both of the aforementioned issues, this makes the code more
readable.
- Extend tjexampletest to test the recompression code path, thus
ensuring that the issues fixed by this commit and
300a344d65 are not reintroduced.
- Modify tjexample.c to remove redundant fclose(), tj3Destroy(), and
tj3Free() calls.
This corresponds to max_memory_to_use in the jpeg_memory_mgr struct in
the libjpeg API, except that the TurboJPEG parameter is specified in
megabytes. Because this is 2023 and computers with less than 1 MB of
memory are not a thing (at least not within the scope of libjpeg-turbo
support), it isn't useful to allow a limit less than 1 MB to be
specified. Furthermore, because TurboJPEG parameters are signed
integers, if we allowed the memory limit to be specified in bytes, then
it would be impossible to specify a limit larger than 2 GB on 64-bit
machines. Because max_memory_to_use is a long signed integer,
effectively we can specify a limit of up to 2 petabytes on 64-bit
machines if the TurboJPEG parameter is specified in megabytes. (2 PB
should be enough for anybody, right?)
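A brief usage sketch, assuming the new parameter is exposed as
TJPARAM_MAXMEMORY (the name is inferred from the description above, which
also defines the value as a number of megabytes):

  #include <turbojpeg.h>

  /* Cap libjpeg virtual-array memory usage at 256 MB for this instance. */
  static tjhandle init_capped_compressor(void)
  {
    tjhandle handle = tj3Init(TJINIT_COMPRESS);

    if (handle)
      tj3Set(handle, TJPARAM_MAXMEMORY, 256);
    return handle;
  }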
This commit also bumps the TurboJPEG API version to 3.0.1. Since the
TurboJPEG API version no longer tracks the libjpeg-turbo version, it
makes sense to increment the API revision number when adding constants,
to increment the minor version number when adding functions, and to
increment the major version number for a complete overhaul.
This commit also removes the vestigial TJ_NUMPARAM macro, which was
never defined because it proved unnecessary.
Partially implements #735
This is very subtle, but if a user specifies a libjpeg virtual array
memory limit via the JPEGMEM environment variable and one of the
tj3Compress*() functions hits that limit, the libjpeg error handler
will be invoked in jpeg_start_compress() (more specifically in
realize_virt_arrays() in jinit_compress_master()) before the libjpeg
global compression state can be incremented. Thus,
jpeg_abort_compress() will not be called before the tj3Compress*()
function exits, the unrealized virtual arrays will not be freed, and if
the TurboJPEG compression instance is reused, those unrealized virtual
arrays will count against the specified memory limit. This could cause
subsequent compression operations that require smaller virtual arrays
(or even no virtual arrays at all) to fail when they would otherwise
succeed. In reality, the vast majority of calling programs would abort
and free the TurboJPEG compression instance if one of the tj3Compress*()
functions failed, but TJBench is a rare exception. This issue does not
bear documenting because of its subtlety and rarity and because JPEGMEM
is not a documented feature of the TurboJPEG API.
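For illustration, the standard libjpeg error-recovery pattern that releases
unrealized virtual arrays before an object is reused looks roughly like the
sketch below; it mirrors the libjpeg.txt example code rather than the actual
TurboJPEG internals, and the names are illustrative:

  #include <setjmp.h>
  #include "jpeglib.h"

  struct my_error_mgr {
    struct jpeg_error_mgr pub;        /* "public" fields */
    jmp_buf setjmp_buffer;            /* return point on error */
  };

  static void my_error_exit(j_common_ptr cinfo)
  {
    struct my_error_mgr *myerr = (struct my_error_mgr *)cinfo->err;
    longjmp(myerr->setjmp_buffer, 1); /* jump back instead of exiting */
  }

  /* Assumes cinfo was created with an error manager whose error_exit()
     longjmp()s to jerr->setjmp_buffer.  On failure, abort the object so
     that any virtual-array requests made before the failure are freed
     before the object is reused. */
  static int compress_once(struct jpeg_compress_struct *cinfo,
                           struct my_error_mgr *jerr)
  {
    if (setjmp(jerr->setjmp_buffer)) {
      jpeg_abort_compress(cinfo);     /* frees unrealized virtual arrays */
      return -1;
    }
    jpeg_start_compress(cinfo, TRUE); /* realize_virt_arrays() can fail here */
    /* ... jpeg_write_scanlines() ... */
    jpeg_finish_compress(cinfo);
    return 0;
  }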
Note that the issue does not exist in the tj3Encode*() and tj3Decode*()
functions, because realize_virt_arrays() is never called in the body of
those functions. The issue also does not exist in the tj3Decompress*()
and tj3Transform() functions, because those functions ensure that the
JPEG header is read (and thus the libjpeg global decompression state is
incremented) prior to calling a function that calls
realize_virt_arrays() (i.e. jpeg_start_decompress() or
jpeg_read_coefficients().) If realize_virt_arrays() failed in the body
of jpeg_write_coefficients(), then tj3Transform() would abort without
calling jpeg_abort_compress(). However, since jpeg_start_compress() is
never called in the body of tj3Transform(), no virtual arrays are ever
requested from the compression object, so failing to call
jpeg_abort_compress() would be innocuous.
Those obsolete memory managers have never been included in
libjpeg-turbo, nor has libjpeg-turbo ever claimed to support MS-DOS
or Mac operating systems prior to OS X 10.4 "Tiger." Note that we
retain the ability to use temp files, even though libjpeg-turbo does
not use them. This allows downstream implementations to write their own
memory managers that use temp files, if necessary.
If the align parameter was set to an unreasonably large value, such as
0x2000000, strides[0] * ph0 and strides[1] * ph1 could have overflowed
the int datatype and wrapped around when computing (src|dst)Planes[1]
and (src|dst)Planes[2] (respectively.) This would have caused
(src|dst)Planes[1] and (src|dst)Planes[2] to point to lower addresses in
the YUV buffer than expected, so the worst case would have been a
visually incorrect output image, not a buffer overrun or other
exploitable issue.
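To make the arithmetic concrete, here is a hedged sketch of the wraparound
described above and of a widened, range-checked multiplication that avoids
it; the names are illustrative, not the actual turbojpeg.c code:

  #include <stddef.h>
  #include <stdint.h>

  /* With align = 0x2000000, stride * planeHeight can exceed INT_MAX and
     wrap around if computed in int.  Doing the multiplication in size_t
     (and range-checking it) keeps the plane offsets correct. */
  static int plane_offset(int stride, int planeHeight, size_t *offset)
  {
    if (stride < 0 || planeHeight < 0)
      return -1;
    if (planeHeight != 0 && (size_t)stride > SIZE_MAX / (size_t)planeHeight)
      return -1;                      /* would overflow even size_t */
    *offset = (size_t)stride * (size_t)planeHeight;
    return 0;
  }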
Because of 47656a0820, we can now
reliably determine the correct default values for FLOATTEST8 and
FLOATTEST12 when using Clang or GCC to build for AArch64 or PowerPC
platforms. (Testing confirms that this is the case with GCC 5-13 and
Clang 5-14 on Ubuntu/AArch64, GCC 4 on CentOS 7/PPC, and GCC 8-10 and
Clang 6-12 on Ubuntu/PPCLE.) Other CPU architectures and compilers can
be added on a case-by-case basis as they are tested.
Because of bf01ed2fbc, the simd field in
huff_entropy_encoder (and, by extension, the simd field in
savable_state) is only initialized if WITH_SIMD is defined. Due to an
oversight, the simd field in savable_state was queried in flush_bits()
regardless of whether WITH_SIMD was defined. In most cases, both
branches of the query have identical code, and the optimizer removes the
branch. However, because the legacy Neon GAS Huffman encoder uses the
older bit buffer logic from libjpeg-turbo 2.0.x and prior (refer to
087c29e07f), the branches do not have
identical code when building for AArch64 with NEON_INTRINSICS undefined
(which will be the case if WITH_SIMD is undefined.) Thus, if
libjpeg-turbo was built for AArch64 with the SIMD extensions disabled
at build time, it was possible for the Neon GAS branch in flush_bits()
to be taken, which would have set put_bits to a value that is incorrect
for the C Huffman encoder. Referring to #728, a user reported that this
issue sometimes caused libjpeg-turbo to generate bogus JPEG images if it
was built for AArch64 without SIMD extensions and subsequently used
through the Qt framework. (It should be noted, however, that disabling
the SIMD extensions in AArch64 builds of libjpeg-turbo is inadvisable
for performance reasons.)
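The following minimal sketch illustrates the hazard and the guard that
resolves it; the structure and values are simplified stand-ins for the real
huff_entropy_encoder/savable_state code:

  /* The simd field exists and is initialized only when WITH_SIMD is
     defined, so any code that reads it must be guarded the same way. */
  typedef struct {
  #ifdef WITH_SIMD
    int simd;
  #endif
    int put_bits;
  } savable_state_sketch;

  static void flush_bits_sketch(savable_state_sketch *state)
  {
  #ifdef WITH_SIMD
    if (state->simd) {
      state->put_bits = 0;    /* value appropriate for the Neon GAS encoder */
    } else
  #endif
    {
      state->put_bits = 7;    /* value appropriate for the C encoder */
    }
  }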
I was unable to reproduce the issue on Linux/AArch64 using libjpeg-turbo
alone, despite testing various versions of GCC and Clang and various
optimization levels. However, the issue is reproducible using MSan with
-O0, so this commit also modifies the GitHub Actions workflow so that
compiler optimization is disabled in the linux-msan job. That should
prevent the issue or similar issues from re-emerging.
Fixes #728
libjpeg-turbo implements 4:4:0 "fancy" (smooth) upsampling, which is
enabled by default when decompressing JPEG images that use 4:4:0
chrominance subsampling. libjpeg did not and does not implement fancy
4:4:0 upsampling.
The MD5 sums associated with FLOATTEST8=fp-contract and
FLOATTEST12=fp-contract are appropriate for GCC (tested v5 through v13)
with -ffp-contract=fast, which is the default when compiling for an
architecture that has fused multiply-add (FMA) instructions. However,
different MD5 sums are needed for Clang (tested v5 through v14) with
-ffp-contract=on, which is now the default in Clang 14 when compiling
for an architecture that has FMA instructions.
Refer to #705, #709, #710
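As a concrete illustration of why the expected MD5 sums depend on the
contraction mode: a fused multiply-add computes a * b + c with a single
rounding, so its result can differ in the last bit from the separately
rounded sequence. The sketch below is a generic demonstration (compile with
-lm), not libjpeg-turbo code:

  #include <math.h>
  #include <stdio.h>

  int main(void)
  {
    double a = 1.0 + 0x1p-27, b = 1.0 + 0x1p-27, c = -(1.0 + 0x1p-26);
    double separate = a * b + c;  /* may or may not be contracted to FMA,
                                     depending on -ffp-contract */
    double fused = fma(a, b, c);  /* always a single rounding */

    printf("separate = %.17g\nfused    = %.17g\n", separate, fused);
    return 0;
  }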
* libjpeg-turbo/2.1.x:
ChangeLog.md: List CVE ID fixed by ccaba5d7
jpeglib.h: Document that JCS_RGB565 is decomp-only
Fix block smoothing w/vert.-subsampled prog. JPEGs
# By DRC
# Via DRC
* commit 'eadd243':
Fix interblock smoothing with narrow prog. JPEGs
jchuff.c/flush_bits(): Guard against free_bits < 0
jchuff.c/flush_bits(): Guard against put_bits < 0
Restore xform fuzzer behavior from before 19f9d8f0
xform fuzz: Use src subsamp to calc dst buf size
Doc: Mention that we are a JPEG ref implementation
jchuff.c: Test for out-of-range coefficients
turbojpeg.h: Make customFilter() proto match doc
ChangeLog.md: Fix typo
tjTransform(): Calc dst buf size from xformed dims
Fix build warnings/errs w/ -DNO_GETENV/-DNO_PUTENV
GitHub: Fix x32 build
tjexample.c: Prevent integer overflow
jpeg_crop_scanline: Fix calc w/sclg + 2x4,4x2 samp
Decomp: Don't enable 2-pass color quant w/ RGB565
TJBench: w/JPEG input imgs, set min tile= MCU size
Bump version to 2.1.6 to prepare for new commits
GitHub: Add pull request template
Build: Clarify CMAKE_OSX_ARCHITECTURES error
Build: Fail if included with add_subdirectory()
# Conflicts:
# .github/workflows/build.yml
# CMakeLists.txt
# README.md
# release/deb-control.in
# By DRC
# Via DRC
* tag '2.1.5.1':
ChangeLog.md: Document d743a2c1
OSS-Fuzz: Bail out immediately on decomp failure
SIMD/x86: Initialize simd_support before every use
Build: Define THREAD_LOCAL even if !WITH_TURBOJPEG
# Conflicts:
# CMakeLists.txt
The 5x5 interblock smoothing implementation, introduced in libjpeg-turbo
2.1, improperly extended the logic from the traditional 3x3 smoothing
implementation. Both implementations point prev_block_row and
next_block_row to the current block row when processing, respectively,
the first and the last block row in the image:
  if (block_row > 0 || cinfo->output_iMCU_row > 0)
    prev_block_row =
      buffer[block_row - 1] + cinfo->master->first_MCU_col[ci];
  else
    prev_block_row = buffer_ptr;

  if (block_row < block_rows - 1 ||
      cinfo->output_iMCU_row < last_iMCU_row)
    next_block_row =
      buffer[block_row + 1] + cinfo->master->first_MCU_col[ci];
  else
    next_block_row = buffer_ptr;
6d91e950c8 naively extended that logic to
accommodate a 5x5 smoothing window:
  if (block_row > 1 || cinfo->output_iMCU_row > 1)
    prev_prev_block_row =
      buffer[block_row - 2] + cinfo->master->first_MCU_col[ci];
  else
    prev_prev_block_row = prev_block_row;

  if (block_row < block_rows - 2 ||
      cinfo->output_iMCU_row + 1 < last_iMCU_row)
    next_next_block_row =
      buffer[block_row + 2] + cinfo->master->first_MCU_col[ci];
  else
    next_next_block_row = next_block_row;
However, this new logic was only correct if block_rows == 1, so the
values of prev_prev_block_row and next_next_block_row were incorrect
when processing, respectively, the second and second to last iMCU rows
in a vertically-subsampled progressive JPEG image.
The intent was to:
- point prev_block_row to the current block row when processing the
first block row in the image,
- point prev_prev_block_row to prev_block_row when processing the first
two block rows in the image,
- point next_block_row to the current block row when processing the
last block row in the image, and
- point next_next_block_row to next_block_row when processing the last
two block rows in the image.
This commit modifies decompress_smooth_data() so that it computes the
current block row's position relative to the whole image and sets
the block row pointers based on that value.
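A hedged sketch of that approach (compute the block row's absolute position
in the image, then clamp the neighbor indices to the valid range); the names
are illustrative rather than the actual jdcoefct.c code:

  /* Clamp a block row index to the valid range for the whole image. */
  static int clamp_block_row(int row, int total_block_rows)
  {
    if (row < 0) return 0;
    if (row > total_block_rows - 1) return total_block_rows - 1;
    return row;
  }

  /* Derive the neighbor rows from the current block row's absolute
     position rather than from per-iMCU-row conditionals.  Clamping
     naturally yields the intended behavior listed above (e.g. for the
     first block row, both "previous" rows resolve to the current row.) */
  static void neighbor_rows(int image_block_row, int total_block_rows,
                            int rows[4])
  {
    rows[0] = clamp_block_row(image_block_row - 2, total_block_rows);
    rows[1] = clamp_block_row(image_block_row - 1, total_block_rows);
    rows[2] = clamp_block_row(image_block_row + 1, total_block_rows);
    rows[3] = clamp_block_row(image_block_row + 2, total_block_rows);
  }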
This commit also restores a line of code that was accidentally deleted
by 6d91e950c8:
  access_rows += compptr->v_samp_factor; /* prior iMCU row too */
access_rows is merely a sanity check that tells the access_virt_barray()
method to generate an error if accessing the specified number of rows
would cause a buffer overrun. Essentially, it is a belt-and-suspenders
measure to ensure that j*init_d_coef_controller() allocated enough rows
for the full-image virtual array. Thus, excluding that line of code did
not cause an observable issue.
This commit also documents eadd2436ee in
the change log.
Fixes #721
The 5x5 interblock smoothing implementation, introduced in libjpeg-turbo
2.1, improperly extended the logic from the traditional 3x3 smoothing
implementation. Both implementations point prev_block_row and
next_block_row to the current block row when processing, respectively,
the first and the last block row in the image:
  if (block_row > 0 || cinfo->output_iMCU_row > 0)
    prev_block_row =
      buffer[block_row - 1] + cinfo->master->first_MCU_col[ci];
  else
    prev_block_row = buffer_ptr;

  if (block_row < block_rows - 1 ||
      cinfo->output_iMCU_row < last_iMCU_row)
    next_block_row =
      buffer[block_row + 1] + cinfo->master->first_MCU_col[ci];
  else
    next_block_row = buffer_ptr;
6d91e950c8 naively extended that logic to
accommodate a 5x5 smoothing window:
  if (block_row > 1 || cinfo->output_iMCU_row > 1)
    prev_prev_block_row =
      buffer[block_row - 2] + cinfo->master->first_MCU_col[ci];
  else
    prev_prev_block_row = prev_block_row;

  if (block_row < block_rows - 2 ||
      cinfo->output_iMCU_row + 1 < last_iMCU_row)
    next_next_block_row =
      buffer[block_row + 2] + cinfo->master->first_MCU_col[ci];
  else
    next_next_block_row = next_block_row;
However, this new logic was only correct if block_rows == 1, so the
values of prev_prev_block_row and next_next_block_row were incorrect
when processing, respectively, the second and second to last iMCU rows
in a vertically-subsampled progressive JPEG image.
The intent was to:
- point prev_block_row to the current block row when processing the
first block row in the image,
- point prev_prev_block_row to prev_block_row when processing the first
two block rows in the image,
- point next_block_row to the current block row when processing the
last block row in the image, and
- point next_next_block_row to next_block_row when processing the last
two block rows in the image.
This commit modifies decompress_smooth_data() so that it computes the
current block row's position relative to the whole image and sets
the block row pointers based on that value.
This commit also restores a line of code that was accidentally deleted
by 6d91e950c8:
  access_rows += compptr->v_samp_factor; /* prior iMCU row too */
access_rows is merely a sanity check that tells the access_virt_barray()
method to generate an error if accessing the specified number of rows
would cause a buffer overrun. Essentially, it is a belt-and-suspenders
measure to ensure that j*init_d_coef_controller() allocated enough rows
for the full-image virtual array. Thus, excluding that line of code did
not cause an observable issue.
This commit also documents dbae59281f in
the change log.
Fixes #721
This allows debuggers and profilers to reliably capture backtraces from
within the x86-64 SIMD functions.
In places where rbp was previously used to access temporary variables
(after stack alignment), we now use r15 and save/restore it accordingly.
The total amount of work is approximately the same, because the previous
code pushed the pre-alignment stack pointer to the aligned stack. The
new prologue and epilogue actually have fewer instructions.
Also note that the {un}collect_args macros now use rbp instead of rax to
access arguments passed on the stack, so we save a few instructions
there as well.
Based on:
debcc7c3b4
Closes #707
Closes #708
Due to an oversight, the assignment of DC05, DC10, DC15, DC20, and DC25
(the right edge coefficients in the 5x5 interblock smoothing window) in
decompress_smooth_data() was incorrect for images exactly two MCU blocks
wide. For such images, DC04, DC09, DC14, DC19, and DC24 were assigned
values based on the last MCU column, but DC05, DC10, DC15, DC20, and
DC25 were assigned values based on the first MCU column (because
block_num + 1 was never less than last_block_column.) This commit
modifies jdcoefct.c so that, for images at least two MCU blocks wide,
DC05, DC10, DC15, DC20, and DC25 are assigned the same values as DC04,
DC09, DC14, DC19, and DC24 (respectively.) DC05, DC10, DC15, DC20, and
DC25 are then immediately overwritten for images more than two MCU
blocks wide.
Since this issue was minor and not likely obvious to an end user, the
fix is undocumented.
Fixes #700
(regression introduced by fc01f4673b)
Because of the TurboJPEG 3 API overhaul, the pure compression code path
in tjexample.c now creates a TurboJPEG compression instance prior to
calling tj3LoadImage(). However, due to an oversight, a compression
instance was no longer created in the recompression code path. Thus,
that code path attempted to reuse the TurboJPEG decompression instance,
which caused an error ("tj3Set(): TJPARAM_QUALITY is not applicable to
decompression instances.") This commit modifies tjexample.c so that the
recompression code path creates a new TurboJPEG compression instance if
one has not already been created.
Fixes #716
This fixes a buffer overrun, reported by OSS-Fuzz, that occurred when
attempting to transform a specially-crafted malformed arithmetic-coded
JPEG image into a baseline Huffman-coded JPEG destination image with
default Huffman tables. This issue probably had a similar root cause to
the issue fixed in 31a301389b, but in this
case, the issue only occurred with the SIMD baseline Huffman encoder in
libjpeg-turbo 2.1.x and 2.0.x. It was not reproducible in 3.0.x or when
using the C baseline Huffman encoder. (NOTE: In order to reproduce the
issue with 2.1.x, it was necessary to revert
58cee6d90cd616c86fae142891b987c289ea055a.)
Because libjpeg-turbo 3.0.x now supports multiple data precisions in the
same build, the regression test system can test the 8-bit and 12-bit
floating point DCT/IDCT algorithms separately. The expected MD5 sums
for those tests are communicated to the test system using the FLOATTEST8
and FLOATTEST12 CMake variables. Whereas it is possible to
intelligently set a default value for FLOATTEST8 when building for
x86[-64] and a default value for FLOATTEST12 when building for x86-64,
it is not possible with other architectures. (Refer to #705, #709,
and #710.) Clang 14, for example, now enables FMA (fused multiply-add)
instructions by default on architectures that support them, but with
AArch64 builds, the results are not the same as when using GCC/AArch64
with FMA instructions enabled. Thus, setting FLOATTEST12=fp-contract
doesn't make the tests pass. It was already impossible to intelligently
set a default for FLOATTEST8 with i386 builds, but referring to #710,
that appears to be the case with other non-x86-64 builds as well.
Back in 1991, when Tom Lane first released libjpeg, some CPUs had
floating point units and some didn't. It could take minutes to compress
or decompress a 1-megapixel JPEG image using the "slow" integer DCT/IDCT
algorithms, and the floating point algorithms were significantly faster
on systems that had an FPU. On systems without FPUs, floating point
math was emulated and really slow, so Tom also developed "fast" integer
DCT/IDCT algorithms to speed up JPEG performance, at the expense of
accuracy, on those systems. Because of libjpeg-turbo's SIMD extensions,
the floating point algorithms are now significantly slower than the
"slow" integer algorithms without being significantly more accurate, and
the "fast" integer algorithms fail the ISO/ITU-T conformance tests
without being any faster than the "slow" integer algorithms on x86
systems. Thus, the floating point and "fast" integer algorithms are
considered legacy features.
In order for the floating point regression tests to be useful, the
results of the tests must be validated against an independent metric.
(In other words, it wouldn't be useful to use the floating point
DCT/IDCT algorithms to determine the expected results of the floating
point DCT/IDCT algorithms.) In the past, I attempted without success to
develop a low-level floating point test that would run at configure time
and determine the appropriate default value of FLOATTEST*. Barring that
approach, the only other possibilities would be:
1. Develop a test framework that compares the floating point results
with a margin of error, as TJUnitTest does. However, that effort isn't
justified unless it could also benefit non-legacy features.
2. Compare the floating point results against an expected MD5 sum, as we
currently do. However, as previously described, it isn't possible in
most cases to determine an appropriate default value for the expected
MD5 sum.
For the moment, it makes the most sense to disable the 8-bit floating
point tests by default except with x86[-64] builds and to disable the
12-bit floating point tests by default except with x86-64 builds. That
means that the floating point algorithms will still be regression tested
when performing x86[-64] builds, but other types of builds will have to
opt in to the same regression tests. Since the floating point DCT/IDCT
algorithms are unlikely to change ever again (the only reason they still
exist at all is to maintain backward compatibility with libjpeg), this
seems like a reasonable tradeoff.
This fixes a UBSan negative shift warning, reported by OSS-Fuzz, that
occurred when attempting to transform a specially-crafted malformed
arithmetic-coded JPEG image into a baseline Huffman-coded JPEG
destination image with default Huffman tables. This issue probably
had a similar root cause to the issue fixed in
31a301389b, but in this case, the issue
only occurred with the SIMD baseline Huffman encoder in libjpeg-turbo
2.1.x. It was not reproducible in 2.0.x or 3.0.x or when using the
C baseline Huffman encoder.