* commit '15274b901acb75d6d2433e8578f3cfbc6f4f5fd9': (98 commits)
AppVeyor: Use SignPath release cert/only sign tags
xform fuzz: Use only xform opts to set entropy alg
jchuff.c: Test for out-of-range coefficients
turbojpeg.h: Make customFilter() proto match doc
ChangeLog.md: Fix typo
djpeg: Fix -map option with 12-bit data precision
Disallow color quantization with lossless decomp
tj3Transform: Calc dst buf size from xformed dims
README.md: Include link to project home page
AppVeyor: Only add installers to zip file
AppVeyor: Integrate with SignPath.io
Fix build warnings/errs w/ -DNO_GETENV/-DNO_PUTENV
GitHub: Fix x32 build
Bump version to 3.0.0
tjexample.c: Prevent integer overflow
Disallow merged upsampling with lossless decomp
SECURITY.md: Wordsmithing and clarifications
GitHub: Add security policy
ChangeLog.md: List CVE ID fixed by 9f756bc6
jpeg_crop_scanline: Fix calc w/sclg + 2x4,4x2 samp
...
# By DRC (96) and others
# Via DRC
* tag '3.0.0': (98 commits)
# Conflicts:
# CMakeLists.txt
# ChangeLog.md
# fuzz/decompress.cc
# fuzz/decompress_yuv.cc
If a hypothetical calling application does something really stupid and
changes cinfo->data_precision after calling jpeg_start_*compress(), then
the precision-specific methods called by jpeg_write_scanlines(),
jpeg_write_raw_data(), jpeg_finish_compress(), jpeg_read_scanlines(),
jpeg_read_raw_data(), or jpeg_start_output() may not be initialized.
Ensure that the first precision-specific method (which will always be
cinfo->main->process_data*(), cinfo->coef->compress_data*(), or
cinfo->coef->decompress_data()) called by any global function that may
be called after jpeg_start_*compress() is initialized and non-NULL.
This increases the likelihood (but does not guarantee) that a
hypothetical stupid calling application will fail gracefully rather than
segfault if it changes cinfo->data_precision after calling
jpeg_start_*compress(). A hypothetical stupid calling application can
still bork itself by changing cinfo->data_precision after initializing
the source manager but before calling jpeg_start_compress(), or after
initializing the destination manager but before calling
jpeg_start_decompress().
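
The guard described above can be illustrated with a minimal sketch.  The
names below (main_controller, start_controller(), write_rows()) are
hypothetical stand-ins rather than libjpeg-turbo internals.  The sketch
only shows the idea of checking a precision-specific method pointer
before calling through it, so that a precision mismatch produces an error
code instead of a segfault.

```c
#include <stddef.h>

/* Hypothetical stand-in for a controller with precision-specific methods.
   Only the method matching the precision in effect at start time is set. */
typedef struct {
  int (*process_data)(int rows);     /* 8-bit method */
  int (*process_data_12)(int rows);  /* 12-bit method */
} main_controller;

static int process_rows(int rows) { return rows; }

/* Mimics initialization at "start" time for a given data precision. */
void start_controller(main_controller *m, int data_precision)
{
  m->process_data = NULL;
  m->process_data_12 = NULL;
  if (data_precision == 12)
    m->process_data_12 = process_rows;
  else
    m->process_data = process_rows;
}

/* 8-bit entry point: returns -1 instead of calling through a NULL pointer. */
int write_rows(main_controller *m, int rows)
{
  if (m->process_data == NULL)
    return -1;  /* precision changed after start; fail gracefully */
  return m->process_data(rows);
}
```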
As with x86-64, the Power ISA basically implements 64-bit instructions
as extensions of their 32-bit counterparts. Thus, 64-bit Power ISA CPUs
can natively execute legacy 32-bit PowerPC instructions when running in
big-endian mode. Most Power ISA support has shifted (pun intended) to
little-endian mode, so there are few remaining operating systems that
support big-endian mode. Debian is one of them, however (albeit
unofficially.)
Unlike other versions of Clang 14.0.0, AppleClang 14.0.0 in Xcode 14.2
retains the old -ffp-contract=off default. Apple didn't adopt
-ffp-contract=on as the default until Xcode 14.3 (AppleClang 14.0.3.)
This has been confirmed in the Xcode 14.3 release notes.
We use a standard set of strict compiler warnings with Clang and GCC to
continuously test and maintain C89 conformance in the libjpeg API code.
However, SIMD extensions need not comply with that. The AltiVec code
specifically uses some C99isms, so disable -Wc99-extensions and
-Wpedantic in the scope of that code. Also disable -Wshadow, because
I'm too lazy to fix the TRANSPOSE() macro. Also use
#ifdef __BIG_ENDIAN__ and #if defined(__BIG_ENDIAN__) instead of
#if __BIG_ENDIAN__.
We use a standard set of strict compiler warnings with Clang and GCC to
continuously test and maintain C89 conformance in the libjpeg API code.
However, SIMD extensions need not comply with that. The Neon code
specifically uses some C99isms, so disable
-Wdeclaration-after-statement, -Wc99-extensions, and -Wpedantic in the
scope of that code. Also modify the Neon feature tests so that they
will succeed if any of the aforementioned compiler warnings are enabled.
Rename the ARMV8_BUILD CMake variable to SECONDARY_BUILD, and modify the
makemacpkg script so that it allows any architecture in a primary or
secondary build. The idea is that Apple Silicon users can package an
arm64 primary build and a secondary x86_64 build, and Intel users can
package an x86_64 primary build and a secondary arm64 build, using the
same procedure.
Also simplify the iOS build instructions, using the
CMAKE_OSX_ARCHITECTURES variable rather than a toolchain.
Since libjpeg-turbo does not support Exif, the only way it can embed
density information in a JPEG image is by using the JFIF marker, which
is only written if the JPEG colorspace is YCbCr or grayscale.
(Referring to the conversation under #793, we may need to further
restrict that to 8-bit-per-sample JPEG images, because the JFIF spec
requires 8-bit data precision.)
- Due to an oversight, a113506d17
(libjpeg-turbo 1.4 beta1) effectively made the call to
std_huff_tables() in jpeg_set_defaults() a no-op if the Huffman tables
were previously defined, which made it impossible to disable Huffman
table optimization or progressive mode if they were previously enabled
in the same API instance. std_huff_tables() retains its previous
behavior for decompression instances, but it now force-enables the
standard (baseline) Huffman tables for compression instances.
- Due to another oversight, there was no way to disable lossless mode
if it was previously enabled in a particular API instance.
jpeg_set_defaults() now accomplishes this, which makes
TJ*PARAM_LOSSLESS behave as intended/documented.
- Due to yet another oversight, setCompDefaults() in the TurboJPEG API
library permanently modified the value of TJ*PARAM_SUBSAMP when
generating a lossless JPEG image, which affected subsequent lossy
compression operations. This issue was hidden by the issue above and
thus does not need to be publicly documented.
Fixes #792
The target data precision isn't known at the time that the calling
program sets TJPARAM_LOSSLESSPT, so tj3Set() needs to allow all possible
values (from 0 to 15.) jpeg_enable_lossless(), which is called within
the body of tj3Compress*(), will throw an error if the point transform
value is greater than {data precision} - 1.
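
A minimal sketch of that two-stage validation, assuming hypothetical
helper names (set_point_transform() and check_point_transform() are not
TurboJPEG or libjpeg functions):  the first stage accepts the full range
because the precision is not yet known, and the second stage applies the
precision-dependent limit.

```c
/* Hypothetical two-stage validation of a lossless point transform value. */

/* Stage 1: when the parameter is set, the data precision is not yet known,
   so accept the full range 0..15. */
int set_point_transform(int *param, int pt)
{
  if (pt < 0 || pt > 15)
    return -1;
  *param = pt;
  return 0;
}

/* Stage 2: at compression time the precision is known, so reject values
   greater than (data precision - 1). */
int check_point_transform(int pt, int data_precision)
{
  return (pt > data_precision - 1) ? -1 : 0;
}
```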
Since LLVM/Windows emulates Visual Studio, CMake sets
CMAKE_C_SIMULATE_ID="MSVC" but does not set MSVC. Thus, our build
system needs to enable most (but not all) of the Visual Studio features
when CMAKE_C_SIMULATE_ID="MSVC". Support for LLVM/Windows is currently
undocumented because it isn't a standalone build environment. (It
requires the Visual Studio and Windows SDK headers and link libraries.)
Closes #786
This just improves code readability by emphasizing that we don't care
about the destination image's level of subsampling unless
TJPARAM_NOREALLOC is set or lossless cropping will be performed.
With respect to tj3Transform(), this addresses an oversight from
bb1d540a80.
Note to self: A convenience function/method for computing the worst-case
transformed JPEG size for a particular transform would be nice.
- When transforming, the worst-case JPEG buffer size depends on the
subsampling level used in the destination image, since a grayscale
transform might have been applied.
- Parentheses Police
tj*Transform() relied upon the underlying transupp API to check the
cropping region. However, transupp uses unsigned integers for the
cropping region, whereas the tjregion structure uses signed integers.
Thus, casting negative values from a tjregion structure produced very
large unsigned values. In the case of the left and upper boundary, this
was innocuous, because jtransform_request_workspace() rejected the
values as being out of bounds. However, jtransform_request_workspace()
did not always reject very large width and height values, because it
supports expanding the destination image by specifying a cropping region
larger than the source image. In certain cases, it allowed those
values, and the libjpeg memory manager subsequently ran out of memory.
NOTE: Prior to this commit, image expansion technically worked with
tj*Transform() as long as the cropping width and height were valid and
automatic JPEG buffer (re)allocation was used. However, that behavior
is not a documented feature of the TurboJPEG API, nor do we have any way
of testing it at the moment. Official support for image expansion can
be added later, if there is sufficient demand for it.
Similarly, this commit modifies tj3SetCroppingRegion() so that it
explicitly checks for left boundary values exactly equal to the scaled
image width and upper boundary values exactly equal to the scaled image
height. If the specified cropping width or height was 0 (which is
interpreted as {scaled image width} - {left boundary} or
{scaled image height} - {upper boundary}), then such values caused a
cropping width or height of 0 to be passed to the libjpeg API. In the
case of the width, this was innocuous, because jpeg_crop_scanline()
rejected the value. In the case of the height, however, this caused
unexpected and hard-to-diagnose errors farther down the pipeline.
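
The checks described above can be sketched as follows.  This is an
illustration under stated assumptions, not the TurboJPEG implementation:
the region type and check_region() are hypothetical, and image expansion
is simply rejected.  The key points are that negative values are rejected
while they are still signed, and that a left or upper boundary equal to
the image dimension is rejected before a width or height of 0 is
expanded.

```c
/* Hypothetical region type with signed fields, similar in spirit to a
   TurboJPEG region.  w == 0 or h == 0 means "extend to the image edge". */
typedef struct { int x, y, w, h; } region;

/* Returns 0 and fills in the effective width/height if the region lies
   within a scaledWidth x scaledHeight image, or -1 otherwise. */
int check_region(region *r, int scaledWidth, int scaledHeight)
{
  /* Reject negative values while they are still signed. */
  if (r->x < 0 || r->y < 0 || r->w < 0 || r->h < 0)
    return -1;
  /* A boundary exactly equal to the image dimension would leave a
     zero-sized region once w == 0 or h == 0 is expanded. */
  if (r->x >= scaledWidth || r->y >= scaledHeight)
    return -1;
  if (r->w == 0) r->w = scaledWidth - r->x;
  if (r->h == 0) r->h = scaledHeight - r->y;
  /* Written as a subtraction so that x + w cannot overflow. */
  if (r->w > scaledWidth - r->x || r->h > scaledHeight - r->y)
    return -1;
  return 0;
}
```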
jcopy_markers_execute() has historically ignored its option argument,
which is OK for jpegtran, but tj*Transform() needs to be able to save a
set of markers from the source image and write a subset of those markers
to each destination image. Without that ability, the function
effectively behaved as if TJ*OPT_COPYNONE was not specified unless all
transforms specified it.
These images were only used with tjbenchtest and tjexampletest, but I
was concerned about introducing additional licensing provisions into the
libjpeg-turbo source tree. Thus, this commit replaces them with images
that I own and can thus make available under the libjpeg-turbo licenses
with no additional provisions.
Put all general functions at the top of the list, and ensure that all
functions are defined before they are mentioned. Also consistify the
function ordering between turbojpeg.h and turbojpeg.c.
Extending the Bayer matrix past Column 80 seems like a lesser
readability sin than not putting a space after each comma, especially
since we had to explicitly whitelist the file in checkstyle.
- Eliminate unnecessary "www."
- Use HTTPS.
- Update Java, MSYS, tdm-gcc, and NSIS URLs.
- Update URL and title of Agner Fog's assembly language optimization
manual.
- Remove extraneous information about MASM and Borland Turbo Assembler
and outdated NASM URLs from the x86 assembly headers, and mention
Yasm.
Lossless cropping is performed after other lossless transform
operations, so the cropping region must be specified relative to the
destination image dimensions and level of chrominance subsampling, not
the source image dimensions and level of chrominance subsampling.
More specifically, if the lossless transform operation swaps the X and Y
axes, or if the image is converted to grayscale, then that changes the
cropping region requirements.
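
As a rough illustration of the axis-swapping case, the sketch below
computes the destination dimensions against which a cropping region would
be interpreted.  The xform_op enum and dest_dims() are hypothetical names
made up for this example, and the sketch ignores subsampling and iMCU
alignment entirely.

```c
/* Hypothetical transform list; the names do not come from any header. */
typedef enum {
  OP_NONE, OP_HFLIP, OP_VFLIP, OP_TRANSPOSE, OP_TRANSVERSE,
  OP_ROT90, OP_ROT180, OP_ROT270
} xform_op;

/* Computes the destination image dimensions.  Transposition, transverse
   transposition, and 90/270-degree rotation swap the X and Y axes. */
void dest_dims(xform_op op, int srcW, int srcH, int *dstW, int *dstH)
{
  int swap = (op == OP_TRANSPOSE || op == OP_TRANSVERSE ||
              op == OP_ROT90 || op == OP_ROT270);

  *dstW = swap ? srcH : srcW;
  *dstH = swap ? srcW : srcH;
}
```

With a 640 x 480 source image and a 90-degree rotation, a cropping region
would thus be checked against 480 x 640, not 640 x 480.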
The JPEG-1 spec never uses the term "MCU block". That term is rarely
used in other literature to describe the equivalent of an MCU in an
interleaved JPEG image, but the libjpeg documentation uses "iMCU" to
describe the same thing. "iMCU" is a better term, since the equivalent
of an interleaved MCU can contain multiple DCT blocks (or samples in
lossless mode) that are only grouped together if the image is
interleaved.
In the case of restart markers, "MCU block" was used in the libjpeg
documentation instead of "MCU", but "MCU" is more accurate and less
confusing. (The restart interval is literally in MCUs, where one MCU
is one data unit in a non-interleaved JPEG image and multiple data units
in a multi-component interleaved JPEG image.)
In the case of 9b704f96b2, the issue was
actually with progressive JPEG images exactly two DCT blocks wide, not
two MCU blocks wide.
This commit also defines "MCU" and "MCU row" in the description of the
various restart marker options/parameters. Although an MCU row is
technically always a row of samples in lossless mode, "sample row" was
confusing, since it is used in other places to describe a row of samples
for a single component (whereas an MCU row in a typical lossless JPEG
image consists of a row of interleaved samples for all components.)
TJ*PARAM_RESTARTBLOCKS technically works with lossless compression, but
it is not useful, since the value must be equal to the number of samples
in a row. (In other words, it is no different than
TJ*PARAM_RESTARTROWS, except that it requires the user to do more
math.)
For reasons I can't recall, fc01f4673b
(the TurboJPEG 3 API overhaul) changed the C version of TJBench, but not
the Java version, so that it requires an exact scaling factor match (as
opposed to allowing unreduced scaling factors, as djpeg does and prior
versions of TJBench did.) That might have been temporary testing code
that was accidentally committed.
Because the crop spec was parsed using unsigned 32-bit integers,
negative numbers were interpreted as values ~= UINT_MAX (4,294,967,295).
This had the following ramifications:
- If the cropping region width was negative and the adjusted width + the
adjusted left boundary was greater than 0, then the 32-bit unsigned
integer bounds checks in djpeg and jpeg_crop_scanline() overflowed and
failed to detect the out-of-bounds width, jpeg_crop_scanline() set
cinfo->output_width to a value ~= UINT_MAX, and a buffer overrun and
subsequent segfault occurred in the upsampling or color conversion
routine. The segfault occurred in the body of
jpeg_skip_scanlines() --> read_and_discard_scanlines() if the cropping
region upper boundary was greater than 0 and the JPEG image used
chrominance subsampling and in the body of jpeg_read_scanlines()
otherwise.
- If the cropping region width was negative and the adjusted width + the
adjusted left boundary was 0, then a zero-width output image was
generated.
- If the cropping region left boundary was negative, then an output
image with bogus data was generated.
This commit modifies djpeg and jpeg_crop_scanline() so that the
aforementioned bounds checks use 64-bit unsigned integers, thus guarding
against overflow. It similarly modifies jpeg_skip_scanlines(). In the
case of jpeg_skip_scanlines(), the issue was not reproducible with
djpeg, but passing a negative number of lines to jpeg_skip_scanlines()
caused a similar overflow if the number of lines +
cinfo->output_scanline was greater than 0. That caused
jpeg_skip_scanlines() to read past the end of the JPEG image, throw a
warning ("Corrupt JPEG data: premature end of data segment"), and fail
to return unless warnings were treated as fatal. Also, djpeg now parses
the crop spec using signed integers and checks for negative values.
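
The overflow can be reproduced in isolation.  The following sketch is
not the djpeg or jpeg_crop_scanline() code (fits_32() and fits_64() are
hypothetical helpers); it only contrasts a 32-bit bounds check, whose sum
can wrap around, with the same check performed on widened 64-bit
operands.

```c
#include <stdint.h>

/* 32-bit check: the sum can wrap around, so an out-of-bounds width can
   appear to fit. */
int fits_32(uint32_t xoffset, uint32_t width, uint32_t image_width)
{
  return (uint32_t)(xoffset + width) <= image_width;
}

/* 64-bit check: the operands are widened first, so the sum cannot wrap. */
int fits_64(uint32_t xoffset, uint32_t width, uint32_t image_width)
{
  return (uint64_t)xoffset + (uint64_t)width <= (uint64_t)image_width;
}
```

For example, a width of -8 parsed as an unsigned 32-bit integer becomes
4,294,967,288.  Adding a left boundary of 16 wraps around to 8, so the
32-bit check accepts the region for a 640-pixel-wide image, whereas the
64-bit check rejects it.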
Due to an oversight in the multi-precision feature,
TJCompressor.srcBuf12 and TJCompressor.srcBuf16 were not set to null
in TJCompressor.setSourceImage(YUVImage) or
TJCompressor.setSourceImage(BufferedImage, ...). Thus, if an
application set a 12-bit or 16-bit packed-pixel buffer as the source
image then set a BufferedImage with integer pixels as the source image,
TJCompressor.compress() would compress from the 12-bit or 16-bit
packed-pixel buffer instead of the BufferedImage. The odds of an
application actually doing that are very slim, however.
There are two approaches to handling abbreviated command-line options:
1. If a new option is introduced that begins with the same letters as an
existing option, require a longer abbreviation for the existing option
in order to ensure that abbreviations are always unique.
2. Require a unique abbreviation only for new options, and match all
non-unique abbreviations with existing options, thus maintaining
backward compatibility.
keymatch() supports either approach, and Tom Lane historically seemed to
prefer Approach 2, whereas both approaches have been applied
inconsistently in the years since. This commit consistently applies
Approach 2.
More specific notes:
We unnecessarily required 'cjpeg -progressive' to be abbreviated as
'cjpeg -pro' rather than 'cjpeg -p' when the -precision option was
introduced in libjpeg-turbo 3.0 beta.
The IJG unnecessarily required 'cjpeg -scans' to be abbreviated as
'cjpeg -scan' rather than 'cjpeg -sc' when the -scale option was
introduced in jpeg-7. We even more unnecessarily adopted that
requirement, even though we never adopted the -scale option.
We unnecessarily required 'djpeg -scale' to be abbreviated as
'djpeg -sc' rather than 'djpeg -s' when the -skip option was introduced
in libjpeg-turbo 1.5 beta.
The IJG unnecessarily required 'jpegtran -copy' to be abbreviated as
'jpegtran -co' rather than 'jpegtran -c' when the -crop option was
introduced in jpeg-7.
The IJG unnecessarily required 'jpegtran -progressive' to be abbreviated
as 'jpegtran -pr' rather than 'jpegtran -p' when the -perfect option was
introduced in jpeg-7.
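
Approach 2 can be sketched with a small prefix matcher.  prefix_match()
and parse_option() below are hypothetical helpers written for this
example; they imitate the general behavior of a minimum-length prefix
match but are not the keymatch() source, and the minimum lengths (1 and
3) are assumptions chosen to mirror the cjpeg -progressive/-precision
case.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper in the spirit of keymatch(): returns 1 if arg is a
   case-insensitive prefix of keyword that is at least minchars long.
   keyword is assumed to be lowercase. */
int prefix_match(const char *arg, const char *keyword, size_t minchars)
{
  size_t n = 0;

  for (; arg[n] != '\0'; n++) {
    if (keyword[n] == '\0' ||
        tolower((unsigned char)arg[n]) != keyword[n])
      return 0;
  }
  return n >= minchars;
}

typedef enum { OPT_UNKNOWN, OPT_PROGRESSIVE, OPT_PRECISION } option;

/* Approach 2: the pre-existing option keeps its one-letter abbreviation,
   and only the newer option needs enough letters to be unambiguous. */
option parse_option(const char *arg)
{
  if (prefix_match(arg, "progressive", 1)) return OPT_PROGRESSIVE;
  if (prefix_match(arg, "precision", 3)) return OPT_PRECISION;
  return OPT_UNKNOWN;
}
```

Under these assumptions, "p", "pr", and "pro" all select the older
option, and "pre" is the shortest abbreviation that selects the newer
one.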
We need to use tj3Alloc() (which, when ZERO_BUFFERS is defined, calls
calloc() instead of malloc()) to allocate all destination buffers.
Otherwise, if the compression/decompression/transform operation fails,
then the buffer checksum (which is computed to prevent the compiler from
optimizing out the whole test, since the destination buffer is never
used otherwise) will depend upon values in the destination buffer that
were never written, and MSan will complain.
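
A minimal sketch of the idea, assuming hypothetical helper names
(alloc_dst() and checksum() are not the fuzzer or tj3Alloc() source):
when ZERO_BUFFERS is defined, the destination buffer is zero-filled, so a
checksum over a buffer that was never written still reads initialized
memory.

```c
#include <stdlib.h>

#define ZERO_BUFFERS 1  /* defined here only so the sketch zeroes memory */

/* Hypothetical allocator: zero-fills when ZERO_BUFFERS is defined. */
void *alloc_dst(size_t size)
{
#ifdef ZERO_BUFFERS
  return calloc(1, size);
#else
  return malloc(size);
#endif
}

/* Sums the buffer so that the compiler cannot discard the work that filled
   it.  If the operation failed early, this reads bytes that were never
   written, which a memory sanitizer reports unless they were zero-filled. */
unsigned long checksum(const unsigned char *buf, size_t size)
{
  unsigned long sum = 0;
  size_t i;

  for (i = 0; i < size; i++)
    sum += buf[i];
  return sum;
}
```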
- "Optimized baseline entropy coding" = "Huffman table optimization"
"Optimized baseline entropy coding" was meant to emphasize that the
feature is only useful when generating baseline (single-scan lossy
8-bit-per-sample Huffman-coded) JPEG images, because it is
automatically enabled when generating Huffman-coded progressive
(multi-scan), 12-bit-per-sample, and lossless JPEG images. However,
Huffman table optimization isn't actually an integral part of those
non-baseline modes. You can forgo Huffman table optimization with
12-bit data precision if you supply your own Huffman tables. The spec
doesn't require it with progressive or lossless mode, either, although
our implementation does. Furthermore, "baseline" describes more than
just the type of entropy coding used. It was incorrect to say that
optimized "baseline" entropy coding is automatically enabled for
Huffman-coded progressive, 12-bit-per-sample, and lossless JPEG
images, since those are clearly not baseline images.
- "Progressive entropy coding" = "Progressive JPEG"
"Progressive" describes more than just the type of entropy coding
used. (In fact, both Huffman-coded and arithmetic-coded images can be
progressive.)
- Mention that TJPARAM_OPTIMIZE/TJ.PARAM_OPTIMIZE can be used with
lossless transformation as well.
- General wordsmithing
- Formatting tweaks
Referring to
https://sourceforge.net/p/libjpeg-turbo/bugs/48,
https://sourceforge.net/p/libjpeg-turbo/bugs/82,
#15, #238, #253, and #619,
valgrind and MSan have failed to properly detect data initialization by
libjpeg-turbo's x86 SIMD extensions for the entire 14 years that
libjpeg-turbo has been a project, resulting in false positives unless
libjpeg-turbo is built with WITH_SIMD=0 or run with JSIMD_FORCENONE=1.
This commit introduces a new C preprocessor macro (ZERO_BUFFERS) that,
if set, causes libjpeg-turbo to zero certain buffers in order to work
around the specific valgrind/MSan test failures caused by the
aforementioned false positives. This allows us to more closely
approximate the production configuration of libjpeg-turbo when testing
with valgrind or MSan.
Closes #781
The dimensions in the PPM header of the output file generated by
'example decompress' were always 640 x 480, regardless of the size of
the JPEG image being decompressed. Our regression tests (which this
commit also fixes) missed the bug because they decompressed the
640 x 480 image generated by 'example compress'.
Fixes #778
(Regression introduced by 7bb958b732)
Because of 7bb958b732, the TurboJPEG
compression and encoding functions no longer transfer the value of
TJPARAM_OPTIMIZE into cinfo->optimize_coding unless the data precision
is 8. The intent of that was to prevent using_std_huff_tables() from
being called more than once when reusing the same compressor object to
generate multiple 12-bit-per-sample JPEG images. However, because
cinfo->optimize_coding is always set to TRUE by jpeg_set_defaults() if
the data precision is 12, calling applications that use 12-bit data
precision had to unset cinfo->optimize_coding if they set
cinfo->arith_code after calling jpeg_set_defaults(). Because of
7bb958b732, the TurboJPEG API stopped
doing that except with 8-bit data precision. Thus, attempting to
generate a 12-bit-per-sample arithmetic-coded lossy JPEG image using
the TurboJPEG API failed with "Requested features are incompatible."
Since the compressor will always fail if cinfo->arith_code and
cinfo->optimize_coding are both set, and since cinfo->optimize_coding
has no relevance for arithmetic coding, the most robust and user-proof
solution is for jinit_c_master_control() to set cinfo->optimize_coding
to FALSE if cinfo->arith_code is TRUE.
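
The shape of that fix can be sketched as follows.  coding_params and
reconcile_coding() are hypothetical names for illustration and do not
reproduce jinit_c_master_control(); the sketch only shows the
irrelevant setting being cleared instead of being reported as a conflict.

```c
/* Hypothetical subset of the compression settings discussed above. */
typedef struct {
  int arith_code;       /* nonzero = arithmetic coding requested */
  int optimize_coding;  /* nonzero = Huffman table optimization requested */
} coding_params;

/* Huffman table optimization has no meaning for arithmetic coding, so drop
   the request rather than reporting incompatible features. */
void reconcile_coding(coding_params *p)
{
  if (p->arith_code)
    p->optimize_coding = 0;
}
```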
This commit also:
- modifies TJBench so that it no longer reports that it is using
optimized baseline entropy coding in modes where that setting
is irrelevant,
- amends the cjpeg documentation to clarify that -optimize is implied
when specifying -progressive or '-precision 12' without -arithmetic,
and
- prevents jpeg_set_defaults() from uselessly checking the value of
cinfo->arith_code immediately after it has been set to FALSE.
jpeg_enable_lossless() checks the point transform value against the data
precision, so we need to defer calling jpeg_enable_lossless() until
after all command-line options have been parsed.
This gives the user a hint as to whether converting the input file into
a JPEG file with a particular data precision will result in a loss of
precision. Also modify usage.txt and the cjpeg man page to warn the
user about that possibility.
- "bits per component" = "bits per sample"
Describing the data precision of a JPEG image using "bits per
component" is technically correct, but "bits per sample" is the
terminology that the JPEG-1 spec uses. Also, "bits per component" is
more commonly used to describe the precision of packed-pixel formats
(as opposed to "bits per pixel") rather than planar formats, in which
all components are grouped together.
- Unmention legacy display technologies. Colormapped and monochrome
displays aren't a thing anymore, and even when they were still a
thing, it was possible to display full-color images to them. In 1991,
when JPEG decompression time was measured in minutes per megapixel, it
made sense to keep a decompressed copy of JPEG images on disk, in a
format that could be displayed without further color conversion (since
color conversion was slow and memory-intensive.) In 2024, JPEG
decompression time is measured in milliseconds per megapixel, and
color conversion is even faster. Thus, JPEG images can be
decompressed, displayed, and color-converted (if necessary) "on the
fly" at speeds too fast for human vision to perceive. (In fact, your
TV performs much more complicated decompression algorithms at least 60
times per second.)
- Document that color quantization (and associated features), GIF
input/output, Targa input/output, and OS/2 BMP input/output are legacy
features. Legacy status doesn't necessarily mean that the features
are deprecated. Rather, it is meant to discourage users from using
features that may be of little or no benefit on modern machines (such
as low-quality modes that had significant performance advantages in
the early 1990s but no longer do) and that are maintained on a
break/fix basis only.
- General wordsmithing, grammar/punctuation policing, and formatting
tweaks
- Clarify which data precisions each cjpeg input format and each djpeg
output format supports.
- cjpeg.1: Remove unnecessary and impolitic statement about the -targa
switch.
- Adjust or remove performance claims to reflect the fact that:
* On modern machines, the djpeg "-fast" switch has a negligible effect
on performance.
* There is a measurable difference between the performance of Floyd-
Steinberg dithering and no dithering, but it is not likely
perceptible to most users.
* There is a measurable difference between the performance of 1-pass
and 2-pass color quantization, but it is not likely perceptible to
most users.
* There is a measurable difference between the performance of
full-color and grayscale output when decompressing a full-color JPEG
image, but it is not likely perceptible to most users.
* IDCT scaling does not necessarily improve performance. (It
generally does if the scaling factor is <= 1/2 and generally doesn't
if the scaling factor is > 1/2, at least on my machine. The
performance claim made in jpeg-6b was probably invalidated when we
merged the additional scaling factors from jpeg-7.)
- Clarify which djpeg switches/output formats cannot be used when
decompressing lossless JPEG images.
- Remove djpeg hints, since those involve quality vs. speed tradeoffs
that are no longer relevant for modern machines.
- Remove documentation regarding using color quantization with 16-bit
data precision. (Color quantization requires lossy mode.)
- Java: Fix typos in TJDecompressor.decompress12() and
TJDecompressor.decompress16() documentation.
- jpegtran.1: Fix truncated paragraph
In a man page, a single quote at the start of a line is interpreted as
a macro.
Closes #775
- libjpeg.txt:
* Mention J16SAMPLE data type (oversight.)
* Remove statement about extending jdcolor.c. (libjpeg-turbo is not
quite as DIY as libjpeg once was.)
* Remove paragraph about tweaking the various typedefs in jmorecfg.h.
It is no longer relevant for modern machines.
* Remove caveat regarding systems with ints less than 16 bits wide.
(ANSI/ISO C requires an int to be at least 16 bits wide, and
libjpeg-turbo has never supported non-ANSI compilers.)
- usage.txt:
* Add copyright header.
* Document cjpeg -icc, -memdst, -report, -strict, and -version
switches.
* Document djpeg -icc, -maxscans, -memsrc, -report, -skip, -crop,
-strict, and -version switches.
* Document jpegtran -icc, -maxscans, -report, -strict, and -version
switches.
The 8-bit-per-sample image I/O modules have always been built with BMP,
GIF, PBMPLUS, and Targa support, and the 12-bit-per-sample image I/O
modules have always been built with only GIF and PBMPLUS support. In
libjpeg-turbo 2.1.x and prior, cjpeg and djpeg were built with the same
image format support as the image I/O modules. However, because of the
multi-precision feature introduced in libjpeg-turbo 3.0.x, cjpeg and
djpeg are now always built with support for all image formats. Thus,
the error message table compiled into cjpeg and djpeg was a superset of
the error message table compiled into the 12-bit-per-sample and
16-bit-per-sample image I/O modules. If, for example, the
12-bit-per-sample PPM writer threw JERR_PPM_COLORSPACE (== 15, because
it was built with only GIF and PBMPLUS support), then djpeg interpreted
that as JERR_GIF_COLORSPACE (== 15, because it was built with support
for all image formats.) There was no chance of a buffer overrun, since
the error message table lookup was performed against the table compiled
into cjpeg and djpeg, which contained all possible entries. However,
this issue caused an incorrect error message to be displayed by cjpeg or
djpeg if a 12-bit-per-sample or 16-bit-per-sample image I/O module threw
an error. This commit simply removes the *_SUPPORTED #ifdefs from
cderror.h so that all image I/O error messages are always included in
every instance of the image I/O error message table.
Note that a similar issue could have also occurred with the
12-bit-per-sample and 16-bit-per-sample TurboJPEG image I/O functions,
since the 12-bit-per-sample and 16-bit-per-sample image I/O modules
supporting those functions are built with only PBMPLUS support whereas
the library as a whole is built with BMP and PBMPLUS support.
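
The mismatch is easy to reproduce in miniature.  In the sketch below, the
enums, message strings, and app_lookup() are hypothetical and heavily
shortened (the real table is generated from cderror.h and has many more
entries); the sketch only shows how omitting entries at compile time
shifts the numeric value of every later code.

```c
/* Message codes as compiled into an application with every format enabled
   (hypothetical, heavily shortened list). */
enum app_code {
  APP_BMP_COLORSPACE, APP_GIF_COLORSPACE, APP_PPM_COLORSPACE, APP_LAST
};

/* The same list as compiled into a module without BMP support: every code
   after the omitted entry shifts down by one. */
enum mod_code { MOD_GIF_COLORSPACE, MOD_PPM_COLORSPACE, MOD_LAST };

/* Placeholder strings, not the actual libjpeg-turbo messages. */
static const char * const app_messages[APP_LAST] = {
  "BMP colorspace message",
  "GIF colorspace message",
  "PPM colorspace message"
};

/* Lookup as performed by the application, using its own (full) table. */
const char *app_lookup(int code)
{
  return (code >= 0 && code < APP_LAST) ? app_messages[code] : "Unknown";
}
```

Here, a module that throws MOD_PPM_COLORSPACE causes the application to
display the GIF message.  The lookup stays within the bounds of the
application's table, but the message is wrong.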
Referring to https://bugzilla.mozilla.org/show_bug.cgi?id=1898606,
attempting to decompress a specially-crafted malformed JPEG image
(specifically an image with a complete 12-bit Start Of Frame segment
followed by an incomplete 8-bit Start Of Frame segment) using the
default marker processor, buffered-image mode, and input prefetching
triggered the following sequence of events:
- When the 12-bit SOF segment was encountered (in the body of
jpeg_read_header()), the marker processor's read_markers() method
called the get_sof() function, which processed the 12-bit SOF segment
and set cinfo->data_precision to 12.
- If the application subsequently called jpeg_consume_input() in a loop
to prefetch input data, and it didn't stop calling
jpeg_consume_input() when the function returned JPEG_REACHED_SOS, then
the 8-bit SOF segment was encountered in the body of
jpeg_consume_input(). As a result, the marker processor's
read_markers() method called get_sof(), which started to process the
8-bit SOF segment and set cinfo->data_precision to 8.
- Since the 8-bit SOF segment was incomplete, the end of the JPEG data
stream was encountered when get_sof() attempted to read the image
height, width, and number of components.
- If the fill_input_buffer() method in the application's custom source
manager incorrectly returned FALSE in response to a prematurely-
terminated JPEG data stream, then get_sof() returned FALSE while
attempting to read the image height, width, and number of components
(before the duplicate SOF check was reached.) That caused the default
marker processor's read_markers() method, and subsequently
jpeg_consume_input(), to return JPEG_SUSPENDED.
- If the application failed to respond to the JPEG_SUSPENDED return
value and subsequently attempted to call jpeg_read_scanlines(),
then the data precision check in jpeg_read_scanlines() succeeded
(because cinfo->data_precision was now 8.) However, because
cinfo->data_precision had been 12 during the previous call to
jpeg_start_decompress(), only the 12-bit version of the main
controller was initialized, and the cinfo->main->process_data() method
was undefined. Thus, a segfault occurred when jpeg_read_scanlines()
attempted to invoke that method.
Scenarios in which the issue was thwarted:
1. The default source managers handle a prematurely-terminated JPEG data
stream by inserting a fake EOI marker into the data stream. Thus, when
using one of those source managers, the INPUT_2BYTES() and INPUT_BYTE()
macros (which get_sof() invokes to read the image height, width, and
number of components) succeeded, albeit with bogus data, since the fake
EOI marker was read into those fields. The duplicate SOF check in
get_sof() then failed, generating a fatal libjpeg error.
2. When using a custom source manager that correctly returns TRUE in
response to a prematurely-terminated JPEG data stream, the
aforementioned INPUT_2BYTES() and INPUT_BYTE() macros also succeeded
(albeit with bogus data read from the previous bytes of the data
stream), and the duplicate SOF check failed.
3. If the application did not prefetch input data, or if it stopped
invoking jpeg_consume_input() when the function returned
JPEG_REACHED_SOS, then the duplicate SOF segment was not read prior to
the first call to jpeg_read_scanlines(). Thus, the data precision check
in jpeg_read_scanlines() failed. If the application instead called
jpeg12_read_scanlines() (that is, if it properly supported multiple data
precisions), then the duplicate SOF segment was not read until the body
of jpeg_finish_decompress(). At that point, its only negative effect
was to cause jpeg_finish_decompress() to return FALSE before the
duplicate SOF check was reached.
In other words, this issue depended not only upon an incorrectly-written
source manager but also upon a very specific sequence of API calls. It
also depended upon the multi-precision feature introduced in
libjpeg-turbo 3.0.x. When using an 8-bit-per-sample build of
libjpeg-turbo 2.1.x, jpeg_read_header() failed with "Unsupported JPEG
data precision 12" after the 12-bit SOF segment was processed. When
using a 12-bit-per-sample build of libjpeg-turbo 2.1.x, the behavior
was the same as if the application called jpeg12_read_scanlines() in
Scenario 3 above.
This commit simply moves the duplicate SOF check to the top of
get_sof() so the check will fail before the marker processor attempts to
read the duplicate SOF. It should be noted that this issue isn't a
libjpeg-turbo bug per se, because it occurs only when the calling
application does something it shouldn't. It is, rather, an issue of API
hardening/caller-proofing.
jpeg_std_message_table[] was never a documented part of the libjpeg API,
nor was it exposed in jpegint.h or the Windows libjpeg API DLL. Thus,
it was never a good idea (nor was it even remotely necessary) for
applications to use it. However, referring to #763 and #767, one
application (RawTherapee) did use it.
34c055851e hid the symbol, which broke the
Un*x builds of RawTherapee. (RawTherapee already did the right thing on
Windows, because jpeg_std_message_table[] was not exposed in the
Windows libjpeg API DLL. Referring to
https://github.com/Beep6581/RawTherapee/issues/7074, RawTherapee now
does the right thing on Un*x platforms as well.)
The comment in jerror.c indicated that Tom Lane intended the symbol to
be external "just in case any applications want to refer to it
directly." However, with respect to libjpeg-turbo, the comment was
effectively already a lie on Windows. My choices at this point are:
1. Revert 34c055851e and start exposing
the symbol on Windows, thus making the comment true again.
2. Remove the comment.
Option 1 would have created whiplash, since 3.0.3 has already been
released with the symbol hidden, and that option would have further
encouraged ill-advised behavior on the part of applications. Since the
issue has already been fixed in RawTherapee, and since it is known to
affect only that application, I chose Option 2.
In my professional opinion, a code comment does not rise to the level
of a contract between a library and a calling application. In other
words, the comment did not make the symbol part of the API, even though
the symbol was exposed on some platforms. Applications that use
internal symbols (even the symbols defined in jpegint.h) do so at their
own risk. There is no guarantee that those symbols will remain
unchanged from release to release, as it is sometimes necessary to
modify the internal (opaque) structures and methods in order to
implement new features or even fix bugs. Some implementations of the
libjpeg API (such as the one in Jpegli) do not provide those internal
symbols at all. The best practice is for applications that
use the internal libjpeg-turbo symbols to provide their own copy of
libjpeg-turbo (for instance, via static linking), so they can manage any
unexpected changes in those symbols. (In fact, that is how libjpeg was
originally used. Application developers built it from source with
compile-time options to exclude unneeded functionality. Thus, no two
builds of libjpeg were guaranteed to be API/ABI-compatible. The idea of
a libjpeg shared library that exposes a pseudo-standard ABI came later,
and that has always been an awkward fit for an API that was intended to
be modified at compile time to suit specific application needs.
libjpeg-turbo's colorspace extensions are but one example, among many,
of our attempts to reconcile that awkwardness while preserving backward
compatibility with the aforementioned pseudo-standard ABI.)
libjpeg.txt documents the need to "include system headers that define at
least the typedefs FILE and size_t" before including jpeglib.h.
However, some software developers unfortunately assume that any
downstream build failure is due to an upstream oversight (as if such an
oversight would have gone unnoticed for 14 years in a library as
ubiquitous as libjpeg-turbo) and "come in hot" with a proposal and
arguments that have already been carefully considered and rejected
multiple times (as opposed to grepping the documentation or searching
existing issues and PRs to find out whether the upstream behavior is
intentional.)
Multiple issues and PRs (including #17, #18, #116, #737, and #766) have
been filed regarding this, so this commit adds a comment to jpeglib.h in
the hope of heading off future issues and PRs. I suspect that some
people will still choose to waste my time by arguing about it, even
though the decision was made by Tom Lane nearly 20 years before
libjpeg-turbo even existed, the decision was supportable based on the
needs of early 1990s computing systems, and reversing that decision
would break backward API compatibility (which is one of the reasons that
many downstream projects adopted libjpeg-turbo in the first place.) At
least now, if someone posts an issue or PR about this again, I can just
link to this commit and close the issue/PR without comment.
If the calling application invokes jpeg_save_markers() to save a
particular type of marker, then the save_marker() function will be
invoked for every marker of that type that is encountered. Previously,
only the head of the marker linked list was stored (in
jpeg_decompress_struct), so save_marker() had to traverse the entire
linked list before it could add a new marker to the tail of the list.
That caused CPU usage to grow quadratically with the number of markers.
Referring to #764, it is possible to create a JPEG image that contains
an excessive number of markers. The specific reproducer that uncovered
this issue is a specially-crafted 1-megabyte malformed JPEG image with
tens of thousands of APP1 markers, which required approximately 30
seconds of CPU time (on a modern Intel processor) to process. However,
it should also be possible to create a legitimate JPEG image that
reproduces the issue (such as an image with tens of thousands of
duplicate EXIF tags.)
This commit introduces a new pointer (in jpeg_decomp_master, in order to
preserve backward ABI compatibility) that is used to store the tail of
the marker linked list whenever a marker is added to it. Thus, it is no
longer necessary to traverse the list when adding a marker, and CPU
usage will grow linearly rather than quadratically with the number of
markers.
Fixes #764
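The tail-pointer technique described above can be sketched as follows.
This is a minimal illustration, not the libjpeg-turbo code: the
marker_node and marker_list types and the marker_list_append() and
marker_list_free() functions are hypothetical names invented for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the libjpeg-turbo
   structures. */
typedef struct marker_node {
  struct marker_node *next;
  int id;
} marker_node;

typedef struct {
  marker_node *head;  /* first saved marker */
  marker_node *tail;  /* last saved marker, cached so appends are O(1) */
} marker_list;

/* Append a marker without walking the list.  Returns 0 on success or -1
   if memory could not be allocated. */
int marker_list_append(marker_list *list, int id)
{
  marker_node *node;

  assert(list != NULL);
  node = (marker_node *)malloc(sizeof(marker_node));
  if (node == NULL) return -1;
  node->next = NULL;
  node->id = id;
  if (list->head == NULL)
    list->head = node;        /* first marker: it is also the head */
  else
    list->tail->next = node;  /* link after the cached tail */
  list->tail = node;
  return 0;
}

/* Free every node and reset the list to the empty state. */
void marker_list_free(marker_list *list)
{
  marker_node *node = list->head;

  while (node != NULL) {
    marker_node *next = node->next;
    free(node);
    node = next;
  }
  list->head = list->tail = NULL;
}
```

Because each append touches only the cached tail, adding n markers costs
O(n) in total, whereas walking from the head on every append costs
O(n^2).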
If an error manager instance is passed to jpeg_std_error(), then its
format_message() method will point to the format_message() function in
jerror.c. The format_message() function passes all eight values from
the jpeg_error_mgr::msg_parm.i[] array as arguments to
snprintf()/_snprintf_s(), even if the format string doesn't use all of
those values. Subsequently invoking one of the ERREXIT[1-6]() macros
will leave the unused values uninitialized, and if the
-fsanitize-memory-param-retval option (introduced in Clang 14) is
enabled (which it is by default in Clang 16 and later), then MSan will
complain when the format_message() function tries to pass the
uninitialized-but-unused values as function arguments.
This commit modifies jpeg_std_error() so that it zeroes out the error
manager instance passed to it, thus working around the warning as well
as simplifying the code.
Closes #761
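A minimal sketch of the zero-initialization approach, assuming a
simplified error manager. The demo_error_mgr type and the
demo_format_message() and demo_std_error() functions are hypothetical
and exist only for this example; they are not the libjpeg API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an error manager. */
typedef struct demo_error_mgr {
  void (*format_message) (struct demo_error_mgr *err, char *buffer,
                          size_t size);
  const char *format;   /* printf-style message text */
  int msg_parm_i[8];    /* integer message parameters */
} demo_error_mgr;

/* Pass all eight parameters to snprintf(), even if the format string
   uses fewer of them.  Unused-but-uninitialized parameters are what a
   memory sanitizer would flag here. */
void demo_format_message(demo_error_mgr *err, char *buffer, size_t size)
{
  snprintf(buffer, size, err->format,
           err->msg_parm_i[0], err->msg_parm_i[1], err->msg_parm_i[2],
           err->msg_parm_i[3], err->msg_parm_i[4], err->msg_parm_i[5],
           err->msg_parm_i[6], err->msg_parm_i[7]);
}

/* Zero the whole instance first, so every parameter has a defined
   value, and then fill in the non-zero defaults. */
demo_error_mgr *demo_std_error(demo_error_mgr *err)
{
  memset(err, 0, sizeof(demo_error_mgr));
  err->format_message = demo_format_message;
  err->format = "Unknown error";
  return err;
}
```

Zeroing the structure up front also removes the need to assign each
zero-valued field individually.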
(Regression introduced by 24e09baaf0)
The install() INCLUDES option is not an artifact option. It specifies
a list of directories that will be added to the
INTERFACE_INCLUDE_DIRECTORIES target property when the target is
exported using the install() EXPORT option, which occurs when CMake
package config files are generated. Specifying 'COMPONENT include' with
the install() INCLUDES option caused the INTERFACE_INCLUDE_DIRECTORIES
properties in our CMake package config files to contain
'${_IMPORT_PREFIX}/COMPONENT', which caused errors of the form 'Imported
target "libjpeg-turbo::XXXX" includes non-existent path' when downstream
build systems attempted to include libjpeg-turbo using find_package().
Fixes #759
Closes #760
This makes it possible for downstream packagers and other integrators of
libjpeg-turbo to include only specific directories from the
libjpeg-turbo installation (or to install specific directories under a
different prefix, etc.) The names of the components correspond to the
directories into which they will be installed.
Refer to libvips/libvips#3931, #265, #338
Closes #756
It is possible to craft a malformed JPEG image in which all of the
scans contain fewer components than the number of components specified
in the Start Of Frame (SOF) segment. Attempting to use such an image as
either an input image or a drop image with 'jpegtran -drop' caused a
NULL dereference and subsequent segfault in transupp.c:adjust_quant(),
so this commit adds appropriate checks to guard against that.
Since the issue involved an interface that is only exposed on the
jpegtran command line, it did not represent a security risk.
'jpegtran -drop' could not ever be used successfully with images such as
the ones described above. This commit simply makes jpegtran fail
gracefully rather than crash.
Fixes #758
Referring to actions/runner-images#9491, the sanitizers in LLVM 14 that
ships with Ubuntu 22.04 are incompatible with high-entropy address space
layout randomization (ASLR), which is enabled in the GitHub runners via
their use of a newer kernel than Ubuntu 22.04 uses.
Because of 1644bdb7d2, we are now
effectively using the NEW behavior for all CMake policies introduced in
all CMake versions up to and including CMake 3.28. The NEW behavior for
CMP0025, introduced in CMake 3.0, sets CMAKE_C_COMPILER_ID to
"AppleClang" instead of "Clang" when using Apple's variant of Clang (in
Xcode), so we need to match all values of CMAKE_C_COMPILER_ID that
contain "Clang".
This fixes three issues:
- -O2 was not replaced with -O3 in CMAKE_C_FLAGS_RELWITHDEBINFO. This
was a minor issue, since -O3 is now the default in
CMAKE_C_FLAGS_RELEASE, and we use CMAKE_BUILD_TYPE=Release in our
official builds.
- The build system erroneously set the default value of FLOATTEST8 and
FLOATTEST12 to no-fp-contract when compiling for PowerPC or Arm using
Apple Clang 14+ (effectively reverting
5b2beb4bc4f41dd9dd2a905cb931b8d5054d909b.) Because Clang 14+ now
enables -ffp-contract=on by default, this issue caused floating point
test failures unless FLOATTEST8 and FLOATTEST12 were overridden.
- The build system set MD5_PPM_3x2_FLOAT_FP_CONTRACT as appropriate for
GCC, not as appropriate for Clang (effectively reverting
47656a082091f9c9efda054674522513f4768c6c.) This also caused floating
point test failures.
Fixes #753
Closes #755
Several TurboJPEG functions store their return value in an unsigned
long long intermediate and compare it against the maximum value of
unsigned long or size_t in order to avoid integer overflow. However,
such comparisons are tautological (always true, i.e. redundant) unless
the size of unsigned long or size_t is less than the size of unsigned
long long. Explicitly guarding the comparisons with #if avoids compiler
warnings with -Wtautological-constant-in-range-compare in Clang and also
makes it clear to the reader that the comparisons are only intended for
32-bit code.
Refer to #752
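The guarded comparison might look like the following sketch. The
demo_pixel_count() function and its convention of returning 0 on error
are assumptions made for this example; only the general pattern
(compute in unsigned long long, then range-check under #if) comes from
the description above. The sketch also assumes that int is no wider
than 32 bits, so the product of two int values fits in unsigned long
long.

```c
#include <limits.h>

/* Hypothetical helper: return width * height as an unsigned long, or 0
   if either dimension is invalid or the product does not fit. */
unsigned long demo_pixel_count(int width, int height)
{
  unsigned long long count;

  if (width < 1 || height < 1) return 0;
  /* Assumes int is at most 32 bits wide, so this cannot overflow. */
  count = (unsigned long long)width * (unsigned long long)height;
#if ULLONG_MAX > ULONG_MAX
  /* Only meaningful when unsigned long is narrower than unsigned long
     long (e.g. 32-bit code).  Elsewhere the comparison would be
     tautological, so it is compiled out. */
  if (count > (unsigned long long)ULONG_MAX) return 0;
#endif
  return (unsigned long)count;
}
```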
(regression introduced by e8b40f3c2b)
The documented behavior of the libjpeg API is to compute optimal Huffman
tables when generating 12-bit lossy Huffman-coded JPEG images, unless
the calling application supplies its own Huffman tables. However,
e8b40f3c2b and
96bc40c1b3 modified
jinit_c_master_control() so that it always set cinfo->optimize_coding to
TRUE when generating 12-bit lossy Huffman-coded JPEG images, which
prevented calling applications from supplying custom Huffman tables for
such images.
This commit modifies jinit_c_master_control() so that it only overrides
cinfo->optimize_coding when generating 12-bit lossy Huffman-coded JPEG
images if all Huffman table slots are empty or all slots contain default
Huffman tables. Determining whether the latter is true requires using
memcmp() to compare the allocated Huffman tables with the default
Huffman tables, because:
- The documented behavior of jpeg_set_defaults() is to initialize any
empty Huffman table slot with the default Huffman table corresponding
to that slot, regardless of the data precision. There is also no
requirement that the data precision be specified prior to calling
jpeg_set_defaults(). Thus, there is no reliable way to prevent
jpeg_set_defaults() from initializing empty Huffman table slots with
default Huffman tables, which are useless for 12-bit data precision.
- There is no requirement that custom Huffman tables be defined prior to
calling jpeg_set_defaults(). A calling application could call
jpeg_set_defaults() and modify the values in the default Huffman
tables rather than allocating new tables. Thus, there is no reliable
way to detect whether the allocated Huffman tables contain default
values without comparing the tables with the default Huffman tables.
Fortunately, comparing the allocated Huffman tables with the default
Huffman tables is the last stop on the logic train, so it won't happen
unless cinfo->data_precision == 12, cinfo->arith_code == FALSE,
cinfo->optimize_coding == FALSE, and one or more Huffman tables are
allocated. (If the compressor object is reused, this ensures that the
full comparison will be performed at most once.) Custom Huffman tables
will be flagged as non-default when the first non-default value is
encountered, and the worst case (comparing 400 bytes) is very fast on
modern CPUs anyhow.
Fixes #751
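A minimal sketch of the memcmp()-based detection, limited to a single
table for brevity. The demo_huff_tbl type and demo_is_default_dc_luma()
function are hypothetical names for this example. The reference values
are the DC luminance table from Annex K.3 of the JPEG specification; a
complete check would compare every allocated slot against its
corresponding default table.

```c
#include <string.h>

/* Hypothetical, simplified Huffman table: bits[k] is the number of
   codes of length k (bits[0] is unused), and huffval[] lists the
   symbols in order of increasing code length. */
typedef struct {
  unsigned char bits[17];
  unsigned char huffval[256];
} demo_huff_tbl;

/* Default DC luminance table (JPEG specification, Annex K.3) */
const unsigned char demo_std_dc_luma_bits[17] =
  { 0, 0, 1, 5, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0 };
const unsigned char demo_std_dc_luma_val[12] =
  { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };

/* Return 1 if the table matches the default DC luminance table or 0 if
   any code length count or symbol value differs. */
int demo_is_default_dc_luma(const demo_huff_tbl *tbl)
{
  if (memcmp(tbl->bits, demo_std_dc_luma_bits,
             sizeof(demo_std_dc_luma_bits)) != 0)
    return 0;
  if (memcmp(tbl->huffval, demo_std_dc_luma_val,
             sizeof(demo_std_dc_luma_val)) != 0)
    return 0;
  return 1;
}
```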
- Detect at configure time, via the __CET__ C preprocessor macro,
whether the C compiler will include either indirect branch tracking
(IBT) or shadow stack support, and define a NASM macro (__CET__) if
so.
- Modify the x86-64 SIMD code so that it includes appropriate endbr64
instructions (to support IBT) and an appropriate .note.gnu.property
section (to support both IBT and shadow stack) when __CET__ is
defined.
Closes #350
While QEMU will run executables from the current working directory,
other emulators may not. It is more reliable to pass the full
executable path to the emulator. The add_test(NAME ... COMMAND ...)
syntax automatically invokes the emulator (e.g. the command specified
in CMAKE_CROSSCOMPILING_EMULATOR) and passes the full executable path to
it, as long as the first COMMAND argument is the name of a target. This
cleans up the CMake code somewhat as well, since it is no longer
necessary to manually invoke CMAKE_CROSSCOMPILING_EMULATOR.
Closes #747
Disclaimer: I am not a lawyer, nor do I play one on TV.
Referring to #744, mentioning the zlib License as a license that applies
to libjpeg-turbo is confusing, and it isn't actually necessary, since
the IJG License subsumes the terms of the zlib License in the context of
the libjpeg API library and associated programs. This was presumably
understood to be the case by Miyasaka-san when he chose the zlib License
for the first libjpeg SIMD extensions. The libjpeg/SIMD web site
(https://cetus.sakura.ne.jp/softlab/jpeg-x86simd/jpegsimd.html) states
(translated from Japanese): "The terms of use of this SIMD enhanced
version of IJG JPEG software are subject to the terms of use of the
original version of IJG JPEG software."
Detailed analysis of the zlib License terms:
This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any
damages arising from the use of this software.
This text is an almost literal subset of the warranty disclaimer text in
the IJG License. The IJG License states everything above with only
slight differences in wording, and it further clarifies that the user
assumes all risk as to the software's quality and accuracy and that
vendors of commercial products based on the software must assume all
warranty and liability claims.
Permission is granted to anyone to use this software for any
purpose, including commercial applications, and to alter it and
redistribute it freely, subject to the following restrictions:
This is semantically the same as the permission text in the IJG License,
since "use, copy, modify, and distribute this software (or portions
thereof) for any purpose, without fee" covers "use" for "any purpose,
including commercial applications" as well as alteration and
redistribution.
1. The origin of this software must not be misrepresented; you must
not claim that you wrote the original software. If you use this
software in a product, an acknowledgment in the product
documentation would be appreciated but is not required.
The IJG License requirement that "If any part of the source code for
this software is distributed, then this README file must be included,
with this copyright and no-warranty notice unaltered; and any additions,
deletions, or changes to the original files must be clearly indicated in
accompanying documentation" (Clause 1), as well as the requirement that
"If only executable code is distributed, then the accompanying
documentation must state that 'this software is based in part on the
work of the Independent JPEG Group'" (Clause 2), satisfies the
requirement of Clause 1 of the zlib License.
2. Altered source versions must be plainly marked as such, and must
not be misrepresented as being the original software.
Since Clause 1 of the IJG License applies only to the distribution of
source code, the copyright headers in the source code are effectively
"accompanying documentation" in that case. This is why we ensure that
the copyright headers of individual source files indicate the year(s) in
which modifications were made by each contributor. Doing so satisfies
the requirements of both Clause 2 of the zlib License and Clause 1 of
the IJG License.
3. This notice may not be removed or altered from any source
distribution.
Clauses 2 and 3 of the zlib License apply only to the source code that
bears that license. Thus, as applied to the software as a whole, those
requirements of the inbound zlib License are compatible with the
outbound IJG License as long as the IJG License does not contradict
them (which it doesn't.)
NOTE: To be clear, existing source code that bears the zlib License
cannot literally be re-licensed under the IJG License, since that would
violate Clause 3 of the zlib License. However, when considering the
terms under which the overall library is made available, the IJG License
effectively subsumes the terms of the zlib License.
https://www.gnu.org/licenses/license-compatibility.en.html is a
thorough, albeit somewhat GPL-biased, discussion of license
compatibility.
(regression introduced by 1644bdb7d2)
Setting a maximum version in cmake_minimum_required() effectively sets
the behavior to NEW for all policies introduced in all CMake versions up
to and including that maximum version. The NEW behavior for CMP0091,
introduced in CMake 3.15, uses CMake variables to specify the MSVC
runtime library against which to link, rather than placing the relevant
flags in CMAKE_C_FLAGS*. Thus, replacing /MD with /MT in CMAKE_C_FLAGS*
no longer has any effect when using CMake 3.15+.
- Integrate
c07bba2730 (diff-1e2deb5301e9481203102fcddd1b2d0d2bf0ddc1cbb445c7f4b6414a3b869ce8)
so that the default man directory is <CMAKE_INSTALL_PREFIX>/share/man
on FreeBSD systems.
- For good measure, integrate
f835f189ae
so that the default info directory is
<CMAKE_INSTALL_PREFIX>/share/info on FreeBSD systems, even though we
don't use that directory.
- Automatically set the CMake variable type to PATH for any
user-specified CMAKE_INSTALL_*DIR variables.
Addresses concerns raised in #326, #346, #648
Closes #648
libjpeg-turbo always includes Adobe APP14 markers in the lossless JPEG
images that it generates, but some compressors (e.g. Accusoft PICTools
Medical) do not.
Fixes #743
ef9a4e05ba (libjpeg-turbo 1.4.x), which
was based on
https://bug815473.bmoattachments.org/attachment.cgi?id=692126
(https://bugzilla.mozilla.org/show_bug.cgi?id=815473), modified the C
baseline Huffman encoder so that it precomputes jpeg_nbits_table, in
order to facilitate sharing the table among multiple processes.
However, libjpeg-turbo never shared the table, and because the table was
implemented as a static array, f3a8684cd1
(libjpeg-turbo 1.5.x) and 37bae1a0e9
(libjpeg-turbo 2.0.x) each introduced a duplicate copy of the table for
(respectively) the SSE2 baseline Huffman encoder and the C progressive
Huffman encoder.
This commit does the following:
- Move the duplicated code in jchuff.c and jcphuff.c, originally
introduced in 0cfc4c17b7 and
37bae1a0e9, into a header
(jpeg_nbits.h).
- Credit the co-author of 0cfc4c17b7.
(Refer to https://sourceforge.net/p/libjpeg-turbo/patches/57).
- Modify the SSE2 baseline Huffman encoder so that the C Huffman
encoders can share its definition of jpeg_nbits_table.
- Move the definition of jpeg_nbits_table into a C source file
(jpeg_nbits.c) rather than a header, and define the table only if
USE_CLZ_INTRINSIC is undefined and the SSE2 baseline Huffman encoder
will not be built.
- Apply hidden symbol visibility to the shared definition of
jpeg_nbits_table, if the compiler supports the necessary attribute.
(In practice, only Visual C++ doesn't.)
Closes #114
See also:
https://bugzilla.mozilla.org/show_bug.cgi?id=1501523
- Clarify that lossless JPEG is slower than and doesn't compress as well
as lossy JPEG. (That should be obvious, because "lossy" literally
means that data is thrown away.)
- Re-generate TurboJPEG C API documentation using Doxygen 1.9.8.
- Clarify that setting the data_precision field in jpeg_compress_struct
to 16 requires lossless mode.
- Explain what the predictor selection value actually does. (Refer to
Recommendation ITU-T T.81 (1992) | ISO/IEC 10918-1:1994, Section
H.1.2.1.)
TJPARAM_MAXPIXELS was previously hidden and used only for fuzz testing,
but it is potentially useful for calling applications as well,
particularly if they want to guard against excessive memory consumption
by the tj3LoadImage*() functions. The parameter has also been extended
to decompression and lossless transformation functions/methods, mainly
as a convenience. (It was already possible for calling applications to
impose their own JPEG image size limits by reading the JPEG header prior
to decompressing or transforming the image.)
Unfortunately, most of the GitHub security advisories filed against
libjpeg-turbo thus far have been the result of non-exploitable API
abuses triggered by randomly-generated test programs and accompanied by
wild claims of denials of service with no demonstrable or even probable
exploit that might cause such a DoS (assuming a service even existed
that used the API in question.) Security advisories remain private
unless accepted, and I cannot accept them if they do not describe an
actual security issue. Thus, it's best to steer most users toward
regular bug reports.
Because of the TurboJPEG 3 API overhaul, the legacy decompression and
lossless transformation functions now wrap the new TurboJPEG 3
functions. For performance reasons, we don't want to read the JPEG
header more than once during the same operation, so the wrapped
functions do not read the header if it has already been read by a
wrapper function. Initially the TurboJPEG 3 functions used a state
variable to track whether the header had already been read, but
b94041390c made this more robust by using
the libjpeg global decompression state instead. If a wrapper function
has already read the JPEG header successfully, then the global
decompression state will be DSTATE_READY, and the logic introduced in
b94041390c will prevent the header from
being read again.
A subtle issue arises because tj3DecompressHeader() does not call
jpeg_abort_decompress() if jpeg_read_header() fails. (That is arguably
a bug, but it has existed since the very first implementation of the
function.) Depending on the nature of the failure, this can cause
tj3DecompressHeader() to return an error code and leave the libjpeg
global decompression state set to DSTATE_INHEADER. If a misbehaved
application ignored the error and subsequently called a TurboJPEG
decompression or lossless transformation function, then the function
would fail to read the JPEG header because the global decompression
state was greater than DSTATE_START. In the case of the decompression
functions, this was innocuous, because jpeg_calc_output_dimensions()
and jpeg_start_decompress() both sanity check the global decompression
state. However, it was possible for a misbehaved application to call
tj3DecompressHeader() with junk data, ignore the return value, and pass
the same junk data into tj3Transform(). Because tj3DecompressHeader()
left the global decompression state set to DSTATE_INHEADER,
tj3Transform() failed to detect the junk data (because it didn't try to
read the JPEG header), and it called jtransform_request_workspace() with
dinfo->image_width and dinfo->image_height still initialized to 0.
Because jtransform_request_workspace() does not sanity check the
decompression state, a division-by-zero error occurred with certain
combinations of transform options in which TJXOPT_TRIM or TJXOPT_CROP
was specified. However, it should be noted that TJXOPT_TRIM and
TJXOPT_CROP cannot be expected to work properly without foreknowledge of
the JPEG source image dimensions, which cannot be gained except by
calling tj3DecompressHeader() successfully. Thus, a calling application
is inviting trouble if it does not check the return value of
tj3DecompressHeader() and sanity check the JPEG source image dimensions
before calling tj3Transform(). This commit softens the failure, but the
failure is still due to improper API usage.
(regression introduced by 300a344d65)
300a344d65 fixed the recompression code
path but also broke the pure decompression code path, because the fix
caused the TurboJPEG decompression instance to be destroyed before
tj3SaveImage() could use it. Furthermore, the fix in
300a344d65 prevented pixel density
information from being transferred from the input image to the output
image when recompressing. This commit does the following:
- Modify tjexample.c so that a single TurboJPEG instance is initialized
for lossless transformation and shared by all code paths. In addition
to fixing both of the aforementioned issues, this makes the code more
readable.
- Extend tjexampletest to test the recompression code path, thus
ensuring that the issues fixed by this commit and
300a344d65 are not reintroduced.
- Modify tjexample.c to remove redundant fclose(), tj3Destroy(), and
tj3Free() calls.
This corresponds to max_memory_to_use in the jpeg_memory_mgr struct in
the libjpeg API, except that the TurboJPEG parameter is specified in
megabytes. Because this is 2023 and computers with less than 1 MB of
memory are not a thing (at least not within the scope of libjpeg-turbo
support), it isn't useful to allow a limit less than 1 MB to be
specified. Furthermore, because TurboJPEG parameters are signed
integers, if we allowed the memory limit to be specified in bytes, then
it would be impossible to specify a limit larger than 2 GB on 64-bit
machines. Because max_memory_to_use is a long signed integer,
effectively we can specify a limit of up to 2 petabytes on 64-bit
machines if the TurboJPEG parameter is specified in megabytes. (2 PB
should be enough for anybody, right?)
This commit also bumps the TurboJPEG API version to 3.0.1. Since the
TurboJPEG API version no longer tracks the libjpeg-turbo version, it
makes sense to increment the API revision number when adding constants,
to increment the minor version number when adding functions, and to
increment the major version number for a complete overhaul.
This commit also removes the vestigial TJ_NUMPARAM macro, which was
never defined because it proved unnecessary.
Partially implements #735
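The unit arithmetic above can be illustrated with a short sketch. The
demo_megabytes_to_bytes() function is hypothetical, and its handling of
negative and out-of-range values (returning -1 and clamping to
LONG_MAX, respectively) is an assumption made for this example rather
than the behavior of the TurboJPEG API.

```c
#include <limits.h>

/* Hypothetical helper: convert a limit in megabytes (a signed int, as
   TurboJPEG parameters are) to a limit in bytes (a long, as
   max_memory_to_use is).  With a 64-bit long, the largest result is
   INT_MAX * 2^20 bytes, i.e. just under 2^51 bytes (2 petabytes). */
long demo_megabytes_to_bytes(int megabytes)
{
  if (megabytes < 0) return -1;
  /* Clamp if the byte count would not fit in a long (which can happen
     only when long is 32 bits wide.) */
  if ((unsigned long long)megabytes >
      (unsigned long long)LONG_MAX / 1048576ULL)
    return LONG_MAX;
  return (long)megabytes * 1048576L;
}
```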
This is very subtle, but if a user specifies a libjpeg virtual array
memory limit via the JPEGMEM environment variable and one of the
tj3Compress*() functions hits that limit, the libjpeg error handler
will be invoked in jpeg_start_compress() (more specifically in
realize_virt_arrays() in jinit_compress_master()) before the libjpeg
global compression state can be incremented. Thus,
jpeg_abort_compress() will not be called before the tj3Compress*()
function exits, the unrealized virtual arrays will not be freed, and if
the TurboJPEG compression instance is reused, those unrealized virtual
arrays will count against the specified memory limit. This could cause
subsequent compression operations that require smaller virtual arrays
(or even no virtual arrays at all) to fail when they would otherwise
succeed. In reality, the vast majority of calling programs would abort
and free the TurboJPEG compression instance if one of the tj3Compress*()
functions failed, but TJBench is a rare exception. This issue does not
bear documenting because of its subtlety and rarity and because JPEGMEM
is not a documented feature of the TurboJPEG API.
Note that the issue does not exist in the tj3Encode*() and tj3Decode*()
functions, because realize_virt_arrays() is never called in the body of
those functions. The issue also does not exist in the tj3Decompress*()
and tj3Transform() functions, because those functions ensure that the
JPEG header is read (and thus the libjpeg global decompression state is
incremented) prior to calling a function that calls
realize_virt_arrays() (i.e. jpeg_start_decompress() or
jpeg_read_coefficients().) If realize_virt_arrays() failed in the body
of jpeg_write_coefficients(), then tj3Transform() would abort without
calling jpeg_abort_compress(). However, since jpeg_start_compress() is
never called in the body of tj3Transform(), no virtual arrays are ever
requested from the compression object, so failing to call
jpeg_abort_compress() would be innocuous.
Those obsolete memory managers have never been included in
libjpeg-turbo, nor has libjpeg-turbo ever claimed to support MS-DOS
or Mac operating systems prior to OS X 10.4 "Tiger." Note that we
retain the ability to use temp files, even though libjpeg-turbo does
not use them. This allows downstream implementations to write their own
memory managers that use temp files, if necessary.
If the align parameter was set to an unreasonably large value, such as
0x2000000, strides[0] * ph0 and strides[1] * ph1 could have overflowed
the int datatype and wrapped around when computing (src|dst)Planes[1]
and (src|dst)Planes[2] (respectively.) This would have caused
(src|dst)Planes[1] and (src|dst)Planes[2] to point to lower addresses in
the YUV buffer than expected, so the worst case would have been a
visually incorrect output image, not a buffer overrun or other
exploitable issue.
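One way to guard against this kind of wraparound is to perform the
multiplication in a wider type and range-check the result before
narrowing it. The demo_checked_mul() function below is a hypothetical
helper written for this example, not the fix that was actually applied,
and it assumes that int is no wider than 32 bits.

```c
#include <limits.h>

/* Hypothetical helper: multiply two non-negative int values.  Returns 1
   and stores the product in *result if it fits in an int, or returns 0
   (leaving *result untouched) otherwise. */
int demo_checked_mul(int a, int b, int *result)
{
  long long product;

  if (a < 0 || b < 0) return 0;
  /* long long is at least 64 bits wide, so the product of two values
     that each fit in 32 bits cannot overflow it. */
  product = (long long)a * (long long)b;
  if (product > INT_MAX) return 0;
  *result = (int)product;
  return 1;
}
```

With a stride of 0x2000000 (2^25) and a plane height of 64, for
instance, the product is 2^31, which exceeds INT_MAX when int is 32 bits
wide, so the helper reports failure instead of wrapping around.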
Because of 47656a0820, we can now
reliably determine the correct default values for FLOATTEST8 and
FLOATTEST12 when using Clang or GCC to build for AArch64 or PowerPC
platforms. (Testing confirms that this is the case with GCC 5-13 and
Clang 5-14 on Ubuntu/AArch64, GCC 4 on CentOS 7/PPC, and GCC 8-10 and
Clang 6-12 on Ubuntu/PPCLE.) Other CPU architectures and compilers can
be added on a case-by-case basis as they are tested.
Because of bf01ed2fbc, the simd field in
huff_entropy_encoder (and, by extension, the simd field in
savable_state) is only initialized if WITH_SIMD is defined. Due to an
oversight, the simd field in savable_state was queried in flush_bits()
regardless of whether WITH_SIMD was defined. In most cases, both
branches of the query have identical code, and the optimizer removes the
branch. However, because the legacy Neon GAS Huffman encoder uses the
older bit buffer logic from libjpeg-turbo 2.0.x and prior (refer to
087c29e07f), the branches do not have
identical code when building for AArch64 with NEON_INTRINSICS undefined
(which will be the case if WITH_SIMD is undefined.) Thus, if
libjpeg-turbo was built for AArch64 with the SIMD extensions disabled
at build time, it was possible for the Neon GAS branch in flush_bits()
to be taken, which would have set put_bits to a value that is incorrect
for the C Huffman encoder. Referring to #728, a user reported that this
issue sometimes caused libjpeg-turbo to generate bogus JPEG images if it
was built for AArch64 without SIMD extensions and subsequently used
through the Qt framework. (It should be noted, however, that disabling
the SIMD extensions in AArch64 builds of libjpeg-turbo is inadvisable
for performance reasons.)
I was unable to reproduce the issue on Linux/AArch64 using libjpeg-turbo
alone, despite testing various versions of GCC and Clang and various
optimization levels. However, the issue is reproducible using MSan with
-O0, so this commit also modifies the GitHub Actions workflow so that
compiler optimization is disabled in the linux-msan job. That should
prevent the issue or similar issues from re-emerging.
Fixes #728
libjpeg-turbo implements 4:4:0 "fancy" (smooth) upsampling, which is
enabled by default when decompressing JPEG images that use 4:4:0
chrominance subsampling. libjpeg did not and does not implement fancy
4:4:0 upsampling.
The MD5 sums associated with FLOATTEST8=fp-contract and
FLOATTEST12=fp-contract are appropriate for GCC (tested v5 through v13)
with -ffp-contract=fast, which is the default when compiling for an
architecture that has fused multiply-add (FMA) instructions. However,
different MD5 sums are needed for Clang (tested v5 through v14) with
-ffp-contract=on, which is now the default in Clang 14 when compiling
for an architecture that has FMA instructions.
Refer to #705, #709, #710
* libjpeg-turbo/2.1.x:
ChangeLog.md: List CVE ID fixed by ccaba5d7
jpeglib.h: Document that JCS_RGB565 is decomp-only
Fix block smoothing w/vert.-subsampled prog. JPEGs
# By DRC
# Via DRC
* commit 'eadd243':
Fix interblock smoothing with narrow prog. JPEGs
jchuff.c/flush_bits(): Guard against free_bits < 0
jchuff.c/flush_bits(): Guard against put_bits < 0
Restore xform fuzzer behavior from before 19f9d8f0
xform fuzz: Use src subsamp to calc dst buf size
Doc: Mention that we are a JPEG ref implementation
jchuff.c: Test for out-of-range coefficients
turbojpeg.h: Make customFilter() proto match doc
ChangeLog.md: Fix typo
tjTransform(): Calc dst buf size from xformed dims
Fix build warnings/errs w/ -DNO_GETENV/-DNO_PUTENV
GitHub: Fix x32 build
tjexample.c: Prevent integer overflow
jpeg_crop_scanline: Fix calc w/sclg + 2x4,4x2 samp
Decomp: Don't enable 2-pass color quant w/ RGB565
TJBench: w/JPEG input imgs, set min tile= MCU size
Bump version to 2.1.6 to prepare for new commits
GitHub: Add pull request template
Build: Clarify CMAKE_OSX_ARCHITECTURES error
Build: Fail if included with add_subdirectory()
# Conflicts:
# .github/workflows/build.yml
# CMakeLists.txt
# README.md
# release/deb-control.in
# By DRC
# Via DRC
* tag '2.1.5.1':
ChangeLog.md: Document d743a2c1
OSS-Fuzz: Bail out immediately on decomp failure
SIMD/x86: Initialize simd_support before every use
Build: Define THREAD_LOCAL even if !WITH_TURBOJPEG
# Conflicts:
# CMakeLists.txt
The 5x5 interblock smoothing implementation, introduced in libjpeg-turbo
2.1, improperly extended the logic from the traditional 3x3 smoothing
implementation. Both implementations point prev_block_row and
next_block_row to the current block row when processing, respectively,
the first and the last block row in the image:
    if (block_row > 0 || cinfo->output_iMCU_row > 0)
      prev_block_row =
        buffer[block_row - 1] + cinfo->master->first_MCU_col[ci];
    else
      prev_block_row = buffer_ptr;
    if (block_row < block_rows - 1 ||
        cinfo->output_iMCU_row < last_iMCU_row)
      next_block_row =
        buffer[block_row + 1] + cinfo->master->first_MCU_col[ci];
    else
      next_block_row = buffer_ptr;
6d91e950c8 naively extended that logic to
accommodate a 5x5 smoothing window:
    if (block_row > 1 || cinfo->output_iMCU_row > 1)
      prev_prev_block_row =
        buffer[block_row - 2] + cinfo->master->first_MCU_col[ci];
    else
      prev_prev_block_row = prev_block_row;
    if (block_row < block_rows - 2 ||
        cinfo->output_iMCU_row + 1 < last_iMCU_row)
      next_next_block_row =
        buffer[block_row + 2] + cinfo->master->first_MCU_col[ci];
    else
      next_next_block_row = next_block_row;
However, this new logic was only correct if block_rows == 1, so the
values of prev_prev_block_row and next_next_block_row were incorrect
when processing, respectively, the second and second to last iMCU rows
in a vertically-subsampled progressive JPEG image.
The intent was to:
- point prev_block_row to the current block row when processing the
  first block row in the image,
- point prev_prev_block_row to prev_block_row when processing the first
  two block rows in the image,
- point next_block_row to the current block row when processing the
  last block row in the image, and
- point next_next_block_row to next_block_row when processing the last
  two block rows in the image.
This commit modifies decompress_smooth_data() so that it computes the
current block row's position relative to the whole image and sets
the block row pointers based on that value.
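The intent listed above can be sketched in C as follows. This is not the code from decompress_smooth_data(); the names RowWindow, abs_block_row(), and smoothing_rows() are hypothetical, and the sketch assumes that every iMCU row preceding the current one contains exactly v_samp_factor block rows. It works with row indices relative to the whole image rather than with pointers.

```c
#include <assert.h>

/* Hypothetical container for the four neighboring row indices used by a
   5x5 smoothing window (the current row is the implicit fifth). */
typedef struct {
  int prev_prev, prev, next, next_next;
} RowWindow;

/* Position of a block row relative to the whole image, assuming each
   earlier iMCU row holds v_samp_factor block rows. */
static int abs_block_row(int iMCU_row, int v_samp_factor, int block_row)
{
  return iMCU_row * v_samp_factor + block_row;
}

/* Clamp the neighboring rows at the top and bottom image edges. */
static RowWindow smoothing_rows(int abs_row, int total_rows)
{
  RowWindow w;

  assert(total_rows > 0 && abs_row >= 0 && abs_row < total_rows);
  w.prev = (abs_row > 0) ? abs_row - 1 : abs_row;
  w.prev_prev = (abs_row > 1) ? abs_row - 2 : w.prev;
  w.next = (abs_row < total_rows - 1) ? abs_row + 1 : abs_row;
  w.next_next = (abs_row < total_rows - 2) ? abs_row + 2 : w.next;
  return w;
}
```

For example, with two block rows per iMCU row, the first block row of the second iMCU row has an absolute position of 2, so prev_prev refers to row 0. The earlier test (block_row > 1 || cinfo->output_iMCU_row > 1) is false in that situation and would have fallen back to prev_block_row instead.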
This commit also restores a line of code that was accidentally deleted
by 6d91e950c8:
    access_rows += compptr->v_samp_factor; /* prior iMCU row too */
access_rows is merely a sanity check that tells the access_virt_barray()
method to generate an error if accessing the specified number of rows
would cause a buffer overrun. Essentially, it is a belt-and-suspenders
measure to ensure that j*init_d_coef_controller() allocated enough rows
for the full-image virtual array. Thus, excluding that line of code did
not cause an observable issue.
This commit also documents eadd2436ee in
the change log.
Fixes #721
The 5x5 interblock smoothing implementation, introduced in libjpeg-turbo
2.1, improperly extended the logic from the traditional 3x3 smoothing
implementation. Both implementations point prev_block_row and
next_block_row to the current block row when processing, respectively,
the first and the last block row in the image:
    if (block_row > 0 || cinfo->output_iMCU_row > 0)
      prev_block_row =
        buffer[block_row - 1] + cinfo->master->first_MCU_col[ci];
    else
      prev_block_row = buffer_ptr;
    if (block_row < block_rows - 1 ||
        cinfo->output_iMCU_row < last_iMCU_row)
      next_block_row =
        buffer[block_row + 1] + cinfo->master->first_MCU_col[ci];
    else
      next_block_row = buffer_ptr;
6d91e950c8 naively extended that logic to
accommodate a 5x5 smoothing window:
    if (block_row > 1 || cinfo->output_iMCU_row > 1)
      prev_prev_block_row =
        buffer[block_row - 2] + cinfo->master->first_MCU_col[ci];
    else
      prev_prev_block_row = prev_block_row;
    if (block_row < block_rows - 2 ||
        cinfo->output_iMCU_row + 1 < last_iMCU_row)
      next_next_block_row =
        buffer[block_row + 2] + cinfo->master->first_MCU_col[ci];
    else
      next_next_block_row = next_block_row;
However, this new logic was only correct if block_rows == 1, so the
values of prev_prev_block_row and next_next_block_row were incorrect
when processing, respectively, the second and second to last iMCU rows
in a vertically-subsampled progressive JPEG image.
The intent was to:
- point prev_block_row to the current block row when processing the
  first block row in the image,
- point prev_prev_block_row to prev_block_row when processing the first
  two block rows in the image,
- point next_block_row to the current block row when processing the
  last block row in the image, and
- point next_next_block_row to next_block_row when processing the last
  two block rows in the image.
This commit modifies decompress_smooth_data() so that it computes the
current block row's position relative to the whole image and sets
the block row pointers based on that value.
This commit also restores a line of code that was accidentally deleted
by 6d91e950c8:
    access_rows += compptr->v_samp_factor; /* prior iMCU row too */
access_rows is merely a sanity check that tells the access_virt_barray()
method to generate an error if accessing the specified number of rows
would cause a buffer overrun. Essentially, it is a belt-and-suspenders
measure to ensure that j*init_d_coef_controller() allocated enough rows
for the full-image virtual array. Thus, excluding that line of code did
not cause an observable issue.
This commit also documents dbae59281f in
the change log.
Fixes #721
This allows debuggers and profilers to reliably capture backtraces from
within the x86-64 SIMD functions.
In places where rbp was previously used to access temporary variables
(after stack alignment), we now use r15 and save/restore it accordingly.
The total amount of work is approximately the same, because the previous
code pushed the pre-alignment stack pointer to the aligned stack. The
new prologue and epilogue actually have fewer instructions.
Also note that the {un}collect_args macros now use rbp instead of rax to
access arguments passed on the stack, so we save a few instructions
there as well.
Based on:
debcc7c3b4
Closes #707
Closes #708
Due to an oversight, the assignment of DC05, DC10, DC15, DC20, and DC25
(the right edge coefficients in the 5x5 interblock smoothing window) in
decompress_smooth_data() was incorrect for images exactly two MCU blocks
wide. For such images, DC04, DC09, DC14, DC19, and DC24 were assigned
values based on the last MCU column, but DC05, DC10, DC15, DC20, and
DC25 were assigned values based on the first MCU column (because
block_num + 1 was never less than last_block_column.) This commit
modifies jdcoefct.c so that, for images at least two MCU blocks wide,
DC05, DC10, DC15, DC20, and DC25 are assigned the same values as DC04,
DC09, DC14, DC19, and DC24 (respectively.) DC05, DC10, DC15, DC20, and
DC25 are then immediately overwritten for images more than two MCU
blocks wide.
Since this issue was minor and not likely obvious to an end user, the
fix is undocumented.
Fixes #700
(regression introduced by fc01f4673b)
Because of the TurboJPEG 3 API overhaul, the pure compression code path
in tjexample.c now creates a TurboJPEG compression instance prior to
calling tj3LoadImage(). However, due to an oversight, a compression
instance was no longer created in the recompression code path. Thus,
that code path attempted to reuse the TurboJPEG decompression instance,
which caused an error ("tj3Set(): TJPARAM_QUALITY is not applicable to
decompression instances.") This commit modifies tjexample.c so that the
recompression code path creates a new TurboJPEG compression instance if
one has not already been created.
Fixes #716
This fixes a buffer overrun, reported by OSS-Fuzz, that occurred when
attempting to transform a specially-crafted malformed arithmetic-coded
JPEG image into a baseline Huffman-coded JPEG destination image with
default Huffman tables. This issue probably had a similar root cause to
the issue fixed in 31a301389b, but in this
case, the issue only occurred with the SIMD baseline Huffman encoder in
libjpeg-turbo 2.1.x and 2.0.x. It was not reproducible in 3.0.x or when
using the C baseline Huffman encoder. (NOTE: In order to reproduce the
issue with 2.1.x, it was necessary to revert
58cee6d90cd616c86fae142891b987c289ea055a.)
Because libjpeg-turbo 3.0.x now supports multiple data precisions in the
same build, the regression test system can test the 8-bit and 12-bit
floating point DCT/IDCT algorithms separately. The expected MD5 sums
for those tests are communicated to the test system using the FLOATTEST8
and FLOATTEST12 CMake variables. Whereas it is possible to
intelligently set a default value for FLOATTEST8 when building for
x86[-64] and a default value for FLOATTEST12 when building for x86-64,
it is not possible with other architectures. (Refer to #705, #709,
and #710.) Clang 14, for example, now enables FMA (fused multiply-add)
instructions by default on architectures that support them, but with
AArch64 builds, the results are not the same as when using GCC/AArch64
with FMA instructions enabled. Thus, setting FLOATTEST12=fp-contract
doesn't make the tests pass. It was already impossible to intelligently
set a default for FLOATTEST8 with i386 builds, but referring to #710,
that appears to be the case with other non-x86-64 builds as well.
Back in 1991, when Tom Lane first released libjpeg, some CPUs had
floating point units and some didn't. It could take minutes to compress
or decompress a 1-megapixel JPEG image using the "slow" integer DCT/IDCT
algorithms, and the floating point algorithms were significantly faster
on systems that had an FPU. On systems without FPUs, floating point
math was emulated and really slow, so Tom also developed "fast" integer
DCT/IDCT algorithms to speed up JPEG performance, at the expense of
accuracy, on those systems. Because of libjpeg-turbo's SIMD extensions,
the floating point algorithms are now significantly slower than the
"slow" integer algorithms without being significantly more accurate, and
the "fast" integer algorithms fail the ISO/ITU-T conformance tests
without being any faster than the "slow" integer algorithms on x86
systems. Thus, the floating point and "fast" integer algorithms are
considered legacy features.
In order for the floating point regression tests to be useful, the
results of the tests must be validated against an independent metric.
(In other words, it wouldn't be useful to use the floating point
DCT/IDCT algorithms to determine the expected results of the floating
point DCT/IDCT algorithms.) In the past, I attempted without success to
develop a low-level floating point test that would run at configure time
and determine the appropriate default value of FLOATTEST*. Barring that
approach, the only other possibilities would be:
1. Develop a test framework that compares the floating point results
   with a margin of error, as TJUnitTest does. However, that effort isn't
   justified unless it could also benefit non-legacy features.
2. Compare the floating point results against an expected MD5 sum, as we
   currently do. However, as previously described, it isn't possible in
   most cases to determine an appropriate default value for the expected
   MD5 sum.
For the moment, it makes the most sense to disable the 8-bit floating
point tests by default except with x86[-64] builds and to disable the
12-bit floating point tests by default except with x86-64 builds. That
means that the floating point algorithms will still be regression tested
when performing x86[-64] builds, but other types of builds will have to
opt in to the same regression tests. Since the floating point DCT/IDCT
algorithms are unlikely to change ever again (the only reason they still
exist at all is to maintain backward compatibility with libjpeg), this
seems like a reasonable tradeoff.
This fixes a UBSan negative shift warning, reported by OSS-Fuzz, that
occurred when attempting to transform a specially-crafted malformed
arithmetic-coded JPEG image into a baseline Huffman-coded JPEG
destination image with default Huffman tables. This issue probably
had a similar root cause to the issue fixed in
31a301389b, but in this case, the issue
only occurred with the SIMD baseline Huffman encoder in libjpeg-turbo
2.1.x. It was not reproducible in 2.0.x or 3.0.x or when using the
C baseline Huffman encoder.
- The example-*bit-*-decompress test must run after the
  example-*bit-*-compress test, since the latter generates
  testout*-example.jpg.
- Add -static to the filenames of all output files generated by the
  "static" regression tests, to avoid conflicts with the "shared"
  regression tests.
- Add the PID to the filenames of all files generated by the tjunittest
  packed-pixel image I/O tests.
- Check the return value of MD5File() in tjunittest to avoid a segfault
  if the file doesn't exist. (Prior to the fix described above, that
  could occur if two instances of tjunittest ran concurrently from the
  same directory with the same -bmp and -precision arguments.)
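The last item above amounts to checking a possibly-missing digest before comparing it. A minimal C sketch of that pattern follows. The function check_md5() is hypothetical, and the sketch assumes (this is not confirmed by the text above) that the digest routine signals a missing file by returning NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compare a computed digest string with the
   expected one.  'actual' is assumed to be NULL when the digest could
   not be computed (for instance, because the file does not exist).
   Returns 0 on match, 1 on mismatch, and -1 if the digest is missing. */
static int check_md5(const char *actual, const char *expected)
{
  assert(expected != NULL);
  if (actual == NULL)
    return -1;                  /* fail the test instead of crashing */
  return strcmp(actual, expected) == 0 ? 0 : 1;
}
```

Passing the unchecked pointer straight to strcmp() is what would crash if the file had been removed or renamed by a concurrent test run.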
Fixes #705
The intent was for the final transform operation to be the same as the
first transform operation but without TJXOPT_COPYNONE or
TJPARAM_NOREALLOC. Unrolling the transform operations in
c8d52f1c4c accidentally changed that.
The intent was for the final transform operation to be the same as the
first transform operation but without TJXOPT_COPYNONE or
TJFLAG_NOREALLOC. Unrolling the transform operations in
19f9d8f0fd accidentally changed that.
Referring to
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=60379
there are some specially-crafted malformed JPEG images that, when
transformed to grayscale, will exceed the worst-case transformed
grayscale JPEG image size. This is similar in nature to the issue fixed
by 19f9d8f0fd, except that in this case,
the issue occurs regardless of the amount of metadata in the source
image. Also, the tjTransform() function, the
Java_org_libjpegturbo_turbojpeg_TJTransformer_transform() JNI function,
and TJBench were behaving correctly in this case, because the TurboJPEG
API documentation specifies that the source image's subsampling type
should be used when computing the worst-case transformed JPEG image
size. (However, only the Java API documentation specified that. Oops.
The C API documentation now does as well.) The documented usage
mitigates the issue, and only the transform fuzzer did not adhere to
that. Thus, this was an issue with the fuzzer itself rather than an
issue with the library.
Referring to
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=60379
there are some specially-crafted malformed JPEG images that, when
transformed to grayscale, will exceed the worst-case transformed
grayscale JPEG image size. This is similar in nature to the issue fixed
by c8d52f1c4c, except that in this case,
the issue occurs regardless of the amount of metadata in the source
image. Also, the tj3Transform() function, the
Java_org_libjpegturbo_turbojpeg_TJTransformer_transform() JNI function,
and TJBench were behaving correctly in this case, because the TurboJPEG
API documentation specifies that the source image's subsampling type
should be used when computing the worst-case transformed JPEG image
size. (However, only the Java API documentation specified that. Oops.
The C API documentation now does as well.) The documented usage
mitigates the issue, and only the transform fuzzer did not adhere to
that. Thus, this was an issue with the fuzzer itself rather than an
issue with the library.
Unlike its predecessors, tj3DecompressHeader() does not fail if passed
a JPEG image with an unknown subsampling type. This led me to believe
that it was OK for the fuzzers to abort if tj3DecompressHeader()
returned an error. However, there are apparently some malformed JPEG
images that can expose issues in libjpeg-turbo while also causing
tj3DecompressHeader() to complain about header errors. Thus, it is best
to ignore the return value of tj3DecompressHeader(), as the fuzzers in
libjpeg-turbo 2.1.x and prior did.
The SignPath release certificate for our project is not yet available as
of this writing, but this commit prepares the CI system to use the
release certificate whenever it becomes available. It also restricts
signing only to tags, which correspond to official releases. (That
mimics our traditional policy of not signing pre-release builds.)
Restore two coefficient range checks from libjpeg to the C baseline
Huffman encoder. This fixes an issue
(https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=60253) whereby
the encoder could read from uninitialized memory when attempting to
transform a specially-crafted malformed arithmetic-coded JPEG source
image into a baseline Huffman-coded JPEG destination image with default
Huffman tables. More specifically, the out-of-range coefficients caused
r to equal 256, which overflowed the actbl->ehufsi[] array. Because the
overflow was contained within the huff_entropy_encoder structure, this
issue was not exploitable (nor was it observable at all on x86 or Arm
CPUs unless JSIMD_NOHUFFENC=1 or JSIMD_FORCENONE=1 was set in the
environment or unless libjpeg-turbo was built with WITH_SIMD=0.)
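A range check of the kind described above can be sketched in C as follows. This is an illustration rather than the code in jchuff.c: the function names are hypothetical, and the limit of 10 magnitude bits for AC coefficients (11 for DC differences) is an assumption that corresponds to 8-bit samples.

```c
#include <assert.h>

/* Assumed limit for 8-bit samples; illustrative, not taken from the
   commit message above. */
#define SKETCH_MAX_COEF_BITS  10

/* Number of bits needed to represent the magnitude of a coefficient.
   The conversion to unsigned avoids undefined behavior for INT_MIN. */
static int magnitude_bits(int value)
{
  unsigned int mag =
    value < 0 ? 0u - (unsigned int)value : (unsigned int)value;
  int nbits = 0;

  while (mag) {
    nbits++;
    mag >>= 1;
  }
  return nbits;
}

/* A DC difference may need one more bit than an AC coefficient. */
static int dc_diff_in_range(int diff)
{
  return magnitude_bits(diff) <= SKETCH_MAX_COEF_BITS + 1;
}

static int ac_coef_in_range(int coef)
{
  return magnitude_bits(coef) <= SKETCH_MAX_COEF_BITS;
}
```

An encoder that applies such checks before indexing its code-length table can report a bad coefficient instead of reading past the table.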
The fix is performance-neutral (+/- 1-2%) for x86-64 code and causes a
0-4% (avg. 1-2%, +/- 1-2%) compression regression for i386 code on Intel
CPUs when the C baseline Huffman encoder is used (JSIMD_NOHUFFENC=1).
The fix is performance-neutral (+/- 1-2%) on Intel CPUs when all of the
libjpeg-turbo SIMD extensions are disabled (JSIMD_FORCENONE=1). The fix
causes a 0-2% (avg. <1%, +/- 1%) compression regression for PowerPC
code.
This is subtle, but the tj3DecompressHeader() function sets the values
of TJPARAM_ARITHMETIC, TJPARAM_OPTIMIZE, and TJPARAM_PROGRESSIVE.
Unless we unset those values, the entropy algorithm used in the
transformed JPEG image will be determined by the union of the parameter
values and the transform options, which isn't what we want.
Color quantization is a legacy feature that serves little or no purpose
with lossless JPEG images. 9f756bc67a
eliminated interaction issues between the lossless decompressor and the
color quantizers related to out-of-range 12-bit samples, but referring
to #701, other interaction issues apparently still exist. Such issues
are likely, given the fact that the color quantizers were not designed
with lossless decompression in mind.
This commit reverts 9f756bc67a, since the
issues it fixed are no longer relevant because of this commit and
2192560d74.
Fixed #672
Fixes #673
Fixes #674
Fixes #676
Fixes #677
Fixes #678
Fixes #679
Fixes #681
Fixes #683
Fixes #701
When used with TJFLAG_NOREALLOC and with TJXOP_TRANSPOSE,
TJXOP_TRANSVERSE, TJXOP_ROT90, or TJXOP_ROT270, tjTransform()
incorrectly based the destination buffer size for a transform on the
source image dimensions rather than the transformed image dimensions.
This was apparently a long-standing bug that had existed in the
tjTransform() function since its inception. As initially implemented in
the evolving libjpeg-turbo v1.2 code base, tjTransform() required
dstSizes[i] to be set regardless of whether TJFLAG_NOREALLOC was set.
ff78e37595, which was introduced later in
the evolving libjpeg-turbo v1.2 code base, removed that requirement and
planted the seed for the bug. However, the bug was not activated until
9b49f0e4c7 was introduced still later in
the evolving libjpeg-turbo v1.2 code base, adding a subsampling type
argument to the (new at the time) tjBufSize() function and thus making
the width and height arguments no longer commutative.
The bug opened up the possibility that a JPEG source image could cause
tjTransform() to overflow the destination buffer for a transform if all
of the following were true:
- The JPEG source image used 4:2:2, 4:4:0, or 4:1:1 subsampling. (These
  are the only supported subsampling types for which the width and
  height arguments to tjBufSize() are not commutative.)
- The width and height of the JPEG source image were such that
  tjBufSize(height, width, subsamplingType) returned a smaller value
  than tjBufSize(width, height, subsamplingType).
- The JPEG source image contained enough metadata that the size of the
  transformed image was larger than
  tjBufSize(height, width, subsamplingType).
- TJFLAG_NOREALLOC was set.
- TJXOP_TRANSPOSE, TJXOP_TRANSVERSE, TJXOP_ROT90, or TJXOP_ROT270 was
  used.
- TJXOPT_COPYNONE was not set.
- TJXOPT_CROP was not set.
- The calling program allocated
  tjBufSize(height, width, subsamplingType) bytes for the
  destination buffer, as the API documentation instructs.
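To make the non-commutativity in the conditions above concrete, here is a small C sketch. It does not call tjBufSize(). Instead, model_buf_size() is a hypothetical, simplified stand-in that pads each dimension to the MCU size and adds a fixed allowance, which is an assumption made purely for illustration. The function model_dst_size() is likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round v up to the next multiple of p. */
static size_t pad_to(size_t v, size_t p)
{
  return (v + p - 1) / p * p;
}

/* Simplified, assumed model of a worst-case JPEG buffer size.  mcu_w and
   mcu_h are the MCU dimensions implied by the subsampling type (for
   example, 16x8 for 4:2:2 and 8x8 for 4:4:4). */
static size_t model_buf_size(int width, int height, int mcu_w, int mcu_h)
{
  size_t chroma_sf;

  assert(width > 0 && height > 0 && mcu_w > 0 && mcu_h > 0);
  chroma_sf = 4 * 64 / ((size_t)mcu_w * (size_t)mcu_h);
  return pad_to((size_t)width, (size_t)mcu_w) *
         pad_to((size_t)height, (size_t)mcu_h) * (2 + chroma_sf) + 2048;
}

/* Size the destination from the transformed dimensions: operations that
   exchange rows and columns swap the width and height first. */
static size_t model_dst_size(int src_w, int src_h, int mcu_w, int mcu_h,
                             int swaps_dims)
{
  int dst_w = swaps_dims ? src_h : src_w;
  int dst_h = swaps_dims ? src_w : src_h;

  return model_buf_size(dst_w, dst_h, mcu_w, mcu_h);
}
```

Under this model, an 8x16 source with a 16x8 MCU yields 3072 bytes when sized from the source dimensions but only 2560 bytes when sized from the transposed dimensions, so a caller and a library that disagree about which pair to use also disagree about the buffer size. With a square 8x8 MCU, the two orderings give the same result.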
The API documentation cautions that JPEG source images containing a
large amount of extraneous metadata (EXIF, IPTC, ICC, etc.) cannot
reliably be transformed if TJFLAG_NOREALLOC is set and TJXOPT_COPYNONE
is not set. Irrespective of the bug, there are still cases in which a
JPEG source image with a large amount of metadata can, when transformed,
exceed the worst-case transformed JPEG image size. For instance, if you
try to losslessly crop a JPEG image with 3 kB of EXIF data to 16x16
pixels, then you are guaranteed to exceed the worst-case 16x16 JPEG
image size unless you discard the EXIF data.
Even without the bug, tjTransform() will still fail with "Buffer passed
to JPEG library is too small" when attempting to transform JPEG source
images that meet the aforementioned criteria. The bug is that the
function segfaults rather than failing gracefully, but the chances of
that occurring in a real-world application are very slim. Any
real-world application developers who attempted to transform arbitrary
JPEG source images with TJFLAG_NOREALLOC set would very quickly realize
that they cannot reliably do that without also setting TJXOPT_COPYNONE.
Thus, I posit that the actual risk posed by this bug is low.
Applications such as web browsers that are the most exposed to security
risks from arbitrary JPEG source images do not use the TurboJPEG
lossless transform feature. (None of those applications even use the
TurboJPEG API, to the best of my knowledge, and the public libjpeg API
has no equivalent transform function.) Our only command-line interface
to the tjTransform() function, TJBench, was not exposed to the bug
because it had a compatible bug whereby it allocated the JPEG
destination buffer to the same size that tjTransform() erroneously
expected. The TurboJPEG Java API was also not exposed to the bug
because of a similar compatible bug in the
Java_org_libjpegturbo_turbojpeg_TJTransformer_transform() JNI function.
(This commit fixes both compatible bugs.)
In short, best practices for tjTransform() are to use TJFLAG_NOREALLOC
only with JPEG source images that are known to be free of metadata (such
as images generated by tjCompress*()) or to use TJXOPT_COPYNONE along
with TJFLAG_NOREALLOC. Still, however, the function shouldn't segfault
as long as the calling program allocates the suggested amount of space
for the JPEG destination buffer.
Usability notes:
tjTransform() could hypothetically require dstSizes[i] to be set
regardless of whether TJFLAG_NOREALLOC is set, but there are usability
pitfalls either way. The main pitfall I sought to avoid with
ff78e37595 was a calling program failing
to set dstSizes[i] at all, thus leaving its value undefined. It could
be argued that requiring dstSizes[i] to be set in all cases is more
consistent, but it could also be argued that not requiring it to be set
when TJFLAG_NOREALLOC is set is more user-proof. tjTransform() could
also hypothetically set TJXOPT_COPYNONE automatically when
TJFLAG_NOREALLOC is set, but that could lead to user confusion.
When used with TJPARAM_NOREALLOC and with TJXOP_TRANSPOSE,
TJXOP_TRANSVERSE, TJXOP_ROT90, or TJXOP_ROT270, tj3Transform()
incorrectly based the destination buffer size for a transform on the
source image dimensions rather than the transformed image dimensions.
This was apparently a long-standing bug that had existed in the
tj*Transform() function since its inception. As initially implemented
in the evolving libjpeg-turbo v1.2 code base, tjTransform() required
dstSizes[i] to be set regardless of whether TJFLAG_NOREALLOC (the
predecessor to TJPARAM_NOREALLOC) was set.
ff78e37595, which was introduced later in
the evolving libjpeg-turbo v1.2 code base, removed that requirement and
planted the seed for the bug. However, the bug was not activated until
9b49f0e4c7 was introduced still later in
the evolving libjpeg-turbo v1.2 code base, adding a subsampling type
argument to the (new at the time) tjBufSize() function and thus making
the width and height arguments no longer commutative.
The bug opened up the possibility that a JPEG source image could cause
tj3Transform() to overflow the destination buffer for a transform if all
of the following were true:
- The JPEG source image used 4:2:2, 4:4:0, 4:1:1, or 4:4:1 subsampling.
  (These are the only subsampling types for which the width and height
  arguments to tj3JPEGBufSize() are not commutative.)
- The width and height of the JPEG source image were such that
  tj3JPEGBufSize(height, width, subsamplingType) returned a smaller
  value than tj3JPEGBufSize(width, height, subsamplingType).
- The JPEG source image contained enough metadata that the size of the
  transformed image was larger than
  tj3JPEGBufSize(height, width, subsamplingType).
- TJPARAM_NOREALLOC was set.
- TJXOP_TRANSPOSE, TJXOP_TRANSVERSE, TJXOP_ROT90, or TJXOP_ROT270 was
  used.
- TJXOPT_COPYNONE was not set.
- TJXOPT_CROP was not set.
- The calling program allocated
  tj3JPEGBufSize(height, width, subsamplingType) bytes for the
  destination buffer, as the API documentation instructs.
The API documentation cautions that JPEG source images containing a
large amount of extraneous metadata (EXIF, IPTC, ICC, etc.) cannot
reliably be transformed if TJPARAM_NOREALLOC is set and TJXOPT_COPYNONE
is not set. Irrespective of the bug, there are still cases in which a
JPEG source image with a large amount of metadata can, when transformed,
exceed the worst-case transformed JPEG image size. For instance, if you
try to losslessly crop a JPEG image with 3 kB of EXIF data to 16x16
pixels, then you are guaranteed to exceed the worst-case 16x16 JPEG
image size unless you discard the EXIF data.
Even without the bug, tj3Transform() will still fail with "Buffer passed
to JPEG library is too small" when attempting to transform JPEG source
images that meet the aforementioned criteria. The bug is that the
function segfaults rather than failing gracefully, but the chances of
that occurring in a real-world application are very slim. Any
real-world application developers who attempted to transform arbitrary
JPEG source images with TJPARAM_NOREALLOC set would very quickly realize
that they cannot reliably do that without also setting TJXOPT_COPYNONE.
Thus, I posit that the actual risk posed by this bug is low.
Applications such as web browsers that are the most exposed to security
risks from arbitrary JPEG source images do not use the TurboJPEG
lossless transform feature. (None of those applications even use the
TurboJPEG API, to the best of my knowledge, and the public libjpeg API
has no equivalent transform function.) Our only command-line interface
to the tj3Transform() function, TJBench, was not exposed to the bug
because it had a compatible bug whereby it allocated the JPEG
destination buffer to the same size that tj3Transform() erroneously
expected. The TurboJPEG Java API was also not exposed to the bug
because of a similar compatible bug in the
Java_org_libjpegturbo_turbojpeg_TJTransformer_transform() JNI function.
(This commit fixes both compatible bugs.)
In short, best practices for tj3Transform() are to use TJPARAM_NOREALLOC
only with JPEG source images that are known to be free of metadata (such
as images generated by tj3Compress*()) or to use TJXOPT_COPYNONE along
with TJPARAM_NOREALLOC. Still, however, the function shouldn't segfault
as long as the calling program allocates the suggested amount of space
for the JPEG destination buffer.
Usability notes:
tj3Transform() could hypothetically require dstSizes[i] to be set
regardless of the value of TJPARAM_NOREALLOC, but there are usability
pitfalls either way. The main pitfall I sought to avoid with
ff78e37595 was a calling program failing
to set dstSizes[i] at all, thus leaving its value undefined. It could
be argued that requiring dstSizes[i] to be set in all cases is more
consistent, but it could also be argued that not requiring it to be set
when TJPARAM_NOREALLOC is set is more user-proof. tj3Transform() could
also hypothetically set TJXOPT_COPYNONE automatically when
TJPARAM_NOREALLOC is set, but that could lead to user confusion.
Ultimately, I would like to address these issues in TurboJPEG v4 by
using managed buffer objects, but that would be an extensive overhaul.
- strtest.c: Fix unused variable warnings if both -DNO_GETENV and
-DNO_PUTENV are specified or if only -DNO_GETENV is specified.
- jinclude.h: Fix build error if only -DNO_GETENV is specified.
Fixes #697
Because width, height, and tjPixelSize[] are signed integers, signed
integer overflow will occur if width * height *
tjPixelSize[pixelFormat] > INT_MAX or jpegSize > INT_MAX, which would
cause an incorrect value to be passed to tjAlloc(). This commit
modifies tjexample.c in the following ways:
- Use malloc() rather than tjAlloc() to allocate the uncompressed image
buffer. (tjAlloc() is only necessary for JPEG buffers that will
potentially be reallocated by the TurboJPEG API library.)
- Implicitly promote width, height, and tjPixelSize[pixelFormat] to
size_t before multiplying them.
- If size_t is 32-bit, throw an error if width * height *
tjPixelSize[pixelFormat] would overflow the data type.
- Throw an error if jpegSize would overflow a signed integer (which
could only happen on a 64-bit machine.)
Since tjexample is not installed or packaged, the worst case for this
issue was that a downstream application might interpret tjexample.c
literally and introduce a similar overflow issue into its own code.
However, it's worth noting that such issues could also be introduced
when using malloc().
Because width, height, and tjPixelSize[] are signed integers, signed
integer overflow will occur if width * height *
tjPixelSize[pixelFormat] > INT_MAX, which would cause an incorrect value
to be passed to tj3Alloc(). This commit modifies tjexample.c in the
following ways:
- Implicitly promote width, height, and tjPixelSize[pixelFormat] to
size_t before multiplying them.
- Use malloc() rather than tj3Alloc() to allocate the uncompressed image
buffer. (tj3Alloc() is only necessary for JPEG buffers that will
potentially be reallocated by the TurboJPEG API library.)
- If size_t is 32-bit, throw an error if width * height *
tjPixelSize[pixelFormat] would overflow the data type.
Since tjexample is not installed or packaged, the worst case for this
issue was that a downstream application might interpret tjexample.c
literally and introduce a similar overflow issue into its own code.
However, it's worth noting that such issues could also be introduced
when using malloc().
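A minimal sketch of the overflow-safe size computation described above, assuming a hypothetical helper named image_buf_size() (it is not part of tjexample.c or the TurboJPEG API). It promotes the operands to size_t and refuses to multiply if the product would wrap:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes width * height * pixelSize in size_t
   arithmetic.  Returns 0 on success or -1 if an argument is invalid or the
   product would overflow size_t. */
int image_buf_size(int width, int height, int pixelSize, size_t *size)
{
  size_t w, h, ps;

  if (width < 1 || height < 1 || pixelSize < 1 || size == NULL)
    return -1;

  /* Promote each signed operand to size_t before multiplying. */
  w = (size_t)width;
  h = (size_t)height;
  ps = (size_t)pixelSize;

  /* Division-based checks avoid performing a multiplication that wraps. */
  if (w > SIZE_MAX / h)
    return -1;
  if (w * h > SIZE_MAX / ps)
    return -1;

  *size = w * h * ps;
  return 0;
}
```

The result can then be passed to malloc(). The same check applies regardless of which allocator is used.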
Colorspace conversion is explicitly not supported with lossless JPEG
images. Merged upsampling implies YCbCr-to-RGB colorspace conversion,
so allowing it with lossless decompression was an oversight.
9f756bc67a eliminated interaction issues
between the lossless decompressor and the merged upsampler related to
out-of-range 12-bit samples, but referring to #690, other interaction
issues apparently still exist. Such issues are likely, given the fact
that the merged upsampler was never designed with lossless decompression
in mind.
This commit also extends the decompress fuzzer so that it catches the
issue reported in #690.
Fixes #690
Redundantly fixes #670
Redundantly fixes #675
- Clarify that encrypted e-mail is optional.
- Mention the new GitHub security advisory system.
- Clarify that vulnerabilities against new features that are not yet in
a Stable release series need not be reported securely.
When computing the downsampled width for a particular component,
jpeg_crop_scanline() needs to take into account the fact that the
libjpeg code uses a combination of IDCT scaling and upsampling to
implement 4x2 and 2x4 upsampling with certain decompression scaling
factors. Failing to account for that led to incomplete upsampling of
4x2- or 2x4-subsampled components, which caused the color converter to
read from uninitialized memory. With 12-bit data precision, this caused
a buffer overrun or underrun and subsequent segfault if the
uninitialized memory contained a value that was outside of the valid
sample range (because the color converter uses the value as an array
index.)
Fixes #669
The 2-pass color quantization algorithm assumes 3-sample pixels. RGB565
is the only 3-component colorspace that doesn't have 3-sample pixels, so
we need to treat it as a special case when determining whether to enable
2-pass color quantization. Otherwise, attempting to initialize 2-pass
color quantization with an RGB565 output buffer could cause
prescan_quantize() to read from uninitialized memory and subsequently
underflow/overflow the histogram array.
djpeg is supposed to fail gracefully if both -rgb565 and -colors are
specified, because none of its destination managers (image writers)
support color quantization with RGB565. However, prescan_quantize() was
called before that could occur. It is possible but very unlikely that
these issues could have been reproduced in applications other than
djpeg. The issues involve the use of two features (12-bit precision and
RGB565) that are incompatible, and they also involve the use of two
rarely-used legacy features (RGB565 and color quantization) that don't
make much sense when combined.
Fixes #668
Fixes #671
Fixes #680
12-bit is the only data precision for which the range of the sample data
type exceeds the valid sample range, so it is possible to craft a 12-bit
lossless JPEG image that contains out-of-range 12-bit samples.
Attempting to decompress such an image using color quantization or merged
upsampling (NOTE: libjpeg-turbo cannot generate YCbCr or subsampled
lossless JPEG images, but it can decompress them) caused segfaults or
buffer overruns when those algorithms attempted to use the out-of-range
sample values as array indices. This commit modifies the lossless
decompressor so that it range-limits the output of the scaler when using
12-bit samples.
Fixes #670
Fixes #672
Fixes #673
Fixes #674
Fixes #675
Fixes #676
Fixes #677
Fixes #678
Fixes #679
Fixes #681
Fixes #683
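To illustrate why out-of-range 12-bit samples are dangerous, here is a minimal sketch that assumes a 12-bit sample stored in a 16-bit signed container and a lookup table with one entry per valid sample value. The helper names are hypothetical, and clamping is only one way to limit the range. The actual decompressor code may restrict the range differently.

```c
#include <stdint.h>

#define MAX_12BIT_SAMPLE  4095  /* (1 << 12) - 1 */

/* Hypothetical helper: clamp a value held in a 16-bit container to the
   valid 12-bit sample range [0, 4095]. */
int16_t range_limit_12bit(int16_t sample)
{
  if (sample < 0)
    return 0;
  if (sample > MAX_12BIT_SAMPLE)
    return MAX_12BIT_SAMPLE;
  return sample;
}

/* Hypothetical consumer: the table has exactly MAX_12BIT_SAMPLE + 1 entries,
   so indexing it with an unclamped sample could read outside of the array. */
uint8_t lookup_sample(const uint8_t *table, int16_t sample)
{
  return table[range_limit_12bit(sample)];
}
```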
If PIC isn't enabled for the entire build (using
CMAKE_POSITION_INDEPENDENT_CODE), then we need to enable it for any of
the objects in the libjpeg-turbo shared libraries. Ideally, however, we
don't want to enable PIC for any of the objects in the libjpeg-turbo
static libraries, unless CMAKE_POSITION_INDEPENDENT_CODE is set. Thus,
we need to build separate static and shared jpeg12 and jpeg16 object
libraries.
Fixes #684
(oversight from 386ec0abc7)
Tiled decompression will ultimately fail if the subsampling type of the
JPEG input image is unknown, but the C version of TJBench needs to fail
earlier in order to avoid using -1 (TJSAMP_UNKNOWN) as an array index
for tjMCUWidth[]/tjMCUHeight[]. The Java version now fails earlier as
well, although there is no benefit to that other than making the error
message less cryptic.
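The early failure described above amounts to validating the subsampling type before using it as an array index. A minimal sketch, in which SAMP_UNKNOWN, mcu_width[], and mcu_width_for() are hypothetical stand-ins for TJSAMP_UNKNOWN, tjMCUWidth[], and the TJBench logic, and the table values are for illustration only:

```c
#define SAMP_UNKNOWN  (-1)  /* stand-in for TJSAMP_UNKNOWN */
#define NUM_SAMP  7

/* Illustrative MCU widths, in pixels, indexed by subsampling type. */
static const int mcu_width[NUM_SAMP] = { 8, 16, 16, 8, 8, 32, 8 };

/* Returns the MCU width for a subsampling type, or -1 if the type is
   unknown or otherwise out of range.  The bounds check has to occur before
   the array access. */
int mcu_width_for(int subsamp)
{
  if (subsamp < 0 || subsamp >= NUM_SAMP)
    return -1;
  return mcu_width[subsamp];
}
```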
When -tile is used with a JPEG input image, TJBench generates the tiles
using lossless cropping, which will fail if the cropping region doesn't
align with an MCU boundary. Furthermore, there is no reason to avoid
8x8 tiles when decompressing 4:4:4 or grayscale JPEG images.
If we have transformed a 4:1:1 or 4:4:1 JPEG input image in such a way
that the horizontal and vertical dimensions are transposed, then we need
to change the subsampling type that is passed to the decomp() function.
Otherwise, tj3YUVBufSize() may return an incorrect value.
(oversight from fc881ebb21)
This allows losslessly transposed or rotated 4:1:1 JPEG images to be
losslessly cropped, partially decompressed, or decompressed to planar
YUV images.
Because tj3Transform() allows multiple lossless transformations to be
chained together, all subsampling options need to have a corresponding
transposed subsampling option. (This is why 4:4:0 was originally
implemented as well.) Otherwise, the documentation would be technically
incorrect. It says that images with unknown subsampling types cannot be
losslessly cropped, partially decompressed, or decompressed to planar
YUV images, but it doesn't say anything about images with known
subsampling types whose subsampling type becomes unknown if the image is
rotated or transposed. This is one of those situations in which it is
easier to implement a feature that works around the problem than to
document the problem.
Closes #659
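The pairing of each subsampling type with its transposed counterpart can be sketched as follows. The enum and function names are hypothetical, and the enum ordering is for illustration only. The mapping itself follows from the text above: 4:2:2 pairs with 4:4:0, and 4:1:1 pairs with 4:4:1.

```c
/* Local stand-ins for the TurboJPEG subsampling constants */
enum subsamp {
  SAMP_444, SAMP_422, SAMP_420, SAMP_GRAY, SAMP_440, SAMP_411, SAMP_441
};

/* Returns the subsampling type that results from swapping the horizontal
   and vertical dimensions of an image (transpose, transverse transpose, or
   90/270-degree rotation.) */
enum subsamp transposed_subsamp(enum subsamp s)
{
  switch (s) {
  case SAMP_422:  return SAMP_440;
  case SAMP_440:  return SAMP_422;
  case SAMP_411:  return SAMP_441;
  case SAMP_441:  return SAMP_411;
  default:        return s;  /* 4:4:4, 4:2:0, and grayscale are symmetric. */
  }
}
```

Applying the function twice returns the original type, which is what allows transposing transforms to be chained without the subsampling type becoming unknown.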
It's not that the build system doesn't support multiple values in
CMAKE_OSX_ARCHITECTURES. It's that libjpeg-turbo, because of its SIMD
extensions, *cannot* support multiple values in CMAKE_OSX_ARCHITECTURES.
Even though BUILDING.md and CONTRIBUTING.md explicitly state that the
libjpeg-turbo build system does not and will not support being
integrated into downstream build systems using add_subdirectory() (see
0565548191), people continue to file bug
reports, feature requests, and pull requests regarding that (see #265,
#637, and #653 in addition to the issues listed in
05655481917a2d2761cf2fe19b76f639b7f159ef.) Responding to those
issues wastes our project's limited resources. Hopefully people will
get the hint if the build system explicitly tells them that it can't be
included using add_subdirectory(), which will prompt them to read the
comments in CMakeLists.txt explaining why. To anyone stumbling upon
this commit message, please refer to the discussions under the issues
listed above, as well as the issues listed in
0565548191. Our project's position on
this has been stated, explained, and defended numerous times.
Don't keep trying to decompress the same image if tj3Decompress*() has
already thrown an error. Otherwise, if the image has an excessive
number of scans, then each iteration of the loop will try to decompress
up to the scan limit, which may cause the overall test to time out even
if one iteration doesn't time out.
* tag '2.1.5': (41 commits)
BUILDING.md: Specify install prefix for MinGW/Un*x
Java: Guard against int overflow in size methods
turbojpeg.c: Fix UBSan warning
tjPlane*(): Guard against int overflow
Java doc: TJ.pixelSize --> TJ.getPixelSize()
TJBench: Unset TJ*OPT_CROP when disabling tiling
TJExample: Remove "underlying codec" references
GitHub: Update to actions/checkout@v3
TJBench: Set TJ*OPT_PROGRESSIVE with -progressive
TJBench/Java: Fix parsing of quality ranges
TJBench: Strictly check all non-boolean arguments
TurboJPEG: More documentation improvements
TJDecompressor.java: Exception message tweak
12-bit: Set alpha channel to 4095 rather than 255
TJDecompressor.java: "YUV" = "planar YUV"
Java: Don't allow int overflow in buf size methods
tjDecompressToYUV2: Use scaled dims for plane calc
TurboJPEG: Numerous documentation improvements
TurboJPEG: Don't use backward compatibility macros
TurboJPEG: Ensure 'pad' arg is a power of 2
...
As long as a libjpeg instance is only used by one thread at a time, a
program is technically within its rights to call jpeg_start_*compress()
in one thread and jpeg_(read|write)_*(), with the same libjpeg instance,
in a second thread. However, because the various jsimd_can*() functions
are called within the body of jpeg_start_*compress() and simd_support is
now thread-local (due to f579cc11b3), that
led to a situation in which simd_support was initialized in the first
thread but not the second. The uninitialized value of simd_support is
0xFFFFFFFF, which the second thread interpreted to mean that it could
use any instruction set, and when it attempted to use AVX2 instructions
on a CPU that didn't support them, an illegal instruction error
occurred.
This issue was known to affect libvips.
This commit modifies the i386 and x86-64 SIMD dispatchers so that the
various jsimd_*() functions always call init_simd(), if simd_support is
uninitialized, prior to dispatching based on the value of simd_support.
Note that the other SIMD dispatchers don't need this, because only the
x86 SIMD extensions currently support multiple instruction sets.
This patch has been verified to be performance-neutral to within
+/- 0.4% with 32-bit and 64-bit code running on a 2.8 GHz Intel Xeon
W3530 and a 3.6 GHz Intel Xeon W2123.
Fixes #649
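A minimal sketch of the lazy initialization pattern described above. It assumes C11 thread-local storage, and the feature flags, the IMPL_* return codes, the pretend detection result in init_simd(), and pick_idct() are all hypothetical. The real dispatchers detect CPU features and call SIMD routines.

```c
#define SIMD_UNINITIALIZED  (~0U)  /* 0xFFFFFFFF with a 32-bit unsigned int */
#define SIMD_SSE2  0x01            /* hypothetical feature flags */
#define SIMD_AVX2  0x02

enum { IMPL_C, IMPL_SSE2, IMPL_AVX2 };

/* Each thread gets its own copy, which starts out uninitialized. */
static _Thread_local unsigned int simd_support = SIMD_UNINITIALIZED;

/* Stand-in for CPU feature detection.  This sketch pretends that the CPU
   supports SSE2 but not AVX2. */
static void init_simd(void)
{
  if (simd_support != SIMD_UNINITIALIZED)
    return;
  simd_support = SIMD_SSE2;
}

/* Hypothetical dispatcher.  Without the initialization check, all bits of
   simd_support would appear to be set in a thread that never called
   init_simd(), and the AVX2 path would be chosen erroneously. */
int pick_idct(void)
{
  if (simd_support == SIMD_UNINITIALIZED)
    init_simd();

  if (simd_support & SIMD_AVX2)
    return IMPL_AVX2;
  if (simd_support & SIMD_SSE2)
    return IMPL_SSE2;
  return IMPL_C;
}
```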
(regression introduced by fc01f4673b)
Oops. In the process of migrating the fuzzers to the TurboJPEG 3 API,
I accidentally left out the code in decompress.cc that updates the width
and height based on the scaling factor (but I apparently included that
code in decompress_yuv.cc.)
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=55573
Since tj3Alloc() now accepts a size_t argument rather than an int
argument, it is no longer necessary to check for signed integer overflow
in the C version of TJBench.
The default install prefix when building under MinGW is chosen based on
the needs of the official build system, which uses MSYS2 to generate
Windows installer packages that install under c:\libjpeg-turbo-gcc[64].
However, attempting to configure the build with that install prefix on
a Un*x machine causes a CMake error.
Fixes #641
Those changes worked around an innocuous UBSan warning that was
exposed by the new TurboJPEG 3 transform fuzz target, due to the fact
that tj3Transform() no longer rejects images with unknown subsampling
configurations. That UBSan warning was a false positive, and attempting
to fix it introduced a buffer overrun triggered by a malformed input
image that causes jpeg_write_marker() to be called with datalen == 0. I
suspect that the UBSan false positive was only reproducible on my local
machine, but I guess we'll see.
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=55413
- Don't report which function was being called when a TurboJPEG error
occurred, because the TurboJPEG error message already contains that
information.
- Use THROW_TJG() for functions that report errors globally instead of
through an instance handle.
- Use THROW_TJ() for the image I/O functions.
- Formatting tweaks
In decompression and transform functions, use the libjpeg API state
rather than a TurboJPEG instance variable to determine whether
jpeg_mem_src_tj() and jpeg_read_header() have already been called by a
wrapper function.
This actually works and apparently always has worked. It only failed
because the libjpeg code, which did not originally support arithmetic
coding, assumed that optimize_coding should always be TRUE for 12-bit
data precision.
(ChangeLog update forthcoming)
- Prefix all function names with "tj3" and remove version suffixes from
function names. (Future API overhauls will increment the prefix to
"tj4", etc., thus retaining backward API/ABI compatibility without
versioning each individual function.)
- Replace stateless boolean flags (including TJ*FLAG_ARITHMETIC and
TJ*FLAG_LOSSLESS, which were never released) with stateful integer
parameters, the value of which persists between function calls.
* Use parameters for the JPEG quality and subsampling as well, in
order to eliminate the awkwardness of specifying function arguments
that weren't relevant for lossless compression.
* tj3DecompressHeader() now stores all relevant information about the
JPEG image, including the width, height, subsampling type, entropy
coding type, etc. in parameters rather than returning that
information in its arguments.
* TJ*FLAG_LIMITSCANS has been reimplemented as an integer parameter
(TJ*PARAM_SCANLIMIT) that allows the number of scans to be
specified.
- Use the const keyword for all pointer arguments to unmodified
buffers, as well as for both dimensions of 2D pointers. Addresses
#395.
- Use size_t rather than unsigned long to represent buffer sizes, since
unsigned long is a 32-bit type on Windows. Addresses #24.
- Return 0 from all buffer size functions if an error occurs, rather
than awkwardly trying to return -1 in an unsigned data type.
- Implement 12-bit and 16-bit data precision using dedicated
compression, decompression, and image I/O functions/methods.
* Suffix the names of all data-precision-specific functions with 8,
12, or 16.
* Because the YUV functions are intended to be used for video, they
are currently only implemented with 8-bit data precision, but they
can be expanded to 12-bit data precision in the future, if
necessary.
* Extend TJUnitTest and TJBench to test 12-bit and 16-bit data
precision, using a new -precision option.
* Add appropriate regression tests for all of the above to the 'test'
target.
* Extend tjbenchtest to test 12-bit and 16-bit data precision, and
add separate 'tjtest12' and 'tjtest16' targets.
* BufferedImage I/O in the Java API is currently limited to 8-bit
data precision, since the BufferedImage class does not
straightforwardly support higher data precisions.
* Extend the PPM reader to convert 12-bit and 16-bit PBMPLUS files
to grayscale or CMYK pixels, as it already does for 8-bit files.
- Properly accommodate lossless JPEG using dedicated parameters
(TJ*PARAM_LOSSLESS, TJ*PARAM_LOSSLESSPSV, and TJ*PARAM_LOSSLESSPT),
rather than using a flag and awkwardly repurposing the JPEG quality.
Update TJBench to properly reflect whether a JPEG image is lossless.
- Re-organize the TJBench usage screen.
- Update the Java docs using Java 11, to improve the formatting and
eliminate HTML frames.
- Use the accurate integer DCT algorithm by default for both
compression and decompression, since the "fast" algorithm is a legacy
feature, it does not pass the ISO compliance tests, and it is not
actually faster on modern x86 CPUs.
* Remove the -accuratedct option from TJBench and TJExample.
- Re-implement the 'tjtest' target using a CMake script that enables
the appropriate tests, depending on the data precision and whether or
not the Java API is part of the build.
- Consolidate the C and Java versions of tjbenchtest into one script.
- Consolidate the C and Java versions of tjexampletest into one script.
- Combine all initialization functions into a single function
(tj3Init()) that accepts an integer parameter specifying the
subsystems to initialize.
- Enable decompression scaling explicitly, using a new function/method
(tj3SetScalingFactor()/TJDecompressor.setScalingFactor()), rather
than implicitly using awkward "desired width"/"desired height"
parameters.
- Introduce a new macro/constant (TJUNSCALED/TJ.UNSCALED) that maps to
a scaling factor of 1/1.
- Implement partial image decompression, using a new function/method
(tj3SetCroppingRegion()/TJDecompressor.setCroppingRegion()) and
TJBench option (-crop). Extend tjbenchtest to test the new feature.
Addresses #1.
- Allow the JPEG colorspace to be specified explicitly when
compressing, using a new parameter (TJ*PARAM_COLORSPACE). This
allows JPEG images with the RGB and CMYK colorspaces to be created.
- Remove the error/difference image feature from TJBench. Identical
images to the ones that TJBench created can be generated using
ImageMagick with
'magick composite <original_image> <output_image> -compose difference <diff_image>'
- Handle JPEG images with unknown subsampling types. TJ*PARAM_SUBSAMP
is set to TJ*SAMP_UNKNOWN (== -1) for such images, but they can still
be decompressed fully into packed-pixel images or losslessly
transformed (with the exception of lossless cropping.) They cannot
be partially decompressed or decompressed into planar YUV images.
Note also that TJBench, due to its lack of support for imperfect
transforms, requires that the subsampling type be known when
rotating, flipping, or transversely transposing an image. Addresses
#436.
- The Java version of TJBench now has identical functionality to the C
version. This was accomplished by (somewhat hackishly) calling the
TurboJPEG C image I/O functions through JNI and copying the pixels
between the C heap and the Java heap.
- Add parameters (TJ*PARAM_RESTARTROWS and TJ*PARAM_RESTARTBLOCKS) and
a TJBench option (-restart) to allow the restart marker interval to
be specified when compressing. Eliminate the undocumented TJ_RESTART
environment variable.
- Add a parameter (TJ*PARAM_OPTIMIZE), a transform option
(TJ*OPT_OPTIMIZE), and a TJBench option (-optimize) to allow
optimized baseline Huffman coding to be specified when compressing.
Eliminate the undocumented TJ_OPTIMIZE environment variable.
- Add parameters (TJ*PARAM_XDENSITY, TJ*PARAM_YDENSITY, and
TJ*PARAM_DENSITYUNITS) to allow the pixel density to be specified when
compressing or saving a Windows BMP image and to be queried when
decompressing or loading a Windows BMP image. Addresses #77.
- Refactor the fuzz targets to use the new API.
* Extend decompression coverage to 12-bit and 16-bit data precision.
* Replace the awkward cjpeg12 and cjpeg16 targets with proper
TurboJPEG-based compress12, compress12-lossless, and
compress16-lossless targets.
- Fix innocuous UBSan warnings uncovered by the new fuzzers.
- Implement previous versions of the TurboJPEG API by wrapping the new
functions (tested by running the 2.1.x versions of TJBench, via
tjbenchtest, and TJUnitTest against the new implementation.)
* Remove all JNI functions for deprecated Java methods and implement
the deprecated methods using pure Java wrappers. It should be
understood that backward API compatibility in Java applies only to
the Java classes and that one cannot mix and match a JAR file from
one version of libjpeg-turbo with a JNI library from another
version.
- tj3Destroy() now silently accepts a NULL handle.
- tj3Alloc() and tj3Free() now return/accept void pointers, as malloc()
and free() do.
- The image I/O functions now accept a TurboJPEG instance handle, which
is used to transmit/receive parameters and to receive error
information.
Closes #517
Because Java array sizes are ints, the various size methods in the TJ
class have int return values. Thus, we have to guard against signed
int overflow at the JNI level, because the C functions can return sizes
greater than INT_MAX.
This also adds a test for TJ.planeWidth() and TJ.planeHeight(), in order
to validate 8a1526a442 in Java.
tjPlaneWidth() and tjPlaneHeight() could overflow a signed int and
return a negative value if passed a width/height argument of INT_MAX and
a subsampling type for which the MCU block size is larger than 8x8.
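A minimal sketch of the overflow guard, assuming hypothetical helpers (round_up() and padded_plane_width()) rather than the actual tjPlaneWidth() code. The width is padded using 64-bit unsigned arithmetic, and the result is rejected if it does not fit in a signed int. Here, mcu_blocks is the MCU width expressed in 8-pixel blocks, so it is greater than 1 when the MCU block size is larger than 8x8.

```c
#include <limits.h>

/* Round v up to the next multiple of p (p > 0) using 64-bit arithmetic. */
unsigned long long round_up(unsigned long long v, unsigned long long p)
{
  return (v + p - 1) / p * p;
}

/* Hypothetical helper: returns the padded width, or -1 if an argument is
   invalid or the padded width would overflow a signed int. */
int padded_plane_width(int width, int mcu_blocks)
{
  unsigned long long pw;

  if (width < 1 || mcu_blocks < 1)
    return -1;

  pw = round_up((unsigned long long)width, (unsigned long long)mcu_blocks);
  if (pw > (unsigned long long)INT_MAX)
    return -1;

  return (int)pw;
}
```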
The documented behavior of the -progressive option is to use progressive
entropy coding in JPEG images generated by compression and transform
operations. However, setting TJFLAG_PROGRESSIVE was insufficient to
accomplish that, because TJBench doesn't enable lossless transformation
if xformOpt == 0.
- TJBench/TJUnitTest: Wordsmith command-line output
- Java: "decompress operations"="decompression operations"
- tjLoadImage(): Error message tweak
- Don't mention compression performance in the description of
TJXOPT_PROGRESSIVE/TJTransform.OPT_PROGRESSIVE, because the image has
already been compressed at that point.
(Oversights from 9a146f0f23)
This is similar to the fix that 2a9e3bd743
applied to the C API. We have to apply it separately at the JNI level
because the Java API always stores buffer sizes in 32-bit integers, and
the C buffer size functions could overflow an int when using 64-bit
code. (NOTE: The Java API stores buffer sizes in 32-bit integers
because Java itself always uses 32-bit integers for array sizes.) Since
Java don't allow no buffer overruns 'round here, this commit doesn't
change the ultimate outcome. It just makes the inevitable exception
easier to diagnose.
The documented behavior of the function is to use decompression scaling
to generate the largest possible image that will fit within the desired
image dimensions. Thus, if the desired image dimensions are larger than
the scaled image dimensions, then tjDecompressToYUV2() should use the
scaled image dimensions when computing the plane pointers and strides to
pass to tjDecompressToYUVPlanes().
Note that this bug was not previously detected, because tjunittest and
tjbench always passed the scaled image dimensions to
tjDecompressToYUV2().
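The relationship between the desired and scaled image dimensions can be sketched as follows. The structure, the SCALED() macro (which is assumed to round up), the candidate scaling factors, and pick_scaling_factor() are hypothetical stand-ins rather than the library's code. The point is that the plane pointers and strides should be derived from the scaled dimensions that the helper reports, not from the desired dimensions.

```c
typedef struct { int num, denom; } scaling_factor;

/* Scale a dimension by num/denom, rounding up. */
#define SCALED(dim, sf) \
  (((dim) * (sf).num + (sf).denom - 1) / (sf).denom)

/* Hypothetical helper: among n candidate scaling factors, choose the one
   that produces the largest image fitting within the desired dimensions.
   Returns the index of that factor and stores the scaled dimensions, or
   returns -1 if no candidate fits. */
int pick_scaling_factor(int jpegWidth, int jpegHeight, int desiredWidth,
                        int desiredHeight, const scaling_factor *sf, int n,
                        int *scaledWidth, int *scaledHeight)
{
  int i, best = -1;

  for (i = 0; i < n; i++) {
    int sw = SCALED(jpegWidth, sf[i]);
    int sh = SCALED(jpegHeight, sf[i]);

    if (sw > desiredWidth || sh > desiredHeight)
      continue;
    if (best < 0 || sw > *scaledWidth) {
      best = i;
      *scaledWidth = sw;
      *scaledHeight = sh;
    }
  }
  return best;
}
```

For example, with a 100x80 JPEG image and desired dimensions of 120x120, a factor of 1/1 is the largest that fits, so the planes would be laid out for a 100x80 image.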
- Wordsmithing, formatting, and grammar tweaks
- Various clarifications and corrections, including specifying whether
a particular buffer or image is used as a source or destination
- Accommodate/mention features that were introduced since the API
documentation was created.
- For clarity, use "packed-pixel" to describe uncompressed
source/destination images that are not planar YUV.
- Use "row" rather than "line" to refer to a single horizontal group of
pixels or component values, for consistency with the libjpeg API
documentation. (libjpeg also uses "scanline", which is a more archaic
term.)
- Use "alignment" rather than "padding" to refer to the number of bytes
by which a row's width is evenly divisible. This makes the
documentation of the YUV functions and tjLoadImage() consistent. ("Padding"
typically refers to the number of bytes added to each row, which is
not the same thing.)
- Remove all references to "the underlying codec." Although the
TurboJPEG API originated as a cross-platform wrapper for the Intel
Integrated Performance Primitives, Sun mediaLib, QuickTime, and
libjpeg, none of those TurboJPEG implementations has been maintained
since 2009. Nothing would prevent someone from implementing the
TurboJPEG API without libjpeg-turbo, but such an implementation would
not necessarily have an "underlying codec." (It could be fully
self-contained.)
- Use "destination image" rather than "output image", for consistency,
or describe the type of image that will be output.
- Avoid the term "image buffer" and instead use "byte buffer" to
refer to buffers that will hold JPEG images, or describe the type of
image that will be contained in the buffer. (The Java documentation
doesn't use "byte buffer", because the buffer arrays literally have
"byte" in front of them, and since Java doesn't have pointers, it is
not possible for mere mortals to store any other type of data in those
arrays.)
- C: Use "unified" to describe YUV images stored in a single buffer, for
consistency with the Java documentation.
- Use "planar YUV" rather than "YUV planar". It is our convention to
describe images using {component layout} {colorspace/pixel format}
{image function}, e.g. "packed-pixel RGB source image" or "planar YUV
destination image."
- C: Document the TurboJPEG API version in which a particular function
or macro was introduced, and reorder the backward compatibility
function stubs in turbojpeg.h alphabetically by API version.
- C: Use Markdown rather than HTML tags, where possible, in the Doxygen
comments.
Macros from older versions of the TurboJPEG API are supported but not
documented, so using the current version of those macros makes the code
more readable.
Because the PAD() macro can only handle powers of 2, this is a necessary
restriction (and a documented one, except in the case of
tjCompressFromYUV()-- oops.) Failing to check the 'pad' argument
caused tjBufSizeYUV2() to return bogus results if 'pad' was less than 1
or otherwise not a power of 2. tjEncodeYUV3() and tjDecodeYUV()
effectively treated a 'pad' value of 0 as unpadded, but that was subtle
and undocumented behavior. tjCompressFromYUV() did not check whether
'pad' was a power of 2, so the strides passed to
tjCompressFromYUVPlanes() would have been incorrect if 'pad' was not a
power of 2. That would not have caused tjCompressFromYUV() to overrun
the source buffer, as long as the calling application allocated the
buffer based on the return value of tjBufSizeYUV2() (which computes the
strides in the same manner as tjCompressFromYUV().) However, if the
calling application attempted to initialize the source buffer using
correctly-computed strides, then it could have overrun its own
buffer in certain cases or produced incorrect JPEG images in others.
Realistically, there is no reason why an application would want to pass
a non-power-of-2 'pad' value to a TurboJPEG API function, so this commit
is about user-proofing the API rather than fixing any known issue.
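The power-of-2 restriction can be illustrated with a minimal sketch. The
PAD() definition below is assumed to match the macro described above;
is_pow2() and aligned_row_size() are hypothetical helpers that exist only
for this example and are not part of the TurboJPEG API.

```c
#include <assert.h>

/* Round v up to the nearest multiple of p.  This bit trick is only valid
   if p is a power of 2. */
#define PAD(v, p)  (((v) + (p) - 1) & (~((p) - 1)))

/* Hypothetical helper: returns 1 if x is a positive power of 2, else 0. */
static int is_pow2(int x)
{
  return x > 0 && (x & (x - 1)) == 0;
}

/* Hypothetical helper: returns the number of bytes per row after applying
   the requested alignment, or -1 if the arguments are invalid. */
static int aligned_row_size(int width, int align)
{
  if (width < 1 || !is_pow2(align))
    return -1;
  return PAD(width, align);
}
```

With this sketch, aligned_row_size(10, 4) yields 12, whereas PAD(4, 3)
evaluates to 4, which is not a multiple of 3. That is the kind of bogus
result that the argument check is meant to prevent.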
(oversight from 2241434eb9)
This fixes segfaults and other issues that occurred when attempting to
decompress a 16-bit lossless JPEG image using color quantization
(which is used when decompressing to a GIF image.)
requant_comp() in transupp.c, a function that supports the jpegtran
-drop option, borrows code from the C quantization function in order to
re-quantize the coefficients from the dropped image. However, the
function does not guard against the possibility that a corrupt source
image could inject quantization table values equal to 0, thus causing a
divide-by-zero error. Since this error affected only jpegtran and not
any of the libraries (the tjTransform() function in the TurboJPEG API
does not expose the image drop feature), it did not represent a security
risk. In fact, this commit does not change the output of jpegtran when
attempting to transform the aforementioned corrupt source image. It
merely eliminates the floating point exception. Like most issues of
this type, however, eliminating the error prevents it from hiding
legitimate security issues that may later be introduced.
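A minimal sketch of this kind of guard follows. requantize() is a
hypothetical function written for this example, not the code in
transupp.c, and rejecting a zero quantization value is an assumption
about how such a guard might behave rather than a description of what
this commit does.

```c
#include <assert.h>

/* Hypothetical re-quantization of a single coefficient.  'coef' was
   quantized with src_q and is converted so that it can be dequantized with
   dst_q.  Returns 1 on success.  Returns 0, leaving *out untouched, if
   either quantization value is 0 (which would otherwise cause a
   divide-by-zero error or a meaningless result.) */
static int requantize(int coef, unsigned short src_q, unsigned short dst_q,
                      int *out)
{
  long temp;

  if (src_q == 0 || dst_q == 0)
    return 0;

  temp = (long)coef * src_q;
  /* Divide with rounding to nearest, handling the sign separately. */
  if (temp < 0) {
    temp = -temp;
    temp += dst_q >> 1;
    temp = -(temp / dst_q);
  } else {
    temp += dst_q >> 1;
    temp /= dst_q;
  }
  *out = (int)temp;
  return 1;
}
```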
Fixes #635
Fixes #636
Unfortunately, iOS builds cannot be used with the iOS simulator on Macs
with Apple silicon CPUs. Even more unfortunately, universal binaries
can only have one slice for each CPU architecture, so it would not be
possible to add a dedicated Arm64 iOS simulator slice to the existing
libjpeg-turbo iOS binaries. (It would be necessary to release a
separate package solely for the iOS simulator.) Because the Arm Neon
SIMD extensions for libjpeg-turbo now use compiler intrinsics when
building with Xcode, it is easy to build libjpeg-turbo from source when
targeting Arm64-based Apple platforms. Thus, for the moment, I have
chosen to document how to avoid the pothole rather than to fill it in.
cjpeg relies on the various file I/O modules to range-limit the input
samples, but no range limiting is performed by the
jpeg_write_scanlines() function itself. With 8-bit samples, that isn't
a problem, because sample values > MAXJSAMPLE will overflow the data
type and wrap around to 0. With 12-bit samples, however, it is possible
to pass sample values < 0 or > 4095 to jpeg_write_scanlines(), which
would cause the RGB-to-YCbCr color converter to underflow or overflow
the RGB-to-YCbCr conversion tables. That issue has existed in libjpeg
all along. This commit mitigates the issue by masking off all but the
lowest 12 bits of each 12-bit input sample prior to using the input
sample value to index the RGB-to-YCbCr conversion tables.
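The masking step can be sketched as follows. MAX_12BIT_SAMPLE and
mask_12bit_sample() are stand-in names for this example, and the sketch
assumes a two's complement representation for negative values.

```c
#include <assert.h>

#define MAX_12BIT_SAMPLE  4095  /* stand-in name; 2^12 - 1 */

/* Keep only the lowest 12 bits of an input sample so that the result is
   always a valid index into a 4096-entry conversion table. */
static int mask_12bit_sample(int sample)
{
  return sample & MAX_12BIT_SAMPLE;
}
```

Note that masking does not clamp. An out-of-range sample such as 4096
wraps around to 0 rather than saturating at 4095, which mirrors the
wraparound behavior described above for 8-bit samples.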
Fixes #633
Because of e8b40f3c2b, we now support
multiple data precisions in the same libjpeg API library, so
BITS_IN_JSAMPLE is no longer used to specify the data precision at build
time. However, some downstream software expects the macro to be
defined. Since 12-bit data precision is an opt-in feature that requires
explicitly calling 12-bit-specific libjpeg API functions and using
12-bit-specific data types, the unmodified portion of the libjpeg API
still behaves as if it were built for 8-bit precision, and JSAMPLE is
still literally an 8-bit data type. Thus, it is correct to externally
define BITS_IN_JSAMPLE to 8 in jconfig.h. Since the build system also
uses BITS_IN_JSAMPLE internally to build both 8-bit and 12-bit versions
of relevant modules, the definition of BITS_IN_JSAMPLE in jconfig.h is
now guarded by #ifndef BITS_IN_JSAMPLE. (Hopefully that doesn't cause
any problems.)
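The guard described above amounts to the following fragment, shown here
as a stand-in for the relevant part of jconfig.h. bits_in_jsample() is a
hypothetical function that exists only to make the fragment testable.

```c
#include <assert.h>

/* If the build system has already defined BITS_IN_JSAMPLE (for instance,
   to 12 when compiling the 12-bit version of a module), then leave that
   definition alone.  Otherwise, report 8-bit precision to downstream
   software. */
#ifndef BITS_IN_JSAMPLE
#define BITS_IN_JSAMPLE  8
#endif

/* Hypothetical helper for illustration only */
static int bits_in_jsample(void)
{
  return BITS_IN_JSAMPLE;
}
```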
Fixes #632
That macro has been defined in jconfig.h since libjpeg-turbo 1.3.x, so
it is possible that some downstream software conditions the use of the
jpeg_mem_*() functions on whether MEM_SRCDST_SUPPORTED is defined or
JPEG_LIB_VERSION >= 80 (as libjpeg-turbo 2.1.x and prior did
internally.)
TJFLAG_LOSSLESS is irrelevant to planar YUV encoding, and setting the
flag caused tjEncode*() to fail with "Invalid lossless parameters"
because tjEncodeYUVPlanes() passes a JPEG quality value of -1 to
setCompDefaults(). This commit modifies setCompDefaults() so that it
takes no action related to the jpegQual parameter unless jpegQual >= 0.
Add a new TurboJPEG C API function (tjDecompressHeader4()) and Java API
method (TJDecompressor.getFlags()) that return the bitwise OR of any
flags that are relevant to the JPEG image being decompressed (currently
TJFLAG_PROGRESSIVE, TJFLAG_ARITHMETIC, TJFLAG_LOSSLESS, and their Java
equivalents.) This allows a calling program to determine whether the
image being decompressed is a lossless JPEG image, which means that the
decompression scaling feature will not be available and that a
full-sized destination buffer should be allocated.
More specifically, this fixes a buffer overrun in TJBench, TJExample,
and the decompress* fuzz targets that occurred when attempting (in vain)
to decompress a lossless JPEG image with decompression scaling enabled.
Lossless: Accommodate LJT colorspace/SIMD exts
In libjpeg-turbo, grayscale_convert() and null_convert() aren't the only
lossless color conversion algorithms. We can also losslessly convert
RGB to and from any of the extended RGB colorspaces, and some platforms
have SIMD-accelerated null color conversion.
This commit also disallows RGB565 output in lossless mode, and it moves
the IsExtRGB() macro from cdjpeg.h to jpegint.h and repurposes it to
make jinit_color_converter() and jinit_color_deconverter() more
readable.
- Rename jpeg_simple_lossless() to jpeg_enable_lossless() and modify the
function so that it stores the lossless parameters directly in the Ss
and Al fields of jpeg_compress_struct rather than using a scan script.
- Move the cjpeg -lossless switch into "Switches for advanced users".
- Document the libjpeg API and run-time features that are unavailable in
lossless mode, and ensure that all parameters, functions, and switches
related to unavailable features are ignored or generate errors in
lossless mode.
- Defer any action that depends on whether lossless mode is enabled
until jpeg_start_compress()/jpeg_start_decompress() is called.
- Document the purpose of the point transform value.
- "Codec" stands for coder/decoder, so it is a bit awkward to say
"lossless compression codec" and "lossless decompression codec".
Use "lossless compressor" and "lossless decompressor" instead.
- Restore backward API/ABI compatibility with libjpeg v6b:
* Move the new 'lossless' field from the exposed jpeg_compress_struct
and jpeg_decompress_struct structures into the opaque
jpeg_comp_master and jpeg_decomp_master structures, and allocate the
master structures in the body of jpeg_create_compress() and
jpeg_create_decompress().
* Remove the new 'process' field from jpeg_compress_struct and
jpeg_decompress_struct and replace it with the old
'progressive_mode' field and the new 'lossless' field.
* Remove the new 'data_unit' field from jpeg_compress_struct and
jpeg_decompress_struct and replace it with a locally-computed
data unit variable.
* Restore the names of macros and fields that refer to DCT blocks, and
document that they have a different meaning in lossless mode. (Most
of them aren't very meaningful in lossless mode anyhow.)
* Remove the new alloc_darray() method from jpeg_memory_mgr and
replace it with an internal macro that wraps the alloc_sarray()
method.
* Move the JDIFF* data types from jpeglib.h and jmorecfg.h into
jpegint.h.
* Remove the new 'codec' field from jpeg_compress_struct and
jpeg_decompress_struct and instead reuse the existing internal
coefficient control, forward/inverse DCT, and entropy
encoding/decoding structures for lossless compression/decompression.
* Repurpose existing error codes rather than introducing new ones.
(The new JERR_BAD_RESTART and JWRN_MUST_DOWNSCALE codes remain,
although JWRN_MUST_DOWNSCALE will probably be removed in
libjpeg-turbo, since we have a different way of handling multiple
data precisions.)
- Automatically enable lossless mode when a scan script with parameters
that are only valid for lossless mode is detected, and document the
use of scan scripts to generate lossless JPEG images.
- Move the sequential and shared Huffman routines back into jchuff.c and
jdhuff.c, and document that those routines are shared with jclhuff.c
and jdlhuff.c as well as with jcphuff.c and jdphuff.c.
- Move MAX_DIFF_BITS from jchuff.h into jclhuff.c, the only place where
it is used.
- Move the predictor and scaler code into jclossls.c and jdlossls.c.
- Streamline register usage in the [un]differencers (inspired by similar
optimizations in the color [de]converters.)
- Restructure the logic in a few places to reduce duplicated code.
- Ensure that all lossless-specific code is guarded by
C_LOSSLESS_SUPPORTED or D_LOSSLESS_SUPPORTED and that the library can
be built successfully if either or both of those macros is undefined.
- Remove all short forms of external names introduced by the lossless
JPEG patch. (These will not be needed by libjpeg-turbo, so there is
no use cleaning them up.)
- Various wordsmithing, formatting, and punctuation tweaks
- Eliminate various compiler warnings.
Fix segfault when decomp lossless JPEG w/ restarts
The predict_process_restart() method in jpeg_lossless_decompressor was
unset, because we now use the start_pass() method in jpeg_inverse_dct
instead. Thus, a segfault occurred when attempting to decompress a
lossless JPEG that contained restart markers.
llvm-rc doesn't like the copyright symbol in the LegalCopyright string.
Possible solutions:
1. Replace the copyright symbol with "(C)".
2. Prefix the string literal with L, thus making it a wide character
string.
3. Pass -c65001 to the resource compiler to enable the UTF-8 code page.
Option 2 is the least disruptive, since it doesn't change the behavior
of the official builds or require any build system modifications.
Closes #627
Regression introduced by 16bd984557 and
5b177b3cab
The pre-computed absolute values used in encode_mcu_AC_first() and
encode_mcu_AC_refine() were stored in a JCOEF (signed short) array.
When attempting to losslessly transform a specially-crafted malformed
12-bit JPEG image with a coefficient value of -32768 into a progressive
12-bit JPEG image, the progressive Huffman encoder attempted to store
the absolute value of -32768 in the JCOEF array, thus overflowing the
16-bit signed data type. Therefore, at this point in the code:
8c5e78ce29/jcphuff.c (L889)
the absolute value was read as -32768, which caused the test at
8c5e78ce29/jcphuff.c (L896)
to fail, falling through to
8c5e78ce29/jcphuff.c (L908)
with an overly large value of r (46) that, when shifted left four
places, incremented, and passed to emit_symbol(), exceeded the maximum
index (255) for the derived code tables. Fortunately, the buffer
overrun was fully contained within phuff_entropy_encoder, so the issue
did not generate a segfault or other user-visible errant behavior, but
it did cause a UBSan failure that was detected by OSS-Fuzz.
This commit introduces an unsigned JCOEF (UJCOEF) data type and uses it
to store the absolute values of DCT coefficients computed by the
AC_first_prepare() and AC_refine_prepare() methods.
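The overflow can be illustrated with a minimal sketch. coef_t and ucoef_t
are stand-ins for JCOEF and UJCOEF, and abs_coef() is a hypothetical
function rather than the code in jcphuff.c.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t coef_t;    /* stand-in for JCOEF (signed 16-bit) */
typedef uint16_t ucoef_t;  /* stand-in for UJCOEF (unsigned 16-bit) */

/* Compute the absolute value of a coefficient.  The computation is done in
   a 32-bit type, because -(-32768) is not representable in 16 signed bits.
   The result (0 to 32768) always fits in 16 unsigned bits. */
static ucoef_t abs_coef(coef_t c)
{
  int32_t v = c;

  if (v < 0)
    v = -v;
  return (ucoef_t)v;
}
```

Storing the same result in a signed 16-bit variable would not preserve
the value 32768, which is how the encoder ended up reading a negative
"absolute value."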
Note that the changes to the Arm Neon progressive Huffman encoder
extensions cause signed 16-bit instructions to be replaced with
equivalent unsigned 16-bit instructions, so the changes should be
performance-neutral.
Based on:
bbf61c0382
Closes #628
- Because of b5a9ef64ea, "by default" is
no longer applicable. (12-bit-per-component JPEG support is now part
of the core libjpeg-turbo functionality and cannot be disabled.)
- Change awkward "can be used to enable the creation of" to less awkward
"can be used to create".
In libjpeg-turbo 2.1.x and prior, the WITH_12BIT CMake variable was used
to enable 12-bit JPEG support at compile time, because the libjpeg API
library could not handle multiple JPEG data precisions at run time. The
initial approach to handling multiple JPEG data precisions at run time
(7fec5074f9) created a whole new API,
library, and applications for 12-bit data precision, so it made sense to
repurpose WITH_12BIT to allow 12-bit data precision to be disabled.
e8b40f3c2b made it so that the libjpeg API
library can handle multiple JPEG data precisions at run time via a
handful of straightforward API extensions. Referring to
6c2bc901e2, it hasn't been possible to
build libjpeg-turbo with both forward and backward libjpeg API/ABI
compatibility since libjpeg-turbo 1.4.x. Thus, whereas we retain full
backward API/ABI compatibility with libjpeg v6b-v8, forward libjpeg
API/ABI compatibility ceased being realistic years ago, so it no longer
makes sense to provide compile-time options that give a false sense of
forward API/ABI compatibility by allowing some (but not all) of our
libjpeg API extensions to be disabled. Such options are difficult to
maintain and clutter the code with #ifdefs.
cjpeg doesn't accept image format arguments other than -targa (it
auto-detects the others), and since Targa images aren't supported with
12-bit precision, we don't need a second pass.
The Gordian knot that 7fec5074f9 attempted
to unravel was caused by the fact that there are several
data-precision-dependent (JSAMPLE-dependent) fields and methods in the
exposed libjpeg API structures, and if you change the exposed libjpeg
API structures, then you have to change the whole API. If you change
the whole API, then you have to provide a whole new library to support
the new API, and that makes it difficult to support multiple data
precisions in the same application. (It is not impossible, as example.c
demonstrated, but using data-precision-dependent libjpeg API structures
would have made the cjpeg, djpeg, and jpegtran source code hard to read,
so it made more sense to build, install, and package 12-bit-specific
versions of those applications.)
Unfortunately, the result of that initial integration effort was an
unreadable and unmaintainable mess, which is a problem for a library
that is an ISO/ITU-T reference implementation. Also, as I dug into the
problem of lossless JPEG support, I realized that 16-bit lossless JPEG
images are a thing, and supporting yet another version of the libjpeg
API just for those images is untenable.
In fact, however, the touch points for JSAMPLE in the exposed libjpeg
API structures are minimal:
- The colormap and sample_range_limit fields in jpeg_decompress_struct
- The alloc_sarray() and access_virt_sarray() methods in
jpeg_memory_mgr
- jpeg_write_scanlines() and jpeg_write_raw_data()
- jpeg_read_scanlines() and jpeg_read_raw_data()
- jpeg_skip_scanlines() and jpeg_crop_scanline()
(This is subtle, but both of those functions use JSAMPLE-dependent
opaque structures behind the scenes.)
It is much more readable and maintainable to provide 12-bit-specific
versions of those six top-level API functions and to document that the
aforementioned methods and fields must be type-cast when using 12-bit
samples. Since that eliminates the need to provide a 12-bit-specific
version of the exposed libjpeg API structures, we can:
- Compile only the precision-dependent libjpeg modules (the
coefficient buffer controllers, the colorspace converters, the
DCT/IDCT managers, the main buffer controllers, the preprocessing
and postprocessing controller, the downsampler and upsamplers, the
quantizers, the integer DCT methods, and the IDCT methods) for
multiple data precisions.
- Introduce 12-bit-specific methods into the various internal
structures defined in jpegint.h.
- Create precision-independent data type, macro, method, field, and
function names that are prefixed by an underscore, and use an
internal header to convert those into precision-dependent data
type, macro, method, field, and function names, based on the value
of BITS_IN_JSAMPLE, when compiling the precision-dependent libjpeg
modules.
- Expose precision-dependent jinit*() functions for each of the
precision-dependent libjpeg modules.
- Abstract the precision-dependent libjpeg modules by calling the
appropriate precision-dependent jinit*() function, based on the
value of cinfo->data_precision, from top-level libjpeg API
functions.
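The general idea (one precision-independent body, multiple
precision-dependent instances, and a run-time dispatch based on the data
precision) can be sketched as follows. This is a simplified illustration
that uses a function-generating macro rather than compiling a module
multiple times with an internal header, and all of the names in it are
hypothetical.

```c
#include <assert.h>

typedef unsigned char sample8_t;  /* stand-in for the 8-bit sample type */
typedef short sample12_t;         /* stand-in for the 12-bit sample type */

/* Generate a precision-dependent function from a single
   precision-independent body. */
#define DEFINE_RANGE_LIMIT(name, type, maxval) \
  static type name(int v) \
  { \
    if (v < 0) v = 0; \
    if (v > (maxval)) v = (maxval); \
    return (type)v; \
  }

DEFINE_RANGE_LIMIT(range_limit8, sample8_t, 255)
DEFINE_RANGE_LIMIT(range_limit12, sample12_t, 4095)

/* Top-level function that selects the appropriate precision-dependent
   function, based on the data precision chosen at run time */
static int range_limit(int data_precision, int v)
{
  if (data_precision == 12)
    return range_limit12(v);
  else
    return range_limit8(v);
}
```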
By default, libjpeg-turbo 1.3.x and later have enabled the in-memory
source/destination manager functions from libjpeg v8 when emulating the
libjpeg v6b or v7 API/ABI, which has allowed operating system
distributors to provide those functions without adopting the
backward-incompatible libjpeg v8 API/ABI.
Prior to libjpeg-turbo 1.5.x, it made sense to allow users to disable
the in-memory source/destination manager functions at build time and
thus retain both backward and forward API/ABI compatibility relative to
libjpeg v6b or v7. Since then, however, we have introduced several new
libjpeg API functions that break forward API/ABI compatibility, so it no
longer makes sense to allow the in-memory source/destination managers to
be disabled. libjpeg-turbo only claims to be
backward-API/ABI-compatible, i.e. to allow applications built against
libjpeg or an older version of libjpeg-turbo to work properly with the
current version of libjpeg-turbo.
- Fix an issue whereby a build with ENABLE_SHARED=0 could not be
installed when using the Ninja Multi-Config CMake generator.
- Fix an issue whereby a Windows installer could not be built when using
the Ninja Multi-Config CMake generator.
- Fix an issue whereby the Java regression tests failed when using the
Ninja Multi-Config CMake generator.
Based on:
4f169deeb0
Closes #626
... on platforms that support TLS, which should include all
currently-supported platforms
(https://libjpeg-turbo.org/Documentation/OfficialBinaries)
Addresses a concern raised in #87
Although it is still my opinion that the data race in init_simd() was
innocuous, we can now fix it for free thanks to
ae87a95861, so why not?
This commit reverts 4dbc293125 and
9f8f683e74 (the previous two commits) and
fixes #613 the correct way. The crux of the issue wasn't the size of
the whole_image virtual array but rather that, since last_iMCU_row is
unsigned, (last_iMCU_row - 1) wrapped around to 0xFFFFFFFF when
last_iMCU_row was 0. This caused the interblock smoothing algorithm
introduced in 6d91e950c8 to erroneously
try to access the next two iMCU rows, neither of which existed. The
first attempt at a fix (4dbc293125)
exposed a NULL dereference, detected by OSS-Fuzz, that occurred when
attempting to decompress a specially-crafted malformed JPEG image to a
YUV buffer using tjDecompressToYUV*() with 1/4 IDCT scaling.
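The wraparound can be reproduced with a minimal sketch. Both functions
below are hypothetical and are not the code in decompress_smooth_data().
They only demonstrate why a comparison against (last_row - 1) is unsafe
when last_row is unsigned and equal to 0.

```c
#include <assert.h>
#include <limits.h>

/* Naive test: is 'row' before the second-to-last row?  When last_row is 0,
   (last_row - 1) wraps around to UINT_MAX, so this returns 1 for nearly
   every value of 'row'. */
static int naive_before_penultimate(unsigned int row, unsigned int last_row)
{
  return row < last_row - 1;
}

/* Safe test: rule out last_row == 0 before subtracting. */
static int safe_before_penultimate(unsigned int row, unsigned int last_row)
{
  return last_row > 0 && row < last_row - 1;
}
```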
Fixes #613 (again)
Also fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=49898
Regression introduced by 6d91e950c8
Because we're now using a 5x5 smoothing window when decompressing
progressive JPEG images, we need to ensure that the whole_image virtual
array contains at least five rows. Previously that was not always the
case unless the progressive JPEG image being decompressed had at least
five iMCU rows. Since an iMCU has a height of (8 * the vertical
sampling factor), attempting to decompress 4:2:2 and 4:4:4 images <= 32
pixels in height or 4:2:0 images <= 64 pixels in height triggered a
JERR_BAD_VIRTUAL_ACCESS error in decompress_smooth_data(), because
access_rows exceeded the number of rows in the virtual array.
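The arithmetic behind those height limits can be sketched as follows.
imcu_rows() and has_five_imcu_rows() are hypothetical helpers, and
BLOCK_SIZE is a stand-in for the DCT block size of 8.

```c
#include <assert.h>

#define BLOCK_SIZE  8  /* stand-in for the DCT block size */

/* Number of iMCU rows in an image, given its height in pixels and its
   vertical sampling factor (1 for 4:4:4 and 4:2:2, 2 for 4:2:0) */
static int imcu_rows(int image_height, int v_samp_factor)
{
  int imcu_height = BLOCK_SIZE * v_samp_factor;

  return (image_height + imcu_height - 1) / imcu_height;
}

/* Returns 1 if the image has at least the five iMCU rows mentioned above. */
static int has_five_imcu_rows(int image_height, int v_samp_factor)
{
  return imcu_rows(image_height, v_samp_factor) >= 5;
}
```

For example, a 32-pixel-high 4:4:4 image has only four iMCU rows, as does
a 64-pixel-high 4:2:0 image, whereas adding one more pixel row to either
image produces a fifth iMCU row.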
Fixes #613
libjpeg-turbo's AltiVec SIMD extensions previously assumed that AltiVec
instructions were available on all Power Macs that supported OS X 10.4
"Tiger" (the earliest version of OS X that libjpeg-turbo has ever
supported), but Tiger can actually run on PowerPC G3 processors, which
lack AltiVec instructions. This commit enables run-time detection of
AltiVec instructions on OS X/PowerPC systems if AltiVec instructions are
not force-enabled at compile time (using -maltivec). This allows the
same build of libjpeg-turbo to support G3, G4, and G5 Power Macs.
Closes #609
The macros in jerror.h refer to j_common_ptr, so it is unfortunately
necessary to introduce a 12-bit-specific version of that header file
(j12error.h) with 12-bit specific ERREXIT*(), WARNMS*(), and
TRACEMS*() macros. (The message table is still shared between 8-bit and
12-bit implementations.)
Fixes #607
Although it is uncommon, some downstream implementations undefine one or
more of the *_SUPPORTED macros in jmorecfg.h in order to reduce the size
of the library. In the interest of maintaining backward compatibility
with libjpeg, this is still a supported use case.
Regression introduced by 9120a24743
Based on:
74c4d032f0
Closes #601
MinGW defines strcasecmp() and strncasecmp() as macros in string.h if
__CRT__NO_INLINE is defined, which will be the case when including any
of the Win32 API headers.
Closes #594
Referring to https://github.com/google/oss-fuzz/issues/7575, if the
fuzzer suffix contains periods, it can cause ClusterFuzz to misinterpret
the file extension of the fuzzer executables and thus misidentify them.
When building without the SIMD extensions, memory allocations are
currently aligned to sizeof(double). However, this may be insufficient
on architectures such as Arm Morello or 64-bit CHERI-RISC-V where
pointers require 16-byte rather than 8-byte alignment. This patch
causes memory allocations to be aligned to
MAX(sizeof(void *), sizeof(double)) when building without the SIMD
extensions.
(NOTE: With C11 we could instead use alignof(max_align_t), but C89
compatibility is still necessary in libjpeg-turbo.)
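A minimal C89-compatible sketch of that alignment rule follows. The macro
and function names are stand-ins for this example and are not necessarily
the names used in the memory manager.

```c
#include <assert.h>
#include <stddef.h>

#define MAX(a, b)  ((a) > (b) ? (a) : (b))

/* Alignment used when no SIMD extensions dictate a larger value.  On most
   platforms, this is 8, but it is 16 on architectures with 16-byte
   pointers. */
#define ALIGN_SIZE  MAX(sizeof(void *), sizeof(double))

/* Round an allocation size up to the nearest multiple of ALIGN_SIZE. */
static size_t round_up_to_align(size_t size)
{
  return (size + ALIGN_SIZE - 1) / ALIGN_SIZE * ALIGN_SIZE;
}
```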
Closes #587
Referring to the conversation in
https://github.com/google/oss-fuzz/issues/7479 and #559, there was a
misunderstanding regarding how CIFuzz works. It cannot be used to fuzz
arbitrary PRs or code branches, and it has a 90-day delay in downloading
corpora from OSS-Fuzz. That makes it unsuitable for libjpeg-turbo.
With x86-64 builds, the default value of FLOATTEST works with both the
8-bit-per-sample and 12-bit-per-sample flavors of the libjpeg API
library. However, that is not the case with x86 builds. Thus, we need
separate 8-bit-per-sample and 12-bit-per-sample FLOATTEST variables.
Newer versions of the 32-bit x86 Visual Studio compiler produce results
compatible with FLOATTEST=no-fp-contract, so we can no longer
intelligently set a default FLOATTEST value for that platform.
When 12-bit-per-component JPEG support is enabled (WITH_12BIT=1) or the
TurboJPEG API library and associated test programs are disabled
(WITH_TURBOJPEG=0), the Windows installer target should not depend on
the turbojpeg, turbojpeg-static, and tjbench targets.
(broken by 607b668ff9)
- Visual Studio 2010 apparently doesn't have the snprintf() inline
function, so restore the macro that emulates that function using
_snprintf_s().
- Explicitly include errno.h in strtest.c, since jinclude.h doesn't
include it when building with Visual Studio.
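The snprintf() emulation can be sketched as follows. The _MSC_VER
threshold (1900, i.e. Visual Studio 2015) is an assumption about where
such a guard would be drawn, and format_dims() is a hypothetical function
that exists only to exercise the macro.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Older Visual Studio releases lack snprintf(), so emulate it using
   _snprintf_s().  Other compilers use the standard function. */
#if defined(_MSC_VER) && _MSC_VER < 1900
#define snprintf(str, n, format, ...) \
  _snprintf_s(str, n, _TRUNCATE, format, ##__VA_ARGS__)
#endif

/* Hypothetical example: format image dimensions into a caller-supplied
   buffer. */
static int format_dims(char *buf, size_t size, int width, int height)
{
  return snprintf(buf, size, "%dx%d", width, height);
}
```

One caveat with this approach is that the return values differ when the
output is truncated. _snprintf_s() returns -1 in that case, whereas
standard snprintf() returns the length that would have been written.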
- Remove the section in libjpeg.txt that advised against building
libjpeg as a shared library. We obviously do not follow that advice,
and libjpeg-turbo does guarantee backward ABI compatibility in our
libjpeg API library, even though libjpeg did not and does not.
(Future expansion of our libjpeg API library, if necessary, will be
accomplished using get/set functions that store the new parameters
in the opaque master structs. Refer to
db2986c96f.)
- Unmention install.txt, which was never relevant to libjpeg-turbo and
was removed in v1.3 (6f96153c67).
- Remove extraneous spaces.
- Document the fact that TWO_FILE_COMMANDLINE must be defined in order
to use the two-file interface with cjpeg, djpeg, jpegtran, and
wrjpgcom. libjpeg-turbo never enables that interface by default.
People keep trying to include libjpeg-turbo into downstream CMake-based
build systems by way of the add_subdirectory() function and requesting
upstream support when something inevitably breaks.
(Refer to: #122, #173, #176, #202, #241, #349, #353, #412, #504,
a3d4aadd0d (commitcomment-67575889)).
libjpeg-turbo has never supported that method of sub-project
integration, because doing so would require that we (minimally):
1. avoid using certain CMake variables, such as CMAKE_SOURCE_DIR,
CMAKE_BINARY_DIR, and CMAKE_PROJECT_NAME;
2. avoid using implicit include directories and relative paths;
3. provide a way to optionally skip the installation of libjpeg-turbo
components in response to 'make install';
4. provide a way to optionally postfix target names, to avoid namespace
conflicts;
5. restructure the top-level CMakeLists.txt so that it properly sets
the PROJECT_VERSION variable; and
6. design automated CI tests to ensure that new commits don't break
any of the above.
Even if we did all of that, issues would still arise, because it is
impossible for one upstream build system to anticipate the widely
varying needs of every downstream build system. That's why the CMake
ExternalProject_Add() function exists, and it is my sincere hope that
adding a blurb to BUILDING.md mentioning the need to use that function
will head off future GitHub issues on this topic. If not, then I can at
least post a link to this commit and the blurb and avoid doing the same
song and dance over and over again.
The loop in jsimd_quantize_neon() is only executed twice and should be
unrolled for AArch64 targets. GCC does that by default, but Clang 11
and later versions available at the time of this writing do not. This
patch adds an unroll pragma when targeting AArch64 with Clang. We do
not use the unroll pragma for AArch32 targets, because it causes the
Clang-generated assembly code to exhaust the available Neon registers
(32 x 64-bit) and spill to the stack. (DRC: Referring to the discussion
in #570, this is likely due to compiler confusion that results in poor
register allocation. It is possible to eliminate the spillage and
reduce the instruction count by loading the data on a just-in-time
basis, thus explicitly interleaving compute and I/O, but the performance
implications of that are currently unknown.)
The effects of unrolling the quantization loop are:
1) elimination of the loop control flow overhead and
2) enabling the use of LDP/STP instructions that work from a single
base pointer, instead of using double the number of LDR/STR
instructions, each requiring an address calculation.
Closes #570
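(Illustration only: a scalar sketch of how an unroll pragma can be
limited to Clang on AArch64, as described above. scale_block() and
NUM_COEFS are hypothetical; the real function uses Neon intrinsics, and
the loop body here is only a stand-in for the quantization arithmetic.)

```c
#define NUM_COEFS  64  /* hypothetical: coefficients in one 8x8 block */

/* Multiply every coefficient by 'factor', one half-block per outer
   iteration, so that the outer loop executes exactly twice. */
static void scale_block(int *coefs, int factor)
{
  int half, k;

  /* Request unrolling only where the text above says it helps.  Other
     compilers and targets never see the pragma. */
#if defined(__clang__) && defined(__aarch64__)
#pragma unroll
#endif
  for (half = 0; half < 2; half++) {
    for (k = 0; k < NUM_COEFS / 2; k++)
      coefs[half * (NUM_COEFS / 2) + k] *= factor;
  }
}
```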
- Suppress a UBSan warning regarding storing a 64-bit value to a
non-64-bit-aligned address. That behavior is technically undefined
per the C spec but is supported in the context of the AArch64
architecture and compilers.
- Explicitly promote block_diff[i] to unsigned int prior to left
shifting it, in order to avoid a UBSan warning. This warning also
described behavior that is technically undefined per the C spec but is
supported in the context of the AArch64 architecture and compilers.
Changing the type cast order eliminated the warning without changing
the generated assembly code.
Closes #582
- Make better use of 128-bit vector registers, thus reducing the number
of Neon instructions required to construct the AC coefficient bitmap.
- Refactor the Neon computations of 'nbits' and 'diff' to use shorter
and higher-throughput instruction sequences.
DRC's notes:
This commit partially integrates #570. Arm reported a 1-4% speedup on
Cortex-A55 and Neoverse-N1 cores when using recent compilers but little
or no speedup with Clang 10. I observed no speedup with Clang 10 on my
Cortex-A53 and Cortex-A72 cores. Thus, referring to #582, the primary
purpose of this commit is to fix UBSan warnings regarding the shift
operations previously located at Line 253:
d640a45730/simd/arm/aarch64/jchuff-neon.c (L253)
The primary purpose of this is to encourage adoption of libjpeg-turbo in
downstream Windows projects that forbid the use of "deprecated"
functions. libjpeg-turbo's usage of those functions was not actually
unsafe, because:
- libjpeg-turbo always checks the return value of fopen() and ensures
that a NULL filename can never be passed to it.
- libjpeg-turbo always checks the return value of getenv() and never
passes a NULL argument to it.
- The sprintf() calls in format_message() (jerror.c) could never
overflow the destination string buffer or leave it unterminated as
long as the buffer was at least JMSG_LENGTH_MAX bytes in length, as
instructed. (Regardless, this commit replaces those calls with
snprintf() calls.)
- libjpeg-turbo never uses sscanf() to read strings or multi-byte
character arrays.
- Because of b7d6e84d6a, wrjpgcom
explicitly checks the bounds of the source and destination strings
before calling strcat() and strcpy().
- libjpeg-turbo always ensures that the destination string is
terminated when using strncpy().
(548490fe5e made this explicit.)
Regarding thread safety:
Technically speaking, getenv() is not thread-safe, because the returned
pointer may be invalidated if another thread sets the same environment
variable between the time that the first thread calls getenv() and the
time that that thread uses the return value. In practice, however, this
could only occur with libjpeg-turbo if:
(1) A multithreaded calling application used the deprecated and
undocumented TJFLAG_FORCEMMX/TJFLAG_FORCESSE/TJFLAG_FORCESSE2 flags in
the TurboJPEG API or set one of the corresponding environment variables
(which are only intended for testing purposes.) Since the TurboJPEG API
library only ever passed string constants to putenv(), the only inherent
risk (i.e. the only risk introduced by the library and not the calling
application) was that the SIMD extensions may have read an incorrect
value from one of the aforementioned environment variables.
or
(2) A multithreaded calling application modified the value of the
JPEGMEM environment variable in one thread while another thread was
reading the value of that environment variable (in the body of
jpeg_create_compress() or jpeg_create_decompress().) Given that the
libjpeg API provides a thread-safe way for applications to modify the
default memory limit without using the JPEGMEM environment variable,
direct modification of that environment variable by calling applications
is not supported.
Microsoft's implementation of getenv_s() does not claim to be
thread-safe either, so this commit uses getenv_s() solely to mollify
Visual Studio. New inline functions and macros (GETENV_S() and
PUTENV_S()) wrap getenv_s()/_putenv_s() when building for Visual Studio
and getenv()/setenv() otherwise, but GETENV_S()/PUTENV_S() provide no
advantages over getenv()/setenv() other than parameter validation. They
are implemented solely for convenience.
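(Illustration only: a minimal sketch of a getenv() wrapper that validates
its parameters and copies the value into a caller-supplied buffer.
getenv_copy() is a hypothetical name and signature chosen for this
example. It is not the GETENV_S() implementation, and like the wrappers
described above, it does not make getenv() thread-safe.)

```c
#include <stdlib.h>
#include <string.h>

/* Copy the value of environment variable 'name' into 'buffer'.
   Returns 0 on success (an unset variable yields an empty string) and
   -1 if the arguments are invalid or the value does not fit. */
static int getenv_copy(char *buffer, size_t buffer_size, const char *name)
{
  const char *value;
  size_t len;

  if (!buffer || buffer_size == 0 || !name)
    return -1;
  buffer[0] = '\0';

  value = getenv(name);
  if (!value)
    return 0;

  len = strlen(value);
  if (len + 1 > buffer_size)
    return -1;
  memcpy(buffer, value, len + 1);  /* includes the terminating NUL */
  return 0;
}
```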
Technically speaking, strerror() is not thread-safe, because the
returned pointer may be invalidated if another thread changes the locale
and/or calls strerror() between the time that the first thread calls
strerror() and the time that that thread uses the return value. In
practice, however, this could only occur with libjpeg-turbo if a
multithreaded calling application encountered a file I/O error in
tjLoadImage() or tjSaveImage(). Since both of those functions
immediately copy the string returned from strerror() into a thread-local
buffer, the risk is minimal, and the worst case would involve an
incorrect error string being reported to the calling application.
Regardless, this commit uses strerror_s() in the TurboJPEG API library
when building for Visual Studio. Note that strerror_r() could have been
used on Un*x systems, but it would have been necessary to handle both
the POSIX and GNU implementations of that function and perform
widespread compatibility testing. Such is left as an exercise for
another day.
Fixes #568
- Since the ERREXITS() and TRACEMSS() macros are never used internally
(they are a relic of the legacy memory managers that libjpeg
provided), the only risk was that an external program might have
invoked one of those macros with a string longer than 79 characters
(JMSG_STR_PARM_MAX - 1).
- TJBench never invokes the THROW_TJ() macro with a string longer than
199 (JMSG_LENGTH_MAX - 1) characters, so there was no risk. However,
it's a good idea to explicitly terminate the destination strings so
that anyone looking at the code can immediately tell that it is safe.
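(Illustration only: a minimal sketch of explicitly terminating a
strncpy() destination, as described above. STR_PARM_MAX and
copy_str_parm() are hypothetical names. The value 80 is chosen to match
the 79-character limit mentioned in the text.)

```c
#include <string.h>

#define STR_PARM_MAX  80  /* hypothetical stand-in for JMSG_STR_PARM_MAX */

/* Copy at most STR_PARM_MAX - 1 characters and always terminate.
   strncpy() alone leaves 'dst' unterminated if 'src' is too long. */
static void copy_str_parm(char dst[STR_PARM_MAX], const char *src)
{
  strncpy(dst, src, STR_PARM_MAX);
  dst[STR_PARM_MAX - 1] = '\0';
}
```

With the explicit terminator, a reader can tell that the copy is safe
without having to audit every call site.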
The h2v2 (4:2:0) merged upsampler uses a spare row buffer so that it can
upsample two rows at a time but return only one row to the application,
if necessary. merged_2v_upsample() copies from this spare row buffer
into the application-supplied output buffer, using the out_row_width
field in the my_merged_upsampler struct to determine how many samples to
copy. out_row_width is set in jinit_merged_upsampler(), which is called
within the body of jpeg_start_decompress(). Since jpeg_crop_scanline()
must be called after jpeg_start_decompress(), jpeg_crop_scanline() must
modify the value of out_row_width if the h2v2 merged upsampler will be
used. Otherwise, merged_2v_upsample() can overflow the output buffer if
the number of bytes between the current output buffer position and the
end of the buffer is less than the number of bytes required to represent
an uncropped scanline of the output image. All of the destination
managers used by djpeg allocate either a whole image buffer or a
scanline buffer based on the uncropped output image width, so this issue
is not reproducible using djpeg.
Fixes #574
libjpeg-turbo has never supported non-ANSI C compilers. Per the spec,
ANSI C compilers must have locale.h, stddef.h, stdlib.h, memset(),
memcpy(), unsigned char, and unsigned short. They must also handle
undefined structures.
If NEON_INTRINSICS=0, then run the GAS sanity check from libjpeg-turbo
2.0.x and force-enable NEON_INTRINSICS if the test fails. This fixes
the AArch32 build when using Clang 6.0 on Linux or the Clang toolchain
in the Android NDK r15*-r16*. It also prevents users from manually
disabling NEON_INTRINSICS if doing so would break the build (such as
with Xcode 5.)
- Use check_c_source_compiles() rather than check_symbol_exists() to
detect the presence of vld1_s16_x3(), vld1_u16_x2(), and
vld1q_u8_x4(). check_symbol_exists() is unreliable for detecting
intrinsics, and in practice, it did not detect the presence of the
aforementioned intrinsics in versions of GCC that support them.
- Set DEFAULT_NEON_INTRINSICS=0 for GCC < 12, even if the aforementioned
intrinsics are available. The AArch64 back end in GCC 10 and 11
supports the necessary intrinsics, but the GAS implementation is still
faster when using those compilers.
Fixes #547
- Use JERR_NOTIMPL ("Not implemented yet") rather than JERR_NOT_COMPILED
("Requested feature was omitted at compile time") to indicate that
arithmetic coding is incompatible with Huffman table optimization.
This is more consistent with other parts of the libjpeg API code.
JERR_NOT_COMPILED is typically used to indicate that a major feature
was not compiled in, whereas JERR_NOTIMPL is typically used to
indicate that two features were compiled in but are incompatible with
each other (such as, for instance, two-pass color quantization and
partial image decompression.)
- Change the text of JERR_NOTIMPL to "Requested features are
incompatible". This is a more accurate description of the situation.
"Not implemented yet" implies that it may be possible to support the
requested combination of features in the future, but that is not true
in most of the cases where JERR_NOTIMPL is used.
Fixes #567
(regression introduced by aa7459050d)
cjpeg sets cinfo.in_color_space to JCS_RGB as an "arbitrary guess."
Since tjLoadImage() never uses JCS_RGB, the PGM reader should treat
JCS_RGB the same as JCS_UNKNOWN.
Fixes #566
This fixes an oversight from the integration of the arithmetic entropy
codec from libjpeg (66f97e6820). I chose
to integrate the latest implementation available at the time, which was
from jpeg-8b. However, I naively replaced cinfo->lim_Se with
DCTSIZE2 - 1, not realizing that-- because of SmartScale-- jpeg-8b
contains additional code
(https://github.com/libjpeg-turbo/libjpeg-turbo/blob/jpeg-8b/jdinput.c#L249-L334)
that guards against illegal values of cinfo->Se >= DCTSIZE2. Thus,
libjpeg-turbo's implementation of arithmetic decoding has never guarded
against such illegal values. This commit restores the relevant check
from the original jpeg-6b arithmetic entropy codec patch ("jpeg-ari",
1e247ac854).
Fixes #564
CIFuzz runs the project's fuzzers for a limited period of time any time
a commit is pushed or a PR is submitted. This is not intended to
replace OSS-Fuzz but rather to allow us to more quickly catch some
fuzzing failures, including fuzzer build regressions like the one
introduced in ecf021bc0d.
Closes #559
Recent FreeBSD/PowerPC compilers, such as Clang 11.0.x on FreeBSD 13, do
the equivalent of passing -maltivec to the compiler by default, so
run-time AltiVec detection is unnecessary. However, it becomes
necessary when using other compilers or when passing -mno-altivec to the
compiler.
Closes #552
- Don't check for exceptions immediately after invoking the
GetPrimitiveArrayCritical() method. That method does not throw
exceptions, and checking for them caused -Xcheck:jni to warn about
calling other JNI functions in the scope of
Get/ReleasePrimitiveArrayCritical().
- Check for exceptions immediately after invoking the
CallStaticObjectMethod() method in the PROP2ENV() macro.
- Don't use the Get/ReleasePrimitiveArrayCritical() methods for small
arrays. -Xcheck:jni didn't complain about that, but there is no
performance advantage to using those methods rather than the
Get*ArrayRegion() methods for small arrays, and using
Get*ArrayRegion() makes the code less error-prone.
- Don't release the source/destination planes arrays in the YUV methods
until after the corresponding C TurboJPEG functions have returned.
Referring to https://cmake.org/cmake/help/latest/policy/CMP0065.html,
CMake 3.3 and earlier automatically added compiler/linker flags such as
-rdynamic/-export-dynamic, which caused symbols to be exported from
executables. The primary purpose of this is to allow plugins loaded via
dlopen() to access symbols from the calling program. libjpeg-turbo
does not need this functionality, and enabling it needlessly increases
the size of the libjpeg-turbo executables.
Setting CMP0065 to NEW when using CMake 3.4 and later prevents CMake
from automatically adding the aforementioned compiler/linker flags
unless the ENABLE_EXPORTS property is set for a target (or the
CMAKE_ENABLE_EXPORTS variable is set, which causes ENABLE_EXPORTS to be
set for all targets.)
Closes #554
When building for 32-bit Arm platforms, test whether basic Neon
intrinsics will compile with the specified compiler and C flags. This
prevents the build system from enabling the Neon SIMD extensions when
targeting Armv6 and other legacy architectures that do not support Neon
instructions.
Regression introduced by bbd8089297.
(Checking whether gas-preprocessor.pl was needed for 32-bit Arm builds
had the effect of checking whether Neon instructions were supported.)
Fixes #553
We would have been perfectly happy for AppVeyor to delete all artifacts
other than those from the latest builds in each branch. Instead, they
chose to change the global artifact retention policy to 1 month. In
addition to being unworkable, that new policy uses more storage than the
policy we requested. Lose/lose, so we'll just deploy to S3 like we do
with other platforms.
This issue was introduced in 5557fd2217
due to an oversight, so it has existed in libjpeg-turbo since the
project's inception. However, the issue is effectively a non-issue.
Although #325 proposes allowing programs to override jpeg_get_*() and
jpeg_free_*() externally, there is currently no way to override those
functions without modifying the libjpeg-turbo source code.
libjpeg-turbo only includes the malloc()/free() memory manager from
libjpeg, and the implementation of jpeg_free_*() in that memory manager
ignores the size argument. libjpeg had several additional memory
managers for legacy systems (MS-DOS, System 7, etc.), but those memory
managers ignored the size argument to jpeg_free_*() as well. Thus, this
issue would have only potentially affected custom memory managers in
downstream libjpeg-turbo forks, and since no one has complained until
now, apparently those are rare.
Fixes #542
Attempting to losslessly transform certain malformed JPEG images can
cause the nbits table index in the Huffman encoder to exceed 32768, so
we need to pad the SSE2 implementation of that table to 65536 entries as
we do with the C implementation.
Regression introduced by 087c29e07f
Fixes #543
This is the same error that d147be83e9
suppressed in decode_mcu_slow(). The image that reproduces this error
in decode_mcu_fast() has been added to the libjpeg-turbo seed corpora.
Closes #537
Although sizeof(void *) == sizeof(size_t) for all architectures that are
currently supported by libjpeg-turbo, such is not guaranteed by the C
standard. Specifically, CHERI-enabled architectures (e.g. CHERI-RISC-V
or Arm's Morello) use capability pointers that are twice the size of
size_t (128 bits for Morello and RV64), so casting to size_t strips the
upper bits of the pointer (including the validity bit) and makes it
non-dereferenceable, as indicated by the following compiler warning:
    warning: cast from provenance-free integer type to pointer type will
    give pointer that can not be dereferenced
    [-Werror,-Wcheri-capability-misuse]
    cvalue = values = (JCOEF *)PAD((size_t)values_unaligned, 16);
Ignoring this warning results in a run-time crash. Casting pointers to
uintptr_t, if it is available, avoids this problem, since uintptr_t is
defined as an unsigned integer type that can hold a pointer value.
Since C89 compatibility is still necessary in libjpeg-turbo, this commit
introduces a new typedef for pointer-to-integer casts that uses a
GNU-specific extension available in GCC 4.6+ and Clang 3.0+ and falls
back to using size_t if the extension is unavailable. The only other
options would require C99 or Clang-specific builtins.
Closes #538
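(Illustration only: a minimal sketch of a pointer-to-integer typedef with
a size_t fallback, as described above. ptr_int_t and align_ptr16() are
hypothetical names. __UINTPTR_TYPE__ is the macro predefined by GCC and
Clang, and the sketch assumes that it is the GNU-specific extension to
which the text refers.)

```c
#include <stddef.h>

/* Prefer the compiler-provided integer type that can hold a pointer and
   fall back to size_t if the compiler does not provide one. */
#ifdef __UINTPTR_TYPE__
typedef __UINTPTR_TYPE__ ptr_int_t;
#else
typedef size_t ptr_int_t;
#endif

/* Round v up to the next multiple of p (p must be a power of 2.) */
#define PAD(v, p)  (((v) + (p) - 1) & (~((p) - 1)))

/* Return the first 16-byte-aligned address at or after 'unaligned'. */
static void *align_ptr16(void *unaligned)
{
  return (void *)PAD((ptr_int_t)unaligned, 16);
}
```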
'buffer' is both passed into the inline assembly code and modified by
it. See https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html, 6.47.2.3.
With GCC 4, this commit does not change the generated assembly code at
all.
With GCC 8, this commit fixes an assembly error:
    /tmp/{foo}.s: Assembler messages:
    /tmp/{foo}.s:775: Error: registers may not be the same --
    `str r9,[r9],#4'
I'm not sure why that error went unnoticed, since I definitely
benchmarked the previous commit with GCC 8. Anyhow, this commit changes
the generated assembly code slightly but does not alter performance.
With Clang 10, this commit changes the generated assembly code slightly
but does not alter performance.
Refer to #529
Referring to the C standard
(http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf,
J.2 Undefined behavior), the behavior of the compiler is undefined if
"conversion between two pointer types produces a result that is
incorrectly aligned." Thus, the behavior of this code
    *((uint32_t *)buffer) = BUILTIN_BSWAP32(put_buffer);
in the AArch32 version of the FLUSH() macro is undefined unless 'buffer'
is 32-bit-aligned. Referring to
https://bugs.llvm.org/show_bug.cgi?id=50785, certain versions of Clang,
when generating Thumb (T32) instructions, miscompile that code into an
assembly instruction (stm) that requires the destination to be
32-bit-aligned. Since such alignment cannot be guaranteed within the
Huffman encoder, this reportedly led to crashes (SIGBUS: illegal
alignment) with AArch32/Thumb builds of libjpeg-turbo running on Android
devices, although thus far I have been unable to reproduce those crashes
with a plain Linux/Arm system.
The miscompilation is visible with the Compiler Explorer:
https://godbolt.org/z/rv1ccx1Pb
However, it goes away when removing the return statement from the
function. Thus, it seems that Clang's behavior in this regard is
somewhat variable, which may explain why the crashes are only
reproducible on certain platforms.
The suggested workaround is to use memcpy(), but whereas Clang and
recent GCC releases are smart enough to compile a 4-byte memcpy() call
into a str instruction, GCC < 6 is not. Referring to
https://godbolt.org/z/ae7Wje3P6, the only way to consistently produce
the desired str instruction across all supported compilers is to use
inline assembly. Visual C++ presumably does not miscompile the code in
question, since no issues have been reported with it, but since the code
relies on undefined compiler behavior, prudence dictates that
e4ec23d7ae should be reverted for Visual
C++, which this commit does. The performance impact of
e4ec23d7ae for Visual C++/Arm builds is
unknown (I have no ability to test such builds), but regardless, this
commit reverts the Visual C++/Arm performance to that of libjpeg-turbo
2.1 beta1.
Closes #529
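(Illustration only: a minimal sketch of the memcpy() workaround mentioned
above, which avoids the misaligned pointer cast. It is not what this
commit does. The commit uses inline assembly because, as noted, GCC < 6
does not compile the memcpy() call into a single str instruction.
store_u32_unaligned() is a hypothetical name.)

```c
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value at an address that may not be 32-bit-aligned and
   return the advanced write pointer.  memcpy() has defined behavior for
   any destination alignment, unlike *((uint32_t *)buffer) = value. */
static unsigned char *store_u32_unaligned(unsigned char *buffer,
                                          uint32_t value)
{
  memcpy(buffer, &value, sizeof(value));
  return buffer + sizeof(value);
}
```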
This resolves a conflict between the RPM generated by the libjpeg-turbo
build system and the Red Hat 'filesystem' RPM if
CMAKE_INSTALL_LIBDIR=/usr/lib[64]. This code was largely borrowed from
the VirtualGL RPM spec. (I can legally do that because I hold the
copyright on VirtualGL's implementation.)
Fixes #532
Arm compilers have three floating point ABI options:
'soft' compiles floating point operations as function calls into a
software floating point library, which emulates floating point
operations using integer operations. Floating point function arguments
are passed using integer registers.
'softfp' also compiles floating point operations as function calls into
a floating point library and passes floating point function arguments
using integer registers, but the floating point library functions can
use FPU instructions if the CPU supports them.
'hard' compiles floating point operations into inline FPU instructions,
similarly to x86 and other architectures, and passes floating point
function arguments using FPU registers.
Not all AArch32 CPUs have FPUs or support Neon instructions, so on Linux
and Android platforms, the AArch32 SIMD dispatcher in libjpeg-turbo only
enables the Neon SIMD extensions at run time if /proc/cpuinfo indicates
that the CPU supports Neon instructions or if Neon instructions are
explicitly enabled (e.g. by passing -mfpu=neon to the compiler.) In
order to support all AArch32 CPUs using the same code base, i.e. to
support run-time FPU and Neon auto-detection, it is necessary to compile
the scalar C source code using -mfloat-abi=soft. However, the 'soft'
floating point ABI cannot be used when compiling Neon intrinsics, so the
intrinsics implementation of the Neon SIMD extensions must be compiled
using -mfloat-abi=softfp if the scalar C source code is compiled using
-mfloat-abi=soft.
This commit modifies the build system so that it detects whether
-mfloat-abi=softfp must be explicitly added to the compiler flags when
building the intrinsics implementation of the Neon SIMD extensions.
This will be necessary if the build is using the 'soft' floating
point ABI along with run-time auto-detection of Neon instructions.
Fixes #523
The existing /*FALLTHROUGH*/ comments work with GCC but not Clang, so
this commit adds a FALLTHROUGH macro that uses the 'fallthrough'
attribute if the compiler supports it.
Refer to https://bugs.chromium.org/p/chromium/issues/detail?id=995993
NOTE: All versions of GCC that support -Wimplicit-fallthrough also
support the 'fallthrough' attribute, but certain other compilers (Oracle
Solaris Studio, for instance) support /*FALLTHROUGH*/ but not the
'fallthrough' attribute. Thus, this commit retains the /*FALLTHROUGH*/
comments, which have existed in the libjpeg code base in some form since
1994 (libjpeg v5.)
Closes #531
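(Illustration only: a minimal sketch of a FALLTHROUGH macro that uses the
'fallthrough' attribute if the compiler supports it and expands to
nothing otherwise. The detection logic shown here is an assumption and
may differ from the macro that this commit adds. sum_down() is a
hypothetical function that exists only to exercise the macro.)

```c
/* __has_attribute must be tested in a separate #ifdef, since compilers
   that lack it cannot parse __has_attribute(...) in an #if expression. */
#ifdef __has_attribute
#if __has_attribute(fallthrough)
#define FALLTHROUGH  __attribute__((fallthrough));
#else
#define FALLTHROUGH
#endif
#else
#define FALLTHROUGH
#endif

/* Return n + (n - 1) + ... + 1 for n in 1..3, or 0 otherwise. */
static int sum_down(int n)
{
  int total = 0;

  switch (n) {
  case 3:
    total += 3;
    FALLTHROUGH                 /*FALLTHROUGH*/
  case 2:
    total += 2;
    FALLTHROUGH                 /*FALLTHROUGH*/
  case 1:
    total += 1;
    break;
  default:
    break;
  }
  return total;
}
```

Keeping the comment next to the macro serves compilers that recognize
only one of the two forms.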
Referring to #527, the security community did not assign this CVE ID
until more than 8 months after the fix for the issue was released. By
the time they assigned the ID, libjpeg-turbo already had two production
releases containing the fix. This calls into question the usefulness of
assigning a CVE ID to the issue, particularly given that the buffer
overrun in question was fully contained in the stack, not detectable
with valgrind, and confined to lossless transformation (it did not
affect JPEG compression or decompression.)
https://vuldb.com/?id.176175
says that "the exploitability is told to be easy" but provides no
clarification, and given that the author of that page does not seem to
be aware that a fix for the issue has been available since early
December of 2019, it calls into question the accuracy of everything else
on the page.
It would really be nice if the security community approached me about
these things before wasting my time, but I guess it's my lot in life to
modify a change log entry from 2019 to include a CVE ID from 2020.
So it goes...
In theory, all objects that will be included in a Un*x shared library
must be built using PIC. In practice, most compilers don't require PIC
to be explicitly specified for jsimd_none.o, either because the compiler
automatically enables PIC in all cases (Ubuntu) or because the size of
the generated object is too small. But some rare compilers do require
PIC to be explicitly specified for jsimd_none.o.
Fixes #520
Our workflow script does not currently work with tags, and there is no
point to building tags anyhow, since we do not use the CI system to spin
official builds.
When using the in-memory destination manager, it is necessary to
explicitly call the destination manager's term_destination() method if
an error occurs. That method is called by jpeg_finish_compress() but
not by jpeg_abort_compress().
This fixes a potential double free() that could occur if tjCompress*()
or tjTransform() returned an error and the calling application tried to
clean up a JPEG buffer that was dynamically re-allocated by one of those
functions.
- UBSan complained that entropy->restarts_to_go was underflowing an
unsigned integer when it was decremented while
cinfo->restart_interval == 0. That was, of course, completely
innocuous behavior, since the result of the underflowing computation
was never used.
- d3a3a73f64 and
7bc9fca430 silenced a UBSan signed
integer overflow error, but unfortunately other malformed JPEG images
have been discovered that cause unsigned integer overflow in the same
computation. Since, to the best of our understanding, this behavior
is innocuous, this commit reverts the commits listed above, suppresses
the UBSan errors, and adds code comments to document the issue.
Referring to https://bugzilla.mozilla.org/show_bug.cgi?id=1050342,
there are certain very rare circumstances under which a malformed JPEG
image can cause different Huffman decoder output to be produced,
depending on the size of the source manager's I/O buffer. (More
specifically, the fast Huffman decoder didn't handle invalid codes in
the same manner as the slow decoder, and since the fast decoder requires
all data to be memory-resident, the buffering strategy determines
whether or not the fast decoder can be used on a particular MCU block.)
After extensive experimentation, the Mozilla and Chrome developers and I
determined that this truly was an innocuous issue. The patch that both
browsers adopted as a workaround caused a performance regression with
32-bit code, which is why it was not accepted into libjpeg-turbo. This
commit fixes the problem in a less disruptive way with no performance
regression.
After the completion of the start_input() method, it's too late to check
the image size, because the image readers may have already tried to
allocate memory for the image. If the width and height are excessively
large, then attempting to allocate memory for the image could slow
performance or lead to out-of-memory errors prior to the fuzz target
checking the image size.
NOTE: Specifically, the aforementioned OOM errors and slow units were
observed with the compression fuzz targets when using MSan.
Certain rare malformed input images can cause the Huffman encoder to
generate a value for nbits that corresponds to an uninitialized member
of the DC code table. The ramifications of this are minimal and would
basically amount to a different bogus JPEG image being generated from a
particular bogus input image.
- Referring to 3311fc0001, we need to use
unsigned intermediate math in order to make UBSan happy, even though
(JDIMENSION)(A * B) is effectively the same as
(JDIMENSION)A * (JDIMENSION)B, regardless of intermediate overflow.
- Because of the previous commit, it is now possible for bfOffBits to be
INT_MIN, which would cause the initial computation of bPad to
underflow a signed integer. Thus, we need to check for that
possibility as soon as we know the values of bfOffBits and headerSize.
The worst case from this regression is that bPad could wrap around to
a large positive value, which would cause a "Premature end of input
file" error in the subsequent read_byte() loop. Thus, this issue was
effectively innocuous as well, since it resulted in catching the same
error later and in a different way. Also, the issue was very
well-contained, since it was both introduced and fixed as part of the
ongoing OSS-Fuzz integration project.
- rdbmp.c: Because of 8fb37b8171,
bfOffBits, biClrUsed, and headerSize were made into unsigned ints.
Thus, if bPad would eventually be negative due to a malformed header,
UBSan complained about unsigned math being used in the intermediate
computations. It was unnecessary to make those variables unsigned,
since they are only meant to hold small values, so this commit makes
them signed again. The UBSan error was innocuous, because it is
effectively (if not officially) the case that
(int)((unsigned int)a - (unsigned int)b) == (int)a - (int)b.
- rdbmp.c: If (biWidth * source->bits_per_pixel / 8) would overflow an
unsigned int, then UBSan complained at the point at which row_width
was set in start_input_bmp(), even though the overflow would have been
detected later in the function. This commit adds overflow checks
prior to setting row_width.
- rdppm.c: read_pbm_integer() now bounds-checks the intermediate
value computations in order to catch integer overflow caused by a
malformed text PPM. It's possible, though extremely unlikely, that
the intermediate value computations could have wrapped around to a
value smaller than maxval, but the worst case is that this would have
generated a bogus pixel in the uncompressed image rather than throwing
an error.
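(Illustration only: a minimal sketch of checking for overflow before a
multiplication, in the spirit of the rdbmp.c row_width item above.
checked_row_width() is a hypothetical helper, not code from rdbmp.c.)

```c
#include <limits.h>

/* Compute width * bits_per_pixel / 8 without letting the intermediate
   product wrap around.  Returns 1 and stores the result on success or
   returns 0 if the product would not fit in an unsigned int. */
static int checked_row_width(unsigned int width, unsigned int bits_per_pixel,
                             unsigned int *row_width)
{
  if (bits_per_pixel != 0 && width > UINT_MAX / bits_per_pixel)
    return 0;
  *row_width = width * bits_per_pixel / 8;
  return 1;
}
```

Dividing UINT_MAX by one operand and comparing against the other detects
the overflow without ever performing the overflowing multiplication.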
A fuzzing test case that was effectively a 1-pixel PGM file with a
maximum value of 1 and an actual value of 8 caused an uninitialized
member of the rescale[] array to be accessed in get_gray_rgb_row() or
get_gray_cmyk_row(). Since, for performance reasons, those functions do
not perform bounds checking on the PPM values, we need to ensure that
unused members of the rescale[] array are initialized.
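(Illustration only: a minimal sketch of zero-initializing the unused
members of a rescaling table so that out-of-range sample values cannot
read uninitialized memory. TABLE_SIZE and init_rescale() are
hypothetical, and the table layout is simplified relative to rdppm.c.)

```c
#include <string.h>

#define TABLE_SIZE  256  /* hypothetical: one entry per 8-bit input value */

/* Map input values 0..maxval onto 0..255.  Entries above maxval are
   never computed, so clear the whole table first. */
static void init_rescale(unsigned char rescale[TABLE_SIZE],
                         unsigned int maxval)
{
  unsigned int val, half = maxval / 2;

  memset(rescale, 0, TABLE_SIZE);
  if (maxval == 0)
    return;  /* invalid maxval; leave the table zeroed */
  for (val = 0; val <= maxval && val < TABLE_SIZE; val++)
    rescale[val] = (unsigned char)((val * 255 + half) / maxval);
}
```

With a maximum value of 1 and an actual value of 8, as in the test case
above, the lookup now returns 0 instead of an indeterminate value.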
A fuzzing test case with an image width of 838860946 triggered a UBSan
error:
    rdbmp.c:633:34: runtime error: signed integer overflow:
    838860946 * 3 cannot be represented in type 'int'
Because the result is cast to an unsigned int (JDIMENSION), this error
is irrelevant, because
    (unsigned int)((int)838860946 * (int)3) ==
    (unsigned int)838860946 * (unsigned int)3
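(Illustration only: a minimal sketch of performing the multiplication in
unsigned arithmetic, which is well-defined on wraparound and thus does
not trigger the UBSan error. row_samples() is a hypothetical helper, and
the sketch assumes a 32-bit unsigned int.)

```c
/* Convert each operand to unsigned int before multiplying, rather than
   multiplying as signed ints and converting the product afterward. */
static unsigned int row_samples(int width, int components)
{
  return (unsigned int)width * (unsigned int)components;
}
```

For the width in the test case above, 838860946 * 3 = 2516582838, which
exceeds INT_MAX (2147483647) but fits in a 32-bit unsigned int.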
This limits the tjLoadImage() behavioral changes to the scope of the
compress_fuzzer target. Otherwise, TJBench in fuzzer builds would
refuse to load images larger than 1 Mpixel.
- The PPM reader now throws an error rather than segfaulting (due to a
buffer overrun) if an application attempts to load a 16-bit PPM file
into a grayscale uncompressed image buffer. No known applications
allowed that (not even the test applications in libjpeg-turbo),
because that mode of operation was never expected to work and did not
work under any circumstances. (In fact, it was necessary to modify
TJBench in order to reproduce the issue outside of a fuzzing
environment.) This was purely a matter of making the library bow out
gracefully rather than crash if an application tries to do something
really stupid.
- The PPM reader now throws an error rather than generating incorrect
pixels if an application attempts to load a 16-bit PGM file into an
RGB uncompressed image buffer.
- The PPM reader now correctly loads 16-bit PPM files into extended
RGB uncompressed image buffers. (Previously it generated incorrect
pixels unless the input colorspace was JCS_RGB or JCS_EXT_RGB.)
The only way that users could have potentially encountered these issues
was through the tjLoadImage() function. cjpeg and TJBench were
unaffected.
The non-default options were not being tested because of a pixel format
comparison buglet. This commit also changes the code in both
decompression fuzz targets such that non-default options are tested
based on the pixel format index rather than the pixel format value,
which is a bit more idiot-proof.
Otherwise, the targets will require libstdc++, the i386 version of which
is not available in the OSS-Fuzz runtime environment. The OSS-Fuzz
build environment passes -stdlib=libc++ in the CXXFLAGS environment
variable in order to mitigate this issue, since the runtime environment
has the i386 version of libc++, but using that compiler flag requires
using the C++ compiler.
This commit integrates OSS-Fuzz targets directly into the libjpeg-turbo
source tree, thus obsoleting and improving code coverage relative to
Google's OSS-Fuzz target for libjpeg-turbo (previously available here:
https://github.com/google/oss-fuzz).
I hope to eventually create fuzz targets for the BMP, GIF, and PPM
readers as well, which would allow for fuzz-testing compression, but
since those readers all require an input file, it is unclear how to
build an efficient fuzzer around them. It doesn't make sense to
fuzz-test compression in isolation, because compression can't accept
arbitrary input data.
Define compiler-independent byte-swap macros and use them instead of
executing 'rev' via inline assembly code with GCC-compatible compilers
or a slow shift-store sequence with Visual C++.
* This produces identical assembly code with:
- 64-bit GCC 8.4.0 (Linux)
- 64-bit GCC 9.3.0 (Linux)
- 64-bit Clang 10.0.0 (Linux)
- 64-bit Clang 10.0.0 (MinGW)
- 64-bit Clang 12.0.0 (Xcode 12.2, macOS)
- 64-bit Clang 12.0.0 (Xcode 12.2, iOS)
* This produces different assembly code with:
- 64-bit GCC 4.9.1 (Linux)
- 32-bit GCC 4.8.2 (Linux)
- 32-bit GCC 8.4.0 (Linux)
- 32-bit GCC 9.3.0 (Linux)
Since the intrinsics implementation of Huffman encoding is not used
by default with these compilers, this is not a concern.
- 32-bit Clang 10.0.0 (Linux)
Verified performance neutrality
Closes #507
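The compiler-independent byte-swap macros described above might look
something like the following sketch. The macro names BSWAP32 and
BSWAP64 are hypothetical (the names used in the libjpeg-turbo source
may differ), and the final fallback branch is an addition for compilers
that are neither Visual C++ nor GCC-compatible.

```c
#include <stdint.h>

#if defined(_MSC_VER) && !defined(__clang__)
#include <stdlib.h>
/* Visual C++: byte-swap intrinsics declared in <stdlib.h> */
#define BSWAP32(x)  _byteswap_ulong(x)
#define BSWAP64(x)  _byteswap_uint64(x)
#elif defined(__GNUC__)
/* GCC and Clang: builtins that typically compile to a single instruction */
#define BSWAP32(x)  __builtin_bswap32(x)
#define BSWAP64(x)  __builtin_bswap64(x)
#else
/* Portable fallback.  Note that x is evaluated more than once. */
#define BSWAP32(x) \
  ((((uint32_t)(x) & 0x000000FFu) << 24) | \
   (((uint32_t)(x) & 0x0000FF00u) << 8) | \
   (((uint32_t)(x) & 0x00FF0000u) >> 8) | \
   (((uint32_t)(x) & 0xFF000000u) >> 24))
#define BSWAP64(x) \
  (((uint64_t)BSWAP32((uint32_t)(x)) << 32) | \
   (uint64_t)BSWAP32((uint32_t)((uint64_t)(x) >> 32)))
#endif
```

Hiding the compiler-specific spelling behind one macro keeps the calling
code identical across toolchains, which is what allows the inline
assembly path to be dropped.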
(regression introduced by 16bd984557)
This implements the same fix for
jsimd_encode_mcu_AC_refine_prepare_sse2() that
a81a8c137b implemented for
jsimd_encode_mcu_AC_first_prepare_sse2().
Based on:
1a59587397eb176a91d8
Fixes #509
Closes #510
Referring to https://bugzilla.redhat.com/show_bug.cgi?id=1937385#c2,
it is my opinion that the severity of this bug was grossly overstated
and that a CVE never should have been assigned to it, but since one was
assigned, users need to know which version of libjpeg-turbo contains
the fix.
Dear security community, please learn what "DoS" actually means and stop
misusing that term for dramatic effect. Thanks.
* commit '8a2cad020171184a49fa8696df0b9e267f1cf2f6': (99 commits)
Build: Handle CMAKE_OSX_ARCHITECTURES=(i386|ppc)
Add Sponsor button for GitHub repository
Build: Support CMAKE_OSX_ARCHITECTURES
cjpeg: Fix FPE when compressing 0-width GIF
Fix build with Visual C++ and /std:c11 or /std:c17
Neon: Fix Huffman enc. error w/Visual Studio+Clang
Use CLZ compiler intrinsic for Windows/Arm builds
Build: Use correct SIMD exts w/VStudio IDE + Arm64
jcphuff.c: Fix compiler warning with clang-cl
Migrate from Travis CI to GitHub Actions
tjexample.c: Fix mem leak if tjTransform() fails
Build: Officially support Ninja
decompress_smooth_data(): Fix another uninit. read
LICENSE.md: Remove trailing whitespace
Build: Test for correct AArch32 RPM/DEBARCH value
LICENSE.md: Formatting tweak
Fix uninitialized read in decompress_smooth_data()
Fix buffer overrun with certain narrow prog JPEGs
Bump revision to 2.0.91 for post-beta fixes
Travis: Use Docker tag that matches Git branch
...
We don't officially support i386 or PowerPC Mac builds of libjpeg-turbo
anymore, but they still work (bearing in mind that PowerPC builds
require GCC v4.0 in Xcode 3.2.6, and i386 builds require Xcode 9.x or
earlier.) Referring to #495, apparently MacPorts needs this
functionality.
The GNU builtin function __builtin_clzl() accepts an unsigned long
argument, which is 8 bytes wide on LP64 systems (most Un*x systems,
including Mac) but 4 bytes wide on LLP64 systems (Windows.) This caused
the Neon intrinsics implementation of Huffman encoding to produce
mathematically incorrect results when compiled using Visual Studio with
Clang.
This commit changes all invocations of __builtin_clzl() in the Neon SIMD
extensions to __builtin_clzll(), which accepts an unsigned long long
argument that is guaranteed to be 8 bytes wide on all systems.
Fixes #480
Closes #490
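To illustrate why the width of the argument type matters, here is a
small sketch that assumes a GCC-compatible compiler.
leading_zeros_64() is a hypothetical wrapper, not a function from the
Neon SIMD extensions.

```c
#include <stdint.h>

/* Hypothetical wrapper: count the leading zero bits in a 64-bit value.
 * __builtin_clzll() takes an unsigned long long, which is at least 64 bits
 * wide on both LP64 and LLP64 systems.  __builtin_clzl() takes an unsigned
 * long, which is only 32 bits wide on LLP64 (Windows), so passing it a
 * 64-bit value there would silently discard the upper half.
 * The result is undefined for x == 0, as with the builtin itself. */
int leading_zeros_64(uint64_t x)
{
  return __builtin_clzll((unsigned long long)x);
}
```

A value such as 1 << 32 shows the difference: the 64-bit count is 31,
whereas a 32-bit truncation of the same value would be 0, for which the
builtin's result is undefined.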
The __builtin_clz() compiler intrinsic was already used in the C Huffman
encoders when building libjpeg-turbo for Arm CPUs using a GCC-compatible
compiler. This commit modifies the C Huffman encoders so that they also
use __builtin_clz() when building for Arm CPUs using Visual Studio +
Clang, as well as the equivalent _CountLeadingZeros() compiler intrinsic
when building for Arm CPUs using Visual C++.
In addition to making the C Huffman encoders faster on Windows/Arm, this
also prevents jpeg_nbits_table from being included in Windows/Arm builds,
thus saving 128 KB of memory.
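The relationship between a count-leading-zeros intrinsic and a bit-count
lookup table can be sketched as follows, assuming a GCC-compatible
compiler and a 32-bit unsigned int. The function names are hypothetical
and are not the macros used in the C Huffman encoders.

```c
/* Hypothetical helper: number of bits needed to represent a nonzero value.
 * With a CLZ intrinsic, this is a subtraction rather than a table lookup. */
int nbits_nonzero(unsigned int x)
{
  return 32 - __builtin_clz(x);   /* undefined for x == 0 */
}

/* Hypothetical helper: same as above, but defined as 0 for x == 0. */
int nbits(unsigned int x)
{
  return x ? nbits_nonzero(x) : 0;
}
```

Because the count is computed directly, no precomputed table needs to be
linked into the build.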
When configuring a Visual Studio IDE build and passing -A arm64 to
CMake, CMAKE_SYSTEM_PROCESSOR will be amd64, so we should set CPU_TYPE
based on the value of CMAKE_GENERATOR_PLATFORM rather than the value of
CMAKE_SYSTEM_PROCESSOR.
Note that this removes our ability to regression test the Armv8 and
PowerPC SIMD extensions, effectively reverting a524b9b06b and
02227e48a9, but at the moment, there is no other way.
Regression introduced by 42825b68d5
The test case
https://user-images.githubusercontent.com/3491627/101376530-fde56180-38b0-11eb-938d-734119a5b5ba.jpg
is a malformed progressive JPEG image containing an interleaved Y/Cb/Cr
DC scan followed by two non-interleaved Y DC scans. Thus, the
prev_coef_bits[] array was initialized for the Y component but not the
other components, the uninitialized values for Cb and Cr were
transferred to the prev_coef_bits_latch[] array in smoothing_ok(), and
because cinfo->master->last_good_iMCU_row was 0,
decompress_smooth_data() read those uninitialized values when attempting
to smooth the second iMCU row.
Possibly fixes #478
Regression introduced by 6d91e950c8
last_block_column in decompress_smooth_data() can be 0 if, for instance,
decompressing a 4:4:4 image of width 8 or less or a 4:2:2 or 4:2:0 image
of width 16 or less. Since last_block_column is an unsigned int,
subtracting 1 from it produced 0xFFFFFFFF, the test in line 590 passed,
and we attempted to access blocks from a second block column that didn't
actually exist.
Closes #476
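The unsigned wraparound described above can be reproduced in isolation.
The following is only a sketch: the function names are hypothetical,
and the comparison is a guess at the general shape of the test rather
than the actual code in decompress_smooth_data().

```c
#include <limits.h>

/* Hypothetical, unchecked form: if last_block_column is 0, then
 * last_block_column - 1 wraps to UINT_MAX and the test always passes. */
int has_two_more_columns_unchecked(unsigned int block_num,
                                   unsigned int last_block_column)
{
  return block_num < last_block_column - 1;
}

/* Hypothetical, checked form: rule out the zero case before subtracting. */
int has_two_more_columns_checked(unsigned int block_num,
                                 unsigned int last_block_column)
{
  return last_block_column > 0 && block_num < last_block_column - 1;
}
```

For a single-column image (last_block_column == 0), the unchecked form
reports that more columns exist, while the checked form does not.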
- Use the _M_ARM and _M_ARM64 macros provided by Visual Studio for
compile-time detection of Arm builds, since __arm__ and __aarch64__
are only present in GNU-compatible compilers.
- Neon/intrinsics: Use the _CountLeadingZeros() and
_CountLeadingZeros64() intrinsics provided by Visual Studio, since
__builtin_clz() and __builtin_clzl() are only present in
GNU-compatible compilers.
- Neon/intrinsics: Since Visual Studio does not support static vector
initialization, replace static initialization of Neon vectors with the
appropriate intrinsics. Compared to the static initialization
approach, this produces identical assembly code with both GCC and
Clang.
- Neon/intrinsics: Since Visual Studio does not support inline assembly
code, provide alternative code paths for Visual Studio whenever inline
assembly is used.
- Build: Set FLOATTEST appropriately for AArch64 Visual Studio builds
(Visual Studio does not emit fused multiply-add [FMA] instructions by
default for such builds.)
- Neon/intrinsics: Move temporary buffer allocation outside of nested
loops. Since Visual Studio configures Arm builds with a relatively
small amount of stack memory, attempting to allocate those buffers
within the inner loops caused a stack overflow.
Closes #461
Closes #475
Otherwise, because the file begins with an ASCII header, Git will
erroneously treat it as an ASCII file, and if Git for Windows is
configured with default options (specifically, "Checkout windows-style,
commit Unix-style line endings"), it will add carriage return characters
to all of the "linefeed" characters in the PPM file, thus corrupting it
and causing libjpeg-turbo's regression tests to fail.
There doesn't seem to be any performance or compatibility downside to
this, and it has the advantages of simplicity and consistency between
the PR and official builds.
- Rename IOS_ARMV8_BUILD to ARMV8_BUILD.
- Rename install_ios() to install_subbuild() in makemacpkg.
- Wordsmith the build instructions accordingly.
- Use xcode12.2 image in Travis CI.
This error occurs at the call to (*cinfo->cconvert->color_convert)() in
sep_upsample() whenever cinfo->upsample->need_context_rows == TRUE
(i.e. whenever h2v2 or h1v2 fancy upsampling is used.) The error is
innocuous, since (*cinfo->cconvert->color_convert)() points to a dummy
function (noop_convert()) in that case.
Fixes #470
The "32bit" vs. "64bit" floating point test results actually have
nothing to do with the FPU. That was a fallacious assumption based on
the observation that, with multiple CPU types, 32-bit and 64-bit builds
produce different floating point test results. It seems that this is,
in fact, due to differing compiler behavior-- more specifically, whether
fused multiply-add (FMA) instructions are used to combine multiple
floating point operations into a single instruction ("floating point
expression contraction".) GCC does this by default if the target
supports FMA instructions, which PowerPC and AArch64 targets both do.
Fixes #468
This allows the Neon intrinsics code to be built successfully (albeit
likely with reduced run-time performance) with Xcode 5.0-6.2
(iOS/AArch64) and Android NDK < r19 (AArch32). Note that Xcode 5.0-6.2
will not build the Armv8 GAS code without gas-preprocessor.pl, and no
version of Xcode will build the Armv7 GAS code without
gas-preprocessor.pl, so we always use the full Neon intrinsics
implementation by default with macOS and iOS builds.
Auto-detecting the completeness of the compiler's set of Neon intrinsics
also allows us to more intelligently set the default value of
NEON_INTRINSICS, based on the values of HAVE_VLD1*. This is a
reasonable, albeit imperfect, proxy for whether a compiler has a full
and optimal set of Neon intrinsics. Specific notes:
- 64-bit RGB-to-YCbCr color conversion
does not use any of the intrinsics in question, regresses with GCC
- 64-bit accurate integer forward DCT
uses vld1_s16_x3(), regresses with GCC
- 64-bit Huffman encoding
uses vld1q_u8_x4(), regresses with GCC
- 64-bit YCbCr-to-RGB color conversion
does not use any of the intrinsics in question, regresses with GCC
- 64-bit accurate integer inverse DCT
uses vld1_s16_x3(), regresses with GCC
- 64-bit 4x4 inverse DCT
uses vld1_s16_x3(). I did not test this algorithm in isolation, so
it may in fact regress with GCC, but the regression may be hidden by
the speedup from the new SIMD-accelerated upsampling algorithms.
- 32-bit RGB-to-YCbCr color conversion
uses vld1_u16_x2(), regresses with GCC
- 32-bit accurate integer forward DCT
uses vld1_s16_x3(), regression irrelevant because there was no
previous implementation
- 32-bit accurate integer inverse DCT
uses vld1_s16_x3(), regresses with GCC
- 32-bit fast integer inverse DCT
does not use any of the intrinsics in question, regresses with GCC
- 32-bit 4x4 inverse DCT
uses vld1_s16_x3(). I did not test this algorithm in isolation, so
it may in fact regress with GCC, but the regression may be hidden by
the speedup from the new SIMD-accelerated upsampling algorithms.
Presumably when GCC includes a full and optimal set of Neon intrinsics,
the HAVE_VLD1* tests will pass, and the full Neon intrinsics
implementation will be enabled automatically.
- Remove gas-preprocessor.pl. None of the compilers that can build the
new intrinsics implementation require gas-preprocessor.pl (tested
with Xcode and with Clang 3.9+ for Linux.)
- Document that Xcode 6.3.x or later is now required for iOS builds
(older versions of Xcode do not have a full set of Neon intrinsics.)
- Add a change log entry.
- Do not enable the ASM CMake language unless NEON_INTRINSICS is false.
- Add a Clang/Arm64 test to .travis.yml in order to test the new
intrinsics implementation.
Closes #455
There was no previous GAS implementation.
NOTE: This doesn't produce much of a speedup when using -O3, because -O3
already enables Neon autovectorization, which works well for the scalar
C implementation of plain upsampling. However, the Neon SIMD
implementation will benefit other optimization levels.
There was no previous GAS implementation.
This commit also reverts 40557b2301 and 7723d7f7d0. 7723d7f7d0 was only
necessary because there was no Neon implementation of merged
upsampling/color conversion, and 40557b2301 was only necessary because
of 7723d7f7d0.
The previous AArch64 GAS implementation has been removed, since the
intrinsics implementation provides the same or better performance.
There was no previous AArch32 GAS implementation.
The previous AArch32 and AArch64 GAS implementations are retained by
default when using GCC, in order to avoid a performance regression. The
intrinsics implementation can be forced on or off using the new
NEON_INTRINSICS CMake variable.
The previous AArch32 GAS implementation is retained by default when
using GCC, in order to avoid a performance regression. The intrinsics
implementation can be forced on or off using the new NEON_INTRINSICS
CMake variable. The previous AArch64 GAS implementation has been
removed, since the intrinsics implementation provides the same or better
performance.
The previous AArch32 GAS implementation of h2v1 fancy upsampling has
been removed, since the intrinsics implementation provides the same or
better performance. There was no previous GAS implementation of h2v2
fancy upsampling, and there was no previous AArch64 GAS implementation
of h2v1 fancy upsampling.
The previous AArch64 GAS implementation is retained by default when
using GCC, in order to avoid a performance regression. The intrinsics
implementation can be forced on or off using the new NEON_INTRINSICS
CMake variable. The previous AArch32 GAS implementation has been
removed, since the intrinsics implementation provides the same or better
performance.
The previous AArch64 GAS implementation is retained by default when
using GCC, in order to avoid a performance regression. The intrinsics
implementation can be forced on or off using the new NEON_INTRINSICS
CMake variable. The previous AArch32 GAS implementation has been
removed, since the intrinsics implementation provides the same or better
performance.
The previous AArch64 GAS implementation is retained by default when
using GCC, in order to avoid a performance regression. The intrinsics
implementation can be forced on or off using the new NEON_INTRINSICS
CMake variable. The previous AArch32 GAS implementation has been
removed, since the intrinsics implementation provides the same or better
performance.
The previous AArch64 GAS implementation is retained by default when
using GCC, in order to avoid a performance regression. The intrinsics
implementation can be forced on or off using the new NEON_INTRINSICS
CMake variable. There was no previous AArch32 GAS implementation.
The previous AArch64 GAS implementation has been removed, since the
intrinsics implementation provides the same or better performance.
There was no previous AArch32 GAS implementation.
The previous AArch32 and AArch64 GAS implementations are retained by
default when using GCC, in order to avoid a performance regression. The
intrinsics implementation can be forced on or off using a new
NEON_INTRINSICS CMake variable.
Regression caused by a46c111d9f
Because of 7723d7f7d0, which was introduced in libjpeg-turbo 1.5.1 in
response to #81, merged upsampling/color conversion is disabled on
platforms that have SIMD-accelerated YCbCr -> RGB color conversion but
not SIMD-accelerated merged upsampling/color conversion. This was
intended to improve performance with the Neon SIMD extensions, since
those are the only SIMD extensions for which those circumstances apply.
Under normal circumstances, the separate "plain" (non-fancy) upsampling
and color conversion routines will produce bitwise-identical output to
the merged upsampling/color conversion routines, but that is not the
case when skipping scanlines starting at an odd-numbered scanline. The
modified test introduced in a46c111d9f does precisely that in order to
validate the fixes introduced in 9120a24743 and a46c111d9f.
Because of 7723d7f7d0, the segfault fixed in 9120a24743 and a46c111d9f
didn't affect the Neon SIMD extensions, so this commit effectively
reverts the test modifications in a46c111d9f when using those SIMD
extensions. We can get rid of this hack, as well as 7723d7f7d0, once a
Neon implementation of merged upsampling/color conversion is available.
- Refer to the "slow" [I]DCT algorithms as "accurate" instead, since
they are not slow under libjpeg-turbo.
- Adjust documentation claims to reflect the fact that the "slow" and
"fast" algorithms produce about the same performance on AVX2-equipped
CPUs (because of the dual-lane nature of AVX2, it was not possible to
accelerate the "fast" algorithm beyond what was achievable with SSE2.)
Also adjust the claims to reflect the fact that the "fast" algorithm
tends to be ~5-15% faster than the "slow" algorithm on
non-AVX2-equipped CPUs, regardless of the use of the libjpeg-turbo
SIMD extensions.
- Indicate the legacy status of the "fast" and float algorithms in the
documentation and cjpeg/djpeg usage info.
- Remove obsolete paragraph in the djpeg man page that suggested that
the float algorithm could be faster than the "fast" algorithm on some
CPUs.
It is our convention to use the term "region" when referring to crop
specs, since this is more consistent with the terminology used by the
rest of the image processing community.
The checkstyle script was hastily developed prior to libjpeg-turbo 2.0
beta1, so it has a lot of exceptions and is thus prone to false
negatives. This commit eliminates some of those exceptions.
- Restore GIF read/compressed GIF write support from jpeg-6a and
jpeg-9d.
- Integrate jpegtran -wipe and -drop options from jpeg-9a and jpeg-9d.
- Integrate jpegtran -crop extension (for expanding the image size) from
jpeg-9a and jpeg-9d.
- Integrate other minor code tweaks from jpeg-9*
- Set CPU_TYPE=arm if performing a 32-bit build on an AArch64 system.
This eliminates the need to use a CMake toolchain file.
- Set RPMARCH=armv7hl if building on a 32-bit Arm system with an FPU.
- Set RPMARCH=armv7hl and DEBARCH=armhf if performing a 32-bit build
using a gnueabihf toolchain.
- If performing a 32-bit Arm build, generate a 32-bit supplementary DEB
package for AArch64 systems.
This commit modifies decompress_smooth_data(), adding missing MCU column
offsets to the prev_block_row and next_block_row indices that are used
for block rows other than the first and last. Effectively, this
eliminates unexpected visual artifacts when using jpeg_crop_scanline()
along with interblock smoothing while decompressing the DC scan of a
progressive JPEG image.
Based on:
0227d4fb48
Fixes #456
Closes #457
* tag '2.0.5':
TurboJPEG: Make global error handling thread-safe
ChangeLog.md: Add missing sub-header for 2.0.5
ChangeLog.md: List CVE ID fixed by previous commit
rdppm.c: Fix buf overrun caused by bad binary PPM
Build: Add missing jpegtran-icc test dependency
rdswitch.c: Eliminate spaces before semicolons
TJCompressor.compress(int): Fix YUV-to-JPEG error
Bump version to 2.0.5; Document previous commit
MIPS DSPr2: Work around various 'make test' errors
MIPS DSPr2: Fix compiler warning with -mdspr2
MIPS SIMD: Always honor JSIMD_FORCE* env vars
Test: Honor CMAKE_CROSSCOMPILING_EMULATOR variable
- Introduce a partial image decompression regression test script that
validates the correctness of jpeg_skip_scanlines() and
jpeg_crop_scanline() for a variety of cropping regions and libjpeg
settings.
This regression test catches the following issues:
#182, fixed in 5bc43c7821
#237, fixed in 6e95c08649794f5018608f37250026a45ead2db8
#244, fixed in 398c1e9acc
#441, fully fixed in this commit
It does not catch the following issues:
#194, fixed in 773040f9d9
#244 (additional segfault), fixed in 9120a24743
- Modify the libjpeg-turbo regression test suite (make test) so that it
checks for the issue reported in #441 (segfault in
jpeg_skip_scanlines() when used with 4:2:0 merged upsampling/color
conversion.)
- Fix issues in jpeg_skip_scanlines() that caused incorrect output with
h2v2 (4:2:0) merged upsampling/color conversion. The previous commit
fixed the segfault reported in #441, but that was a symptom of a
larger problem. Because merged 4:2:0 upsampling uses a "spare row"
buffer, it is necessary to allow the upsampler to run when skipping
rows (fancy 4:2:0 upsampling, which uses context rows, also requires
this.) Otherwise, if skipping starts at an odd-numbered row, the
output image will be incorrect.
- Throw an error if jpeg_skip_scanlines() is called with two-pass color
quantization enabled. With two-pass color quantization, the first
pass occurs within jpeg_start_decompress(), so subsequent calls to
jpeg_skip_scanlines() interfere with the multipass state and prevent
the second pass from occurring during subsequent calls to
jpeg_read_scanlines().
The additional segfault mentioned in #244 was due to the fact that
the merged upsamplers use a different private structure than the
non-merged upsamplers. jpeg_skip_scanlines() was assuming the latter, so
when merged upsampling was enabled, jpeg_skip_scanlines() clobbered one
of the IDCT method pointers in the merged upsampler's private structure.
For reasons unknown, the test image in #441 did not encounter this
segfault (too small?), but it encountered an issue similar to the one
fixed in 5bc43c7821, whereby it was
necessary to set up a dummy postprocessing function in
read_and_discard_scanlines() when merged upsampling was enabled.
Failing to do so caused either a segfault in merged_2v_upsample() (due
to a NULL pointer being passed to jcopy_sample_rows()) or an error
("Corrupt JPEG data: premature end of data segment"), depending on the
number of scanlines skipped and whether the first scanline skipped was
an odd- or even-numbered row.
Fixes #441
Fixes #244 (for real this time)
IIRC, this was only necessary with the version of Java 1.5 that shipped
with OS X 10.4 "Tiger". Apple's implementation of Java 6 ("Java for
OS X Systems") supported both .jnilib and .dylib extensions for JNI
libraries, but Oracle's implementation of Java has only ever supported
the .dylib extension.
The scales have now tilted overwhelmingly in favor of eliminating
support for 32-bit Macs:
- 32-bit applications are only necessary in order to support OS X 10.5
"Leopard" and OS X 10.6 "Snow Leopard". OS X 10.7 "Lion" requires a
64-bit Mac and supports all 64-bit Macs.
- 32-bit applications are no longer allowed in the macOS App Store.
- 32-bit applications no longer run in macOS 10.15 "Catalina".
- 32-bit applications do not support thread-local storage, so the
TurboJPEG API library's global error handler is not thread-safe with
such applications.
- libjpeg-turbo 2.1.x no longer supports 32-bit iOS apps, so it makes
sense to also eliminate support for 32-bit macOS applications.
It's time.
We haven't provided official Cygwin builds since 1.4.x, since Cygwin
now supplies its own libjpeg-turbo packages (although they apparently
haven't been updated past 1.5.3.)
This extends the fix in 1e81b0c3ea to
include binary PPM files with maximum values < 255, thus preventing a
malformed binary PPM input file with those specifications from
triggering an overrun of the rescale array and potentially crashing
cjpeg, TJBench, or any program that uses the tjLoadImage() function.
Fixes #433
Referring to #408, this commit #ifdefs DSPr2 SIMD functions that only
work on little endian processors, and it completely excludes
jsimd_h2v1_downsample_dspr2() and jsimd_h2v2_downsample_dspr2(). The
latter two functions fail with the TJBench tiling regression tests, most
likely because the implementation of the functions predates those tests.
Previously, these environment variables were not honored unless a 74K
CPU was detected, but this detection doesn't work properly with QEMU's
user mode emulation. With all other CPU types, libjpeg-turbo honors
JSIMD_FORCE* regardless of CPU detection.
This CMake variable is intended to define a wrapper program for
executing cross-compiled executables. However, CTest doesn't use
CMAKE_CROSSCOMPILING_EMULATOR, because it isn't obvious which tests
should be executed with the wrapper and which tests are scripts that
don't need it. This commit manually prepends
${CMAKE_CROSSCOMPILING_EMULATOR} to all unit test command lines that
execute a program built by the libjpeg-turbo build system. Thus, one
can set CMAKE_CROSSCOMPILING_EMULATOR in a CMake toolchain file to (for
instance) "qemu-{architecture} {qemu_arguments}" in order to execute
all eligible unit tests using QEMU.
+ document that tjFree() accepts NULL pointers without complaint.
Effectively, it has had that behavior all along, but the API does not
guarantee that tjFree() will be implemented with free() behind the
scenes, so it's best to formalize the behavior.
This programming practice (which exists in other code bases as well)
is a by-product of having used early C compilers that did not properly
handle free(NULL). All modern compilers should properly handle that.
Fixes #398
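For reference, the standard C guarantee that makes the NULL guard
unnecessary can be sketched with plain free(). release_buffer() is a
hypothetical helper used only for illustration. It is not tjFree(), and
it makes no claim about how tjFree() is implemented.

```c
#include <stdlib.h>

/* Hypothetical helper: free a buffer and clear the caller's pointer.
 * No "if (*buf != NULL)" test is needed, because free(NULL) is defined
 * by the C standard to do nothing. */
void release_buffer(unsigned char **buf)
{
  free(*buf);
  *buf = NULL;
}
```

Calling release_buffer() twice on the same pointer is harmless, since
the second call passes NULL to free().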
- Don't enumerate the types of SIMD instructions that libjpeg-turbo
supports, as this can change without notice.
- Use more clear terminology when describing support for libjpeg v7/v8
features ("libjpeg" is, colloquially but not officially, the name for
the IJG's software, whereas the "libjpeg API" refers to our emulation
of said software.)
- "PhotoShop" = "Photoshop" (StudLy Caps Police)
- Adjust dynamic library versions to reflect the addition of
jpeg_read_icc_profile() and jpeg_write_icc_profile() in
libjpeg-turbo 2.0.x.
Move constants out of the .text section in simd/arm64/jsimd_neon.S and
into a .rodata section. This ensures that the ARMv8 NEON SIMD
extensions are compatible with memory layouts that are marked
execute-only (and thus unreadable.)
Based on:
88f3ca7664
Closes #318
libjpeg-turbo never included that code, because it requires an external
library (the Utah Raster Toolkit.) The RLE image format was supplanted
by GIF in the late 1980s, so it is rarely seen these days. (It had a
lousy Weissman score, anyhow.)
- Enable progress reporting at run time using a new -report argument
(cjpeg now supports that argument as well)
- Limit the allowable number of scans using a new -maxscans argument
- Treat warnings as fatal using a new -strict argument
This mainly demonstrates how to work around the two issues with the
JPEG standard described here:
https://libjpeg-turbo.org/pmwiki/uploads/About/TwoIssueswiththeJPEGStandard.pdf
since those and similar issues continue to be erroneously reported as
libjpeg-turbo bugs.
Modern Loongson processors are MIPS64-compatible, and MMI instructions
are now supported in the mainline of GCC. Thus, this commit adds
compile-time and run-time auto-detection of MMI instructions and moves
the MMI SIMD extensions for libjpeg-turbo from simd/loongson/ to
simd/mips64/. That will allow MMI and MSA instructions to co-exist
in the same build once #377 has been integrated.
Based on:
82953ddd61
Closes #383
Because of 01e3032354 (officially
eliminating support for compilers without unsigned char, since we never
effectively supported those compilers anyhow), GETJOCTET() is now a
no-op. Since that macro is in jmorecfg.h, it is part of the de facto
libjpeg API and must remain in the public headers. However, there is no
reason to continue using it internally, and eliminating its internal use
improves code readability.
This commit adds ARM64 NEON optimizations for the
encode_mcu_AC_first() and encode_mcu_AC_refine() functions used in
progressive Huffman encoding.
Compression speedups for the typical set of five libjpeg-turbo test
images (https://libjpeg-turbo.org/About/Performance):
Cortex-A53: 23.8-39.2% (avg. 32.2%)
Cortex-A72: 26.8-41.1% (avg. 33.5%)
Apple A7: 29.7-45.9% (avg. 39.6%)
Closes #229
Homebrew tends to drop support for a macOS release the second that Apple
stops releasing security updates for it, and that makes HB difficult to
use with some of the Travis macOS images. Furthermore, even on
supported macOS releases, HB sometimes tries to build GCC from source
even if a binary (bottle) is available. Long story short, MacPorts just
generally has better backward compatibility. MacPorts is also what I
personally use on the official libjpeg-turbo build machine.
... detected by ASan. This is a similar issue to the issue that was
fixed with 402a715f82. Apparently it is
possible to create a malformed JPEG image that exceeds the Huffman
encoder's 256-byte local buffer when attempting to losslessly transform
the image. That makes sense, given that it was necessary to extend the
Huffman decoder's local buffer to 512 bytes in order to handle all
pathological cases (refer to 0463f7c9aad060fcd56e98d025ce16185279e2bc.)
Since this issue affected only lossless transformation, a workflow that
isn't generally exposed to arbitrary data exploits, and since the
overrun did not overflow the stack (i.e. it did not result in a segfault
or other user-visible issue, and valgrind didn't even detect it), it did
not likely pose a security risk.
Fixes #392
... that caused some JPEG images with unusual sampling factors to be
misidentified as 4:4:4. This led to a buffer overflow when attempting
to decompress some such images using tjDecompressToYUV*().
Regression introduced by 479501b07c
The correct behavior is for the TurboJPEG API to refuse to decompress
such images, which it did prior to the aforementioned commit.
Fixes #389
Referring to #289, I'm not sure where I arrived at the conclusion that
the SSE2 progressive Huffman encoder doesn't provide any speedup for
x32. Upon re-testing, I discovered it to be about 50% faster than the
C encoder.
This commit also re-purposes one of the CI tests (specifically, the
jpeg-7 API/ABI test) so that it tests x32 as well.
... introduced by 42825b68d5. In fact,
fault-tolerant multi-scan block smoothing cannot currently be used with
the arithmetic decoder, because that decoder doesn't have any way of
distinguishing a normal end of scan from an unexpected end of scan.
Thus, this commit also modifies the change log to reset the expectations
regarding the scope of the fault-tolerant multi-scan block smoothing
feature. If, at some point in the future, the arithmetic decoder can be
modified to detect an unexpected end of scan, then one would need only
set entropy->pub.insufficient_data = TRUE when the arithmetic decoder
encounters an unexpected end of scan in order to make fault-tolerant
block smoothing work properly with that decoder.
This commit modifies the behavior of the block smoothing algorithm in
the libjpeg API library so that, if a scan in a multi-scan JPEG image is
incomplete (due to premature termination of the image stream), the block
smoothing parameters from the previous (complete) scan are used to
smooth any iMCU rows that the incomplete scan does not contain.
Closes #343
Modifying a locally-defined non-volatile variable below the setjmp()
return point results in undefined behavior whereby the variable may not
have the expected value after setjmp() returns.
Fixes #379
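The setjmp() hazard described above can be illustrated with a minimal sketch.
This is not the code touched by the commit; fail_now() and
count_after_longjmp() are hypothetical names used only for illustration.
Declaring the local variable volatile is the standard C way to guarantee that
it holds its last stored value after longjmp() returns control to setjmp().

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf env;

/* Hypothetical helper that simulates a library error by jumping back to the
   setjmp() return point. */
static void fail_now(void)
{
  longjmp(env, 1);
}

/* Returns the value of the local counter as seen after the longjmp(). */
int count_after_longjmp(void)
{
  /* volatile: the value stored below is guaranteed to be visible after
     longjmp().  Without volatile, the value would be indeterminate. */
  volatile int attempts = 0;

  if (setjmp(env)) {
    /* Control arrives here a second time, via longjmp(). */
    assert(attempts == 1);
    return attempts;
  }

  attempts = 1;  /* modified between setjmp() and longjmp() */
  fail_now();
  return -1;     /* not reached */
}
```

If the volatile qualifier is dropped, an optimizing compiler may keep attempts
in a register, and the value read in the recovery path is then unspecified.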
The primary impetus for this is to eliminate build warnings, such as
(32-bit only)
section "__textcoal_nt" is deprecated
object file (XXXXXX.o) was built for newer OSX version (10.XX) than
being linked (10.5)
Upgrading to GCC 6 results in neutral performance for compression,
a measured average overall decompression speedup of 2.5% for 64-bit
code, and a measured average overall decompression speedup of -4.3% for
32-bit code on a 3 GHz Core i7. The 4.3% slow-down for 32-bit code is
deemed acceptable, given that 32-bit macOS apps are deprecated.
Splitting the pointer arithmetic in GET_SYM() into a separate add and
sub instruction was an attempt to work around an error ("invalid operand
type") that occurred when assembling the file with NASM. However, this
created a link error on macOS ("ld: illegal text-relocation to
'_jconst_huff_encode_one_block' in
simd/CMakeFiles/simd.dir/i386/jchuff-sse2.asm.o from
'_jsimd_huff_encode_one_block_sse2' in
simd/CMakeFiles/simd.dir/i386/jchuff-sse2.asm.o for architecture i386")
and also changed the alignment of the code in ways that might have
affected the previous benchmark results (which took a great deal of time
to obtain.) Ultimately, the path of least resistance is just to
require NASM 2.13 or later.
This commit improves the C and SSE2 Huffman encoding implementations in
the following ways:
- Avoid using xmm8-xmm15 in the x86-64 SSE2 implementation. There is no
actual need to use those registers, and avoiding them produces a
cleaner WIN64 function entry/exit-- as well as shorter code, since REX
prefixes can be avoided (this is helpful on certain CPUs, such as
Intel Atom, for which instruction fetch and decoding can be a
bottleneck.)
- Optimize register usage so that fewer REX prefixes and
register-register moves are needed.
- Use the bit counter to store the number of free bits in the bit buffer
rather than the number of bits in the bit buffer. This changes the
method for inserting a code into the bit buffer to:
(put_buffer |= code << (free_bits -= code_size));
As a result:
* Only one bit counter needs to stay in a register (we just keep it in
cl.)
* The bit buffer contents are already properly aligned to be written
out (after a byte swap.)
* Adjusting the free bits counter and checking if the bit buffer is
full can be combined into a single operation.
* We can wait to flush the bit buffer until the buffer is actually
full and not just in danger of becoming full. Thus, eight bytes can
be flushed at a time.
- Speed is quite sensitive to the alignment of branch target labels, so
insert some padding and remove branches from the flush code.
(Flushing this way isn't actually faster when compared to using
branches, but the branchless code doesn't need extra alignment and is
thus smaller.)
- Speculatively write out the bit buffer as a single 8-byte write,
falling back to a byte-by-byte write only if there are any 0xFF bytes
in the bit buffer that need to be encoded as 0xFF 0x00.
- Use MMX registers for the 32-bit implementation (so the bit buffer can
be 64 bits wide.)
- Slightly reduce overall function code size.
- Eliminate or combine a few SSE instructions.
- Make some minor improvements to instruction scheduling.
- Adjust flush_bits() in jchuff.c to handle cases in which the bit
buffer has less than 7 free bits (apparently that couldn't happen
before.)
Based on:
947a09defa262ebb6b816e9a091221
See change log for performance claims.
Closes #292
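The "free bits" bookkeeping described in the list above can be sketched in
portable C as follows. This is an illustration only, not the code in jchuff.c
or the SSE2 assembly: the bit_writer type and the bw_*() functions are
hypothetical names, and the sketch writes bytes one at a time instead of
using the speculative 8-byte write. It does show the key points: codes are
inserted with put_buffer |= code << (free_bits -= code_size), the buffer is
flushed only once it is actually full, and each 0xFF byte is followed by a
stuffed 0x00.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
  uint64_t put_buffer;  /* pending bits, packed from the top bit downward */
  int free_bits;        /* number of unused bits in put_buffer (0..64) */
  uint8_t *out;         /* next output byte */
} bit_writer;

void bw_init(bit_writer *bw, uint8_t *out)
{
  bw->put_buffer = 0;
  bw->free_bits = 64;
  bw->out = out;
}

/* Write one byte, followed by a stuffed 0x00 if the byte is 0xFF. */
static void bw_emit_byte(bit_writer *bw, uint8_t b)
{
  *bw->out++ = b;
  if (b == 0xFF)
    *bw->out++ = 0x00;
}

/* Write all eight bytes of the (full) bit buffer, most significant first. */
static void bw_flush_full(bit_writer *bw)
{
  int shift;

  for (shift = 56; shift >= 0; shift -= 8)
    bw_emit_byte(bw, (uint8_t)(bw->put_buffer >> shift));
}

void bw_put_bits(bit_writer *bw, uint64_t code, int size)
{
  assert(size >= 1 && size <= 32 && (code >> size) == 0);

  bw->free_bits -= size;  /* adjust the counter and test for "full" at once */
  if (bw->free_bits >= 0) {
    bw->put_buffer |= code << bw->free_bits;
  } else {
    int spill = -bw->free_bits;  /* low bits of code that do not fit */

    bw->put_buffer |= code >> spill;
    bw_flush_full(bw);
    bw->free_bits = 64 - spill;
    bw->put_buffer = code << bw->free_bits;  /* keeps only the spilled bits */
  }
}

/* Pad the last partial byte with 1 bits, write the remaining bytes, and
   return the new output position. */
uint8_t *bw_finish(bit_writer *bw)
{
  int used = 64 - bw->free_bits;
  int pad = bw->free_bits % 8;  /* bits needed to reach a byte boundary */
  int nbytes = (used + 7) / 8;
  int i;

  if (pad > 0)
    bw->put_buffer |= (((uint64_t)1 << pad) - 1) << (bw->free_bits - pad);
  for (i = 0; i < nbytes; i++)
    bw_emit_byte(bw, (uint8_t)(bw->put_buffer >> (56 - 8 * i)));
  bw->put_buffer = 0;
  bw->free_bits = 64;
  return bw->out;
}
```

The caller is assumed to provide an output buffer large enough for the worst
case of two output bytes per input byte.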
This macro is a relic of libjpeg's historic need to support a wide
variety of C compilers with varying degrees of compatibility. Such was
necessary during the open systems era, because C compilers were often
supplied by the system vendor. Prior to 1989, there was no C standard
per se, and even after ANSI C became a thing, there were still compilers
in use that didn't conform to it (libjpeg was first released in 1991.)
Realistically, only a handful of C compilers are in widespread use these
days, and all modern C compilers should support structure assignment.
... to avoid backward compatibility issues with GCC 4-6 MinGW
toolchains. Apparently GCC 7+ MinGW toolchains introduce a link-time
dependency with internal MinGW CRT functions that are meant to provide
compatibility with Microsoft's Universal CRT (ucrt) library, but those
internal functions are not available in GCC 4-6 MinGW toolchains. This
made it impossible to use the official builds of libjpeg.a and
libturbojpeg.a with GCC 4-6 MinGW toolchains (a fatal link error--
"undefined reference to '__imp___acrt_iob_func'"-- occurred.)
This problem was not immediately apparent after switching to the MSYS2
implementation of MinGW (d6d7b53968)
because, for a while, MSYS2 was still using GCC 5 and 6.
Refer to libjpeg-turbo/libjpeg-turbo#382
byte, word, dword, qword, oword, and yword are all assembler keywords,
so it makes sense to use lowercase for these so as not to mistake them
for macros or constants.
(AKA "Java for OS X systems.") This implementation of Java 1.6 is long
obsolete and not supported on any version of macOS past High Sierra.
Oracle no longer provides a 32-bit JVM on macOS, so it is no longer
necessary to provide a 32-bit version of the TurboJPEG Java wrapper on
macOS.
We have more than eight registers to work with, as well as three-operand
intrinsics, so there's no need for the implementation to be such a
literal port of the MMX code.
libjpeg-turbo has never really supported such compilers, since (AFAIK)
they are non-existent on any modern computing platform and thus
impossible for us to test. (Also, the TurboJPEG API would break without
unsigned chars.)
Furthermore, the unified CMake-based build system introduced in 2.0
always defines HAVE_UNSIGNED_CHAR, so retaining other code paths is
pointless. Eliminating support for compilers without unsigned char
eliminates the need for the GETJSAMPLE() macro, which improves the
readability of many parts of the code as well as improving the
performance of writing Targa and Windows BMP files.
Fixes #317
When CLI arguments request an image-changing operation (such as a transform, scans, or arithmetic coding), force the output file to be written, even if it is larger than the original.
This change is proposed to enable cases where this library and its CMakeLists.txt are included in other projects/CMakeLists.txt files as a dependency via the add_subdirectory method, for example: add_subdirectory(mozjpeg). There are several reasons and workflows which "wrap" third party projects/builds using this method, including many enterprise build/devops pipelines.
This change will have no effect on users building mozjpeg by itself as usual, it very simply enables the wrapping use cases.
With this parameter, the JFIF APP0 marker segment is not written, which reduces the file size by 18 bytes. The marker is mandatory, but no known programs report an error if it is missing. Safe for web use.
Tag 1.5.2 release
* tag '1.5.2': (54 commits)
x86: Fix "short jump is out of range" w/ NASM<2.04
TurboJPEG: Document xform issue w/ big marker data
Java TJBench: Fix parsing of -warmup argument
Build: Disable warmup in TJBench regression tests
TJBench: Improve consistency of results
TurboJPEG: C API documentation buglet
TJBench: Code formatting tweaks
TJBench: Fix errors when decomp. files w/ ICC data
BUILDING.md: Include Android/x86 build recipes
Travis: Fix OS X build
Restore compatibility with older autoconf releases
Attribute ARM runtime detection code to Nokia
Honor max_memory_to_use/JPEGMEM/-maxmemory
AppVeyor: Fix CI build
TurboJPEG: Fix potential memory leaks
Always tweak EXIF w/h tags w/ lossless transforms
Fix error w/ lossless crop & libjpeg v7 emulation
Include jpeg_skip/crop_scanlines() in jpeg7.dll
libjpeg.txt: Include partial decomp. in TOC
Slightly de-confusify cjpeg, jpegtran usage info
...
Tag 1.5.1 release
* tag '1.5.1':
ARM64 NEON: Fix another ABI conformance issue
Build: Remove ARMv6 support from 'make iosdmg'
Fix out-of-bounds write in partial decomp. feature
Silence additional UBSan warnings
Fix unsigned int overflow in libjpeg memory mgr.
TurboJPEG: Decomp. 4:2:2/4:4:0 JPEGs w/unusual SFs
Silence pedantic GCC6 code formatting warnings
Use plain upsampling if merged isn't accelerated
Implement h1v2 fancy upsampling
Fix AArch64 ABI conformance issue in SIMD code
Don't install libturbojpeg.pc if TJPEG disabled
Linux/PPC: Only enable AltiVec if CPU supports it
ARM/MIPS: Change the behavior of JSIMD_FORCE*
Bump version to 1.5.1 to prepare for new commits
* libjpeg-turbo/master: (140 commits)
Increase severity of tjDecompressToYUV2() bug desc
Catch libjpeg errors in tjDecompressToYUV2()
BUILDING.md: Fix "... OR ..." indentation again
BUILDING.md: Fix confusing Windows build reqs
ChangeLog.md: Improve readability of plain text
change.log: Refer users to ChangeLog.md
Markdown version of ChangeLog.txt
Rename ChangeLog.txt
README.md: Link to BUILDING.md
BUILDING.md and README.md: Cosmetic tweaks
ChangeLog: "1.5 beta1" --> "1.4.90 (1.5 beta1)"
Java: Fix parallel make with autotools
Win/x64: Fix improper callee save of xmm8-xmm11
Bump TurboJPEG C API revision to 1.5
ChangeLog: Mention jpeg_crop_scanline() function
1.5 beta1
Fix v7/v8-compatible build
libjpeg API: Partial scanline decompression
Build: Make the NASM autoconf variable persistent
Use consistent/modern code formatting for dbl ptrs
...
* libjpeg-turbo/1.4.x: (94 commits)
CMakeLists.txt: Clarify that Un*x isn't supported
Catch libjpeg errors in tjDecompressToYUV2()
cjpeg: Fix buf overrun caused by bad bin PPM input
Add version/build info to global string table
Ensure that default Huffman tables are initialized
Fix memory leak when running tjunittest -yuv
Prevent overread when decoding malformed JPEG
Guard against wrap-around in alloc functions
Fix Visual C++ compiler warnings
rdppm.c: formatting tweaks
jmemmgr.c: formatting tweaks
TurboJPEG: Avoid dangling pointers
Update Android build instr. for ARMv8, PIE, etc.
Makefile.am: formatting tweak
Update build instructions for new autoconf, GitHub
1.4.3
Regression: Allow co-install of 32-bit/64-bit RPMs
Build: Use FILEPATH type for NASM CMake variable
Comment formatting tweaks
Fix 'make dist'
...
Tag 1.4.1 release
* tag '1.4.1': (427 commits)
Now that the TurboJPEG API is reporting libjpeg warnings as errors, an "Invalid SOS parameters for sequential JPEG" warning surfaced in tjDecodeYUV*(). This was caused by the Se member of jpeg_decompress_struct being set to 0 (it is normally set to a non-zero value when the start-of-scan markers are read, but there are no SOS markers in this case, because we're not actually decompressing a JPEG file.)
Fix a segfault that occurred in the MIPS DSPr2 fancy upsampling routine when downsampled_width==3. Because the DSPr2 code unrolls the loop for the middle columns (refer to jdsample.c), it has the effect of performing two column iterations, and that only works properly if the number of columns (minus the first and last) is >= 2. For the specific case of downsampled_width==3, this patch skips to the second iteration of the unrolled column loop.
If a warning (such as "Premature end of JPEG file") is triggered in the underlying libjpeg API, make sure that the TurboJPEG API function returns -1. Unlike errors, however, libjpeg warnings do not make the TurboJPEG functions abort.
Back out r1555 and r1548. Using setenv() didn't fix the iOS simulator issue. It just replaced an undefined _putenv$UNIX2003 symbol with an undefined _setenv$UNIX2003 symbol. The correct solution seems to be to use -D_NONSTD_SOURCE when generating our official builds.
Fix the Windows build. I remember now why I used putenv() originally-- because Windows doesn't have setenv(). We could use _putenv_s(), but older versions of MinGW don't have that either. Fortunately, since all of the environment values we're setting in turbojpeg.c are static, we can just map setenv() to putenv() using a macro. NOTE: we still have to use _putenv_s() in turbojpeg-jni.c, but at least people who may need to build with an older version of MinGW can still do so by disabling the Java build.
Allow building only static or only shared libraries on Windows
__WORDSIZE doesn't seem to be available on platforms other than Mac or Linux, and best practices are for user-level code not to rely on it anyhow, since it's meant to be an internal macro. Fortunately, autoconf already has a way of determining the word size at configure time, so it can be passed into the compiler. This should work on any platform and has been tested on all of the Un*x platforms we support (Linux, Mac, FreeBSD, Solaris.)
Unless you define _ANSI_SOURCE, then putenv() on Mac is renamed to putenv$UNIX2003(), and this causes problems when trying to link an i386 iOS application (for the simulator) against the TurboJPEG static library. It's easiest to just use setenv() instead.
Fix a bug in the 64-bit Huffman encoder that Google discovered when encoding some very specific (and proprietary) aerial images using quality=98, an optimized Huffman table, and the ISLOW DCT. These images were causing the Huffman bit buffer to overflow, because the code for encoding the DC coefficient was using the equivalent of the 32-bit version of EMIT_BITS(). Thus, when 64-bit code was used, the DC coefficient code was not properly checking how many bits were in the buffer before attempting to add more bits to it. This issue appears to have existed in all versions of libjpeg-turbo.
Restore backward compatibility with MSVC < 2010 (broken by r1541)
Oops. OS X doesn't define __WORDSIZE unless you include stdint.h, so apparently the Huffman codec hasn't ever been fully accelerated on 64-bit OS X.
Allow the executables and libraries outside of the sharedlib/ directory to be linked against msvcr*.dll instead of libcmt*.lib. This is reported to be necessary when building libjpeg-turbo for use with C#.
Surround the usage of getenv() in the TurboJPEG API with #ifndef NO_GETENV so that developers can add -DNO_GETENV to the C flags when building for platforms that don't have getenv(). Currently this is known to be necessary when building for Windows Phone.
If libjpeg-turbo is configured with a non-default prefix, such as /usr, then use the docdir variable defined by autoconf 2.60 and later, if available. This will, for instance, install the documentation under /usr/share/doc/libjpeg-turbo by default if prefix=/usr, unless docdir is overridden. When using earlier versions of autoconf, docdir is set to ${datadir}/doc, as it always has been.
Enable silent build rules for the NASM objects, if the source is configured with automake 1.11 or later. NOTE: the build still spits out "error: ignoring unknown tag NASM" for each object, but unfortunately, if we remove "--tag NASM" from the command line, the build breaks under older versions of automake (it aborts with "unable to infer tagged configuration.")
Set the RPM and deb architecture properly on non-x86 platforms.
Come on, Cohaagen, you got what you want. Give these people air!
Oops. Need to set the alpha channel when using TYPE_4BYTE_ABGR*. This has no bearing on the actual tests, but it prevents the PNG pre-encode reference images for those tests from being blank.
Oops. The MIPS SIMD implementations of h2v1 and h2v2 upsampling were not checking for DSPr2 support, so running 'djpeg -nosmooth' on a non-DSPr2-enabled platform caused an "illegal instruction" error.
Introduce fast paths to speed up NULL color conversion somewhat, particularly when using 64-bit code; on the decompression side, the "slow path" now also uses an approach similar to that of the compression side (with the component loop outside of the column loop rather than inside.) This is faster when using 32-bit code.
...
* origin/master: (108 commits)
Bump version number to 3.1.
jpegyuv: fix memory leak when path is invalid
jpegyuv: fix memory leak when @image_buffer allocation fails
yuvjpeg: fix memory leak when @image_buffer allocation fails
jpegtran: Do not leak the input and output buffers
Fix previous commit
Scan optimization: return error when unable to copy data buffer
cjpeg option for baseline quant tables
Fix #153
rdpng: convert 16-bit input to 8-bit
Larger number of DC trellis candidates
Fix overflow issue #157
Const on getters
Const on simple getters and copy source
Expanded .gitignore
Add pkg-config requirement
Re-order links.
Declare inbuffer const
Oops. Delete the duplicate copy of [lib]turbojpeg.dll in the binary directory when uninstalling the package.
Get rid of changelog file that we don't update.
...
* commit 'eca0637c8150d3d1c08a60c64d7ee16eaea4b198':
Remove trailing spaces
Another oops. tjBufSizeYUV2() should return -1 if width < 1.
Oops. tjPlaneSizeYUV() should return -1 if componentID > 0 and subsamp==TJSAMP_GRAY.
When building libjpeg-turbo on Un*x systems, INT32 is usually typedef'ed to long, not int, so we need to specify an int pointer when doing a 4-byte write to the RGB565 output buffer. On little endian systems, this doesn't matter, but when you write a 32-bit int to a 64-bit long pointer address on a big endian system, you are writing to the upper 4 bytes, not the lower 4 bytes. NOTE: this will probably break on big endian systems that use 16-bit ints (are there any of those still around?)
Fix Windows build
Fix issues with RGB565 color conversion on big endian machines. The RGB565 routines are now abstracted in a separate file, with separate little-endian and big-endian versions defined at compile time through the use of macros (this is similar to how the colorspace extension routines work.) This allows big-endian machines to take advantage of the same performance optimizations as little-endian machines, and it retains the performance on little-endian machines, since the conditional branch for endianness is at a very coarse-grained level.
Fix build on OS X PowerPC platforms
Oops. Forgot to alter the version header in the change log to indicate the release of 1.4 beta.
Create 1.4.x branch
#166 describes an issue where I/O suspension is not properly handled in
scan optimization. Supporting I/O suspension may be difficult to
achieve here, thus return an error to make it explicit that I/O
suspension is unsupported.
Add command line option -quant-baseline to cjpeg to force quantization
table entries to be in 1-255 range for JPEG baseline compatibility. See
related discussion in #145
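As a rough sketch of what forcing baseline-compatible quantization tables
means, the following clamps each table entry in the same spirit as the
force_baseline argument of libjpeg's jpeg_add_quant_table(). The function
name clamp_quant_table() and the TABLE_ENTRIES macro are hypothetical; this
is not the code added to cjpeg.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_ENTRIES 64  /* one entry per coefficient of an 8x8 block */

/* Clamp a quantization table in place.  If force_baseline is non-zero, every
   entry ends up in the 1-255 range required for baseline JPEG. */
void clamp_quant_table(unsigned int *table, int force_baseline)
{
  size_t i;

  assert(table != NULL);
  for (i = 0; i < TABLE_ENTRIES; i++) {
    if (table[i] < 1)
      table[i] = 1;        /* a quantizer of zero is not valid */
    if (table[i] > 32767)
      table[i] = 32767;    /* upper limit that libjpeg applies */
    if (force_baseline && table[i] > 255)
      table[i] = 255;      /* baseline tables use 8-bit entries */
  }
}
```

Without force_baseline, entries above 255 are preserved, which produces
16-bit quantization tables that baseline-only decoders may reject.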
* libjpeg-turbo: (39 commits)
Oops. Delete the duplicate copy of [lib]turbojpeg.dll in the binary directory when uninstalling the package.
AltiVec SIMD implementation of sample conversion and integer quantization
Document the fact that the AltiVec implementation uses the same modified algorithms as the SSE2 implementation
Use intrinsics for loading/storing data in the DCT/IDCT functions. This has no effect on the performance of the aligned loads/stores, but it makes it more obvious what that code is doing. Using intrinsics for the unaligned stores in the inverse DCT functions increases overall decompression performance by 1-2%.
AltiVec SIMD implementation of RGB-to-Grayscale color conversion
Remove unneeded code; Make sure jccolor-altivec.o will be rebuilt if jccolext-altivec.c changes.
AltiVec SIMD implementation of RGB-to-YCC color conversion
Make test a phony target so things don't go haywire if there is a file named test.c in the current directory.
Maintain the traditional order of the regression tests while allowing the TurboJPEG and libjpeg portions to be executed separately
Make comments more consistent
Add a "quicktest" pseudo-target, for those times when you just don't want to sit through 11 iterations of TJUnitTest.
Cosmetic tweaks to the PowerPC SIMD stubs
Split AltiVec algorithms into separate files for ease of maintenance; Rename constants using lowercase so they are not confused with macros
Optimizations to the AltiVec DCT algorithms (pre-compute constants and combine multiply/add operations)
AltiVec SIMD implementation of slow integer inverse DCT
Use macros to allocate constants statically, rather than reading them from a table using vec_splat*(). This improves code readability and probably improves performance a bit as well.
Swap the order of the IFAST and ISLOW FDCT functions so that it matches the order of the prototypes in jsimd.h and the stubs in jsimd_powerpc.c.
Include ARMv8 binaries when generating a combined OS X/iOS package using 'make iosdmg'
In the output of the configure script, indicate whether gas-preprocessor.pl is being used along with the assembler.
Modify the ARM64 assembly file so that it uses only syntax that the clang assembler in XCode 5.x can understand. These changes should all be cosmetic in nature-- they do not change the meaning or readability of the code nor the ability to build it for Linux. Actually, the code is now more in compliance with the ARM64 programming manual. In addition to these changes, there were a couple of instructions that clang simply doesn't support, so gas-preprocessor.pl was modified so that it now converts those into equivalent instructions that clang can handle.
...
Conflicts:
BUILDING.txt
ChangeLog.txt
cjpeg.c
jpegtran.c
Initial implementation of trellis quantization for arithmetic coding.
The rate computation does not yet implement all rules of the entropy
coder and may thus be suboptimal.
Fix pass number computation in scan optimization to support case where
Huffman table optimization is not done, e.g. when arithmetic coding is
used
Enable combination of arithmetic coding and scan optimization
(previously disabled)
-- Use macros to represent the fast FDCT constants, to facilitate comparing the AltiVec implementation of the algorithm with the SSE2 implementation.
-- Rename slow FDCT constants for consistency.
-- Use vec_sra() in all cases in the slow FDCT code. The SSE2 implementation uses psraw, which is an arithmetic shift, so we need to do likewise with AltiVec. Using vec_sr() hasn't caused any problems yet, but it is conceivable that it might cause different behavior in certain corner cases.
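The difference between the two shifts can be shown in scalar C on a single
16-bit element. This is only an illustration of the arithmetic; the function
names are hypothetical, and the real code operates on whole AltiVec vectors.
vec_sr() behaves like the logical shift below, while vec_sra() and psraw
behave like the arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>

/* Logical right shift: vacated high bits are filled with zeros. */
int16_t shift_logical(int16_t x, int n)
{
  assert(n >= 0 && n < 16);
  return (int16_t)((uint16_t)x >> n);
}

/* Arithmetic right shift: vacated high bits are copies of the sign bit.
   Written out explicitly because >> on a negative signed value is
   implementation-defined in C. */
int16_t shift_arithmetic(int16_t x, int n)
{
  assert(n >= 0 && n < 16);
  if (x >= 0)
    return (int16_t)(x >> n);
  return (int16_t)~((~x) >> n);
}
```

The two shifts agree for non-negative inputs and differ only when the sign
bit is set, which is why a logical shift can go unnoticed until a negative
intermediate value appears.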
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1444 632fc199-4ca6-4c93-a231-07263d6284db
Add extension parameter JFLOAT_TRELLIS_DELTA_DC_WEIGHT that controls
how distortion is calculated in DC trellis quantization. The parameter
defines weighting between actual distortion of DC and distortion of
vertical gradient of DC.
By default the parameter is 0.0 and has no effect.
Addresses #117
This incorporates an upstream fix to add jdmrg565.c to the tarball created
by 'make dist', as well as a fix to add the new jcmaster.h file to same. There
are still some mozjpeg-specific files that aren't added when doing 'make dist'.
I'll let someone else worry about those. This patch mainly ensures that any
files that might be eventually adopted upstream are included.
mozjpeg should produce identical output to libjpeg-turbo when the JCP_FASTEST
compression profile is used. That means that that profile needs to revert to
the default libjpeg quantization/Huffman tables as well as disable mozjpeg's
duplicate table checking feature. This patch also adds -revert to any instance
of cjpeg and jpegtran called by 'make test' (or ctest on Windows), so that
those tests actually work again. The tests aren't useful for regression
testing the mozjpeg extensions, but at least they can now be used to regression
test the underlying code.
There was an oversight in the extension framework. jpeg_start_compress() can
be called multiple times between the time that a compress structure is created
and the time it is destroyed. If this happened, then the following sequence
would occur:
-- heap alloc of master struct within jpeg_create_compress()
-- heap free of master struct within jinit_c_master_control()
-- static alloc of extended master struct (JPOOL_IMAGE) within
jinit_c_master_control()
-- free extended master struct in jpeg_finish_compress()
-- jinit_c_master_control() now sees that cinfo->master is set and tries to
free it, even though it has already been freed. Chaos ensues.
The fix involved breaking out the extended master struct into a header so that
jpeg_create_compress() can go ahead and allocate it to the correct size, thus
eliminating the need to free and reallocate it in jinit_c_master_control().
Further, the master struct is now created in the permanent pool, so it will
survive until the compression struct is destroyed. Further,
jinit_c_master_control() now resets all fields in the master struct that
are not related to the extension parameters.
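The shape of the fix can be sketched with a simplified model. None of the
names below are from mozjpeg: compressor, ext_master, and the compressor_*()
functions are hypothetical, and plain calloc()/free() stand in for the
libjpeg memory pools. The point is that the extended struct is allocated
once, at its full size, when the object is created; starting a compression
only resets the per-image fields and never frees or reallocates.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
  int compress_profile;  /* extension parameter: must survive restarts */
  int pass_number;       /* per-image state: reset on every start */
} ext_master;

typedef struct {
  ext_master *master;
} compressor;

/* Allocate the extended master struct once, at its final size.
   Returns non-zero on success. */
int compressor_create(compressor *c)
{
  c->master = calloc(1, sizeof(ext_master));
  return c->master != NULL;
}

/* May be called any number of times between create and destroy. */
void compressor_start(compressor *c)
{
  assert(c->master != NULL);
  c->master->pass_number = 0;  /* reset only the non-extension fields */
}

/* The only place where the master struct is released. */
void compressor_destroy(compressor *c)
{
  free(c->master);
  c->master = NULL;
}
```

Because the struct has a single owner and a single release point, calling
the start function repeatedly cannot free memory that was already freed.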
"jcext" is a bit more descriptive, since this code is primarily intended to
extend the libjpeg API. It does so in a backward-ABI-compatible manner, but
"jccompat" could be misinterpreted to mean that the code is providing backward
compatibility at the code level.
This eliminates JBOOLEAN_USE_MOZ_DEFAULTS and replaces it with
JINT_COMPRESS_PROFILE, a more flexible and descriptive parameter. Currently,
this new parameter works in much the same way as the old-- it changes the
behavior of jpeg_set_defaults(). It currently supports only two values
(max. compression, i.e. mozjpeg defaults, and fastest, i.e. libjpeg-turbo
defaults), but it can be extended in the future with additional profiles that
balance compression ratio with performance.
This includes more descriptive text for the project summary (the same
text that is in the package descriptions), a more thorough description of the
libjpeg API extensibility framework, reformatting to improve readability
(particularly on 80-column terminals), and numerous grammar tweaks.
JBOOLEAN_ONE_DC_SCAN and JBOOLEAN_SEP_DC_SCAN are merged into a single
parameter JINT_DC_SCAN_OPT_MODE
Default behavior is modified to use one DC scan per component
Since mozjpeg is now backward ABI-compatible with libjpeg[-turbo], it is now
possible to temporarily load mozjpeg into a binary application and cause that
application to generate uber-compressed JPEGs (at the expense of an extreme
performance loss, of course.) For instance, someone could do
LD_LIBRARY_PATH=/opt/mozjpeg/lib convert blah_blah_blah
to make ImageMagick use mozjpeg instead of the system's pre-installed JPEG
library (libjpeg-turbo, in most cases.) However, this only makes sense if
mozjpeg is actually producing different behavior by default than libjpeg-turbo.
Currently it isn't. Currently it requires the application to set
JBOOLEAN_USE_MOZ_DEFAULTS to TRUE in order to enable the mozjpeg-specific
behavior, but of course applications that were built to use libjpeg[-turbo]
won't do that. Thus, this patch sets use_moz_defaults to TRUE by default,
requiring an application to explicitly set it to FALSE in order to revert to
the libjpeg[-turbo] behavior (makes sense, since the only applications that
would need to revert to the libjpeg[-turbo] behavior would be mozjpeg-aware
applications.)
Note that we discussed the possibility of adding a function
(jpeg_revert_defaults()), which would act the same as jpeg_set_defaults() does
in libjpeg[-turbo]. This is a good solution for implementing the -revert
switch in cjpeg, but unfortunately it doesn't work for jpegtran. The reason
is that jpeg_set_defaults() is called within the body of
jpeg_copy_critical_parameters(), which is part of the API. So yet again,
if mozjpeg were loaded into a non-mozjpeg-aware application at run time, it
would be desirable for jpeg_copy_critical_parameters() to set the parameters
to mozjpeg defaults. That means that, in order to implement the -revert
switch in jpegtran, it would be necessary to introduce a new function
(jpeg_revert_critical_parameters(), perhaps). It seems cleaner to just keep
using the JBOOLEAN_USE_MOZ_DEFAULTS parameter to control the behavior of
jpeg_set_defaults(), even though this represents a minor abuse of the libjpeg
API (jpeg_set_defaults() is technically supposed to set all of the parameters
to defaults, irrespective of any previous state. However, as long as we
document that JBOOLEAN_USE_MOZ_DEFAULTS works differently, then it should be
OK.)
* libjpeg-turbo:
Remove trailing spaces
Another oops. tjBufSizeYUV2() should return -1 if width < 1.
Oops. tjPlaneSizeYUV() should return -1 if componentID > 0 and subsamp==TJSAMP_GRAY.
The AltiVec code actually works on 32-bit PowerPC platforms as well, so change the "powerpc64" token to "powerpc". Also clean up the shift code, which wasn't building properly on OS X.
AltiVec SIMD implementation of fast forward DCT
Bump version to 1.5 alpha1 to prepare for new features
When building libjpeg-turbo on Un*x systems, INT32 is usually typedef'ed to long, not int, so we need to specify an int pointer when doing a 4-byte write to the RGB565 output buffer. On little endian systems, this doesn't matter, but when you write a 32-bit int to a 64-bit long pointer address on a big endian system, you are writing to the upper 4 bytes, not the lower 4 bytes. NOTE: this will probably break on big endian systems that use 16-bit ints (are there any of those still around?)
Fix Windows build
Fix issues with RGB565 color conversion on big endian machines. The RGB565 routines are now abstracted in a separate file, with separate little-endian and big-endian versions defined at compile time through the use of macros (this is similar to how the colorspace extension routines work.) This allows big-endian machines to take advantage of the same performance optimizations as little-endian machines, and it retains the performance on little-endian machines, since the conditional branch for endianness is at a very coarse-grained level.
Fix build on OS X PowerPC platforms
Oops. Forgot to alter the version header in the change log to indicate the release of 1.4 beta.
For whatever reason, some of these files didn't get fully merged from
libjpeg-turbo 1.4. They still contained tab characters and other formatting
conventions from libjpeg-turbo 1.3. This patch also fixes some obvious
indentation errors in the mozjpeg-specific code. There is more formatting work
that needs to be done to the mozjpeg-specific code, to fix line overruns,
incorrect operator whitespace, and other issues that make it not consistent
with the libjpeg/libjpeg-turbo code.
This might be slightly more controversial, since it changes the CMake and
autotools project names and the binary package names to "mozjpeg", and it
changes the default install directory to /opt/mozjpeg. To me, this makes much
more sense, but it does represent a change in operational behavior, which is
why I put it in a separate commit.
This patch does the following:
-- Implements some (hopefully non-controversial) changes to the package
descriptions, in order to prevent confusion (the existing descriptions from
libjpeg-turbo are not appropriate for mozjpeg.)
-- Replaces "libmozjpeg" with "mozjpeg" in all documentation and comments. The
project is called "mozjpeg", and it doesn't actually generate a library called
"libmozjpeg", so it doesn't make sense to use "libmozjpeg" to describe it.
-- Replaces "MozJPEG" with "TurboJPEG" in all documentation and comments.
"MozJPEG" appears to have been the product of blindly searching/replacing
instances of "Turbo". TurboJPEG is the name of the API, and that name still
applies to the implementation in mozjpeg. Furthermore, the TurboJPEG libraries
are still called "libturbojpeg" in mozjpeg.
-- Attempts to remove build instructions that are irrelevant or not applicable
to mozjpeg. Further work possibly needs to be done here-- for instance, it
doesn't make much sense to have build instructions for mobile devices when the
library is not intended to be used for decoding.
-- Changes the vendor in the DEB and RPM files from "The libmozjpeg Project" to
"Mozilla Research".
-- Changes the source tarball location in the RPM spec file to correctly point
to the release tarball on github.
-- Changes the source directory in the RPM spec file to "mozjpeg-%{version}",
which is the actual name of the source directory in the mozjpeg tarballs.
The ABI compatibility feature was developed by the current maintainer of
libjpeg-turbo with an eye toward eventual inclusion in libjpeg-turbo (once
other features are added to libjpeg-turbo that necessitate the inclusion.)
Thus, it is easy to ensure that the DLL function ordinals will be synchronized
between libjpeg-turbo and mozjpeg. However, it still makes sense to allow for
a little bit of breathing room, just in case. Thus, this patch uses ordinals
starting at 200 for the accessor functions. It would probably make sense to
start the equivalent decompressor get/set functions at ordinal 300, once they
are implemented.
Windows requires exported symbols to be explicitly declared.
Also, use a very large ordinal number so that any future symbols
added by IJG or TurboJPEG will not break ABI.
Fixes #104.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit '73edb3d734a628fd88994bc974dc6737a58bd956': (45 commits)
Rename the ARM64 assembly file to match the C file
Fix several mathematical issues discovered in the ARM64 NEON code while running the extended regression tests introduced in r1267. Specific comments can be found in the original patches: https://sourceforge.net/p/libjpeg-turbo/patches/64/
Reformat code per Siarhei's original patch (to clearly indicate that the offset instructions are completely independent) and add Siarhei as an individual author (he no longer works for Nokia.)
Clarify forward compatibility of iOS/ARM builds
ARM64 NEON SIMD support for YCC-to-RGB565 conversion
ARM NEON SIMD support for YCC-to-RGB565 conversion, and optimizations to the existing YCC-to-RGB conversion code:
Ensure that tjFree() is used for any JPEG buffers that might have been dynamically allocated by the compress/transform functions. To keep things simple, we use tjAlloc() for the statically-allocated buffer as well, so that tjFree() can always be used to free the buffer, regardless of whether it was allocated by tjbench or by the TurboJPEG library. This fixes crashes that occurred on Windows when running tjunittest or tjbench with the -alloc flag.
Revert r1335 and r1336. It was a valiant effort, but on Windows, xmm8-xmm15 are non-volatile, and the overhead of pushing them onto the stack at the beginning of each function and popping them at the end was causing worse performance (in the neighborhood of 3-5%) than just using the work areas and limiting the register usage to xmm0-xmm7. Best to leave the SSE2 code alone. We can optimize the register usage for AVX2, once that port takes place.
Windows doesn't have setenv(). Go, go Gadget Macros.
1.4 beta1
Fix 'make dist'
Don't use sudo when building a Debian package unless the user is non-root
Add a set of undocumented environment variables and Java system properties that allow compression features of libjpeg that are not normally exposed in the TurboJPEG API to be enabled. These features are not normally exposed because, for the most part, they aren't "turbo" features, but it is still useful to be able to benchmark them without modifying the code.
.func/.endfunc are only necessary when generating STABS debug info, which basically went out of style with parachute pants and Rick Astley. At any rate, none of the platforms for which we're building the ARM code use it (DWARF is the common format these days), and the .func/.endfunc directives cause the clang integrated assembler to fail (http://llvm.org/bugs/show_bug.cgi?id=20424).
Extend tjbenchtest so that it tests the dynamic JPEG buffer allocation feature in TurboJPEG. Disable the tiling feature in TJBench whenever dynamic buffer allocation is enabled (because the tiling feature requires a separate buffer for each tile, using it successfully with dynamic buffer allocation would require a separate TurboJPEG compressor instance for each tile, and it's not worth going to that trouble right now.)
Run the TurboJPEG conformance tests out of a directory in /tmp (for improved performance, if the source directory is on a remote file share.) Fix an issue in TJBench.java that prevented it from working properly if the source image resided in a directory with a dot in the name.
Oops
Subtle point, but dest->outbuffer is a pointer to the address of the JPEG buffer, which is stored in the calling program. Thus, *(dest->outbuffer) will always equal *outbuffer. We need to compare *outbuffer with dest->buffer instead to determine if the pointer is being reused.
If the output buffer in the TurboJPEG destination manager was allocated by the destination manager and is being reused from a previous compression operation, then we need to get the buffer size from the previous operation, since the calling program doesn't know the actual buffer size.
Actually, we need to increase the size of BUFSIZE, not just the size of _buffer. The previous patch might have caused problems if, for instance, state->free_in_buffer was 127 but 129 bytes were compressed. In that case, only 127 of the 129 bytes would have been written to the file. Also document the fix.
...
Conflicts:
CMakeLists.txt
Makefile.am
configure.ac
jcdctmgr.c
release/deb-control.tmpl
sharedlib/CMakeLists.txt
simd/CMakeLists.txt
turbojpeg.c
* origin/master: (23 commits)
Update .gitignore
.func/.endfunc are only necessary when generating STABS debug info, which basically went out of style with parachute pants and Rick Astley. At any rate, none of the platforms for which we're building the ARM code use it (DWARF is the common format these days), and the .func/.endfunc directives cause the clang integrated assembler to fail (http://llvm.org/bugs/show_bug.cgi?id=20424).
Enable DC trellis by default
Avoid double inline attribute
Detect libpng
Implement DHT Merging
Add .gitignore for autotools files
Check memory alloc success
Update cjpeg usage text
Implement DQT merging
Fix issue with scan printout
Get rid of unnecessary and obsolete platform configuration instructions.
Add error checks for malloc calls that don't already have them. Issue #87.
yuvjpeg: fix trivial leak
Parse quality as float
PNG reading support
Fix issue with DC trellis
Add option to split DC scans
Add trellis for DC
Bump version to 2.1.
...
Conflicts:
BUILDING.txt
cdjpeg.h
jcdctmgr.c
jchuff.h
jcmarker.c
jcmaster.c
jconfig.txt
jpeglib.h
rdswitch.c
* commit 'b8d044a666056d4d8d28d7a5d0805ac32b619b36': (58 commits)
Big oops. wrjpgcom on Windows was being built using the rdjpgcom source.
Prevent a buffer overrun if the comment begins with a literal quote character and the string exceeds 65k characters. Also prevent comments longer than 65k characters from being written, since this will produce an incorrect JPEG file.
Remove VMS-specific code
Our copyright string is longer than JMSG_LENGTH_MAX, and this was causing a buffer overrun if output_message() was called with msg_code set to JMSG_COPYRIGHT, or if format_message() was called with msg_code set to JMSG_COPYRIGHT and with a buffer of length JMSG_LENGTH_MAX.
We don't support non-ANSI C compilers
Allow for building the MIPS DSPr2 extensions if the host is mips-* as well as mipsel-*. The DSPr2 extensions are little endian, so we still have to check that the compiler defines __MIPSEL__ before enabling them. This paves the way for supporting big-endian MIPS, and in the near term, it allows the SIMD extensions to be built with Sourcery CodeBench.
SIMD-accelerated int upsample routine for MIPS DSPr2
Fix MIPS build
libjpeg-turbo has never supported non-ANSI compilers, so get rid of the crufty SIZEOF() macro. It was not being used consistently anyhow, so it would not have been possible to build prior releases of libjpeg-turbo using the broken compilers for which that macro was designed.
Remove MS-DOS code and information, and adjust copyright headers to reflect the removal of features in r1307 and r1308. libjpeg-turbo has never supported MS-DOS, nor is it even possible for us to do so.
Further copyright header cleanup
Further copyright header cleanup
Get rid of the HAVE_PROTOTYPES configuration option, as well as the related JMETHOD and JPP macros. libjpeg-turbo has never supported compilers that don't handle prototypes. Doing so requires ansi2knr, which isn't even supported in the IJG code anymore.
Remove all of the NEED_SHORT_EXTERNAL_NAMES stuff. There is scant information available as to which linkers ever had a 15-character global symbol name limit. AFAICT, it might have been a VMS and/or a.out BSD thing, but none of those platforms have ever been supported by libjpeg-turbo (nor are such systems supported by other open source libraries of this nature.)
Clean up code formatting in the SIMD interface functions
SIMD-accelerated NULL convert routine for MIPS DSPr2
Fix build, which was broken by the checkin of the MIPS DSPr2 accelerated smooth downsampling routine. Until/unless other platforms include SIMD support for that function, it's just easier to #ifdef around it rather than adding stubs for the other platforms.
Fix error in MIPS DSPr2 accelerated smooth downsample routine
SIMD-accelerated h2v2 smooth downsampling routine for MIPS DSPr2
Minor tweak to improve code readability
...
Conflicts:
BUILDING.txt
CMakeLists.txt
Makefile.am
cdjpeg.h
cjpeg.1
cjpeg.c
configure.ac
djpeg.1
example.c
jccoefct.c
jcdctmgr.c
jchuff.c
jchuff.h
jcinit.c
jcmaster.c
jcparam.c
jcphuff.c
jidctflt.c
jpegint.h
jpeglib.h
jversion.h
libjpeg.txt
rdswitch.c
simd/CMakeLists.txt
tjbench.c
turbojpeg.c
usage.txt
wrjpgcom.c
* commit '93ddfcfc1a814789ed64d967a6118616753bb9d5': (65 commits)
Use clz/bsr instructions on ARM for bit counting rather than the lookup table (reduces memory footprint and can improve performance in some cases.)
Make iOS build instructions more generic and applicable to all versions of Xcode; modify iOS build procedure for Xcode 5.0 and later to fix a build issue with Xcode 5.1.
Update build instructions to reflect the use of pkgbuild/productbuild
Remove any claims of support for OS X 10.4 "Tiger" (the packaging system overhaul produces packages that require Leopard or later, and I haven't been able to test Tiger for years anyhow.) Update TurboJPEG shared library version.
Migrate Mac packaging system to pkgbuild, since PackageMaker is no longer supported.
Remove the sections about replacing libjpeg at run time and compile time. These were written before O/S distributions started shipping libjpeg-turbo, and they are either pedantic or no longer relevant. Also remove any text that assumes the use of our official project binaries. Notes specific to the official binaries have been moved into the project wiki.
Fix Windows build
Since we're now maintaining our own Cygwin pseudo-repository directories instead of recommending that users install these packages from a local source, it makes more sense to name the packages according to Cygwin specs, so they can be copied as-is into the pseudo-repository.
39dbc2db9718f9af2f62eb486fd73328fe8bf5e8
Fix 'make dist'
RHEL 6 (and probably other platforms as well) sets _defaultdocdir=%{_datadir}/doc, which screws things up, since we're overriding _datadir. Since we intend _defaultdocdir to be /usr/share/doc, just be explicit about it.
Fix compiler warning about unused function when building with the libjpeg v6b API/ABI
Fix compiler warning ("always_inline function might not be inlinable") when building with recent versions of GCC
Enable silent build (can be overridden with 'make V=1') if the version of autotools being used is new enough.
Extend YUVImage class to allow reuse of the same buffer with different metadata; port TJBench changes that treat YUV encoding/decoding as an intermediate step of the JPEG compression/decompression pipeline rather than a separate test case; add YUV encode/decode tests to the Java version of tjbenchtest
formatting tweaks
Fix an error that occurred when trying to use the lossless transform feature without specifying -quiet; formatting tweak
Move the garbage collection of the JPEG tiles into the decompression function to increase the chances that tiled decompression of large images will succeed without an OutOfMemoryError.
Generate the Java documentation using javadoc 7, to improve readability.
This should have been checked in with the previous commit.
...
Conflicts:
BUILDING.txt
configure.ac
jversion.h
release/Info.plist.in
release/ReadMe.rtf
tjbench.c
turbojpeg.c
* mozjpeg: (94 commits)
Disable scan optimization if no scan given
Fixed mozjpeg build with Visual C++ 2010
Added few error messages in cjpeg
Fix trellis for nonprogressive mode (#69)
Bugfix: AM_PROG_AR is not recognized by older automake, so only use it when defined
Bump version number for 2.0, make this version 2.0.1.
Update MS-SSIM tuning
Fix #64
Updating yuvjpeg and jpegyuv to match Daala tools.
Fix #56
Silence compiler warning
Silence compiler warning
Improve support of JPEG input in cjpeg
Fix issue with JPEG read in cjpeg
Add support for JPEG input in cjpeg
Fix #50
Disable trellis in jpegtran
Update version to 2.0pre.
Use single DC scan by default
Add configure check for libm/pow. Fixes Linux build issue.
...
This patch also removes an unneeded macro from jdmerge.c.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.4.x@1403 632fc199-4ca6-4c93-a231-07263d6284db
This patch also removes an unneeded macro from jdmerge.c.
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1402 632fc199-4ca6-4c93-a231-07263d6284db
-----
aee36252be.patch
From aee36252be20054afce371a92406fc66ba6627b5 Mon Sep 17 00:00:00 2001
From: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Date: Wed, 13 Aug 2014 03:50:22 +0300
Subject: [PATCH] ARM: Faster NEON yuv->rgb conversion for Krait and Cortex-A15
The older code was developed and tested only on ARM Cortex-A8 and ARM Cortex-A9.
Tuning it for newer ARM processors can introduce some speed-up (up to 20%).
The performance of the inner loop (conversion of 8 pixels) improves from
~27 cycles down to ~22 cycles on Qualcomm Krait 300, and from ~20 cycles
down to ~18 cycles on ARM Cortex-A15.
The performance remains exactly the same on ARM Cortex-A7 (~58 cycles),
ARM Cortex-A8 (~25 cycles) and ARM Cortex-A9 (~30 cycles) processors.
Also use larger indentation in the source code for separating two independent
instruction streams.
-----
a5efdbf22c.patch
From a5efdbf22ce9c1acd4b14a353cec863c2c57557e Mon Sep 17 00:00:00 2001
From: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Date: Wed, 13 Aug 2014 07:23:09 +0300
Subject: [PATCH] ARM: NEON optimized yuv->rgb565 conversion
The performance of the inner loop (conversion of 8 pixels):
* ARM Cortex-A7: ~55 cycles
* ARM Cortex-A8: ~28 cycles
* ARM Cortex-A9: ~32 cycles
* ARM Cortex-A15: ~20 cycles
* Qualcomm Krait: ~24 cycles
Based on the Linaro rgb565 patch from
https://sourceforge.net/p/libjpeg-turbo/patches/24/
but implements better instructions scheduling.
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1385 632fc199-4ca6-4c93-a231-07263d6284db
The current way multipass works, write_scan_header() seems to be called
multiple times, so only DC tables get merged, however, this will still
merge AC tables if possible.
Implements the second half of #30.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Command line option -split-dc-scan is added to code DC scans
independently (instead of interleaved). It should be determined whether
this option introduces any decoder compatibility issues (see #83).
Option -multidcscan is renamed to -opt-dc-scan.
Add option to apply trellis quantization to the DC coefficients (see
#57). May need further refinement to make sure block order during
trellis optimization matches order during coding.
If trellis_quant is enabled, increment total number of
passes by optimization beginning pass number.
Signed-off-by: Arjun Sreedharan <arjun024@gmail.com>
From libjpeg-turbo r1221:
Integrate a slightly modified version of Mozilla's patch for
precomputing the bit-counting LUT. This is useful if the table needs
to be shared among multiple processes, although the primary reason for
doing that is reduced footprint on mobile devices, which are probably
already covered by the clz intrinsic code.
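As a rough illustration of the two approaches mentioned here, the sketch below counts the bits needed to represent a coefficient magnitude, first with a precomputed constant table and then with a count-leading-zeros intrinsic. The names `nbits_nibble`, `nbits_lut`, and `nbits_clz` are hypothetical, the 16-entry table is far smaller than anything a real encoder would use, and `__builtin_clz` is assumed to be available (GCC or Clang).

```c
#include <assert.h>

/* Precomputed at build time: a const table lives in read-only data and can
 * be shared between processes instead of being filled in at run time. */
static const unsigned char nbits_nibble[16] = {
  0, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4
};

/* Number of bits needed to represent v (0 for v == 0), via table lookup. */
static int nbits_lut(unsigned int v)
{
  assert(v <= 0xFFFFu);
  if (v & 0xF000u) return 12 + nbits_nibble[v >> 12];
  if (v & 0x0F00u) return 8 + nbits_nibble[v >> 8];
  if (v & 0x00F0u) return 4 + nbits_nibble[v >> 4];
  return nbits_nibble[v];
}

/* Same result using a count-leading-zeros intrinsic where one exists. */
static int nbits_clz(unsigned int v)
{
#if defined(__GNUC__) || defined(__clang__)
  /* __builtin_clz(0) is undefined, so zero is handled separately. */
  return v ? (int)(sizeof(unsigned int) * 8) - __builtin_clz(v) : 0;
#else
  return nbits_lut(v);
#endif
}
```

The intrinsic path needs no table at all, which is why the table's footprint matters mainly on platforms where the intrinsic is not used.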
From libjpeg-turbo r1288
Port the more accurate (and slightly faster) floating point IDCT
implementation from jpeg-8a and later. New research revealed that the
SSE/SSE2 floating point IDCT implementation was actually more accurate
than the jpeg-6b implementation, not less, which is why its
mathematical results have always differed from those of the jpeg-6b
implementation. This patch brings the accuracy of the C code in line
with that of the SSE/SSE2 code.
Add macro JPEG_RAW_READER that defines whether to pass RAW sample data
from input to output JPEG files (hence preserving color space and
sampling). Macro is now disabled by default.
Add code to copy metadata from input to output JPEG, hence preserving
color profiles and other important information
We can't simply increase JMSG_LENGTH_MAX, because it is part of the libjpeg API, and it is generally assumed that a buffer of this length will be passed to format_message(). Thus, the easiest solution is simply to use a shorter copyright string for JMSG_COPYRIGHT.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.3.x@1319 632fc199-4ca6-4c93-a231-07263d6284db
We can't simply increase JMSG_LENGTH_MAX, because it is part of the libjpeg API, and it is generally assumed that a buffer of this length will be passed to format_message(). Thus, the easiest solution is simply to use a shorter copyright string for JMSG_COPYRIGHT.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1318 632fc199-4ca6-4c93-a231-07263d6284db
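The fix described above was to shorten the string. As a complementary illustration of why the length matters, the sketch below formats a message into a caller-supplied buffer of a fixed, API-mandated size and reports whether it fit. `MSG_LENGTH_MAX` is a stand-in for JMSG_LENGTH_MAX (its value of 200 is an assumption here), and `format_bounded` is a hypothetical helper, not a libjpeg function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for JMSG_LENGTH_MAX; the value is assumed for illustration. */
#define MSG_LENGTH_MAX  200

/* Copy text into a buffer that the caller guarantees holds MSG_LENGTH_MAX
 * bytes.  Returns 1 if the whole message fit, 0 if it was truncated. */
static int format_bounded(char *buffer, const char *text)
{
  int n;

  assert(buffer != NULL && text != NULL);
  /* snprintf() never writes more than MSG_LENGTH_MAX bytes, terminator
   * included, and returns the length the full message would have had. */
  n = snprintf(buffer, MSG_LENGTH_MAX, "%s", text);
  return n >= 0 && n < MSG_LENGTH_MAX;
}
```

An unbounded copy of an over-long string into such a buffer is the overrun the message describes. A bounded copy avoids the overrun but still truncates, so keeping the string itself short remains the simpler remedy.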
Value of cinfo->one_dc_scan is set to true by default to use a single
DC scan for all components.
Option -onedcscan is replaced by -multidcscan to enable multiple DC
scans.
While this change appears to degrade compression performance, it
improves compatibility with a wider range of JPEG decoders.
Add option to have a single DC scan wherein all components are
interleaved when using progressive mode. This may resolve compatibility
issues raised in #29 and #48.
This option is available through -onedcscan in cjpeg
Optimizes quantization matrix by minimizing reconstruction error based
on quantized coefficients.
Feature is controlled by cinfo->trellis_q_opt; disabled by default.
The invocation of this function is wrapped in an ifdef JPEGLIB >70
but the definition of the function wasn't. This change adds the
ifdef around the function definition.
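A minimal sketch of the pattern, assuming the purpose is to keep the definition and its only call site under the same condition (for instance, to avoid an unused-function warning when the older API level is selected). `DEMO_LIB_VERSION`, `scaled_size`, and `output_size` are hypothetical names, and the exact condition used in the real code is not reproduced here.

```c
#include <assert.h>

/* Hypothetical stand-in for the library version macro. */
#define DEMO_LIB_VERSION  80

#if DEMO_LIB_VERSION > 70
/* Compiled only when the newer API level is selected, matching the guard
 * around its call site below. */
static int scaled_size(int size, int num, int denom)
{
  assert(denom > 0);
  return (size * num + denom - 1) / denom;   /* round up */
}
#endif

static int output_size(int size)
{
#if DEMO_LIB_VERSION > 70
  return scaled_size(size, 1, 2);
#else
  return size;
#endif
}
```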
The throw macros set retval, but not all functions use/return this
value. This change, mainly to clean up compiler warnings, makes
those functions return a value.
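The shape of such a macro and a function that returns the value it sets might look like the sketch below. `THROW`, `err_str`, and `check_dims` are hypothetical names chosen for illustration and are not the project's actual identifiers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char err_str[200];

/* Records a message, sets the caller's local retval, and jumps to the
 * caller's cleanup label. */
#define THROW(msg) { \
  snprintf(err_str, sizeof(err_str), "%s", msg); \
  retval = -1;  goto bailout; \
}

/* Because the macro assigns retval, the function declares it and returns
 * it, so the assignment is not flagged as set-but-unused. */
static int check_dims(int width, int height)
{
  int retval = 0;

  if (width <= 0 || height <= 0)
    THROW("Invalid dimensions")

bailout:
  assert(retval == 0 || retval == -1);
  return retval;
}
```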
-- Auto-generates HAVE_LOCALE_H macro and adds it to jconfig.h (this is used by rdjpgcom.c.)
-- Reconciles the description and ordering of macros between config.h.in and jconfig.h.in, so the two files can be easily diffed.
-- Eliminates the use of the autoheader-generated config.h in the project and moves relevant internal-only macros into a new file, jconfigint.h. This is to avoid "already defined" warnings in files that were including both config.h (to get the internal autotools package information or the INLINE definition) and jconfig.h.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.3.x@1258 632fc199-4ca6-4c93-a231-07263d6284db
-- Auto-generates HAVE_LOCALE_H macro and adds it to jconfig.h (this is used by rdjpgcom.c.)
-- Reconciles the description and ordering of macros between config.h.in and jconfig.h.in, so the two files can be easily diffed.
-- Eliminates the use of the autoheader-generated config.h in the project and moves relevant internal-only macros into a new file, jconfigint.h. This is to avoid "already defined" warnings in files that were including both config.h (to get the internal autotools package information or the INLINE definition) and jconfig.h.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1257 632fc199-4ca6-4c93-a231-07263d6284db
delta = cur0 * 2;
cur0 += delta; /* form error * 3 */
errorptr[0] = (FSERROR) (bpreverr0 + cur0);
cur0 += delta; /* form error * 5 */
bpreverr0 = belowerr0 + cur0;
cur0 += delta; /* form error * 7 */
Each time cur0 is incremented by delta, the compiled code doubles the value of delta (WTF?!) Thus, by the time the end of the block is reached, cur0 is equal to 15 times its former self, not 7 times its former self as it should be. At any rate, it was a lot simpler to just refactor the code so that it uses multiplication.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.3.x@1253 632fc199-4ca6-4c93-a231-07263d6284db
delta = cur0 * 2;
cur0 += delta; /* form error * 3 */
errorptr[0] = (FSERROR) (bpreverr0 + cur0);
cur0 += delta; /* form error * 5 */
bpreverr0 = belowerr0 + cur0;
cur0 += delta; /* form error * 7 */
Each time cur0 is incremented by delta, the compiled code doubles the value of delta (WTF?!) Thus, by the time the end of the block is reached, cur0 is equal to 15 times its former self, not 7 times its former self as it should be. At any rate, it was a lot simpler to just refactor the code so that it uses multiplication.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1251 632fc199-4ca6-4c93-a231-07263d6284db
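To make the intended arithmetic concrete, the sketch below computes the 3x, 5x, and 7x error terms both ways: by repeated addition of `delta`, as in the excerpt, and by direct multiplication, as the refactoring is described. It is not the committed code. `fs_weights` and the function names are hypothetical, and the surrounding error-diffusion bookkeeping is omitted.

```c
#include <assert.h>

/* Hypothetical container for the three weighted error terms. */
typedef struct {
  int x3;   /* error * 3 */
  int x5;   /* error * 5 */
  int x7;   /* error * 7 */
} fs_weights;

/* Original style: build 3x, 5x, and 7x by repeatedly adding 2x. */
static fs_weights weights_by_addition(int err)
{
  fs_weights w;
  int delta = err * 2;
  int cur = err;

  cur += delta;  w.x3 = cur;
  cur += delta;  w.x5 = cur;
  cur += delta;  w.x7 = cur;
  return w;
}

/* Refactored style: state each weight directly as a multiplication, which
 * leaves no running accumulator for a compiler to mishandle. */
static fs_weights weights_by_multiplication(int err)
{
  fs_weights w;

  w.x3 = err * 3;
  w.x5 = err * 5;
  w.x7 = err * 7;
  return w;
}

/* Returns 1 when both formulations give identical results for err. */
static int weights_agree(int err)
{
  fs_weights a, b;

  assert(err > -100000 && err < 100000);   /* keep err * 7 far from overflow */
  a = weights_by_addition(err);
  b = weights_by_multiplication(err);
  return a.x3 == b.x3 && a.x5 == b.x5 && a.x7 == b.x7;
}
```

With a correct compiler the two forms are equivalent. The 15x result reported above corresponds to `delta` doubling on each step (1 + 2 + 4 + 8).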
This macro defined two local variables, cinfo and dinfo, but they
aren't always both used. Add (void)cinfo; and (void)dinfo; in there
to hush up the compiler.
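A sketch of the idiom, using hypothetical types and a hypothetical `GET_INSTANCE` macro rather than the project's own definitions. Each function below uses only one of the two pointers the macro declares, and the `(void)` casts count as a use of the other.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int quality; } comp_state;
typedef struct { int scale; } decomp_state;
typedef struct { comp_state c; decomp_state d; } instance;

/* Declares both pointers for every caller.  The (void) casts mark each one
 * as used so that callers needing only one of them compile without an
 * unused-variable warning. */
#define GET_INSTANCE(handle) \
  instance *inst = (instance *)(handle); \
  comp_state *cinfo = NULL; \
  decomp_state *dinfo = NULL; \
  if (inst == NULL) return -1; \
  cinfo = &inst->c; \
  dinfo = &inst->d; \
  (void)cinfo;  (void)dinfo;

/* Uses cinfo only. */
static int set_quality(void *handle, int quality)
{
  GET_INSTANCE(handle)
  assert(quality >= 1 && quality <= 100);
  cinfo->quality = quality;
  return 0;
}

/* Uses dinfo only. */
static int get_scale(void *handle)
{
  GET_INSTANCE(handle)
  return dinfo->scale;
}
```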
Derivation of sign value relies on shift right operator >> being an
arithmetic shift. It is thus not strictly portable since the C standard
defines the result of x >> y as "implementation-defined" when x is a
signed integer with a negative value.
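One portable alternative, offered as a sketch and not necessarily the change that was made, is to derive the sign mask from a comparison instead of a shift. `sign_mask` and `abs_via_mask` are hypothetical helper names.

```c
#include <assert.h>
#include <limits.h>

/* Returns -1 for negative x and 0 otherwise.  The comparison yields 0 or 1,
 * so no negative value is ever shifted. */
static int sign_mask(int x)
{
  return -(x < 0);
}

/* Branch-free magnitude built on that mask, computed in unsigned arithmetic
 * so that INT_MIN is handled without signed overflow. */
static unsigned int abs_via_mask(int x)
{
  unsigned int m = (unsigned int)sign_mask(x);   /* 0 or all ones */

  assert(m == 0u || m == UINT_MAX);
  return ((unsigned int)x ^ m) - m;
}
```

On compilers where `>>` on a negative `int` is an arithmetic shift, `sign_mask(x)` gives the same result as `x >> (sizeof(int) * CHAR_BIT - 1)`, without depending on that implementation-defined behavior.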
Multiple trellis iterations may improve coding performance as Huffman
tables are updated with each iteration. In practice the benefit appears
to be minimal.
Trellis quantization is modified:
- to work on the configurable spectral range Ss to Se
- to optionally optimize runs of EOBs
- to optionally split optimization between 2 spectral ranges
In trellis quantization passes, Huffman table code optimization is
modified so as to generate a valid code length for each possible
symbol by resetting frequency counters to 1 instead of 0.
Fixes issues #23, #24, #31
Note that the original jpgcrush script optimizes scans only for YCbCr
and grayscale color spaces. Scan optimization is thus disabled for RGB
and CMYK color spaces and behavior reverts to the fast mode of jpgcrush
which uses predefined scans
Different lambda values may be used for each frequency in the DCT domain.
The weighting table is currently not configurable but can be
enabled/disabled with cinfo->use_lambda_weight_tbl.
-tune-psnr, -tune-ssim and -tune-hvs-psnr are added to cjpeg to control
the trellis quantization process and optimize output for PSNR, SSIM and
HVS-PSNR distortion metrics
A new pass type trellis_pass is added. It defines a pass where trellis
quantization is done in the quantize_trellis() function.
Trellis quantization can be enabled by setting use_moz_defaults to 2 or
by using the -trellis option in cjpeg
Note that trellis does not currently work with scan optimization. Scan
optimization is disabled when trellis is enabled.
First implementation of scan optimisation as done by jpgcrush. Many
parameters are currently hardcoded which should be changed.
Implementation is missing for monochrome.
Add the fast mode of jpgcrush where:
- Huffman table optimisation is enabled
- Progressive coding mode is enabled
- New default scans are defined for progressive coding
-- The Mac and Cygwin packages will now be created with the directory structure defined by the configure variables "prefix", "bindir", "libdir", etc., with the exception that the docs are always installed under /usr/share/doc/{package_name}-{version} on Cygwin and /Library/Documentation/{package_name} on Mac.
-- Fixed a duplicate filename warning when generating RPMs with the default prefix of /opt/libjpeg-turbo.
-- Moved the TurboJPEG libraries out of the system directory on Windows and Mac. It is no longer necessary to put them there, since we are not trying to be backward compatible with TurboJPEG/IPP anymore.
-- Fixed an issue whereby building the "installer" target on Windows would not build the Java JAR file, thus causing an error if the JAR had not been previously built.
-- Building the "install" target on Windows will now install libjpeg-turbo into c:\libjpeg-turbo[-gcc][64] (the same directories used by the installers.) This can be overridden by setting CMAKE_INSTALL_PREFIX.
-- The Java classes on all platforms will now look for the JNI library in the directory under which the build/packaging system installs it.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@946 632fc199-4ca6-4c93-a231-07263d6284db
If the default prefix (/opt/libjpeg-turbo) is used, then we now always install 32-bit libraries in /opt/libjpeg-turbo/lib32 and 64-bit libraries in /opt/libjpeg-turbo/lib64 instead of trying to conform to the Debian or Red Hat conventions. The RPM and DEB packages will now be created with the directory structure defined by the configure variables "prefix", "bindir", "libdir", etc., with the exception that the docs are always installed under /usr/share/doc/{package_name}-{version}.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@944 632fc199-4ca6-4c93-a231-07263d6284db
The SIMD glue code has gotten a bit #ifdef heavy so clean it up by having
one file for each possible SIMD arch. This also allows a simplification of
the x86_64 code as SSE/SSE2 is always known to exist on that arch.
Older versions of automake don't properly support non-recursive make.
Reimplement the build system by having a local Makefile.am in the
simd/ directory.
Studio build. A static jconfig.h has been re-added, but in a separate
directory, to avoid clash with jconfig.h generated by configure
script. Also, jconfig.h now includes the inline macro. jpeg.dsp has
been modified to search in the "win" subdir, to find jconfig.h.
This patch is in spirit similar to r121.
about: Inform the libjpeg-turbo maintainer about unexpected, reproducible behavior
title: ''
labels: bug
assignees: dcommander
---
**Have you searched the existing issues (both open and closed) in the libjpeg-turbo issue tracker to ensure that this bug report is not a duplicate?**
**Does this bug report describe one of the [two known and unsolvable issues with the JPEG format](https://libjpeg-turbo.org/pmwiki/uploads/About/TwoIssueswiththeJPEGStandard.pdf)?**
**Clear and concise description of the bug:**
**Steps to reproduce the bug (using *only* libjpeg-turbo):**
**Image(s) needed in order to reproduce the bug (if applicable):**
**Expected behavior:**
**Observed behavior:**
**Platform(s) (compiler version, operating system version, CPU) on which the bug was observed:**
**libjpeg-turbo release(s), commit(s), or branch(es) in which the bug was observed (always test the tip of the master branch or the latest [stable pre-release](https://libjpeg-turbo.org/DeveloperInfo/PreReleases) to verify that the bug hasn't already been fixed):**
**If the bug is a regression, the specific commit that introduced the regression (use `git bisect` to determine this):**
**Complete description of the bug fix or feature that this pull request implements**
**Checklist before submitting the pull request, to maximize the chances that the pull request will be accepted**
- [ ] Read CONTRIBUTING.md, a link to which appears under "Helpful resources" below. That document discusses general guidelines for contributing to libjpeg-turbo, as well as the types of contributions that will not be accepted or are unlikely to be accepted.
- [ ] Search the existing issues and pull requests (both open and closed) to ensure that a similar request has not already been submitted and rejected.
- [ ] Discuss the proposed bug fix or feature in a GitHub issue, through direct e-mail with the project maintainer, or on the libjpeg-turbo-devel mailing list.
The Windows SDK includes both 32-bit and 64-bit Visual C++ compilers and
everything necessary to build libjpeg-turbo.
installing
[Visual Studio Community Edition](https://visualstudio.microsoft.com),
which includes everything necessary to build libjpeg-turbo.
* You can also use Microsoft Visual Studio Express/Community Edition, which
is a free download. (NOTE: versions prior to 2012 can only be used to
build 32-bit code.)
* You can also download and install the standalone Windows SDK (for Windows 7
or later), which includes command-line versions of the 32-bit and 64-bit
Visual C++ compilers.
* If you intend to build libjpeg-turbo from the command line, then add the
appropriate compiler and SDK directories to the `INCLUDE`, `LIB`, and
`PATH` environment variables. This is generally accomplished by
executing `vcvars32.bat` or `vcvars64.bat` and `SetEnv.cmd`.
`vcvars32.bat` and `vcvars64.bat` are part of Visual C++ and are located in
the same directory as the compiler. `SetEnv.cmd` is part of the Windows
SDK. You can pass optional arguments to `SetEnv.cmd` to specify a 32-bit
or 64-bit build environment.
executing `vcvars32.bat` or `vcvars64.bat`, which are located in the same
directory as the compiler.
* If built with Visual C++ 2015 or later, the libjpeg-turbo static libraries
cannot be used with earlier versions of Visual C++, and vice versa.
* The libjpeg API DLL (**jpeg{version}.dll**) will depend on the C run-time
DLLs corresponding to the version of Visual C++ that was used to build it.
- Vcpkg
You need to download and install libpng using the [vcpkg](https://github.com/Microsoft/vcpkg) dependency manager:
git clone https://github.com/Microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.bat
./vcpkg integrate install
./vcpkg install libpng:x64-windows
./vcpkg install libpng:x64-windows-static
Alternatively, you can download and install MozJPEG itself using the vcpkg dependency manager:
./vcpkg install mozjpeg
The mozjpeg port in vcpkg is kept up to date by Microsoft team members and community contributors. If the version is out of date, please [create an issue or pull request](https://github.com/Microsoft/vcpkg) on the vcpkg repository.
... OR ...
- MinGW
[MSYS2](http://msys2.github.io/) or [tdm-gcc](http://tdm-gcc.tdragon.net/)
[MSYS2](https://msys2.org) or [tdm-gcc](https://jmeubank.github.io/tdm-gcc)
recommended if building on a Windows machine. Both distributions install a
Start Menu link that can be used to launch a command prompt with the
appropriate compiler paths automatically set.
- If building the TurboJPEG Java wrapper, JDK 1.5 or later is required. This
This repository is governed by Mozilla's code of conduct and etiquette guidelines.
For more details, please read the
[Mozilla Community Participation Guidelines](https://www.mozilla.org/about/governance/policies/participation/).
## How to Report
For more information on how to report violations of the Community Participation Guidelines, please read our '[How to Report](https://www.mozilla.org/about/governance/policies/participation/reporting/)' page.
<!--
## Project Specific Etiquette
In some cases, there will be additional project etiquette i.e.: (https://bugzilla.mozilla.org/page.cgi?id=etiquette.html).
Mozilla JPEG Encoder Project
============================
libjpeg-turbo is a JPEG image codec that uses SIMD instructions (MMX, SSE2,
AVX2, NEON, AltiVec) to accelerate baseline JPEG compression and decompression
on x86, x86-64, ARM, and PowerPC systems, as well as progressive JPEG
compression on x86 and x86-64 systems. On such systems, libjpeg-turbo is
generally 2-6x as fast as libjpeg, all else being equal. On other types of
systems, libjpeg-turbo can still outperform libjpeg by a significant amount, by
virtue of its highly-optimized Huffman coding routines. In many cases, the
performance of libjpeg-turbo rivals that of proprietary high-speed JPEG codecs.
MozJPEG improves JPEG compression efficiency achieving higher visual quality and smaller file sizes at the same time. It is compatible with the JPEG standard, and the vast majority of the world's deployed JPEG decoders.
libjpeg-turbo implements both the traditional libjpeg API as well as the less
powerful but more straightforward TurboJPEG API. libjpeg-turbo also features
colorspace extensions that allow it to compress from/decompress to 32-bit and
big-endian pixel buffers (RGBX, XBGR, etc.), as well as a full-featured Java
interface.
MozJPEG is a patch for [libjpeg-turbo](https://github.com/libjpeg-turbo/libjpeg-turbo). **Please send pull requests to libjpeg-turbo** if the changes aren't specific to newly-added MozJPEG-only compression code. This project aims to keep differences with libjpeg-turbo minimal, so whenever possible, improvements and bug fixes should go there first.
libjpeg-turbo was originally based on libjpeg/SIMD, an MMX-accelerated
derivative of libjpeg v6b developed by Miyasaka Masaru. The TigerVNC and
VirtualGL projects made numerous enhancements to the codec in 2009, and in
early 2010, libjpeg-turbo spun off into an independent project, with the goal
of making high-speed JPEG compression/decompression technology available to a
broader range of users and developers.
MozJPEG is compatible with the libjpeg API and ABI. It is intended to be a drop-in replacement for libjpeg. MozJPEG is a strict superset of libjpeg-turbo's functionality. All MozJPEG's improvements can be disabled at run time, and in that case it behaves exactly like libjpeg-turbo.
MozJPEG is meant to be used as a library in graphics programs and image processing tools. We include a demo `cjpeg` command-line tool, but it's not intended for serious use. We encourage authors of graphics programs to use libjpeg's [C API](libjpeg.txt) and link with MozJPEG library instead.
License
=======
## Features
libjpeg-turbo is covered by three compatible BSD-style open source licenses.
Refer to [LICENSE.md](LICENSE.md) for a roll-up of license terms.
* Progressive encoding with "jpegrescan" optimization. It can be applied to any JPEG file (with `jpegtran`) to losslessly reduce file size.
* Trellis quantization. When converting other formats to JPEG it maximizes quality/filesize ratio.
* Comes with new quantization table presets, e.g. tuned for high-resolution displays.
* Fully compatible with all web browsers.
* Can be seamlessly integrated into any program that uses the industry-standard libjpeg API. There's no need to write any MozJPEG-specific integration code.
- **libjpeg: In-memory source and destination managers**<br>
See notes below.
- **cjpeg: Separate quality settings for luminance and chrominance**<br>
Note that the libjpeg v7+ API was extended to accommodate this feature only
for convenience purposes. It has always been possible to implement this
feature with libjpeg v6b (see rdswitch.c for an example.)
- **cjpeg: 32-bit BMP support**
- **cjpeg: `-rgb` option**
- **jpegtran: Lossless cropping**
- **jpegtran: `-perfect` option**
- **jpegtran: Forcing width/height when performing lossless crop**
- **rdjpgcom: `-raw` option**
- **rdjpgcom: Locale awareness**
#### Not supported
NOTE: As of this writing, extensive research has been conducted into the
usefulness of DCT scaling as a means of data reduction and SmartScale as a
means of quality improvement. The reader is invited to peruse the research at
<http://www.libjpeg-turbo.org/About/SmartScale> and draw his/her own conclusions,
but it is the general belief of our project that these features have not
demonstrated sufficient usefulness to justify inclusion in libjpeg-turbo.
- **libjpeg: DCT scaling in compressor**<br>
`cinfo.scale_num` and `cinfo.scale_denom` are silently ignored.
There is no technical reason why DCT scaling could not be supported when
emulating the libjpeg v7+ API/ABI, but without the SmartScale extension (see
below), only scaling factors of 1/2, 8/15, 4/7, 8/13, 2/3, 8/11, 4/5, and
8/9 would be available, which is of limited usefulness.
- **libjpeg: SmartScale**<br>
`cinfo.block_size` is silently ignored.
SmartScale is an extension to the JPEG format that allows for DCT block
sizes other than 8x8. Providing support for this new format would be
feasible (particularly without full acceleration.) However, until/unless
the format becomes either an official industry standard or, at minimum, an
accepted solution in the community, we are hesitant to implement it, as
there is no sense of whether or how it might change in the future. It is
our belief that SmartScale has not demonstrated sufficient usefulness as a
lossless format nor as a means of quality enhancement, and thus our primary
interest in providing this feature would be as a means of supporting
additional DCT scaling factors.
- **libjpeg: Fancy downsampling in compressor**<br>
`cinfo.do_fancy_downsampling` is silently ignored.
This requires the DCT scaling feature, which is not supported.
- **jpegtran: Scaling**<br>
This requires both the DCT scaling and SmartScale features, which are not
supported.
- **Lossless RGB JPEG files**<br>
This requires the SmartScale feature, which is not supported.
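
The eight compressor scaling factors listed under "DCT scaling in compressor" above coincide with 8/N for N = 16 down to 9, reduced to lowest terms. The following is a minimal sketch of that arithmetic only. Reading N as the number of input samples per dimension that map onto one 8x8 block is our assumption, and the function names are hypothetical (they are not part of the libjpeg API).

```c
#include <assert.h>
#include <stdio.h>

/* Greatest common divisor (Euclid's algorithm).  Both arguments must be > 0. */
static int gcd(int a, int b)
{
  assert(a > 0 && b > 0);
  while (b != 0) {
    int t = a % b;
    a = b;
    b = t;
  }
  return a;
}

/* Reduce the fraction 8/n to lowest terms.  n is assumed (for illustration)
   to be the number of input samples per dimension mapped onto one 8x8 block. */
static void scaled_factor(int n, int *num, int *denom)
{
  int g = gcd(8, n);

  *num = 8 / g;
  *denom = n / g;
}

/* Print the reduced factors for n = 16 down to 9, one per line. */
static void print_scaling_factors(void)
{
  int n;

  for (n = 16; n >= 9; n--) {
    int num, denom;

    scaled_factor(n, &num, &denom);
    printf("%d/%d\n", num, denom);
  }
}
```

Calling `print_scaling_factors()` from a `main()` function lists the factors in the same order as the text, starting with 1/2 and ending with 8/9.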
### What About libjpeg v9?
libjpeg v9 introduced yet another field to the JPEG compression structure
(`color_transform`), thus making the ABI backward incompatible with that of
libjpeg v8. This new field was introduced solely for the purpose of supporting
lossless SmartScale encoding. Furthermore, there was actually no reason to
extend the API in this manner, as the color transform could have just as easily
been activated by way of a new JPEG colorspace constant, thus preserving
backward ABI compatibility.
Our research (see link above) has shown that lossless SmartScale does not
generally accomplish anything that can't already be accomplished better with
existing, standard lossless formats. Therefore, at this time it is our belief
that there is not sufficient technical justification for software projects to
upgrade from libjpeg v8 to libjpeg v9, and thus there is not sufficient
technical justification for us to emulate the libjpeg v9 ABI.
In-Memory Source/Destination Managers
-------------------------------------
By default, libjpeg-turbo 1.3 and later includes the `jpeg_mem_src()` and
`jpeg_mem_dest()` functions, even when not emulating the libjpeg v8 API/ABI.
Previously, it was necessary to build libjpeg-turbo from source with libjpeg v8
API/ABI emulation in order to use the in-memory source/destination managers,
but several projects requested that those functions be included when emulating
the libjpeg v6b API/ABI as well. This allows the use of those functions by
programs that need them, without breaking ABI compatibility for programs that
don't, and it allows those functions to be provided in the "official"
libjpeg-turbo binaries.
Those who are concerned about maintaining strict conformance with the libjpeg
v6b or v7 API can pass an argument of `-DWITH_MEM_SRCDST=0` to `cmake` prior to
building libjpeg-turbo. This will restore the pre-1.3 behavior, in which
`jpeg_mem_src()` and `jpeg_mem_dest()` are only included when emulating the
libjpeg v8 API/ABI.
On Un*x systems, including the in-memory source/destination managers changes
the dynamic library version from 62.1.0 to 62.2.0 if using libjpeg v6b API/ABI
emulation and from 7.1.0 to 7.2.0 if using libjpeg v7 API/ABI emulation.
Note that, on most Un*x systems, the dynamic linker will not look for a
function in a library until that function is actually used. Thus, if a program
is built against libjpeg-turbo 1.3+ and uses `jpeg_mem_src()` or
`jpeg_mem_dest()`, that program will not fail if run against an older version
of libjpeg-turbo or against libjpeg v7- until the program actually tries to
call `jpeg_mem_src()` or `jpeg_mem_dest()`. Such is not the case on Windows.
If a program is built against the libjpeg-turbo 1.3+ DLL and uses
`jpeg_mem_src()` or `jpeg_mem_dest()`, then it must use the libjpeg-turbo 1.3+
DLL at run time.
Both cjpeg and djpeg have been extended to allow testing the in-memory
source/destination manager functions. See their respective man pages for more
details.
Mathematical Compatibility
==========================
For the most part, libjpeg-turbo should produce identical output to libjpeg
v6b. The one exception to this is when using the floating point DCT/IDCT, in
which case the outputs of libjpeg v6b and libjpeg-turbo can differ for the
following reasons:
- The SSE/SSE2 floating point DCT implementation in libjpeg-turbo is ever so
slightly more accurate than the implementation in libjpeg v6b, but not by
any amount perceptible to human vision (generally in the range of 0.01 to
0.08 dB gain in PSNR.)
- When not using the SIMD extensions, libjpeg-turbo uses the more accurate
(and slightly faster) floating point IDCT algorithm introduced in libjpeg
v8a as opposed to the algorithm used in libjpeg v6b. It should be noted,
however, that this algorithm basically brings the accuracy of the floating
point IDCT in line with the accuracy of the slow integer IDCT. The floating
point DCT/IDCT algorithms are mainly a legacy feature, and they do not
produce significantly more accuracy than the slow integer algorithms (to put
numbers on this, the typical difference in PSNR between the two algorithms
is less than 0.10 dB, whereas changing the quality level by 1 in the upper
range of the quality scale is typically more like a 1.0 dB difference.)
- If the floating point algorithms in libjpeg-turbo are not implemented using
SIMD instructions on a particular platform, then the accuracy of the
floating point DCT/IDCT can depend on the compiler settings.
While libjpeg-turbo does emulate the libjpeg v8 API/ABI, under the hood it is
still using the same algorithms as libjpeg v6b, so there are several specific
cases in which libjpeg-turbo cannot be expected to produce the same output as
libjpeg v8:
- When decompressing using scaling factors of 1/2 and 1/4, because libjpeg v8
implements those scaling algorithms differently than libjpeg v6b does, and
libjpeg-turbo's SIMD extensions are based on the libjpeg v6b behavior.
- When using chrominance subsampling, because libjpeg v8 implements this
with its DCT/IDCT scaling algorithms rather than with a separate
downsampling/upsampling algorithm. In our testing, the subsampled/upsampled
output of libjpeg v8 is less accurate than that of libjpeg v6b for this
reason.
- When decompressing using a scaling factor > 1 and merged (AKA "non-fancy" or
"non-smooth") chrominance upsampling, because libjpeg v8 does not support
merged upsampling with scaling factors > 1.
Performance Pitfalls
====================
Restart Markers
---------------
The optimized Huffman decoder in libjpeg-turbo does not handle restart markers
in a way that makes the rest of the libjpeg infrastructure happy, so it is
necessary to use the slow Huffman decoder when decompressing a JPEG image that
has restart markers. This can cause the decompression performance to drop by
as much as 20%, but the performance will still be much greater than that of
libjpeg. Many consumer packages, such as Photoshop, use restart markers when
generating JPEG images, so images generated by those programs will experience
this issue.
Fast Integer Forward DCT at High Quality Levels
-----------------------------------------------
The algorithm used by the SIMD-accelerated quantization function cannot produce
correct results whenever the fast integer forward DCT is used along with a JPEG
quality of 98-100. Thus, libjpeg-turbo must use the non-SIMD quantization
function in those cases. This causes performance to drop by as much as 40%.
It is therefore strongly advised that you use the slow integer forward DCT
whenever encoding images with a JPEG quality of 98 or higher.
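
The advice above can be sketched as a small selection helper. This is an illustration under stated assumptions, not code from libjpeg-turbo: the enum and function names are hypothetical stand-ins for the `JDCT_ISLOW` and `JDCT_IFAST` constants in jpeglib.h, defined locally so that the sketch compiles without the library.

```c
#include <assert.h>

/* Hypothetical stand-ins for JDCT_ISLOW and JDCT_IFAST from jpeglib.h */
typedef enum { SKETCH_DCT_ISLOW, SKETCH_DCT_IFAST } sketch_dct_method;

/* Choose a forward DCT method for a given JPEG quality (1-100).
   want_fast is nonzero if the caller would otherwise prefer the fast integer
   DCT.  At quality 98 or higher, the slow integer DCT is always chosen,
   because the fast integer DCT forces the non-SIMD quantization function in
   that range. */
static sketch_dct_method choose_forward_dct(int quality, int want_fast)
{
  assert(quality >= 1 && quality <= 100);
  if (quality >= 98)
    return SKETCH_DCT_ISLOW;
  return want_fast ? SKETCH_DCT_IFAST : SKETCH_DCT_ISLOW;
}
```

In a real program, the result would be translated into the corresponding jpeglib.h constant and assigned to `cinfo.dct_method` before calling `jpeg_start_compress()`.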
Memory Debugger Pitfalls
========================
Valgrind and Memory Sanitizer (MSan) can generate false positives
(specifically, incorrect reports of uninitialized memory accesses) when used
with libjpeg-turbo's SIMD extensions. It is generally recommended that the
SIMD extensions be disabled, either by passing an argument of `-DWITH_SIMD=0`
to `cmake` when configuring the build or by setting the environment variable
`JSIMD_FORCENONE` to `1` at run time, when testing libjpeg-turbo with Valgrind,
MSan, or other memory debuggers.
See [BUILDING](BUILDING.md). MozJPEG is built exactly the same way as libjpeg-turbo, so if you need additional help please consult [libjpeg-turbo documentation](https://libjpeg-turbo.org/).
if not exist c:\installers\nasm-2.10.01-win32.zip curl -fSL -o c:\installers\nasm-2.10.01-win32.zip http://www.nasm.us/pub/nasm/releasebuilds/2.10.01/win32/nasm-2.10.01-win32.zip
7z x c:\installers\nasm-2.10.01-win32.zip -oc:\ > c:\installers\nasm.install.log
set INCLUDE=c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\include;c:\Program Files (x86)\Microsoft SDKs\Windows\v7.1A\include
set LIB=c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\lib\amd64;c:\Program Files (x86)\Microsoft SDKs\Windows\v7.1A\lib\x64
set PATH=c:\nasm-2.10.01;c:\Program Files (x86)\NSIS;c:\msys64\mingw32\bin;c:\msys64\usr\bin;c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\amd64;c:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE;c:\Program Files (x86)\Microsoft SDKs\Windows\v7.1A\bin\x64;c:\Program Files (x86)\Microsoft SDKs\Windows\v7.1A\bin;%PATH%
"Directory containing ARMv8 iOS build to include in universal binaries (default: ${DEFAULT_IOS_ARMV8_BUILD})")
set(SECONDARY_BUILD "" CACHE PATH
"Directory containing cross-compiled x86-64 or Armv8 (64-bit) iOS or macOS build to include in universal binaries")
set(OSX_APP_CERT_NAME "" CACHE STRING
set(MACOS_APP_CERT_NAME "" CACHE STRING
"Name of the Developer ID Application certificate (in the macOS keychain) that should be used to sign the libjpeg-turbo DMG. Leave this blank to generate an unsigned DMG.")
set(OSX_INST_CERT_NAME "" CACHE STRING
set(MACOS_INST_CERT_NAME "" CACHE STRING
"Name of the Developer ID Installer certificate (in the macOS keychain) that should be used to sign the libjpeg-turbo installer package. Leave this blank to generate an unsigned package.")
<path d="m 14,-1.1445312 c -2.824372,0 -5.1445313,2.320159 -5.1445312,5.1445312 v 72 c 0,2.824372 2.3201592,5.144531 5.1445312,5.144531 h 52 c 2.824372,0 5.144531,-2.320159 5.144531,-5.144531 V 23.699219 a 1.1447968,1.1447968 0 0 0 -0.01563,-0.1875 C 70.977847,22.605363 70.406495,21.99048 70.007812,21.591797 L 48.208984,-0.20898438 C 47.606104,-0.81186474 46.804652,-1.1445313 46,-1.1445312 Z m 1.144531,6.2890624 H 42.855469 V 24 c 0,1.724372 1.420159,3.144531 3.144531,3.144531 H 64.855469 V 74.855469 H 15.144531 Z m 34,4.4179688 L 60.4375,20.855469 H 49.144531 Z"/>
</g>
<g style="fill:#D8DFEE;stroke-width:0">
<path d="M 3.0307167,13.993174 V 7.0307167 h 2.7576792 2.7576792 v 1.8826151 c 0,1.2578262 0.0099,1.9287572 0.029818,2.0216512 0.03884,0.181105 0.168631,0.348218 0.33827,0.43554 l 0.1355017,0.06975 1.9598092,0.0079 1.959809,0.0078 v 4.749829 4.749829 H 8 3.0307167 Z" transform="matrix(5,0,0,5,0,-30)"/>
<path d="M 9.8293515,9.0581469 V 7.9456453 l 1.1058025,1.1055492 c 0.608191,0.6080521 1.105802,1.1086775 1.105802,1.1125015 0,0.0038 -0.497611,0.007 -1.105802,0.007 H 9.8293515 Z" transform="matrix(5,0,0,5,0,-30)"/>
<path d="m 14,-1.1445312 c -2.824372,0 -5.1445313,2.320159 -5.1445312,5.1445312 v 72 c 0,2.824372 2.3201592,5.144531 5.1445312,5.144531 h 52 c 2.824372,0 5.144531,-2.320159 5.144531,-5.144531 V 23.699219 a 1.1447968,1.1447968 0 0 0 -0.01563,-0.1875 C 70.977847,22.605363 70.406495,21.99048 70.007812,21.591797 L 48.208984,-0.20898438 C 47.606104,-0.81186474 46.804652,-1.1445313 46,-1.1445312 Z m 1.144531,6.2890624 H 42.855469 V 24 c 0,1.724372 1.420159,3.144531 3.144531,3.144531 H 64.855469 V 74.855469 H 15.144531 Z m 34,4.4179688 L 60.4375,20.855469 H 49.144531 Z"/>
</g>
<g style="fill:#4665A2;stroke-width:0">
<path d="M 3.0307167,13.993174 V 7.0307167 h 2.7576792 2.7576792 v 1.8826151 c 0,1.2578262 0.0099,1.9287572 0.029818,2.0216512 0.03884,0.181105 0.168631,0.348218 0.33827,0.43554 l 0.1355017,0.06975 1.9598092,0.0079 1.959809,0.0078 v 4.749829 4.749829 H 8 3.0307167 Z" transform="matrix(5,0,0,5,0,-30)"/>
<path d="M 9.8293515,9.0581469 V 7.9456453 l 1.1058025,1.1055492 c 0.608191,0.6080521 1.105802,1.1086775 1.105802,1.1125015 0,0.0038 -0.497611,0.007 -1.105802,0.007 H 9.8293515 Z" transform="matrix(5,0,0,5,0,-30)"/>
<path d="M 5.6063709,24.951908 C 4.3924646,24.775461 3.4197129,23.899792 3.1031586,22.698521 L 3.0216155,22.389078 V 13.997725 5.6063709 L 3.1037477,5.2982247 C 3.3956682,4.2029881 4.1802788,3.412126 5.2787258,3.105917 5.5646428,3.0262132 5.6154982,3.0244963 8.0611641,3.0119829 l 2.4911989,-0.012746 1.932009,1.9300342 c 1.344142,1.3427669 1.976319,1.9498819 2.07763,1.9952626 0.137456,0.061571 0.474218,0.066269 6.006826,0.083795 l 5.861206,0.018568 0.29124,0.081916 c 1.094895,0.3079569 1.890116,1.109428 2.175567,2.192667 l 0.08154,0.3094425 V 16 22.389078 l -0.08154,0.309443 c -0.28446,1.079482 -1.086411,1.888085 -2.175567,2.193614 l -0.29124,0.0817 -10.302616,0.0049 c -5.700217,0.0027 -10.4001945,-0.0093 -10.5210471,-0.02684 z"/>
<path d="M 5.6063709,24.951908 C 4.3924646,24.775461 3.4197129,23.899792 3.1031586,22.698521 L 3.0216155,22.389078 V 13.997725 5.6063709 L 3.1037477,5.2982247 C 3.3956682,4.2029881 4.1802788,3.412126 5.2787258,3.105917 5.5646428,3.0262132 5.6154982,3.0244963 8.0611641,3.0119829 l 2.4911989,-0.012746 1.932009,1.9300342 c 1.344142,1.3427669 1.976319,1.9498819 2.07763,1.9952626 0.137456,0.061571 0.474218,0.066269 6.006826,0.083795 l 5.861206,0.018568 0.29124,0.081916 c 1.094895,0.3079569 1.890116,1.109428 2.175567,2.192667 l 0.08154,0.3094425 V 16 22.389078 l -0.08154,0.309443 c -0.28446,1.079482 -1.086411,1.888085 -2.175567,2.193614 l -0.29124,0.0817 -10.302616,0.0049 c -5.700217,0.0027 -10.4001945,-0.0093 -10.5210471,-0.02684 z"/>
['tjtransform',['tjtransform',['../structtjtransform.html',1,'tjtransform'],['../group___turbo_j_p_e_g.html#gaa29f3189c41be12ec5dee7caec318a31',1,'tjtransform(): turbojpeg.h'],['../group___turbo_j_p_e_g.html#ga9cb8abf4cc91881e04a0329b2270be25',1,'tjTransform(tjhandle handle, const unsigned char *jpegBuf, unsigned long jpegSize, int n, unsigned char **dstBufs, unsigned long *dstSizes, tjtransform *transforms, int flags): turbojpeg.h']]],
Some files were not shown because too many files have changed in this diff