The loop in jsimd_quantize_neon() is only executed twice and should be
unrolled for AArch64 targets. GCC does that by default, but Clang 11
and later versions available at the time of this writing do not. This
patch adds an unroll pragma when targeting AArch64 with Clang. We do
not use the unroll pragma for AArch32 targets, because it causes the
Clang-generated assembly code to exhaust the available Neon registers
(32 x 64-bit) and spill to the stack. (DRC: Referring to the discussion
in #570, this is likely due to compiler confusion that results in poor
register allocation. It is possible to eliminate the spillage and
reduce the instruction count by loading the data on a just-in-time
basis, thus explicitly interleaving compute and I/O, but the performance
implications of that are currently unknown.)
The effects of unrolling the quantization loop are:
1) elimination of the loop control flow overhead and
2) enabling the use of LDP/STP instructions that work from a single
base pointer, instead of using double the number of LDR/STR
instructions, each requiring an address calculation.
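As a rough illustration of the guard described above (not the code from this patch), the sketch below requests unrolling only when the compiler is Clang and the target is AArch64. The function name, the constants, and the scalar loop body are hypothetical stand-ins for the Neon code in jsimd_quantize_neon().

```c
#include <stdint.h>

#define ROWS_PER_PASS  4        /* hypothetical: rows handled per iteration */
#define TOTAL_ROWS  8
#define ROW_LENGTH  8

/* Scalar stand-in for a vectorized loop body.  Multiplies every element of
   an 8x8 block by `factor`, processing half of the block per iteration. */
static void scale_block(int16_t *block, int16_t factor)
{
  int row, col;

  /* The outer loop runs exactly twice.  Ask Clang to unroll it on AArch64
     and leave every other compiler/target to its default heuristics. */
#if defined(__clang__) && defined(__aarch64__)
#pragma unroll
#endif
  for (row = 0; row < TOTAL_ROWS; row += ROWS_PER_PASS) {
    for (col = 0; col < ROWS_PER_PASS * ROW_LENGTH; col++)
      block[row * ROW_LENGTH + col] =
        (int16_t)(block[row * ROW_LENGTH + col] * factor);
  }
}
```

Because the pragma sits inside the preprocessor guard, GCC and AArch32 builds never see it, which matches the intent of leaving those configurations unchanged.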
Closes #570
- Suppress a UBSan warning regarding storing a 64-bit value to a
non-64-bit-aligned address. That behavior is technically undefined
per the C spec but is supported in the context of the AArch64
architecture and compilers.
- Explicitly promote block_diff[i] to unsigned int prior to left
shifting it, in order to avoid a UBSan warning. This warning also
described behavior that is technically undefined per the C spec but is
supported in the context of the AArch64 architecture and compilers.
Changing the type cast order eliminated the warning without changing
the generated assembly code.
Closes #582
- Make better use of 128-bit vector registers, thus reducing the number
of Neon instructions required to construct the AC coefficient bitmap.
- Refactor the Neon computations of 'nbits' and 'diff' to use shorter
and higher-throughput instruction sequences.
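The unsigned promotion mentioned in the second item above can be illustrated with a minimal sketch. It is not the code from jchuff-neon.c; the helper name is hypothetical, and a 32-bit unsigned type is assumed. Left-shifting a negative signed value is undefined in C, whereas converting to an unsigned type first is well-defined, and so is the shift that follows.

```c
#include <stdint.h>

/* Hypothetical helper: left-shift a possibly negative 16-bit value without
   invoking undefined behavior.  The conversion to uint32_t is defined to
   wrap modulo 2^32, and shifting an unsigned value simply discards the
   bits that fall off the top.  `count` must be less than 32. */
static uint32_t shift_left_u32(int16_t value, unsigned int count)
{
  return (uint32_t)value << count;
}
```

On two's complement targets, the resulting bit pattern is the same one that compilers typically produce for the signed shift, which is consistent with the generated assembly code not changing.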
DRC's notes:
This commit partially integrates #570. Arm reported a 1-4% speedup on
Cortex-A55 and Neoverse-N1 cores when using recent compilers but little
or no speedup with Clang 10. I observed no speedup with Clang 10 on my
Cortex-A53 and Cortex-A72 cores. Thus, referring to #582, the primary
purpose of this commit is to fix UBSan warnings regarding the shift
operations previously located at Line 253:
d640a45730/simd/arm/aarch64/jchuff-neon.c (L253)
The primary purpose of this is to encourage adoption of libjpeg-turbo in
downstream Windows projects that forbid the use of "deprecated"
functions. libjpeg-turbo's usage of those functions was not actually
unsafe, because:
- libjpeg-turbo always checks the return value of fopen() and ensures
that a NULL filename can never be passed to it.
- libjpeg-turbo always checks the return value of getenv() and never
passes a NULL argument to it.
- The sprintf() calls in format_message() (jerror.c) could never
overflow the destination string buffer or leave it unterminated as
long as the buffer was at least JMSG_LENGTH_MAX bytes in length, as
instructed. (Regardless, this commit replaces those calls with
snprintf() calls.)
- libjpeg-turbo never uses sscanf() to read strings or multi-byte
character arrays.
- Because of b7d6e84d6a, wrjpgcom
explicitly checks the bounds of the source and destination strings
before calling strcat() and strcpy().
- libjpeg-turbo always ensures that the destination string is
terminated when using strncpy().
(548490fe5e made this explicit.)
Regarding thread safety:
Technically speaking, getenv() is not thread-safe, because the returned
pointer may be invalidated if another thread sets the same environment
variable between the time that the first thread calls getenv() and the
time that that thread uses the return value. In practice, however, this
could only occur with libjpeg-turbo if:
(1) A multithreaded calling application used the deprecated and
undocumented TJFLAG_FORCEMMX/TJFLAG_FORCESSE/TJFLAG_FORCESSE2 flags in
the TurboJPEG API or set one of the corresponding environment variables
(which are only intended for testing purposes.) Since the TurboJPEG API
library only ever passed string constants to putenv(), the only inherent
risk (i.e. the only risk introduced by the library and not the calling
application) was that the SIMD extensions may have read an incorrect
value from one of the aforementioned environment variables.
or
(2) A multithreaded calling application modified the value of the
JPEGMEM environment variable in one thread while another thread was
reading the value of that environment variable (in the body of
jpeg_create_compress() or jpeg_create_decompress().) Given that the
libjpeg API provides a thread-safe way for applications to modify the
default memory limit without using the JPEGMEM environment variable,
direct modification of that environment variable by calling applications
is not supported.
Microsoft's implementation of getenv_s() does not claim to be
thread-safe either, so this commit uses getenv_s() solely to mollify
Visual Studio. New inline functions and macros (GETENV_S() and
PUTENV_S()) wrap getenv_s()/_putenv_s() when building for Visual Studio
and getenv()/setenv() otherwise, but GETENV_S()/PUTENV_S() provide no
advantages over getenv()/setenv() other than parameter validation. They
are implemented solely for convenience.
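A wrapper of the kind described above might look roughly like the following sketch. It is not the GETENV_S() implementation from this commit; the function name and the exact error-handling choices are assumptions made for illustration. The only library calls used are getenv(), strlen(), memcpy(), and (with Visual Studio) getenv_s().

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy the value of environment variable `name` into
   `buffer`.  Returns 0 on success (an unset variable yields an empty
   string), EINVAL for invalid arguments, or ERANGE if the value does not
   fit in the buffer. */
static int getenv_copy(char *buffer, size_t buffer_size, const char *name)
{
#ifdef _MSC_VER
  size_t required = 0;

  if (!buffer || buffer_size == 0 || !name)
    return EINVAL;
  buffer[0] = '\0';
  return (int)getenv_s(&required, buffer, buffer_size, name);
#else
  const char *value;

  if (!buffer || buffer_size == 0 || !name)
    return EINVAL;
  buffer[0] = '\0';
  value = getenv(name);
  if (!value)
    return 0;
  if (strlen(value) + 1 > buffer_size)
    return ERANGE;
  memcpy(buffer, value, strlen(value) + 1);
  return 0;
#endif
}
```

As noted above, copying the value out does not make the lookup thread-safe. The only thing the wrapper adds over a bare getenv() call is parameter validation and a uniform calling convention.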
Technically speaking, strerror() is not thread-safe, because the
returned pointer may be invalidated if another thread changes the locale
and/or calls strerror() between the time that the first thread calls
strerror() and the time that that thread uses the return value. In
practice, however, this could only occur with libjpeg-turbo if a
multithreaded calling application encountered a file I/O error in
tjLoadImage() or tjSaveImage(). Since both of those functions
immediately copy the string returned from strerror() into a thread-local
buffer, the risk is minimal, and the worst case would involve an
incorrect error string being reported to the calling application.
Regardless, this commit uses strerror_s() in the TurboJPEG API library
when building for Visual Studio. Note that strerror_r() could have been
used on Un*x systems, but it would have been necessary to handle both
the POSIX and GNU implementations of that function and perform
widespread compatibility testing. Such is left as an exercise for
another day.
Fixes #568
- Since the ERREXITS() and TRACEMSS() macros are never used internally
(they are a relic of the legacy memory managers that libjpeg
provided), the only risk was that an external program might have
invoked one of those macros with a string longer than 79 characters
(JMSG_STR_PARM_MAX - 1).
- TJBench never invokes the THROW_TJ() macro with a string longer than
199 (JMSG_LENGTH_MAX - 1) characters, so there was no risk. However,
it's a good idea to explicitly terminate the destination strings so
that anyone looking at the code can immediately tell that it is safe.
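The explicit-termination idiom referred to above can be sketched as follows. The helper name is hypothetical and is not taken from the libjpeg-turbo source; the point is only that strncpy() does not terminate the destination when the source is too long, so the last byte is written unconditionally.

```c
#include <string.h>

/* Hypothetical helper: copy at most dest_size - 1 characters from `src`
   and always terminate `dest`.  strncpy() leaves the destination
   unterminated if `src` has dest_size - 1 or more characters, hence the
   explicit store to the last byte. */
static void copy_terminated(char *dest, size_t dest_size, const char *src)
{
  if (dest_size == 0)
    return;
  strncpy(dest, src, dest_size - 1);
  dest[dest_size - 1] = '\0';
}
```

With the terminator written unconditionally, a reader does not need to reason about the maximum source length in order to see that the destination is always a valid string.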
The h2v2 (4:2:0) merged upsampler uses a spare row buffer so that it can
upsample two rows at a time but return only one row to the application,
if necessary. merged_2v_upsample() copies from this spare row buffer
into the application-supplied output buffer, using the out_row_width
field in the my_merged_upsampler struct to determine how many samples to
copy. out_row_width is set in jinit_merged_upsampler(), which is called
within the body of jpeg_start_decompress(). Since jpeg_crop_scanline()
must be called after jpeg_start_decompress(), jpeg_crop_scanline() must
modify the value of out_row_width if the h2v2 merged upsampler will be
used. Otherwise, merged_2v_upsample() can overflow the output buffer if
the number of bytes between the current output buffer position and the
end of the buffer is less than the number of bytes required to represent
an uncropped scanline of the output image. All of the destination
managers used by djpeg allocate either a whole image buffer or a
scanline buffer based on the uncropped output image width, so this issue
is not reproducible using djpeg.
Fixes #574
libjpeg-turbo has never supported non-ANSI C compilers. Per the spec,
ANSI C compilers must have locale.h, stddef.h, stdlib.h, memset(),
memcpy(), unsigned char, and unsigned short. They must also handle
undefined structures.
If NEON_INTRINSICS=0, then run the GAS sanity check from libjpeg-turbo
2.0.x and force-enable NEON_INTRINSICS if the test fails. This fixes
the AArch32 build when using Clang 6.0 on Linux or the Clang toolchain
in the Android NDK r15*-r16*. It also prevents users from manually
disabling NEON_INTRINSICS if doing so would break the build (such as
with Xcode 5.)
- Use check_c_source_compiles() rather than check_symbol_exists() to
detect the presence of vld1_s16_x3(), vld1_u16_x2(), and
vld1q_u8_x4(). check_symbol_exists() is unreliable for detecting
intrinsics, and in practice, it did not detect the presence of the
aforementioned intrinsics in versions of GCC that support them.
- Set DEFAULT_NEON_INTRINSICS=0 for GCC < 12, even if the aforementioned
intrinsics are available. The AArch64 back end in GCC 10 and 11
supports the necessary intrinsics, but the GAS implementation is still
faster when using those compilers.
Fixes #547
- Use JERR_NOTIMPL ("Not implemented yet") rather than JERR_NOT_COMPILED
("Requested feature was omitted at compile time") to indicate that
arithmetic coding is incompatible with Huffman table optimization.
This is more consistent with other parts of the libjpeg API code.
JERR_NOT_COMPILED is typically used to indicate that a major feature
was not compiled in, whereas JERR_NOTIMPL is typically used to
indicate that two features were compiled in but are incompatible with
each other (such as, for instance, two-pass color quantization and
partial image decompression.)
- Change the text of JERR_NOTIMPL to "Requested features are
incompatible". This is a more accurate description of the situation.
"Not implemented yet" implies that it may be possible to support the
requested combination of features in the future, but that is not true
in most of the cases where JERR_NOTIMPL is used.
Fixes #567
(regression introduced by aa7459050d)
cjpeg sets cinfo.in_color_space to JCS_RGB as an "arbitrary guess."
Since tjLoadImage() never uses JCS_RGB, the PGM reader should treat
JCS_RGB the same as JCS_UNKNOWN.
Fixes #566
This fixes an oversight from the integration of the arithmetic entropy
codec from libjpeg (66f97e6820). I chose
to integrate the latest implementation available at the time, which was
from jpeg-8b. However, I naively replaced cinfo->lim_Se with
DCTSIZE2 - 1, not realizing that-- because of SmartScale-- jpeg-8b
contains additional code
(https://github.com/libjpeg-turbo/libjpeg-turbo/blob/jpeg-8b/jdinput.c#L249-L334)
that guards against illegal values of cinfo->Se >= DCTSIZE2. Thus,
libjpeg-turbo's implementation of arithmetic decoding has never guarded
against such illegal values. This commit restores the relevant check
from the original jpeg-6b arithmetic entropy codec patch ("jpeg-ari",
1e247ac854).
Fixes #564
CIFuzz runs the project's fuzzers for a limited period of time any time
a commit is pushed or a PR is submitted. This is not intended to
replace OSS-Fuzz but rather to allow us to more quickly catch some
fuzzing failures, including fuzzer build regressions like the one
introduced in ecf021bc0d.
Closes #559
Recent FreeBSD/PowerPC compilers, such as Clang 11.0.x on FreeBSD 13, do
the equivalent of passing -maltivec to the compiler by default, so
run-time AltiVec detection is unnecessary. However, it becomes
necessary when using other compilers or when passing -mno-altivec to the
compiler.
Closes #552
- Don't check for exceptions immediately after invoking the
GetPrimitiveArrayCritical() method. That method does not throw
exceptions, and checking for them caused -Xcheck:jni to warn about
calling other JNI functions in the scope of
Get/ReleasePrimitiveArrayCritical().
- Check for exceptions immediately after invoking the
CallStaticObjectMethod() method in the PROP2ENV() macro.
- Don't use the Get/ReleasePrimitiveArrayCritical() methods for small
arrays. -Xcheck:jni didn't complain about that, but there is no
performance advantage to using those methods rather than the
Get*ArrayRegion() methods for small arrays, and using
Get*ArrayRegion() makes the code less error-prone.
- Don't release the source/destination planes arrays in the YUV methods
until after the corresponding C TurboJPEG functions have returned.
Referring to https://cmake.org/cmake/help/latest/policy/CMP0065.html,
CMake 3.3 and earlier automatically added compiler/linker flags such as
-rdynamic/-export-dynamic, which caused symbols to be exported from
executables. The primary purpose of this is to allow plugins loaded via
dlopen() to access symbols from the calling program. libjpeg-turbo
does not need this functionality, and enabling it needlessly increases
the size of the libjpeg-turbo executables.
Setting CMP0065 to NEW when using CMake 3.4 and later prevents CMake
from automatically adding the aforementioned compiler/linker flags
unless the ENABLE_EXPORTS property is set for a target (or the
CMAKE_ENABLE_EXPORTS variable is set, which causes ENABLE_EXPORTS to be
set for all targets.)
Closes #554
When building for 32-bit Arm platforms, test whether basic Neon
intrinsics will compile with the specified compiler and C flags. This
prevents the build system from enabling the Neon SIMD extensions when
targeting Armv6 and other legacy architectures that do not support Neon
instructions.
Regression introduced by bbd8089297.
(Checking whether gas-preprocessor.pl was needed for 32-bit Arm builds
had the effect of checking whether Neon instructions were supported.)
Fixes #553
We would have been perfectly happy for AppVeyor to delete all artifacts
other than those from the latest builds in each branch. Instead, they
chose to change the global artifact retention policy to 1 month. In
addition to being unworkable, that new policy uses more storage than the
policy we requested. Lose/lose, so we'll just deploy to S3 like we do
with other platforms.
This issue was introduced in 5557fd2217
due to an oversight, so it has existed in libjpeg-turbo since the
project's inception. However, the issue is effectively a non-issue.
Although #325 proposes allowing programs to override jpeg_get_*() and
jpeg_free_*() externally, there is currently no way to override those
functions without modifying the libjpeg-turbo source code.
libjpeg-turbo only includes the malloc()/free() memory manager from
libjpeg, and the implementation of jpeg_free_*() in that memory manager
ignores the size argument. libjpeg had several additional memory
managers for legacy systems (MS-DOS, System 7, etc.), but those memory
managers ignored the size argument to jpeg_free_*() as well. Thus, this
issue would have only potentially affected custom memory managers in
downstream libjpeg-turbo forks, and since no one has complained until
now, apparently those are rare.
Fixes #542
Attempting to losslessly transform certain malformed JPEG images can
cause the nbits table index in the Huffman encoder to exceed 32768, so
we need to pad the SSE2 implementation of that table to 65536 entries as
we do with the C implementation.
Regression introduced by 087c29e07f
Fixes #543
This is the same error that d147be83e9
suppressed in decode_mcu_slow(). The image that reproduces this error
in decode_mcu_fast() has been added to the libjpeg-turbo seed corpora.
Closes #537
Although sizeof(void *) == sizeof(size_t) for all architectures that are
currently supported by libjpeg-turbo, such is not guaranteed by the C
standard. Specifically, CHERI-enabled architectures (e.g. CHERI-RISC-V
or Arm's Morello) use capability pointers that are twice the size of
size_t (128 bits for Morello and RV64), so casting to size_t strips the
upper bits of the pointer (including the validity bit) and makes it
non-dereferenceable, as indicated by the following compiler warning:
warning: cast from provenance-free integer type to pointer type will
give pointer that can not be dereferenced
[-Werror,-Wcheri-capability-misuse]
cvalue = values = (JCOEF *)PAD((size_t)values_unaligned, 16);
Ignoring this warning results in a run-time crash. Casting pointers to
uintptr_t, if it is available, avoids this problem, since uintptr_t is
defined as an unsigned integer type that can hold a pointer value.
Since C89 compatibility is still necessary in libjpeg-turbo, this commit
introduces a new typedef for pointer-to-integer casts that uses a
GNU-specific extension available in GCC 4.6+ and Clang 3.0+ and falls
back to using size_t if the extension is unavailable. The only other
options would require C99 or Clang-specific builtins.
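The typedef-with-fallback approach described above can be sketched as follows, assuming that the GNU-specific extension in question is the predefined __UINTPTR_TYPE__ macro. The typedef name, the PAD() variant, and the helper function are hypothetical and are not copied from this commit.

```c
#include <stddef.h>

/* Use the compiler's own pointer-sized unsigned integer type if it is
   advertised through __UINTPTR_TYPE__ (assumed here to be the extension
   in question.)  Otherwise, fall back to size_t, which has the same width
   as a pointer on conventional architectures. */
#ifdef __UINTPTR_TYPE__
typedef __UINTPTR_TYPE__ ptr_uint;
#else
typedef size_t ptr_uint;
#endif

/* Round v up to a multiple of p (p must be a power of 2.) */
#define PAD(v, p)  (((v) + (p) - 1) & (~((ptr_uint)(p) - 1)))

/* Hypothetical helper: return the first 16-byte-aligned address at or
   after `unaligned`. */
static void *align_up_16(void *unaligned)
{
  return (void *)PAD((ptr_uint)unaligned, 16);
}
```

The caller is expected to over-allocate by at least 15 bytes so that the aligned pointer still lies within the original allocation.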
Closes #538
'buffer' is both passed into the inline assembly code and modified by
it. See https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html, 6.47.2.3.
With GCC 4, this commit does not change the generated assembly code at
all.
With GCC 8, this commit fixes an assembly error:
/tmp/{foo}.s: Assembler messages:
/tmp/{foo}.s:775: Error: registers may not be the same --
`str r9,[r9],#4'
I'm not sure why that error went unnoticed, since I definitely
benchmarked the previous commit with GCC 8. Anyhow, this commit changes
the generated assembly code slightly but does not alter performance.
With Clang 10, this commit changes the generated assembly code slightly
but does not alter performance.
Refer to #529
Referring to the C standard
(http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf,
J.2 Undefined behavior), the behavior of the compiler is undefined if
"conversion between two pointer types produces a result that is
incorrectly aligned." Thus, the behavior of this code
*((uint32_t *)buffer) = BUILTIN_BSWAP32(put_buffer);
in the AArch32 version of the FLUSH() macro is undefined unless 'buffer'
is 32-bit-aligned. Referring to
https://bugs.llvm.org/show_bug.cgi?id=50785, certain versions of Clang,
when generating Thumb (T32) instructions, miscompile that code into an
assembly instruction (stm) that requires the destination to be
32-bit-aligned. Since such alignment cannot be guaranteed within the
Huffman encoder, this reportedly led to crashes (SIGBUS: illegal
alignment) with AArch32/Thumb builds of libjpeg-turbo running on Android
devices, although thus far I have been unable to reproduce those crashes
with a plain Linux/Arm system.
The miscompilation is visible with the Compiler Explorer:
https://godbolt.org/z/rv1ccx1Pb
However, it goes away when removing the return statement from the
function. Thus, it seems that Clang's behavior in this regard is
somewhat variable, which may explain why the crashes are only
reproducible on certain platforms.
The suggested workaround is to use memcpy(), but whereas Clang and
recent GCC releases are smart enough to compile a 4-byte memcpy() call
into a str instruction, GCC < 6 is not. Referring to
https://godbolt.org/z/ae7Wje3P6, the only way to consistently produce
the desired str instruction across all supported compilers is to use
inline assembly. Visual C++ presumably does not miscompile the code in
question, since no issues have been reported with it, but since the code
relies on undefined compiler behavior, prudence dictates that
e4ec23d7ae should be reverted for Visual
C++, which this commit does. The performance impact of
e4ec23d7ae for Visual C++/Arm builds is
unknown (I have no ability to test such builds), but regardless, this
commit reverts the Visual C++/Arm performance to that of libjpeg-turbo
2.1 beta1.
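For reference, the memcpy() workaround mentioned above (which this commit did not adopt, for the code generation reasons given) can be sketched as follows. The helper names are hypothetical. Because memcpy() has no alignment requirement, the compiler is not entitled to assume that the destination is 32-bit-aligned.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a 32-bit value at an address with arbitrary
   alignment.  Unlike *((uint32_t *)dest) = value, this has defined
   behavior regardless of the alignment of `dest`. */
static void store_u32_unaligned(unsigned char *dest, uint32_t value)
{
  memcpy(dest, &value, sizeof(value));
}

/* Hypothetical helper: the matching load. */
static uint32_t load_u32_unaligned(const unsigned char *src)
{
  uint32_t value;

  memcpy(&value, src, sizeof(value));
  return value;
}
```

Whether a given compiler reduces the 4-byte memcpy() call to a single store instruction depends on the compiler and version, which is the concern raised above for GCC < 6.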
Closes #529
This resolves a conflict between the RPM generated by the libjpeg-turbo
build system and the Red Hat 'filesystem' RPM if
CMAKE_INSTALL_LIBDIR=/usr/lib[64]. This code was largely borrowed from
the VirtualGL RPM spec. (I can legally do that because I hold the
copyright on VirtualGL's implementation.)
Fixes #532
Arm compilers have three floating point ABI options:
'soft' compiles floating point operations as function calls into a
software floating point library, which emulates floating point
operations using integer operations. Floating point function arguments
are passed using integer registers.
'softfp' also compiles floating point operations as function calls into
a floating point library and passes floating point function arguments
using integer registers, but the floating point library functions can
use FPU instructions if the CPU supports them.
'hard' compiles floating point operations into inline FPU instructions,
similarly to x86 and other architectures, and passes floating point
function arguments using FPU registers.
Not all AArch32 CPUs have FPUs or support Neon instructions, so on Linux
and Android platforms, the AArch32 SIMD dispatcher in libjpeg-turbo only
enables the Neon SIMD extensions at run time if /proc/cpuinfo indicates
that the CPU supports Neon instructions or if Neon instructions are
explicitly enabled (e.g. by passing -mfpu=neon to the compiler.) In
order to support all AArch32 CPUs using the same code base, i.e. to
support run-time FPU and Neon auto-detection, it is necessary to compile
the scalar C source code using -mfloat-abi=soft. However, the 'soft'
floating point ABI cannot be used when compiling Neon intrinsics, so the
intrinsics implementation of the Neon SIMD extensions must be compiled
using -mfloat-abi=softfp if the scalar C source code is compiled using
-mfloat-abi=soft.
This commit modifies the build system so that it detects whether
-mfloat-abi=softfp must be explicitly added to the compiler flags when
building the intrinsics implementation of the Neon SIMD extensions.
This will be necessary if the build is using the 'soft' floating
point ABI along with run-time auto-detection of Neon instructions.
Fixes #523
The existing /*FALLTHROUGH*/ comments work with GCC but not Clang, so
this commit adds a FALLTHROUGH macro that uses the 'fallthrough'
attribute if the compiler supports it.
Refer to https://bugs.chromium.org/p/chromium/issues/detail?id=995993
NOTE: All versions of GCC that support -Wimplicit-fallthrough also
support the 'fallthrough' attribute, but certain other compilers (Oracle
Solaris Studio, for instance) support /*FALLTHROUGH*/ but not the
'fallthrough' attribute. Thus, this commit retains the /*FALLTHROUGH*/
comments, which have existed in the libjpeg code base in some form since
1994 (libjpeg v5.)
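A macro of the kind described above could be defined roughly as follows. This is a sketch rather than a copy of the definition added by this commit, and the switch statement is a hypothetical usage example. The nested #if is deliberate: compilers that lack __has_attribute cannot parse __has_attribute(fallthrough) in a preprocessor expression.

```c
#ifdef __has_attribute
#if __has_attribute(fallthrough)
#define FALLTHROUGH  __attribute__((fallthrough));
#else
#define FALLTHROUGH
#endif
#else
#define FALLTHROUGH
#endif

/* Hypothetical example: each case deliberately falls into the next one.
   The macro silences -Wimplicit-fallthrough where the attribute is
   supported, and the comment is retained for compilers that recognize
   only the comment. */
static int steps_from(int start)
{
  int steps = 0;

  switch (start) {
  case 0:
    steps++;
    FALLTHROUGH                 /*FALLTHROUGH*/
  case 1:
    steps++;
    FALLTHROUGH                 /*FALLTHROUGH*/
  case 2:
    steps++;
    break;
  default:
    break;
  }
  return steps;
}
```

The macro carries its own semicolon, so it expands either to a complete statement or to nothing, and the call site reads the same in both cases.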
Closes #531
Referring to #527, the security community did not assign this CVE ID
until more than 8 months after the fix for the issue was released. By
the time they assigned the ID, libjpeg-turbo already had two production
releases containing the fix. This calls into question the usefulness of
assigning a CVE ID to the issue, particularly given that the buffer
overrun in question was fully contained in the stack, not detectable
with valgrind, and confined to lossless transformation (it did not
affect JPEG compression or decompression.)
https://vuldb.com/?id.176175
says that "the exploitability is told to be easy" but provides no
clarification, and given that the author of that page does not seem to
be aware that a fix for the issue has been available since early
December of 2019, it calls into question the accuracy of everything else
on the page.
It would really be nice if the security community approached me about
these things before wasting my time, but I guess it's my lot in life to
modify a change log entry from 2019 to include a CVE ID from 2020.
So it goes...
In theory, all objects that will be included in a Un*x shared library
must be built using PIC. In practice, most compilers don't require PIC
to be explicitly specified for jsimd_none.o, either because the compiler
automatically enables PIC in all cases (Ubuntu) or because the size of
the generated object is too small. But some rare compilers do require
PIC to be explicitly specified for jsimd_none.o.
Fixes #520
Our workflow script does not currently work with tags, and there is no
point to building tags anyhow, since we do not use the CI system to spin
official builds.
When using the in-memory destination manager, it is necessary to
explicitly call the destination manager's term_destination() method if
an error occurs. That method is called by jpeg_finish_compress() but
not by jpeg_abort_compress().
This fixes a potential double free() that could occur if tjCompress*()
or tjTransform() returned an error and the calling application tried to
clean up a JPEG buffer that was dynamically re-allocated by one of those
functions.
- UBSan complained that entropy->restarts_to_go was underflowing an
unsigned integer when it was decremented while
cinfo->restart_interval == 0. That was, of course, completely
innocuous behavior, since the result of the underflowing computation
was never used.
- d3a3a73f64 and
7bc9fca430 silenced a UBSan signed
integer overflow error, but unfortunately other malformed JPEG images
have been discovered that cause unsigned integer overflow in the same
computation. Since, to the best of our understanding, this behavior
is innocuous, this commit reverts the commits listed above, suppresses
the UBSan errors, and adds code comments to document the issue.
Referring to https://bugzilla.mozilla.org/show_bug.cgi?id=1050342,
there are certain very rare circumstances under which a malformed JPEG
image can cause different Huffman decoder output to be produced,
depending on the size of the source manager's I/O buffer. (More
specifically, the fast Huffman decoder didn't handle invalid codes in
the same manner as the slow decoder, and since the fast decoder requires
all data to be memory-resident, the buffering strategy determines
whether or not the fast decoder can be used on a particular MCU block.)
After extensive experimentation, the Mozilla and Chrome developers and I
determined that this truly was an innocuous issue. The patch that both
browsers adopted as a workaround caused a performance regression with
32-bit code, which is why it was not accepted into libjpeg-turbo. This
commit fixes the problem in a less disruptive way with no performance
regression.
After the completion of the start_input() method, it's too late to check
the image size, because the image readers may have already tried to
allocate memory for the image. If the width and height are excessively
large, then attempting to allocate memory for the image could slow
performance or lead to out-of-memory errors prior to the fuzz target
checking the image size.
NOTE: Specifically, the aforementioned OOM errors and slow units were
observed with the compression fuzz targets when using MSan.
Certain rare malformed input images can cause the Huffman encoder to
generate a value for nbits that corresponds to an uninitialized member
of the DC code table. The ramifications of this are minimal and would
basically amount to a different bogus JPEG image being generated from a
particular bogus input image.
- Referring to 3311fc0001, we need to use
unsigned intermediate math in order to make UBSan happy, even though
(JDIMENSION)(A * B) is effectively the same as
(JDIMENSION)A *(JDIMENSION)B, regardless of intermediate overflow.
- Because of the previous commit, it is now possible for bfOffBits to be
INT_MIN, which would cause the initial computation of bPad to
underflow a signed integer. Thus, we need to check for that
possibility as soon as we know the values of bfOffBits and headerSize.
The worst case from this regression is that bPad could wrap around to
a large positive value, which would cause a "Premature end of input
file" error in the subsequent read_byte() loop. Thus, this issue was
effectively innocuous as well, since it resulted in catching the same
error later and in a different way. Also, the issue was very
well-contained, since it was both introduced and fixed as part of the
ongoing OSS-Fuzz integration project.
- rdbmp.c: Because of 8fb37b8171,
bfOffBits, biClrUsed, and headerSize were made into unsigned ints.
Thus, if bPad would eventually be negative due to a malformed header,
UBSan complained about unsigned math being used in the intermediate
computations. It was unnecessary to make those variables unsigned,
since they are only meant to hold small values, so this commit makes
them signed again. The UBSan error was innocuous, because it is
effectively (if not officially) the case that
(int)((unsigned int)a - (unsigned int)b) == (int)a - (int)b.
- rdbmp.c: If (biWidth * source->bits_per_pixel / 8) would overflow an
unsigned int, then UBSan complained at the point at which row_width
was set in start_input_bmp(), even though the overflow would have been
detected later in the function. This commit adds overflow checks
prior to setting row_width.
- rdppm.c: read_pbm_integer() now bounds-checks the intermediate
value computations in order to catch integer overflow caused by a
malformed text PPM. It's possible, though extremely unlikely, that
the intermediate value computations could have wrapped around to a
value smaller than maxval, but the worst case is that this would have
generated a bogus pixel in the uncompressed image rather than throwing
an error.
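The kind of intermediate bounds check described in the last item above can be sketched as follows. This is not the read_pbm_integer() code: the function name is hypothetical, and it parses a string rather than reading from a file. The overflow test is performed before the multiplication, so the running value can never wrap around.

```c
#include <stddef.h>

/* Hypothetical helper: parse an unsigned decimal number from `text`.
   Returns 0 and stores the value in *result on success.  Returns -1 if
   `text` is empty, contains a non-digit, or represents a value greater
   than `maxval`. */
static int parse_bounded_uint(const char *text, unsigned int maxval,
                              unsigned int *result)
{
  unsigned int val = 0;
  size_t i;

  if (text[0] == '\0')
    return -1;
  for (i = 0; text[i] != '\0'; i++) {
    unsigned int digit;

    if (text[i] < '0' || text[i] > '9')
      return -1;
    digit = (unsigned int)(text[i] - '0');
    /* val * 10 + digit <= maxval is equivalent to the test below, which
       cannot overflow. */
    if (digit > maxval || val > (maxval - digit) / 10)
      return -1;
    val = val * 10 + digit;
  }
  *result = val;
  return 0;
}
```

Rejecting the value as soon as it exceeds the limit also bounds the amount of work done on absurdly long digit strings.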
A fuzzing test case that was effectively a 1-pixel PGM file with a
maximum value of 1 and an actual value of 8 caused an uninitialized
member of the rescale[] array to be accessed in get_gray_rgb_row() or
get_gray_cmyk_row(). Since, for performance reasons, those functions do
not perform bounds checking on the PPM values, we need to ensure that
unused members of the rescale[] array are initialized.
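One way to guarantee that unused table entries are initialized is sketched below. It is an assumption-laden illustration rather than the rdppm.c code: the function name, the rounding formula, and the table sizing are hypothetical, and only single-byte raw input values are considered.

```c
#include <stdlib.h>

#define MAX_SAMPLE  255         /* assumes 8-bit output samples */

/* Hypothetical helper: build a table that maps raw values in 0..maxval to
   0..MAX_SAMPLE.  The table always has at least 256 entries, and
   calloc() zeroes all of them, so a raw byte greater than maxval indexes
   an initialized (zero) entry rather than uninitialized memory.  The
   caller must free() the table. */
static unsigned char *build_rescale_table(unsigned int maxval)
{
  size_t entries = (maxval > 255 ? (size_t)maxval : 255) + 1;
  unsigned char *table;
  unsigned int val;

  if (maxval == 0 || maxval > 65535)
    return NULL;
  table = (unsigned char *)calloc(entries, 1);
  if (!table)
    return NULL;
  for (val = 0; val <= maxval; val++)
    table[val] = (unsigned char)((val * MAX_SAMPLE + maxval / 2) / maxval);
  return table;
}
```

With a maximum value of 1 and an actual pixel value of 8, as in the test case described above, the lookup reads a zero entry, so the out-of-range input produces a deterministic result.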
A fuzzing test case with an image width of 838860946 triggered a UBSan
error:
rdbmp.c:633:34: runtime error: signed integer overflow:
838860946 * 3 cannot be represented in type 'int'
Because the result is cast to an unsigned int (JDIMENSION), this error
is irrelevant, because
(unsigned int)((int)838860946 * (int)3) ==
(unsigned int)838860946 * (unsigned int)3
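The unsigned form of that computation can be written as in the following sketch, which assumes a 32-bit int and uses a hypothetical function name. Converting each operand before multiplying means that the product is computed with unsigned arithmetic, which is defined to wrap rather than overflow.

```c
/* Hypothetical helper: compute width * samples_per_pixel using unsigned
   intermediate math.  The signed product of 838860946 and 3 cannot be
   represented in a 32-bit int, but the unsigned product can, and unsigned
   multiplication is well-defined even when it does wrap. */
static unsigned int row_samples(int width, int samples_per_pixel)
{
  return (unsigned int)width * (unsigned int)samples_per_pixel;
}
```

For the width in the fuzzing test case above, the function returns 2516582838, which exceeds INT_MAX but fits in an unsigned int.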
This limits the tjLoadImage() behavioral changes to the scope of the
compress_fuzzer target. Otherwise, TJBench in fuzzer builds would
refuse to load images larger than 1 Mpixel.
- The PPM reader now throws an error rather than segfaulting (due to a
buffer overrun) if an application attempts to load a 16-bit PPM file
into a grayscale uncompressed image buffer. No known applications
allowed that (not even the test applications in libjpeg-turbo),
because that mode of operation was never expected to work and did not
work under any circumstances. (In fact, it was necessary to modify
TJBench in order to reproduce the issue outside of a fuzzing
environment.) This was purely a matter of making the library bow out
gracefully rather than crash if an application tries to do something
really stupid.
- The PPM reader now throws an error rather than generating incorrect
pixels if an application attempts to load a 16-bit PGM file into an
RGB uncompressed image buffer.
- The PPM reader now correctly loads 16-bit PPM files into extended
RGB uncompressed image buffers. (Previously it generated incorrect
pixels unless the input colorspace was JCS_RGB or JCS_EXT_RGB.)
The only way that users could have potentially encountered these issues
was through the tjLoadImage() function. cjpeg and TJBench were
unaffected.
The non-default options were not being tested because of a pixel format
comparison buglet. This commit also changes the code in both
decompression fuzz targets such that non-default options are tested
based on the pixel format index rather than the pixel format value,
which is a bit more idiot-proof.
Otherwise, the targets will require libstdc++, the i386 version of which
is not available in the OSS-Fuzz runtime environment. The OSS-Fuzz
build environment passes -stdlib=libc++ in the CXXFLAGS environment
variable in order to mitigate this issue, since the runtime environment
has the i386 version of libc++, but using that compiler flag requires
using the C++ compiler.
This commit integrates OSS-Fuzz targets directly into the libjpeg-turbo
source tree, thus obsoleting and improving code coverage relative to
Google's OSS-Fuzz target for libjpeg-turbo (previously available here:
https://github.com/google/oss-fuzz).
I hope to eventually create fuzz targets for the BMP, GIF, and PPM
readers as well, which would allow for fuzz-testing compression, but
since those readers all require an input file, it is unclear how to
build an efficient fuzzer around them. It doesn't make sense to
fuzz-test compression in isolation, because compression can't accept
arbitrary input data.
Define compiler-independent byte-swap macros and use them instead of
executing 'rev' via inline assembly code with GCC-compatible compilers
or a slow shift-store sequence with Visual C++.
* This produces identical assembly code with:
  - 64-bit GCC 8.4.0 (Linux)
  - 64-bit GCC 9.3.0 (Linux)
  - 64-bit Clang 10.0.0 (Linux)
  - 64-bit Clang 10.0.0 (MinGW)
  - 64-bit Clang 12.0.0 (Xcode 12.2, macOS)
  - 64-bit Clang 12.0.0 (Xcode 12.2, iOS)
* This produces different assembly code with:
  - 64-bit GCC 4.9.1 (Linux)
  - 32-bit GCC 4.8.2 (Linux)
  - 32-bit GCC 8.4.0 (Linux)
  - 32-bit GCC 9.3.0 (Linux)
    Since the intrinsics implementation of Huffman encoding is not used
    by default with these compilers, this is not a concern.
  - 32-bit Clang 10.0.0 (Linux)
    Verified performance neutrality
Closes #507
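As a rough illustration of the idea, compiler-independent byte-swap macros
might be structured as in the sketch below. The SKETCH_BSWAP32/SKETCH_BSWAP64
names are hypothetical and are not necessarily the names used in the source
tree. The sketch assumes that GCC-compatible compilers provide
__builtin_bswap32()/__builtin_bswap64() and that Visual C++ provides
_byteswap_ulong()/_byteswap_uint64(). A shift-and-mask fallback is included
for other compilers.

```c
#include <stdint.h>
#include <stdlib.h>  /* _byteswap_ulong()/_byteswap_uint64() with Visual C++ */

/* Hypothetical macro names, chosen for this illustration only. */
#if defined(__GNUC__) || defined(__clang__)
/* GCC-compatible compilers (including clang-cl) provide these builtins. */
#define SKETCH_BSWAP32(x)  __builtin_bswap32(x)
#define SKETCH_BSWAP64(x)  __builtin_bswap64(x)
#elif defined(_MSC_VER)
/* Visual C++ provides equivalent intrinsics. */
#define SKETCH_BSWAP32(x)  _byteswap_ulong(x)
#define SKETCH_BSWAP64(x)  _byteswap_uint64(x)
#else
/* Portable fallback: reverse the four bytes with shifts and masks. */
#define SKETCH_BSWAP32(x) \
  ((((uint32_t)(x) & 0x000000FFu) << 24) | \
   (((uint32_t)(x) & 0x0000FF00u) << 8) | \
   (((uint32_t)(x) & 0x00FF0000u) >> 8) | \
   (((uint32_t)(x) & 0xFF000000u) >> 24))
/* Swap each 32-bit half, then exchange the halves. */
#define SKETCH_BSWAP64(x) \
  (((uint64_t)SKETCH_BSWAP32((uint32_t)(x)) << 32) | \
   (uint64_t)SKETCH_BSWAP32((uint32_t)((uint64_t)(x) >> 32)))
#endif
```

With an optimizing compiler, the builtin forms typically compile to a single
byte-reverse instruction, so no inline assembly is needed.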
(regression introduced by 16bd984557)
This implements the same fix for
jsimd_encode_mcu_AC_refine_prepare_sse2() that
a81a8c137b implemented for
jsimd_encode_mcu_AC_first_prepare_sse2().
Based on:
1a59587397eb176a91d8
Fixes #509
Closes #510
Referring to https://bugzilla.redhat.com/show_bug.cgi?id=1937385#c2,
it is my opinion that the severity of this bug was grossly overstated
and that a CVE never should have been assigned to it, but since one was
assigned, users need to know which version of libjpeg-turbo contains
the fix.
Dear security community, please learn what "DoS" actually means and stop
misusing that term for dramatic effect. Thanks.
* commit '8a2cad020171184a49fa8696df0b9e267f1cf2f6': (99 commits)
Build: Handle CMAKE_OSX_ARCHITECTURES=(i386|ppc)
Add Sponsor button for GitHub repository
Build: Support CMAKE_OSX_ARCHITECTURES
cjpeg: Fix FPE when compressing 0-width GIF
Fix build with Visual C++ and /std:c11 or /std:c17
Neon: Fix Huffman enc. error w/Visual Studio+Clang
Use CLZ compiler intrinsic for Windows/Arm builds
Build: Use correct SIMD exts w/VStudio IDE + Arm64
jcphuff.c: Fix compiler warning with clang-cl
Migrate from Travis CI to GitHub Actions
tjexample.c: Fix mem leak if tjTransform() fails
Build: Officially support Ninja
decompress_smooth_data(): Fix another uninit. read
LICENSE.md: Remove trailing whitespace
Build: Test for correct AArch32 RPM/DEBARCH value
LICENSE.md: Formatting tweak
Fix uninitialized read in decompress_smooth_data()
Fix buffer overrun with certain narrow prog JPEGs
Bump revision to 2.0.91 for post-beta fixes
Travis: Use Docker tag that matches Git branch
...
We don't officially support i386 or PowerPC Mac builds of libjpeg-turbo
anymore, but they still work (bearing in mind that PowerPC builds
require GCC v4.0 in Xcode 3.2.6, and i386 builds require Xcode 9.x or
earlier.) Referring to #495, apparently MacPorts needs this
functionality.
The GNU builtin function __builtin_clzl() accepts an unsigned long
argument, which is 8 bytes wide on LP64 systems (most Un*x systems,
including Mac) but 4 bytes wide on LLP64 systems (Windows.) This caused
the Neon intrinsics implementation of Huffman encoding to produce
mathematically incorrect results when compiled using Visual Studio with
Clang.
This commit changes all invocations of __builtin_clzl() in the Neon SIMD
extensions to __builtin_clzll(), which accepts an unsigned long long
argument that is guaranteed to be 8 bytes wide on all systems.
Fixes #480
Closes #490
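The width issue described above can be sketched as follows. The helper name
sketch_clz64() is hypothetical, and the bit-by-bit fallback for compilers
without GNU builtins is included only to keep the sketch self-contained. The
argument must be non-zero, since the builtin's result is undefined for 0.

```c
#include <stdint.h>

/* Hypothetical helper: count the leading zero bits of a NON-ZERO 64-bit
   value. */
static int sketch_clz64(uint64_t x)
{
#if defined(__GNUC__) || defined(__clang__)
  /* unsigned long long is at least 64 bits wide under both LP64 and LLP64,
     whereas unsigned long (the argument type of __builtin_clzl()) is only
     32 bits wide under LLP64. */
  return __builtin_clzll((unsigned long long)x);
#else
  /* Fallback: shift left until the top bit is set. */
  int n = 0;

  while (!(x & 0x8000000000000000ULL)) {
    x <<= 1;
    n++;
  }
  return n;
#endif
}
```

For instance, sketch_clz64(0x100000000ULL) returns 31. Passing the same value
to __builtin_clzl() on an LLP64 system would truncate it to 0, for which the
result is undefined.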
The __builtin_clz() compiler intrinsic was already used in the C Huffman
encoders when building libjpeg-turbo for Arm CPUs using a GCC-compatible
compiler. This commit modifies the C Huffman encoders so that they also
use __builtin_clz() when building for Arm CPUs using Visual Studio +
Clang, as well as the equivalent _CountLeadingZeros() compiler intrinsic
when building for Arm CPUs using Visual C++.
In addition to making the C Huffman encoders faster on Windows/Arm, this
also prevents jpeg_nbits_table from being included in Windows/Arm builds,
thus saving 128 KB of memory.
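To illustrate why a count-leading-zeros intrinsic can stand in for a lookup
table, the sketch below computes the number of bits needed to represent a
non-zero magnitude. The helper name sketch_nbits() is hypothetical, and the
loop fallback is an assumption added so that the sketch compiles without GNU
builtins. It is not the code path the commit uses with Visual C++.

```c
#include <limits.h>

/* Hypothetical helper: number of bits needed to represent a NON-ZERO
   magnitude x (for example, 255 needs 8 bits and 256 needs 9 bits). */
static int sketch_nbits(unsigned int x)
{
#if defined(__GNUC__) || defined(__clang__)
  /* Total bit width minus the leading zero count. */
  return (int)(sizeof(unsigned int) * CHAR_BIT) - __builtin_clz(x);
#else
  /* Fallback: count how many right shifts it takes to reach zero. */
  int n = 0;

  while (x) {
    n++;
    x >>= 1;
  }
  return n;
#endif
}
```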
When configuring a Visual Studio IDE build and passing -A arm64 to
CMake, CMAKE_SYSTEM_PROCESSOR will be amd64, so we should set CPU_TYPE
based on the value of CMAKE_GENERATOR_PLATFORM rather than the value of
CMAKE_SYSTEM_PROCESSOR.
Note that this removes our ability to regression test the Armv8 and
PowerPC SIMD extensions, effectively reverting
a524b9b06b and
02227e48a9, but at the moment, there is no
other way.
Regression introduced by 42825b68d5
The test case
https://user-images.githubusercontent.com/3491627/101376530-fde56180-38b0-11eb-938d-734119a5b5ba.jpg
is a malformed progressive JPEG image containing an interleaved Y/Cb/Cr
DC scan followed by two non-interleaved Y DC scans. Thus, the
prev_coef_bits[] array was initialized for the Y component but not the
other components, the uninitialized values for Cb and Cr were
transferred to the prev_coef_bits_latch[] array in smoothing_ok(), and
because cinfo->master->last_good_iMCU_row was 0,
decompress_smooth_data() read those uninitialized values when attempting
to smooth the second iMCU row.
Possibly fixes #478
Regression introduced by 6d91e950c8
last_block_column in decompress_smooth_data() can be 0 if, for instance,
decompressing a 4:4:4 image of width 8 or less or a 4:2:2 or 4:2:0 image
of width 16 or less. Since last_block_column is an unsigned int,
subtracting 1 from it produced 0xFFFFFFFF, the test in line 590 passed,
and we attempted to access blocks from a second block column that didn't
actually exist.
Closes #476
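The wraparound can be reproduced in isolation. The two predicates below are
hypothetical and only assume that the failing test compared a column index
against last_block_column - 1. The actual expression in
decompress_smooth_data() may differ.

```c
/* Hypothetical predicate: is there a block column to the right of 'col'?
   When last_block_column is 0, the unsigned subtraction wraps around to
   UINT_MAX (0xFFFFFFFF with a 32-bit unsigned int), so the comparison
   passes even though no second column exists. */
static int sketch_has_next_column_buggy(unsigned int col,
                                        unsigned int last_block_column)
{
  return col < last_block_column - 1;
}

/* Moving the "1" to the other side of the comparison avoids the
   subtraction and therefore the wraparound (assuming col is well below
   UINT_MAX, which holds for any realistic image width). */
static int sketch_has_next_column_fixed(unsigned int col,
                                        unsigned int last_block_column)
{
  return col + 1 < last_block_column;
}
```

With col = 0 and last_block_column = 0, the first predicate returns 1 and the
second returns 0. The two predicates agree whenever last_block_column is at
least 1.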
- Use the _M_ARM and _M_ARM64 macros provided by Visual Studio for
compile-time detection of Arm builds, since __arm__ and __aarch64__
are only present in GNU-compatible compilers.
- Neon/intrinsics: Use the _CountLeadingZeros() and
_CountLeadingZeros64() intrinsics provided by Visual Studio, since
__builtin_clz() and __builtin_clzl() are only present in
GNU-compatible compilers.
- Neon/intrinsics: Since Visual Studio does not support static vector
initialization, replace static initialization of Neon vectors with the
appropriate intrinsics. Compared to the static initialization
approach, this produces identical assembly code with both GCC and
Clang.
- Neon/intrinsics: Since Visual Studio does not support inline assembly
code, provide alternative code paths for Visual Studio whenever inline
assembly is used.
- Build: Set FLOATTEST appropriately for AArch64 Visual Studio builds
(Visual Studio does not emit fused multiply-add [FMA] instructions by
default for such builds.)
- Neon/intrinsics: Move temporary buffer allocation outside of nested
loops. Since Visual Studio configures Arm builds with a relatively
small amount of stack memory, attempting to allocate those buffers
within the inner loops caused a stack overflow.
Closes #461
Closes #475
Otherwise, because the file begins with an ASCII header, Git will
erroneously treat it as an ASCII file, and if Git for Windows is
configured with default options (specifically, "Checkout windows-style,
commit Unix-style line endings"), it will add carriage return characters
to all of the "linefeed" characters in the PPM file, thus corrupting it
and causing libjpeg-turbo's regression tests to fail.
There doesn't seem to be any performance or compatibility downside to
this, and it has the advantages of simplicity and consistency between
the PR and official builds.
- Rename IOS_ARMV8_BUILD to ARMV8_BUILD.
- Rename install_ios() to install_subbuild() in makemacpkg.
- Wordsmith the build instructions accordingly.
- Use xcode12.2 image in Travis CI.
This error occurs at the call to (*cinfo->cconvert->color_convert)() in
sep_upsample() whenever cinfo->upsample->need_context_rows == TRUE
(i.e. whenever h2v2 or h1v2 fancy upsampling is used.) The error is
innocuous, since (*cinfo->cconvert->color_convert)() points to a dummy
function (noop_convert()) in that case.
Fixes #470
The "32bit" vs. "64bit" floating point test results actually have
nothing to do with the FPU. That was a fallacious assumption based on
the observation that, with multiple CPU types, 32-bit and 64-bit builds
produce different floating point test results. It seems that this is,
in fact, due to differing compiler behavior-- more specifically, whether
fused multiply-add (FMA) instructions are used to combine multiple
floating point operations into a single instruction ("floating point
expression contraction".) GCC does this by default if the target
supports FMA instructions, which PowerPC and AArch64 targets both do.
Fixes #468
This allows the Neon intrinsics code to be built successfully (albeit
likely with reduced run-time performance) with Xcode 5.0-6.2
(iOS/AArch64) and Android NDK < r19 (AArch32). Note that Xcode 5.0-6.2
will not build the Armv8 GAS code without gas-preprocessor.pl, and no
version of Xcode will build the Armv7 GAS code without
gas-preprocessor.pl, so we always use the full Neon intrinsics
implementation by default with macOS and iOS builds.
Auto-detecting the completeness of the compiler's set of Neon intrinsics
also allows us to more intelligently set the default value of
NEON_INTRINSICS, based on the values of HAVE_VLD1*. This is a
reasonable, albeit imperfect, proxy for whether a compiler has a full
and optimal set of Neon intrinsics. Specific notes:
- 64-bit RGB-to-YCbCr color conversion
does not use any of the intrinsics in question, regresses with GCC
- 64-bit accurate integer forward DCT
uses vld1_s16_x3(), regresses with GCC
- 64-bit Huffman encoding
uses vld1q_u8_x4(), regresses with GCC
- 64-bit YCbCr-to-RGB color conversion
does not use any of the intrinsics in question, regresses with GCC
- 64-bit accurate integer inverse DCT
uses vld1_s16_x3(), regresses with GCC
- 64-bit 4x4 inverse DCT
uses vld1_s16_x3(). I did not test this algorithm in isolation, so
it may in fact regress with GCC, but the regression may be hidden by
the speedup from the new SIMD-accelerated upsampling algorithms.
- 32-bit RGB-to-YCbCr color conversion
uses vld1_u16_x2(), regresses with GCC
- 32-bit accurate integer forward DCT
uses vld1_s16_x3(), regression irrelevant because there was no
previous implementation
- 32-bit accurate integer inverse DCT
uses vld1_s16_x3(), regresses with GCC
- 32-bit fast integer inverse DCT
does not use any of the intrinsics in question, regresses with GCC
- 32-bit 4x4 inverse DCT
uses vld1_s16_x3(). I did not test this algorithm in isolation, so
it may in fact regress with GCC, but the regression may be hidden by
the speedup from the new SIMD-accelerated upsampling algorithms.
Presumably when GCC includes a full and optimal set of Neon intrinsics,
the HAVE_VLD1* tests will pass, and the full Neon intrinsics
implementation will be enabled automatically.
- Remove gas-preprocessor.pl. None of the compilers that can build the
new intrinsics implementation require gas-preprocessor.pl (tested
with Xcode and with Clang 3.9+ for Linux.)
- Document that Xcode 6.3.x or later is now required for iOS builds
(older versions of Xcode do not have a full set of Neon intrinsics.)
- Add a change log entry.
- Do not enable the ASM CMake language unless NEON_INTRINSICS is false.
- Add a Clang/Arm64 test to .travis.yml in order to test the new
intrinsics implementation.
Closes #455
There was no previous GAS implementation.
NOTE: This doesn't produce much of a speedup when using -O3, because -O3
already enables Neon autovectorization, which works well for the scalar
C implementation of plain upsampling. However, the Neon SIMD
implementation will benefit other optimization levels.
There was no previous GAS implementation.
This commit also reverts 40557b2301 and
7723d7f7d0.
7723d7f7d0 was only necessary because
there was no Neon implementation of merged upsampling/color conversion,
and 40557b2301 was only necessary because
of 7723d7f7d0.
The previous AArch64 GAS implementation has been removed, since the
intrinsics implementation provides the same or better performance.
There was no previous AArch32 GAS implementation.
The previous AArch32 and AArch64 GAS implementations are retained by
default when using GCC, in order to avoid a performance regression. The
intrinsics implementation can be forced on or off using the new
NEON_INTRINSICS CMake variable.
The previous AArch32 GAS implementation is retained by default when
using GCC, in order to avoid a performance regression. The intrinsics
implementation can be forced on or off using the new NEON_INTRINSICS
CMake variable. The previous AArch64 GAS implementation has been
removed, since the intrinsics implementation provides the same or better
performance.
The previous AArch32 GAS implementation of h2v1 fancy upsampling has
been removed, since the intrinsics implementation provides the same or
better performance. There was no previous GAS implementation of h2v2
fancy upsampling, and there was no previous AArch64 GAS implementation
of h2v1 fancy upsampling.
The previous AArch64 GAS implementation is retained by default when
using GCC, in order to avoid a performance regression. The intrinsics
implementation can be forced on or off using the new NEON_INTRINSICS
CMake variable. The previous AArch32 GAS implementation has been
removed, since the intrinsics implementation provides the same or better
performance.
The previous AArch64 GAS implementation is retained by default when
using GCC, in order to avoid a performance regression. The intrinsics
implementation can be forced on or off using the new NEON_INTRINSICS
CMake variable. The previous AArch32 GAS implementation has been
removed, since the intrinsics implementation provides the same or better
performance.
The previous AArch64 GAS implementation is retained by default when
using GCC, in order to avoid a performance regression. The intrinsics
implementation can be forced on or off using the new NEON_INTRINSICS
CMake variable. The previous AArch32 GAS implementation has been
removed, since the intrinsics implementation provides the same or better
performance.
The previous AArch64 GAS implementation is retained by default when
using GCC, in order to avoid a performance regression. The intrinsics
implementation can be forced on or off using the new NEON_INTRINSICS
CMake variable. There was no previous AArch32 GAS implementation.
The previous AArch64 GAS implementation has been removed, since the
intrinsics implementation provides the same or better performance.
There was no previous AArch32 GAS implementation.
The previous AArch32 and AArch64 GAS implementations are retained by
default when using GCC, in order to avoid a performance regression. The
intrinsics implementation can be forced on or off using a new
NEON_INTRINSICS CMake variable.
Regression caused by
a46c111d9f
Because of 7723d7f7d0, which was
introduced in libjpeg-turbo 1.5.1 in response to #81, merged upsampling/
color conversion is disabled on platforms that have SIMD-accelerated
YCbCr -> RGB color conversion but not SIMD-accelerated merged
upsampling/color conversion. This was intended to improve performance
with the Neon SIMD extensions, since those are the only SIMD extensions
for which those circumstances apply. Under normal circumstances, the
separate "plain" (non-fancy) upsampling and color conversion routines
will produce bitwise-identical output to the merged upsampling/color
conversion routines, but that is not the case when skipping scanlines
starting at an odd-numbered scanline. The modified test introduced in
a46c111d9f does precisely that in order to
validate the fixes introduced in
9120a24743 and
a46c111d9f.
Because of 7723d7f7d0, the segfault fixed
in 9120a24743 and
a46c111d9f didn't affect the Neon SIMD
extensions, so this commit effectively reverts the test modifications in
a46c111d9f when using those SIMD
extensions. We can get rid of this hack, as well as
7723d7f7d0, once a Neon implementation of
merged upsampling/color conversion is available.
- Refer to the "slow" [I]DCT algorithms as "accurate" instead, since
they are not slow under libjpeg-turbo.
- Adjust documentation claims to reflect the fact that the "slow" and
"fast" algorithms produce about the same performance on AVX2-equipped
CPUs (because of the dual-lane nature of AVX2, it was not possible to
accelerate the "fast" algorithm beyond what was achievable with SSE2.)
Also adjust the claims to reflect the fact that the "fast" algorithm
tends to be ~5-15% faster than the "slow" algorithm on
non-AVX2-equipped CPUs, regardless of the use of the libjpeg-turbo
SIMD extensions.
- Indicate the legacy status of the "fast" and float algorithms in the
documentation and cjpeg/djpeg usage info.
- Remove obsolete paragraph in the djpeg man page that suggested that
the float algorithm could be faster than the "fast" algorithm on some
CPUs.
It is our convention to use the term "region" when referring to crop
specs, since this is more consistent with the terminology used by the
rest of the image processing community.
The checkstyle script was hastily developed prior to libjpeg-turbo 2.0
beta1, so it has a lot of exceptions and is thus prone to false
negatives. This commit eliminates some of those exceptions.
- Restore GIF read/compressed GIF write support from jpeg-6a and
jpeg-9d.
- Integrate jpegtran -wipe and -drop options from jpeg-9a and jpeg-9d.
- Integrate jpegtran -crop extension (for expanding the image size) from
jpeg-9a and jpeg-9d.
- Integrate other minor code tweaks from jpeg-9*
- Set CPU_TYPE=arm if performing a 32-bit build on an AArch64 system.
This eliminates the need to use a CMake toolchain file.
- Set RPMARCH=armv7hl if building on a 32-bit Arm system with an FPU.
- Set RPMARCH=armv7hl and DEBARCH=armhf if performing a 32-bit build
using a gnueabihf toolchain.
- If performing a 32-bit Arm build, generate a 32-bit supplementary DEB
package for AArch64 systems.
This commit modifies decompress_smooth_data(), adding missing MCU column
offsets to the prev_block_row and next_block_row indices that are used
for block rows other than the first and last. Effectively, this
eliminates unexpected visual artifacts when using jpeg_crop_scanline()
along with interblock smoothing while decompressing the DC scan of a
progressive JPEG image.
Based on:
0227d4fb48
Fixes #456
Closes #457
* tag '2.0.5':
TurboJPEG: Make global error handling thread-safe
ChangeLog.md: Add missing sub-header for 2.0.5
ChangeLog.md: List CVE ID fixed by previous commit
rdppm.c: Fix buf overrun caused by bad binary PPM
Build: Add missing jpegtran-icc test dependency
rdswitch.c: Eliminate spaces before semicolons
TJCompressor.compress(int): Fix YUV-to-JPEG error
Bump version to 2.0.5; Document previous commit
MIPS DSPr2: Work around various 'make test' errors
MIPS DSPr2: Fix compiler warning with -mdspr2
MIPS SIMD: Always honor JSIMD_FORCE* env vars
Test: Honor CMAKE_CROSSCOMPILING_EMULATOR variable
- Introduce a partial image decompression regression test script that
validates the correctness of jpeg_skip_scanlines() and
jpeg_crop_scanlines() for a variety of cropping regions and libjpeg
settings.
This regression test catches the following issues:
#182, fixed in 5bc43c7821
#237, fixed in 6e95c08649794f5018608f37250026a45ead2db8
#244, fixed in 398c1e9acc
#441, fully fixed in this commit
It does not catch the following issues:
#194, fixed in 773040f9d9
#244 (additional segfault), fixed in 9120a24743
- Modify the libjpeg-turbo regression test suite (make test) so that it
checks for the issue reported in #441 (segfault in
jpeg_skip_scanlines() when used with 4:2:0 merged upsampling/color
conversion.)
- Fix issues in jpeg_skip_scanlines() that caused incorrect output with
h2v2 (4:2:0) merged upsampling/color conversion. The previous commit
fixed the segfault reported in #441, but that was a symptom of a
larger problem. Because merged 4:2:0 upsampling uses a "spare row"
buffer, it is necessary to allow the upsampler to run when skipping
rows (fancy 4:2:0 upsampling, which uses context rows, also requires
this.) Otherwise, if skipping starts at an odd-numbered row, the
output image will be incorrect.
- Throw an error if jpeg_skip_scanlines() is called with two-pass color
quantization enabled. With two-pass color quantization, the first
pass occurs within jpeg_start_decompress(), so subsequent calls to
jpeg_skip_scanlines() interfere with the multipass state and prevent
the second pass from occurring during subsequent calls to
jpeg_read_scanlines().
The additional segfault mentioned in #244 was due to the fact that
the merged upsamplers use a different private structure than the
non-merged upsamplers. jpeg_skip_scanlines() was assuming the latter, so
when merged upsampling was enabled, jpeg_skip_scanlines() clobbered one
of the IDCT method pointers in the merged upsampler's private structure.
For reasons unknown, the test image in #441 did not encounter this
segfault (too small?), but it encountered an issue similar to the one
fixed in 5bc43c7821, whereby it was
necessary to set up a dummy postprocessing function in
read_and_discard_scanlines() when merged upsampling was enabled.
Failing to do so caused either a segfault in merged_2v_upsample() (due
to a NULL pointer being passed to jcopy_sample_rows()) or an error
("Corrupt JPEG data: premature end of data segment"), depending on the
number of scanlines skipped and whether the first scanline skipped was
an odd- or even-numbered row.
Fixes #441
Fixes #244 (for real this time)
IIRC, this was only necessary with the version of Java 1.5 that shipped
with OS X 10.4 "Tiger". Apple's implementation of Java 6 ("Java for
OS X Systems") supported both .jnilib and .dylib extensions for JNI
libraries, but Oracle's implementation of Java has only ever supported
the .dylib extension.
The scales have now tilted overwhelmingly in favor of eliminating
support for 32-bit Macs:
- 32-bit applications are only necessary in order to support OS X 10.5
"Leopard" and OS X 10.6 "Snow Leopard". OS X 10.7 "Lion" requires a
64-bit Mac and supports all 64-bit Macs.
- 32-bit applications are no longer allowed in the macOS App Store.
- 32-bit applications no longer run in macOS 10.15 "Catalina".
- 32-bit applications do not support thread-local storage, so the
TurboJPEG API library's global error handler is not thread-safe with
such applications.
- libjpeg-turbo 2.1.x no longer supports 32-bit iOS apps, so it makes
sense to also eliminate support for 32-bit macOS applications.
It's time.
We haven't provided official Cygwin builds since 1.4.x, since Cygwin
now supplies its own libjpeg-turbo packages (although they apparently
haven't been updated past 1.5.3.)
This extends the fix in 1e81b0c3ea to
include binary PPM files with maximum values < 255, thus preventing a
malformed binary PPM input file with those specifications from
triggering an overrun of the rescale array and potentially crashing
cjpeg, TJBench, or any program that uses the tjLoadImage() function.
Fixes #433
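The class of overrun described above can be sketched as follows. This is one
possible defensive approach under stated assumptions, not necessarily how
the commit addresses the problem. The function name, its signature, and the
idea of rejecting out-of-range samples are all hypothetical. The sketch
assumes a rescale table with (maxval + 1) entries, so any raw sample greater
than maxval would index past the end of the table.

```c
#include <stddef.h>

/* Hypothetical helper: map raw 8-bit samples through a rescale table that
   has (maxval + 1) entries.  Returns 0 on success or -1 if a sample
   exceeds maxval, in which case the table lookup would overrun. */
static int sketch_rescale_row(const unsigned char *in, unsigned char *out,
                              size_t n, const unsigned char *rescale,
                              unsigned int maxval)
{
  size_t i;

  for (i = 0; i < n; i++) {
    if (in[i] > maxval)
      return -1;  /* malformed file: sample is outside of the table */
    out[i] = rescale[in[i]];
  }
  return 0;
}
```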
Referring to #408, this commit #ifdefs DSPr2 SIMD functions that only
work on little endian processors, and it completely excludes
jsimd_h2v1_downsample_dspr2() and jsimd_h2v2_downsample_dspr2(). The
latter two functions fail with the TJBench tiling regression tests, most
likely because the implementation of the functions predates those tests.
Previously, these environment variables were not honored unless a 74K
CPU was detected, but this detection doesn't work properly with QEMU's
user mode emulation. With all other CPU types, libjpeg-turbo honors
JSIMD_FORCE* regardless of CPU detection.
This CMake variable is intended to define a wrapper program for
executing cross-compiled executables. However, CTest doesn't use
CMAKE_CROSSCOMPILING_EMULATOR, because it isn't obvious which tests
should be executed with the wrapper and which tests are scripts that
don't need it. This commit manually prepends
${CMAKE_CROSSCOMPILING_EMULATOR} to all unit test command lines that
execute a program built by the libjpeg-turbo build system. Thus, one
can set CMAKE_CROSSCOMPILING_EMULATOR in a CMake toolchain file to (for
instance) "qemu-{architecture} {qemu_arguments}" in order to execute
all eligible unit tests using QEMU.
+ document that tjFree() accepts NULL pointers without complaint.
Effectively, it has had that behavior all along, but the API does not
guarantee that tjFree() will be implemented with free() behind the
scenes, so it's best to formalize the behavior.
This programming practice (which exists in other code bases as well)
is a by-product of having used early C compilers that did not properly
handle free(NULL). All modern compilers should properly handle that.
Fixes #398
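For reference, the standard C behavior that makes the NULL check redundant
can be sketched as below. The wrapper name sketch_free() is hypothetical.
It only illustrates a deallocation function implemented on top of free() and
is not the TurboJPEG implementation.

```c
#include <stdlib.h>

/* Hypothetical deallocation wrapper.  The C standard specifies that
   free(NULL) performs no action, so no "if (buffer)" guard is needed. */
static void sketch_free(unsigned char *buffer)
{
  free(buffer);
}
```

Callers can thus pass a pointer that may or may not have been allocated
without testing it first.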
- Don't enumerate the types of SIMD instructions that libjpeg-turbo
supports, as this can change without notice.
- Use clearer terminology when describing support for libjpeg v7/v8
features ("libjpeg" is, colloquially but not officially, the name for
the IJG's software, whereas the "libjpeg API" refers to our emulation
of said software.)
- "PhotoShop" = "Photoshop" (StudLy Caps Police)
- Adjust dynamic library versions to reflect the addition of
jpeg_read_icc_profile() and jpeg_write_icc_profile() in
libjpeg-turbo 2.0.x.
Move constants out of the .text section in simd/arm64/jsimd_neon.S and
into a .rodata section. This ensures that the ARMv8 NEON SIMD
extensions are compatible with memory layouts that are marked
execute-only (and thus unreadable.)
Based on:
88f3ca7664
Closes #318
libjpeg-turbo never included that code, because it requires an external
library (the Utah Raster Toolkit.) The RLE image format was supplanted
by GIF in the late 1980s, so it is rarely seen these days. (It had a
lousy Weissman score, anyhow.)
- Enable progress reporting at run time using a new -report argument
(cjpeg now supports that argument as well)
- Limit the allowable number of scans using a new -maxscans argument
- Treat warnings as fatal using a new -strict argument
This mainly demonstrates how to work around the two issues with the
JPEG standard described here:
https://libjpeg-turbo.org/pmwiki/uploads/About/TwoIssueswiththeJPEGStandard.pdf
since those and similar issues continue to be erroneously reported as
libjpeg-turbo bugs.
Modern Loongson processors are MIPS64-compatible, and MMI instructions
are now supported in the mainline of GCC. Thus, this commit adds
compile-time and run-time auto-detection of MMI instructions and moves
the MMI SIMD extensions for libjpeg-turbo from simd/loongson/ to
simd/mips64/. That will allow MMI and MSA instructions to co-exist
in the same build once #377 has been integrated.
Based on:
82953ddd61
Closes #383
Because of 01e3032354 (officially
eliminating support for compilers without unsigned char, since we never
effectively supported those compilers anyhow), GETJOCTET() is now a
no-op. Since that macro is in jmorecfg.h, it is part of the de facto
libjpeg API and must remain in the public headers. However, there is no
reason to continue using it internally, and eliminating its internal use
improves code readability.
This commit adds ARM64 NEON optimizations for the
encode_mcu_AC_first() and encode_mcu_AC_refine() functions used in
progressive Huffman encoding.
Compression speedups for the typical set of five libjpeg-turbo test
images (https://libjpeg-turbo.org/About/Performance):
Cortex-A53: 23.8-39.2% (avg. 32.2%)
Cortex-A72: 26.8-41.1% (avg. 33.5%)
Apple A7: 29.7-45.9% (avg. 39.6%)
Closes #229
Homebrew tends to drop support for a macOS release the second that Apple
stops releasing security updates for it, and that makes HB difficult to
use with some of the Travis macOS images. Furthermore, even on
supported macOS releases, HB sometimes tries to build GCC from source
even if a binary (bottle) is available. Long story short, MacPorts just
generally has better backward compatibility. MacPorts is also what I
personally use on the official libjpeg-turbo build machine.
... detected by ASan. This is a similar issue to the issue that was
fixed with 402a715f82. Apparently it is
possible to create a malformed JPEG image that exceeds the Huffman
encoder's 256-byte local buffer when attempting to losslessly transform
the image. That makes sense, given that it was necessary to extend the
Huffman decoder's local buffer to 512 bytes in order to handle all
pathological cases (refer to 0463f7c9aad060fcd56e98d025ce16185279e2bc.)
Since this issue affected only lossless transformation, a workflow that
isn't generally exposed to arbitrary data exploits, and since the
overrun did not overflow the stack (i.e. it did not result in a segfault
or other user-visible issue, and valgrind didn't even detect it), it did
not likely pose a security risk.
Fixes #392
... that caused some JPEG images with unusual sampling factors to be
misidentified as 4:4:4. This led to a buffer overflow when attempting
to decompress some such images using tjDecompressToYUV*().
Regression introduced by 479501b07c
The correct behavior is for the TurboJPEG API to refuse to decompress
such images, which it did prior to the aforementioned commit.
Fixes #389
Referring to #289, I'm not sure where I arrived at the conclusion that
the SSE2 progressive Huffman encoder doesn't provide any speedup for
x32. Upon re-testing, I discovered it to be about 50% faster than the
C encoder.
This commit also re-purposes one of the CI tests (specifically, the
jpeg-7 API/ABI test) so that it tests x32 as well.
... introduced by 42825b68d5. In fact,
fault-tolerant multi-scan block smoothing cannot currently be used with
the arithmetic decoder, because that decoder doesn't have any way of
distinguishing a normal end of scan from an unexpected end of scan.
Thus, this commit also modifies the change log to reset the expectations
regarding the scope of the fault-tolerant multi-scan block smoothing
feature. If, at some point in the future, the arithmetic decoder can be
modified to detect an unexpected end of scan, then one would need only
set entropy->pub.insufficient_data = TRUE when the arithmetic decoder
encounters an unexpected end of scan in order to make fault-tolerant
block smoothing work properly with that decoder.
This commit modifies the behavior of the block smoothing algorithm in
the libjpeg API library so that, if a scan in a multi-scan JPEG image is
incomplete (due to premature termination of the image stream), the block
smoothing parameters from the previous (complete) scan are used to
smooth any iMCU rows that the incomplete scan does not contain.
Closes #343
Modifying a locally-defined non-volatile variable below the setjmp()
return point results in undefined behavior whereby the variable may not
have the expected value after setjmp() returns.
Fixes #379
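A minimal sketch of the rule in question, using hypothetical names: a local
variable that is modified after setjmp() returns 0 has a well-defined value
after longjmp() only if it is declared volatile.

```c
#include <setjmp.h>

static jmp_buf sketch_env;  /* hypothetical jump buffer for this sketch */

/* Hypothetical function: returns the value of 'count' as observed after
   control returns to the setjmp() point via longjmp(). */
static int sketch_count_after_longjmp(void)
{
  /* Without 'volatile', the value of count after longjmp() would be
     indeterminate, because the compiler may keep it in a register that
     longjmp() restores to its earlier state. */
  volatile int count = 0;

  if (setjmp(sketch_env) != 0)
    return count;  /* reached via longjmp(); count is reliably 1 */

  count = 1;       /* modified after setjmp() returned 0 */
  longjmp(sketch_env, 1);
  return -1;       /* not reached */
}
```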
The primary impetus for this is to eliminate build warnings, such as
(32-bit only)
section "__textcoal_nt" is deprecated
object file (XXXXXX.o) was built for newer OSX version (10.XX) than
being linked (10.5)
Upgrading to GCC 6 results in neutral performance for compression,
a measured average overall decompression speedup of 2.5% for 64-bit
code, and a measured average overall decompression slowdown of 4.3% for
32-bit code on a 3 GHz Core i7. The 4.3% slowdown for 32-bit code is
deemed acceptable, given that 32-bit macOS apps are deprecated.
Splitting the pointer arithmetic in GET_SYM() into a separate add and
sub instruction was an attempt to work around an error ("invalid operand
type") that occurred when assembling the file with NASM. However, this
created a link error on macOS ("ld: illegal text-relocation to
'_jconst_huff_encode_one_block' in
simd/CMakeFiles/simd.dir/i386/jchuff-sse2.asm.o from
'_jsimd_huff_encode_one_block_sse2' in
simd/CMakeFiles/simd.dir/i386/jchuff-sse2.asm.o for architecture i386")
and also changed the alignment of the code in ways that might have
affected the previous benchmark results (which took a great deal of time
to obtain.) Ultimately, the path of least resistance is just to
require NASM 2.13 or later.
This commit improves the C and SSE2 Huffman encoding implementations in
the following ways:
- Avoid using xmm8-xmm15 in the x86-64 SSE2 implementation. There is no
actual need to use those registers, and avoiding them produces a
cleaner WIN64 function entry/exit-- as well as shorter code, since REX
prefixes can be avoided (this is helpful on certain CPUs, such as
Intel Atom, for which instruction fetch and decoding can be a
bottleneck.)
- Optimize register usage so that fewer REX prefixes and
register-register moves are needed.
- Use the bit counter to store the number of free bits in the bit buffer
rather than the number of bits in the bit buffer. This changes the
method for inserting a code into the bit buffer to:
(put_buffer |= code << (free_bits -= code_size));
As a result:
* Only one bit counter needs to stay in a register (we just keep it in
cl.)
* The bit buffer contents are already properly aligned to be written
out (after a byte swap.)
* Adjusting the free bits counter and checking if the bit buffer is
full can be combined into a single operation.
* We can wait to flush the bit buffer until the buffer is actually
full and not just in danger of becoming full. Thus, eight bytes can
be flushed at a time.
- Speed is quite sensitive to the alignment of branch target labels, so
insert some padding and remove branches from the flush code.
(Flushing this way isn't actually faster when compared to using
branches, but the branchless code doesn't need extra alignment and is
thus smaller.)
- Speculatively write out the bit buffer as a single 8-byte write,
falling back to a byte-by-byte write only if there are any 0xFF bytes
in the bit buffer that need to be encoded as 0xFF 0x00.
- Use MMX registers for the 32-bit implementation (so the bit buffer can
be 64 bits wide.)
- Slightly reduce overall function code size.
- Eliminate or combine a few SSE instructions.
- Make some minor improvements to instruction scheduling.
- Adjust flush_bits() in jchuff.c to handle cases in which the bit
buffer has less than 7 free bits (apparently that couldn't happen
before.)
Based on:
947a09defa262ebb6b816e9a091221
See change log for performance claims.
Closes #292
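The "free bits" bookkeeping described above can be sketched in portable C as follows. This is an illustration only, not the code from jchuff.c or the SSE2 implementation. The bit_writer type, its field names, and the flush_full() helper are hypothetical, and the output buffer is assumed to be large enough. For simplicity, the flush always writes byte by byte rather than attempting a speculative 8-byte write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit writer; the field names are illustrative only. */
typedef struct {
  uint64_t put_buffer;  /* codes are packed starting at the top bit */
  int free_bits;        /* unused bits remaining in put_buffer (64 = empty) */
  unsigned char *out;   /* destination, assumed to be large enough */
  size_t out_len;
} bit_writer;

/* Write all 64 bits, most significant byte first, stuffing 0x00 after each
 * 0xFF byte, as JPEG entropy-coded data requires. */
static void flush_full(bit_writer *w, uint64_t bits)
{
  int i;

  for (i = 56; i >= 0; i -= 8) {
    unsigned char b = (unsigned char)(bits >> i);

    w->out[w->out_len++] = b;
    if (b == 0xFF)
      w->out[w->out_len++] = 0x00;
  }
}

static void put_bits(bit_writer *w, uint32_t code, int size)
{
  assert(size >= 1 && size <= 32);
  assert(size == 32 || (code >> size) == 0);

  /* Adjusting the counter and detecting a full buffer is one operation. */
  w->free_bits -= size;
  if (w->free_bits < 0) {
    /* The code does not fit.  Its upper bits complete the current buffer,
     * which is flushed, and its lower bits start the next buffer. */
    flush_full(w, w->put_buffer | ((uint64_t)code >> -w->free_bits));
    w->free_bits += 64;
    w->put_buffer = (uint64_t)code << w->free_bits;
  } else {
    w->put_buffer |= (uint64_t)code << w->free_bits;
  }
}
```

Note that the buffer is flushed only when a code actually overflows it, so a buffer with exactly zero free bits stays in place until the next call to put_bits().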
This macro is a relic of libjpeg's historic need to support a wide
variety of C compilers with varying degrees of compatibility. Such was
necessary during the open systems era, because C compilers were often
supplied by the system vendor. Prior to 1989, there was no C standard
per se, and even after ANSI C became a thing, there were still compilers
in use that didn't conform to it (libjpeg was first released in 1991.)
Realistically, only a handful of C compilers are in widespread use these
days, and all modern C compilers should support structure assignment.
... to avoid backward compatibility issues with GCC 4-6 MinGW
toolchains. Apparently GCC 7+ MinGW toolchains introduce a link-time
dependency with internal MinGW CRT functions that are meant to provide
compatibility with Microsoft's Universal CRT (ucrt) library, but those
internal functions are not available in GCC 4-6 MinGW toolchains. This
made it impossible to use the official builds of libjpeg.a and
libturbojpeg.a with GCC 4-6 MinGW toolchains (a fatal link error--
"undefined reference to '__imp___acrt_iob_func'"-- occurred.)
This problem was not immediately apparent after switching to the MSYS2
implementation of MinGW (d6d7b53968)
because, for a while, MSYS2 was still using GCC 5 and 6.
Refer to libjpeg-turbo/libjpeg-turbo#382
byte, word, dword, qword, oword, and yword are all assembler keywords,
so it makes sense to use lowercase for these so as not to mistake them
for macros or constants.
(AKA "Java for OS X systems.") This implementation of Java 1.6 is long
obsolete and not supported on any version of macOS past High Sierra.
Oracle no longer provides a 32-bit JVM on macOS, so it is no longer
necessary to provide a 32-bit version of the TurboJPEG Java wrapper on
macOS.
If the TurboJPEG instance passed to tjDecodeYUV[Planes]() was previously
used to decompress a progressive JPEG image, then we need to disable the
progressive decompression parameters in the underlying libjpeg instance
before calling jinit_master_decompress().
This commit also modifies the build system so that the "tjtest" target
will test for this issue, and it corrects a previous oversight in the
build system whereby tjbenchtest did not test progressive
compression/decompression unless WITH_JAVA was true.
(regression introduced by 5b177b3cab)
The SSE2 implementation of progressive Huffman encoding performed
extraneous iterations when the scan length was a multiple of 16.
Based on:
bb7f1ef983
Fixes #335
Closes #367
- Re-purpose the non-SIMD test to test with MSan as well.
- Re-purpose the ASan test to test with UBSan as well.
- Use the default Travis build environment rather than specifying Ubuntu
14.04. I think I added 'dist: trusty' back when 14.04 was newer than
the default, but now it's older than the default.
- Enable verbose output for any unit tests that fail (so we can see the
sanitizer output.)
Prevent several integer overflow issues and subsequent segfaults that
occurred when attempting to compress or decompress gigapixel images with
the TurboJPEG API:
- Modify tjBufSize(), tjBufSizeYUV2(), and tjPlaneSizeYUV() to avoid
integer overflow when computing the return values and to return an
error if such an overflow is unavoidable.
- Modify tjunittest to validate the above.
- Modify tjCompress2(), tjEncodeYUVPlanes(), tjDecompress2(), and
tjDecodeYUVPlanes() to avoid integer overflow when computing the row
pointers in the 64-bit TurboJPEG C API.
- Modify TJBench (both C and Java versions) to avoid overflowing the
size argument to malloc()/new and to fail gracefully if such an
overflow is unavoidable.
In general, this allows gigapixel images to be accommodated by the
64-bit TurboJPEG C API when using automatic JPEG buffer (re)allocation.
Such images cannot currently be accommodated without automatic JPEG
buffer (re)allocation, due to the fact that tjAlloc() accepts a 32-bit
integer argument (oops.) Such images cannot be accommodated in the
TurboJPEG Java API due to the fact that Java always uses a signed 32-bit
integer as an array index.
Fixes #361
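The general approach to overflow-safe size computation can be sketched as follows. This is not the actual tjBufSize() formula. The function name checked_image_size() and the use of (unsigned long)-1 as the error value are assumptions made for illustration, and the code assumes that int is no wider than 32 bits.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: returns width * height * bytes_per_pixel, or
 * (unsigned long)-1 if the arguments are invalid or the result cannot be
 * represented in an unsigned long. */
static unsigned long checked_image_size(int width, int height,
                                        int bytes_per_pixel)
{
  unsigned long long size;

  if (width < 1 || height < 1 || bytes_per_pixel < 1)
    return (unsigned long)-1;

  /* The product of two positive 32-bit ints always fits in 64 bits. */
  size = (unsigned long long)width * (unsigned long long)height;

  /* Check before multiplying, so that the multiplication cannot wrap. */
  if (size > (unsigned long long)ULONG_MAX /
             (unsigned long long)bytes_per_pixel)
    return (unsigned long)-1;

  return (unsigned long)(size * (unsigned long long)bytes_per_pixel);
}

/* Example: assert(checked_image_size(100, 100, 3) == 30000UL); */
```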
This commit modifies h1v2_fancy_upsample() so that it uses an ordered
dither pattern, similar to that of h2v1_fancy_upsample(), rounding up or
down the result for alternate pixels rather than always rounding down.
This ensures that the decompression error pattern for a 4:4:0 JPEG image
will be similar to the rotated decompression error pattern for a 4:2:2
JPEG image. Thus, the final result will be similar regardless of
whether a 4:2:2 JPEG image is rotated or transposed before or after
decompression.
Closes #356
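The effect of alternating the rounding direction can be sketched as follows. This is an illustration under assumptions, not the code from jdsample.c. The helper name blend_3_1() is hypothetical, as are the 3:1 weighting and the choice of which outputs receive which bias. The text above does not state either.

```c
#include <assert.h>

/* Blend two samples with weights 3/4 (nearer) and 1/4 (farther).  The rounding
 * bias is 1 for even outputs and 2 for odd outputs, so exact ties (x.5) are
 * rounded down and up alternately rather than always in the same direction.
 * Non-tie values round to the nearest integer with either bias. */
static unsigned char blend_3_1(unsigned char nearer, unsigned char farther,
                               int output_index)
{
  int bias = (output_index & 1) ? 2 : 1;

  return (unsigned char)((3 * nearer + farther + bias) >> 2);
}

/* Example: 3 * 0 + 2 = 2, i.e. 0.5 after dividing by 4 (a tie).
 *   assert(blend_3_1(0, 2, 0) == 0);   rounded down
 *   assert(blend_3_1(0, 2, 1) == 1);   rounded up
 */
```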
xcode7.3 is based on El Capitan, which is EOL, and Homebrew no longer
provides El Cap bottles (pre-compiled binaries.) Thus, Homebrew was
trying to build GCC 5, YASM, and the other packages we need from source,
which caused the Mac CI builds to time out. I tried goading Homebrew
into installing GCC 5.5.0_2 and YASM 1.3.0_1, which still have El Cap
bottles available, by using the URLs of those specific versions of the
formulae (from the Homebrew GitHub repository) as package names. This
failed, however, because 'brew bundle' converted the URLs to all
lowercase. I then tried explicitly installing the old formulae by using
'brew install' with the aforementioned URLs (bypassing the Travis
Homebrew addon), but Homebrew still tried to build all of the
dependencies from source.
Upgrading to Xcode 8.3.x necessitated regression testing the performance
on iOS, but that proved less of a pain than figuring out how to install
all of the old Homebrew bottles we needed. Also, this future-proofs the
CI builds against the inevitable discontinuation of the xcode7.3 image.
Note that Xcode 8.3.x improves iOS 64-bit decompression performance
significantly relative to Xcode 7.2.x or 7.3.x. iOS 32-bit performance
unfortunately regresses by as much as 5%, but it can't be helped (32-bit
iOS apps are no longer supported on iOS 11+ anyhow, and the next major
release of libjpeg-turbo will remove support for them as well.)
... including, but not limited to:
- unused macros
- private functions not marked as static
- unprototyped global functions
- variable shadowing
(detected by various non-default GCC 8 warning options)
According to Intel's manual [1], "If a value entered for CPUID.EAX is
higher than the maximum input value for basic or extended function for
that processor then the data for the highest basic information leaf is
returned."
Right now, libjpeg-turbo doesn't first check that leaf 07H is supported
before attempting to use it, so the ostensible AVX2 bit (Bit 05) of the
CPUID result might actually be Bit 05 from a lower leaf. That bit might
be set, even if the CPU doesn't support AVX2.
This commit modifies the x86 and x86-64 SIMD feature detection code so
that it first checks whether CPUID leaf 07H is supported before
attempting to use it to check for AVX2 instruction support.
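The check described above can be sketched in C as follows. This is an illustration, not the libjpeg-turbo implementation (which is written in assembly). It assumes a GCC- or Clang-compatible toolchain that provides <cpuid.h>, and the function name cpuid_reports_avx2() is hypothetical. It examines only the CPUID bit. O/S support for the YMM registers must be verified separately.

```c
#include <assert.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>  /* GCC/Clang helper header (assumed toolchain) */
#endif

/* Returns 1 if CPUID leaf 07H exists and reports AVX2 (EBX bit 5), else 0.
 * On non-x86 targets, this always returns 0. */
static int cpuid_reports_avx2(void)
{
#if defined(__i386__) || defined(__x86_64__)
  unsigned int eax, ebx, ecx, edx;

  /* Leaf 0 returns the highest supported basic leaf in EAX. */
  if (__get_cpuid_max(0, 0) < 7)
    return 0;  /* leaf 07H is not implemented, so do not query it */

  __cpuid_count(7, 0, eax, ebx, ecx, edx);
  (void)eax;  (void)ecx;  (void)edx;
  return (int)((ebx >> 5) & 1);
#else
  return 0;
#endif
}

/* Example: int r = cpuid_reports_avx2();  assert(r == 0 || r == 1); */
```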
DRC:
This commit should fix
https://bugzilla.mozilla.org/show_bug.cgi?id=1520760
However, I have not personally been able to reproduce that issue,
despite using a Nehalem (pre-AVX2) CPU on which the maximum CPUID leaf
has been limited via a BIOS setting.
Closes #348
[1]
"Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 2 (2A, 2B, 2C & 2D): Instruction Set Reference, A-Z", https://software.intel.com/sites/default/files/managed/a4/60/325383-sdm-vol-2abcd.pdf, page 3-192.
Same as d3a3a73f64 but in the fast decode
path. It was necessary to use a different-sized test image in order to
trigger the error in this location.
Refer to #347
5ea77d8b77 was insufficient to fix all of
these. In particular, we need to always release the primitive arrays
before throwing an exception, because throwing an exception qualifies as
"using JNI."
Refer to #300
CONTRIBUTING.md: Correct misuse of "as such" (Grammar Police)
bug-report.md: Clarify that the submitter should always test against the
latest stable code base.
Some pathological test images have been created that can cause s to
overflow or underflow the signed int data type during decompression.
This is technically undefined behavior according to the C spec, although
every modern implementation I'm aware of will treat the signed int as a
2's complement unsigned int, thus causing the value to wrap around to
INT_MIN if it exceeds INT_MAX. This commit simply makes that behavior
explicit in order to shut up UBSan. At least when building for x86-64
or i386 using Clang or GCC, this commit does not change the
compiler-generated assembly code at all.
The code that triggered this error has existed in the libjpeg code base
for at least 20 years (and probably much longer), so the fact that it
hasn't produced a user-visible problem in all of that time strongly
suggests that UBSan is being overly pedantic here. But if someone can
cough up a platform that doesn't wrap around to INT_MIN when 1 is added
to INT_MAX, then I'll happily change my opinion.
Fixes #347
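One common way to make wraparound explicit is to perform the arithmetic on unsigned operands, as in the sketch below. This is an illustration of the technique, not necessarily the exact change made in this commit, and the function name wrapping_add() is hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Adds two ints with wraparound semantics.  The addition is performed on
 * unsigned operands, for which overflow is well-defined (modulo 2^N).
 * Converting an out-of-range result back to int is implementation-defined
 * rather than undefined.  GCC and Clang define it as 2's complement
 * wraparound. */
static int wrapping_add(int a, int b)
{
  return (int)((unsigned int)a + (unsigned int)b);
}

/* Example (GCC or Clang): assert(wrapping_add(INT_MAX, 1) == INT_MIN); */
```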
... since www.nasm.us seems to be down frequently. This doesn't help us
at the moment, but hopefully once the site is back up this will prevent
future build failures.
When simd/arm/jsimd.c is compiled with __ARM_NEON__ defined (which will
be the case if -mfpu=neon is passed to the compiler), the
parse_proc_cpuinfo() and check_feature() functions and the bufsize
variable are unused and thus need to be #ifdef'ed out in order to avoid
compiler warnings. Note that the bufsize variable was already #ifdef'ed
out on Linux but not on Android due to lack of parentheses (&& takes
precedence over ||.)
Closes #331
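The precedence pitfall mentioned above can be reproduced with the preprocessor sketch below. The macros IS_LINUX, IS_ANDROID, and HAVE_NEON are hypothetical stand-ins, and the conditions are not the ones used in simd/arm/jsimd.c. They only demonstrate how the lack of parentheses changes the result.

```c
#include <assert.h>

/* Hypothetical configuration: an Android build with Neon enabled at compile
 * time. */
#define IS_LINUX  0
#define IS_ANDROID  1
#define HAVE_NEON  1

/* Intended meaning: "needed when Neon is not known at compile time, on Linux
 * or Android."  Because && binds more tightly than ||, this is parsed as
 * (!HAVE_NEON && IS_LINUX) || IS_ANDROID. */
#if !HAVE_NEON && IS_LINUX || IS_ANDROID
#define NEEDED_WITHOUT_PARENS  1
#else
#define NEEDED_WITHOUT_PARENS  0
#endif

/* With explicit parentheses, the condition has the intended meaning. */
#if !HAVE_NEON && (IS_LINUX || IS_ANDROID)
#define NEEDED_WITH_PARENS  1
#else
#define NEEDED_WITH_PARENS  0
#endif

/* Example: assert(NEEDED_WITHOUT_PARENS == 1 && NEEDED_WITH_PARENS == 0); */
```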
%{_libdir}/pkgconfig is a directory and should thus be prefixed by
%{dir} (oops.) This issue caused the debuginfo build under RHEL 8
(which is apparently now enabled by default-- regardless of whether the
RPM actually contains debug info, but that's another matter) to fail
with:
RPM build errors:
File listed twice: /opt/libjpeg-turbo/lib64/pkgconfig/libjpeg.pc
File listed twice: /opt/libjpeg-turbo/lib64/pkgconfig/libturbojpeg.pc
On RHEL 7 and later (not sure exactly whether this is a product of the
newer RPM release or something distro-specific), macros are lazily
expanded, so we need to set _docdir using %global (which expands at
definition time) and prior to _prefix and _datarootdir (which affect
_defaultdocdir.) Otherwise, _docdir is set to a subdirectory of
/opt/libjpeg-turbo/share/doc or /opt/libjpeg-turbo/doc. The former
(which happens on RHEL 7) leads to incorrect documentation packaging
(the docs should be packaged under /usr/share/doc per Red Hat
standards), and the latter (which happens on RHEL 8) leads to an RPM
build error.
AFAICT, Requires and BuildRequires subsumed the functionality of Prereq
and BuildPrereq in RPM 4.0, and none of the platforms we support with
libjpeg-turbo 2.0.x has RPM < 4.4.
We have more than eight registers to work with, as well as three-operand
intrinsics, so there's no need for the implementation to be such a
literal port of the MMX code.
Unfortunately, this hack is necessary because:
- install(TARGETS, ...) doesn't support the RENAME option.
- We can't modify OUTPUT_NAME for the "-static" targets without breaking
the regression tests.
- ${CMAKE_CFG_INTDIR} doesn't seem to work properly in an install()
command.
Refer to #307
libjpeg-turbo has never really supported such compilers, since (AFAIK)
they are non-existent on any modern computing platform and thus
impossible for us to test. (Also, the TurboJPEG API would break without
unsigned chars.)
Furthermore, the unified CMake-based build system introduced in 2.0
always defines HAVE_UNSIGNED_CHAR, so retaining other code paths is
pointless. Eliminating support for compilers without unsigned char
eliminates the need for the GETJSAMPLE() macro, which improves the
readability of many parts of the code as well as improving the
performance of writing Targa and Windows BMP files.
Fixes #317
Including the license templates was confusing to some, since it made
it appear as if the copyright year and author were unspecified for the
libjpeg-turbo source. Thus, rather than include the zlib License
template, link to that template on opensource.org. For the Modified BSD
License, include a roll-up of copyright years and authors, since the
terms of that license require the text of it to be included in product
documentation for binary distributions without accompanying source code.
Use the android.toolchain.cmake toolchain file in the NDK (v13b or
later), since this toolchain file generally takes care of setting the
appropriate compiler flags and dealing with the differences between
GCC and Clang. Our custom Android build procedure did not work with
Clang-based NDK toolchains, which meant that it could not be made to
work with NDK v18b or later.
Fixes #309
Normally, 4:4:4 JPEGs have horizontal x vertical luminance & chrominance
sampling factors of 1x1. However, it is technically legal to create
4:4:4 JPEGs with sampling factors of 2x1, 1x2, 3x1, or 1x3, since the
sums of the products of those sampling factors are still <= 10. The
libjpeg API correctly decodes such images, so the TurboJPEG API should
as well.
Fixes #323
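The legality rule described above can be sketched as follows. This is an illustration, not the TurboJPEG implementation. The samp_factor type and the is_legal_444() function are hypothetical. The sketch treats an image as 4:4:4 if all components share the same sampling factors, and it applies the limit of 10 blocks per MCU.

```c
#include <assert.h>

typedef struct {
  int h, v;  /* horizontal and vertical sampling factors */
} samp_factor;  /* hypothetical type */

/* Returns 1 if every component has the same sampling factors (so no component
 * is subsampled relative to another) and the sum of h * v over all components
 * does not exceed 10.  Otherwise returns 0. */
static int is_legal_444(const samp_factor *comp, int ncomp)
{
  int i, blocks = 0;

  assert(ncomp >= 1);
  for (i = 0; i < ncomp; i++) {
    if (comp[i].h != comp[0].h || comp[i].v != comp[0].v)
      return 0;
    blocks += comp[i].h * comp[i].v;
  }
  return blocks <= 10;
}
```

For three components, sampling factors of 3x1 give a sum of 9, which is legal, whereas 2x2 gives a sum of 12, which is not.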
If cinfo->quantize_colors == 1, then jpeg_calc_output_dimensions() will
set cinfo->output_components to 1, and if cinfo->out_color_space is not
RGB (or extended RGB), hilarity will ensue.
Fixes #305
Caused by 950580eb0c. Since the code that
sets CMAKE_INSTALL_RPATH now depends on ENABLE_SHARED, that code needed
to be moved to after the point at which ENABLE_SHARED is defined.
I give up on the public keyserver. It inexplicably just fails
sometimes. I was trying to use it out of an abundance of caution
(<cough> paranoia <cough>), but it seems like most open source projects
just serve up their public keys from their project web sites. The
private and public pre-release keys are still stored on separate sites,
the private key is still strongly encrypted by Travis, and we use a
separate key for pre-releases anyhow, so even if it's compromised, we
can quickly and easily deploy a new one.
... when downloading the RPM signing key. Apparently the key server
URL sometimes redirects to an https URL, which may explain why fetching
the RPM signing keys failed frequently when we used to run wget inside
of the CentOS 5 Docker container.
The build will consistently fail for days at a time with:
error: http://pool.sks-keyservers.net/pks/lookup?op=get&search=0x0575F26BD5B3FDB1: import read failed(-1).
I have a hunch that this is related to the CentOS 5 Docker container, so
this commit causes Travis to download the RPM signing key outside of
the container and share it with the container.
We shouldn't be making JNI calls between GetPrimitiveArrayCritical() and
ReleasePrimitiveArrayCritical(). Apparently Android is stricter about
this than desktop Java.
Issue was introduced in 0713c1bb54.
Fixes #300
Apparently Windows 7 without SP1 has O/S support for XSAVE but not for
YMM registers, and this exposed a bug in our usage of xgetbv. The test
instruction will set ZF only if none of the bits match between the two
operands, so in effect, we were enabling AVX2 instructions if the O/S
supported XSAVE and the CPU supported AVX2 but the O/S only supported
XMM registers. This bug was not exposed on, for instance, Windows XP or
RHEL 5 because those O/S's do not support XSAVE.
Fixes #288
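The difference between the buggy and corrected checks can be expressed in C as follows. This is a sketch of the logic only. The actual code is x86 assembly that operates on the result of xgetbv, and the macro and function names below are hypothetical. XCR0 bit 1 indicates O/S support for XMM state, and bit 2 indicates O/S support for YMM state.

```c
#include <assert.h>

#define XCR0_SSE  (1u << 1)  /* O/S saves and restores XMM registers */
#define XCR0_AVX  (1u << 2)  /* O/S saves and restores YMM registers */

/* Buggy check: true if EITHER bit is set.  This mirrors a "test" instruction
 * followed by a branch on ZF, since ZF is set only if no bits match. */
static int os_ymm_any_bit(unsigned int xcr0)
{
  return (xcr0 & (XCR0_SSE | XCR0_AVX)) != 0;
}

/* Corrected check: true only if BOTH bits are set. */
static int os_ymm_all_bits(unsigned int xcr0)
{
  return (xcr0 & (XCR0_SSE | XCR0_AVX)) == (XCR0_SSE | XCR0_AVX);
}

/* Example: an O/S that supports only x87 and XMM state reports XCR0 = 0x3.
 *   assert(os_ymm_any_bit(0x3) == 1);   wrongly enables AVX2
 *   assert(os_ymm_all_bits(0x3) == 0);  correctly rejects it
 */
```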
- CMake 3.10.x or later must be used with JDK 11, or an error
("regex not supported") will occur when CMake tries to parse the Java
version number.
- The JDK is no longer available at java.com.
The DSPr2 extensions have been verified to work with little endian MIPS.
Whether or not CMAKE_SYSTEM_PROCESSOR is set to "mips" or "mipsel" in a
little endian MIPS environment seems to be inconsistent, but our build
system needs to handle both cases.
(for instance, when passing -msoft-float to the compiler)
The instructions used by jsimd_quantize_float_dspr2() and
jsimd_convsamp_float_dspr2() don't work with the soft float ABI, so
disable those functions when soft float is enabled.
Based on:
129a739bfa
Closes #272
When CLI arguments request an image-changing operation (such as a transform, scans, or arithmetic coding), force the result to be written to the output file, even if it is larger than the original.
@rpath is only supported with 10.5 and later deployment targets.
libjpeg-turbo hasn't supported 10.4 "Tiger" since prior to 1.4, but I
still sometimes use the 10.4 SDK to test PowerPC code in a Snow Leopard
VM.
- When referring to specific clauses, annexes, tables, and figures, a
"timed reference" (a reference that includes the year) must be used in
order to avoid confusion.
- "CCITT" = "ITU-T"
- Replace ambiguous "JPEG spec" with the specific document number.
... in which one or more of the color indices is out of range for the
number of palette entries.
Fix partly borrowed from jpeg-9c. This commit also adopts Guido's
JERR_PPM_OUTOFRANGE enum value in lieu of our project-specific
JERR_PPM_TOOLARGE enum value.
Fixes #258
Normally the value of CMAKE_EXECUTABLE_SUFFIX is clobbered by project().
This allows for specifying an executable suffix of .html with Emscripten
builds, which causes Emscripten to build standalone HTML versions of the
libjpeg-turbo test programs.
The sentence:
"Indeed, one of the original reasons for developing this free software
was to help force convergence on common, interoperable format standards
for JPEG files."
might be seen to imply that JPEG 2000 and JPEG XR are not interoperable
with themselves, although it is certainly the case that those formats
are not interoperable with each other, nor with
ITU T.81 | ISO/IEC 10918. They are also certainly not as common as
ITU T.81 | ISO/IEC 10918, and (as an example) popular web browsers will
not display JPEG 2000 files.
The sentence in question was originally referring to proprietary,
non-standard formats and was meant to provide historical context.
libjpeg was originally released prior to the adoption of JFIF as an
official standard, so it encouraged adoption of JFIF as a de facto
standard by providing, under a business-friendly free software license,
a library for reading and writing images in that format.
This is basically the same test that was performed in acinclude.m4 in
the old autotools-based build system. It was not ported to the
CMake-based build system because I previously had no way of testing
a non-DSPr2 build environment.
Fixes #248
... caused by using certain specific combinations of
jpeg_skip_scanlines() and jpeg_read_scanlines() calls with progressive,
vertically-subsampled JPEG images.
Fixes #237
In rdbmp.c, it is necessary to guard against 32-bit overflow/wraparound
when allocating the row buffer. Because BMP files have 32-bit width and
height fields, the value of biWidth can be up to 4294967295.
Specifically, if biWidth is 1073741824 and cinfo->input_components = 4,
then the samplesperrow argument in alloc_sarray() would wrap around to
0, and a division by zero error would occur at line 458 in jmemmgr.c.
If biWidth is set to a higher value, then samplesperrow would wrap
around to a small number, which would likely cause a buffer overflow
(this has not been tested or verified.)
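The wraparound described above, and one way to guard against it, can be sketched as follows. This is an illustration only, not the code from rdbmp.c. The function name samples_per_row() is hypothetical, and the sketch assumes that the row size is computed in 32-bit unsigned arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Computes width * components without allowing 32-bit wraparound.  Returns 1
 * and stores the product in *result on success.  Returns 0 and leaves *result
 * untouched if the product would not fit in 32 bits. */
static int samples_per_row(uint32_t width, uint32_t components,
                           uint32_t *result)
{
  if (components == 0 || width > UINT32_MAX / components)
    return 0;
  *result = width * components;
  return 1;
}

/* Example: 1073741824 * 4 = 2^32, which wraps to 0 in 32-bit arithmetic, so
 * the guard rejects it.
 *   uint32_t r;
 *   assert(samples_per_row(1073741824u, 4, &r) == 0);
 */
```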
... in tjLoadImage()/tjSaveImage(). These error codes require an add-on
message table, and if it isn't initialized, then format_message()
produces "Bogus message code XXXX" instead.
This change is proposed to enable cases where this library and its CMakeLists.txt are included in other projects/CMakeLists.txt files as a dependency via the add_subdirectory method, for example: add_subdirectory(mozjpeg). There are several reasons and workflows which "wrap" third party projects/builds using this method, including many enterprise build/devops pipelines.
This change will have no effect on users building mozjpeg by itself as usual; it simply enables the wrapping use cases.
With this parameter, the JFIF APP0 marker segment is not written, which reduces the file size by 18 bytes. This marker is mandatory, but no known programs produce an error if it is missing. Safe for web use.
(detected by enabling additional checkstyle modules)
This commit also removes unnecessary uses of the "private" modifier in
the Java tests/examples. The default access modifier disallows access
outside of the package, and none of these classes is in a package. The
only reason we use "private" with member variables in these classes is
to make checkstyle happy, because we want it to enforce that behavior in
the TurboJPEG API code.
... and modify tjbench.c to match the variable name changes made to
TJBench.java
("checkstyle" = http://checkstyle.sourceforge.net, not our regex-based
checkstyle script)
Arguably it doesn't make much sense for non-chroma components to be
subsampled (which is why this type of image was overlooked in
cd7c3e6672cce3779450c6dd10d0d70b0c2278b2-- I didn't realize it was a
thing), but certain Adobe applications apparently generate these images.
Fixes #236
The old Un*x (autotools-based) build system always auto-generated this
file, but that behavior was more or less a relic of the days before the
libjpeg-turbo colorspace extensions were implemented. The thinking was
that, if a particular developer wanted to change RGB_RED, RGB_GREEN,
RGB_BLUE, or RGB_PIXELSIZE in order to compress from/decompress to
different RGB pixel layouts, then the SIMD extensions should
automatically respond to those changes whenever they were made to
jmorecfg.h. The modern reality is that changing RGB_* is no longer
necessary because of the libjpeg-turbo colorspace extensions, and
changing any of the other constants in jsimdcfg.inc can't be done
without making deeper modifications to the SIMD extensions. In general,
we treat RGB_* as a de facto, immutable part of the legacy libjpeg API.
Realistically, since the values of those constants have been the same in
every Un*x distribution released in the past 20-30 years, any software
that uses a system-supplied build of libjpeg must assume that those
constants will have default values.
Furthermore, even if it made sense to auto-generate jsimdcfg.inc, it was
never possible to do so on Windows, so it was always going to be
necessary to manually generate the Windows version of the file whenever
any of the constants changed. This commit introduces a new custom CMake
target called "jsimdcfg" that can be used, on Un*x platforms, to
generate jsimdcfg.inc on demand, although this should only be necessary
when introducing new x86 SIMD instructions or making other deep
modifications, such as SIMD acceleration for 12-bit JPEGs.
For those who may be wondering why we don't do the same thing for
win/jconfig.h.in, it's because performing all of the necessary CMake
checks to populate that file is very slow on Windows.
These were necessary for the first iteration of the feature (see #46),
which provided a different C front end for the SIMD version of the
function. The final version of the feature uses a common C front end
for both SIMD and non-SIMD implementations, so these checks are no
longer necessary.
Closes #231
This commit merges the following paragraph from the latest libjpeg
release:
https://github.com/libjpeg-turbo/ijg/blob/jpeg-9c/README#L222-L229
which takes into account the fact that JFIF is now an official ISO/ITU-T
standard. I also included the ISO/IEC document number for the JFIF spec
(jpeg-9c included only the ITU-T rec number.)
This commit also heavily wordsmiths the "FILE FORMAT WARS" section.
In jpeg-7 and later, this section has become somewhat impolitic,
referring to JPEG 2000 and JPEG XR as "faulty technologies" and
"momentary mistakes." The original intent of this section, which was
introduced in jpeg-5 and refined in jpeg-6
(https://github.com/libjpeg-turbo/ijg/blob/jpeg-5/README#L317-L338,
https://github.com/libjpeg-turbo/ijg/blob/jpeg-6b/README#L335-L367)
was to highlight the problem of JPEG file format divergence that existed
in the 1990s prior to the adoption of JFIF as an official ISO/ITU-T
standard. That divergence is fortunately no longer a problem, thanks in
part to the existence of libjpeg. I have attempted to preserve Tom's
intent of using this section to describe which file formats the code is
compatible with and why it isn't compatible with some file formats
bearing the name "JPEG." Such modifications always put our project in a
very awkward position, because we are not the IJG and do not claim to
be, but it is still necessary for us to modify the IJG README file from
time to time to eliminate obsolete information while attempting to
remain as neutral as possible.
Instructing the compiler to treat warnings as errors caused some of the
compiler tests to fail, because the test code was not 100% clean.
Note that we now use check_symbol_exists() to check for memset() and
memcpy(), since the test code for check_function_exists() produces a
compiler warning due to not including <string.h>.
... instead of the RSA code, the license for which contains an
advertising clause. It is strongly believed that the RSA advertising
clause is innocuous, because:
- A clarification from RSA
(http://www.ietf.org/ietf-ftp/IPR/RSA-MD-all), published in 2000,
stated:
"Implementations of these message-digest algorithms, including
implementations derived from the reference C code in RFC-1319,
RFC-1320, and RFC-1321, may be made, used, and sold without license
from RSA for any purpose."
Referring to the opinion from Fedora's legal team
(https://fedoraproject.org/wiki/Licensing:FAQ?rd=Licensing/FAQ#What_about_the_RSA_license_on_their_MD5_implementation.3F_Isn.27t_that_GPL-incompatible.3F),
this means that md5.c and md5.h, which were derived from the original
RFC 1321 reference code (http://www.faqs.org/rfcs/rfc1321.html), can
be used without the RSA license.
- In the context of libjpeg-turbo, RSA's MD5 code was used only in the
build/test system. It was not part of the libjpeg-turbo binary
distribution, and thus the only "material mentioning or referencing"
the MD5 code was the libjpeg-turbo source code, which-- by virtue of
including RSA's original copyright headers-- properly attributed the
code as required under the RSA license.
However, in light of the open source community's tendency to have
knee-jerk reactions to stuff like this, it would've been necessary to
include the above explanation in our source tree in order to head off
potential FUD, and a simple fix is always better than a complex
explanation.
This commit also assigns the 3-clause BSD license to my modifications of
the MD5 code. This license is the same one used by md5cmp and other
parts of the build system.
When attempting to configure an iOS/ARM build with Xcode 7.2 and CMake
2.8.12, I got the following errors:
CMake Error at CMakeLists.txt:560 (add_library):
Attempting to use MACOSX_RPATH without CMAKE_SHARED_LIBRARY_RUNTIME_C_FLAG
being set. This could be because you are using a Mac OS X version less
than 10.5 or because CMake's platform configuration is corrupt.
(x 3)
CMake Error at sharedlib/CMakeLists.txt:38 (add_library):
Attempting to use MACOSX_RPATH without CMAKE_SHARED_LIBRARY_RUNTIME_C_FLAG
being set. This could be because you are using a Mac OS X version less
than 10.5 or because CMake's platform configuration is corrupt.
(x 3)
Upgrading to CMake 3.x (tried 3.0 and 3.1) got rid of the errors, but
the resulting shared libs still did not use @rpath as expected. Note
also that CMake 3.x (at least the two versions I tested) does not
automatically set the MACOSX_RPATH property as claimed. I could find
nothing in the release notes for later CMake releases to indicate that
either problem has been fixed. What I did find was this little nugget
of code in the Darwin platform module:
f6b93fbf3a/Modules/Platform/Darwin.cmake (L33-L36)
This sets CMAKE_SHARED_LIBRARY_RUNTIME_C_FLAG="-Wl,-rpath," only if you
are running OS X 10.5 or later. It makes no such check for iOS, perhaps
because shared libraries aren't much of a thing with iOS apps. In any
event, this commit simply sets CMAKE_SHARED_LIBRARY_RUNTIME_C_FLAG if it
isn't set already, and that fixes all of the aforementioned problems.
- Travis doesn't set the $encrypted_* variables for PRs, so disable GPG
signing when building a PR (artifacts aren't deployed for PRs anyhow,
and even if they were, I wouldn't want them to be signed, as they may
contain unvetted code.)
- Take advantage of the new -d option in buildljt, which allows for
building from an existing Git clone directory. This eliminates the need
to rename and restore .git/shallow, allows the official build scripts to
work properly when building PRs, and prevents 'git clone' being invoked
twice in CI builds.
Refer to #217
These files are potentially useful to MinGW users, since MSYS2 MinGW
environments have a man command by default and provide an easy way to
install pkg-config.
Closes #223
This commit adds C and SSE2 optimizations for the encode_mcu_AC_first()
function used in progressive Huffman encoding.
The image used for testing can be retrieved from this page:
https://blog.cloudflare.com/doubling-the-speed-of-jpegtran
All timings done on `Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz`
clang version is `Apple LLVM version 9.0.0 (clang-900.0.39.2)`
gcc-5 version is `gcc-5 (Homebrew GCC 5.5.0) 5.5.0`
gcc-7 version is `gcc-7 (Homebrew GCC 7.2.0) 7.2.0`
Here are the results in comparison to libjpeg-turbo@293263c using
`time ./jpegtran -outfile /dev/null -progressive -optimise -copy none print_poster_0025.jpg`
C
clang x86_64: +19%
gcc-5 x86_64: +80%
gcc-7 x86_64: +57%
clang i386: +5%
gcc-5 i386: +59%
gcc-7 i386: +51%
SSE2
clang x86_64: +79%
gcc-5 x86_64: +158%
gcc-7 x86_64: +122%
clang i386: +71%
gcc-5 i386: +134%
gcc-7 i386: +135%
Discussion in libjpeg-turbo/libjpeg-turbo#46
This commit adds C and SSE2 optimizations for the encode_mcu_AC_refine()
function used in progressive Huffman encoding.
The image used for testing can be retrieved from this page:
https://blog.cloudflare.com/doubling-the-speed-of-jpegtran
All timings done on `Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz`
clang version is `Apple LLVM version 9.0.0 (clang-900.0.39.2)`
gcc-5 version is `gcc-5 (Homebrew GCC 5.5.0) 5.5.0`
gcc-7 version is `gcc-7 (Homebrew GCC 7.2.0) 7.2.0`
Here are the results in comparison to libjpeg-turbo@3c54642 using
`time ./jpegtran -outfile /dev/null -progressive -optimise -copy none print_poster_0025.jpg`
C
clang x86_64: +7%
gcc-5 x86_64: +30%
gcc-7 x86_64: +33%
clang i386: +0%
gcc-5 i386: +24%
gcc-7 i386: +23%
SSE2
clang x86_64: +42%
gcc-5 x86_64: +53%
gcc-7 x86_64: +64%
clang i386: +35%
gcc-5 i386: +46%
gcc-7 i386: +49%
Discussion in libjpeg-turbo/libjpeg-turbo#46
Within the libjpeg API code, it seems to be more the convention than not
to separate the macro name and value by two or more spaces, which
improves general readability. Making this consistent across all of
libjpeg-turbo is less about my individual preferences and more about
making it easy to automatically detect variations from our chosen
formatting convention. I intend to release the script I'm using to
validate this stuff, once it matures and stabilizes a bit.
* Modify the SIMD dispatchers so they guard their usage of getenv() with
the existing NO_GETENV preprocessor definition.
* Introduce a new NO_PUTENV preprocessor definition to guard the
usage of putenv() in the TurboJPEG API library.
This at least puts Windows Store compatibility within the realm of
possibility, although further steps are required.
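The guard pattern described above can be sketched roughly as follows. This is only an illustration: the helper name env_flag_enabled() and the convention of treating "1" as "enabled" are assumptions made for the example, not code from the SIMD dispatchers.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of libjpeg-turbo): returns 1 if the named
   environment variable is set to "1" and 0 otherwise.  When NO_GETENV is
   defined (for instance, by adding -DNO_GETENV to the C flags), the getenv()
   call is compiled out, and the function always returns 0. */
static int env_flag_enabled(const char *name)
{
#ifndef NO_GETENV
  const char *value = getenv(name);

  if (value != NULL && !strcmp(value, "1"))
    return 1;
#else
  (void)name;  /* avoid an unused-parameter warning */
#endif
  return 0;
}
```

Keeping the guard inside one small helper means that callers do not need their own #ifndef blocks.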
Broken by previous commit. Although turbojpeg.c no longer needs
tjutil.h on Un*x, it still needs to include that file on Windows in
order to use snprintf() and strcasecmp() (which, on Windows, are macros
that wrap _snprintf_s() and stricmp().)
With rare exceptions ...
- Always separate line continuation characters by one space from
preceding code.
- Always use two-space indentation. Never use tabs.
- Always use K&R-style conditional blocks.
- Always surround operators with spaces, except in raw assembly code.
- Always put a space after, but not before, a comma.
- Never put a space between type casts and variables/function calls.
- Never put a space between the function name and the argument list in
function declarations and prototypes.
- Always surround braces ('{' and '}') with spaces.
- Always surround statements (if, for, else, catch, while, do, switch)
with spaces.
- Always attach pointer symbols ('*' and '**') to the variable or
function name.
- Always precede pointer symbols ('*' and '**') by a space in type
casts.
- Use the MIN() macro from jpegint.h within the libjpeg and TurboJPEG
API libraries (using min() from tjutil.h is still necessary for
TJBench.)
- Where it makes sense (particularly in the TurboJPEG code), put a blank
line after variable declaration blocks.
- Always separate statements in one-liners by two spaces.
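As a rough illustration of how several of these rules look together, here is a small made-up function. The names EX_CLAMP and ex_sum_biased are hypothetical and exist only for this example; the code is not taken from the source tree.

```c
#include <stddef.h>

/* Line continuation characters sit one space after the preceding code. */
#define EX_CLAMP(v, lo, hi) \
  ((v) < (lo) ? (lo) : ((v) > (hi) ? (hi) : (v)))

/* Hypothetical function: adds bias to each byte in buf, clamps the result to
   [0, 255], and returns the sum.  *clipped counts the clamped samples. */
static int ex_sum_biased(const void *buf, size_t len, int bias, int *clipped)
{
  /* Pointer symbol attached to the variable name; preceded by a space in the
     type cast; no space between the cast and the variable */
  const unsigned char *ptr = (const unsigned char *)buf;
  int sum = 0;
  size_t i;

  /* Blank line after the declaration block; two-space indentation */
  *clipped = 0;
  for (i = 0; i < len; i++) {
    int v = (int)ptr[i] + bias;

    /* Braces surrounded by spaces; one-liner statements separated by two
       spaces */
    if (v < 0 || v > 255) { v = EX_CLAMP(v, 0, 255);  (*clipped)++; }
    sum += v;
  }
  return sum;
}
```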
The purpose of this was to ease maintenance on my part and also to make
it easier for contributors to figure out how to format patch
submissions. This was admittedly confusing (even to me sometimes) when
we had 3 or 4 different style conventions in the same source tree. The
new convention is more consistent with the formatting of other OSS code
bases.
This commit corrects deviations from the chosen formatting style in the
libjpeg API code and reformats the TurboJPEG API code such that it
conforms to the same standard.
NOTES:
- Although it is no longer necessary for the function name in function
declarations to begin in Column 1 (this was historically necessary
because of the ansi2knr utility, which allowed libjpeg to be built
with non-ANSI compilers), we retain that formatting for the libjpeg
code because it improves readability when using libjpeg's function
attribute macros (GLOBAL(), etc.)
- This reformatting project was accomplished with the help of AStyle and
Uncrustify, although neither was completely up to the task, and thus
a great deal of manual tweaking was required. Note to developers of
code formatting utilities: the libjpeg-turbo code base is an
excellent test bed, because AFAICT, it breaks every single one of the
utilities that are currently available.
- The legacy (MMX, SSE, 3DNow!) assembly code for i386 has been
formatted to match the SSE2 code (refer to
ff5685d5344273df321eb63a005eaae19d2496e3.) I hadn't intended to
bother with this, but the Loongson MMI implementation demonstrated
that there is still academic value to the MMX implementation, as an
algorithmic model for other 64-bit vector implementations. Thus, it
is desirable to improve its readability in the same manner as that of
the SSE2 implementation.
+ "JSIMD_ARM_NEON" = "JSIMD_NEON"
+ "JSIMD_MIPS_DSPR2" = "JSIMD_DSPR2"
+ "*_mips_dspr2" = "*_dspr2"
It's obvious that "NEON" refers to Arm and "DSPr2" refers to MIPS, and
this naming convention is consistent with the other SIMD extensions.
Newer versions of CMake (known to be the case with 3.7.x and 3.10.x)
fail to add a space between CMAKE_C_FLAGS and CMAKE_ASM_FLAGS, which
causes the build to fail when using the official build procedure.
Closes #216
Referring to https://docs.microsoft.com/en-US/cpp/build/stack-usage:
"All memory beyond the current address of RSP is considered volatile:
The OS, or a debugger, may overwrite this memory during a user debug
session, or an interrupt handler. Thus, RSP must always be set before
attempting to read or write values to a stack frame."
Basically, if-- under extremely rare circumstances-- a context swap were
to occur between saving the values of xmm8-xmm11 and setting the new
value of rsp, the O/S might not preserve that area of the stack. In
general, libjpeg-turbo should not be using xmm8-xmm11 before or after
the call to jsimd_huff_encode_one_block_sse2(), so this is probably a
non-issue, but it's still a good idea to fix it.
Based on
ff7d2030dd
NDK r16b moved some things around, so modify the Android build recipes
to take that into account while preserving compatibility with previous
NDK releases.
NOTE: the GCC 4.9 NDK toolchain is deprecated, so we will need to
develop new Android build recipes for libjpeg-turbo 1.6 that use the
Clang toolchain.
Closes #196
Loading RGB image files into a grayscale buffer isn't a particularly
useful feature, given that libjpeg-turbo can perform this conversion
much more optimally (with SIMD acceleration on some platforms) during
the compression process. Also, the RGB2GRAY() macro was not producing
deterministic cross-platform results because of variations in the
round-off behavior of various floating point implementations, so
`tjunittest -bmp` was failing in i386 builds.
Also, set the red/green/blue offsets for TJPF_GRAY to -1 rather than 0.
It was undefined behavior for an application to use those arrays/methods
with TJPF_GRAY anyhow, and this makes it easier for applications to
programmatically detect whether a given pixel format has red, green, and
blue components.
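A minimal sketch of the kind of programmatic check this enables is shown below. The enum and the three offset tables are stand-ins defined locally for the example (a hypothetical three-format subset); they are not the TurboJPEG arrays themselves, although they follow the same idea of using -1 for a format without red, green, and blue components.

```c
/* Hypothetical pixel format IDs, for illustration only */
enum { EX_PF_RGB = 0, EX_PF_BGR, EX_PF_GRAY, EX_NUMPF };

/* Byte offset of each component within a pixel, or -1 if the pixel format
   has no such component */
static const int ex_red_offset[EX_NUMPF] = { 0, 2, -1 };
static const int ex_green_offset[EX_NUMPF] = { 1, 1, -1 };
static const int ex_blue_offset[EX_NUMPF] = { 2, 0, -1 };

/* Returns 1 if the pixel format has red, green, and blue components and 0
   otherwise (including when pf is out of range.) */
static int ex_has_rgb_components(int pf)
{
  if (pf < 0 || pf >= EX_NUMPF)
    return 0;
  return ex_red_offset[pf] >= 0 && ex_green_offset[pf] >= 0 &&
         ex_blue_offset[pf] >= 0;
}
```

With an offset of 0 for grayscale, the same test could not distinguish a grayscale format from one whose red component happens to come first.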
The main justification for this is to provide new libjpeg-turbo users
with a quick & easy way of developing a complete JPEG
compression/decompression program without requiring them to build
libjpeg-turbo from source (which was necessary in order to use the
project-private bmp API) or to use external libraries. These new
functions build upon significant enhancements to rdbmp.c, wrbmp.c,
rdppm.c, and wrppm.c which allow those engines to convert directly
between the native pixel format of the file and a pixel format
("colorspace" in libjpeg parlance) specified by the calling program.
rdbmp.c and wrbmp.c have also been modified such that the calling
program can choose to read or write image rows in the native (bottom-up)
order of the file format, thus eliminating the need to use an inversion
array. tjLoadImage() and tjSaveImage() leverage these new underlying
features in order to significantly improve upon the performance of the
old bmp API.
Because these new functions cannot work without the libjpeg-turbo
colorspace extensions, the libjpeg-compatible code in turbojpeg.c has
been removed. That code was only there to serve as an example of how
to use the TurboJPEG API on top of libjpeg, but more specific, buildable
examples now exist in the https://github.com/libjpeg-turbo/ijg
repository.
tjbenchtest and its Java derivatives are useful for rooting out hidden
problems with the more esoteric TJBench and TurboJPEG features. For
instance, on Windows, running tjbenchtest uncovered
5fce2e9421.
This commit also causes tjbenchtest and tjbenchtest.java to append -yuv
and -alloc to their log file names, depending on the arguments passed,
and it causes the build system to clean up those log files when the
'testclean' target is built.
Setting _libdir to CMAKE_INSTALL_FULL_LIBDIR only works when doing an
in-tree RPM build. SRPMs are architecture-agnostic, so the spec needs
to compute _libdir at the time the SRPM is rebuilt, not at the time it
is generated.
This is a regression introduced when implementing the new CMake-based
cross-platform build system.
Tag 1.5.2 release
* tag '1.5.2': (54 commits)
x86: Fix "short jump is out of range" w/ NASM<2.04
TurboJPEG: Document xform issue w/ big marker data
Java TJBench: Fix parsing of -warmup argument
Build: Disable warmup in TJBench regression tests
TJBench: Improve consistency of results
TurboJPEG: C API documentation buglet
TJBench: Code formatting tweaks
TJBench: Fix errors when decomp. files w/ ICC data
BUILDING.md: Include Android/x86 build recipes
Travis: Fix OS X build
Restore compatibility with older autoconf releases
Attribute ARM runtime detection code to Nokia
Honor max_memory_to_use/JPEGMEM/-maxmemory
AppVeyor: Fix CI build
TurboJPEG: Fix potential memory leaks
Always tweak EXIF w/h tags w/ lossless transforms
Fix error w/ lossless crop & libjpeg v7 emulation
Include jpeg_skip/crop_scanlines() in jpeg7.dll
libjpeg.txt: Include partial decomp. in TOC
Slightly de-confusify cjpeg, jpegtran usage info
...
Previously, -stoponwarning only had an effect on the underlying
TurboJPEG C functions, but TJBench still aborted if a non-fatal error
occurred. This commit modifies the C version of TJBench such that it
always recovers from a non-fatal error unless -stoponwarning is
specified. Furthermore, the benchmark stores the details of the last
non-fatal error and does not print any subsequent non-fatal error
messages unless they differ from the last one.
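The "remember the last non-fatal error" behavior can be sketched as follows. This is a simplified model, not the TJBench code: the function name, return-value convention, and buffer size are assumptions made for the example.

```c
#include <stdio.h>
#include <string.h>

#define EX_MSG_MAX 200

/* Text of the most recent non-fatal error that was printed */
static char ex_last_warning[EX_MSG_MAX] = "";

/* Hypothetical handler for a non-fatal error.
   Returns -1 if the caller should abort (stop_on_warning is set),
   1 if the message was printed, and 0 if it was suppressed because it
   matches the previously stored message. */
static int ex_report_nonfatal(const char *msg, int stop_on_warning)
{
  if (stop_on_warning) {
    fprintf(stderr, "ERROR: %s\n", msg);
    return -1;
  }
  if (strncmp(msg, ex_last_warning, EX_MSG_MAX - 1) != 0) {
    /* New message: store it (always NUL-terminated) and print it once */
    strncpy(ex_last_warning, msg, EX_MSG_MAX - 1);
    ex_last_warning[EX_MSG_MAX - 1] = '\0';
    fprintf(stderr, "WARNING: %s\n", msg);
    return 1;
  }
  return 0;
}
```

In a benchmark loop that decompresses the same corrupt image many times, this keeps the same warning from flooding the output while still surfacing any new warning.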
Due to limitations in the Java API (specifically, the fact that it
cannot communicate errors, fatal or otherwise, to the calling program
without throwing a TJException), it was only possible to make
decompression operations fully recoverable within TJBench. With other
operations, -stoponwarning still has an effect on the underlying C
library but has no effect at the Java level.
The Java API documentation has been amended to reflect that only certain
methods are truly recoverable, regardless of the state of
TJ.FLAG_STOPONWARNING.
Allow progressive entropy coding to be enabled on a
transform-by-transform basis, and implement a new transform option for
disabling the copying of markers.
Closes #153
- Provide a new C API function and TJException method that allows
calling programs to query the severity of a compression/decompression/
transform error.
- Provide a new flag that instructs the library to immediately stop
compressing/decompressing/transforming if a warning is encountered.
Fixes #151
Introduce a new C API function (tjGetErrorStr2()) that can be used to
retrieve compression/decompression/transform error messages in a
thread-safe (i.e. instance-specific) manner. Retrieving error messages
from global functions is still thread-unsafe.
Addresses a concern expressed in #151.
Tag 1.5.1 release
* tag '1.5.1':
ARM64 NEON: Fix another ABI conformance issue
Build: Remove ARMv6 support from 'make iosdmg'
Fix out-of-bounds write in partial decomp. feature
Silence additional UBSan warnings
Fix unsigned int overflow in libjpeg memory mgr.
TurboJPEG: Decomp. 4:2:2/4:4:0 JPEGs w/unusual SFs
Silence pedantic GCC6 code formatting warnings
Use plain upsampling if merged isn't accelerated
Implement h1v2 fancy upsampling
Fix AArch64 ABI conformance issue in SIMD code
Don't install libturbojpeg.pc if TJPEG disabled
Linux/PPC: Only enable AltiVec if CPU supports it
ARM/MIPS: Change the behavior of JSIMD_FORCE*
Bump version to 1.5.1 to prepare for new commits
This commit does the following:
-- Merges the two glueware functions (read_icc_profile() and
write_icc_profile()) from iccjpeg.c, which is contained in downstream
projects such as LCMS, Ghostscript, Mozilla, etc. These functions were
originally intended for inclusion in libjpeg, but Tom Lane left the IJG
before that could be accomplished. Since then, programs and libraries
that needed to embed/extract ICC profiles in JPEG files had to include
their own local copy of iccjpeg.c, which is suboptimal.
-- The new functions were prefixed with jpeg_ and split into separate
files for the compressor and decompressor, per the existing libjpeg
coding standards.
-- jpeg_write_icc_profile() was made slightly more fault-tolerant.
It will now trigger a libjpeg error if it is called before
jpeg_start_compress() or if it is passed NULL arguments.
-- jpeg_read_icc_profile() was made slightly more fault-tolerant.
It will now trigger a libjpeg error if it is called before
jpeg_read_header() or if it is passed NULL arguments. It will also
now trigger libjpeg warnings if the ICC profile data is corrupt.
-- The code comments have been wordsmithed.
-- Note that the one-line setup_read_icc_profile() function was not
included. Instead, libjpeg.txt now documents the need to call
jpeg_save_markers(cinfo, JPEG_APP0 + 2, 0xFFFF) prior to calling
jpeg_read_header(), if jpeg_read_icc_profile() is to be used.
-- Adds documentation for the new functions to libjpeg.txt.
-- Adds an -icc switch to cjpeg and jpegtran that allows those programs
to embed an ICC profile in the JPEG files they generate.
-- Adds an -icc switch to djpeg that allows that program to extract an
ICC profile from a JPEG file while decompressing.
-- Adds appropriate unit tests for all of the above.
-- Bumps the SO_AGE of the libjpeg API library to indicate the presence
of new API functions.
Note that the licensing information was obtained from:
https://github.com/mm2/Little-CMS/issues/37#issuecomment-66450180
The whole point of `make tarball` is to make it easy for users to create
a binary distribution of libjpeg-turbo on platforms that aren't
supported by our official build system, so requiring root permissions
somewhat defeated that purpose. Instead, the script now attempts to
detect whether the system has GNU tar or a recent version of BSD tar
that supports setting the ownership of the files in the tarball.
Although there is little chance that we will ever have a package
conflict on OS X, the convention from our Linux packages is to use the
package name, not the project name, for the name of the documentation
directory.
These improvements enable build systems to use GNUInstallDirs to define
custom directory variables.
- The set_dir() macro was renamed to GNUInstallDirs_set_install_dir(),
in keeping with the module's established macro naming convention.
- Rather than detecting whether the prefix has changed, the new
GNUInstallDirs_set_install_dir() macro instead examines whether the
default for the variable in question has changed. This allows for
more flexibility, since build systems may decide to change the
defaults based on factors other than the prefix. It also enables the
macro to work properly outside of the module.
- The module now performs directory variable substitution within the
body of GNUInstallDirs_get_absolute_install_dir().
- The JAVADIR variable is no longer included in GNUInstallDirs. That
directory is not part of the GNU spec, and it turns out that various
operating systems use different conventions for the location of Java
classes. Instead, the variable is now implemented in our build
system as a demonstration of the aforementioned GNUInstallDirs
enhancements.
- GNUInstallDirs: any directory variable can now reference any other
directory variable by including its name in angle brackets (<>).
- Changed the documentation of the directory variables in BUILDING.md
accordingly. This commit also includes some formatting tweaks to
that section (using boldface for directory names, as is our
convention.)
- Changed the package scripts such that they use
CMAKE_INSTALL_DATAROOTDIR rather than CMAKE_INSTALL_DATADIR.
- We no longer override the install dir. defaults on Windows unless
performing an official build. It may be useful, for instance, to
use the GNU defaults when installing into an MSYS environment.
It isn't actually necessary to specify `CMAKE_INSTALL_DEFAULT_MANDIR`
for our official build. Because `CMAKE_INSTALL_DEFAULT_DATAROOTDIR` is
blank for the official build, the default of "<DATAROOTDIR>/man" will
resolve to "man".
For the same reason, this commit changes the specification of
`CMAKE_INSTALL_DEFAULT_DOCDIR` and `CMAKE_INSTALL_DEFAULT_JAVADIR` in
the official build to be dependent on the data root directory (mainly to
make it obvious what we're doing.)
This commit also tweaks the example CMake command line in the directory
variable documentation so that it shows the correct location of the
CMake argument.
YASM requires a debug format to be specified with -g. Currently the
only combination that I can make work at all is DWARF-2/ELF (YASM
doesn't support Mach-O debugging at all, and its support for CV8/MSVC
and MinGW/DWARF-2 appears to be broken), so debugging is only enabled
automatically for ELF at the moment. For other formats, we don't
specify -g at all, which is how the old build system behaved.
Fixes #125, Closes #126
This builds upon the existing GNUInstallDirs module in CMake but adds
the following features to that module:
- The ability to override the defaults for each install directory
through a new set of variables (`CMAKE_INSTALL_DEFAULT_*DIR`).
Before operating system vendors began shipping libjpeg-turbo, it was
meant to be a run-time drop-in replacement for the system's
distribution of libjpeg, so it has traditionally installed itself
under /opt/libjpeg-turbo on Un*x systems by default. On Windows, it
has traditionally installed itself under %SystemDrive%\libjpeg-turbo*,
which is not uncommon behavior for open source libraries (open source
SDKs tend to install outside of the Program Files directory so as to
avoid spaces in the directory name.) At least in the case of Un*x,
the install directory behavior is based somewhat on the Solaris
standard, which requires all non-O/S packages to install their files
under /opt/{package_name}. I adopted that standard for VirtualGL and
TurboVNC while working at Sun, because it allowed those packages to be
located under the same directory on all platforms. I adopted it for
libjpeg-turbo because it ensured that our files would never conflict
with the system's version of libjpeg. Even though many Un*x
distributions ship libjpeg-turbo these days, not all of them ship the
TurboJPEG API library or the Java classes or even the latest version
of the libjpeg API library, so there are still many cases in which it
is desirable to install a different version of libjpeg-turbo from the
one installed by the system. Furthermore, installing the files under
/opt mimics the directory structure of our official binary packages,
and it makes it very easy to uninstall libjpeg-turbo.
For these reasons, our build system needs to be able to use
non-GNU-compliant defaults for each install directory if
`CMAKE_INSTALL_PREFIX` is set to the default value.
- For each directory variable, the module now detects changes to
`CMAKE_INSTALL_PREFIX` and changes the directory variable accordingly,
if the variable has not been changed by the user.
This makes it easy to switch between our "official" directory
structure and the GNU-compliant directory structure "on the fly"
simply by changing `CMAKE_INSTALL_PREFIX`. Also, this new mechanism
eliminated the need for the crufty mechanism that previously did the
same thing just for the library directory variable.
How it should work:
- If a dir variable is unset, then the module will set an internal
property indicating that the dir variable was initialized to its
default value.
- If the dir variable ever diverges from its default value, then the
internal property is cleared, and it cannot be set again without
unsetting the dir variable.
- If the install prefix changes, and if the internal property
indicates that the dir variable is still set to its default value,
and if the dir variable's value is not being manually changed at the
same time that the install prefix is being changed, then the dir
variable's value is automatically changed to the new default value
for that variable (as determined by the new install prefix.)
- The directory variables are now always cached, regardless of whether
they were set on the command line or not. This ensures that they can
easily be examined and modified after being set, regardless of how they
were set.
This was made possible by the introduction of the aforementioned
`CMAKE_INSTALL_DEFAULT_*DIR` variables.
- Improved directory variable documentation (based on descriptions at
https://www.gnu.org/prep/standards/html_node/Directory-Variables.html)
- The module now allows "<DATAROOTDIR>" to be used as a placeholder in
relative directory variables.
It is replaced "on the fly" with the actual path of
`CMAKE_INSTALL_DATAROOTDIR`.
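The module itself is written in CMake, but the substitution in the last item above can be modeled in C for illustration. The function name and signature below are hypothetical. The handling of an empty data root (dropping the slash that follows the placeholder, so that "<DATAROOTDIR>/man" becomes "man") is an assumption about how the result noted elsewhere in this log might be produced, not a description of the module's actual code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the placeholder expansion: copies dir to out,
   replacing the first "<DATAROOTDIR>" with dataroot.  If dataroot is empty,
   the '/' following the placeholder is dropped so that the result remains a
   relative path.  Returns 0 on success or -1 if out is too small. */
static int ex_expand_dataroot(const char *dir, const char *dataroot,
                              char *out, size_t out_size)
{
  static const char placeholder[] = "<DATAROOTDIR>";
  const char *hit = strstr(dir, placeholder);
  int n;

  if (hit == NULL) {
    n = snprintf(out, out_size, "%s", dir);
  } else {
    const char *rest = hit + sizeof(placeholder) - 1;

    if (dataroot[0] == '\0' && rest[0] == '/')
      rest++;
    n = snprintf(out, out_size, "%.*s%s%s", (int)(hit - dir), dir, dataroot,
                 rest);
  }
  return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Under these assumptions, expanding "<DATAROOTDIR>/man" with a data root of "share" gives "share/man", and a directory with no placeholder (such as "lib") is copied unchanged.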
This should more closely mimic the behavior of the old autotools build
system while retaining our customizations to it, and it should retain
the behavior of the old CMake build system.
Closes #124
Strict C89-conformant compilers don't support the "inline" keyword, but
most of them support "__inline__", and that keyword can be used with the
always_inline attribute as well. This commit also removes duplicate code
by using a foreach() loop to test the various keywords.
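For readers unfamiliar with these keywords, the sketch below shows how "__inline__" combines with the always_inline attribute. Note that this uses a different technique from the commit: the build system probes the candidate keywords at configure time, whereas this example simply selects one at compile time from predefined compiler macros. EX_INLINE and ex_clamp() are hypothetical names.

```c
/* Pick an inline keyword based on the compiler.  GCC and Clang accept
   __inline__ even in strict C89 mode, and it can carry the always_inline
   attribute. */
#if defined(__GNUC__)
#define EX_INLINE  __inline__ __attribute__((always_inline))
#elif defined(_MSC_VER)
#define EX_INLINE  __forceinline
#else
#define EX_INLINE  /* no known keyword; fall back to an ordinary function */
#endif

/* Hypothetical helper that is forced inline where the compiler allows it */
static EX_INLINE int ex_clamp(int v, int lo, int hi)
{
  return v < lo ? lo : (v > hi ? hi : v);
}
```

A configure-time probe is more robust than this approach, because it tests what the compiler actually accepts rather than relying on which macros the compiler defines.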
- Replace CMAKE_SOURCE_DIR with CMAKE_CURRENT_SOURCE_DIR
- Replace CMAKE_BINARY_DIR with CMAKE_CURRENT_BINARY_DIR
- Don't use "libjpeg-turbo" in any of the package system filenames
(because CMAKE_PROJECT_NAME will not be the same if building LJT as
a submodule.)
Closes #122
The previous hack (adding ${CMAKE_ASM_COMPILER} to CMAKE_ASM_FLAGS)
didn't work in all cases, because more recent versions of CMake place
the includes ahead of the flags (which meant that the real assembler
wasn't the first argument to gas-preprocessor.pl.)
CMAKE_INSTALL_RPATH has to be set before the targets are defined (oops.)
This also explicitly turns on MACOSX_RPATH for the shared libraries
(which is the default with newer versions of CMake but not with 2.8.x.)
The old autotools/libtool build system hard-coded the install name
directory of the OS X shared libraries to libdir, which meant that any
executable that linked against those libraries would also be hard-coded
to look for the libjpeg-turbo libraries in that directory. @rpath makes
the OS X version of libjpeg-turbo behave like the Linux version, in the
sense that the executables under /opt/libjpeg-turbo/bin will
automatically pick up the libraries under /opt/libjpeg-turbo/lib* by
default, but other executables won't unless they are linked with -rpath.
-- Use trusty for SIMD builds. Ubuntu 12.04 is still using NASM 2.09.x,
which isn't new enough to support AVX2.
-- Add a special test for the SSE2 code path, since it is no longer the
default.
cpuid tells us whether the O/S uses extended state management via
XSAVE/XRSTOR, but we have to call xgetbv to verify that it is using
XSAVE/XRSTOR to manage the state of XMM/YMM registers.
This fixes crashes that would occur when attempting to use
libjpeg-turbo's AVX2 extensions on older O/S's (such as Windows XP or
RHEL 5.) Even if the CPU supports AVX2, the O/S has to also support
saving/restoring YMM registers when switching contexts.
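The decision logic can be sketched in portable C by treating the raw register values as inputs. The function below is an illustration, not the libjpeg-turbo assembly: it assumes that the caller has already obtained ECX from CPUID leaf 1, EBX from CPUID leaf 7 (sub-leaf 0), and XCR0 from xgetbv, and that xgetbv is executed only when the OSXSAVE bit is set. The macro and function names are hypothetical.

```c
#include <stdint.h>

#define EX_CPUID1_ECX_OSXSAVE  (1u << 27)  /* O/S has enabled XSAVE/XRSTOR */
#define EX_CPUID7_EBX_AVX2     (1u << 5)   /* CPU supports AVX2 */
#define EX_XCR0_SSE_STATE      (1u << 1)   /* O/S manages XMM state */
#define EX_XCR0_AVX_STATE      (1u << 2)   /* O/S manages upper YMM state */

/* Returns 1 only if the CPU supports AVX2 and the O/S saves/restores both
   the XMM and YMM register state when switching contexts.  If OSXSAVE is
   not set, xcr0 is ignored, since xgetbv could not have been executed. */
static int ex_can_use_avx2(uint32_t cpuid1_ecx, uint32_t cpuid7_ebx,
                           uint64_t xcr0)
{
  const uint64_t needed = EX_XCR0_SSE_STATE | EX_XCR0_AVX_STATE;

  if (!(cpuid7_ebx & EX_CPUID7_EBX_AVX2))
    return 0;
  if (!(cpuid1_ecx & EX_CPUID1_ECX_OSXSAVE))
    return 0;
  return (xcr0 & needed) == needed;
}
```

An older O/S running on an AVX2-capable CPU corresponds to the case in which the AVX2 bit is set but either OSXSAVE or the YMM state bit in XCR0 is clear, so the function returns 0.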
This commit adds back instructive comments in the image-space
algorithms, similar to those in the SSE2 code. These comments make it
easier to follow the flow of data through the algorithms.
Expand collect_args/uncollect_args macros so that the number of
arguments can be specified. This prevents unnecessary push and mov
instructions.
NOTE: On Windows, the push/pop of xmm6 and xmm7 had to be moved to the
other end of the macro to ensure that rsp is aligned on a 16-byte
boundary.
* libjpeg-turbo/master: (140 commits)
Increase severity of tjDecompressToYUV2() bug desc
Catch libjpeg errors in tjDecompressToYUV2()
BUILDING.md: Fix "... OR ..." indentation again
BUILDING.md: Fix confusing Windows build reqs
ChangeLog.md: Improve readability of plain text
change.log: Refer users to ChangeLog.md
Markdown version of ChangeLog.txt
Rename ChangeLog.txt
README.md: Link to BUILDING.md
BUILDING.md and README.md: Cosmetic tweaks
ChangeLog: "1.5 beta1" --> "1.4.90 (1.5 beta1)"
Java: Fix parallel make with autotools
Win/x64: Fix improper callee save of xmm8-xmm11
Bump TurboJPEG C API revision to 1.5
ChangeLog: Mention jpeg_crop_scanline() function
1.5 beta1
Fix v7/v8-compatible build
libjpeg API: Partial scanline decompression
Build: Make the NASM autoconf variable persistent
Use consistent/modern code formatting for dbl ptrs
...
* libjpeg-turbo/1.4.x: (94 commits)
CMakeLists.txt: Clarify that Un*x isn't supported
Catch libjpeg errors in tjDecompressToYUV2()
cjpeg: Fix buf overrun caused by bad bin PPM input
Add version/build info to global string table
Ensure that default Huffman tables are initialized
Fix memory leak when running tjunittest -yuv
Prevent overread when decoding malformed JPEG
Guard against wrap-around in alloc functions
Fix Visual C++ compiler warnings
rdppm.c: formatting tweaks
jmemmgr.c: formatting tweaks
TurboJPEG: Avoid dangling pointers
Update Android build instr. for ARMv8, PIE, etc.
Makefile.am: formatting tweak
Update build instructions for new autoconf, GitHub
1.4.3
Regression: Allow co-install of 32-bit/64-bit RPMs
Build: Use FILEPATH type for NASM CMake variable
Comment formatting tweaks
Fix 'make dist'
...
Tag 1.4.1 release
* tag '1.4.1': (427 commits)
Now that the TurboJPEG API is reporting libjpeg warnings as errors, an "Invalid SOS parameters for sequential JPEG" warning surfaced in tjDecodeYUV*(). This was caused by the Se member of jpeg_decompress_struct being set to 0 (it is normally set to a non-zero value when the start-of-scan markers are read, but there are no SOS markers in this case, because we're not actually decompressing a JPEG file.)
Fix a segfault that occurred in the MIPS DSPr2 fancy upsampling routine when downsampled_width==3. Because the DSPr2 code unrolls the loop for the middle columns (refer to jdsample.c), it has the effect of performing two column iterations, and that only works properly if the number of columns (minus the first and last) is >= 2. For the specific case of downsampled_width==3, this patch skips to the second iteration of the unrolled column loop.
If a warning (such as "Premature end of JPEG file") is triggered in the underlying libjpeg API, make sure that the TurboJPEG API function returns -1. Unlike errors, however, libjpeg warnings do not make the TurboJPEG functions abort.
Back out r1555 and r1548. Using setenv() didn't fix the iOS simulator issue. It just replaced an undefined _putenv$UNIX2003 symbol with an undefined _setenv$UNIX2003 symbol. The correct solution seems to be to use -D_NONSTD_SOURCE when generating our official builds.
Fix the Windows build. I remember now why I used putenv() originally-- because Windows doesn't have setenv(). We could use _putenv_s(), but older versions of MinGW don't have that either. Fortunately, since all of the environment values we're setting in turbojpeg.c are static, we can just map setenv() to putenv() using a macro. NOTE: we still have to use _putenv_s() in turbojpeg-jni.c, but at least people who may need to build with an older version of MinGW can still do so by disabling the Java build.
Allow building only static or only shared libraries on Windows
__WORDSIZE doesn't seem to be available on platforms other than Mac or Linux, and best practices are for user-level code not to rely on it anyhow, since it's meant to be an internal macro. Fortunately, autoconf already has a way of determining the word size at configure time, so it can be passed into the compiler. This should work on any platform and has been tested on all of the Un*x platforms we support (Linux, Mac, FreeBSD, Solaris.)
Unless you define _ANSI_SOURCE, putenv() on Mac is renamed to putenv$UNIX2003(), and this causes problems when trying to link an i386 iOS application (for the simulator) against the TurboJPEG static library. It's easiest to just use setenv() instead.
Fix a bug in the 64-bit Huffman encoder that Google discovered when encoding some very specific (and proprietary) aerial images using quality=98, an optimized Huffman table, and the ISLOW DCT. These images were causing the Huffman bit buffer to overflow, because the code for encoding the DC coefficient was using the equivalent of the 32-bit version of EMIT_BITS(). Thus, when 64-bit code was used, the DC coefficient code was not properly checking how many bits were in the buffer before attempting to add more bits to it. This issue appears to have existed in all versions of libjpeg-turbo.
Restore backward compatibility with MSVC < 2010 (broken by r1541)
Oops. OS X doesn't define __WORDSIZE unless you include stdint.h, so apparently the Huffman codec hasn't ever been fully accelerated on 64-bit OS X.
Allow the executables and libraries outside of the sharedlib/ directory to be linked against msvcr*.dll instead of libcmt*.lib. This is reported to be necessary when building libjpeg-turbo for use with C#.
Surround the usage of getenv() in the TurboJPEG API with #ifndef NO_GETENV so that developers can add -DNO_GETENV to the C flags when building for platforms that don't have getenv(). Currently this is known to be necessary when building for Windows Phone.
If libjpeg-turbo is configured with a non-default prefix, such as /usr, then use the docdir variable defined by autoconf 2.60 and later, if available. This will, for instance, install the documentation under /usr/share/doc/libjpeg-turbo by default if prefix=/usr, unless docdir is overridden. When using earlier versions of autoconf, docdir is set to ${datadir}/doc, as it always has been.
Enable silent build rules for the NASM objects, if the source is configured with automake 1.11 or later. NOTE: the build still spits out "error: ignoring unknown tag NASM" for each object, but unfortunately, if we remove "--tag NASM" from the command line, the build breaks under older versions of automake (it aborts with "unable to infer tagged configuration.")
Set the RPM and deb architecture properly on non-x86 platforms.
Come on, Cohaagen, you got what you want. Give these people air!
Oops. Need to set the alpha channel when using TYPE_4BYTE_ABGR*. This has no bearing on the actual tests, but it prevents the PNG pre-encode reference images for those tests from being blank.
Oops. The MIPS SIMD implementations of h2v1 and h2v2 upsampling were not checking for DSPr2 support, so running 'djpeg -nosmooth' on a non-DSPr2-enabled platform caused an "illegal instruction" error.
Introduce fast paths to speed up NULL color conversion somewhat, particularly when using 64-bit code; on the decompression side, the "slow path" now also uses an approach similar to that of the compression side (with the component loop outside of the column loop rather than inside.) This is faster when using 32-bit code.
...
* origin/master: (108 commits)
Bump version number to 3.1.
jpegyuv: fix memory leak when path is invalid
jpegyuv: fix memory leak when @image_buffer allocation fails
yuvjpeg: fix memory leak when @image_buffer allocation fails
jpegtran: Do not leak the input and output buffers
Fix previous commit
Scan optimization: return error when unable to copy data buffer
cjpeg option for baseline quant tables
Fix #153
rdpng: convert 16-bit input to 8-bit
Larger number of DC trellis candidates
Fix overflow issue #157
Const on getters
Const on simple getters and copy source
Expanded .gitignore
Add pkg-config requirement
Re-order links.
Declare inbuffer const
Oops. Delete the duplicate copy of [lib]turbojpeg.dll in the binary directory when uninstalling the package.
Get rid of changelog file that we don't update.
...
* commit 'eca0637c8150d3d1c08a60c64d7ee16eaea4b198':
Remove trailing spaces
Another oops. tjBufSizeYUV2() should return -1 if width < 1.
Oops. tjPlaneSizeYUV() should return -1 if componentID > 0 and subsamp==TJSAMP_GRAY.
When building libjpeg-turbo on Un*x systems, INT32 is usually typedef'ed to long, not int, so we need to specify an int pointer when doing a 4-byte write to the RGB565 output buffer. On little endian systems, this doesn't matter, but when you write a 32-bit int to a 64-bit long pointer address on a big endian system, you are writing to the upper 4 bytes, not the lower 4 bytes. NOTE: this will probably break on big endian systems that use 16-bit ints (are there any of those still around?)
Fix Windows build
Fix issues with RGB565 color conversion on big endian machines. The RGB565 routines are now abstracted in a separate file, with separate little-endian and big-endian versions defined at compile time through the use of macros (this is similar to how the colorspace extension routines work.) This allows big-endian machines to take advantage of the same performance optimizations as little-endian machines, and it retains the performance on little-endian machines, since the conditional branch for endianness is at a very coarse-grained level.
Fix build on OS X PowerPC platforms
Oops. Forgot to alter the version header in the change log to indicate the release of 1.4 beta.
Create 1.4.x branch
#166 describes an issue where I/O suspension is not properly handled in
scan optimization. Supporting I/O suspension may be difficult to
achieve here, thus return an error to make it explicit that I/O
suspension is unsupported.
Add command line option -quant-baseline to cjpeg to force quantization
table entries into the 1-255 range for JPEG baseline compatibility. See
related discussion in #145.
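A minimal sketch of the kind of clamping that a baseline-compatible quantization table requires. It is modeled on the scaling step that libjpeg applies when its force_baseline argument is set; the function name scale_quant_value() is hypothetical, and how -quant-baseline is wired into cjpeg is not shown here.

```c
#include <assert.h>

/* Scale one base quantization value by a percentage and clamp the result.
 * With force_baseline set, the result is limited to 1..255 so that it fits
 * in the 8-bit table entries that baseline JPEG requires. */
static unsigned int scale_quant_value(unsigned int base, int scale_percent,
                                      int force_baseline)
{
  assert(scale_percent > 0);
  long temp = ((long)base * scale_percent + 50L) / 100L;

  if (temp <= 0L) temp = 1L;          /* zero would mean divide-by-zero */
  if (temp > 32767L) temp = 32767L;   /* upper limit for 16-bit tables */
  if (force_baseline && temp > 255L)
    temp = 255L;                      /* upper limit for baseline JPEG */
  return (unsigned int)temp;
}
```

For example, a base value of 99 scaled by 500% yields 495 without the baseline restriction and 255 with it.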
* libjpeg-turbo: (39 commits)
Oops. Delete the duplicate copy of [lib]turbojpeg.dll in the binary directory when uninstalling the package.
AltiVec SIMD implementation of sample conversion and integer quantization
Document the fact that the AltiVec implementation uses the same modified algorithms as the SSE2 implementation
Use intrinsics for loading/storing data in the DCT/IDCT functions. This has no effect on the performance of the aligned loads/stores, but it makes it more obvious what that code is doing. Using intrinsics for the unaligned stores in the inverse DCT functions increases overall decompression performance by 1-2%.
AltiVec SIMD implementation of RGB-to-Grayscale color conversion
Remove unneeded code; Make sure jccolor-altivec.o will be rebuilt if jccolext-altivec.c changes.
AltiVec SIMD implementation of RGB-to-YCC color conversion
Make test a phony target so things don't go haywire if there is a file named test.c in the current directory.
Maintain the traditional order of the regression tests while allowing the TurboJPEG and libjpeg portions to be executed separately
Make comments more consistent
Add a "quicktest" pseudo-target, for those times when you just don't want to sit through 11 iterations of TJUnitTest.
Cosmetic tweaks to the PowerPC SIMD stubs
Split AltiVec algorithms into separate files for ease of maintenance; Rename constants using lowercase so they are not confused with macros
Optimizations to the AltiVec DCT algorithms (pre-compute constants and combine multiply/add operations)
AltiVec SIMD implementation of slow integer inverse DCT
Use macros to allocate constants statically, rather than reading them from a table using vec_splat*(). This improves code readability and probably improves performance a bit as well.
Swap the order of the IFAST and ISLOW FDCT functions so that it matches the order of the prototypes in jsimd.h and the stubs in jsimd_powerpc.c.
Include ARMv8 binaries when generating a combined OS X/iOS package using 'make iosdmg'
In the output of the configure script, indicate whether gas-preprocessor.pl is being used along with the assembler.
Modify the ARM64 assembly file so that it uses only syntax that the clang assembler in Xcode 5.x can understand. These changes should all be cosmetic in nature-- they do not change the meaning or readability of the code nor the ability to build it for Linux. Actually, the code is now more in compliance with the ARM64 programming manual. In addition to these changes, there were a couple of instructions that clang simply doesn't support, so gas-preprocessor.pl was modified so that it now converts those into equivalent instructions that clang can handle.
...
Conflicts:
BUILDING.txt
ChangeLog.txt
cjpeg.c
jpegtran.c
Initial implementation of trellis quantization for arithmetic coding.
The rate computation does not yet implement all rules of the entropy
coder and may thus be suboptimal.
Fix pass number computation in scan optimization to support case where
Huffman table optimization is not done, e.g. when arithmetic coding is
used
Enable combination of arithmetic coding and scan optimization
(previously disabled)
-- Use macros to represent the fast FDCT constants, to facilitate comparing the AltiVec implementation of the algorithm with the SSE2 implementation.
-- Rename slow FDCT constants for consistency.
-- Use vec_sra() in all cases in the slow FDCT code. The SSE2 implementation uses psraw, which is an arithmetic shift, so we need to do likewise with AltiVec. Using vec_sr() hasn't caused any problems yet, but it is conceivable that it might cause different behavior in certain corner cases.
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1444 632fc199-4ca6-4c93-a231-07263d6284db
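The difference between an arithmetic shift (psraw, vec_sra()) and a logical shift (vec_sr()) that the message above refers to can be shown on a single 16-bit lane in plain C. The helper names sra16() and srl16() are made up for this sketch, and the arithmetic shift is spelled out with integer math so that the example does not rely on implementation-defined behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic right shift of one 16-bit lane: the sign is preserved, so the
 * result is floor(x / 2^n) for negative inputs as well. */
static int sra16(int16_t x, unsigned int n)
{
  assert(n < 16);
  int v = x;
  return v >= 0 ? (v >> n) : -((-v - 1) >> n) - 1;
}

/* Logical right shift of one 16-bit lane: zeros are shifted in from the
 * left, so a negative input turns into a large positive value. */
static unsigned int srl16(int16_t x, unsigned int n)
{
  assert(n < 16);
  return (unsigned int)((uint16_t)x >> n);
}
```

The two agree for non-negative inputs, which may be why the logical shift had not caused visible problems; they differ as soon as a lane holds a negative value.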
Add extension parameter JFLOAT_TRELLIS_DELTA_DC_WEIGHT that controls
how distortion is calculated in DC trellis quantization. The parameter
defines weighting between actual distortion of DC and distortion of
vertical gradient of DC.
By default the parameter is 0.0 and has no effect.
Addresses #117
This incorporates an upstream fix to add jdmrg565.c to the tarball created
by 'make dist', as well as a fix to add the new jcmaster.h file to same. There
are still some mozjpeg-specific files that aren't added when doing 'make dist'.
I'll let someone else worry about those. This patch mainly ensures that any
files that might be eventually adopted upstream are included.
mozjpeg should produce identical output to libjpeg-turbo when the JCP_FASTEST
compression profile is used. That means that that profile needs to revert to
the default libjpeg quantization/Huffman tables as well as disable mozjpeg's
duplicate table checking feature. This patch also adds -revert to any instance
of cjpeg and jpegtran called by 'make test' (or ctest on Windows), so that
those tests actually work again. The tests aren't useful for regression
testing the mozjpeg extensions, but at least they can now be used to regression
test the underlying code.
There was an oversight in the extension framework. jpeg_start_compress() can
be called multiple times between the time that a compress structure is created
and the time it is destroyed. If this happened, then the following sequence
would occur:
-- heap alloc of master struct within jpeg_create_compress()
-- heap free of master struct within jinit_c_master_control()
-- static alloc of extended master struct (JPOOL_IMAGE) within
jinit_c_master_control()
-- free extended master struct in jpeg_finish_compress()
-- jinit_c_master_control() now sees that cinfo->master is set and tries to
free it, even though it has already been freed. Chaos ensues.
The fix involved breaking out the extended master struct into a header so that
jpeg_create_compress() can go ahead and allocate it to the correct size, thus
eliminating the need to free and reallocate it in jinit_c_master_control().
Further, the master struct is now created in the permanent pool, so it will
survive until the compression struct is destroyed. Further,
jinit_c_master_control() now resets all fields in the master struct that
are not related to the extension parameters.
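The lifetime fix described above can be sketched with a toy model. Everything here is hypothetical (the compressor and ext_master types, the function names, and the two fields); it only illustrates the pattern of allocating the extended struct once at full size, resetting the non-extension fields on every start, and freeing it only on destroy.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical extended master struct. */
typedef struct {
  int pass_number;       /* per-image state, reset on every start */
  int compress_profile;  /* extension parameter, must survive restarts */
} ext_master;

typedef struct {
  ext_master *master;
} compressor;

/* Allocate the master struct once, at its full (extended) size.  It then
 * lives until compressor_destroy(), like a permanent-pool allocation. */
static int compressor_create(compressor *c)
{
  c->master = calloc(1, sizeof(ext_master));
  return c->master != NULL;
}

/* May be called many times: reset only the fields that are not extension
 * parameters.  Nothing is freed or reallocated here. */
static void compressor_start(compressor *c)
{
  assert(c->master != NULL);
  c->master->pass_number = 0;
}

/* Per-image resources would be released here; the master struct is left
 * alone, so a later compressor_start() never sees a dangling pointer. */
static void compressor_finish(compressor *c)
{
  assert(c->master != NULL);
}

static void compressor_destroy(compressor *c)
{
  free(c->master);
  c->master = NULL;
}
```

With this arrangement, a start/finish/start sequence is safe, and an extension parameter set before the first start is still in effect for the second.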
"jcext" is a bit more descriptive, since this code is primarily intended to
extend the libjpeg API. It does so in a backward-ABI-compatible manner, but
"jccompat" could be misinterpreted to mean that the code is providing backward
compatibility at the code level.
This eliminates JBOOLEAN_USE_MOZ_DEFAULTS and replaces it with
JINT_COMPRESS_PROFILE, a more flexible and descriptive parameter. Currently,
this new parameter works in much the same way as the old-- it changes the
behavior of jpeg_set_defaults(). It currently supports only two values
(max. compression, i.e. mozjpeg defaults, and fastest, i.e. libjpeg-turbo
defaults), but it can be extended in the future with additional profiles that
balance compression ratio with performance.
This includes more descriptive text for the project summary (the same
text that is in the package descriptions), a more thorough description of the
libjpeg API extensibility framework, reformatting to improve readability
(particularly on 80-column terminals), and numerous grammar tweaks.
JBOOLEAN_ONE_DC_SCAN and JBOOLEAN_SEP_DC_SCAN are merged into a single
parameter JINT_DC_SCAN_OPT_MODE
Default behavior is modified to use one DC scan per component
Since mozjpeg is now backward ABI-compatible with libjpeg[-turbo], it is now
possible to temporarily load mozjpeg into a binary application and cause that
application to generate uber-compressed JPEGs (at the expense of an extreme
performance loss, of course.) For instance, someone could do
LD_LIBRARY_PATH=/opt/mozjpeg/lib convert blah_blah_blah
to make ImageMagick use mozjpeg instead of the system's pre-installed JPEG
library (libjpeg-turbo, in most cases.) However, this only makes sense if
mozjpeg is actually producing different behavior by default than libjpeg-turbo.
Currently it isn't. Currently it requires the application to set
JBOOLEAN_USE_MOZ_DEFAULTS to TRUE in order to enable the mozjpeg-specific
behavior, but of course applications that were built to use libjpeg[-turbo]
won't do that. Thus, this patch sets use_moz_defaults to TRUE by default,
requiring an application to explicitly set it to FALSE in order to revert to
the libjpeg[-turbo] behavior (makes sense, since the only applications that
would need to revert to the libjpeg[-turbo] behavior would be mozjpeg-aware
applications.)
Note that we discussed the possibility of adding a function
(jpeg_revert_defaults()), which would act the same as jpeg_set_defaults() does
in libjpeg[-turbo]. This is a good solution for implementing the -revert
switch in cjpeg, but unfortunately it doesn't work for jpegtran. The reason
is that jpeg_set_defaults() is called within the body of
jpeg_copy_critical_parameters(), which is part of the API. So yet again,
if mozjpeg were loaded into a non-mozjpeg-aware application at run time, it
would be desirable for jpeg_copy_critical_parameters() to set the parameters
to mozjpeg defaults. That means that, in order to implement the -revert
switch in jpegtran, it would be necessary to introduce a new function
(jpeg_revert_critical_parameters(), perhaps). It seems cleaner to just keep
using the JBOOLEAN_USE_MOZ_DEFAULTS parameter to control the behavior of
jpeg_set_defaults(), even though this represents a minor abuse of the libjpeg
API (jpeg_set_defaults() is technically supposed to set all of the parameters
to defaults, irrespective of any previous state. However, as long as we
document that JBOOLEAN_USE_MOZ_DEFAULTS works differently, then it should be
OK.)
* libjpeg-turbo:
Remove trailing spaces
Another oops. tjBufSizeYUV2() should return -1 if width < 1.
Oops. tjPlaneSizeYUV() should return -1 if componentID > 0 and subsamp==TJSAMP_GRAY.
The AltiVec code actually works on 32-bit PowerPC platforms as well, so change the "powerpc64" token to "powerpc". Also clean up the shift code, which wasn't building properly on OS X.
AltiVec SIMD implementation of fast forward DCT
Bump version to 1.5 alpha1 to prepare for new features
When building libjpeg-turbo on Un*x systems, INT32 is usually typedef'ed to long, not int, so we need to specify an int pointer when doing a 4-byte write to the RGB565 output buffer. On little endian systems, this doesn't matter, but when you write a 32-bit int to a 64-bit long pointer address on a big endian system, you are writing to the upper 4 bytes, not the lower 4 bytes. NOTE: this will probably break on big endian systems that use 16-bit ints (are there any of those still around?)
Fix Windows build
Fix issues with RGB565 color conversion on big endian machines. The RGB565 routines are now abstracted in a separate file, with separate little-endian and big-endian versions defined at compile time through the use of macros (this is similar to how the colorspace extension routines work.) This allows big-endian machines to take advantage of the same performance optimizations as little-endian machines, and it retains the performance on little-endian machines, since the conditional branch for endianness is at a very coarse-grained level.
Fix build on OS X PowerPC platforms
Oops. Forgot to alter the version header in the change log to indicate the release of 1.4 beta.
For whatever reason, some of these files didn't get fully merged from
libjpeg-turbo 1.4. They still contained tab characters and other formatting
conventions from libjpeg-turbo 1.3. This patch also fixes some obvious
indentation errors in the mozjpeg-specific code. There is more formatting work
that needs to be done to the mozjpeg-specific code, to fix line overruns,
incorrect operator whitespace, and other issues that make it not consistent
with the libjpeg/libjpeg-turbo code.
This might be slightly more controversial, since it changes the CMake and
autotools project names and the binary package names to "mozjpeg", and it
changes the default install directory to /opt/mozjpeg. To me, this makes much
more sense, but it does represent a change in operational behavior, which is
why I put it in a separate commit.
This patch does the following:
-- Implements some (hopefully non-controversial) changes to the package
descriptions, in order to prevent confusion (the existing descriptions from
libjpeg-turbo are not appropriate for mozjpeg.)
-- Replaces "libmozjpeg" with "mozjpeg" in all documentation and comments. The
project is called "mozjpeg", and it doesn't actually generate a library called
"libmozjpeg", so it doesn't make sense to use "libmozjpeg" to describe it.
-- Replaces "MozJPEG" with "TurboJPEG" in all documentation and comments.
"MozJPEG" appears to have been the product of blindly searching/replacing
instances of "Turbo". TurboJPEG is the name of the API, and that name still
applies to the implementation in mozjpeg. Furthermore, the TurboJPEG libraries
are still called "libturbojpeg" in mozjpeg.
-- Attempts to remove build instructions that are irrelevant or not applicable
to mozjpeg. Further work possibly needs to be done here-- for instance, it
doesn't make much sense to have build instructions for mobile devices when the
library is not intended to be used for decoding.
-- Changes the vendor in the DEB and RPM files from "The libmozjpeg Project" to
"Mozilla Research".
-- Changes the source tarball location in the RPM spec file to correctly point
to the release tarball on github.
-- Changes the source directory in the RPM spec file to "mozjpeg-%{version}",
which is the actual name of the source directory in the mozjpeg tarballs.
The ABI compatibility feature was developed by the current maintainer of
libjpeg-turbo with an eye toward eventual inclusion in libjpeg-turbo (once
other features are added to libjpeg-turbo that necessitate the inclusion.)
Thus, it is easy to ensure that the DLL function ordinals will be synchronized
between libjpeg-turbo and mozjpeg. However, it still makes sense to allow for
a little bit of breathing room, just in case. Thus, this patch uses ordinals
starting at 200 for the accessor functions. It would probably make sense to
start the equivalent decompressor get/set functions at ordinal 300, once they
are implemented.
Windows requires exported symbols to be explicitly declared.
Also, use a very large ordinal number so that any future symbols
added by IJG or TurboJPEG will not break ABI.
Fixes #104.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit '73edb3d734a628fd88994bc974dc6737a58bd956': (45 commits)
Rename the ARM64 assembly file to match the C file
Fix several mathematical issues discovered in the ARM64 NEON code while running the extended regression tests introduced in r1267. Specific comments can be found in the original patches: https://sourceforge.net/p/libjpeg-turbo/patches/64/
Reformat code per Siarhei's original patch (to clearly indicate that the offset instructions are completely independent) and add Siarhei as an individual author (he no longer works for Nokia.)
Clarify forward compatibility of iOS/ARM builds
ARM64 NEON SIMD support for YCC-to-RGB565 conversion
ARM NEON SIMD support for YCC-to-RGB565 conversion, and optimizations to the existing YCC-to-RGB conversion code:
Ensure that tjFree() is used for any JPEG buffers that might have been dynamically allocated by the compress/transform functions. To keep things simple, we use tjAlloc() for the statically-allocated buffer as well, so that tjFree() can always be used to free the buffer, regardless of whether it was allocated by tjbench or by the TurboJPEG library. This fixes crashes that occurred on Windows when running tjunittest or tjbench with the -alloc flag.
Revert r1335 and r1336. It was a valiant effort, but on Windows, xmm8-xmm15 are non-volatile, and the overhead of pushing them onto the stack at the beginning of each function and popping them at the end was causing worse performance (in the neighborhood of 3-5%) than just using the work areas and limiting the register usage to xmm0-xmm7. Best to leave the SSE2 code alone. We can optimize the register usage for AVX2, once that port takes place.
Windows doesn't have setenv(). Go, go Gadget Macros.
1.4 beta1
Fix 'make dist'
Don't use sudo when building a Debian package unless the user is non-root
Add a set of undocumented environment variables and Java system properties that allow compression features of libjpeg that are not normally exposed in the TurboJPEG API to be enabled. These features are not normally exposed because, for the most part, they aren't "turbo" features, but it is still useful to be able to benchmark them without modifying the code.
.func/.endfunc are only necessary when generating STABS debug info, which basically went out of style with parachute pants and Rick Astley. At any rate, none of the platforms for which we're building the ARM code use it (DWARF is the common format these days), and the .func/.endfunc directives cause the clang integrated assembler to fail (http://llvm.org/bugs/show_bug.cgi?id=20424).
Extend tjbenchtest so that it tests the dynamic JPEG buffer allocation feature in TurboJPEG. Disable the tiling feature in TJBench whenever dynamic buffer allocation is enabled (because the tiling feature requires a separate buffer for each tile, using it successfully with dynamic buffer allocation would require a separate TurboJPEG compressor instance for each tile, and it's not worth going to that trouble right now.)
Run the TurboJPEG conformance tests out of a directory in /tmp (for improved performance, if the source directory is on a remote file share.) Fix an issue in TJBench.java that prevented it from working properly if the source image resided in a directory with a dot in the name.
Oops
Subtle point, but dest->outbuffer is a pointer to the address of the JPEG buffer, which is stored in the calling program. Thus, *(dest->outbuffer) will always equal *outbuffer. We need to compare *outbuffer with dest->buffer instead to determine if the pointer is being reused.
If the output buffer in the TurboJPEG destination manager was allocated by the destination manager and is being reused from a previous compression operation, then we need to get the buffer size from the previous operation, since the calling program doesn't know the actual buffer size.
Actually, we need to increase the size of BUFSIZE, not just the size of _buffer. The previous patch might have caused problems if, for instance, state->free_in_buffer was 127 but 129 bytes were compressed. In that case, only 127 of the 129 bytes would have been written to the file. Also document the fix.
...
Conflicts:
CMakeLists.txt
Makefile.am
configure.ac
jcdctmgr.c
release/deb-control.tmpl
sharedlib/CMakeLists.txt
simd/CMakeLists.txt
turbojpeg.c
* origin/master: (23 commits)
Update .gitignore
.func/.endfunc are only necessary when generating STABS debug info, which basically went out of style with parachute pants and Rick Astley. At any rate, none of the platforms for which we're building the ARM code use it (DWARF is the common format these days), and the .func/.endfunc directives cause the clang integrated assembler to fail (http://llvm.org/bugs/show_bug.cgi?id=20424).
Enable DC trellis by default
Avoid double inline attribute
Detect libpng
Implement DHT Merging
Add .gitignore for autotools files
Check memory alloc success
Update cjpeg usage text
Implement DQT merging
Fix issue with scan printout
Get rid of unnecessary and obsolete platform configuration instructions.
Add error checks for malloc calls that don't already have them. Issue #87.
yuvjpeg: fix trivial leak
Parse quality as float
PNG reading support
Fix issue with DC trellis
Add option to split DC scans
Add trellis for DC
Bump version to 2.1.
...
Conflicts:
BUILDING.txt
cdjpeg.h
jcdctmgr.c
jchuff.h
jcmarker.c
jcmaster.c
jconfig.txt
jpeglib.h
rdswitch.c
* commit 'b8d044a666056d4d8d28d7a5d0805ac32b619b36': (58 commits)
Big oops. wrjpgcom on Windows was being built using the rdjpgcom source.
Prevent a buffer overrun if the comment begins with a literal quote character and the string exceeds 65k characters. Also prevent comments longer than 65k characters from being written, since this will produce an incorrect JPEG file.
Remove VMS-specific code
Our copyright string is longer than JMSG_LENGTH_MAX, and this was causing a buffer overrun if output_message() was called with msg_code set to JMSG_COPYRIGHT, or if format_message() was called with msg_code set to JMSG_COPYRIGHT and with a buffer of length JMSG_LENGTH_MAX.
We don't support non-ANSI C compilers
Allow for building the MIPS DSPr2 extensions if the host is mips-* as well as mipsel-*. The DSPr2 extensions are little endian, so we still have to check that the compiler defines __MIPSEL__ before enabling them. This paves the way for supporting big-endian MIPS, and in the near term, it allows the SIMD extensions to be built with Sourcery CodeBench.
SIMD-accelerated int upsample routine for MIPS DSPr2
Fix MIPS build
libjpeg-turbo has never supported non-ANSI compilers, so get rid of the crufty SIZEOF() macro. It was not being used consistently anyhow, so it would not have been possible to build prior releases of libjpeg-turbo using the broken compilers for which that macro was designed.
Remove MS-DOS code and information, and adjust copyright headers to reflect the removal of features in r1307 and r1308. libjpeg-turbo has never supported MS-DOS, nor is it even possible for us to do so.
Further copyright header cleanup
Further copyright header cleanup
Get rid of the HAVE_PROTOTYPES configuration option, as well as the related JMETHOD and JPP macros. libjpeg-turbo has never supported compilers that don't handle prototypes. Doing so requires ansi2knr, which isn't even supported in the IJG code anymore.
Remove all of the NEED_SHORT_EXTERNAL_NAMES stuff. There is scant information available as to which linkers ever had a 15-character global symbol name limit. AFAICT, it might have been a VMS and/or a.out BSD thing, but none of those platforms have ever been supported by libjpeg-turbo (nor are such systems supported by other open source libraries of this nature.)
Clean up code formatting in the SIMD interface functions
SIMD-accelerated NULL convert routine for MIPS DSPr2
Fix build, which was broken by the checkin of the MIPS DSPr2 accelerated smooth downsampling routine. Until/unless other platforms include SIMD support for that function, it's just easier to #ifdef around it rather than adding stubs for the other platforms.
Fix error in MIPS DSPr2 accelerated smooth downsample routine
SIMD-accelerated h2v2 smooth downsampling routine for MIPS DSPr2
Minor tweak to improve code readability
...
Conflicts:
BUILDING.txt
CMakeLists.txt
Makefile.am
cdjpeg.h
cjpeg.1
cjpeg.c
configure.ac
djpeg.1
example.c
jccoefct.c
jcdctmgr.c
jchuff.c
jchuff.h
jcinit.c
jcmaster.c
jcparam.c
jcphuff.c
jidctflt.c
jpegint.h
jpeglib.h
jversion.h
libjpeg.txt
rdswitch.c
simd/CMakeLists.txt
tjbench.c
turbojpeg.c
usage.txt
wrjpgcom.c
* commit '93ddfcfc1a814789ed64d967a6118616753bb9d5': (65 commits)
Use clz/bsr instructions on ARM for bit counting rather than the lookup table (reduces memory footprint and can improve performance in some cases.)
Make iOS build instructions more generic and applicable to all versions of Xcode; modify iOS build procedure for Xcode 5.0 and later to fix a build issue with Xcode 5.1.
Update build instructions to reflect the use of pkgbuild/productbuild
Remove any claims of support for OS X 10.4 "Tiger" (the packaging system overhaul produces packages that require Leopard or later, and I haven't been able to test Tiger for years anyhow.) Update TurboJPEG shared library version.
Migrate Mac packaging system to pkgbuild, since PackageMaker is no longer supported.
Remove the sections about replacing libjpeg at run time and compile time. These were written before O/S distributions started shipping libjpeg-turbo, and they are either pedantic or no longer relevant. Also remove any text that assumes the use of our official project binaries. Notes specific to the official binaries have been moved into the project wiki.
Fix Windows build
Since we're now maintaining our own Cygwin pseudo-repository directories instead of recommending that users install these packages from a local source, it makes more sense to name the packages according to Cygwin specs, so they can be copied as-is into the pseudo-repository.
39dbc2db9718f9af2f62eb486fd73328fe8bf5e8
Fix 'make dist'
RHEL 6 (and probably other platforms as well) sets _defaultdocdir=%{_datadir}/doc, which screws things up, since we're overriding _datadir. Since we intend _defaultdocdir to be /usr/share/doc, just be explicit about it.
Fix compiler warning about unused function when building with the libjpeg v6b API/ABI
Fix compiler warning ("always_inline function might not be inlinable") when building with recent versions of GCC
Enable silent build (can be overridden with 'make V=1') if the version of autotools being used is new enough.
Extend YUVImage class to allow reuse of the same buffer with different metadata; port TJBench changes that treat YUV encoding/decoding as an intermediate step of the JPEG compression/decompression pipeline rather than a separate test case; add YUV encode/decode tests to the Java version of tjbenchtest
formatting tweaks
Fix an error that occurred when trying to use the lossless transform feature without specifying -quiet; formatting tweak
Move the garbage collection of the JPEG tiles into the decompression function to increase the chances that tiled decompression of large images will succeed without an OutOfMemoryError.
Generate the Java documentation using javadoc 7, to improve readability.
This should have been checked in with the previous commit.
...
Conflicts:
BUILDING.txt
configure.ac
jversion.h
release/Info.plist.in
release/ReadMe.rtf
tjbench.c
turbojpeg.c
* mozjpeg: (94 commits)
Disable scan optimization if no scan given
Fixed mozjpeg build with Visual C++ 2010
Added a few error messages in cjpeg
Fix trellis for nonprogressive mode (#69)
Bugfix: AM_PROG_AR is not recognized by older automake, so only use it when defined
Bump version number for 2.0, make this version 2.0.1.
Update MS-SSIM tuning
Fix #64
Updating yuvjpeg and jpegyuv to match Daala tools.
Fix #56
Silence compiler warning
Silence compiler warning
Improve support of JPEG input in cjpeg
Fix issue with JPEG read in cjpeg
Add support for JPEG input in cjpeg
Fix #50
Disable trellis in jpegtran
Update version to 2.0pre.
Use single DC scan by default
Add configure check for libm/pow. Fixes Linux build issue.
...
This patch also removes an unneeded macro from jdmerge.c.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.4.x@1403 632fc199-4ca6-4c93-a231-07263d6284db
This patch also removes an unneeded macro from jdmerge.c.
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1402 632fc199-4ca6-4c93-a231-07263d6284db
-----
aee36252be.patch
From aee36252be20054afce371a92406fc66ba6627b5 Mon Sep 17 00:00:00 2001
From: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Date: Wed, 13 Aug 2014 03:50:22 +0300
Subject: [PATCH] ARM: Faster NEON yuv->rgb conversion for Krait and Cortex-A15
The older code was developed and tested only on ARM Cortex-A8 and ARM Cortex-A9.
Tuning it for newer ARM processors can introduce some speed-up (up to 20%).
The performance of the inner loop (conversion of 8 pixels) improves from
~27 cycles down to ~22 cycles on Qualcomm Krait 300, and from ~20 cycles
down to ~18 cycles on ARM Cortex-A15.
The performance remains exactly the same on ARM Cortex-A7 (~58 cycles),
ARM Cortex-A8 (~25 cycles) and ARM Cortex-A9 (~30 cycles) processors.
Also use larger indentation in the source code for separating two independent
instruction streams.
-----
a5efdbf22c.patch
From a5efdbf22ce9c1acd4b14a353cec863c2c57557e Mon Sep 17 00:00:00 2001
From: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Date: Wed, 13 Aug 2014 07:23:09 +0300
Subject: [PATCH] ARM: NEON optimized yuv->rgb565 conversion
The performance of the inner loop (conversion of 8 pixels):
* ARM Cortex-A7: ~55 cycles
* ARM Cortex-A8: ~28 cycles
* ARM Cortex-A9: ~32 cycles
* ARM Cortex-A15: ~20 cycles
* Qualcomm Krait: ~24 cycles
Based on the Linaro rgb565 patch from
https://sourceforge.net/p/libjpeg-turbo/patches/24/
but implements better instruction scheduling.
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1385 632fc199-4ca6-4c93-a231-07263d6284db
The current way multipass works, write_scan_header() seems to be called
multiple times, so only DC tables get merged, however, this will still
merge AC tables if possible.
Implements the second half of #30.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Command-line option -split-dc-scan is added to code DC scans
independently (instead of interleaved). It remains to be determined
whether this option introduces any decoder compatibility issues (see #83).
Option -multidcscan is renamed to -opt-dc-scan.
Add option to apply trellis quantization to the DC coefficients (see
#57). May need further refinement to make sure block order during
trellis optimization matches order during coding.
If trellis_quant is enabled, increment the total number of
passes by the optimization beginning pass number.
Signed-off-by: Arjun Sreedharan <arjun024@gmail.com>
From jpeglib-turbo r1221:
Integrate a slightly modified version of Mozilla's patch for
precomputing the bit-counting LUT. This is useful if the table needs
to be shared among multiple processes, although the primary reason for
doing that is reduced footprint on mobile devices, which are probably
already covered by the clz intrinsic code.
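As a rough illustration of what a precomputed bit-counting table does, the sketch below compares a table lookup against a plain shift loop. The 16-entry table, the 8-bit input range, and every `demo_` name are assumptions made for brevity; they are not taken from the patch, whose table and layout are not shown in this message.

```c
#include <assert.h>

/* Bits needed to represent each value 0..15.  A constant table like this
   lives in read-only data, so it can be shared between processes. */
static const unsigned char demo_nbits_lut[16] = {
  0, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4
};

/* Table-driven bit count for values in the range 0..255. */
static int demo_nbits_lookup(unsigned int v)
{
  unsigned int hi = v >> 4;

  return hi ? demo_nbits_lut[hi] + 4 : demo_nbits_lut[v & 15];
}

/* Reference implementation: shift until the value is exhausted. */
static int demo_nbits_loop(unsigned int v)
{
  int n = 0;

  while (v) {
    n++;
    v >>= 1;
  }
  return n;
}

/* Self-check: both implementations agree on every 8-bit value. */
static void demo_nbits_selfcheck(void)
{
  unsigned int v;

  for (v = 0; v < 256; v++)
    assert(demo_nbits_lookup(v) == demo_nbits_loop(v));
}
```

A table computed at run time would instead occupy writable memory in each process, which is the footprint concern the message alludes to.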
From libjpeg-turbo r1288
Port the more accurate (and slightly faster) floating point IDCT
implementation from jpeg-8a and later. New research revealed that the
SSE/SSE2 floating point IDCT implementation was actually more accurate
than the jpeg-6b implementation, not less, which is why its
mathematical results have always differed from those of the jpeg-6b
implementation. This patch brings the accuracy of the C code in line
with that of the SSE/SSE2 code.
Add macro JPEG_RAW_READER that defines whether to pass RAW sample data
from input to output JPEG files (hence preserving color space and
sampling). Macro is now disabled by default.
Add code to copy metadata from input to output JPEG, hence preserving
color profiles and other important information
We can't simply increase JMSG_LENGTH_MAX, because it is part of the libjpeg API, and it is generally assumed that a buffer of this length will be passed to format_message(). Thus, the easiest solution is simply to use a shorter copyright string for JMSG_COPYRIGHT.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.3.x@1319 632fc199-4ca6-4c93-a231-07263d6284db
We can't simply increase JMSG_LENGTH_MAX, because it is part of the libjpeg API, and it is generally assumed that a buffer of this length will be passed to format_message(). Thus, the easiest solution is simply to use a shorter copyright string for JMSG_COPYRIGHT.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1318 632fc199-4ca6-4c93-a231-07263d6284db
Value of cinfo->one_dc_scan is set to true by default to use a single
DC scan for all components.
Option -onedcscan is replaced by -multidcscan to enable multiple DC
scans.
While this change appears to degrade compression performance, it
improves compatibility with a wider range of JPEG decoders.
Add option to have a single DC scan wherein all components are
interleaved when using progressive mode. This may resolve compatibility
issues raised in #29 and #48.
This option is available through -onedcscan in cjpeg
Optimizes quantization matrix by minimizing reconstruction error based
on quantized coefficients.
Feature is controlled by cinfo->trellis_q_opt; disabled by default.
The invocation of this function is wrapped in an ifdef JPEGLIB >70
but the definition of the function wasn't. This change adds the
ifdef around the function definition.
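The pattern described above, guarding the definition with the same condition as the call site, can be sketched as follows. `DEMO_JPEG_LIB_VERSION`, the `>= 70` condition, and both functions are hypothetical stand-ins; the message does not show the actual function or the exact guard.

```c
#include <assert.h>

/* Hypothetical stand-in for the library version macro.  Define it as 62
   before this point to simulate a v6b-style build. */
#ifndef DEMO_JPEG_LIB_VERSION
#define DEMO_JPEG_LIB_VERSION 80
#endif

#if DEMO_JPEG_LIB_VERSION >= 70
/* Compiled only when the newer API is available, matching the guard
   around the call below. */
static int demo_scaled_size(int size, int num, int denom)
{
  return (size * num + denom - 1) / denom;  /* round up */
}
#endif

static int demo_output_size(int size)
{
#if DEMO_JPEG_LIB_VERSION >= 70
  return demo_scaled_size(size, 1, 2);
#else
  return size;
#endif
}
```

With the definition left unguarded, a build for the older API would compile a static function that nothing calls, which typically produces an unused-function warning.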
The throw macros set retval, but not all functions use or return this
value. This change, mainly to clean up compiler warnings, makes
those functions return a value.
-- Auto-generates HAVE_LOCALE_H macro and adds it to jconfig.h (this is used by rdjpgcom.c.)
-- Reconciles the description and ordering of macros between config.h.in and jconfig.h.in, so the two files can be easily diffed.
-- Eliminates the use of the autoheader-generated config.h in the project and moves relevant internal-only macros into a new file, jconfigint.h. This is to avoid "already defined" warnings in files that were including both config.h (to get the internal autotools package information or the INLINE definition) and jconfig.h.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.3.x@1258 632fc199-4ca6-4c93-a231-07263d6284db
-- Auto-generates HAVE_LOCALE_H macro and adds it to jconfig.h (this is used by rdjpgcom.c.)
-- Reconciles the description and ordering of macros between config.h.in and jconfig.h.in, so the two files can be easily diffed.
-- Eliminates the use of the autoheader-generated config.h in the project and moves relevant internal-only macros into a new file, jconfigint.h. This is to avoid "already defined" warnings in files that were including both config.h (to get the internal autotools package information or the INLINE definition) and jconfig.h.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1257 632fc199-4ca6-4c93-a231-07263d6284db
delta = cur0 * 2;
cur0 += delta; /* form error * 3 */
errorptr[0] = (FSERROR) (bpreverr0 + cur0);
cur0 += delta; /* form error * 5 */
bpreverr0 = belowerr0 + cur0;
cur0 += delta; /* form error * 7 */
Each time cur0 is incremented by delta, the compiled code doubles the value of delta (WTF?!) Thus, by the time the end of the block is reached, cur0 is equal to 15 times its former self, not 7 times its former self as it should be. At any rate, it was a lot simpler to just refactor the code so that it uses multiplication.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.3.x@1253 632fc199-4ca6-4c93-a231-07263d6284db
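The refactored code is not shown in the message above. The following is only a sketch of the two ways of forming the 3x, 5x, and 7x error terms, with hypothetical `demo_` names, to show that they are arithmetically equivalent when compiled correctly:

```c
#include <assert.h>

/* Scaled error terms for one pixel (hypothetical container). */
typedef struct {
  int e3;  /* error * 3 */
  int e5;  /* error * 5 */
  int e7;  /* error * 7 */
} demo_fs_terms;

/* Original style: add 2 * error repeatedly. */
static demo_fs_terms demo_fs_by_addition(int cur)
{
  demo_fs_terms t;
  int delta = cur * 2;

  cur += delta;  t.e3 = cur;  /* form error * 3 */
  cur += delta;  t.e5 = cur;  /* form error * 5 */
  cur += delta;  t.e7 = cur;  /* form error * 7 */
  return t;
}

/* Refactored style: each term is an independent multiplication, so no
   term depends on a running value the compiler could get wrong. */
static demo_fs_terms demo_fs_by_multiplication(int cur)
{
  demo_fs_terms t;

  t.e3 = cur * 3;
  t.e5 = cur * 5;
  t.e7 = cur * 7;
  return t;
}
```

If delta were doubled after each addition, as the miscompiled code reportedly did, the running value would go 3x, 7x, 15x, which matches the 15x figure reported above.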
delta = cur0 * 2;
cur0 += delta; /* form error * 3 */
errorptr[0] = (FSERROR) (bpreverr0 + cur0);
cur0 += delta; /* form error * 5 */
bpreverr0 = belowerr0 + cur0;
cur0 += delta; /* form error * 7 */
Each time cur0 is incremented by delta, the compiled code doubles the value of delta (WTF?!) Thus, by the time the end of the block is reached, cur0 is equal to 15 times its former self, not 7 times its former self as it should be. At any rate, it was a lot simpler to just refactor the code so that it uses multiplication.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1251 632fc199-4ca6-4c93-a231-07263d6284db
This macro defined two local variables, cinfo and dinfo, but both
aren't always used. Add a (void)cinfo; and (void)dinfo in there
to hush up the compiler.
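A minimal sketch of the idiom, using a hypothetical macro and hypothetical types rather than the ones in the source tree:

```c
#include <assert.h>

/* Hypothetical stand-ins for the compress/decompress structures. */
typedef struct { int width; } demo_compress;
typedef struct { int width; } demo_decompress;
typedef struct { demo_compress c; demo_decompress d; } demo_instance;

/* Declares both locals, then casts each to void so that a function using
   only one of them does not trigger an unused-variable warning. */
#define DEMO_GET_INSTANCE(handle) \
  demo_instance *inst = (demo_instance *)(handle); \
  demo_compress *cinfo = &inst->c; \
  demo_decompress *dinfo = &inst->d; \
  (void)cinfo;  (void)dinfo;

static int demo_decoded_width(void *handle)
{
  DEMO_GET_INSTANCE(handle)
  return dinfo->width;  /* cinfo is never used in this function */
}
```

The casts generate no code; they only count as a use of each variable.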
Derivation of sign value relies on shift right operator >> being an
arithmetic shift. It is thus not strictly portable since the C standard
defines the result of x >> y as "implementation-defined" when x is a
signed integer with a negative value.
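One way to derive a sign mask without right-shifting a negative value is sketched below. This is an illustration with hypothetical names, not necessarily the change that was made; the XOR step assumes a two's complement representation.

```c
#include <assert.h>

/* Returns -1 if x is negative and 0 otherwise.  The comparison yields 0
   or 1, so no shift of a negative value is involved. */
static int demo_sign_mask(int x)
{
  return -(x < 0);
}

/* Absolute value via the mask: (x ^ m) - m.  Assumes two's complement;
   like abs(), it is undefined for the most negative int. */
static int demo_abs(int x)
{
  int m = demo_sign_mask(x);

  return (x ^ m) - m;
}
```

Compilers commonly turn `-(x < 0)` into the same arithmetic shift on targets that have one, so the portable form need not cost anything, though that depends on the compiler.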
Multiple trellis iterations may improve coding performance as Huffman
tables are updated with each iteration. In practice, the benefit appears
to be minimal.
Trellis quantization is modified:
- to work on the configurable spectral range Ss to Se
- to optionally optimize runs of EOBs
- to optionally split optimization between 2 spectral ranges
In trellis quantization passes, Huffman table optimization is
modified so as to generate a valid code length for each possible
symbol by resetting frequency counters to 1 instead of 0.
Fixes issues #23, #24, and #31.
Note that the original jpgcrush script optimizes scans only for YCbCr
and grayscale color spaces. Scan optimization is thus disabled for RGB
and CMYK color spaces and behavior reverts to the fast mode of jpgcrush
which uses predefined scans
Different lambda values may be used for each frequency in DCT domain.
The weighting table is currently not configurable but can be
enabled/disabled with cinfo->use_lambda_weight_tbl.
-tune-psnr, -tune-ssim and -tune-hvs-psnr are added to cjpeg to control
the trellis quantization process and optimize output for PSNR, SSIM and
HVS-PSNR distortion metrics
A new pass type trellis_pass is added. It defines a pass where trellis
quantization is done in the quantize_trellis() function.
Trellis quantization can be enabled by setting use_moz_defaults to 2 or
by using the -trellis option in cjpeg
Note that trellis quantization does not currently work with scan
optimization. Scan optimization is disabled when trellis is enabled.
First implementation of scan optimisation as done by jpgcrush. Many
parameters are currently hardcoded which should be changed.
Implementation is missing for monochrome.
Add the fast mode of jpgcrush where:
- Huffman table optimisation is enabled
- Progressive coding mode is enabled
- New default scans are defined for progressive coding
-- The Mac and Cygwin packages will now be created with the directory structure defined by the configure variables "prefix", "bindir", "libdir", etc., with the exception that the docs are always installed under /usr/share/doc/{package_name}-{version} on Cygwin and /Library/Documentation/{package_name} on Mac.
-- Fixed a duplicate filename warning when generating RPMs with the default prefix of /opt/libjpeg-turbo.
-- Moved the TurboJPEG libraries out of the system directory on Windows and Mac. It is no longer necessary to put them there, since we are not trying to be backward compatible with TurboJPEG/IPP anymore.
-- Fixed an issue whereby building the "installer" target on Windows would not build the Java JAR file, thus causing an error if the JAR had not been previously built.
-- Building the "install" target on Windows will now install libjpeg-turbo into c:\libjpeg-turbo[-gcc][64] (the same directories used by the installers.) This can be overridden by setting CMAKE_INSTALL_PREFIX.
-- The Java classes on all platforms will now look for the JNI library in the directory under which the build/packaging system installs it.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@946 632fc199-4ca6-4c93-a231-07263d6284db
If the default prefix (/opt/libjpeg-turbo) is used, then we now always install 32-bit libraries in /opt/libjpeg-turbo/lib32 and 64-bit libraries in /opt/libjpeg-turbo/lib64 instead of trying to conform to the Debian or Red Hat conventions. The RPM and DEB packages will now be created with the directory structure defined by the configure variables "prefix", "bindir", "libdir", etc., with the exception that the docs are always installed under /usr/share/doc/{package_name}-{version}.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@944 632fc199-4ca6-4c93-a231-07263d6284db
The SIMD glue code has gotten a bit #ifdef heavy so clean it up by having
one file for each possible SIMD arch. This also allows a simplification of
the x86_64 code as SSE/SSE2 is always known to exist on that arch.
Older versions of automake don't properly support non-recursive make.
Reimplement the build system by having a local Makefile.am in the
simd/ directory.
Studio build. A static jconfig.h has been re-added, but in a separate
directory, to avoid clash with jconfig.h generated by configure
script. Also, jconfig.h now includes the inline macro. jpeg.dsp has
been modified to search in the "win" subdir, to find jconfig.h.
This patch is in spirit similar to r121.
This repository is governed by Mozilla's code of conduct and etiquette guidelines.
For more details, please read the
[Mozilla Community Participation Guidelines](https://www.mozilla.org/about/governance/policies/participation/).
## How to Report
For more information on how to report violations of the Community Participation Guidelines, please read our '[How to Report](https://www.mozilla.org/about/governance/policies/participation/reporting/)' page.
<!--
## Project Specific Etiquette
In some cases, there will be additional project etiquette i.e.: (https://bugzilla.mozilla.org/page.cgi?id=etiquette.html).
Mozilla JPEG Encoder Project
==========
============================
libjpeg-turbo is a JPEG image codec that uses SIMD instructions (MMX, SSE2,
MozJPEG improves JPEG compression efficiency achieving higher visual quality and smaller file sizes at the same time. It is compatible with the JPEG standard, and the vast majority of the world's deployed JPEG decoders.
NEON, AltiVec) to accelerate baseline JPEG compression and decompression on
x86, x86-64, ARM, and PowerPC systems. On such systems, libjpeg-turbo is
generally 2-6x as fast as libjpeg, all else being equal. On other types of
systems, libjpeg-turbo can still outperform libjpeg by a significant amount, by
virtue of its highly-optimized Huffman coding routines. In many cases, the
performance of libjpeg-turbo rivals that of proprietary high-speed JPEG codecs.
libjpeg-turbo implements both the traditional libjpeg API as well as the less
MozJPEG is a patch for [libjpeg-turbo](https://github.com/libjpeg-turbo/libjpeg-turbo). **Please send pull requests to libjpeg-turbo** if the changes aren't specific to newly-added MozJPEG-only compression code. This project aims to keep differences with libjpeg-turbo minimal, so whenever possible, improvements and bug fixes should go there first.
powerful but more straightforward TurboJPEG API. libjpeg-turbo also features
colorspace extensions that allow it to compress from/decompress to 32-bit and
big-endian pixel buffers (RGBX, XBGR, etc.), as well as a full-featured Java
interface.
libjpeg-turbo was originally based on libjpeg/SIMD, an MMX-accelerated
MozJPEG is compatible with the libjpeg API and ABI. It is intended to be a drop-in replacement for libjpeg. MozJPEG is a strict superset of libjpeg-turbo's functionality. All MozJPEG's improvements can be disabled at run time, and in that case it behaves exactly like libjpeg-turbo.
derivative of libjpeg v6b developed by Miyasaka Masaru. The TigerVNC and
VirtualGL projects made numerous enhancements to the codec in 2009, and in
early 2010, libjpeg-turbo spun off into an independent project, with the goal
of making high-speed JPEG compression/decompression technology available to a
broader range of users and developers.
MozJPEG is meant to be used as a library in graphics programs and image processing tools. We include a demo `cjpeg` command-line tool, but it's not intended for serious use. We encourage authors of graphics programs to use libjpeg's [C API](libjpeg.txt) and link with MozJPEG library instead.
License
## Features
=======
libjpeg-turbo is covered by three compatible BSD-style open source licenses.
* Progressive encoding with "jpegrescan" optimization. It can be applied to any JPEG file (with `jpegtran`) to losslessly reduce file size.
Refer to [LICENSE.md](LICENSE.md) for a roll-up of license terms.
* Trellis quantization. When converting other formats to JPEG it maximizes quality/filesize ratio.
* Comes with new quantization table presets, e.g. tuned for high-resolution displays.
* Fully compatible with all web browsers.
* Can be seamlessly integrated into any program that uses the industry-standard libjpeg API. There's no need to write any MozJPEG-specific integration code.
Refer to [BUILDING.md](BUILDING.md) for complete instructions.
## Compiling
See [BUILDING](BUILDING.md). MozJPEG is built exactly the same way as libjpeg-turbo, so if you need additional help please consult [libjpeg-turbo documentation](https://libjpeg-turbo.org/).
Using libjpeg-turbo
===================
libjpeg-turbo includes two APIs that can be used to compress and decompress
JPEG images:
- **TurboJPEG API**<br>
This API provides an easy-to-use interface for compressing and decompressing
JPEG images in memory. It also provides some functionality that would not be
straightforward to achieve using the underlying libjpeg API, such as
generating planar YUV images and performing multiple simultaneous lossless
transforms on an image. The Java interface for libjpeg-turbo is written on
top of the TurboJPEG API.
- **libjpeg API**<br>
This is the de facto industry-standard API for compressing and decompressing
JPEG images. It is more difficult to use than the TurboJPEG API but also
more powerful. The libjpeg API implementation in libjpeg-turbo is both
API/ABI-compatible and mathematically compatible with libjpeg v6b. It can
also optionally be configured to be API/ABI-compatible with libjpeg v7 and v8
(see below.)
There is no significant performance advantage to either API when both are used
to perform similar operations.
Colorspace Extensions
---------------------
libjpeg-turbo includes extensions that allow JPEG images to be compressed
directly from (and decompressed directly to) buffers that use BGR, BGRX,
RGBX, XBGR, and XRGB pixel ordering. This is implemented with ten new
colorspace constants:
JCS_EXT_RGB /* red/green/blue */
JCS_EXT_RGBX /* red/green/blue/x */
JCS_EXT_BGR /* blue/green/red */
JCS_EXT_BGRX /* blue/green/red/x */
JCS_EXT_XBGR /* x/blue/green/red */
JCS_EXT_XRGB /* x/red/green/blue */
JCS_EXT_RGBA /* red/green/blue/alpha */
JCS_EXT_BGRA /* blue/green/red/alpha */
JCS_EXT_ABGR /* alpha/blue/green/red */
JCS_EXT_ARGB /* alpha/red/green/blue */
Setting `cinfo.in_color_space` (compression) or `cinfo.out_color_space`
(decompression) to one of these values will cause libjpeg-turbo to read the
red, green, and blue values from (or write them to) the appropriate position in
the pixel when compressing from/decompressing to an RGB buffer.
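To make "the appropriate position in the pixel" concrete, the sketch below tabulates byte offsets for the six non-alpha orderings listed above and stores one pixel accordingly. The enum, table, and function are hypothetical and self-contained; they do not use or reproduce the library's internal tables.

```c
#include <assert.h>

/* Hypothetical identifiers mirroring the six non-alpha orderings. */
typedef enum {
  DEMO_RGB, DEMO_RGBX, DEMO_BGR, DEMO_BGRX, DEMO_XBGR, DEMO_XRGB,
  DEMO_NUM_ORDERS
} demo_order;

/* Byte offsets of each component within a pixel, plus the pixel size. */
typedef struct { int r, g, b, size; } demo_layout;

static const demo_layout demo_layouts[DEMO_NUM_ORDERS] = {
  { 0, 1, 2, 3 },  /* red/green/blue */
  { 0, 1, 2, 4 },  /* red/green/blue/x */
  { 2, 1, 0, 3 },  /* blue/green/red */
  { 2, 1, 0, 4 },  /* blue/green/red/x */
  { 3, 2, 1, 4 },  /* x/blue/green/red */
  { 1, 2, 3, 4 }   /* x/red/green/blue */
};

/* Writes one pixel in the requested ordering.  The X byte, if present,
   is left untouched. */
static void demo_store_pixel(unsigned char *px, demo_order order,
                             unsigned char r, unsigned char g,
                             unsigned char b)
{
  const demo_layout *l = &demo_layouts[order];

  px[l->r] = r;
  px[l->g] = g;
  px[l->b] = b;
}
```

With a table like this, one conversion routine can serve every ordering by indexing through the offsets rather than hard-coding positions.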
Your application can check for the existence of these extensions at compile
time with:
#ifdef JCS_EXTENSIONS
At run time, attempting to use these extensions with a libjpeg implementation
that does not support them will result in a "Bogus input colorspace" error.
Applications can trap this error in order to test whether run-time support is
available for the colorspace extensions.
When using the RGBX, BGRX, XBGR, and XRGB colorspaces during decompression, the
X byte is undefined, and in order to ensure the best performance, libjpeg-turbo
can set that byte to whatever value it wishes. If an application expects the X
byte to be used as an alpha channel, then it should specify `JCS_EXT_RGBA`,
`JCS_EXT_BGRA`, `JCS_EXT_ABGR`, or `JCS_EXT_ARGB`. When these colorspace
constants are used, the X byte is guaranteed to be 0xFF, which is interpreted
as opaque.
Your application can check for the existence of the alpha channel colorspace
extensions at compile time with:
#ifdef JCS_ALPHA_EXTENSIONS
[jcstest.c](jcstest.c), located in the libjpeg-turbo source tree, demonstrates
how to check for the existence of the colorspace extensions at compile time and
run time.
libjpeg v7 and v8 API/ABI Emulation
-----------------------------------
With libjpeg v7 and v8, new features were added that necessitated extending the
compression and decompression structures. Unfortunately, due to the exposed
nature of those structures, extending them also necessitated breaking backward
ABI compatibility with previous libjpeg releases. Thus, programs that were
built to use libjpeg v7 or v8 did not work with libjpeg-turbo, since it is
based on the libjpeg v6b code base. Although libjpeg v7 and v8 are not
as widely used as v6b, enough programs (including a few Linux distros) made
the switch that there was a demand to emulate the libjpeg v7 and v8 ABIs
in libjpeg-turbo. It should be noted, however, that this feature was added
primarily so that applications that had already been compiled to use libjpeg
v7+ could take advantage of accelerated baseline JPEG encoding/decoding
without recompiling. libjpeg-turbo does not claim to support all of the
libjpeg v7+ features, nor to produce identical output to libjpeg v7+ in all
cases (see below.)
By passing an argument of `--with-jpeg7` or `--with-jpeg8` to `configure`, or
an argument of `-DWITH_JPEG7=1` or `-DWITH_JPEG8=1` to `cmake`, you can build a
version of libjpeg-turbo that emulates the libjpeg v7 or v8 ABI, so that
programs that are built against libjpeg v7 or v8 can be run with libjpeg-turbo.
The following section describes which libjpeg v7+ features are supported and
which aren't.
### Support for libjpeg v7 and v8 Features
#### Fully supported
- **libjpeg: IDCT scaling extensions in decompressor**<br>
libjpeg-turbo supports IDCT scaling with scaling factors of 1/8, 1/4, 3/8,
"Directory containing Armv8 iOS or macOS build to include in universal binaries")
set(MACOS_APP_CERT_NAME "" CACHE STRING
"Name of the Developer ID Application certificate (in the macOS keychain) that should be used to sign the libjpeg-turbo DMG. Leave this blank to generate an unsigned DMG.")
set(MACOS_INST_CERT_NAME "" CACHE STRING
"Name of the Developer ID Installer certificate (in the macOS keychain) that should be used to sign the libjpeg-turbo installer package. Leave this blank to generate an unsigned package.")
[C compiler flags needed to include jni.h (default: -I/System/Library/Frameworks/JavaVM.framework/Headers on OS X, '-I/usr/java/include -I/usr/java/include/solaris' on Solaris, and '-I/usr/java/default/include -I/usr/java/default/include/linux' on Linux)])
AC_MSG_CHECKING([whether to build TurboJPEG Java wrapper])
AC_ARG_WITH([java],
AC_HELP_STRING([--with-java], [Build Java wrapper for the TurboJPEG library]))
if test "x$with_12bit" = "xyes" -o "x$with_turbojpeg" = "xno"; then
['tjtransform_90',['tjtransform',['../structtjtransform.html',1,'tjtransform'],['../group___turbo_j_p_e_g.html#ga504805ec0161f1b505397ca0118bf8fd',1,'tjtransform(): turbojpeg.h'],['../group___turbo_j_p_e_g.html#ga9cb8abf4cc91881e04a0329b2270be25',1,'tjTransform(tjhandle handle, const unsigned char *jpegBuf, unsigned long jpegSize, int n, unsigned char **dstBufs, unsigned long *dstSizes, tjtransform *transforms, int flags): turbojpeg.h']]],
['tjtransform',['tjtransform',['../structtjtransform.html',1,'tjtransform'],['../group___turbo_j_p_e_g.html#gad02cd42b69f193a0623a9c801788df3a',1,'tjTransform(tjhandle handle, const unsigned char *jpegBuf, unsigned long jpegSize, int n, unsigned char **dstBufs, unsigned long *dstSizes, tjtransform *transforms, int flags): turbojpeg.h'],['../group___turbo_j_p_e_g.html#gaa29f3189c41be12ec5dee7caec318a31',1,'tjtransform(): turbojpeg.h']]],