The loop in jsimd_quantize_neon() is only executed twice and should be
unrolled for AArch64 targets. GCC does that by default, but Clang 11
and later versions available at the time of this writing do not. This
patch adds an unroll pragma when targetting AArch64 with Clang. We do
not use the unroll pragma for AArch32 targets, because it causes the
Clang-generated assembly code to exhaust the available Neon registers
(32 x 64-bit) and spill to the stack. (DRC: Referring to the discussion
in #570, this is likely due to compiler confusion that results in poor
register allocation. It is possible to eliminate the spillage and
reduce the instruction count by loading the data on a just-in-time
basis, thus explicitly interleaving compute and I/O, but the performance
implications of that are currently unknown.)
The effects of unrolling the quantization loop are:
1) elimination of the loop control flow overhead and
2) enabling the use of LDP/STP instructions that work from a single
base pointer, instead of using double the number of LDR/STR
instructions, each requiring an address calculation.
Closes#570
- Suppress a UBSan warning regarding storing a 64-bit value to a
non-64-bit-aligned address. That behavior is technically undefined
per the C spec but is supported in the context of the AArch64
architecture and compilers.
- Explicitly promote block_diff[i] to unsigned int prior to left
shifting it, in order to avoid a UBSan warning. This warning also
described behavior that is technically undefined per the C spec but is
supported in the context of the AArch64 architecture and compilers.
Changing the type cast order eliminated the warning without changing
the generated assembly code.
Closes#582
- Make better use of 128-bit vector registers, thus reducing the number
of Neon instructions required to construct the AC coefficient bitmap.
- Refactor the Neon computations of 'nbits' and 'diff' to use shorter
and higher-throughput instruction sequences.
DRC's notes:
This commit partially integrates #570. Arm reported a 1-4% speedup on
Cortex-A55 and Neoverse-N1 cores when using recent compilers but little
or no speedup with Clang 10. I observed no speedup with Clang 10 on my
Cortex-A53 and Cortex-A72 cores. Thus, referring to #582, the primary
purpose of this commit is to fix UBSan warnings regarding the shift
operations previously located at Line 253:
d640a45730/simd/arm/aarch64/jchuff-neon.c (L253)
The primary purpose of this is to encourage adoption of libjpeg-turbo in
downstream Windows projects that forbid the use of "deprecated"
functions. libjpeg-turbo's usage of those functions was not actually
unsafe, because:
- libjpeg-turbo always checks the return value of fopen() and ensures
that a NULL filename can never be passed to it.
- libjpeg-turbo always checks the return value of getenv() and never
passes a NULL argument to it.
- The sprintf() calls in format_message() (jerror.c) could never
overflow the destination string buffer or leave it unterminated as
long as the buffer was at least JMSG_LENGTH_MAX bytes in length, as
instructed. (Regardless, this commit replaces those calls with
snprintf() calls.)
- libjpeg-turbo never uses sscanf() to read strings or multi-byte
character arrays.
- Because of b7d6e84d6a, wrjpgcom
explicitly checks the bounds of the source and destination strings
before calling strcat() and strcpy().
- libjpeg-turbo always ensures that the destination string is
terminated when using strncpy().
(548490fe5e made this explicit.)
Regarding thread safety:
Technically speaking, getenv() is not thread-safe, because the returned
pointer may be invalidated if another thread sets the same environment
variable between the time that the first thread calls getenv() and the
time that that thread uses the return value. In practice, however, this
could only occur with libjpeg-turbo if:
(1) A multithreaded calling application used the deprecated and
undocumented TJFLAG_FORCEMMX/TJFLAG_FORCESSE/TJFLAG_FORCESSE2 flags in
the TurboJPEG API or set one of the corresponding environment variables
(which are only intended for testing purposes.) Since the TurboJPEG API
library only ever passed string constants to putenv(), the only inherent
risk (i.e. the only risk introduced by the library and not the calling
application) was that the SIMD extensions may have read an incorrect
value from one of the aforementioned environment variables.
or
(2) A multithreaded calling application modified the value of the
JPEGMEM environment variable in one thread while another thread was
reading the value of that environment variable (in the body of
jpeg_create_compress() or jpeg_create_decompress().) Given that the
libjpeg API provides a thread-safe way for applications to modify the
default memory limit without using the JPEGMEM environment variable,
direct modification of that environment variable by calling applications
is not supported.
Microsoft's implementation of getenv_s() does not claim to be
thread-safe either, so this commit uses getenv_s() solely to mollify
Visual Studio. New inline functions and macros (GETENV_S() and
PUTENV_S) wrap getenv_s()/_putenv_s() when building for Visual Studio
and getenv()/setenv() otherwise, but GETENV_S()/PUTENV_S() provide no
advantages over getenv()/setenv() other than parameter validation. They
are implemented solely for convenience.
Technically speaking, strerror() is not thread-safe, because the
returned pointer may be invalidated if another thread changes the locale
and/or calls strerror() between the time that the first thread calls
strerror() and the time that that thread uses the return value. In
practice, however, this could only occur with libjpeg-turbo if a
multithreaded calling application encountered a file I/O error in
tjLoadImage() or tjSaveImage(). Since both of those functions
immediately copy the string returned from strerror() into a thread-local
buffer, the risk is minimal, and the worst case would involve an
incorrect error string being reported to the calling application.
Regardless, this commit uses strerror_s() in the TurboJPEG API library
when building for Visual Studio. Note that strerror_r() could have been
used on Un*x systems, but it would have been necessary to handle both
the POSIX and GNU implementations of that function and perform
widespread compatibility testing. Such is left as an exercise for
another day.
Fixes#568
- Since the ERREXITS() and TRACEMSS() macros are never used internally
(they are a relic of the legacy memory managers that libjpeg
provided), the only risk was that an external program might have
invoked one of those macros with a string longer than 79 characters
(JMSG_STR_PARM_MAX - 1).
- TJBench never invokes the THROW_TJ() macro with a string longer than
199 (JMSG_LENGTH_MAX - 1) characters, so there was no risk. However,
it's a good idea to explicitly terminate the destination strings so
that anyone looking at the code can immediately tell that it is safe.
The h2v2 (4:2:0) merged upsampler uses a spare row buffer so that it can
upsample two rows at a time but return only one row to the application,
if necessary. merged_2v_upsample() copies from this spare row buffer
into the application-supplied output buffer, using the out_row_width
field in the my_merged_upsampler struct to determine how many samples to
copy. out_row_width is set in jinit_merged_upsampler(), which is called
within the body of jpeg_start_decompress(). Since jpeg_crop_scanline()
must be called after jpeg_start_decompress(), jpeg_crop_scanline() must
modify the value of out_row_width if the h2v2 merged upsampler will be
used. Otherwise, merged_2v_upsample() can overflow the output buffer if
the number of bytes between the current output buffer position and the
end of the buffer is less than the number of bytes required to represent
an uncropped scanline of the output image. All of the destination
managers used by djpeg allocate either a whole image buffer or a
scanline buffer based on the uncropped output image width, so this issue
is not reproducible using djpeg.
Fixes#574
libjpeg-turbo has never supported non-ANSI C compilers. Per the spec,
ANSI C compilers must have locale.h, stddef.h, stdlib.h, memset(),
memcpy(), unsigned char, and unsigned short. They must also handle
undefined structures.
If NEON_INTRINSICS=0, then run the GAS sanity check from libjpeg-turbo
2.0.x and force-enable NEON_INTRINSICS if the test fails. This fixes
the AArch32 build when using Clang 6.0 on Linux or the Clang toolchain
in the Android NDK r15*-r16*. It also prevents users from manually
disabling NEON_INTRINSICS if doing so would break the build (such as
with Xcode 5.)
- Use check_c_source_compiles() rather than check_symbol_exists() to
detect the presence of vld1_s16_x3(), vld1_u16_x2(), and
vld1q_u8_x4(). check_symbol_exists() is unreliable for detecting
intrinsics, and in practice, it did not detect the presence of the
aforementioned intrinsics in versions of GCC that support them.
- Set DEFAULT_NEON_INTRINSICS=0 for GCC < 12, even if the aforementioned
intrinsics are available. The AArch64 back end in GCC 10 and 11
supports the necessary intrinsics, but the GAS implementation is still
faster when using those compilers.
Fixes#547
- Use JERR_NOTIMPL ("Not implemented yet") rather than JERR_NOT_COMPILED
("Requested feature was omitted at compile time") to indicate that
arithmetic coding is incompatible with Huffman table optimization.
This is more consistent with other parts of the libjpeg API code.
JERR_NOT_COMPILED is typically used to indicate that a major feature
was not compiled in, whereas JERR_NOTIMPL is typically used to
indicate that two features were compiled in but are incompatible with
each other (such as, for instance, two-pass color quantization and
partial image decompression.)
- Change the text of JERR_NOTIMPL to "Requested features are
incompatible". This is a more accurate description of the situation.
"Not implemented yet" implies that it may be possible to support the
requested combination of features in the future, but that is not true
in most of the cases where JERR_NOTIMPL is used.
Fixes#567
(regression introduced by aa7459050d)
cjpeg sets cinfo.in_color_space to JCS_RGB as an "arbitrary guess."
Since tjLoadImage() never uses JCS_RGB, the PGM reader should treat
JCS_RGB the same as JCS_UNKNOWN.
Fixes#566
This fixes an oversight from the integration of the arithmetic entropy
codec from libjpeg (66f97e6820). I chose
to integrate the latest implementation available at the time, which was
from jpeg-8b. However, I naively replaced cinfo->lim_Se with
DCTSIZE2 - 1, not realizing that-- because of SmartScale-- jpeg-8b
contains additional code
(https://github.com/libjpeg-turbo/libjpeg-turbo/blob/jpeg-8b/jdinput.c#L249-L334)
that guards against illegal values of cinfo->Se >= DCTSIZE2. Thus,
libjpeg-turbo's implementation of arithmetic decoding has never guarded
against such illegal values. This commit restores the relevant check
from the original jpeg-6b arithmetic entropy codec patch ("jpeg-ari",
1e247ac854).
Fixes#564
CIFuzz runs the project's fuzzers for a limited period of time any time
a commit is pushed or a PR is submitted. This is not intended to
replace OSS-Fuzz but rather to allow us to more quickly catch some
fuzzing failures, including fuzzer build regressions like the one
introduced in ecf021bc0d.
Closes#559
Recent FreeBSD/PowerPC compilers, such as Clang 11.0.x on FreeBSD 13, do
the equivalent of passing -maltivec to the compiler by default, so
run-time AltiVec detection is unnecessary. However, it becomes
necessary when using other compilers or when passing -mno-altivec to the
compiler.
Closes#552
- Don't check for exceptions immediately after invoking the
GetPrimitiveArrayCritical() method. That method does not throw
exceptions, and checking for them caused -Xcheck:jni to warn about
calling other JNI functions in the scope of
Get/ReleasePrimitiveArrayCritical().
- Check for exceptions immediately after invoking the
CallStaticObjectMethod() method in the PROP2ENV() macro.
- Don't use the Get/ReleasePrimitiveArrayCritical() methods for small
arrays. -Xcheck:jni didn't complain about that, but there is no
performance advantage to using those methods rather than the
Get*ArrayRegion() methods for small arrays, and using
Get*ArrayRegion() makes the code less error-prone.
- Don't release the source/destination planes arrays in the YUV methods
until after the corresponding C TurboJPEG functions have returned.
Referring to https://cmake.org/cmake/help/latest/policy/CMP0065.html,
CMake 3.3 and earlier automatically added compiler/linker flags such as
-rdynamic/-export-dynamic, which caused symbols to be exported from
executables. The primary purpose of this is to allow plugins loaded via
dlopen() to access symbols from the calling program. libjpeg-turbo
does not need this functionality, and enabling it needlessly increases
the size of the libjpeg-turbo executables.
Setting CMP0065 to NEW when using CMake 3.4 and later prevents CMake
from automatically adding the aforementioned compiler/linker flags
unless the ENABLE_EXPORTS property is set for a target (or the
CMAKE_ENABLE_EXPORTS variable is set, which causes ENABLE_EXPORTS to be
set for all targets.)
Closes#554
When building for 32-bit Arm platforms, test whether basic Neon
intrinsics will compile with the specified compiler and C flags. This
prevents the build system from enabling the Neon SIMD extensions when
targetting Armv6 and other legacy architectures that do not support Neon
instructions.
Regression introduced by bbd8089297.
(Checking whether gas-preprocessor.pl was needed for 32-bit Arm builds
had the effect of checking whether Neon instructions were supported.)
Fixes#553
We would have been perfectly happy for AppVeyor to delete all artifacts
other than those from the latest builds in each branch. Instead, they
chose to change the global artifact retention policy to 1 month. In
addition to being unworkable, that new policy uses more storage than the
policy we requested. Lose/lose, so we'll just deploy to S3 like we do
with other platforms.
This issue was introduced in 5557fd2217
due to an oversight, so it has existed in libjpeg-turbo since the
project's inception. However, the issue is effectively a non-issue.
Although #325 proposes allowing programs to override jpeg_get_*() and
jpeg_free_*() externally, there is currently no way to override those
functions without modifying the libjpeg-turbo source code.
libjpeg-turbo only includes the malloc()/free() memory manager from
libjpeg, and the implementation of jpeg_free_*() in that memory manager
ignores the size argument. libjpeg had several additional memory
managers for legacy systems (MS-DOS, System 7, etc.), but those memory
managers ignored the size argument to jpeg_free_*() as well. Thus, this
issue would have only potentially affected custom memory managers in
downstream libjpeg-turbo forks, and since no one has complained until
now, apparently those are rare.
Fixes#542
Attempting to losslessly transform certain malformed JPEG images can
cause the nbits table index in the Huffman encoder to exceed 32768, so
we need to pad the SSE2 implementation of that table to 65536 entries as
we do with the C implementation.
Regression introduced by 087c29e07fFixes#543
This is the same error that d147be83e9
suppressed in decode_mcu_slow(). The image that reproduces this error
in decode_mcu_fast() has been added to the libjpeg-turbo seed corpora.
Closes#537
Although sizeof(void *) == sizeof(size_t) for all architectures that are
currently supported by libjpeg-turbo, such is not guaranteed by the C
standard. Specifically, CHERI-enabled architectures (e.g. CHERI-RISC-V
or Arm's Morello) use capability pointers that are twice the size of
size_t (128 bits for Morello and RV64), so casting to size_t strips the
upper bits of the pointer (including the validity bit) and makes it
non-deferenceable, as indicated by the following compiler warning:
warning: cast from provenance-free integer type to pointer type will
give pointer that can not be dereferenced
[-Werror,-Wcheri-capability-misuse]
cvalue = values = (JCOEF *)PAD((size_t)values_unaligned, 16);
Ignoring this warning results in a run-time crash. Casting pointers to
uintptr_t, if it is available, avoids this problem, since uintptr_t is
defined as an unsigned integer type that can hold a pointer value.
Since C89 compatibility is still necessary in libjpeg-turbo, this commit
introduces a new typedef for pointer-to-integer casts that uses a
GNU-specific extension available in GCC 4.6+ and Clang 3.0+ and falls
back to using size_t if the extension is unavailable. The only other
options would require C99 or Clang-specific builtins.
Closes#538
'buffer' is both passed into the inline assembly code and modified by
it. See https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html, 6.47.2.3.
With GCC 4, this commit does not change the generated assembly code at
all.
With GCC 8, this commit fixes an assembly error:
/tmp/{foo}.s: Assembler messages:
/tmp/{foo}.s:775: Error: registers may not be the same --
`str r9,[r9],#4'
I'm not sure why that error went unnoticed, since I definitely
benchmarked the previous commit with GCC 8. Anyhow, this commit changes
the generated assembly code slightly but does not alter performance.
With Clang 10, this commit changes the generated assembly code slightly
but does not alter performance.
Refer to #529
Referring to the C standard
(http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf,
J.2 Undefined behavior), the behavior of the compiler is undefined if
"conversion between two pointer types produces a result that is
incorrectly aligned." Thus, the behavior of this code
*((uint32_t *)buffer) = BUILTIN_BSWAP32(put_buffer);
in the AArch32 version of the FLUSH() macro is undefined unless 'buffer'
is 32-bit-aligned. Referring to
https://bugs.llvm.org/show_bug.cgi?id=50785, certain versions of Clang,
when generating Thumb (T32) instructions, miscompile that code into an
assembly instruction (stm) that requires the destination to be
32-bit-aligned. Since such alignment cannot be guaranteed within the
Huffman encoder, this reportedly led to crashes (SIGBUS: illegal
alignment) with AArch32/Thumb builds of libjpeg-turbo running on Android
devices, although thus far I have been unable to reproduce those crashes
with a plain Linux/Arm system.
The miscompilation is visible with the Compiler Explorer:
https://godbolt.org/z/rv1ccx1Pb
However, it goes away when removing the return statement from the
function. Thus, it seems that Clang's behavior in this regard is
somewhat variable, which may explain why the crashes are only
reproducible on certain platforms.
The suggested workaround is to use memcpy(), but whereas Clang and
recent GCC releases are smart enough to compile a 4-byte memcpy() call
into a str instruction, GCC < 6 is not. Referring to
https://godbolt.org/z/ae7Wje3P6, the only way to consistently produce
the desired str instruction across all supported compilers is to use
inline assembly. Visual C++ presumably does not miscompile the code in
question, since no issues have been reported with it, but since the code
relies on undefined compiler behavior, prudence dictates that
e4ec23d7ae should be reverted for Visual
C++, which this commit does. The performance impact of
e4ec23d7ae for Visual C++/Arm builds is
unknown (I have no ability to test such builds), but regardless, this
commit reverts the Visual C++/Arm performance to that of libjpeg-turbo
2.1 beta1.
Closes#529
This resolves a conflict between the RPM generated by the libjpeg-turbo
build system and the Red Hat 'filesystem' RPM if
CMAKE_INSTALL_LIBDIR=/usr/lib[64]. This code was largely borrowed from
the VirtualGL RPM spec. (I can legally do that because I hold the
copyright on VirtualGL's implementation.)
Fixes#532
Arm compilers have three floating point ABI options:
'soft' compiles floating point operations as function calls into a
software floating point library, which emulates floating point
operations using integer operations. Floating point function arguments
are passed using integer registers.
'softfp' also compiles floating point operations as function calls into
a floating point library and passes floating point function arguments
using integer registers, but the floating point library functions can
use FPU instructions if the CPU supports them.
'hard' compiles floating point operations into inline FPU instructions,
similarly to x86 and other architectures, and passes floating point
function arguments using FPU registers.
Not all AArch32 CPUs have FPUs or support Neon instructions, so on Linux
and Android platforms, the AArch32 SIMD dispatcher in libjpeg-turbo only
enables the Neon SIMD extensions at run time if /proc/cpuinfo indicates
that the CPU supports Neon instructions or if Neon instructions are
explicitly enabled (e.g. by passing -mfpu=neon to the compiler.) In
order to support all AArch32 CPUs using the same code base, i.e. to
support run-time FPU and Neon auto-detection, it is necessary to compile
the scalar C source code using -mfloat-abi=soft. However, the 'soft'
floating point ABI cannot be used when compiling Neon intrinsics, so the
intrinsics implementation of the Neon SIMD extensions must be compiled
using -mfloat-abi=softfp if the scalar C source code is compiled using
-mfloat-abi=soft.
This commit modifies the build system so that it detects whether
-mfloat-abi=softfp must be explicitly added to the compiler flags when
building the intrinsics implementation of the Neon SIMD extensions.
This will be necessary if the build is using the 'soft' floating
point ABI along with run-time auto-detection of Neon instructions.
Fixes#523
The existing /*FALLTHROUGH*/ comments work with GCC but not Clang, so
this commit adds a FALLTHROUGH macro that uses the 'fallthrough'
attribute if the compiler supports it.
Refer to https://bugs.chromium.org/p/chromium/issues/detail?id=995993
NOTE: All versions of GCC that support -Wimplicit-fallthrough also
support the 'fallthrough' attribute, but certain other compilers (Oracle
Solaris Studio, for instance) support /*FALLTHROUGH*/ but not the
'fallthrough' attribute. Thus, this commit retains the /*FALLTHROUGH*/
comments, which have existed in the libjpeg code base in some form since
1994 (libjpeg v5.)
Closes#531
Referring to #527, the security community did not assign this CVE ID
until more than 8 months after the fix for the issue was released. By
the time they assigned the ID, libjpeg-turbo already had two production
releases containing the fix. This calls into question the usefulness of
assigning a CVE ID to the issue, particularly given that the buffer
overrun in question was fully contained in the stack, not detectable
with valgrind, and confined to lossless transformation (it did not
affect JPEG compression or decompression.)
https://vuldb.com/?id.176175
says that "the exploitability is told to be easy" but provides no
clarification, and given that the author of that page does not seem to
be aware that a fix for the issue has been available since early
December of 2019, it calls into question the accuracy of everything else
on the page.
It would really be nice if the security community approached me about
these things before wasting my time, but I guess it's my lot in life to
modify a change log entry from 2019 to include a CVE ID from 2020.
So it goes...
In theory, all objects that will be included in a Un*x shared library
must be built using PIC. In practice, most compilers don't require PIC
to be explicitly specified for jsimd_none.o, either because the compiler
automatically enables PIC in all cases (Ubuntu) or because the size of
the generated object is too small. But some rare compilers do require
PIC to be explicitly specified for jsimd_none.o.
Fixes#520
Our workflow script does not currently work with tags, and there is no
point to building tags anyhow, since we do not use the CI system to spin
official builds.
When using the in-memory destination manager, it is necessary to
explicitly call the destination manager's term_destination() method if
an error occurs. That method is called by jpeg_finish_compress() but
not by jpeg_abort_compress().
This fixes a potential double free() that could occur if tjCompress*()
or tjTransform() returned an error and the calling application tried to
clean up a JPEG buffer that was dynamically re-allocated by one of those
functions.
- UBSan complained that entropy->restarts_to_go was underflowing an
unsigned integer when it was decremented while
cinfo->restart_interval == 0. That was, of course, completely
innocuous behavior, since the result of the underflowing computation
was never used.
- d3a3a73f64 and
7bc9fca430 silenced a UBSan signed
integer overflow error, but unfortunately other malformed JPEG images
have been discovered that cause unsigned integer overflow in the same
computation. Since, to the best of our understanding, this behavior
is innocuous, this commit reverts the commits listed above, suppresses
the UBSan errors, and adds code comments to document the issue.
Referring to https://bugzilla.mozilla.org/show_bug.cgi?id=1050342,
there are certain very rare circumstances under which a malformed JPEG
image can cause different Huffman decoder output to be produced,
depending on the size of the source manager's I/O buffer. (More
specifically, the fast Huffman decoder didn't handle invalid codes in
the same manner as the slow decoder, and since the fast decoder requires
all data to be memory-resident, the buffering strategy determines
whether or not the fast decoder can be used on a particular MCU block.)
After extensive experimentation, the Mozilla and Chrome developers and I
determined that this truly was an innocuous issue. The patch that both
browsers adopted as a workaround caused a performance regression with
32-bit code, which is why it was not accepted into libjpeg-turbo. This
commit fixes the problem in a less disruptive way with no performance
regression.
After the completion of the start_input() method, it's too late to check
the image size, because the image readers may have already tried to
allocate memory for the image. If the width and height are excessively
large, then attempting to allocate memory for the image could slow
performance or lead to out-of-memory errors prior to the fuzz target
checking the image size.
NOTE: Specifically, the aforementioned OOM errors and slow units were
observed with the compression fuzz targets when using MSan.
Certain rare malformed input images can cause the Huffman encoder to
generate a value for nbits that corresponds to an uninitialized member
of the DC code table. The ramifications of this are minimal and would
basically amount to a different bogus JPEG image being generated from a
particular bogus input image.
- Referring to 3311fc0001, we need to use
unsigned intermediate math in order to make UBSan happy, even though
(JDIMENSION)(A * B) is effectively the same as
(JDIMENSION)A *(JDIMENSION)B, regardless of intermediate overflow.
- Because of the previous commit, it is now possible for bfOffBits to be
INT_MIN, which would cause the initial computation of bPad to
underflow a signed integer. Thus, we need to check for that
possibility as soon as we know the values of bfOffBits and headerSize.
The worst case from this regression is that bPad could wrap around to
a large positive value, which would cause a "Premature end of input
file" error in the subsequent read_byte() loop. Thus, this issue was
effectively innocuous as well, since it resulted in catching the same
error later and in a different way. Also, the issue was very
well-contained, since it was both introduced and fixed as part of the
ongoing OSS-Fuzz integration project.
- rdbmp.c: Because of 8fb37b8171,
bfOffBits, biClrUsed, and headerSize were made into unsigned ints.
Thus, if bPad would eventually be negative due to a malformed header,
UBSan complained about unsigned math being used in the intermediate
computations. It was unnecessary to make those variables unsigned,
since they are only meant to hold small values, so this commit makes
them signed again. The UBSan error was innocuous, because it is
effectively (if not officially) the case that
(int)((unsigned int)a - (unsigned int)b) == (int)a - (int)b.
- rdbmp.c: If (biWidth * source->bits_per_pixel / 8) would overflow an
unsigned int, then UBSan complained at the point at which row_width
was set in start_input_bmp(), even though the overflow would have been
detected later in the function. This commit adds overflow checks
prior to setting row_width.
- rdppm.c: read_pbm_integer() now bounds-checks the intermediate
value computations in order to catch integer overflow caused by a
malformed text PPM. It's possible, though extremely unlikely, that
the intermediate value computations could have wrapped around to a
value smaller than maxval, but the worst case is that this would have
generated a bogus pixel in the uncompressed image rather than throwing
an error.
A fuzzing test case that was effectively a 1-pixel PGM file with a
maximum value of 1 and an actual value of 8 caused an uninitialized
member of the rescale[] array to be accessed in get_gray_rgb_row() or
get_gray_cmyk_row(). Since, for performance reasons, those functions do
not perform bounds checking on the PPM values, we need to ensure that
unused members of the rescale[] array are initialized.
A fuzzing test case with an image width of 838860946 triggered a UBSan
error:
rdbmp.c:633:34: runtime error: signed integer overflow:
838860946 * 3 cannot be represented in type 'int'
Because the result is cast to an unsigned int (JDIMENSION), this error
is irrelevant, because
(unsigned int)((int)838860946 * (int)3) ==
(unsigned int)838860946 * (unsigned int)3
This limits the tjLoadImage() behavioral changes to the scope of the
compress_fuzzer target. Otherwise, TJBench in fuzzer builds would
refuse to load images larger than 1 Mpixel.
- The PPM reader now throws an error rather than segfaulting (due to a
buffer overrun) if an application attempts to load a 16-bit PPM file
into a grayscale uncompressed image buffer. No known applications
allowed that (not even the test applications in libjpeg-turbo),
because that mode of operation was never expected to work and did not
work under any circumstances. (In fact, it was necessary to modify
TJBench in order to reproduce the issue outside of a fuzzing
environment.) This was purely a matter of making the library bow out
gracefully rather than crash if an application tries to do something
really stupid.
- The PPM reader now throws an error rather than generating incorrect
pixels if an application attempts to load a 16-bit PGM file into an
RGB uncompressed image buffer.
- The PPM reader now correctly loads 16-bit PPM files into extended
RGB uncompressed image buffers. (Previously it generated incorrect
pixels unless the input colorspace was JCS_RGB or JCS_EXT_RGB.)
The only way that users could have potentially encountered these issues
was through the tjLoadImage() function. cjpeg and TJBench were
unaffected.
The non-default options were not being tested because of a pixel format
comparison buglet. This commit also changes the code in both
decompression fuzz targets such that non-default options are tested
based on the pixel format index rather than the pixel format value,
which is a bit more idiot-proof.
Otherwise, the targets will require libstdc++, the i386 version of which
is not available in the OSS-Fuzz runtime environment. The OSS-Fuzz
build environment passes -stdlib:libc++ in the CXXFLAGS environment
variable in order to mitigate this issue, since the runtime environment
has the i386 version of libc++, but using that compiler flag requires
using the C++ compiler.
This commit integrates OSS-Fuzz targets directly into the libjpeg-turbo
source tree, thus obsoleting and improving code coverage relative to
Google's OSS-Fuzz target for libjpeg-turbo (previously available here:
https://github.com/google/oss-fuzz).
I hope to eventually create fuzz targets for the BMP, GIF, and PPM
readers as well, which would allow for fuzz-testing compression, but
since those readers all require an input file, it is unclear how to
build an efficient fuzzer around them. It doesn't make sense to
fuzz-test compression in isolation, because compression can't accept
arbitrary input data.
Define compiler-independent byte-swap macros and use them instead of
executing 'rev' via inline assembly code with GCC-compatible compilers
or a slow shift-store sequence with Visual C++.
* This produces identical assembly code with:
- 64-bit GCC 8.4.0 (Linux)
- 64-bit GCC 9.3.0 (Linux)
- 64-bit Clang 10.0.0 (Linux)
- 64-bit Clang 10.0.0 (MinGW)
- 64-bit Clang 12.0.0 (Xcode 12.2, macOS)
- 64-bit Clang 12.0.0 (Xcode 12.2, iOS)
* This produces different assembly code with:
- 64-bit GCC 4.9.1 (Linux)
- 32-bit GCC 4.8.2 (Linux)
- 32-bit GCC 8.4.0 (Linux)
- 32-bit GCC 9.3.0 (Linux)
Since the intrinsics implementation of Huffman encoding is not used
by default with these compilers, this is not a concern.
- 32-bit Clang 10.0.0 (Linux)
Verified performance neutrality
Closes#507
(regression introduced by 16bd984557)
This implements the same fix for
jsimd_encode_mcu_AC_refine_prepare_sse2() that
a81a8c137b implemented for
jsimd_encode_mcu_AC_first_prepare_sse2().
Based on:
1a59587397eb176a91d8Fixes#509Closes#510
Referring to https://bugzilla.redhat.com/show_bug.cgi?id=1937385#c2,
it is my opinion that the severity of this bug was grossly overstated
and that a CVE never should have been assigned to it, but since one was
assigned, users need to know which version of libjpeg-turbo contains
the fix.
Dear security community, please learn what "DoS" actually means and stop
misusing that term for dramatic effect. Thanks.
* commit '8a2cad020171184a49fa8696df0b9e267f1cf2f6': (99 commits)
Build: Handle CMAKE_OSX_ARCHITECTURES=(i386|ppc)
Add Sponsor button for GitHub repository
Build: Support CMAKE_OSX_ARCHITECTURES
cjpeg: Fix FPE when compressing 0-width GIF
Fix build with Visual C++ and /std:c11 or /std:c17
Neon: Fix Huffman enc. error w/Visual Studio+Clang
Use CLZ compiler intrinsic for Windows/Arm builds
Build: Use correct SIMD exts w/VStudio IDE + Arm64
jcphuff.c: Fix compiler warning with clang-cl
Migrate from Travis CI to GitHub Actions
tjexample.c: Fix mem leak if tjTransform() fails
Build: Officially support Ninja
decompress_smooth_data(): Fix another uninit. read
LICENSE.md: Remove trailing whitespace
Build: Test for correct AArch32 RPM/DEBARCH value
LICENSE.md: Formatting tweak
Fix uninitialized read in decompress_smooth_data()
Fix buffer overrun with certain narrow prog JPEGs
Bump revision to 2.0.91 for post-beta fixes
Travis: Use Docker tag that matches Git branch
...
We don't officially support i386 or PowerPC Mac builds of libjpeg-turbo
anymore, but they still work (bearing in mind that PowerPC builds
require GCC v4.0 in Xcode 3.2.6, and i386 builds require Xcode 9.x or
earlier.) Referring to #495, apparently MacPorts needs this
functionality.
The GNU builtin function __builtin_clzl() accepts an unsigned long
argument, which is 8 bytes wide on LP64 systems (most Un*x systems,
including Mac) but 4 bytes wide on LLP64 systems (Windows.) This caused
the Neon intrinsics implementation of Huffman encoding to produce
mathematically incorrect results when compiled using Visual Studio with
Clang.
This commit changes all invocations of __builtin_clzl() in the Neon SIMD
extensions to __builtin_clzll(), which accepts an unsigned long long
argument that is guaranteed to be 8 bytes wide on all systems.
Fixes#480Closes#490
The __builtin_clz() compiler intrinsic was already used in the C Huffman
encoders when building libjpeg-turbo for Arm CPUs using a GCC-compatible
compiler. This commit modifies the C Huffman encoders so that they also
use__builtin_clz() when building for Arm CPUs using Visual Studio +
Clang, as well as the equivalent _CountLeadingZeros() compiler intrinsic
when building for Arm CPUs using Visual C++.
In addition to making the C Huffman encoders faster on Windows/Arm, this
also prevents jpeg_nbits_table from being included in Windows/Arm builds,
thus saving 128 KB of memory.
When configuring a Visual Studio IDE build and passing -A arm64 to
CMake, CMAKE_SYSTEM_PROCESSOR will be amd64, so we should set CPU_TYPE
based on the value of CMAKE_GENERATOR_PLATFORM rather than the value of
CMAKE_SYSTEM_PROCESSOR.
Note that this removes our ability to regression test the Armv8 and
PowerPC SIMD extensions, effectively reverting
a524b9b06b and
02227e48a9, but at the moment, there is no
other way.
Regression introduced by 42825b68d5
The test case
https://user-images.githubusercontent.com/3491627/101376530-fde56180-38b0-11eb-938d-734119a5b5ba.jpg
is a malformed progressive JPEG image containing an interleaved Y/Cb/Cr
DC scan followed by two non-interleaved Y DC scans. Thus, the
prev_coef_bits[] array was initialized for the Y component but not the
other components, the uninitialized values for Cb and Cr were
transferred to the prev_coef_bits_latch[] array in smoothing_ok(), and
because cinfo->master->last_good_iMCU_row was 0,
decompress_smooth_data() read those uninitialized values when attempting
to smooth the second iMCU row.
Possibly fixes#478
Regression introduced by 6d91e950c8
last_block_column in decompress_smooth_data() can be 0 if, for instance,
decompressing a 4:4:4 image of width 8 or less or a 4:2:2 or 4:2:0 image
of width 16 or less. Since last_block_column is an unsigned int,
subtracting 1 from it produced 0xFFFFFFFF, the test in line 590 passed,
and we attempted to access blocks from a second block column that didn't
actually exist.
Closes#476
- Use the _M_ARM and _M_ARM64 macros provided by Visual Studio for
compile-time detection of Arm builds, since __arm__ and __aarch64__
are only present in GNU-compatible compilers.
- Neon/intrinsics: Use the _CountLeadingZeros() and
_CountLeadingZeros64() intrinsics provided by Visual Studio, since
__builtin_clz() and __builtin_clzl() are only present in
GNU-compatible compilers.
- Neon/intrinsics: Since Visual Studio does not support static vector
initialization, replace static initialization of Neon vectors with the
appropriate intrinsics. Compared to the static initialization
approach, this produces identical assembly code with both GCC and
Clang.
- Neon/intrinsics: Since Visual Studio does not support inline assembly
code, provide alternative code paths for Visual Studio whenever inline
assembly is used.
- Build: Set FLOATTEST appropriately for AArch64 Visual Studio builds
(Visual Studio does not emit fused multiply-add [FMA] instructions by
default for such builds.)
- Neon/intrinsics: Move temporary buffer allocation outside of nested
loops. Since Visual Studio configures Arm builds with a relatively
small amount of stack memory, attempting to allocate those buffers
within the inner loops caused a stack overflow.
Closes#461Closes#475
Otherwise, because the file begins with an ASCII header, Git will
erroneously treat is as an ASCII file, and if Git for Windows is
configured with default options (specifically, "Checkout windows-style,
commit Unix-style line endings"), it will add carriage return characters
to all of the "linefeed" characters in the PPM file, thus corrupting it
and causing libjpeg-turbo's regression tests to fail.
There doesn't seem to be any performance or compatibility downside to
this, and it has the advantages of simplicity and consistency between
the PR and official builds.
- Rename IOS_ARMV8_BUILD to ARMV8_BUILD.
- Rename install_ios() to install_subbuild() in makemacpkg.
- Wordsmith the build instructions accordingly.
- Use xcode12.2 image in Travis CI.
This error occurs at the call to (*cinfo->cconvert->color_convert)() in
sep_upsample() whenever cinfo->upsample->need_context_rows == TRUE
(i.e. whenever h2v2 or h1v2 fancy upsampling is used.) The error is
innocuous, since (*cinfo->cconvert->color_convert)() points to a dummy
function (noop_convert()) in that case.
Fixes#470
The "32bit" vs. "64bit" floating point test results actually have
nothing to do with the FPU. That was a fallacious assumption based on
the observation that, with multiple CPU types, 32-bit and 64-bit builds
produce different floating point test results. It seems that this is,
in fact, due to differing compiler behavior-- more specifically, whether
fused multiply-add (FMA) instructions are used to combine multiple
floating point operations into a single instruction ("floating point
expression contraction".) GCC does this by default if the target
supports FMA instructions, which PowerPC and AArch64 targets both do.
Fixes#468
This allows the Neon intrinsics code to be built successfully (albeit
likely with reduced run-time performance) with Xcode 5.0-6.2
(iOS/AArch64) and Android NDK < r19 (AArch32). Note that Xcode 5.0-6.2
will not build the Armv8 GAS code without gas-preprocessor.pl, and no
version of Xcode will build the Armv7 GAS code without
gas-preprocessor.pl, so we always use the full Neon intrinsics
implementation by default with macOS and iOS builds.
Auto-detecting the completeness of the compiler's set of Neon intrinsics
also allows us to more intelligently set the default value of
NEON_INTRINSICS, based on the values of HAVE_VLD1*. This is a
reasonable, albeit imperfect, proxy for whether a compiler has a full
and optimal set of Neon intrinsics. Specific notes:
- 64-bit RGB-to-YCbCr color conversion
does not use any of the intrinsics in question, regresses with GCC
- 64-bit accurate integer forward DCT
uses vld1_s16_x3(), regresses with GCC
- 64-bit Huffman encoding
uses vld1q_u8_x4(), regresses with GCC
- 64-bit YCbCr-to-RGB color conversion
does not use any of the intrinsics in question, regresses with GCC
- 64-bit accurate integer inverse DCT
uses vld1_s16_x3(), regresses with GCC
- 64-bit 4x4 inverse DCT
uses vld1_s16_x3(). I did not test this algorithm in isolation, so
it may in fact regress with GCC, but the regression may be hidden by
the speedup from the new SIMD-accelerated upsampling algorithms.
- 32-bit RGB-to-YCbCr color conversion:
uses vld1_u16_x2(), regresses with GCC
- 32-bit accurate integer forward DCT
uses vld1_s16_x3(), regression irrelevant because there was no
previous implementation
- 32-bit accurate integer inverse DCT
uses vld1_s16_x3(), regresses with GCC
- 32-bit fast integer inverse DCT
does not use any of the intrinsics in question, regresses with GCC
- 32-bit 4x4 inverse DCT
uses vld1_s16_x3(). I did not test this algorithm in isolation, so
it may in fact regress with GCC, but the regression may be hidden by
the speedup from the new SIMD-accelerated upsampling algorithms.
Presumably when GCC includes a full and optimal set of Neon intrinsics,
the HAVE_VLD1* tests will pass, and the full Neon intrinsics
implementation will be enabled automatically.
- Remove gas-preprocessor.pl. None of the compilers that can build the
new intrinsics implementation require gas-preprocessor.pl (tested
with Xcode and with Clang 3.9+ for Linux.)
- Document that Xcode 6.3.x or later is now required for iOS builds
(older versions of Xcode do not have a full set of Neon intrinsics.)
- Add a change log entry.
- Do not enable the ASM CMake language unless NEON_INTRINSICS is false.
- Add a Clang/Arm64 test to .travis.yml in order to test the new
intrinsics implementation.
Closes#455
There was no previous GAS implementation.
NOTE: This doesn't produce much of a speedup when using -O3, because -O3
already enables Neon autovectorization, which works well for the scalar
C implementation of plain upsampling. However, the Neon SIMD
implementation will benefit other optimization levels.
There was no previous GAS implementation.
This commit also reverts 40557b2301 and
7723d7f7d0.
7723d7f7d0 was only necessary because
there was no Neon implementation of merged upsampling/color conversion,
and 40557b2301 was only necessary because
of 7723d7f7d0.
The previous AArch64 GAS implementation has been removed, since the
intrinsics implementation provides the same or better performance.
There was no previous AArch32 GAS implementation.
The previous AArch32 and AArch64 GAS implementations are retained by
default when using GCC, in order to avoid a performance regression. The
intrinsics implementation can be forced on or off using the new
NEON_INTRINSICS CMake variable.
The previous AArch32 GAS implementation is retained by default when
using GCC, in order to avoid a performance regression. The intrinsics
implementation can be forced on or off using the new NEON_INTRINSICS
CMake variable. The previous AArch64 GAS implementation has been
removed, since the intrinsics implementation provides the same or better
performance.
The previous AArch32 GAS implementation of h2v1 fancy upsampling has
been removed, since the intrinsics implementation provides the same or
better performance. There was no previous GAS implementation of h2v2
fancy upsampling, and there was no previous AArch64 GAS implementation
of h2v1 fancy upsampling.
The previous AArch64 GAS implementation is retained by default when
using GCC, in order to avoid a performance regression. The intrinsics
implementation can be forced on or off using the new NEON_INTRINSICS
CMake variable. The previous AArch32 GAS implementation has been
removed, since the intrinsics implementation provides the same or better
performance.
The previous AArch64 GAS implementation is retained by default when
using GCC, in order to avoid a performance regression. The intrinsics
implementation can be forced on or off using the new NEON_INTRINSICS
CMake variable. The previous AArch32 GAS implementation has been
removed, since the intrinsics implementation provides the same or better
performance.
The previous AArch64 GAS implementation is retained by default when
using GCC, in order to avoid a performance regression. The intrinsics
implementation can be forced on or off using the new NEON_INTRINSICS
CMake variable. The previous AArch32 GAS implementation has been
removed, since the intrinsics implementation provides the same or better
performance.
The previous AArch64 GAS implementation is retained by default when
using GCC, in order to avoid a performance regression. The intrinsics
implementation can be forced on or off using the new NEON_INTRINSICS
CMake variable. There was no previous AArch32 GAS implementation.
The previous AArch64 GAS implementation has been removed, since the
intrinsics implementation provides the same or better performance.
There was no previous AArch32 GAS implementation.
The previous AArch32 and AArch64 GAS implementations are retained by
default when using GCC, in order to avoid a performance regression. The
intrinsics implementation can be forced on or off using a new
NEON_INTRINSICS CMake variable.
Regression caused by
a46c111d9f
Because of 7723d7f7d0, which was
introduced in libjpeg-turbo 1.5.1 in response to #81, merged upsampling/
color conversion is disabled on platforms that have SIMD-accelerated
YCbCr -> RGB color conversion but not SIMD-accelerated merged
upsampling/color conversion. This was intended to improve performance
with the Neon SIMD extensions, since those are the only SIMD extensions
for which those circumstances apply. Under normal circumstances, the
separate "plain" (non-fancy) upsampling and color conversion routines
will produce bitwise-identical output to the merged upsampling/color
conversion routines, but that is not the case when skipping scanlines
starting at an odd-numbered scanline. The modified test introduced in
a46c111d9f does precisely that in order to
validate the fixes introduced in
9120a24743 and
a46c111d9f.
Because of 7723d7f7d0, the segfault fixed
in 9120a24743 and
a46c111d9f didn't affect the Neon SIMD
extensions, so this commit effectively reverts the test modifications in
a46c111d9f when using those SIMD
extensions. We can get rid of this hack, as well as
7723d7f7d0, once a Neon implementation of
merged upsampling/color conversion is available.
- Refer to the "slow" [I]DCT algorithms as "accurate" instead, since
they are not slow under libjpeg-turbo.
- Adjust documentation claims to reflect the fact that the "slow" and
"fast" algorithms produce about the same performance on AVX2-equipped
CPUs (because of the dual-lane nature of AVX2, it was not possible to
accelerate the "fast" algorithm beyond what was achievable with SSE2.)
Also adjust the claims to reflect the fact that the "fast" algorithm
tends to be ~5-15% faster than the "slow" algorithm on
non-AVX2-equipped CPUs, regardless of the use of the libjpeg-turbo
SIMD extensions.
- Indicate the legacy status of the "fast" and float algorithms in the
documentation and cjpeg/djpeg usage info.
- Remove obsolete paragraph in the djpeg man page that suggested that
the float algorithm could be faster than the "fast" algorithm on some
CPUs.
It is our convention to use the term "region" when referring to crop
specs, since this is more consistent with the terminology used by the
rest of the image processing community.
The checkstyle script was hastily developed prior to libjpeg-turbo 2.0
beta1, so it has a lot of exceptions and is thus prone to false
negatives. This commit eliminates some of those exceptions.
- Restore GIF read/compressed GIF write support from jpeg-6a and
jpeg-9d.
- Integrate jpegtran -wipe and -drop options from jpeg-9a and jpeg-9d.
- Integrate jpegtran -crop extension (for expanding the image size) from
jpeg-9a and jpeg-9d.
- Integrate other minor code tweaks from jpeg-9*
- Set CPU_TYPE=arm if performing a 32-bit build on an AArch64 system.
This eliminates the need to use a CMake toolchain file.
- Set RPMARCH=armv7hl if building on a 32-bit Arm system with an FPU.
- Set RPMARCH=armv7hl and DEBARCH=armhf if performing a 32-bit build
using a gnueabihf toolchain.
- If performing a 32-bit Arm build, generate a 32-bit supplementary DEB
package for AArch64 systems.
This commit modifies decompress_smooth_data(), adding missing MCU column
offsets to the prev_block_row and next_block_row indices that are used
for block rows other than the first and last. Effectively, this
eliminates unexpected visual artifacts when using jpeg_crop_scanline()
along with interblock smoothing while decompressing the DC scan of a
progressive JPEG image.
Based on:
0227d4fb48Fixes#456Closes#457
* tag '2.0.5':
TurboJPEG: Make global error handling thread-safe
ChangeLog.md: Add missing sub-header for 2.0.5
ChangeLog.md: List CVE ID fixed by previous commit
rdppm.c: Fix buf overrun caused by bad binary PPM
Build: Add missing jpegtran-icc test dependency
rdswitch.c: Eliminate spaces before semicolons
TJCompressor.compress(int): Fix YUV-to-JPEG error
Bump version to 2.0.5; Document previous commit
MIPS DSPr2: Work around various 'make test' errors
MIPS DSPr2: Fix compiler warning with -mdspr2
MIPS SIMD: Always honor JSIMD_FORCE* env vars
Test: Honor CMAKE_CROSSCOMPILING_EMULATOR variable
- Introduce a partial image decompression regression test script that
validates the correctness of jpeg_skip_scanlines() and
jpeg_crop_scanlines() for a variety of cropping regions and libjpeg
settings.
This regression test catches the following issues:
#182, fixed in 5bc43c7821#237, fixed in 6e95c08649794f5018608f37250026a45ead2db8
#244, fixed in 398c1e9acc#441, fully fixed in this commit
It does not catch the following issues:
#194, fixed in 773040f9d9#244 (additional segfault), fixed in
9120a24743
- Modify the libjpeg-turbo regression test suite (make test) so that it
checks for the issue reported in #441 (segfault in
jpeg_skip_scanlines() when used with 4:2:0 merged upsampling/color
conversion.)
- Fix issues in jpeg_skip_scanlines() that caused incorrect output with
h2v2 (4:2:0) merged upsampling/color conversion. The previous commit
fixed the segfault reported in #441, but that was a symptom of a
larger problem. Because merged 4:2:0 upsampling uses a "spare row"
buffer, it is necessary to allow the upsampler to run when skipping
rows (fancy 4:2:0 upsampling, which uses context rows, also requires
this.) Otherwise, if skipping starts at an odd-numbered row, the
output image will be incorrect.
- Throw an error if jpeg_skip_scanlines() is called with two-pass color
quantization enabled. With two-pass color quantization, the first
pass occurs within jpeg_start_decompress(), so subsequent calls to
jpeg_skip_scanlines() interfere with the multipass state and prevent
the second pass from occurring during subsequent calls to
jpeg_read_scanlines().
The additional segfault mentioned in #244 was due to the fact that
the merged upsamplers use a different private structure than the
non-merged upsamplers. jpeg_skip_scanlines() was assuming the latter, so
when merged upsampling was enabled, jpeg_skip_scanlines() clobbered one
of the IDCT method pointers in the merged upsampler's private structure.
For reasons unknown, the test image in #441 did not encounter this
segfault (too small?), but it encountered an issue similar to the one
fixed in 5bc43c7821, whereby it was
necessary to set up a dummy postprocessing function in
read_and_discard_scanlines() when merged upsampling was enabled.
Failing to do so caused either a segfault in merged_2v_upsample() (due
to a NULL pointer being passed to jcopy_sample_rows()) or an error
("Corrupt JPEG data: premature end of data segment"), depending on the
number of scanlines skipped and whether the first scanline skipped was
an odd- or even-numbered row.
Fixes#441Fixes#244 (for real this time)
IIRC, this was only necessary with the version of Java 1.5 that shipped
with OS X 10.4 "Tiger". Apple's implementation of Java 6 ("Java for
OS X Systems") supported both .jnilib and .dylib extensions for JNI
libraries, but Oracle's implementation of Java has only ever supported
the .dylib extension.
The scales have now tilted overwhelmingly in favor of eliminating
support for 32-bit Macs:
- 32-bit applications are only necessary in order to support OS X 10.5
"Leopard" and OS X 10.6 "Snow Leopard". OS X 10.7 "Lion" requires a
64-bit Mac and supports all 64-bit Macs.
- 32-bit applications are no longer allowed in the macOS App Store.
- 32-bit applications no longer run in macOS 10.15 "Catalina".
- 32-bit applications do not support thread-local storage, so the
TurboJPEG API library's global error handler is not thread-safe with
such applications.
- libjpeg-turbo 2.1.x no longer supports 32-bit iOS apps, so it makes
sense to also eliminate support for 32-bit macOS applications.
It's time.
We haven't provided official Cygwin builds since 1.4.x, since Cygwin
now supplies its own libjpeg-turbo packages (although they apparently
haven't been updated past 1.5.3.)
This extends the fix in 1e81b0c3ea to
include binary PPM files with maximum values < 255, thus preventing a
malformed binary PPM input file with those specifications from
triggering an overrun of the rescale array and potentially crashing
cjpeg, TJBench, or any program that uses the tjLoadImage() function.
Fixes#433
Referring to #408, this commit #ifdefs DSPr2 SIMD functions that only
work on little endian processors, and it completely excludes
jsimd_h2v1_downsample_dspr2() and jsimd_h2v2_downsample_dspr2(). The
latter two functions fail with the TJBench tiling regression tests, most
likely because the implementation of the functions predates those tests.
Previously, these environment variables were not honored unless a 74K
CPU was detected, but this detection doesn't work properly with QEMU's
user mode emulation. With all other CPU types, libjpeg-turbo honors
JSIMD_FORCE* regardless of CPU detection.
This CMake variable is intended to define a wrapper program for
executing cross-compiled executables. However, CTest doesn't use
CMAKE_CROSSCOMPILING_EMULATOR, because it isn't obvious which tests
should be executed with the wrapper and which tests are scripts that
don't need it. This commit manually prepends
${CMAKE_CROSSCOMPILING_EMULATOR} to all unit test command lines that
execute a program built by the libjpeg-turbo build system. Thus, one
can set CMAKE_CROSSCOMPILING_EMULATOR in a CMake toolchain file to (for
instance) "qemu-{architecture} {qemu_arguments}") in order to execute
all eligible unit tests using QEMU.
+ document that tjFree() accepts NULL pointers without complaint.
Effectively, it has had that behavior all along, but the API does not
guarantee that tjFree() will be implemented with free() behind the
scenes, so it's best to formalize the behavior.
This programming practice (which exists in other code bases as well)
is a by-product of having used early C compilers that did not properly
handle free(NULL). All modern compilers should properly handle that.
Fixes#398
- Don't enumerate the types of SIMD instructions that libjpeg-turbo
supports, as this can change without notice.
- Use more clear terminology when describing support for libjpeg v7/v8
features ("libjpeg" is, colloquially but not officially, the name for
the IJG's software, whereas the "libjpeg API" refers to our emulation
of said software.)
- "PhotoShop" = "Photoshop" (StudLy Caps Police)
- Adjust dynamic library versions to reflect the addition of
jpeg_read_icc_profile() and jpeg_write_icc_profile() in
libjpeg-turbo 2.0.x.
Move constants out of the .text section in simd/arm64/jsimd_neon.S and
into a .rodata section. This ensures that the ARMv8 NEON SIMD
extensions are compatible with memory layouts that are marked
execute-only (and thus unreadable.)
Based on:
88f3ca7664Closes#318
libjpeg-turbo never included that code, because it requires an external
library (the Utah Raster Toolkit.) The RLE image format was supplanted
by GIF in the late 1980s, so it is rarely seen these days. (It had a
lousy Weissman score, anyhow.)
- Enable progress reporting at run time using a new -report argument
(cjpeg now supports that argument as well)
- Limit the allowable number of scans using a new -maxscans argument
- Treat warnings as fatal using a new -strict argument
This mainly demonstrates how to work around the two issues with the
JPEG standard described here:
https://libjpeg-turbo.org/pmwiki/uploads/About/TwoIssueswiththeJPEGStandard.pdf
since those and similar issues continue to be erroneously reported as
libjpeg-turbo bugs.
Modern Loongson processors are MIPS64-compatible, and MMI instructions
are now supported in the mainline of GCC. Thus, this commit adds
compile-time and run-time auto-detection of MMI instructions and moves
the MMI SIMD extensions for libjpeg-turbo from simd/loongson/ to
simd/mips64/. That will allow MMI and MSA instructions to co-exist
in the same build once #377 has been integrated.
Based on:
82953ddd61Closes#383
Because of 01e3032354 (officially
eliminating support for compilers without unsigned char, since we never
effectively supported those compilers anyhow), GETJOCTET() is now a
no-op. Since that macro is in jmorecfg.h, it is part of the de facto
libjpeg API and must remain in the public headers. However, there is no
reason to continue using it internally, and eliminating its internal use
improves code readability.
This commit adds ARM64 NEON optimizations for the
encode_mcu_AC_first() and encode_mcu_AC_refine() functions used in
progressive Huffman encoding.
Compression speedups for the typical set of five libjpeg-turbo test
images (https://libjpeg-turbo.org/About/Performance):
Cortex-A53: 23.8-39.2% (avg. 32.2%)
Cortex-A72: 26.8-41.1% (avg. 33.5%)
Apple A7: 29.7-45.9% (avg. 39.6%)
Closes#229
Homebrew tends to drop support for a macOS release the second that Apple
stops releasing security updates for it, and that makes HB difficult to
use with some of the Travis macOS images. Furthermore, even on
supported macOS releases, HB sometimes tries to build GCC from source
even if a binary (bottle) is available. Long story short, MacPorts just
generally has better backward compatibility. MacPorts is also what I
personally use on the official libjpeg-turbo build machine.
... detected by ASan. This is a similar issue to the issue that was
fixed with 402a715f82. Apparently it is
possible to create a malformed JPEG image that exceeds the Huffman
encoder's 256-byte local buffer when attempting to losslessly tranform
the image. That makes sense, given that it was necessary to extend the
Huffman decoder's local buffer to 512 bytes in order to handle all
pathological cases (refer to 0463f7c9aad060fcd56e98d025ce16185279e2bc.)
Since this issue affected only lossless transformation, a workflow that
isn't generally exposed to arbitrary data exploits, and since the
overrun did not overflow the stack (i.e. it did not result in a segfault
or other user-visible issue, and valgrind didn't even detect it), it did
not likely pose a security risk.
Fixes#392
... that caused some JPEG images with unusual sampling factors to be
misidentified as 4:4:4. This led to a buffer overflow when attempting
to decompress some such images using tjDecompressToYUV*().
Regression introduced by 479501b07c
The correct behavior is for the TurboJPEG API to refuse to decompress
such images, which it did prior to the aforementioned commit.
Fixes#389
Referring to #289, I'm not sure where I arrived at the conclusion that
the SSE2 progressive Huffman encoder doesn't provide any speedup for
x32. Upon re-testing, I discovered it to be about 50% faster than the
C encoder.
This commit also re-purposes one of the CI tests (specifically, the
jpeg-7 API/ABI test) so that it tests x32 as well.
... introduced by 42825b68d5. In fact,
fault-tolerant multi-scan block smoothing cannot currently be used with
the arithmetic decoder, because that decoder doesn't have any way of
distinguishing a normal end of scan from an unexpected end of scan.
Thus, this commit also modifies the change log to reset the expectations
regarding the scope of the fault-tolerant multi-scan block smoothing
feature. If, at some point in the future, the arithmetic decoder can be
modified to detect an unexpected end of scan, then one would need only
set entropy->pub.insufficient_data = TRUE when the arithmetic decoder
encounters an unexpected end of scan in order to make fault-tolerant
block smoothing work properly with that decoder.
This commit modifies the behavior of the block smoothing algorithm in
the libjpeg API library so that, if a scan in a multi-scan JPEG image is
incomplete (due to premature termination of the image stream), the block
smoothing parameters from the previous (complete) scan are used to
smooth any iMCU rows that the incomplete scan does not contain.
Closes#343
Modifying a locally-defined non-volatile variable below the setjmp()
return point results in undefined behavior whereby the variable may not
have the expected value after setjmp() returns.
Fixes#379
The primary impetus for this is to eliminate build warnings, such as
(32-bit only)
section "__textcoal_nt" is deprecated
object file (XXXXXX.o) was built for newer OSX version (10.XX) than
being linked (10.5)
Upgrading to GCC 6 results in neutral performance for compression,
a measured average overall decompression speedup of 2.5% for 64-bit
code, and a measured average overall decompression speedup of -4.3% for
32-bit code on a 3 GHz Core i7. The 4.3% slow-down for 32-bit code is
deemed acceptable, given that 32-bit macOS apps are deprecated.
Splitting the pointer arithmetic in GET_SYM() into a separate add and
sub instruction was an attempt to work around an error ("invalid operand
type") that occurred when assembling the file with NASM. However, this
created a link error on macOS ("ld: illegal text-relocation to
'_jconst_huff_encode_one_block' in
simd/CMakeFiles/simd.dir/i386/jchuff-sse2.asm.o from
'_jsimd_huff_encode_one_block_sse2' in
simd/CMakeFiles/simd.dir/i386/jchuff-sse2.asm.o for architecture i386")
and also changed the alignment of the code in ways that might have
affected the previous benchmark results (which took a great deal of time
to obtain.) Ultimately, the path of least resistance is just to
require NASM 2.13 or later.
This commit improves the C and SSE2 Huffman encoding implementations in
the following ways:
- Avoid using xmm8-xmm15 in the x86-64 SSE2 implementation. There is no
actual need to use those registers, and avoiding them produces a
cleaner WIN64 function entry/exit-- as well as shorter code, since REX
prefixes can be avoided (this is helpful on certain CPUs, such as
Intel Atom, for which instruction fetch and decoding can be a
bottleneck.)
- Optimize register usage so that fewer REX prefixes and
register-register moves are needed.
- Use the bit counter to store the number of free bits in the bit buffer
rather than the number of bits in the bit buffer. This changes the
method for inserting a code into the bit buffer to:
(put_buffer |= code << (free_bits -= code_size));
As a result:
* Only one bit counter needs to stay in a register (we just keep it in
cl.)
* The bit buffer contents are already properly aligned to be written
out (after a byte swap.)
* Adjusting the free bits counter and checking if the bit buffer is
full can be combined into a single operation.
* We can wait to flush the bit buffer until the buffer is actually
full and not just in danger of becoming full. Thus, eight bytes can
be flushed at a time.
- Speed is quite sensitive to the alignment of branch target labels, so
insert some padding and remove branches from the flush code.
(Flushing this way isn't actually faster when compared to using
branches, but the branchless code doesn't need extra alignment and is
thus smaller.)
- Speculatively write out the bit buffer as a single 8-byte write,
falling back to a byte-by-byte write only if there are any 0xFF bytes
in the bit buffer that need to be encoded as 0xFF 0x00.
- Use MMX registers for the 32-bit implementation (so the bit buffer can
be 64 bits wide.)
- Slightly reduce overall function code size.
- Eliminate or combine a few SSE instructions.
- Make some minor improvements to instruction scheduling.
- Adjust flush_bits() in jchuff.c to handle cases in which the bit
buffer has less than 7 free bits (apparently that couldn't happen
before.)
Based on:
947a09defa262ebb6b816e9a091221
See change log for performance claims.
Closes#292
This macro is a relic of libjpeg's historic need to support a wide
variety of C compilers with varying degrees of compatibility. Such was
necessary during the open systems era, because C compilers were often
supplied by the system vendor. Prior to 1989, there was no C standard
per se, and even after ANSI C became a thing, there were still compilers
in use that didn't conform to it (libjpeg was first released in 1991.)
Realistically, only a handful of C compilers are in widespread use these
days, and all modern C compilers should support structure assignment.
... to avoid backward compatibility issues with GCC 4-6 MinGW
toolchains. Apparently GCC 7+ MinGW toolchains introduce a link-time
dependency with internal MinGW CRT functions that are meant to provide
compatibility with Microsoft's Universal CRT (ucrt) library, but those
internal functions are not available in GCC 4-6 MinGW toolchains. This
made it impossible to use the official builds of libjpeg.a and
libturbojpeg.a with GCC 4-6 MinGW toolchains (a fatal link error--
"undefined reference to '__imp___acrt_iob_func'"-- occurred.)
This problem was not immediately apparent after switching to the MSYS2
implementation of MinGW (d6d7b53968)
because, for a while, MSYS2 was still using GCC 5 and 6.
Refer to libjpeg-turbo/libjpeg-turbo#382
byte, word, dword, qword, oword, and yword are all assembler keywords,
so it makes sense to use lowercase for these so as not to mistake them
for macros or constants.
(AKA "Java for OS X systems.") This implementation of Java 1.6 is long
obsolete and not supported on any version of macOS past High Sierra.
Oracle no longer provides a 32-bit JVM on macOS, so it is no longer
necessary to provide a 32-bit version of the TurboJPEG Java wrapper on
macOS.
If the TurboJPEG instance passed to tjDecodeYUV[Planes]() was previously
used to decompress a progressive JPEG image, then we need to disable the
progressive decompression parameters in the underlying libjpeg instance
before calling jinit_master_decompress().
This commit also modifies the build system so that the "tjtest" target
will test for this issue, and it corrects a previous oversight in the
build system whereby tjbenchtest did not test progressive
compression/decompression unless WITH_JAVA was true.
(regression introduced by 5b177b3cab)
The SSE2 implementation of progressive Huffman encoding performed
extraneous iterations when the scan length was a multiple of 16.
Based on:
bb7f1ef983Fixes#335Closes#367
- Re-purpose the non-SIMD test to test with MSan as well.
- Re-purpose the ASan test to test with UBSan as well.
- Use the default Travis build environment rather than specifying Ubuntu
14.04. I think I added 'dist: trusty' back when 14.04 was newer than
the default, but now it's older than the default.
- Enable verbose output for any unit tests that fail (so we can see the
sanitizer output.)
Prevent several integer overflow issues and subsequent segfaults that
occurred when attempting to compress or decompress gigapixel images with
the TurboJPEG API:
- Modify tjBufSize(), tjBufSizeYUV2(), and tjPlaneSizeYUV() to avoid
integer overflow when computing the return values and to return an
error if such an overflow is unavoidable.
- Modify tjunittest to validate the above.
- Modify tjCompress2(), tjEncodeYUVPlanes(), tjDecompress2(), and
tjDecodeYUVPlanes() to avoid integer overflow when computing the row
pointers in the 64-bit TurboJPEG C API.
- Modify TJBench (both C and Java versions) to avoid overflowing the
size argument to malloc()/new and to fail gracefully if such an
overflow is unavoidable.
In general, this allows gigapixel images to be accommodated by the
64-bit TurboJPEG C API when using automatic JPEG buffer (re)allocation.
Such images cannot currently be accommodated without automatic JPEG
buffer (re)allocation, due to the fact that tjAlloc() accepts a 32-bit
integer argument (oops.) Such images cannot be accommodated in the
TurboJPEG Java API due to the fact that Java always uses a signed 32-bit
integer as an array index.
Fixes#361
This commit modifies h1v2_fancy_upsample() so that it uses an ordered
dither pattern, similar to that of h2v1_fancy_upsample(), rounding up or
down the result for alternate pixels rather than always rounding down.
This ensures that the decompression error pattern for a 4:4:0 JPEG image
will be similar to the rotated decompression error pattern for a 4:2:2
JPEG image. Thus, the final result will be similar regardless of
whether a 4:2:2 JPEG image is rotated or transposed before or after
decompression.
Closes#356
xcode7.3 is based on El Capitan, which is EOL, and Homebrew no longer
provides El Cap bottles (pre-compiled binaries.) Thus, Homebrew was
trying to build GCC 5, YASM, and the other packages we need from source,
which caused the Mac CI builds to time out. I tried goading Homebrew
into installing GCC 5.5.0_2 and YASM 1.3.0_1, which still have El Cap
bottles available, by using the URLs of those specific versions of the
formulae (from the Homebrew GitHub repository) as package names. This
failed, however, because 'brew bundle' converted the URLs to all
lowercase. I then tried explicitly installing the old formulae by using
'brew install' with the aforementioned URLs (bypassing the Travis
Homebrew addon), but Homebrew still tried to build all of the
dependencies from source.
Upgrading to Xcode 8.3.x necessitated regression testing the performance
on iOS, but that proved less of a pain than figuring out how to install
all of the old Homebrew bottles we needed. Also, this future-proofs the
CI builds against the inevitable discontinuation of the xcode7.3 image.
Note that Xcode 8.3.x improves iOS 64-bit decompression performance
significantly relative to Xcode 7.2.x or 7.3.x. iOS 32-bit performance
unfortunately regresses by as much as 5%, but it can't be helped (32-bit
iOS apps are no longer supported on iOS 11+ anyhow, and the next major
release of libjpeg-turbo will remove support for them as well.)
... including, but not limited to:
- unused macros
- private functions not marked as static
- unprototyped global functions
- variable shadowing
(detected by various non-default GCC 8 warning options)
According to Intel's manual [1], "If a value entered for CPUID.EAX is
higher than the maximum input value for basic or extended function for
that processor then the data for the highest basic information leaf is
returned."
Right now, libjpeg-turbo doesn't first check that leaf 07H is supported
before attempting to use it, so the ostensible AVX2 bit (Bit 05) of the
CPUID result might actually be Bit 05 from a lower leaf. That bit might
be set, even if the CPU doesn't support AVX2.
This commit modifies the x86 and x86-64 SIMD feature detection code so
that it first checks whether CPUID leaf 07H is supported before
attempting to use it to check for AVX2 instruction support.
DRC:
This commit should fix
https://bugzilla.mozilla.org/show_bug.cgi?id=1520760
However, I have not personally been able to reproduce that issue,
despite using a Nehalem (pre-AVX2) CPU on which the maximum CPUID leaf
has been limited via a BIOS setting.
Closes#348
[1]
"Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 2 (2A, 2B, 2C & 2D): Instruction Set Reference, A-Z", https://software.intel.com/sites/default/files/managed/a4/60/325383-sdm-vol-2abcd.pdf, page 3-192.
Same as d3a3a73f64 but in the fast decode
path. It was necessary to use a different-sized test image in order to
trigger the error in this location.
Refer to #347
5ea77d8b77 was insufficient to fix all of
these. In particular, we need to always release the primitive arrays
before throwing an exception, because throwing an exception qualifies as
"using JNI."
Refer to #300
CONTRIBUTING.md: Correct misuse of "as such" (Grammar Police)
bug-report.md: Clarify that the submitter should always test against the
latest stable code base.
Some pathological test images have been created that can cause s to
overflow or underflow the signed int data type during decompression.
This is technically undefined behavior according to the C spec, although
every modern implementation I'm aware of will treat the signed int as a
2's complement unsigned int, thus causing the value to wrap around to
INT_MIN if it exceeds INT_MAX. This commit simply makes that behavior
explicit in order to shut up UBSan. At least when building for x86-64
or i386 using Clang or GCC, this commit does not change the
compiler-generated assembly code at all.
The code that triggered this error has existed in the libjpeg code base
for at least 20 years (and probably much longer), so the fact that it
hasn't produced a user-visible problem in all of that time strongly
suggests that UBSan is being overly pedantic here. But if someone can
cough up a platform that doesn't wrap around to INT_MIN when 1 is added
to INT_MAX, then I'll happily change my opinion.
Fixes#347
... since www.nasm.us seems to be down frequently. This doesn't help us
at the moment, but hopefully once the site is back up this will prevent
future build failures.
When simd/arm/jsimd.c is compiled with __ARM_NEON__ defined (which will
be the case if -mfpu=neon is passed to the compiler), the
parse_proc_cpuinfo() and check_feature() functions and the bufsize
variable are unused and thus need to be #ifdef'ed out in order to avoid
compiler warnings. Note that the bufsize variable was already #ifdef'ed
out on Linux but not on Android due to lack of parentheses (&& takes
precedence over ||.)
Closes#331
%{_libdir}/pkgconfig is a directory and should thus be prefixed by
%{dir} (oops.) This issue caused the debuginfo build under RHEL 8
(which is apparently now enabled by default-- regardless of whether the
RPM actually contains debug info, but that's another matter) to fail
with:
RPM build errors:
File listed twice: /opt/libjpeg-turbo/lib64/pkgconfig/libjpeg.pc
File listed twice: /opt/libjpeg-turbo/lib64/pkgconfig/libturbojpeg.pc
On RHEL 7 and later (not sure exactly whether this is a product of the
newer RPM release or something distro-specific), macros are lazily
expanded, so we need to set _docdir using %global (which expands at
definition time) and prior to _prefix and _datarootdir (which affect
_defaultdocdir.) Otherwise, _docdir is set to a subdirectory of
/opt/libjpeg-turbo/share/doc or /opt/libjpeg-turbo/doc. The former
(which happens on RHEL 7) leads to incorrect documentation packaging
(the docs should be packaged under /usr/share/doc per Red Hat
standards), and the latter (which happens on RHEL 8) leads to an RPM
build error.
AFAICT, Requires and BuildRequires subsumed the functionality of Prereq
and BuildPrereq in RPM 4.0, and none of the platforms we support with
libjpeg-turbo 2.0.x has RPM < 4.4.
We have more than eight registers to work with, as well as three-operand
intrinsics, so there's no need for the implementation to be such a
literal port of the MMX code.
Unfortunately, this hack is necessary because:
- install(TARGETS, ...) doesn't support the RENAME option.
- We can't modify OUTPUT_NAME for the "-static" targets without breaking
the regression tests.
- ${CMAKE_CFG_INTDIR} doesn't seem to work properly in an install()
command.
Refer to #307
libjpeg-turbo has never really supported such compilers, since (AFAIK)
they are non-existent on any modern computing platform and thus
impossible for us to test. (Also, the TurboJPEG API would break without
unsigned chars.)
Furthermore, the unified CMake-based build system introduced in 2.0
always defines HAVE_UNSIGNED_CHAR, so retaining other code paths is
pointless. Eliminating support for compilers without unsigned char
eliminates the need for the GETJSAMPLE() macro, which improves the
readability of many parts of the code as well as improving the
performance of writing Targa and Windows BMP files.
Fixes#317
Including the license templates was confusing to some, since it made
it appear as if the copyright year and author were unspecified for the
libjpeg-turbo source. Thus, rather than include the zlib License
template, link to that template on opensource.org. For the Modified BSD
License, include a roll-up of copyright years and authors, since the
terms of that license require the text of it to be included in product
documentation for binary distributions without accompanying source code.
Use the android.toolchain.cmake toolchain file in the NDK (v13b or
later), since this toolchain file generally takes care of setting the
approprate compiler flags and dealing with the differences between
GCC and Clang. Our custom Android build procedure did not work with
Clang-based NDK toolchains, which meant that it could not be made to
work with NDK v18b or later.
Fixes#309
Normally, 4:4:4 JPEGs have horizontal x vertical luminance & chrominance
sampling factors of 1x1. However, it is technically legal to create
4:4:4 JPEGs with sampling factors of 2x1, 1x2, 3x1, or 1x3, since the
sums of the products of those sampling factors are still <= 10. The
libjpeg API correctly decodes such images, so the TurboJPEG API should
as well.
Fixes#323
If cinfo->quantize_colors == 1, then jpeg_calc_output_dimensions() will
set cinfo->output_components to 1, and if cinfo->out_color_space is not
RGB (or extended RGB), hilarity will ensue.
Fixes#305
Caused by 950580eb0c. Since the code that
sets CMAKE_INSTALL_RPATH now depends on ENABLE_SHARED, that code needed
to be moved to after the point at which ENABLE_SHARED is defined.
I give up on the public keyserver. It inexplicably just fails
sometimes. I was trying to use it out of an abundance of caution
(<cough> paranoia <cough>), but it seems like most open source projects
just serve up their public keys from their project web sites. The
private and public pre-release keys are still stored on separate sites,
the private key is still strongly encrypted by Travis, and we use a
separate key for pre-releases anyhow, so even if it's compromised, we
can quickly and easily deploy a new one.
... when downloading the RPM signing key. Apparently the key server
URL sometimes redirects to an https URL, which may explain why fetching
the RPM signing keys failed frequently when we used to run wget inside
of the CentOS 5 Docker container.
The build will consistently fail for days at a time with:
error: http://pool.sks-keyservers.net/pks/lookup?op=get&search=0x0575F26BD5B3FDB1: import read failed(-1).
I have a hunch that this is related to the CentOS 5 Docker container, so
this commit causes Travis to download the RPM signing key outside of
the container and share it with the container.
We shouldn't be making JNI calls between GetPrimitiveArrayCritical() and
ReleasePrimitiveArrayCritical(). Apparently Android is stricter about
this than desktop Java.
Issue was introduced in 0713c1bb54.
Fixes#300
Apparently Windows 7 without SP1 has O/S support for XSAVE but not for
YMM registers, and this exposed a bug in our usage of xgetbv. The test
instruction will set ZF only if none of the bits match between the two
operarands, so in effect, we were enabling AVX2 instructions if the O/S
supported XSAVE and the CPU supported AVX2 but the O/S only supported
XMM registers. This bug was not exposed on, for instance, Windows XP or
RHEL 5 because those O/S's do not support XSAVE.
Fixes#288
- CMake 3.10.x or later must be used with JDK 11, or an error
("regex not supported") will occur when CMake tries to parse the Java
version number.
- The JDK is no longer available at java.com.
The DSPr2 extensions have been verified to work with little endian MIPS.
Whether or not CMAKE_SYSTEM_PROCESSOR is set to "mips" or "mipsel" in a
little endian MIPS environment seems to be inconsistent, but our build
system needs to handle both cases.
(for instance, when passing -msoft-float to the compiler)
The instructions used by jsimd_quantize_float_dspr2() and
jsimd_convsamp_float_dspr2() don't work with the soft float ABI, so
disable those functions when soft float is enabled.
Based on:
129a739bfaCloses#272
When cli arguments request image-changing operation (like transform, scans or arith coding) to be applied, force output result file, even if it has bigger filesize than original
@rpath is only supported with 10.5 and later deployment targets.
libjpeg-turbo hasn't supported 10.4 "Tiger" since prior to 1.4, but I
still sometimes use the 10.4 SDK to test PowerPC code in a Snow Leopard
VM.
- When referring to specific clauses, annexes, tables, and figures, a
"timed reference" (a reference that includes the year) must be used in
order to avoid confusion.
- "CCITT" = "ITU-T"
- Replace ambiguous "JPEG spec" with the specific document number.
... in which one or more of the color indices is out of range for the
number of palette entries.
Fix partly borrowed from jpeg-9c. This commit also adopts Guido's
JERR_PPM_OUTOFRANGE enum value in lieu of our project-specific
JERR_PPM_TOOLARGE enum value.
Fixes#258
Normally the value of CMAKE_EXECUTABLE_SUFFIX is clobbered by project().
This allows for specifying an executable suffix of .html with Emscripten
builds, which causes Emscripten to build standalone HTML versions of the
libjpeg-turbo test programs.
The sentence:
"Indeed, one of the original reasons for developing this free software
was to help force convergence on common, interoperable format standards
for JPEG files."
might be seen to imply that JPEG 2000 and JPEG XR are not interoperable
with themselves, although it is certainly the case that those formats
are not interoperable with each other, nor with
ITU T.81 | ISO/IEC 10918. They are also certainly not as common as
ITU T.81 | ISO/IEC 10918, and (as an example) popular web browsers will
not display JPEG 2000 files.
The sentence in question was originally referring to proprietary,
non-standard formats and was meant to provide historical context.
libjpeg was originally released prior to the adoption of JFIF as an
official standard, so it encouraged adoption of JFIF as a de facto
standard by providing, under a business-friendly free software license,
a library for reading and writing images in that format.
This is basically the same test that was performed in acinclude.m4 in
the old autotools-based build system. It was not ported to the
CMake-based build system because I previously had no way of testing
a non-DSPr2 build environment.
Fixes#248
... caused by using certain specific combinations of
jpeg_skip_scanlines() and jpeg_read_scanlines() calls with progressive,
vertically-subsampled JPEG images.
Fixes#237
In rdbmp.c, it is necessary to guard against 32-bit overflow/wraparound
when allocating the row buffer, because since BMP files have 32-bit
width and height fields, the value of biWidth can be up to 4294967295.
Specifically, if biWidth is 1073741824 and cinfo->input_components = 4,
then the samplesperrow argument in alloc_sarray() would wrap around to
0, and a division by zero error would occur at line 458 in jmemmgr.c.
If biWidth is set to a higher value, then samplesperrow would wrap
around to a small number, which would likely cause a buffer overflow
(this has not been tested or verified.)
... in tjLoadImage()/tjSaveImage(). These error codes require an add-on
message table, and if it isn't initialized, then format_message()
produces "Bogus message code XXXX" instead.
This change is proposed to enable cases where this library and it's CMakeLists.txt are included in other projects/CMakeLists.txt files as a dependency via the add_subdirectory method, for example: add_subdirectory(mozjpeg). There are several reasons and workflows which "wrap" third party projects/builds using this method, including many enterprise build/devops pipelines.
This change will have no effect on users building mozjpeg by itself as usual, it very simply enables the wrapping use cases.
With this parem do not write JFIF APP0 marker segment. Reduce size in 18 bytes. This is a mandatory marker, but no error in know programs if are lost. Safe for web use.
(detected by enabling additional checkstyle modules)
This commit also removes unnecessary uses of the "private" modifier in
the Java tests/examples. The default access modifier disallows access
outside of the package, and none of these classes is in a package. The
only reason we use "private" with member variables in these classes is
to make checkstyle happy, because we want it to enforce that behavior in
the TurboJPEG API code.
... and modify tjbench.c to match the variable name changes made to
TJBench.java
("checkstyle" = http://checkstyle.sourceforge.net, not our regex-based
checkstyle script)
Arguably it doesn't make much sense for non-chroma components to be
subsampled (which is why this type of image was overlooked in
cd7c3e6672cce3779450c6dd10d0d70b0c2278b2-- I didn't realize it was a
thing), but certain Adobe applications apparently generate these images.
Fixes#236
The old Un*x (autotools-based) build system always auto-generated this
file, but that behavior was more or less a relic of the days before the
libjpeg-turbo colorspace extensions were implemented. The thinking was
that, if a particular developer wanted to change RGB_RED, RGB_GREEN,
RGB_BLUE, or RGB_PIXELSIZE in order to compress from/decompress to
different RGB pixel layouts, then the SIMD extensions should
automatically respond to those changes whenever they were made to
jmorecfg.h. The modern reality is that changing RGB_* is no longer
necessary because of the libjpeg-turbo colorspace extensions, and
changing any of the other constants in jsimdcfg.inc can't be done
without making deeper modifications to the SIMD extensions. In general,
we treat RGB_* as a de facto, immutable part of the legacy libpjeg API.
Realistically, since the values of those constants have been the same in
every Un*x distribution released in the past 20-30 years, any software
that uses a system-supplied build of libjpeg must assume that those
constants will have default values.
Furthermore, even if it made sense to auto-generate jsimdcfg.inc, it was
never possible to do so on Windows, so it was always going to be
necessary to manually generate the Windows version of the file whenever
any of the constants changed. This commit introduces a new custom CMake
target called "jsimdcfg" that can be used, on Un*x platforms, to
generate jsimdcfg.inc on demand, although this should only be necessary
when introducing new x86 SIMD instructions or making other deep
modifications, such as SIMD acceleration for 12-bit JPEGs.
For those who may be wondering why we don't do the same thing for
win/jconfig.h.in, it's because performing all of the necessary CMake
checks to populate that file is very slow on Windows.
These were necessary for the first iteration of the feature (see #46),
which provided a different C front end for the SIMD version of the
function. The final version of the feature uses a common C front end
for both SIMD and non-SIMD implementations, so these checks are no
longer necessary.
Closes#231
This commit merges the following paragraph from the latest libjpeg
release:
https://github.com/libjpeg-turbo/ijg/blob/jpeg-9c/README#L222-L229
which takes into account the fact that JFIF is now an official ISO/ITU-T
standard. I also included the ISO/IEC document number for the JFIF spec
(jpeg-9c included only the ITU-T rec number.)
This commit also heavily wordsmiths the "FILE FORMAT WARS" section.
In jpeg-7 and later, this section has become somewhat impolitic,
referring to JPEG 2000 and JPEG XR as "faulty technologies" and
"momentary mistakes." The original intent of this section, which was
introduced in jpeg-5 and refined in jpeg-6
(https://github.com/libjpeg-turbo/ijg/blob/jpeg-5/README#L317-L338,
https://github.com/libjpeg-turbo/ijg/blob/jpeg-6b/README#L335-L367)
was to highlight the problem of JPEG file format divergence that existed
in the 1990s prior to the adoption of JFIF as an official ISO/ITU-T
standard. That problem is fortunately no longer a problem, thanks in
part to the existence of libjpeg. I have attempted to preserve Tom's
intent of using this section to describe which file formats the code is
compatible with and why it isn't compatible with some file formats
bearing the name "JPEG." Such modifications always put our project in a
very awkward position, because we are not the IJG and do not claim to
be, but it is still necessary for us to modify the IJG README file from
time to time to eliminate obsolete information while attempting to
remain as neutral as possible.
Instructing the compiler to treat warnings as errors caused some of the
compiler tests to fail, because the test code was not 100% clean.
Note that we now use check_symbol_exists() to check for memset() and
memcpy(), since the test code for check_function_exists() produces a
compiler warning due to not including <string.h>.
... instead of the RSA code, the license for which contains an
advertising clause. It is strongly believed that the RSA advertising
clause is innocuous, because:
- A clarification from RSA
(http://www.ietf.org/ietf-ftp/IPR/RSA-MD-all), published in 2000,
stated:
"Implementations of these message-digest algorithms, including
implementations derived from the reference C code in RFC-1319,
RFC-1320, and RFC-1321, may be made, used, and sold without license
from RSA for any purpose."
Referring to the opinion from Fedora's legal team
(https://fedoraproject.org/wiki/Licensing:FAQ?rd=Licensing/FAQ#What_about_the_RSA_license_on_their_MD5_implementation.3F_Isn.27t_that_GPL-incompatible.3F),
this means that md5.c and md5.h, which were derived from the original
RFC 1321 reference code (http://www.faqs.org/rfcs/rfc1321.html), can
be used without the RSA license.
- In the context of libjpeg-turbo, RSA's MD5 code was used only in the
build/test system. It was not part of the libjpeg-turbo binary
distribution, and thus the only "material mentioning or referencing"
the MD5 code was the libjpeg-turbo source code, which-- by virtue of
including RSA's original copyright headers-- properly attributed the
code as required under the RSA license.
However, in light of the open source community's tendency to have
knee-jerk reactions to stuff like this, it would've been necessary to
include the above explanation in our source tree in order to head off
potential FUD, and a simple fix is always better than a complex
explanation.
This commit also assigns the 3-clause BSD license to my modifications of
the MD5 code. This license is the same one used by md5cmp and other
parts of the build system.
When attempting to configure an iOS/ARM build with Xcode 7.2 and CMake
2.8.12, I got the following errors:
CMake Error at CMakeLists.txt:560 (add_library):
Attempting to use MACOSX_RPATH without CMAKE_SHARED_LIBRARY_RUNTIME_C_FLAG
being set. This could be because you are using a Mac OS X version less
than 10.5 or because CMake's platform configuration is corrupt.
(x 3)
CMake Error at sharedlib/CMakeLists.txt:38 (add_library):
Attempting to use MACOSX_RPATH without CMAKE_SHARED_LIBRARY_RUNTIME_C_FLAG
being set. This could be because you are using a Mac OS X version less
than 10.5 or because CMake's platform configuration is corrupt.
(x 3)
Upgrading to CMake 3.x (tried 3.0 and 3.1) got rid of the errors, but
the resulting shared libs still did not use @rpath as expected. Note
also that CMake 3.x (at least the two versions I tested) does not
automatically set the MACOSX_RPATH property as claimed. I could find
nothing in the release notes for later CMake releases to indicate that
either problem has been fixed. What I did find was this little nugget
of code in the Darwin platform module:
f6b93fbf3a/Modules/Platform/Darwin.cmake (L33-L36)
This sets CMAKE_SHARED_LIBRARY_RUNTIME_C_FLAG="-Wl,-rpath," only if you
are running OS X 10.5 or later. It makes no such check for iOS, perhaps
because shared libraries aren't much of a thing with iOS apps. In any
event, this commit simply sets CMAKE_SHARED_LIBRARY_RUNTIME_C_FLAG if it
isn't set already, and that fixes all of the aforementioned problems.
- Travis doesn't set the $encrypted_* variables for PRs, so disable GPG
signing when building a PR (artifacts aren't deployed for PRs anyhow,
and even if they were, I wouldn't want them to be signed, as they may
contain unvetted code.)
- Take advantage of the new -d option in buildljt, which allows for
building from an existing Git clone directory. This eliminates the need
to rename and restore .git/shallow, allows the official build scripts to
work properly when building PRs, and prevents 'git clone' being invoked
twice in CI builds.
Refer to #217
These files are potentially useful to MinGW users, since MSYS2 MinGW
environments have a man command by default and provide an easy way to
install pkg-config.
Closes#223
This commit adds C and SSE2 optimizations for the encode_mcu_AC_first()
function used in progressive Huffman encoding.
The image used for testing can be retrieved from this page:
https://blog.cloudflare.com/doubling-the-speed-of-jpegtran
All timings done on `Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz`
clang version is `Apple LLVM version 9.0.0 (clang-900.0.39.2)`
gcc-5 version is `gcc-5 (Homebrew GCC 5.5.0) 5.5.0`
gcc-7 version is `gcc-7 (Homebrew GCC 7.2.0) 7.2.0`
Here are the results in comparison to libjpeg-turbo@293263c using
`time ./jpegtran -outfile /dev/null -progressive -optimise -copy none print_poster_0025.jpg`
C
clang x86_64: +19%
gcc-5 x86_64: +80%
gcc-7 x86_64: +57%
clang i386: +5%
gcc-5 i386: +59%
gcc-7 i386: +51%
SSE2
clang x86_64: +79%
gcc-5 x86_64: +158%
gcc-7 x86_64: +122%
clang i386: +71%
gcc-5 i386: +134%
gcc-7 i386: +135%
Discussion in libjpeg-turbo/libjpeg-turbo#46
This commit adds C and SSE2 optimizations for the encode_mcu_AC_refine()
function used in progressive Huffman encoding.
The image used for testing can be retrieved from this page:
https://blog.cloudflare.com/doubling-the-speed-of-jpegtran
All timings done on `Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz`
clang version is `Apple LLVM version 9.0.0 (clang-900.0.39.2)`
gcc-5 version is `gcc-5 (Homebrew GCC 5.5.0) 5.5.0`
gcc-7 version is `gcc-7 (Homebrew GCC 7.2.0) 7.2.0`
Here are the results in comparison to libjpeg-turbo@3c54642 using
`time ./jpegtran -outfile /dev/null -progressive -optimise -copy none print_poster_0025.jpg`
C
clang x86_64: +7%
gcc-5 x86_64: +30%
gcc-7 x86_64: +33%
clang i386: +0%
gcc-5 i386: +24%
gcc-7 i386: +23%
SSE2
clang x86_64: +42%
gcc-5 x86_64: +53%
gcc-7 x86_64: +64%
clang i386: +35%
gcc-5 i386: +46%
gcc-7 i386: +49%
Discussion in libjpeg-turbo/libjpeg-turbo#46
Within the libjpeg API code, it seems to be more the convention than not
to separate the macro name and value by two or more spaces, which
improves general readability. Making this consistent across all of
libjpeg-turbo is less about my individual preferences and more about
making it easy to automatically detect variations from our chosen
formatting convention. I intend to release the script I'm using to
validate this stuff, once it matures and stabilizes a bit.
* Modify the SIMD dispatchers so they guard their usage of getenv() with
the existing NO_GETENV preprocessor definition.
* Introduce a new NO_PUTENV preprocessor definition to guard the
usage of putenv() in the TurboJPEG API library.
This at least puts Windows Store compatibility within the realm of
possibility, although further steps are required.
Broken by previous commit. Although turbojpeg.c no longer needs
tjutil.h on Un*x, it still needs to include that file on Windows in
order to use snprintf() and strcasecmp() (which, on Windows, are macros
that wrap _snprintf_s() and stricmp().)
With rare exceptions ...
- Always separate line continuation characters by one space from
preceding code.
- Always use two-space indentation. Never use tabs.
- Always use K&R-style conditional blocks.
- Always surround operators with spaces, except in raw assembly code.
- Always put a space after, but not before, a comma.
- Never put a space between type casts and variables/function calls.
- Never put a space between the function name and the argument list in
function declarations and prototypes.
- Always surround braces ('{' and '}') with spaces.
- Always surround statements (if, for, else, catch, while, do, switch)
with spaces.
- Always attach pointer symbols ('*' and '**') to the variable or
function name.
- Always precede pointer symbols ('*' and '**') by a space in type
casts.
- Use the MIN() macro from jpegint.h within the libjpeg and TurboJPEG
API libraries (using min() from tjutil.h is still necessary for
TJBench.)
- Where it makes sense (particularly in the TurboJPEG code), put a blank
line after variable declaration blocks.
- Always separate statements in one-liners by two spaces.
The purpose of this was to ease maintenance on my part and also to make
it easier for contributors to figure out how to format patch
submissions. This was admittedly confusing (even to me sometimes) when
we had 3 or 4 different style conventions in the same source tree. The
new convention is more consistent with the formatting of other OSS code
bases.
This commit corrects deviations from the chosen formatting style in the
libjpeg API code and reformats the TurboJPEG API code such that it
conforms to the same standard.
NOTES:
- Although it is no longer necessary for the function name in function
declarations to begin in Column 1 (this was historically necessary
because of the ansi2knr utility, which allowed libjpeg to be built
with non-ANSI compilers), we retain that formatting for the libjpeg
code because it improves readability when using libjpeg's function
attribute macros (GLOBAL(), etc.)
- This reformatting project was accomplished with the help of AStyle and
Uncrustify, although neither was completely up to the task, and thus
a great deal of manual tweaking was required. Note to developers of
code formatting utilities: the libjpeg-turbo code base is an
excellent test bed, because AFAICT, it breaks every single one of the
utilities that are currently available.
- The legacy (MMX, SSE, 3DNow!) assembly code for i386 has been
formatted to match the SSE2 code (refer to
ff5685d5344273df321eb63a005eaae19d2496e3.) I hadn't intended to
bother with this, but the Loongson MMI implementation demonstrated
that there is still academic value to the MMX implementation, as an
algorithmic model for other 64-bit vector implementations. Thus, it
is desirable to improve its readability in the same manner as that of
the SSE2 implementation.
+ "JSIMD_ARM_NEON" = "JSIMD_NEON"
+ "JSIMD_MIPS_DSPR2" = "JSIMD_DSPR2"
+ "*_mips_dspr2" = "*_dspr2"
It's obvious that "NEON" refers to Arm and "DSPr2" refers to MIPS, and
this naming convention is consistent with the other SIMD extensions.
Newer versions of CMake (known to be the case with 3.7.x and 3.10.x)
fail to add a space between CMAKE_C_FLAGS and CMAKE_ASM_FLAGS, which
causes the build to fail when using the official build procedure.
Closes#216
Referring to https://docs.microsoft.com/en-US/cpp/build/stack-usage:
"All memory beyond the current address of RSP is considered volatile:
The OS, or a debugger, may overwrite this memory during a user debug
session, or an interrupt handler. Thus, RSP must always be set before
attempting to read or write values to a stack frame."
Basically, if-- under extremely rare circumstances-- a context swap were
to occur between saving the values of xmm8-xmm11 and setting the new
value of rsp, the O/S might not preserve that area of the stack. In
general, libjpeg-turbo should not be using xmm8-xmm11 before or after
the call to jsimd_huff_encode_one_block_sse2(), so this is probably a
non-issue, but it's still a good idea to fix it.
Based on
ff7d2030dd
NDK r16b moved some things around, so modify the Android build recipes
to take that into account while preserving compatibility with previous
NDK releases.
NOTE: the GCC 4.9 NDK toolchain is deprecated, so we will need to
develop new Android build recipes for libjpeg-turbo 1.6 that use the
Clang toolchain.
Closes#196
If jpeg_skip_scanlines() is used to skip to the end of a single-scan
image, then we need to change the library state such that subsequent
calls to jpeg_consume_input() will return JPEG_REACHED_EOI rather than
JPEG_SUSPENDED. (NOTE: not necessary for multi-scan images, since the
scans are processed prior to any call to jpeg_skip_scanlines().)
Unless I miss my guess, using jpeg_skip_scanlines() in this manner
will prevent any markers at the end of the JPEG image from being
read, but I don't think there is any way around that without actually
reading the data, which would defeat the purpose of
jpeg_skip_scanlines().
Fixes#194
Refer to travis-ci/travis-ci#8552. This was supposed to be fixed on
November 15, then on November 28. Travis blew through both deadlines,
so I have no confidence that the issue will be fixed as promised in a
timely manner. Adding 'brew update' to .travis.yml slows the OS X
build, but there is no choice at the moment.
Loading RGB image files into a grayscale buffer isn't a particularly
useful feature, given that libjpeg-turbo can perform this conversion
much more optimally (with SIMD acceleration on some platforms) during
the compression process. Also, the RGB2GRAY() macro was not producing
deterministic cross-platform results because of variations in the
round-off behavior of various floating point implementations, so
`tjunittest -bmp` was failing in i386 builds.
Also, set the red/green/blue offsets for TJPF_GRAY to -1 rather than 0.
It was undefined behavior for an application to use those arrays/methods
with TJPF_GRAY anyhow, and this makes it easier for applications to
programmatically detect whether a given pixel format has red, green, and
blue components.
The main justification for this is to provide new libjpeg-turbo users
with a quick & easy way of developing a complete JPEG
compression/decompression program without requiring them to build
libjpeg-turbo from source (which was necessary in order to use the
project-private bmp API) or to use external libraries. These new
functions build upon significant enhancements to rdbmp.c, wrbmp.c,
rdppm.c, and wrppm.c which allow those engines to convert directly
between the native pixel format of the file and a pixel format
("colorspace" in libjpeg parlance) specified by the calling program.
rdbmp.c and wrbmp.c have also been modified such that the calling
program can choose to read or write image rows in the native (bottom-up)
order of the file format, thus eliminating the need to use an inversion
array. tjLoadImage() and tjSaveImage() leverage these new underlying
features in order to significantly improve upon the performance of the
old bmp API.
Because these new functions cannot work without the libjpeg-turbo
colorspace extensions, the libjpeg-compatible code in turbojpeg.c has
been removed. That code was only there to serve as an example of how
to use the TurboJPEG API on top of libjpeg, but more specific, buildable
examples now exist in the https://github.com/libjpeg-turbo/ijg
repository.
tjbenchtest and its Java derivatives are useful for rooting out hidden
problems with the more esoteric TJBench and TurboJPEG features. For
instance, on Windows, running tjbenchtest uncovered
5fce2e9421.
This commit also causes tjbenchtest and tjbenchtest.java to append -yuv
and -alloc to their log file names, depending on the arguments passed,
and it causes the build system to clean up those log files when the
'testclean' target is built.
+ clean up log files when 'make testclean' is invoked
+ fix 'tjbenchtest -yuv -alloc'
+ fix tjexampletest so that it creates images under /tmp
+ clean up tjexampletest
The program crashed when a JPEG image was passed on the command line,
because we were mixing our metaphors vis-a-vis malloc()/free() and
tjAlloc()/tjFree() (malloc()/free() uses the tjbench.exe heap,
whereas tjAlloc()/tjFree() uses the turbojpeg.dll heap.)
- Referring to 073b0e88a1 and #185, the
reason why BMP and RLE didn't (and won't) work with partial image
decompression is that the output engines for both formats maintain a
whole-image buffer, which is used to reverse the order of scanlines.
However, it was straightforward to add -crop support for GIF and
Targa, which is useful for testing partial image decompression along
with color quantization.
- Such testing reproduced a bug reported by Mozilla (refer to PR #182)
whereby jpeg_skip_scanlines() would segfault if color quantization was
enabled. To fix this issue, read_and_discard_scanlines() now sets up
a dummy quantize function in the same manner that it sets up a dummy
color conversion function.
Closes#182
... and document that only PPM/PGM output images are supported with the
-crop option for the moment.
I investigated the possibility of supporting -crop with -bmp, but even
after resetting the buffer dimensions, I still kept getting virtual
array access errors. It seems that doing this the "right way" would
require creating a re-initialization function for each image format's
destination manager. I'm disinclined to do that right now, given that
this feature was Google's baby (developed as a prerequisite for
including libjpeg-turbo in Android), and the -crop option in djpeg is
intended only as an example of how to use the partial image
decompression API. Real-world applications would need to handle this
in their own destination managers.
It would probably be possible to make this work with Targa by employing
a similar hack to the one we used with PPM, but Targa isn't popular
enough to bother.
Fixes#185
Setting _libdir to CMAKE_INSTALL_FULL_LIBDIR only works when doing an
in-tree RPM build. SRPMs are architecture-agnostic, so the spec needs
to compute_libdir at the time the SRPM is rebuilt, not at the time it
is generated.
This is a regression introduced when implementing the new CMake-based
cross-platform build system.
The version of RPM on RHEL 5 and older platforms defines _libdir
as %{_exec_prefix}/%{_lib}, so defining _lib in the spec file redefined
_libdir. However, newer versions of RPM (probably >= 4.6, since that
was the version that introduced the ISA macros) define _libdir as either
%{_prefix}/lib or %{_prefix}/lib64. Thus, we need to explicitly
override _libdir in our spec file.
It is necessary for the C code to be aware of the machine's endianness,
which is why the TurboJPEG Java wrapper sets a different pixel format
for integer BufferedImages depending on ByteOrder.nativeOrder().
However, it isn't necessary to handle endianness in pure Java code such
as TJUnitTest (d'oh!) This was a product of porting the C version of
TJUnitTest too literally, and of insufficient testing (historically,
the big endian systems I had available for testing didn't have Java.)
planes == null is a valid argument to setBuf() if alloc == true, so we
need to make sure that planes is non-null before validating its length.
We also need to allocate one dimension of the planes array if it's null.
Fixes#168
Tag 1.5.2 release
* tag '1.5.2': (54 commits)
x86: Fix "short jump is out of range" w/ NASM<2.04
TurboJPEG: Document xform issue w/ big marker data
Java TJBench: Fix parsing of -warmup argument
Build: Disable warmup in TJBench regression tests
TJBench: Improve consistency of results
TurboJPEG: C API documentation buglet
TJBench: Code formatting tweaks
TJBench: Fix errors when decomp. files w/ ICC data
BUILDING.md: Include Android/x86 build recipes
Travis: Fix OS X build
Restore compatibility with older autoconf releases
Attribute ARM runtime detection code to Nokia
Honor max_memory_to_use/JPEGMEM/-maxmemory
AppVeyor: Fix CI build
TurboJPEG: Fix potential memory leaks
Always tweak EXIF w/h tags w/ lossless transforms
Fix error w/ lossless crop & libjpeg v7 emulation
Include jpeg_skip/crop_scanlines() in jpeg7.dll
libjpeg.txt: Include partial decomp. in TOC
Slightly de-confusify cjpeg, jpegtran usage info
...
Previously, -stoponwarning only had an effect on the underlying
TurboJPEG C functions, but TJBench still aborted if a non-fatal error
occurred. This commit modifies the C version of TJBench such that it
always recovers from a non-fatal error unless -stoponwarning is
specified. Furthermore, the benchmark stores the details of the last
non-fatal error and does not print any subsequent non-fatal error
messages unless they differ from the last one.
Due to limitations in the Java API (specifically, the fact that it
cannot communicate errors, fatal or otherwise, to the calling program
without throwing a TJException), it was only possible to make
decompression operations fully recoverable within TJBench. With other
operations, -stoponwarning still has an effect on the underlying C
library but has no effect at the Java level.
The Java API documentation has been amended to reflect that only certain
methods are truly recoverable, regardless of the state of
TJ.FLAG_STOPONWARNING.
Allow progressive entropy coding to be enabled on a
transform-by-transform basis, and implement a new transform option for
disabling the copying of markers.
Closes#153
If the source image for a transform operation has a lot of EXIF or ICC
data embedded in it, then it may cause the output image size to exceed
the worst-case size returned by tjBufSize() (because tjTransform()
transfers all markers to the output image.) This is only a problem if
TJFLAG_NOREALLOC is passed to the function. Since the TurboJPEG C API
doesn't require the destination image size to be set in this case, it
makes the documented assumption that the calling program has allocated
the destination buffer to exactly the size returned by tjBufSize().
Changing this assumption would change the API behavior and necessitate
a new function name (tjTransform2().) At the moment, it's easier to
just document this as a known issue, since it's easy to work around in
the C API.
The Java API is unfortunately a different story, since it must always
use TJFLAG_NOREALLOC (because, when using the TurboJPEG Java API, all
buffers are allocated on the Java heap, and thus they can't be
reallocated by the C code.) There is no easy way to work around this
without changing the C API as discussed above, because if the source
image contains large amounts of marker data, it's virtually impossible
to determine how big the output image will be.
Given that libjpeg-turbo can often process hundreds of megapixels/second
on modern hardware, the default of one warmup iteration was essentially
meaningless. Furthermore, the -warmup option was a bit clunky, since
it required some foreknowledge of how fast the benchmarks were going to
execute.
This commit introduces a 1-second warmup interval for each benchmark by
default, and the -warmup option has been retasked to control the length
of that interval.
- Provide a new C API function and TJException method that allows
calling programs to query the severity of a compression/decompression/
transform error.
- Provide a new flag that instructs the library to immediately stop
compressing/decompressing/transforming if a warning is encountered.
Fixes#151
Introduce a new C API function (tjGetErrorStr2()) that can be used to
retrieve compression/decompression/transform error messages in a
thread-safe (i.e. instance-specific) manner. Retrieving error messages
from global functions is still thread-unsafe.
Addresses a concern expressed in #151.
Embedded ICC profiles can cause the size of a JPEG file to exceed the
size returned by tjBufSize() (which is really meant to be used for
compression anyhow, not for decompression), and this was causing a
segfault (C) or an ArrayIndexOutOfBoundsException (Java) when
decompressing such files with TJBench. This commit modifies the
benchmark such that, when tiled decompression is disabled, it re-uses
the source buffer as the primary JPEG buffer.
The Travis xcode7.3 image now apparently includes GnuPG 1.4.x by
default, so use it instead of installing GnuPG 2. Using GnuPG 2.1.x,
the default version in Homebrew as of this writing, is problematic for
this reason:
https://wiki.archlinux.org/index.php/GnuPG#Unattended_passphrase
This re-introduces a feature of the obsolete system-specific libjpeg
memory managers-- namely the ability to limit the amount of main memory
used by the library during decompression or multi-pass compression.
This is mainly beneficial for two reasons:
- Works around a 2 GB limit in libFuzzer
- Allows security-sensitive applications to set a memory limit for the
JPEG decoder so as to work around the progressive JPEG exploit
(LJT-01-004) described here:
http://www.libjpeg-turbo.org/pmwiki/uploads/About/TwoIssueswiththeJPEGStandard.pdf
This commit also removes obsolete documentation regarding the MS-DOS
memory manager (which itself was removed long ago) and changes the
documentation of the -maxmemory switch and JPEGMEM environment variable
to reflect the fact that backing stores are never used in libjpeg-turbo.
Inspired by:
066fee2e7dCloses#143
Something changed in the CI build environment, and our previous trick of
setting the Git URL to file://c:/projects/libjpeg-turbo no longer works.
Using cygpath to translate the Windows path to a MinGW-friendly format
is a better solution anyhow.
Referring to https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=746,
it seems that the values of local buffer pointers in TurboJPEG API
functions aren't always preserved if longjmp() returns control to a
point prior to the allocation of the local buffers. This is known to
be an issue with GCC 4.x and clang with -O1 and higher optimization
levels but not with GCC 5.x and later. It is unknown why GCC 5.x and
6.x do not suffer from the issue, but possibly the local buffer pointers
are not allocated on the stack when using those more recent compilers.
In any case, this commit modifies the TurboJPEG API library code such
that the jump buffer is always updated after any local buffer pointers
are allocated but before any subsequent libjpeg API functions are
called.
Tag 1.5.1 release
* tag '1.5.1':
ARM64 NEON: Fix another ABI conformance issue
Build: Remove ARMv6 support from 'make iosdmg'
Fix out-of-bounds write in partial decomp. feature
Silence additional UBSan warnings
Fix unsigned int overflow in libjpeg memory mgr.
TurboJPEG: Decomp. 4:2:2/4:4:0 JPEGs w/unusual SFs
Silence pedantic GCC6 code formatting warnings
Use plain upsampling if merged isn't accelerated
Implement h1v2 fancy upsampling
Fix AArch64 ABI conformance issue in SIMD code
Don't install libturbojpeg.pc if TJPEG disabled
Linux/PPC: Only enable AltiVec if CPU supports it
ARM/MIPS: Change the behavior of JSIMD_FORCE*
Bump version to 1.5.1 to prepare for new commits
This commit does the following:
-- Merges the two glueware functions (read_icc_profile() and
write_icc_profile()) from iccjpeg.c, which is contained in downstream
projects such as LCMS, Ghostscript, Mozilla, etc. These functions were
originally intended for inclusion in libjpeg, but Tom Lane left the IJG
before that could be accomplished. Since then, programs and libraries
that needed to embed/extract ICC profiles in JPEG files had to include
their own local copy of iccjpeg.c, which is suboptimal.
-- The new functions were prefixed with jpeg_ and split into separate
files for the compressor and decompressor, per the existing libjpeg
coding standards.
-- jpeg_write_icc_profile() was made slightly more fault-tolerant.
It will now trigger a libjpeg error if it is called before
jpeg_start_compress() or if it is passed NULL arguments.
-- jpeg_read_icc_profile() was made slightly more fault-tolerant.
It will now trigger a libjpeg error if it is called before
jpeg_read_header() or if it is passed NULL arguments. It will also
now trigger libjpeg warnings if the ICC profile data is corrupt.
-- The code comments have been wordsmithed.
-- Note that the one-line setup_read_icc_profile() function was not
included. Instead, libjpeg.txt now documents the need to call
jpeg_save_markers(cinfo, JPEG_APP0 + 2, 0xFFFF) prior to calling
jpeg_read_header(), if jpeg_read_icc_profile() is to be used.
-- Adds documentation for the new functions to libjpeg.txt.
-- Adds an -icc switch to cjpeg and jpegtran that allows those programs
to embed an ICC profile in the JPEG files they generate.
-- Adds an -icc switch to djpeg that allows that program to extract an
ICC profile from a JPEG file while decompressing.
-- Adds appropriate unit tests for all of the above.
-- Bumps the SO_AGE of the libjpeg API library to indicate the presence
of new API functions.
Note that the licensing information was obtained from:
https://github.com/mm2/Little-CMS/issues/37#issuecomment-66450180
... even if using libjpeg v6b emulation. Previously
adjust_exif_parameters() was only called with libjpeg v7/v8 emulation,
but due to a bug (which this commit also fixes), it only worked properly
with libjpeg v8 emulation.
The JPEG_LIB_VERSION #ifdef in jtransform_adjust_parameters() was
incorrect, which caused a "Bogus virtual array access" error when
attempting to use the lossless crop feature.
Introduced in c04bd3cc97.
This also adds libjpeg v7 API/ABI emulation to the Travis CI tests.
LICENSE.md is included in the binary distributions as well, so it
doesn't make much sense to refer to license headers in source files that
aren't necessarily going to be there.
The whole point of `make tarball` is to make it easy for users to create
a binary distribution of libjpeg-turbo on platforms that aren't
supported by our official build system, so requiring root permissions
somewhat defeated that purpose. Intead, the script now attempts to
detect whether the system has GNU tar or a recent version of BSD tar
that supports setting the ownership of the files in the tarball.
Although there is little chance that we will ever have a package
conflict on OS X, the convention from our Linux packages is to use the
package name, not the project name, for the name of the documentation
directory.
These improvements enable build systems to use GNUInstallDirs to define
custom directory variables.
- The set_dir() macro was renamed to GNUInstallDirs_set_install_dir(),
in keeping with the module's established macro naming convention.
- Rather than detecting whether the prefix has changed, the new
GNUInstallDirs_set_install_dir() macro instead examines whether the
default for the variable in question has changed. This allows for
more flexibility, since build systems may decide to change the
defaults based on factors other than the prefix. It also enables the
macro to work properly outside of the module.
- The module now performs directory variable substitution within the
body of GNUInstallDirs_get_absolute_install_dir().
- The JAVADIR variable is no longer included in GNUInstallDirs. That
directory is not part of the GNU spec, and it turns out that various
operating systems use different conventions for the location of Java
classes. Instead, the variable is now implemented in our build
system as a demonstration of the aforementioned GNUInstallDirs
enhancements.
- GNUInstallDirs: any directory variable can now reference any other
directory variable by including its name in angle brackets (<>).
- Changed the documentation of the directory variables in BUILDING.md
accordingly. This commit also includes some formatting tweaks to
that section (using boldface for directory names, as is our
convention.)
- Changed the package scripts such that they use
CMAKE_INSTALL_DATAROOTDIR rather than CMAKE_INSTALL_DATADIR.
- We no longer override the install dir. defaults on Windows unless
performing an official build. It may be useful, for instance, to
use the GNU defaults when installing into an MSYS environment.
It isn't actually necessary to specify `CMAKE_INSTALL_DEFAULT_MANDIR`
for our official build. Because `CMAKE_INSTALL_DEFAULT_DATAROOTDIR` is
blank for the official build, the default of "<DATAROOTDIR>/man" will
resolve to "man".
For the same reason, this commit changes the specification of
`CMAKE_INSTALL_DEFAULT_DOCDIR` and `CMAKE_INSTALL_DEFAULT_JAVADIR` in
the official build to be dependent on the data root directory (mainly to
make it obvious what we're doing.)
This commit also tweaks the example CMake command line in the directory
variable documentation so that it shows the correct location of the
CMake argument.
YASM requires a debug format to be specified with -g. Currently the
only combination that I can make work at all is DWARF-2/ELF (YASM
doesn't support Mach-O debugging at all, and its support for CV8/MSVC
and MinGW/DWARF-2 appears to be broken), so debugging is only enabled
automatically for ELF at the moment. For other formats, we don't
specify -g at all, which is how the old build system behaved.
Fixes#125, Closes#126
This builds upon the existing GNUInstallDirs module in CMake but adds
the following features to that module:
- The ability to override the defaults for each install directory
through a new set of variables (`CMAKE_INSTALL_DEFAULT_*DIR`).
Before operating system vendors began shipping libjpeg-turbo, it was
meant to be a run-time drop-in replacement for the system's
distribution of libjpeg, so it has traditionally installed itself
under /opt/libjpeg-turbo on Un*x systems by default. On Windows, it
has traditionally installed itself under %SystemDrive%\libjpeg-turbo*,
which is not uncommon behavior for open source libraries (open source
SDKs tend to install outside of the Program Files directory so as to
avoid spaces in the directory name.) At least in the case of Un*x,
the install directory behavior is based somewhat on the Solaris
standard, which requires all non-O/S packages to install their files
under /opt/{package_name}. I adopted that standard for VirtualGL and
TurboVNC while working at Sun, because it allowed those packages to be
located under the same directory on all platforms. I adopted it for
libjpeg-turbo because it ensured that our files would never conflict
with the system's version of libjpeg. Even though many Un*x
distributions ship libjpeg-turbo these days, not all of them ship the
TurboJPEG API library or the Java classes or even the latest version
of the libjpeg API library, so there are still many cases in which it
is desirable to install a separate version of libjpeg-turbo than the
one installed by the system. Furthermore, installing the files under
/opt mimics the directory structure of our official binary packages,
and it makes it very easy to uninstall libjpeg-turbo.
For these reasons, our build system needs to be able to use
non-GNU-compliant defaults for each install directory if
`CMAKE_INSTALL_PREFIX` is set to the default value.
- For each directory variable, the module now detects changes to
`CMAKE_INSTALL_PREFIX` and changes the directory variable accordingly,
if the variable has not been changed by the user.
This makes it easy to switch between our "official" directory
structure and the GNU-compliant directory structure "on the fly"
simply by changing `CMAKE_INSTALL_PREFIX`. Also, this new mechanism
eliminated the need for the crufty mechanism that previously did the
same thing just for the library directory variable.
How it should work:
- If a dir variable is unset, then the module will set an internal
property indicating that the dir variable was initialized to its
default value.
- If the dir variable ever diverges from its default value, then the
internal property is cleared, and it cannot be set again without
unsetting the dir variable.
- If the install prefix changes, and if the internal property
indicates that the dir variable is still set to its default value,
and if the dir variable's value is not being manually changed at the
same time that the install prefix is being changed, then the dir
variable's value is automatically changed to the new default value
for that variable (as determined by the new install prefix.)
- The directory variables are now always cached, regardless of whether
they were set on the command line or not. This ensures that they can
easily be examined and modified after being set, regardless of how they
were set.
This was made possible by the introduction of the aforementioned
`CMAKE_INSTALL_DEFAULT_*DIR` variables.
- Improved directory variable documentation (based on descriptions at
https://www.gnu.org/prep/standards/html_node/Directory-Variables.html)
- The module now allows "<DATAROOTDIR>" to be used as a placeholder in
relative directory variables.
It is replaced "on the fly" with the actual path of
`CMAKE_INSTALL_DATAROOTDIR`.
This should more closely mimic the behavior of the old autotools build
system while retaining our customizations to it, and it should retain
the behavior of the old CMake build system.
Closes#124
Strict C89-conformant compilers don't support the "inline" keyword, but
most of them support "__inline__", and that keyword can be used with the
always_inline atribute as well. This commit also removes duplicate code
by using a foreach() loop to test the various keywords.
This is a subtle point, but AC_C_INLINE defines "inline" to be either
"inline", "__inline__", or "__inline". The subsequent test for
"inline __attribute__((always_inline))" uses this definition. The
attribute is irrespective of the inline keyword, so whereas
"__inline__ __attribute__((always_inline))" works under C89,
"inline __attribute__((always_inline))" doesn't, and defining INLINE to
the latter causes the build to fail. The easiest way around this is
simply to define "inline" ahead of "INLINE" in jconfigint.h,
which causes the inline keyword detected by AC_C_INLINE to modify the
INLINE macro if necessary.
+ advertise that full AltiVec SIMD acceleration is now available on
OpenBSD.
The relevant compilers probably all support C99 or GNU's variation of
C90 that allows variables to be declared anywhere, but our policy is to
conform to the C90 standard, if for no other reason than that it
improves code readability.
Fixes:
rdppm.c:257:14: warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare]
if (temp > maxval)
~~~~ ^ ~~~~~~
rdppm.c:284:14: warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare]
if (temp > maxval)
~~~~ ^ ~~~~~~
rdppm.c:289:14: warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare]
if (temp > maxval)
~~~~ ^ ~~~~~~
rdppm.c:294:14: warning: comparison of integers of different signs: 'int' and 'unsigned int' [-Wsign-compare]
if (temp > maxval)
- Replace CMAKE_SOURCE_DIR with CMAKE_CURRENT_SOURCE_DIR
- Replace CMAKE_BINARY_DIR with CMAKE_CURRENT_BINARY_DIR
- Don't use "libjpeg-turbo" in any of the package system filenames
(because CMAKE_PROJECT_NAME will not be the same if building LJT as
a submodule.)
Closes#122
The xcode7.2 image is verfallen, verlumpt, verblunget, verkackt
This also ensures that the build scripts are checked out from a
branch matching the libjpeg-turbo repository branch (not strictly
necessary when building from master, but it keeps the code in sync with
dev.)
The previous hack (adding ${CMAKE_ASM_COMPILER} to CMAKE_ASM_FLAGS)
didn't work in all cases, because more recent versions of CMake place
the includes ahead of the flags (which meant that the real assembler
wasn't the first argument to gas-preprocessor.pl.)
CMAKE_INSTALL_RPATH has to be set before the targets are defined (oops.)
This also explicitly turns on MACOSX_RPATH for the shared libraries
(which is the default with newer versions of CMake but not with 2.8.x.)
The old autotools/libtool build system hard-coded the install name
directory of the OS X shared libraries to libdir, which meant that any
executable that linked against those libraries would also be hard-coded
to look for the libjpeg-turbo libraries in that directory. @rpath makes
the OS X version of libjpeg-turbo behave like the Linux version, in the
sense that the executables under /opt/libjpeg-turbo/bin will
automatically pick up the libraries under /opt/libjpeg-turbo/lib* by
default, but other executables won't unless they are linked with -rpath.
AppVeyor already has MinGW32 and MinGW64 flavors of GCC 5.3.0
installed under MSYS2, so there is no need to install our own builds of
MinGW. MinGW-builds is no longer an active project, and we were getting
occasional timeouts while wgetting those files from SourceForge.
Furthermore, GCC 5.3.0 should produce faster code than GCC 4.8.1.
Updated out-of-date information, wordsmithed and clarified many
sections, and generally cleaned up the build recipes (including a
complete overhaul of the iOS recipes.)
Regression caused by f9134384b7
This commit also makes the "testclean" target clean up the 4:1:1 test
images. This was implemented in the autotools build system in
1f3635c496 but was left out of the CMake
build system due to an oversight.
This has the following advantages:
-- It doesn't require checking a private SSH key into the repository.
(With SourceForge, an SSH key is the "keys to the kingdom".)
-- If the S3 key is compromised, it is very easy to revoke it and
generate a new one.
-- The S3 bucket is isolated, so even if it becomes compromised, then
the damage that one could do is limited.
-- It's much easier to manage files through S3's web interface than
through SourceForge.
-- The files are served via HTTPS.
-- Travis fully supports S3 as a deployment target, so this simplifies
.travis.yml somewhat.
Since we're still deploying our Linux/macOS CI artifacts to a web server
(specifically SourceForge Project Web Services) that doesn't support
HTTPS, it's a good idea to sign them. But since the private key has to
be checked into the repository, we use a different key for signing the
pre-releases (per project policy, the private signing keys for our
release binaries are never made available on any public server.)
Previously, simd/CMakeLists.txt was hard-coded to use NASM, and it was
necessary to override the NASM variable in order to use YASM. This
commit changes the behavior such that NASM is still preferred, but YASM
will be used if it is in the PATH and NASM isn't available. This brings
the actual behavior in line with the behavior described in BUILDING.md.
Based on
b0799a1598Closes#107
-- Use trusty for SIMD builds. Ubuntu 12.04 is still using NASM 2.09.x,
which isn't new enough to support AVX2.
-- Add a special test for the SSE2 code path, since it is no longer the
default.
Pass the actual repository and branch that Travis is using into the
builtljt script, so the official builds it generates will come from
the same code base as the other tested builds.
- Introduce a new FLOATTEST value ("387") on Un*x systems that will
compare the floating point DCT/IDCT algorithms against the expected
results from the C algorithms when built using 32-bit code and
-mfpmath=387.
- Extend the Windows regression tests so that they work properly when
building libjpeg-turbo with 32-bit code and without SIMD, using either
Visual C++ (tested with 2008, 2010, 2015) or MinGW.
Based on
98a5a9dc89
with wordsmithing by DRC.
In the AArch64 ABI, as in many others, it's forbidden to read/store data
below the stack pointer. Some SIMD functions were doing just that
(stack pointer misuse) when trying to preserve callee-saved registers,
and this resulted in those registers being restored with incorrect
contents under certain circumstances.
This patch fixes that behavior, and callee-saved registers are now
stored above the stack pointer throughout the function call. The patch
also removes register saving in places where it is unnecessary for this
ABI, or it makes use of unused scratch regiters instead of callee-saved
registers.
Fixes#97. Closes#101.
Refer also to https://bugzilla.redhat.com/show_bug.cgi?id=1368569
The last iDevice to require ARMv6 was the iPhone 3G, which required iOS
4.2.1 or older. Our binaries have always required iOS 4.3 or newer,
so I'm not sure if the ARMv6 fork of our binaries was ever useful to
begin with. In any case, if it ever was useful, it no longer is. Fat
binaries can still be generated with ARMv6 support by invoking
{build_directory}/pkgscripts/makemacpkg manually.
Reported by Clang UBSan (refer to
https://bugzilla.mozilla.org/show_bug.cgi?id=1301252 for test image.)
This appears to be a legitimate bug introduced by
3ab68cf563. Any component array, such
as first_MCU_col and last_MCU_col, should always be able to accommodate
MAX_COMPONENTS values. The aforementioned test image had 8 components,
which was not enough to make the out-of-bounds write bust out of the
jpeg_decomp_master struct (and fortunately the memory after last_MCU_col
is an integer used as a boolean, so stomping on it will do nothing other
than change the decoder state.) I crafted another special image that
has 10 components (the maximum allowable), but that was apparently not
enough to bust out of the allocated memory, either. Thus, it is
posited that the security threat posed by this bug is either extremely
minimal or non-existent.
When attempting to decode a malformed JPEG image (refer to
https://bugzilla.mozilla.org/show_bug.cgi?id=1295044) with dimensions
61472 x 32800, the maximum_space variable within the
realize_virt_arrays() function will exceed the maximum value of a 32-bit
integer and will wrap around. The memory manager subsequently fails
with an "Insufficient memory" error (case 4, in alloc_large()), so this
commit simply causes that error to be triggered earlier, before UBSan
has a chance to complain.
Note that this issue did not ever represent an exploitable security
threat, because the POSIX-based memory manager that we use doesn't ever
do anything meaningful with the value of maximum_space.
jpeg_mem_available() simply sets avail_mem = maximum_space, so the
subsequent behavior of the memory manager is the same regardless of
whether maximum_space is correct or not. This commit simply removes a
UBSan warning in order to make it easier to detect actual security
issues.
Normally, 4:2:2 JPEGs have horizontal x vertical luminance,chrominance
sampling factors of 2x1,1x1, and 4:4:0 JPEGs have horizontal x vertical
luminance,chrominance sampling factors of 1x2,1x1. However, it is
technically legal to create 4:2:2 JPEGs with sampling factors of
2x2,1x2 and 4:4:0 JPEGs with sampling factors of 2x2,2x1, since the
sums of the products of those sampling factors (2x2 + 1x2 + 1x2 and
2x2 + 2x1 + 2x1) are still <= 10. The libjpeg API correctly decodes
such images, so the TurboJPEG API should as well.
Fixes#92
Currently, this only affects ARM, since it is the only platform that
accelerates YCbCr-to-RGB conversion but not merged upsampling. Even if
"plain" upsampling isn't accelerated, the combination of accelerated
color conversion + unaccelerated plain upsampling is still faster than
the unaccelerated merged upsampling algorithms.
Closes#81
This allows fancy upsampling to be used when decompressing 4:2:2 images
that have been losslessly rotated or transposed.
(docs and comments added by DRC)
Based on f63aca945dCloses#89
cpuid tells us whether the O/S uses extended state management via
XSAVE/XRSTOR, but we have to call xgetbv to verify that it is using
XSAVE/XRSTOR to manage the state of XMM/YMM registers.
In the AArch64 ABI, the high (unused) DWORD of a 32-bit argument's
register is undefined, so it was incorrect to use 64-bit
instructions to transfer a JDIMENSION argument in the 64-bit NEON SIMD
functions. The code worked thus far only because the existing compiler
optimizers weren't smart enough to do anything else with the register in
question, so the upper 32 bits happened to be all zeroes.
The latest builds of Clang/LLVM have a smarter optimizer, and under
certain circumstances, it will attempt to load-combine adjacent 32-bit
integers from one of the libjpeg structures into a single 64-bit integer
and pass that 64-bit integer as a 32-bit argument to one of the SIMD
functions (which is allowed by the ABI, since the upper 32 bits of the
32-bit argument's register are undefined.) This caused the
libjpeg-turbo regression tests to crash.
This patch tries to use the Wn registers whenever possible. Otherwise,
it uses a zero-extend instruction to avoid using the upper 32 bits of
the 64-bit registers, which are not guaranteed to be valid for 32-bit
arguments.
Based on 1fbae13021Closes#91. Refer also to android-ndk/ndk#110 and
https://llvm.org/bugs/show_bug.cgi?id=28393
This fixes crashes that would occur when attempting to use
libjpeg-turbo's AVX2 extensions on older O/S's (such as Windows XP or
RHEL 5.) Even if the CPU supports AVX2, the O/S has to also support
saving/restoring YMM registers when switching contexts.
This eliminates "illegal instruction" errors when running libjpeg-turbo
under Linux on PowerPC chips that lack AltiVec support (e.g. the old
7XX/G3 models but also the newer e5500 series.)
The JSIMD_FORCE* environment variables previously meant "force the use
of this instruction set if it is available but others are available as
well", but that did nothing on ARM platforms, since there is only ever
one instruction set available. Since the ARM and MIPS CPU feature
detection code is less than bulletproof, and since there is only one
SIMD instruction set (currently) supported on those platforms, it makes
sense for the JSIMD_FORCE* environment variables on those platforms to
actually force the use of the SIMD instruction set, thus bypassing the
CPU feature detection code.
This addresses a concern raised in #88 whereby parsing /proc/cpuinfo
didn't work within a QEMU environment. This at least provides a
workaround, allowing users to force-enable or force-disable SIMD
instructions for ARM and MIPS builds of libjpeg-turbo.
This commit adds back instructive comments in the image-space
algorithms, similar to those in the SSE2 code. These comments make it
easier to follow the flow of data through the algorithms.
Expand collect_args/uncollect_args macros so that the number of
arguments can be specified. This prevents unnecessary push and mov
instructions.
NOTE: On Windows, the push/pop of xmm6 and xmm7 had to be moved to the
other end of the macro to ensure that rsp is aligned on a 16-byte
boundary.
28d1a1300c introduced the line
"nasm.exe should be in your PATH". This commit corrects an oversight in
8f1c0a681c /
e5091f2cf3 whereby this line should have
been extended to include yasm.exe.
The IJG convention is to format copyright notices as:
Copyright (C) YYYY, Owner.
We try to maintain this convention for any code that is part of the
libjpeg API library (with the exception of preserving the copyright
notices from Cendio's code verbatim, since those predate
libjpeg-turbo.)
Note that the phrase "All Rights Reserved" is no longer necessary, since
all Buenos Aires Convention signatories signed onto the Berne Convention
in 2000. However, our convention is to retain this phrase for any files
that have a self-contained copyright header but to leave it off of any
files that refer to another file for conditions of distribution and use.
For instance, all of the non-SIMD files in the libjpeg API library refer
to README.ijg, and the copyright message in that file contains "All
Rights Reserved", so it is unnecessary to add it to the individual
files.
The TurboJPEG code retains my preferred formatting convention for
copyright notices, which is based on that of VirtualGL (where the
TurboJPEG API originated.)
Calling jpeg_stdio_dest() followed by jpeg_mem_dest(), or jpeg_mem_src()
followed by jpeg_stdio_src(), is dangerous, because the existing opaque
structure would not be big enough to accommodate the new source/dest
manager. This issue was non-obvious to libjpeg-turbo consumers, since
it was only documented in code comments. Furthermore, the issue could
also occur if the source/dest manager was allocated by the calling
program, but it was not allocated with enough space to accommodate the
opaque stdio or memory source/dest manager structs. The safest thing to
do is to throw an error if one of these functions is called when there
is already a source/dest manager assigned to the object and it was
allocated elsewhere.
Closes#78, #79
The jpeg-7/jpeg-8 APIs/ABIs require arithmetic coding, and the jpeg-8
API/ABI requires the memory source/destination manager, so this commit
causes the build system to ignore --with-arith-enc/--without-arith-enc
and --with-arith-dec/--without-arith-dec (and the equivalent CMake
variables-- WITH_ARITH_ENC and WITH_ARITH_DEC) when v7/v8 API/ABI
emulation is enabled. Furthermore, the CMake build system now ignores
WITH_MEM_SRCDST whenever WITH_JPEG8 is specified (the autotools build
system already did that.)
GCC does support UAL syntax (strbeq) if the ".syntax unified" directive
is supplied. This directive is supported by all versions of GCC and
clang going back to 2003, so it should not create any backward
compatibility issues.
Based on 1264349e2fCloses#76
If wmic.exe wasn't available, then CMakeLists.txt would call
"cmd /C date /T" and parse the result in order to set the BUILD
variable. However, the parser assumed that the date was in MM/DD/YYYY
format, which is not generally the case unless the user's locale is U.S.
English with the default region/language settings for that locale.
This commit modifies CMakeLists.txt such that it uses the
string(TIMESTAMP) function available in CMake 2.8.11 and later to set
the BUILD variable, thus eliminating the need to use wmic.exe or any
other platform-specific hack.
This commit also modifies the build instructions to remove any reference
to CMake 2.6 (which hasn't been supported by our build system since
libjpeg-turbo 1.3.x.)
Closes#74
* libjpeg-turbo/master: (140 commits)
Increase severity of tjDecompressToYUV2() bug desc
Catch libjpeg errors in tjDecompressToYUV2()
BUILDING.md: Fix "... OR ..." indentation again
BUILDING.md: Fix confusing Windows build reqs
ChangeLog.md: Improve readability of plain text
change.log: Refer users to ChangeLog.md
Markdown version of ChangeLog.txt
Rename ChangeLog.txt
README.md: Link to BUILDING.md
BUILDING.md and README.md: Cosmetic tweaks
ChangeLog: "1.5 beta1" --> "1.4.90 (1.5 beta1)"
Java: Fix parallel make with autotools
Win/x64: Fix improper callee save of xmm8-xmm11
Bump TurboJPEG C API revision to 1.5
ChangeLog: Mention jpeg_crop_scanline() function
1.5 beta1
Fix v7/v8-compatible build
libjpeg API: Partial scanline decompression
Build: Make the NASM autoconf variable persistent
Use consistent/modern code formatting for dbl ptrs
...
* libjpeg-turbo/1.4.x: (94 commits)
CMakeLists.txt: Clarify that Un*x isn't supported
Catch libjpeg errors in tjDecompressToYUV2()
cjpeg: Fix buf overrun caused by bad bin PPM input
Add version/build info to global string table
Ensure that default Huffman tables are initialized
Fix memory leak when running tjunittest -yuv
Prevent overread when decoding malformed JPEG
Guard against wrap-around in alloc functions
Fix Visual C++ compiler warnings
rdppm.c: formatting tweaks
jmemmgr.c: formatting tweaks
TurboJPEG: Avoid dangling pointers
Update Android build instr. for ARMv8, PIE, etc.
Makefile.am: formatting tweak
Update build instructions for new autoconf, GitHub
1.4.3
Regression: Allow co-install of 32-bit/64-bit RPMs
Build: Use FILEPATH type for NASM CMake variable
Comment formatting tweaks
Fix 'make dist'
...
Tag 1.4.1 release
* tag '1.4.1': (427 commits)
Now that the TurboJPEG API is reporting libjpeg warnings as errors, an "Invalid SOS parameters for sequential JPEG" warning surfaced in tjDecodeYUV*(). This was caused by the Se member of jpeg_decompress_struct being set to 0 (it is normally set to a non-zero value when the start-of-scan markers are read, but there are no SOS markers in this case, because we're not actually decompressing a JPEG file.)
Fix a segfault that occured in the MIPS DSPr2 fancy upsampling routine when downsampled_width==3. Because the DSPr2 code unrolls the loop for the middle columns (refer to jdsample.c), it has the effect of performing two column iterations, and that only works properly if the number of columns (minus the first and last) is >= 2. For the specific case of downsampled_width==3, this patch skips to the second iteration of the unrolled column loop.
If a warning (such as "Premature end of JPEG file") is triggered in the underlying libjpeg API, make sure that the TurboJPEG API function returns -1. Unlike errors, however, libjpeg warnings do not make the TurboJPEG functions abort.
Back out r1555 and r1548. Using setenv() didn't fix the iOS simulator issue. It just replaced an undefined _putenv$UNIX2003 symbol with an undefined _setenv$UNIX2003 symbol. The correct solution seems to be to use -D_NONSTD_SOURCE when generating our official builds.
Fix the Windows build. I remember now why I used putenv() originally-- because Windows doesn't have setenv(). We could use _putenv_s(), but older versions of MinGW don't have that either. Fortunately, since all of the environment values we're setting in turbojpeg.c are static, we can just map setenv() to putenv() using a macro. NOTE: we still have to use _putenv_s() in turbojpeg-jni.c, but at least people who may need to build with an older version of MinGW can still do so by disabling the Java build.
Allow building only static or only shared libraries on Windows
__WORDSIZE doesn't seem to be available on platforms other than Mac or Linux, and best practices are for user-level code not to rely on it anyhow, since it's meant to be an internal macro. Fortunately, autoconf already has a way of determining the word size at configure time, so it can be passed into the compiler. This should work on any platform and has been tested on all of the Un*x platforms we support (Linux, Mac, FreeBSD, Solaris.)
Unless you define _ANSI_SOURCE, then putenv() on Mac is renamed to putenv$UNIX2003(), and this causes problems when trying to link an i386 iOS application (for the simulator) against the TurboJPEG static library. It's easiest to just use setenv() instead.
Fix a bug in the 64-bit Huffman encoder that Google discovered when encoding some very specific (and proprietary) aerial images using quality=98, an optimized Huffman table, and the ISLOW DCT. These images were causing the Huffman bit buffer to overflow, because the code for encoding the DC coefficient was using the equivalent of the 32-bit version of EMIT_BITS(). Thus, when 64-bit code was used, the DC coefficient code was not properly checking how many bits were in the buffer before attempting to add more bits to it. This issue appears to have existed in all versions of libjpeg-turbo.
Restore backward compatibility with MSVC < 2010 (broken by r1541)
Oops. OS X doesn't define __WORDSIZE unless you include stdint.h, so apparently the Huffman codec hasn't ever been fully accelerated on 64-bit OS X.
Allow the executables and libraries outside of the sharedlib/ directory to be linked against msvcr*.dll instead of libcmt*.lib. This is reported to be necessary when building libjpeg-turbo for use with C#.
Surround the usage of getenv() in the TurboJPEG API with #ifndef NO_GETENV so that developers can add -DNO_GETENV to the C flags when building for platforms that don't have getenv(). Currently this is known to be necessary when building for Windows Phone.
If libjpeg-turbo is configured with a non-default prefix, such as /usr, then use the docdir variable defined by autoconf 2.60 and later, if available. This will, for instance, install the documentation under /usr/share/doc/libjpeg-turbo by default if prefix=/usr, unless docdir is overridden. When using earlier versions of autoconf, docdir is set to ${datadir}/doc, as it always has been.
Enable silent build rules for the NASM objects, if the source is configured with automake 1.11 or later. NOTE: the build still spits out "error: ignoring unknown tag NASM" for each object, but unfortunately, if we remove "--tag NASM" from the command line, the build breaks under older versions of automake (it aborts with "unable to infer tagged configuration.")
Set the RPM and deb architecture properly on non-x86 platforms.
Come on, Cohaagen, you got what you want. Give these people air!
Oops. Need to set the alpha channel when using TYPE_4BYTE_ABGR*. This has no bearing on the actual tests, but it prevents the PNG pre-encode reference images for those tests from being blank.
Oops. The MIPS SIMD implementations of h2v1 and h2v2 upsampling were not checking for DSPr2 support, so running 'djpeg -nosmooth' on a non-DSPr2-enabled platform caused an "illegal instruction" error.
Introduce fast paths to speed up NULL color conversion somewhat, particularly when using 64-bit code; on the decompression side, the "slow path" also now use an approach similar to that of the compression side (with the component loop outside of the column loop rather than inside.) This is faster when using 32-bit code.
...
* origin/master: (108 commits)
Bump version number to 3.1.
jpegyuv: fix memory leak when path is invalid
jpegyuv: fix memory leak when @image_buffer allocation fails
yuvjpeg: fix memory leak when @image_buffer allocation fails
jpegtran: Do not leak the input and output buffers
Fix previous commit
Scan optimization: return error when unable to copy data buffer
cjpeg option for baseline quant tables
Fix#153
rdpng: convert 16-bit input to 8-bit
Larger number of DC trellis candidates
Fix overflow issue #157
Const on getters
Const on simple getters and copy source
Expanded .gitignore
Add pkg-config requirement
Re-order links.
Declare inbuffer const
Oops. Delete the duplicate copy of [lib]turbojpeg.dll in the binary directory when uninstalling the package.
Get rid of changelog file that we don't update.
...
* commit 'eca0637c8150d3d1c08a60c64d7ee16eaea4b198':
Remove trailing spaces
Another oops. tjBufSizeYUV2() should return -1 if width < 1.
Oops. tjPlaneSizeYUV() should return -1 if componentID > 0 and subsamp==TJSAMP_GRAY.
When building libjpeg-turbo on Un*x systems, INT32 is usually typedef'ed to long, not int, so we need to specify an int pointer when doing a 4-byte write to the RGB565 output buffer. On little endian systems, this doesn't matter, but when you write a 32-bit int to a 64-bit long pointer address on a big endian system, you are writing to the upper 4 bytes, not the lower 4 bytes. NOTE: this will probably break on big endian systems that use 16-bit ints (are there any of those still around?)
Fix Windows build
Fix issues with RGB565 color conversion on big endian machines. The RGB565 routines are now abstracted in a separate file, with separate little-endian and big-endian versions defined at compile time through the use of macros (this is similar to how the colorspace extension routines work.) This allows big-endian machines to take advantage of the same performance optimizations as little-endian machines, and it retains the performance on little-endian machines, since the conditional branch for endianness is at a very coarse-grained level.
Fix build on OS X PowerPC platforms
Oops. Forgot to alter the version header in the change log to indicate the release of 1.4 beta.
Create 1.4.x branch
At one time, it was possible to use CMake to build under Cygwin, but
that hasn't worked since 1.4.1 (due to the Huffman codec changes that
now require SIZEOF_SIZE_T to be defined for non-WIN32 platforms) and may
have even been broken before that. Originally, we used the "date"
command under MSYS in order to obtain the default build number, but that
was rendered unnecessary by 5e3bb3e9 (v1.3 beta.) 9fe22dac (1.4 beta)
further modified CMakeLists.txt so that the "date" command was only used
on Cygwin, but for unexplained reasons, that commit also applied the
(now vestigial) code to all non-WIN32 platforms. This prevented
CMakeLists.txt from displaying an error if someone attempted to use the
CMake build system on Un*x platforms, and that may have been behind the
flurry of pull requests and issues-- including #21, #29, #37, #58, #73--
complaining that the CMake build system didn't work on Un*x platforms
(although it was not until #73 that this bug came to light.)
This commit removes all vestiges of Un*x support from the CMake build
system and makes it clear that CMake cannot be used to build
libjpeg-turbo on non-WIN32 platforms. It is our position that CMake
will not be supported on non-WIN32 platforms until/unless the autotools
build system is removed, and this will not happen without broad support
from the community (including major O/S vendors.) If you are in favor
of migrating the entire build system to CMake, then please make your
voice heard by commenting on #56.
Even though tjDecompressToYUV2() is mostly just a wrapper for
tjDecompressToYUVPlanes(), tjDecompressToYUV2() still calls
jpeg_read_header(), so it needs to properly set up the libjpeg error
handler prior to making this call. Otherwise, under very esoteric (and
arguably incorrect) use cases, a program could call tjDecompressToYUV2()
without first checking the JPEG header using tjDecompressHeader3(), and
if the header was corrupt, then the libjpeg API would invoke
my_error_exit(). my_error_exit() would in turn call longjmp() on the
previous value of myerr->setjmp_buffer, which was probably set in a
previous TurboJPEG function, such as tjInitDecompress(). Thus, when a
libjpeg error was triggered within the body of tjDecompressToYUV2(), the
PC would jump to the error handler of the previous TurboJPEG function,
and this usually caused stack corruption in the calling program (because
the signature and return type of the previous TurboJPEG function
probably wasn't the same as that of tjDecompressToYUV2().)
Actually, what happened was that the longjmp() call within
my_error_exit() acted on the previous value of myerr->setjmp_buffer,
which was probably set in a previous TurboJPEG function, such as
tjInitDecompress(). Thus, when a libjpeg error was triggered within
the body of tjDecompressToYUV2(), the PC jumped to the error handler
of the previous TurboJPEG function, and this usually caused stack
corruption in the calling program (because the signature and return
type of the previous TurboJPEG function probably wasn't the same.)
Even though tjDecompressToYUV2() is mostly just a wrapper for
tjDecompressToYUVPlanes(), tjDecompressToYUV2() still calls
jpeg_read_header(), so it needs to properly set up the libjpeg error
handler prior to making this call. Otherwise, under very esoteric (and
arguably incorrect) use cases, a program can call tjDecompressToYUV2()
without first checking the JPEG header using tjDecompressHeader3(), and
if the header is corrupt, tjDecompressToYUV2() will abort without
triggering an error.
Fixes#72
<sigh> GitHub doesn't render indented text the same as my local MarkDown
viewer (MacDown), so it's necessary to indent "... OR ..." by 3 spaces
so both will display it on the same indentation level as "Visual C++
2005 or later" and "MinGW".
Indent "... OR ..." to make it clear that the choice is between Visual
C++ and MinGW, not Visual C++ and MinGW + NASM. Move NASM to the top of
the list to make that even more clear. Make it clear that nasm.exe
should be in the PATH.
Addresses concerns raised in #70
This extends the fix in 6709e4a0cf to
include binary PPM/PGM files, thus preventing a malformed binary
PPM/PGM input file from triggering an overrun of the rescale array and
potentially crashing cjpeg.
Note that this issue affected only cjpeg and not the underlying
libjpeg-turbo libraries, and thus it did not represent a security
threat.
Thanks to @hughdavenport for the discovery.
This is a common practice in other infrastructure libraries, such as
OpenSSL and libpng, because it makes it easy to examine an application
binary and determine which version of the library the application was
linked against.
Closes#66
This prevents a malformed motion-JPEG frame (MJPEG frames lack Huffman
tables) from causing the "fast path" of the Huffman decoder to read
uninitialized memory. Essentially, this is doing the same thing for
MJPEG frames as 43d8cf4d45 did for regular
images.
Running 'make -j{jobs}' on a build that was configured with Java
(--with-java) would previously cause an error:
make: *** No rule to make target `TJExample.class', needed by
`turbojpeg.jar'.
It seems that parallel make doesn't understand that the files in
$(JAVA_CLASSES) are all generated from the same invocation of javac, so
it tries to parallelize the building of those files (which of course
doesn't work.) This patch instead makes turbojpeg.jar depend on
classnoinst.stamp. This effectively creates a synchronization fence,
since that file is only created when all of the class files have been
built.
Fixes#62
The x86-64 SIMD accelerations for Huffman encoding used incorrect
stack math to save xmm8-xmm11 on Windows. This caused TJBench to
always report 1 Mpixel/sec for the compression performance, and it
likely would have caused other application issues as well.
The changes relative to 1.4.x are only cosmetic (using const pointers)
and should not affect API/ABI compatibility, but our practice is to
synchronize the API revision with the most recent release that provides
user-visible changes to the API.
This, in combination with the existing jpeg_skip_scanlines() function,
provides the ability to crop the image both horizontally and vertically
while decompressing (certain restrictions apply-- see libjpeg.txt.)
This also cleans up the documentation of the line skipping feature and
removes the "strip decompression" feature from djpeg, since the new
cropping feature is a superset of it.
Refer to #34 for discussion.
Closes#34
Previously, if a custom value of this variable was specified when
running configure, then that value would be lost if configure was
automatically re-run (as a result of changes to configure.ac, for
instance.)
As a bonus, the NASM variable is now also listed when running
'configure --help', so it is obvious how to override the default
NASM command.
The convention used by libjpeg:
type * variable;
is not very common anymore, because it looks too much like
multiplication. Some (particularly C++ programmers) prefer to tuck the
pointer symbol against the type:
type* variable;
to emphasize that a pointer to a type is effectively a new type.
However, this can also be confusing, since defining multiple variables
on the same line would not work properly:
type* variable1, variable2; /* Only variable1 is actually a
pointer. */
This commit reformats the entirety of the libjpeg-turbo code base so
that it uses the same code formatting convention for pointers that the
TurboJPEG API code uses:
type *variable1, *variable2;
This seems to be the most common convention among C programmers, and
it is the convention used by other codec libraries, such as libpng and
libtiff.
Place the authors in the following order:
* libjpeg-turbo authors (2009-) in descending order of the date of their
most recent contribution to the project, then in ascending order of
the date of their first contribution to the project
* Upstream authors in descending order of the date of the first
inclusion of their code (this indicates that their code serves as the
foundation of this code.)
This also adds Siarhei to the author list, since he contributed ARM SIMD
code both as a Nokia employee and more recently as an independent
developer.
We need to garbage collect between iterations of the outside loop in
bufSizeTest() in order to avoid exhausting the heap when running with
Java 6 (which is still used on Linux to test the 32-bit version of
libjpeg-turbo in automated builds.)
Broken by 46ecffa324.
gas-preprocessor.pl and/or the clang assembler apparently don't like
default values in macro arguments, and we need to use a separate const
section for each function (because of our use of adr, also necessitated
by the broken clang assembler.)
... and only if ThunderX is detected. This can be easily expanded later
on to include other CPUs that are known to suffer from slow LD3/ST3, but
it doesn't make sense to disable LD3/ST3 for all non-Android Linux
platforms just because ThunderX is slow.
Full-color compression speedups relative to previous commits:
Cortex-A53 (Nexus 5X), Android, 64-bit: 1.1-13% (avg. 6.0%)
Cortex-A57 (Nexus 5X), Android, 64-bit: 0.0-22% (avg. 6.3%)
Refer to #47 and #50 for discussion
Closes#50
Note that this commit introduces a similar /proc/cpuinfo parser to that
of the ARM32 implementation. It is used to specifically check whether
the code is running on Cavium ThunderX and, if so, disable the ARM64
SIMD Huffman routines (which slow performance by an average of 8% on
that CPU.)
Based on:
a8c282e5e5
Don't include the all: target as a dependency of the tests when
cross-compiling, and ensure that the files generated by the tests are
removed, even if they were created read-only (or if the tests are being
run on a different type of system that doesn't correctly interpret the
file permissions.) This allows one to easily build the code on one
machine and run 'make test' on another.
When cross-compiling, CMakeLists.txt now generates the CTest script
using relative paths, so that CTest can more easily be executed on a
different machine from the build machine. Furthermore, Windows builds
are now tested using md5cmp, just like on Linux, rather than a CMake
script. This prevents issues with differing CMake locations between
the build and test machines.
This also removes some trailing spaces from the md5cmp code and improves
the readability of the test code in CMakeLists.txt.
jinclude.h can't be safely included multiple times, so instead of
including it in the shared (broken-out) headers, it should instead be
included by the source files that include one or more of those headers.
The accelerated Huffman decoder was previously invoked if there were
> 128 bytes in the input buffer. However, it is possible to construct a
JPEG image with Huffman blocks > 430 bytes in length
(http://stackoverflow.com/questions/2734678/jpeg-calculating-max-size).
While such images are pathological and could never be created by a
JPEG compressor, it is conceivable that an attacker could use such an
artifially-constructed image to trigger an input buffer overrun in the
libjpeg-turbo decompressor and thus gain access to some of the data on
the calling program's heap.
This patch simply increases the minimum buffer size for the accelerated
Huffman decoder to 512 bytes, which should (hopefully) accommodate any
possible input.
This addresses a major issue (LJT-01-005) identified in a security audit
by Cure53.
Because of the exposed nature of the libjpeg API, alloc_small() and
alloc_large() can potentially be called by external code. If an
application were to call either of those functions with
sizeofobject > SIZE_MAX - ALIGN_SIZE - 1, then the math in
round_up_pow2() would wrap around to zero, causing that function to
return a small value. That value would likely not exceed
MAX_ALLOC_CHUNK, so the subsequent size checks in alloc_small() and
alloc_large() would not catch the error.
A similar problem could occur in 32-bit builds if alloc_sarray() were
called with
samplesperrow > SIZE_MAX - (2 * ALIGN_SIZE / sizeof(JSAMPLE)) - 1
This patch simply ensures that the size argument to the alloc_*()
functions will never exceed MAX_ALLOC_CHUNK (1 billion). If it did,
then subsequent size checks would eventually catch that error, so we
are instead catching the error before round_up_pow2() is called.
This addresses a minor concern (LJT-01-001) expressed in a security
audit by Cure53.
This addresses a minor concern (LJT-01-002) expressed in a security
audit by Cure53. _tjInitCompress() and _tjInitDecompress() call
(respectively) jpeg_mem_dest_tj() and jpeg_mem_src_tj() with a pointer
to a dummy buffer, in order to set up the destination/source manager.
The dummy buffer should never be used, but it's still better to make it
static so that the pointer in the destination/source manager always
points to a valid region of memory.
Document the latest benchmarks on the Nexus 5X and change the "2-4x"
overall claim to "2-6x". The peak performance on x86 platforms was
already closer to 5x, and the addition of SIMD-accelerated Huffman
encoding gave it that extra push over the cliff.
There aren't really any best practices to follow here. I tried as best
as I could to adopt a standard that would ease any future maintenance
burdens. The basic tenets of that standard are:
* Assembly instructions always start on Column 5, and operands always
start on Column 21, except:
- The instruction and operand can be indented (usually by 2 spaces)
to indicate a separate instruction stream.
- If the instruction is within an enclosing .if block in a macro,
it should always be indented relative to the .if block.
* Comments are placed with an eye toward readability. There are always
at least 2 spaces between the end of a line of code and the associated
in-line comment. Where it made sense, I tried to line up the comments
in blocks, and some were shifted right to avoid overlap with
neighboring instruction lines. Not an exact science.
* Assembler directives and macros use 2-space indenting rules. .if
blocks are indented relative to the macro, and code within the .if
blocks is indented relative to the .if directive.
* No extraneous spaces between operands. Lining up the operands
vertically did not really improve readability-- personally, I think it
made it worse, since my eye would tend to lose its place in the
uniform columns of characters. Also, code with a lot of vertical
alignment is really hard to maintain, since changing one line could
necessitate changing a bunch of other lines to avoid spoiling the
alignment.
* No extraneous spaces in #defines or other directives. In general, the
only extraneous spaces (other than indenting spaces) are between:
- Instructions and operands
- Operands and in-line comments
This standard should be more or less in keeping with other formatting
standards used within the project.
Decompression speedup relative to libjpeg-turbo 1.4.2 (ISLOW IDCT):
48-core ThunderX (RunAbove ARM Cloud), Linux, 64-bit: 60-113% (avg. 86%)
Cortex-A53 (Nexus 5X), Android, 64-bit: 6.8-27% (avg. 14%)
Cortex-A57 (Nexus 5X), Android, 64-bit: 2.0-14% (avg. 6.8%)
Decompression speedup relative to libjpeg-turbo 1.4.2 (IFAST IDCT):
48-core ThunderX (RunAbove ARM Cloud), Linux, 64-bit: 51-98% (avg. 75%)
Minimal speedup (1-5%) observed on iPhone 5S (Cortex-A7)
NOTE: This commit avoids the st3 instruction for non-Android and
non-Apple builds, which may cause a performance regression against
libjpeg-turbo 1.4.x on ARM64 systems that are running plain Linux.
Since ThunderX is the only platform known to suffer from slow ld3 and
st3 instructions, it is probably better to check for the CPU type
at run time and disable ld3/st3 only if ThunderX is detected.
This commit also enables the use of ld3 on Android platforms, which
should be a safe bet, at least for now. This speeds up compression on
the afore-mentioned Nexus Cortex-A53 by 5.5-19% (avg. 12%) and on the
Nexus Cortex-A57 by 1.2-14% (avg. 6.3%), relative to the previous
commits.
This commit also removes unnecessary macros.
Refer to #52 for discussion.
Closes#52.
Based on:
6bad905034488dd7bf174f4d057c1fd3198afc43
* Include information on how to do a 64-bit ARMv8 build with the latest
NDK
* Suggest -fPIE and -pie as default CFLAGS (required for android-16 and
later.
* Remove -fstrict-aliasing flag (-Wall already includes it)
* Include information on how to do a 64-bit ARMv8 build with the latest
NDK
* Suggest -fPIE and -pie as default CFLAGS (required for android-16 and
later.
* Remove -fstrict-aliasing flag (-Wall already includes it)
For whatever reason, the "write" global variable in tjbench.c was
overriding the linkage with the write() system function. This may have
affected other platforms as well but was not known to.
This allows a project to use PKG_CHECK_MODULES() in its configure.ac
file to easily check for the presence of libjpeg-turbo and modify the
compiler/linker flags accordingly. Note that if a project relies solely
on pkg-config to check for libjpeg-turbo, then it will not be possible
to build that project using libjpeg or an earlier version of
libjpeg-turbo.
Closes#53
Based on:
4967138719
Per @ssvb:
ThunderX is an ARM64 chip that dedicates most of its transistor real
estate to providing 48 cores, so each core is not as fast as a result.
Each core is dual-issue & in-order for scalar instructions and has only
a single-issue half-width NEON unit, so the peak throughput is one
128-bit instruction per 2 cycles. So careful instruction scheduling is
important. Furthermore, ThunderX has an extremely slow implementation
of ld2 and ld3, so this commit implements the equivalent of those
instructions using ld1.
Compression speedup relative to libjpeg-turbo 1.4.2:
48-core ThunderX (RunAbove ARM Cloud), Linux, 64-bit: 58-85% (avg. 74%)
relative to jpeg-6b: 1.75-2.14x (avg. 1.95x)
Refer to #49 and #51 for discussion.
Closes#51.
This commit also wordsmiths the ChangeLog entry (the ARMv8 SIMD
implementation is "complete" only for compression-- it still lacks some
decompression algorithms, as does the ARMv7 implementation.)
Based on:
9405b5fd03
which is based on:
f561944ff7962c8ab21f
This adds 64-bit NEON coverage for all of the algorithms that are
covered by the 32-bit NEON implementation, except for h2v1 (4:2:2) fancy
upsampling (used when decompressing 4:2:2 JPEG images.) It also adds
64-bit NEON SIMD coverage for:
* slow integer forward DCT (compressor)
* h2v2 (4:2:0) downsampling (compressor)
* h2v1 (4:2:2) downsampling (compressor)
which are not covered in the 32-bit implementation.
Compression speedups relative to libjpeg-turbo 1.4.2:
Apple A7 (iPhone 5S), iOS, 64-bit: 113-150% (reported)
48-core ThunderX (RunAbove ARM Cloud), Linux, 64-bit: 2.1-33% (avg. 15%)
Refer to #44 and #49 for discussion
This commit also removes the unnecessary
if (simd_support & JSIMD_ARM_NEON)
statements from the jsimd* algorithm functions. Since the jsimd_can*()
functions check for the existence of NEON, the corresponding algorithm
functions will never be called if NEON isn't available.
Based on:
dcd9d84f10b0d87b811f70cd5c8a493e58d9a064837b19542f73dc43ccc8a82b71a261c1b1188c21305c89284e7f443f99954c2b53b77d
Unified version with fixes:
1004a3cd05
Full-color compression speedups relative to libjpeg-turbo 1.4.2:
800 MHz ARM Cortex-A9, iOS, 32-bit: 26-44% (avg. 32%)
Refer to #42 and #47 for discussion.
This commit also removes the unnecessary
if (simd_support & JSIMD_ARM_NEON)
statements from the jsimd* algorithm functions. Since the jsimd_can*()
functions check for the existence of NEON, the corresponding algorithm
functions will never be called if NEON isn't available. Removing those
if statements improved performance across the board by a couple of
percent.
Based on:
fc023c880c
Partially reverts 54014d9c2a. When
building from a git sandbox, as opposed to from an official source
tarball, it is still necessary to run autoreconf.
Closes#48
The Linux build machine has been upgraded to autoconf 2.69, automake
1.15, m4 1.4.17, and libtool 2.4.6, so it is no longer necessary to
recommend running autoreconf prior to building the source, if one is
building from an official source tarball (as opposed to from a git
sandbox.) Also, there is no SVN repository anymore (oops.)
Full-color compression speedups relative to libjpeg-turbo 1.4.2:
2.8 GHz Intel Xeon W3530, Linux, 64-bit: 2.2-18% (avg. 9.5%)
2.8 GHz Intel Xeon W3530, Linux, 32-bit: 10-25% (avg. 17%)
2.3 GHz AMD A10-4600M APU, Linux, 64-bit: 4.9-17% (avg. 11%)
2.3 GHz AMD A10-4600M APU, Linux, 32-bit: 8.8-19% (avg. 15%)
3.0 GHz Intel Core i7, OS X, 64-bit: 3.5-16% (avg. 10%)
3.0 GHz Intel Core i7, OS X, 32-bit: 4.8-14% (avg. 11%)
2.6 GHz AMD Athlon 64 X2 5050e:
Performance-neutral (give or take a few percent)
Full-color compression speedups relative to IPP:
2.8 GHz Intel Xeon W3530, Linux, 64-bit: 4.8-34% (avg. 19%)
2.8 GHz Intel Xeon W3530, Linux, 32-bit: -19%-7.0% (avg. -7.0%)
Refer to #42 for discussion. Numerous other approaches were attempted,
but this one proved to be the most performant across all platforms.
This commit also fixes#3 (works around, really-- the clang-compiled version
of jchuff.c still performs 20% worse than its GCC-compiled counterpart, but
that code is now bypassed by the new SSE2 Huffman algorithm.)
Based on:
2cb4d4133036c94e050d
Traditionally, the x86-64 code did not call init_simd() because it had
no need to (only SSE2 was supported.) However, having the ability to
disable SIMD at run time is a useful testing tool, and all of the other
SIMD implementations have this ability.
Fix a regression introduced in 1.4.1 that prevented 32-bit and 64-bit
libjpeg-turbo RPMs from being installed simultaneously on recent Red
Hat/Fedora distributions. This was due to the addition of the
SIZEOF_SIZE_T macro in jconfig.h, which allows the Huffman codec to
determine the word size at compile time. Since that macro differs
between 32-bit and 64-bit builds, this caused a conflict between the
i386 and x86_64 RPMs (any differing files, other than executables, are
not allowed when 32-bit and 64-bit RPMs are installed simultaneously.)
Since the macro is used only internally, it has been moved into
jconfigint.h.
The unnecessary .arch directive was removed from the ARM64 SIMD code
in d70a5c12fc, thus allowing clang's
integrated assembler to assemble the code on Linux systems. However,
this broke the detection mechanism in acinclude.m4 that tells the build
system whether it needs to use gas-preprocessor.pl. Since one of the
primary motivators for using gas-preprocessor.pl with ARM64 builds is
the lack of .req/.unreq directives in Apple's implementation of clang,
acinclude.m4 now checks whether .req/.unreq can be properly assembled
and uses gas-preprocessor.pl if not.
Closes#33.
Quality values > 95 are not useless. They just may not provide as good
of a size vs. perceptual quality tradeoff as lower quality values. This
also displays the default quality value in the cjpeg usage.
Closes#39
Most of these involved overrunning the signed 32-bit JLONG type whenever
building libjpeg-turbo with a 32-bit compiler. These issues are not
believed to represent actual security threats, but eliminating them
makes it easier to detect such threats should they arise in the future.
These days, INT32 is a commonly-defined datatype in system headers. We
cannot eliminate the definition of that datatype from jmorecfg.h, since
the INT32 typedef has technically been part of the libjpeg API since
version 5 (1994.) However, using INT32 internally is risky, because the
inclusion of a particular header (Xmd.h, for instance) could change the
definition of INT32 from long to int on 64-bit platforms and thus change
the internal behavior of libjpeg-turbo in unexpected ways (for instance,
failing to correctly set __INT32_IS_ACTUALLY_LONG to match the INT32
typedef-- perhaps as a result of including the wrong version of
jpeglib.h-- could cause libjpeg-turbo to produce incorrect results.)
The library has always been built in environments in which INT32 is
effectively long (on Windows, long is always 32-bit, so effectively it's
the same as int), so it makes sense to turn INT32 into an explicitly
long datatype. This ensures that libjpeg-turbo will always behave
consistently, regardless of the headers included at compile time.
Addresses a concern expressed in #26.
The IJG README file has been renamed to README.ijg, in order to avoid
confusion (many people were assuming that that was our project's README
file and weren't reading README-turbo.txt) and to lay the groundwork for
markdown versions of the libjpeg-turbo README and build instructions.
Most of these involved left shifting a negative number, which is
technically undefined (although every modern compiler I'm aware of
will implement this by treating the signed integer as a 2's complement
unsigned integer-- the LEFT_SHIFT() macro just makes this behavior
explicit in order to shut up ubsan.) This also fixes a couple of
non-issues in the entropy codecs, whereby the sanitizer reported an
out-of-bounds index in the 4th argument of jpeg_make_d_derived_tbl().
In those cases, the index was actually out of bounds (caused by a
malformed JPEG image), but jpeg_make_d_derived_tbl() would have caught
the error and aborted prior to actually using the invalid address. Here
again, the fix was to make our intentions explicit so as to shut up
ubsan.
The DSPr2 code was errantly comparing the residual (t9, width & 0xF)
with the end pointer (t4, out + width) instead of the width directly
(a1). This would give the wrong results with any image whose output
width was less than 16. The other small changes (ulw to lw and removal
of the nop) are just some easy optimizations around this code.
This issue caused a buffer overrun and subsequent segfault on images
whose scaled output height was 1 pixel and whose scaled output width was
< 16 pixels. Note that the "plain" (non-fancy and non-merged) upsample
routine, which was affected by this bug, is normally not used except
when decompressing a non-YCbCr JPEG image, but it is also used when
decompressing a single-row image (because the other upsampling
algorithms require at least two rows.)
Closes#16.
(descriptions cribbed by DRC from discussion in #20)
In the x86-64 ABI, the high (unused) DWORD of a 32-bit argument's
register is undefined, so it was incorrect to use a 64-bit mov
instruction to transfer a JDIMENSION argument in the 64-bit SSE2 SIMD
functions. The code worked thus far only because the existing compiler
optimizers weren't smart enough to do anything else with the register in
question, so the upper 32 bits happened to be all zeroes-- for the past
6 years, on every x86-64 compiler previously known to mankind.
The bleeding-edge Clang/LLVM compiler has a smarter optimizer, and
under certain circumstances, it will attempt to load-combine adjacent
32-bit integers from one of the libjpeg structures into a single 64-bit
integer and pass that 64-bit integer as a 32-bit argument to one of the
SIMD functions (which is allowed by the ABI, since the upper 32 bits of
the 32-bit argument's register are undefined.) This caused the
libjpeg-turbo regression tests to crash.
Also enhance the documentation of JDIMENSION to explain that its size
is significant to the implementation of the SIMD code.
Closes#20. Refer also to http://crbug.com/532214.
Previously this information was found in a page on libjpeg-turbo.org,
but there was still some confusion, because README-turbo.txt wasn't
clear as to which license applied to what.
With certain images, compressing using quality=100 and the fast integer
forward DCT will cause the divisor passed to compute_reciprocal() to be
1. In those cases, the library already disables the SIMD quantization
algorithm to avoid 16-bit overflow. However, compute_reciprocal()
doesn't properly handle the divisor==1 case, so we need to use special
values in that case so that the C quantization algorithm will behave
like an identity function.
When compiled with -mfpxx (which is now the default on Debian), there are
some restrictions on the use of odd-numbered FP registers. More details
about FPXX can be found here:
https://dmz-portal.mips.com/wiki/MIPS_O32_ABI_-_FR0_and_FR1_Interlinking
This commit simply changes all uses of FP registers to an even-numbered
equivalent like this:
f0 -> f0
f1 -> f2
f2 -> f4
...
f8 -> f16
This commit should have no observable effect except that the MIPS assembly
will now compile with -mfpxx.
Closes#11
rdbmp.c used the ambiguous INT32 datatype, which is sometimes typedef'ed
to long. Windows bitmap headers use 32-bit signed integers for the
width and height, because height can sometimes be negative (this
indicates a top-down bitmap.) If biWidth or biHeight was negative and
INT32 was a 64-bit long, then biWidth and biHeight were read as a
positive integer > INT32_MAX, which failed the test in line 385:
if (biWidth <= 0 || biHeight <= 0)
ERREXIT(cinfo, JERR_BMP_EMPTY);
This commit refactors rdbmp.c so that it uses the datatypes specified by
Microsoft for the Windows BMP header.
This closes#9 and also provides a better solution for mozilla/mozjpeg#153.
This reassures the caller that the buffers will not be modified and also
allows read-only buffers to be passed to the functions.
Partially reverts 3947a19f25fc8186d3812dbcf8e70baea36ef652.
Declare inbuffer arg in jpeg_mem_src() to be const
This reassures the caller that the buffer will not be modified and also
allows read-only buffers to be passed to the function.
rdbmp.c used the ambiguous INT32 datatype, which is sometimes typedef'ed
to long. Windows bitmap headers use 32-bit signed integers for the
width and height, because height can sometimes be negative (this
indicates a top-down bitmap.) If biWidth or biHeight was negative and
INT32 was a 64-bit long, then biWidth and biHeight were read as a
positive integer > INT32_MAX, which failed the test in line 385:
if (biWidth <= 0 || biHeight <= 0)
ERREXIT(cinfo, JERR_BMP_EMPTY);
This commit refactors rdbmp.c so that it uses the datatypes specified by
Microsoft for the Windows BMP header.
This closes#9 and also provides a better solution for mozilla/mozjpeg#153.
NASM 2.11.08 has a bug that prevents it from properly assembling a
macho64 version of libjpeg-turbo (the resulting binary generates corrupt
images.) 2.11.09 works properly. YASM also works properly and has been
a supported alternative since libjpeg-turbo 1.2.
NASM 2.11.08 has a bug that prevents it from properly assembling a
macho64 version of libjpeg-turbo (the resulting binary generates corrupt
images.) 2.11.09 works properly. YASM also works properly and has been
a supported alternative since libjpeg-turbo 1.2.
NASM 2.11.08 has a bug that prevents it from properly assembling a
macho64 version of libjpeg-turbo (the resulting binary generates corrupt
images.) 2.11.09 works properly. YASM also works properly and has been
a supported alternative since libjpeg-turbo 1.2.
NASM 2.11.08 has a bug that prevents it from properly assembling a
macho64 version of libjpeg-turbo (the resulting binary generates corrupt
images.) 2.11.09 works properly. YASM also works properly and has been
a supported alternative since libjpeg-turbo 1.2.
Under very rare circumstances, decompressing specific corrupt JPEG
images would create a situation whereby GET_BITS(1) was invoked
from within HUFF_DECODE_FAST() when bits_left=0. This produced a right
shift by a negative number of bits, which is undefined in C.
Under very rare circumstances, decompressing specific corrupt JPEG
images would create a situation whereby GET_BITS(1) was invoked
from within HUFF_DECODE_FAST() when bits_left=0. This produced a right
shift by a negative number of bits, which is undefined in C.
Under very rare circumstances, decompressing specific corrupt JPEG
images would create a situation whereby GET_BITS(1) was invoked
from within HUFF_DECODE_FAST() when bits_left=0. This produced a right
shift by a negative number of bits, which is undefined in C.
When using context-based upsampling, use a dummy color conversion
routine instead of a dummy row buffer. This improves performance
(since the actual color conversion routine no longer has to be called),
and it also fixes valgrind errors when decompressing to RGB565.
Valgrind previously complained, because using the RGB565 color
converter with the dummy row buffer was causing a table lookup with
undefined indices.
Under very rare circumstances, decompressing specific corrupt JPEG
images would create a situation whereby GET_BITS(1) was invoked
from within HUFF_DECODE_FAST() when bits_left=0. This produced a right
shift by a negative number of bits, which is undefined in C.
Use a new checked exception type (TJException) when passing through
errors from the underlying C library. This gives the application a
choice of catching all exceptions or just those from TurboJPEG.
Throw IllegalArgumentException at the JNI level when arguments to the
JNI function are incorrect, and when one of the TurboJPEG "utility"
functions returns an error (because, per the C API specification, those
functions will only return an error if one of their arguments is out of
range.)
Remove "throws Exception" from the signature of any methods that no
longer pass through an error from the TurboJPEG C library.
Credit Viktor for the new code
Code formatting tweaks
Change the behavior of the bailif0() macro in the JNI wrapper so that it doesn't throw an exception for an unexpected NULL condition. In fact, in all cases, the underlying JNI API function (such as GetFieldID(), etc.) will throw an Error on its own whenever it returns NULL, so our custom exceptions were never being thrown in that case anyhow. All we need to do is just detect the error and bail out of the C code.
This also corrects a couple of formatting issues (semicolons aren't needed at the end of class definitions, and @Override should be specified for the methods we're overriding from super-classes, so the compiler can sanity-check that we're actually overriding a method and not declaring a new one.)
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1595 632fc199-4ca6-4c93-a231-07263d6284db
#166 describes an issue where I/O suspension is not properly handled in
scan optimization. Supporting I/O suspension may be difficult to
achieve here, thus return an error to make it explicit that I/O
suspension is unsupported.
Add command line option -quant-baseline to cjpeg to force quantization
table entries to be in 1-255 range for JPEG baseline compatibility. See
related discussion in #145
* libjpeg-turbo: (39 commits)
Oops. Delete the duplicate copy of [lib]turbojpeg.dll in the binary directory when uninstalling the package.
AltiVec SIMD implementation of sample conversion and integer quantization
Document the fact that the AltiVec implementation uses the same modified algorithms as the SSE2 implementation
Use intrinsics for loading/storing data in the DCT/IDCT functions. This has no effect on the performance of the aligned loads/stores, but it makes it more obvious what that code is doing. Using intrinsics for the unaligned stores in the inverse DCT functions increases overall decompression performance by 1-2%.
AltiVec SIMD implementation of RGB-to-Grayscale color conversion
Remove unneeded code; Make sure jccolor-altivec.o will be rebuilt if jccolext-altivec.c changes.
AltiVec SIMD implementation of RGB-to-YCC color conversion
Make test a phony target so things don't go haywire if there is a file named test.c in the current directory.
Maintain the traditional order of the regression tests while allowing the TurboJPEG and libjpeg portions to be executed separately
Make comments more consistent
Add a "quicktest" pseudo-target, for those times when you just don't want to sit through 11 iterations of TJUnitTest.
Cosmetic tweaks to the PowerPC SIMD stubs
Split AltiVec algorithms into separate files for ease of maintenance; Rename constants using lowercase so they are not confused with macros
Optimizations to the AltiVec DCT algorithms (pre-compute constants and combine multiply/add operations)
AltiVec SIMD implementation of slow integer inverse DCT
Use macros to allocate constants statically, rather than reading them from a table using vec_splat*(). This improves code readability and probably improves performance a bit as well.
Swap the order of the IFAST and ISLOW FDCT functions so that it matches the order of the prototypes in jsimd.h and the stubs in jsimd_powerpc.c.
Include ARMv8 binaries when generating a combined OS X/iOS package using 'make iosdmg'
In the output of the configure script, indicate whether gas-preprocessor.pl is being used along with the assembler.
Modify the ARM64 assembly file so that it uses only syntax that the clang assembler in XCode 5.x can understand. These changes should all be cosmetic in nature-- they do not change the meaning or readability of the code nor the ability to build it for Linux. Actually, the code is now more in compliance with the ARM64 programming manual. In addition to these changes, there were a couple of instructions that clang simply doesn't support, so gas-preprocessor.pl was modified so that it now converts those into equivalent instructions that clang can handle.
...
Conflicts:
BUILDING.txt
ChangeLog.txt
cjpeg.c
jpegtran.c
Initial implementation of trellis quantization for arithmetic coding.
The rate computation does not yet implement all rules of the entropy
coder and may thus be suboptimal.
Fix pass number computation in scan optimization to support case where
Huffman table optimization is not done, e.g. when arithmetic coding is
used
Enable combination of arithmetic coding and scan optimization
(previously disabled)
-- Use macros to represent the fast FDCT constants, to facilitate comparing the AltiVec implementation of the algorithm with the SSE2 implementation.
-- Rename slow FDCT constants for consistency.
-- Use vec_sra() in all cases in the slow FDCT code. The SSE2 implementation uses psraw, which is an arithmetic shift, so we need to do likewise with AltiVec. Using vec_sr() hasn't caused any problems yet, but it is conceivable that it might cause different behavior in certain corner cases.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1444 632fc199-4ca6-4c93-a231-07263d6284db
-- Use macros to represent the fast FDCT constants, to facilitate comparing the AltiVec implementation of the algorithm with the SSE2 implementation.
-- Rename slow FDCT constants for consistency.
-- Use vec_sra() in all cases in the slow FDCT code. The SSE2 implementation uses psraw, which is an arithmetic shift, so we need to do likewise with AltiVec. Using vec_sr() hasn't caused any problems yet, but it is conceivable that it might cause different behavior in certain corner cases.
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1444 632fc199-4ca6-4c93-a231-07263d6284db
Add extension parameter JFLOAT_TRELLIS_DELTA_DC_WEIGHT that controls
how distortion is calculated in DC trellis quantization. The parameter
defines weighting between actual distortion of DC and distortion of
vertical gradient of DC.
By default the parameter is 0.0 and has no effect.
Addresses #117
This incorporates an upstream fix to add jdmrg565.c to the tarball created
by 'make dist', as well as a fix to add the new jcmaster.h file to same. There
are still some mozjpeg-specific files that aren't added when doing 'make dist'.
I'll let someone else worry about those. This patch mainly ensures that any
files that might be eventually adopted upstream are included.
mozjpeg should produce identical output to libjpeg-turbo when the JCP_FASTEST
compression profile is used. That means that that profile needs to revert to
the default libjpeg quantization/Huffman tables as well as disable mozjpeg's
duplicate table checking feature. This patch also adds -revert to any instance
of cjpeg and jpegtran called by 'make test' (or ctest on Windows), so that
those tests actually work again. The tests aren't useful for regression
testing the mozjpeg extensions, but at least they can now be used to regression
test the underlying code.
There was an oversight in the extension framework. jpeg_start_compress() can
be called multiple times between the time that a compress structure is created
and the time it is destroyed. If this happened, then the following sequence
would occur:
-- heap alloc of master struct within jpeg_create_compress()
-- heap free of master struct within jinit_c_master_control()
-- static alloc of extended master struct (JPOOL_IMAGE) within
jinit_c_master_control()
-- free extended master struct in jpeg_finish_compress()
-- jinit_c_master_control() now sees that cinfo->master is set and tries to
free it, even though it has already been freed. Chaos ensues.
The fix involved breaking out the extended master struct into a header so that
jpeg_create_compress() can go ahead and allocate it to the correct size, thus
eliminating the need to free and reallocate it in jinit_c_master_control().
Further, the master struct is now created in the permanent pool, so it will
survive until the compression struct is destroyed. Further,
jinit_c_master_control() now resets all fields in the master struct that
are not related to the extension parameters.
"jcext" is a bit more descriptive, since this code is primarily intended to
extend the libjpeg API. It does so in a backward-ABI-compatible manner, but
"jccompat" could be misinterpreted to mean that the code is providing backward
compatibility at the code level..
This eliminates JBOOLEAN_USE_MOZ_DEFAULTS and replaces it with
JINT_COMPRESS_PROFILE, a more flexible and descriptive parameter. Currently,
this new parameter works in much the same way as the old-- it changes the
behavior of jpeg_set_defaults(). It currently supports only two values
(max. compression, i.e. mozjpeg defaults, and fastest, i.e. libjpeg-turbo
defaults), but it can be extended in the future with additional profiles that
balance compression ratio with performance.
This includes more descriptive text for the project summary (the same
text that is in the package descriptions), a more thorough description of the
libjpeg API extensibility framework, reformatting to improve readability
(particularly on 80-column terminals), and numerous grammar tweaks.
JBOOLEAN_ONE_DC_SCAN and JBOOLEAN_SEP_DC_SCAN are merged into a single
parameter JINT_DC_SCAN_OPT_MODE
Default behavior is modified to use one DC scan per component
Since mozjpeg is now backward ABI-compatible with libjpeg[-turbo], it is now
possible to temporarily load mozjpeg into a binary application and cause that
application to generate uber-compressed JPEGs (at the expense of an extreme
performance loss, of course.) For instance, someone could do
LD_LIBRARY_PATH=/opt/mozjpeg/lib convert blah_blah_blah
to make ImageMagick use mozjpeg instead of the system's pre-installed JPEG
library (libjpeg-turbo, in most cases.) However, this only makes sense if
mozjpeg is actually producing different behavior by default than libjpeg-turbo.
Currently it isn't. Currently it requires the application to set
JBOOLEAN_USE_MOZ_DEFAULTS to TRUE in order to enable the mozjpeg-specific
behavior, but of course applications that were built to use libjpeg[-turbo]
won't do that. Thus, this patch sets use_moz_defaults to TRUE by default,
requiring an application to explicitly set it to FALSE in order to revert to
the libjpeg[-turbo] behavior (makes sense, since the only applications that
would need to revert to the libjpeg[-turbo] behavior would be mozjpeg-aware
applications.)
Note that we discussed the possibility of adding a function
(jpeg_revert_defaults()), which would act the same as jpeg_set_defaults() does
in libjpeg[-turbo]. This is a good solution for implementing the -revert
switch in cjpeg, but unfortunately it doesn't work for jpegtran. The reason
is that jpeg_set_defaults() is called within the body of
jpeg_copy_critical_parameters(), which is part of the API. So yet again,
if mozjpeg were loaded into a non-mozjpeg-aware application at run time, it
would be desirable for jpeg_copy_critical_parameters() to set the parameters
to mozjpeg defaults. That means that, in order to implement the -revert
switch in jpegtran, it would be necessary to introduce a new function
(jpeg_revert_critical_parameters(), perhaps). It seems cleaner to just keep
using the JBOOLEAN_USE_MOZ_DEFAULTS parameter to control the behavior of
jpeg_set_defaults(), even though this represents a minor abuse of the libjpeg
API (jpeg_set_defaults() is technically supposed to set all of the parameters
to defaults, irrespective of any previous state. However, as long as we
document that JBOOLEAN_USE_MOZ_DEFAULTS works differently, then it should be
OK.)
* libjpeg-turbo:
Remove trailing spaces
Another oops. tjBufSizeYUV2() should return -1 if width < 1.
Oops. tjPlaneSizeYUV() should return -1 if componentID > 0 and subsamp==TJSAMP_GRAY.
The AltiVec code actually works on 32-bit PowerPC platforms as well, so change the "powerpc64" token to "powerpc". Also clean up the shift code, which wasn't building properly on OS X.
AltiVec SIMD implementation of fast forward DCT
Bump version to 1.5 alpha1 to prepare for new features
When building libjpeg-turbo on Un*x systems, INT32 is usually typedef'ed to long, not int, so we need to specify an int pointer when doing a 4-byte write to the RGB565 output buffer. On little endian systems, this doesn't matter, but when you write a 32-bit int to a 64-bit long pointer address on a big endian system, you are writing to the upper 4 bytes, not the lower 4 bytes. NOTE: this will probably break on big endian systems that use 16-bit ints (are there any of those still around?)
Fix Windows build
Fix issues with RGB565 color conversion on big endian machines. The RGB565 routines are now abstracted in a separate file, with separate little-endian and big-endian versions defined at compile time through the use of macros (this is similar to how the colorspace extension routines work.) This allows big-endian machines to take advantage of the same performance optimizations as little-endian machines, and it retains the performance on little-endian machines, since the conditional branch for endianness is at a very coarse-grained level.
Fix build on OS X PowerPC platforms
Oops. Forgot to alter the version header in the change log to indicate the release of 1.4 beta.
For whatever reason, some of these files didn't get fully merged from
libjpeg-turbo 1.4. They still contained tab characters and other formatting
conventions from libjpeg-turbo 1.3. This patch also fixes some obvious
indentation errors in the mozjpeg-specific code. There is more formatting work
that needs to be done to the mozjpeg-specific code, to fix line overruns,
incorrect operator whitespace, and other issues that make it not consistent
with the libjpeg/libjpeg-turbo code.
This might be slightly more controversial, since it changes the CMake and
autotools project names and the binaty package names to "mozjpeg", and it
changes the default install directory to /opt/mozjpeg. To me, this makes much
more sense, but it does represent a change in operational behavior, which is
why I put it in a separate commit.
This patch does the following:
-- Implements some (hopefully non-controversial) changes to the package
descriptions, in order to prevent confusion (the existing descriptions from
libjpeg-turbo are not appropriate for mozjpeg.)
-- Replaces "libmozjpeg" with "mozjpeg" in all documentation and comments. The project is called "mozjpeg", and it doesn't actually generate a library called
"libmozjpeg", so it doesn't make sense to use "libmozjpeg" to describe it.
-- Replaces "MozJPEG" with "TurboJPEG" in all documentation and comments.
"MozJPEG" appears to have been the product of blindly searching/replacing
instances of "Turbo". TurboJPEG is the name of the API, and that name still
applies to the implementation in mozjpeg. Furthermore, the TurboJPEG libraries
are still called "libturbojpeg" in mozjpeg.
-- Attempts to remove build instructions that are irrelevant or not applicable
to mozjpeg. Further work possibly needs to be done here-- for instance, it
doesn't make much sense to have build instructions for mobile devices when the
library is not intended to be used for decoding.
-- Changes the vendor in the DEB and RPM files from "The libmozjpeg Project" to
"Mozilla Research".
-- Changes the source tarball location in the RPM spec file to correctly point
to the release tarball on github.
-- Changes the source directory in the RPM spec file to "mozjpeg-%{version}",
which is the actual name of the source directory in the mozjpeg tarballs.
The ABI compatibility feature was developed by the current maintainer of
libjpeg-turbo with an eye toward eventual inclusion in libjpeg-turbo (once
other features are added to libjpeg-turbo that necessitate the inclusion.)
Thus, it is easy to ensure that the DLL function ordinals will be synchronized
between libjpeg-turbo and mozjpeg. However, it still makes sense to allow for
a little bit of breathing room, just in case. Thus, this patch uses ordinals
starting at 200 for the accessor functions. It would probably make sense to
start the equivalent decompressor get/set functions at ordinal 300, once they
are implemented.
Windows requires exported symbols to be explicitly declared.
Also, use a very large ordinal number so that any future symbols
added by IJG or TurboJPEG will not break ABI.
Fixes#104.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit '73edb3d734a628fd88994bc974dc6737a58bd956': (45 commits)
Rename the ARM64 assembly file to match the C file
Fix several mathematical issues discovered in the ARM64 NEON code while running the extended regression tests introduced in r1267. Specific comments can be found in the original patches: https://sourceforge.net/p/libjpeg-turbo/patches/64/
Reformat code per Siarhei's original patch (to clearly indicate that the offset instructions are completely independent) and add Siarhei as an individual author (he no longer works for Nokia.)
Clarify forward compatibility of iOS/ARM builds
ARM64 NEON SIMD support for YCC-to-RGB565 conversion
ARM NEON SIMD support for YCC-to-RGB565 conversion, and optimizations to the existing YCC-to-RGB conversion code:
Ensure that tjFree() is used for any JPEG buffers that might have been dynamically allocated by the compress/transform functions. To keep things simple, we use tjAlloc() for the statically-allocated buffer as well, so that tjFree() can always be used to free the buffer, regardless of whether it was allocated by tjbench or by the TurboJPEG library. This fixes crashes that occurred on Windows when running tjunittest or tjbench with the -alloc flag.
Revert r1335 and r1336. It was a valiant effort, but on Windows, xmm8-xmm15 are non-volatile, and the overhead of pushing them onto the stack at the beginning of each function and popping them at the end was causing worse performance (in the neighborhood of 3-5%) than just using the work areas and limiting the register usage to xmm0-xmm7. Best to leave the SSE2 code alone. We can optimize the register usage for AVX2, once that port takes place.
Windows doesn't have setenv(). Go, go Gadget Macros.
1.4 beta1
Fix 'make dist'
Don't use sudo when building a Debian package unless the user is non-root
Add a set of undocumented environment variables and Java system properties that allow compression features of libjpeg that are not normally exposed in the TurboJPEG API to be enabled. These features are not normally exposed because, for the most part, they aren't "turbo" features, but it is still useful to be able to benchmark them without modifying the code.
.func/.endfunc are only necessary when generating STABS debug info, which basically went out of style with parachute pants and Rick Astley. At any rate, none of the platforms for which we're building the ARM code use it (DWARF is the common format these days), and the .func/.endfunc directives cause the clang integrated assembler to fail (http://llvm.org/bugs/show_bug.cgi?id=20424).
Extend tjbenchtest so that it tests the dynamic JPEG buffer allocation feature in TurboJPEG. Disable the tiling feature in TJBench whenever dynamic buffer allocation is enabled (because the tiling feature requires a separate buffer for each tile, using it successfully with dynamic buffer allocation would require a separate TurboJPEG compressor instance for each tile, and it's not worth going to that trouble right now.)
Run the TurboJPEG conformance tests out of a directory in /tmp (for improved performance, if the source directory is on a remote file share.) Fix an issue in TJBench.java that prevented it from working properly if the source image resided in a directory with a dot in the name.
Oops
Subtle point, but dest->outbuffer is a pointer to the address of the JPEG buffer, which is stored in the calling program. Thus, *(dest->outbuffer) will always equal *outbuffer. We need to compare *outbuffer with dest->buffer instead to determine if the pointer is being reused.
If the output buffer in the TurboJPEG destination manager was allocated by the destination manager and is being reused from a previous compression operation, then we need to get the buffer size from the previous operation, since the calling program doesn't know the actual buffer size.
Actually, we need to increase the size of BUFSIZE, not just the size of _buffer. The previous patch might have cause problems if, for instance, state->free_in_buffer was 127 but 129 bytes were compressed. In that case, only 127 of the 129 bytes would have been written to the file. Also document the fix.
...
Conflicts:
CMakeLists.txt
Makefile.am
configure.ac
jcdctmgr.c
release/deb-control.tmpl
sharedlib/CMakeLists.txt
simd/CMakeLists.txt
turbojpeg.c
* origin/master: (23 commits)
Update .gitignore
.func/.endfunc are only necessary when generating STABS debug info, which basically went out of style with parachute pants and Rick Astley. At any rate, none of the platforms for which we're building the ARM code use it (DWARF is the common format these days), and the .func/.endfunc directives cause the clang integrated assembler to fail (http://llvm.org/bugs/show_bug.cgi?id=20424).
Enable DC trellis by default
Avoid double inline attribute
Detect libpng
Implement DHT Merging
Add .gitignore for autotools files
Check memory alloc success
Update cjpeg usage text
Implement DQT merging
Fix issue with scan printout
Get rid of unnecessary and obsolete platform configuration instructions.
Add error checks for malloc calls that don't already have them. Issue #87.
yuvjpeg: fix trivial leak
Parse quality as float
PNG reading support
Fix issue with DC trellis
Add option to split DC scans
Add trellis for DC
Bump version to 2.1.
...
Conflicts:
BUILDING.txt
cdjpeg.h
jcdctmgr.c
jchuff.h
jcmarker.c
jcmaster.c
jconfig.txt
jpeglib.h
rdswitch.c
* commit 'b8d044a666056d4d8d28d7a5d0805ac32b619b36': (58 commits)
Big oops. wrjpgcom on Windows was being built using the rdjpgcom source.
Prevent a buffer overrun if the comment begins with a literal quote character and the string exceeds 65k characters. Also prevent comments longer than 65k characters from being written, since this will produce an incorrect JPEG file.
Remove VMS-specific code
Our copyright string is longer than JMSG_LENGTH_MAX, and this was causing a buffer overrun if output_message() was called with msg_code set to JMSG_COPYRIGHT, or if format_message() was called with msg_code set to JMSG_COPYRIGHT and with a buffer of length JMSG_LENGTH_MAX.
We don't support non-ANSI C compilers
Allow for building the MIPS DSPr2 extensions if the host is mips-* as well as mipsel-*. The DSPr2 extensions are little endian, so we still have to check that the compiler defines __MIPSEL__ before enabling them. This paves the way for supporting big-endian MIPS, and in the near term, it allows the SIMD extensions to be built with Sourcery CodeBench.
SIMD-accelerated int upsample routine for MIPS DSPr2
Fix MIPS build
libjpeg-turbo has never supported non-ANSI compilers, so get rid of the crufty SIZEOF() macro. It was not being used consistently anyhow, so it would not have been possible to build prior releases of libjpeg-turbo using the broken compilers for which that macro was designed.
Remove MS-DOS code and information, and adjust copyright headers to reflect the removal of features in r1307 and r1308. libjpeg-turbo has never supported MS-DOS, nor is it even possible for us to do so.
Further copyright header cleanup
Further copyright header cleanup
Get rid of the HAVE_PROTOTYPES configuration option, as well as the related JMETHOD and JPP macros. libjpeg-turbo has never supported compilers that don't handle prototypes. Doing so requires ansi2knr, which isn't even supported in the IJG code anymore.
Remove all of the NEED_SHORT_EXTERNAL_NAMES stuff. There is scant information available as to which linkers ever had a 15-character global symbol name limit. AFAICT, it might have been a VMS and/or a.out BSD thing, but none of those platforms have ever been supported by libjpeg-turbo (nor are such systems supported by other open source libraries of this nature.)
Clean up code formatting in the SIMD interface functions
SIMD-accelerated NULL convert routine for MIPS DSPr2
Fix build, which was broken by the checkin of the MIPS DSPr2 accelerated smooth downsampling routine. Until/unless other platforms include SIMD support for that function, it's just easier to #ifdef around it rather than adding stubs for the other platforms.
Fix error in MIPS DSPr2 accelerated smooth downsample routine
SIMD-accelerated h2v2 smooth downsampling routine for MIPS DSPr2
Minor tweak to improve code readability
...
Conflicts:
BUILDING.txt
CMakeLists.txt
Makefile.am
cdjpeg.h
cjpeg.1
cjpeg.c
configure.ac
djpeg.1
example.c
jccoefct.c
jcdctmgr.c
jchuff.c
jchuff.h
jcinit.c
jcmaster.c
jcparam.c
jcphuff.c
jidctflt.c
jpegint.h
jpeglib.h
jversion.h
libjpeg.txt
rdswitch.c
simd/CMakeLists.txt
tjbench.c
turbojpeg.c
usage.txt
wrjpgcom.c
* commit '93ddfcfc1a814789ed64d967a6118616753bb9d5': (65 commits)
Use clz/bsr instructions on ARM for bit counting rather than the lookup table (reduces memory footprint and can improve performance in some cases.)
Make iOS build instructions more generic and applicable to all versions of Xcode; modify iOS build procedure for Xcode 5.0 and later to fix a build issue with Xcode 5.1.
Update build instructions to reflect the use of pkgbuild/productbuild
Remove any claims of support for OS X 10.4 "Tiger" (the packaging system overhaul produces packages that require Leopard or later, and I haven't been able to test Tiger for years anyhow.) Update TurboJPEG shared library version.
Migrate Mac packaging system to pkgbuild, since PackageMaker is no longer supported.
Remove the sections about replacing libjpeg at run time and compile time. These were written before O/S distributions started shipping libjpeg-turbo, and they are either pedantic or no longer relevant. Also remove any text that assumes the use of our official project binaries. Notes specific to the official binaries have been moved into the project wiki.
Fix Windows build
Since we're now maintaining our own Cygwin pseudo-repository directories instead of recommending that users install these packages from a local source, it makes more sense to name the packages according to Cygwin specs, so they can be copied as-is into the pseudo-repository.
39dbc2db9718f9af2f62eb486fd73328fe8bf5e8
Fix 'make dist'
RHEL 6 (and probably other platforms as well) sets _defaultdocdir=%{_datadir}/doc, which screws things up, since we're overriding _datadir. Since we intend _defaultdocdir to be /usr/share/doc, just be explicit about it.
Fix compiler warning about unused function when building with the libjpeg v6b API/ABI
Fix compiler warning ("always_inline function might not be inlinable") when building with recent versions of GCC
Enable silent build (can be overridden with 'make V=1') if the version of autotools being used is new enough.
Extend YUVImage class to allow reuse of the same buffer with different metadata; port TJBench changes that treat YUV encoding/decoding as an intermediate step of the JPEG compression/decompression pipeline rather than a separate test case; add YUV encode/decode tests to the Java version of tjbenchtest
formatting tweaks
Fix an error that occurred when trying to use the lossless transform feature without specifying -quiet; formatting tweak
Move the garbage collection of the JPEG tiles into the decompression function to increase the chances that tiled decompression of large images will succeed without an OutOfMemoryError.
Generate the Java documentation using javadoc 7, to improve readability.
This should have been checked in with the previous commit.
...
Conflicts:
BUILDING.txt
configure.ac
jversion.h
release/Info.plist.in
release/ReadMe.rtf
tjbench.c
turbojpeg.c
* mozjpeg: (94 commits)
Disable scan optimization if no scan given
Fixed mozjpeg build with Visual C++ 2010
Added few error messages in cjpeg
Fix trellis for nonprogressive mode (#69)
Bugfix: AM_PROG_AR is not recognized by older automake, so only use it when defined
Bump version number for 2.0, make this version 2.0.1.
Update MS-SSIM tuning
Fix#64
Updating yuvjpeg and jpegyuv to match Daala tools.
Fix#56
Silence compiler warning
Silence compiler warning
Improve support of JPEG input in cjpeg
Fix issue with JPEG read in cjpeg
Add support for JPEG input in cjpeg
Fix#50
Disable trellis in jpegtran
Update version to 2.0pre.
Use single DC scan by default
Add configure check for libm/pow. Fixes Linux build issue.
...
This patch also removes an unneeded macro from jdmerge.c.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.4.x@1403 632fc199-4ca6-4c93-a231-07263d6284db
This patch also removes an unneeded macro from jdmerge.c.
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1402 632fc199-4ca6-4c93-a231-07263d6284db
This patch also removes an unneeded macro from jdmerge.c.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1402 632fc199-4ca6-4c93-a231-07263d6284db
-----
aee36252be.patch
From aee36252be20054afce371a92406fc66ba6627b5 Mon Sep 17 00:00:00 2001
From: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Date: Wed, 13 Aug 2014 03:50:22 +0300
Subject: [PATCH] ARM: Faster NEON yuv->rgb conversion for Krait and Cortex-A15
The older code was developed and tested only on ARM Cortex-A8 and ARM Cortex-A9.
Tuning it for newer ARM processors can introduce some speed-up (up to 20%).
The performance of the inner loop (conversion of 8 pixels) improves from
~27 cycles down to ~22 cycles on Qualcomm Krait 300, and from ~20 cycles
down to ~18 cycles on ARM Cortex-A15.
The performance remains exactly the same on ARM Cortex-A7 (~58 cycles),
ARM Cortex-A8 (~25 cycles) and ARM Cortex-A9 (~30 cycles) processors.
Also use larger indentation in the source code for separating two independent
instruction streams.
-----
a5efdbf22c.patch
From a5efdbf22ce9c1acd4b14a353cec863c2c57557e Mon Sep 17 00:00:00 2001
From: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Date: Wed, 13 Aug 2014 07:23:09 +0300
Subject: [PATCH] ARM: NEON optimized yuv->rgb565 conversion
The performance of the inner loop (conversion of 8 pixels):
* ARM Cortex-A7: ~55 cycles
* ARM Cortex-A8: ~28 cycles
* ARM Cortex-A9: ~32 cycles
* ARM Cortex-A15: ~20 cycles
* Qualcomm Krait: ~24 cycles
Based on the Linaro rgb565 patch from
https://sourceforge.net/p/libjpeg-turbo/patches/24/
but implements better instructions scheduling.
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1385 632fc199-4ca6-4c93-a231-07263d6284db
-----
aee36252be.patch
From aee36252be20054afce371a92406fc66ba6627b5 Mon Sep 17 00:00:00 2001
From: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Date: Wed, 13 Aug 2014 03:50:22 +0300
Subject: [PATCH] ARM: Faster NEON yuv->rgb conversion for Krait and Cortex-A15
The older code was developed and tested only on ARM Cortex-A8 and ARM Cortex-A9.
Tuning it for newer ARM processors can introduce some speed-up (up to 20%).
The performance of the inner loop (conversion of 8 pixels) improves from
~27 cycles down to ~22 cycles on Qualcomm Krait 300, and from ~20 cycles
down to ~18 cycles on ARM Cortex-A15.
The performance remains exactly the same on ARM Cortex-A7 (~58 cycles),
ARM Cortex-A8 (~25 cycles) and ARM Cortex-A9 (~30 cycles) processors.
Also use larger indentation in the source code for separating two independent
instruction streams.
-----
a5efdbf22c.patch
From a5efdbf22ce9c1acd4b14a353cec863c2c57557e Mon Sep 17 00:00:00 2001
From: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Date: Wed, 13 Aug 2014 07:23:09 +0300
Subject: [PATCH] ARM: NEON optimized yuv->rgb565 conversion
The performance of the inner loop (conversion of 8 pixels):
* ARM Cortex-A7: ~55 cycles
* ARM Cortex-A8: ~28 cycles
* ARM Cortex-A9: ~32 cycles
* ARM Cortex-A15: ~20 cycles
* Qualcomm Krait: ~24 cycles
Based on the Linaro rgb565 patch from
https://sourceforge.net/p/libjpeg-turbo/patches/24/
but implements better instructions scheduling.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1385 632fc199-4ca6-4c93-a231-07263d6284db
The current way multipass works, write_scan_header() seems to be called
multiple times, so only DC tables get merged, however, this will still
merge AC tables if possible.
Implements the second half of #30.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Command line option -split-dc-scan is added to code DC scans
independently (instead of interleaved). It should be determined whether
this option introduces any decoder compatibility issues ( see #83 )
Option -multidcscan is renamed to -opt-dc-scan
Add option to apply trellis quantization to the DC coefficients ( see
#57 ). May need further refinement to make sure block order during
trellis optimization matches order during coding.
if trellis_quant is enabled, increment total number of
passes by optimization beginning pass number.
Signed-off-by: Arjun Sreedharan <arjun024@gmail.com>
From jpeglib-turbo r1221:
Integrate a slightly modified version of Mozilla's patch for
precomputing the bit-counting LUT. This is useful if the table needs
to be shared among multiple processes, although the primary reason for
doing that is reduced footprint on mobile devices, which are probably
already covered by the clz intrinsic code.
From libjpeg-turbo r1288
Port the more accurate (and slightly faster) floating point IDCT
implementation from jpeg-8a and later. New research revealed that the
SSE/SSE2 floating point IDCT implementation was actually more accurate
than the jpeg-6b implementation, not less, which is why its
mathematical results have always differed from those of the jpeg-6b
implementation. This patch brings the accuracy of the C code in line
with that of the SSE/SSE2 code.
Add macro JPEG_RAW_READER that defines whether to pass RAW sample data
from input to output JPEG files (hence preserving color space and
sampling). Macro is now disabled by default.
Add code to copy metadata from input to output JPEG, hence preserving
color profiles and other important information
We can't simply increase JMSG_LENGTH_MAX, because it is part of the libjpeg API, and it is generally assumed that a buffer of this length will be passed to format_message(). Thus, the easiest solution is simply to use a shorter copyright string for JMSG_COPYRIGHT.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.3.x@1319 632fc199-4ca6-4c93-a231-07263d6284db
We can't simply increase JMSG_LENGTH_MAX, because it is part of the libjpeg API, and it is generally assumed that a buffer of this length will be passed to format_message(). Thus, the easiest solution is simply to use a shorter copyright string for JMSG_COPYRIGHT.
We can't simply increase JMSG_LENGTH_MAX, because it is part of the libjpeg API, and it is generally assumed that a buffer of this length will be passed to format_message(). Thus, the easiest solution is simply to use a shorter copyright string for JMSG_COPYRIGHT.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1318 632fc199-4ca6-4c93-a231-07263d6284db
Value of cinfo->one_dc_scan is set to true by default to use a single
DC scan for all components.
Option -onedcscan is replaced by -multidcscan to enable multiple DC
scans.
While this change appears to degrade compression performance it
improves compatibility with a wider range of JPEG decoders.
Add option to have a single DC scan wherein all components are
interleaved when using progressive mode. This may resolve compatibility
issues raised in #29 and #48.
This option is available through -onedcscan in cjpeg
Optimizes quantization matrix by minimizing reconstruction error based
on quantized coefficients.
Feature is controlled by cinfo->trellis_q_opt; disabled by default.
The invocation of this function is wrapped in an ifdef JPEGLIB >70
but the definition of the function wasn't. This change adds the
ifdef around the function definition.
the throw macros set retval, but not all functions use/return this
value. This change, mainly to clean up compiler warnings, makes
those functions return a value.
-- Auto-generates HAVE_LOCALE_H macro and adds it to jconfig.h (this is used by rdjpgcom.c.)
-- Reconciles the description and ordering of macros between config.h.in and jconfig.h.in, so the two files can be easily diffed.
-- Eliminates the use of the autoheader-generated config.h in the project and moves relevant internal-only macros into a new file, jconfigint.h. This is to avoid "already defined" warnings in files that were including both config.h (to get the internal autotools package information or the INLINE definition) and jconfig.h.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.3.x@1258 632fc199-4ca6-4c93-a231-07263d6284db
-- Auto-generates HAVE_LOCALE_H macro and adds it to jconfig.h (this is used by rdjpgcom.c.)
-- Reconciles the description and ordering of macros between config.h.in and jconfig.h.in, so the two files can be easily diffed.
-- Eliminates the use of the autoheader-generated config.h in the project and moves relevant internal-only macros into a new file, jconfigint.h. This is to avoid "already defined" warnings in files that were including both config.h (to get the internal autotools package information or the INLINE definition) and jconfig.h.
-- Auto-generates HAVE_LOCALE_H macro and adds it to jconfig.h (this is used by rdjpgcom.c.)
-- Reconciles the description and ordering of macros between config.h.in and jconfig.h.in, so the two files can be easily diffed.
-- Eliminates the use of the autoheader-generated config.h in the project and moves relevant internal-only macros into a new file, jconfigint.h. This is to avoid "already defined" warnings in files that were including both config.h (to get the internal autotools package information or the INLINE definition) and jconfig.h.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1257 632fc199-4ca6-4c93-a231-07263d6284db
delta = cur0 * 2;
cur0 += delta; /* form error * 3 */
errorptr[0] = (FSERROR) (bpreverr0 + cur0);
cur0 += delta; /* form error * 5 */
bpreverr0 = belowerr0 + cur0;
cur0 += delta; /* form error * 7 */
Each time cur0 is incremented by delta, the compiled code doubles the value of delta (WTF?!) Thus, by the time the end of the block is reached, cur0 is equal to 15 times its former self, not 7 times its former self as it should be. At any rate, it was a lot simpler to just refactor the code so that it uses multiplication.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.3.x@1253 632fc199-4ca6-4c93-a231-07263d6284db
delta = cur0 * 2;
cur0 += delta; /* form error * 3 */
errorptr[0] = (FSERROR) (bpreverr0 + cur0);
cur0 += delta; /* form error * 5 */
bpreverr0 = belowerr0 + cur0;
cur0 += delta; /* form error * 7 */
Each time cur0 is incremented by delta, the compiled code doubles the value of delta (WTF?!) Thus, by the time the end of the block is reached, cur0 is equal to 15 times its former self, not 7 times its former self as it should be. At any rate, it was a lot simpler to just refactor the code so that it uses multiplication.
delta = cur0 * 2;
cur0 += delta; /* form error * 3 */
errorptr[0] = (FSERROR) (bpreverr0 + cur0);
cur0 += delta; /* form error * 5 */
bpreverr0 = belowerr0 + cur0;
cur0 += delta; /* form error * 7 */
Each time cur0 is incremented by delta, the compiled code doubles the value of delta (WTF?!) Thus, by the time the end of the block is reached, cur0 is equal to 15 times its former self, not 7 times its former self as it should be. At any rate, it was a lot simpler to just refactor the code so that it uses multiplication.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1251 632fc199-4ca6-4c93-a231-07263d6284db
This macro defined two local variables, cinfo and dinfo, but both
aren't always used. Add a (void)cinfo; and (void)dinfo in there
to hush up the compiler.
Derivation of sign value relies on shift right operator >> being an
arithmetic shift. It is thus not strictly portable since the C standard
defines the result of x >> y as "implementation-defined" when x is a
signed integer with a negative value.
Multiple trellis iterations may improve coding performance as Huffman
tables are updated with each iteration. In practice the benefit appears
to be very minimal
Trellis quantization is modified:
- to work on the configurable spectral range Ss to Se
- to optionally optimize runs of EOBs
- to optionally split optimization between 2 spectral ranges
In trellis quantization passes Huffman table code optimization is
modified such as to generable a valid code length for each possible
symbol by resetting frequency counters to 1 instead of 0
Fixes issues #23#24#31
Note that the original jpgcrush script optimizes scans only for YCbCr
and grayscale color spaces. Scan optimization is thus disabled for RGB
and CMYK color spaces and behavior reverts to the fast mode of jpgcrush
which uses predefined scans
Different lambda values may be used for each frequency in DCT domain.
The weighting table is currently not configurable but can be
enabled/disabled with cinfo-> use_lambda_weight_tbl
-tune-psnr, -tune-ssim and -tune-hvs-psnr are added to cjpeg to control
the trellis quantization process and optimize output for PSNR, SSIM and
HVS-PSNR distortion metrics
A new pass type trellis_pass is added. It defines a pass where trellis
quantization is done in the quantize_trellis() function.
Trellis quantization can be enabled by setting use_moz_defaults to 2 or
by using the -trellis option in cjpeg
Note that trellis does currently not work with scan optimization. Scan
optimization is disabled when trellis is enabled.
First implementation of scan optimisation as done by jpgcrush. Many
parameters are currently hardcoded which should be changed.
Implementation is missing for monochrome.
Add the fast mode of jpgcrush where:
- Huffman table optimisation is enabled
- Progressive coding mode is enabled
- New default scans are defined for progressive coding
-- The Mac and Cygwin packages will now be created with the directory structure defined by the configure variables "prefix", "bindir", "libdir", etc., with the exception that the docs are always installed under /usr/share/doc/{package_name}-{version} on Cygwin and /Library/Documentation/{package_name} on Mac.
-- Fixed a duplicate filename warning when generating RPMs with the default prefix of /opt/libjpeg-turbo.
-- Moved the TurboJPEG libraries out of the system directory on Windows and Mac. It is no longer necessary to put them there, since we are not trying to be backward compatible with TurboJPEG/IPP anymore.
-- Fixed an issue whereby building the "installer" target on Windows would not build the Java JAR file, thus causing an error if the JAR had not been previously built.
-- Building the "install" target on Windows will now install libjpeg-turbo into c:\libjpeg-turbo[-gcc][64] (the same directories used by the installers.) This can be overridden by setting CMAKE_INSTALL_PREFIX.
-- The Java classes on all platforms will now look for the JNI library in the directory under which the build/packaging system installs it.
-- The Mac and Cygwin packages will now be created with the directory structure defined by the configure variables "prefix", "bindir", "libdir", etc., with the exception that the docs are always installed under /usr/share/doc/{package_name}-{version} on Cygwin and /Library/Documentation/{package_name} on Mac.
-- Fixed a duplicate filename warning when generating RPMs with the default prefix of /opt/libjpeg-turbo.
-- Moved the TurboJPEG libraries out of the system directory on Windows and Mac. It is no longer necessary to put them there, since we are not trying to be backward compatible with TurboJPEG/IPP anymore.
-- Fixed an issue whereby building the "installer" target on Windows would not build the Java JAR file, thus causing an error if the JAR had not been previously built.
-- Building the "install" target on Windows will now install libjpeg-turbo into c:\libjpeg-turbo[-gcc][64] (the same directories used by the installers.) This can be overridden by setting CMAKE_INSTALL_PREFIX.
-- The Java classes on all platforms will now look for the JNI library in the directory under which the build/packaging system installs it.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@946 632fc199-4ca6-4c93-a231-07263d6284db
If the default prefix (/opt/libjpeg-turbo) is used, then we now always install 32-bit libraries in /opt/libjpeg-turbo/lib32 and 64-bit libraries in /opt/libjpeg-turbo/lib64 instead of trying to conform to the Debian or Red Hat conventions. The RPM and DEB packages will now be created with the directory structure defined by the configure variables "prefix", "bindir", "libdir", etc., with the exception that the docs are always installed under /usr/share/doc/{package_name}-{version}.
If the default prefix (/opt/libjpeg-turbo) is used, then we now always install 32-bit libraries in /opt/libjpeg-turbo/lib32 and 64-bit libraries in /opt/libjpeg-turbo/lib64 instead of trying to conform to the Debian or Red Hat conventions. The RPM and DEB packages will now be created with the directory structure defined by the configure variables "prefix", "bindir", "libdir", etc., with the exception that the docs are always installed under /usr/share/doc/{package_name}-{version}.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@944 632fc199-4ca6-4c93-a231-07263d6284db
The SIMD glue code has gotten a bit #ifdef heavy so clean it up by having
one file for each possible SIMD arch. This also allows a simplification of
the x86_64 code as SSE/SSE2 is always known to exist on that arch.
Older versions of automake doesn't properly support no-recursive make.
Reimplement the build system by having a local Makefile.am in the
simd/ directory.
Studio build. A static jconfig.h has been re-added, but in a separate
directory, to avoid clash with jconfig.h generated by configure
script. Also, jconfig.h now includes the inline macro. jpeg.dsp has
been modified to search in the "win" subdir, to find jconfig.h.
This patch is in spirit similar to r121.
* If using JDK 11 or later, CMake 3.10.x or later must also be used.
### Windows
- Microsoft Visual C++ 2005 or later
If you don't already have Visual C++, then the easiest way to get it is by
installing
[Visual Studio Community Edition](https://visualstudio.microsoft.com).
* You can also download and install the standalone Windows SDK (for Windows 7
or later), which includes command-line versions of the 32-bit and 64-bit
Visual C++ compilers.
* If you intend to build libjpeg-turbo from the command line, then add the
appropriate compiler and SDK directories to the `INCLUDE`, `LIB`, and
`PATH` environment variables. This is generally accomplished by
executing `vcvars32.bat` or `vcvars64.bat`, which are located in the same
directory as the compiler.
* If built with Visual C++ 2015 or later, the libjpeg-turbo static libraries
cannot be used with earlier versions of Visual C++, and vice versa.
* The libjpeg API DLL (**jpeg{version}.dll**) will depend on the C run-time
DLLs corresponding to the version of Visual C++ that was used to build it.
- Vcpkg
You need to download and install libpng using the [vcpkg](https://github.com/Microsoft/vcpkg) dependency manager:
git clone https://github.com/Microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh
./vcpkg integrate install
vcpkg install libpng:x64-windows
vcpkg install libpng:x64-windows-static
Actually, you can just download and install MozJPEG using vcpkg dependency manager:
vcpkg install mozjpeg
The mozjpeg port in vcpkg is kept up to date by Microsoft team members and community contributors. If the version is out of date, please [create an issue or pull request](https://github.com/Microsoft/vcpkg) on the vcpkg repository.
... OR ...
- MinGW
[MSYS2](http://msys2.github.io/) or [tdm-gcc](http://tdm-gcc.tdragon.net/)
recommended if building on a Windows machine. Both distributions install a
Start Menu link that can be used to launch a command prompt with the
appropriate compiler paths automatically set.
- If building the TurboJPEG Java wrapper, JDK 1.5 or later is required. This
This repository is governed by Mozilla's code of conduct and etiquette guidelines.
For more details, please read the
[Mozilla Community Participation Guidelines](https://www.mozilla.org/about/governance/policies/participation/).
## How to Report
For more information on how to report violations of the Community Participation Guidelines, please read our '[How to Report](https://www.mozilla.org/about/governance/policies/participation/reporting/)' page.
<!--
## Project Specific Etiquette
In some cases, there will be additional project etiquette i.e.: (https://bugzilla.mozilla.org/page.cgi?id=etiquette.html).
Mozilla JPEG Encoder Project [](https://ci.appveyor.com/project/kornel/mozjpeg-4ekrx)
============================
MozJPEG improves JPEG compression efficiency achieving higher visual quality and smaller file sizes at the same time. It is compatible with the JPEG standard, and the vast majority of the world's deployed JPEG decoders.
MozJPEG is a patch for [libjpeg-turbo](https://github.com/libjpeg-turbo/libjpeg-turbo). **Please send pull requests to libjpeg-turbo** if the changes aren't specific to newly-added MozJPEG-only compression code. This project aims to keep differences with libjpeg-turbo minimal, so whenever possible, improvements and bug fixes should go there first.
MozJPEG is compatible with the libjpeg API and ABI. It is intended to be a drop-in replacement for libjpeg. MozJPEG is a strict superset of libjpeg-turbo's functionality. All MozJPEG's improvements can be disabled at run time, and in that case it behaves exactly like libjpeg-turbo.
MozJPEG is meant to be used as a library in graphics programs and image processing tools. We include a demo `cjpeg` command-line tool, but it's not intended for serious use. We encourage authors of graphics programs to use libjpeg's [C API](libjpeg.txt) and link with MozJPEG library instead.
## Features
* Progressive encoding with "jpegrescan" optimization. It can be applied to any JPEG file (with `jpegtran`) to losslessly reduce file size.
* Trellis quantization. When converting other formats to JPEG it maximizes quality/filesize ratio.
* Comes with new quantization table presets, e.g. tuned for high-resolution displays.
* Fully compatible with all web browsers.
* Can be seamlessly integrated into any program that uses the industry-standard libjpeg API. There's no need to write any MozJPEG-specific integration code.
See [BUILDING](BUILDING.md). MozJPEG is built exactly the same way as libjpeg-turbo, so if you need additional help please consult [libjpeg-turbo documentation](https://libjpeg-turbo.org/).
"Directory containing Armv8 iOS or macOS build to include in universal binaries")
set(MACOS_APP_CERT_NAME""CACHESTRING
"Name of the Developer ID Application certificate (in the macOS keychain) that should be used to sign the libjpeg-turbo DMG. Leave this blank to generate an unsigned DMG.")
set(MACOS_INST_CERT_NAME""CACHESTRING
"Name of the Developer ID Installer certificate (in the macOS keychain) that should be used to sign the libjpeg-turbo installer package. Leave this blank to generate an unsigned package.")
[C compiler flags needed to include jni.h (default: -I/System/Library/Frameworks/JavaVM.framework/Headers on OS X, '-I/usr/java/include -I/usr/java/include/solaris' on Solaris, and '-I/usr/java/default/include -I/usr/java/default/include/linux' on Linux)])
AC_MSG_CHECKING([whether to build TurboJPEG Java wrapper])
AC_ARG_WITH([java],
AC_HELP_STRING([--with-java], [Build Java wrapper for the TurboJPEG library]))
if test "x$with_12bit" = "xyes" -o "x$with_turbojpeg" = "xno"; then
['tjtransform_90',['tjtransform',['../structtjtransform.html',1,'tjtransform'],['../group___turbo_j_p_e_g.html#ga504805ec0161f1b505397ca0118bf8fd',1,'tjtransform(): turbojpeg.h'],['../group___turbo_j_p_e_g.html#ga9cb8abf4cc91881e04a0329b2270be25',1,'tjTransform(tjhandle handle, const unsigned char *jpegBuf, unsigned long jpegSize, int n, unsigned char **dstBufs, unsigned long *dstSizes, tjtransform *transforms, int flags): turbojpeg.h']]],
['tjtransform',['tjtransform',['../structtjtransform.html',1,'tjtransform'],['../group___turbo_j_p_e_g.html#gae403193ceb4aafb7e0f56ab587b48616',1,'tjTransform(tjhandle handle, unsigned char *jpegBuf, unsigned long jpegSize, int n, unsigned char **dstBufs, unsigned long *dstSizes, tjtransform *transforms, int flags): turbojpeg.h'],['../group___turbo_j_p_e_g.html#gaa29f3189c41be12ec5dee7caec318a31',1,'tjtransform(): turbojpeg.h']]],
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.