Commit Graph

2337 Commits

Author SHA1 Message Date
DRC
622c462e48 Fix copyright header formatting buglets 2019-11-06 17:24:34 -06:00
DRC
410c028f33 example.txt: Avoid undefined setjmp() behavior
Modifying a locally-defined non-volatile variable below the setjmp()
return point results in undefined behavior whereby the variable may not
have the expected value after setjmp() returns.

Fixes #379
2019-11-06 16:16:54 -06:00
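The setjmp() pitfall described in this commit can be illustrated with a minimal sketch. The names env, fail(), and count_attempts() are hypothetical and are not taken from example.txt; the point is only that a local variable modified between setjmp() and longjmp() should be declared volatile.

```c
#include <setjmp.h>

static jmp_buf env;

/* Hypothetical helper that simulates a library error by jumping back to the
   setjmp() return point. */
static void fail(void)
{
  longjmp(env, 1);
}

/* Returns the number of attempts made before the simulated error was
   handled. */
int count_attempts(void)
{
  /* volatile: this variable is modified after setjmp() is called, so without
     the qualifier its value would be indeterminate once longjmp() returns
     control to the setjmp() return point. */
  volatile int attempts = 0;

  if (setjmp(env)) {
    /* Reached via longjmp().  Because attempts is volatile, it reliably
       holds the value written below. */
    return attempts;
  }

  attempts++;
  fail();
  return -1;  /* not reached */
}
```

Calling count_attempts() returns 1. Removing the volatile qualifier may appear to work with some compilers and optimization levels, which is what makes this class of bug hard to spot.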
DRC
d70047fcd2 Travis: Use GCC 6 for Mac pre-release builds
The primary impetus for this is to eliminate build warnings, such as

  (32-bit only)
  section "__textcoal_nt" is deprecated

  object file (XXXXXX.o) was built for newer OSX version (10.XX) than
  being linked (10.5)

Upgrading to GCC 6 results in neutral performance for compression,
a measured average overall decompression speedup of 2.5% for 64-bit
code, and a measured average overall decompression slow-down of 4.3% for
32-bit code on a 3 GHz Core i7.  The 4.3% slow-down for 32-bit code is
deemed acceptable, given that 32-bit macOS apps are deprecated.
2019-11-05 17:57:59 -06:00
DRC
cbf0fcc8b7 i386 SSE2 Huffman: Fix pointer arithmetic issue
Splitting the pointer arithmetic in GET_SYM() into a separate add and
sub instruction was an attempt to work around an error ("invalid operand
type") that occurred when assembling the file with NASM.  However, this
created a link error on macOS ("ld: illegal text-relocation to
'_jconst_huff_encode_one_block' in
simd/CMakeFiles/simd.dir/i386/jchuff-sse2.asm.o from
'_jsimd_huff_encode_one_block_sse2' in
simd/CMakeFiles/simd.dir/i386/jchuff-sse2.asm.o for architecture i386")
and also changed the alignment of the code in ways that might have
affected the previous benchmark results (which took a great deal of time
to obtain.)  Ultimately, the path of least resistance is just to
require NASM 2.13 or later.
2019-11-05 15:56:28 -06:00
DRC
bbedb4b564 Merge branch 'master' into dev 2019-11-05 15:43:21 -06:00
DRC
cf54623b08 Mac: Support hiding SIMD fct symbols w/ NASM 2.14+
(NASM 2.14+ now supports the private_extern section directive, which was
previously only available with YASM.)
2019-11-05 15:41:59 -06:00
DRC
087c29e07f Optimize Huffman encoding
This commit improves the C and SSE2 Huffman encoding implementations in
the following ways:

- Avoid using xmm8-xmm15 in the x86-64 SSE2 implementation.  There is no
  actual need to use those registers, and avoiding them produces a
  cleaner WIN64 function entry/exit-- as well as shorter code, since REX
  prefixes can be avoided (this is helpful on certain CPUs, such as
  Intel Atom, for which instruction fetch and decoding can be a
  bottleneck.)
- Optimize register usage so that fewer REX prefixes and
  register-register moves are needed.
- Use the bit counter to store the number of free bits in the bit buffer
  rather than the number of bits in the bit buffer.  This changes the
  method for inserting a code into the bit buffer to:

  (put_buffer |= code << (free_bits -= code_size));

  As a result:
  * Only one bit counter needs to stay in a register (we just keep it in
    cl.)
  * The bit buffer contents are already properly aligned to be written
    out (after a byte swap.)
  * Adjusting the free bits counter and checking if the bit buffer is
    full can be combined into a single operation.
  * We can wait to flush the bit buffer until the buffer is actually
    full and not just in danger of becoming full.  Thus, eight bytes can
    be flushed at a time.

- Speed is quite sensitive to the alignment of branch target labels, so
  insert some padding and remove branches from the flush code.
  (Flushing this way isn't actually faster when compared to using
  branches, but the branchless code doesn't need extra alignment and is
  thus smaller.)
- Speculatively write out the bit buffer as a single 8-byte write,
  falling back to a byte-by-byte write only if there are any 0xFF bytes
  in the bit buffer that need to be encoded as 0xFF 0x00.
- Use MMX registers for the 32-bit implementation (so the bit buffer can
  be 64 bits wide.)
- Slightly reduce overall function code size.
- Eliminate or combine a few SSE instructions.
- Make some minor improvements to instruction scheduling.
- Adjust flush_bits() in jchuff.c to handle cases in which the bit
  buffer has fewer than 7 free bits (apparently that couldn't happen
  before.)

Based on:
947a09defa
262ebb6b81
6e9a091221

See change log for performance claims.

Closes #292
2019-11-04 19:04:05 -06:00
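The free-bits technique from the list above can be sketched in portable C. This is an illustration under stated assumptions, not the code from jchuff.c or the SSE2 implementation: the bit_writer type and the functions bit_writer_init(), flush_full(), and emit_code() are hypothetical names, codes are assumed to be 1 to 16 bits long, and the speculative 8-byte write and the final partial flush are omitted.

```c
#include <stdint.h>

/* Hypothetical bit writer.  The counter tracks FREE bits rather than used
   bits, so codes are inserted from the top of the buffer downward. */
typedef struct {
  uint64_t put_buffer;  /* pending bits, most significant bit first */
  int free_bits;        /* number of unused bits left in put_buffer */
  uint8_t *out;         /* next output byte */
} bit_writer;

void bit_writer_init(bit_writer *w, uint8_t *out)
{
  w->put_buffer = 0;
  w->free_bits = 64;
  w->out = out;
}

/* Write all eight bytes of a full buffer, most significant byte first.  Any
   0xFF byte is followed by a stuffed 0x00, as JPEG entropy-coded data
   requires. */
static void flush_full(bit_writer *w)
{
  int shift;

  for (shift = 56; shift >= 0; shift -= 8) {
    uint8_t b = (uint8_t)(w->put_buffer >> shift);

    *w->out++ = b;
    if (b == 0xFF)
      *w->out++ = 0x00;
  }
}

/* Insert a code of the given size (assumed to be 1..16 bits). */
void emit_code(bit_writer *w, uint32_t code, int size)
{
  /* Adjusting the counter and detecting a full buffer is one operation:
     a negative result means that the code does not fit. */
  w->free_bits -= size;
  if (w->free_bits < 0) {
    int spill = -w->free_bits;  /* low bits of the code that did not fit */

    /* Top off the buffer with the bits that do fit, then flush 8 bytes. */
    w->put_buffer |= (uint64_t)code >> spill;
    flush_full(w);

    /* Start a new buffer with the spilled low bits at the top. */
    w->free_bits = 64 - spill;
    w->put_buffer = (uint64_t)code << w->free_bits;
  } else {
    /* Equivalent to: put_buffer |= code << (free_bits -= code_size) */
    w->put_buffer |= (uint64_t)code << w->free_bits;
  }
}
```

In this sketch the buffer is flushed only when a code actually fails to fit, so a buffer with zero free bits stays pending until the next call to emit_code(). The output area must allow for up to 16 bytes per flush, since every byte could be 0xFF.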
DRC
d92ae5df0c Merge branch 'master' into dev 2019-11-04 18:50:45 -06:00
DRC
ac59b2c582 TJBench: Fix output with -componly -quiet 2019-11-04 18:49:46 -06:00
DRC
6902cdb177 Build: Don't require ASM_NASM if !REQUIRE_SIMD
The build system is supposed to fall back to a non-SIMD build if
WITH_SIMD==1 but REQUIRE_SIMD==0.

Based on:
972df912d0

Closes #384
2019-10-29 12:08:40 -05:00
DRC
adf9cc942f j*huff.c: Remove crufty ASSIGN_STATE() macro
This macro is a relic of libjpeg's historic need to support a wide
variety of C compilers with varying degrees of compatibility.  Such was
necessary during the open systems era, because C compilers were often
supplied by the system vendor.  Prior to 1989, there was no C standard
per se, and even after ANSI C became a thing, there were still compilers
in use that didn't conform to it (libjpeg was first released in 1991.)

Realistically, only a handful of C compilers are in widespread use these
days, and all modern C compilers should support structure assignment.
2019-10-24 02:13:34 -05:00
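For readers unfamiliar with the distinction, the following sketch contrasts a field-by-field copy (the kind of fallback a compiler without structure assignment would need) with plain structure assignment. The savable_state layout and both function names are hypothetical and are meant only to illustrate the idea, not to reproduce the removed macro.

```c
/* Hypothetical state structure, loosely modeled on an entropy coder's
   saved state. */
typedef struct {
  unsigned long put_buffer;
  int free_bits;
  int last_dc_val[4];
} savable_state;

/* Field-by-field copy: what code had to do when the compiler could not
   assign one structure to another. */
void copy_fieldwise(savable_state *dest, const savable_state *src)
{
  int i;

  dest->put_buffer = src->put_buffer;
  dest->free_bits = src->free_bits;
  for (i = 0; i < 4; i++)
    dest->last_dc_val[i] = src->last_dc_val[i];
}

/* Structure assignment: standard since ANSI C, and it copies embedded
   arrays as well. */
void copy_assign(savable_state *dest, const savable_state *src)
{
  *dest = *src;
}
```

Both functions leave dest with the same contents. The assignment form also stays correct automatically when fields are added to the structure.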
DRC
95f4d6ef8b Merge branch 'master' into dev 2019-10-24 02:13:23 -05:00
DRC
55de97207d AppVeyor: Use MinGW-builds instead of MSYS2 MinGW
... to avoid backward compatibility issues with GCC 4-6 MinGW
toolchains.  Apparently GCC 7+ MinGW toolchains introduce a link-time
dependency with internal MinGW CRT functions that are meant to provide
compatibility with Microsoft's Universal CRT (ucrt) library, but those
internal functions are not available in GCC 4-6 MinGW toolchains.  This
made it impossible to use the official builds of libjpeg.a and
libturbojpeg.a with GCC 4-6 MinGW toolchains (a fatal link error--
"undefined reference to '__imp___acrt_iob_func'"-- occurred.)

This problem was not immediately apparent after switching to the MSYS2
implementation of MinGW (d6d7b53968)
because, for a while, MSYS2 was still using GCC 5 and 6.

Refer to libjpeg-turbo/libjpeg-turbo#382
2019-10-24 02:04:04 -05:00
DRC
708f013f89 Win packaging: Fix 64-bit VC/GCC co-install issue 2019-10-23 00:31:30 -05:00
DRC
163f0b1965 Bump version to 2.0.4 to prepare for new commits 2019-10-22 19:39:38 -05:00
DRC
3a32d199df x86 SIMD: Consistify capitalization of NASM types
byte, word, dword, qword, oword, and yword are all assembler keywords,
so it makes sense to use lowercase for these so as not to mistake them
for macros or constants.
2019-10-17 20:02:20 -05:00
DRC
9a51a87af3 x86 SIMD: Remove obsolete [TAB8] comments
With apologies to Richard Hendricks, our assembly code no longer uses
tabs.
2019-10-17 14:11:35 -05:00
DRC
74aeaddf8e jc*huff.c: Consistify preproc directive formatting
The rest of the libjpeg API code uses "#if defined(condition)" rather
than "#if defined condition".
2019-10-17 14:09:50 -05:00
DRC
2ae2bd66b4 Remove support for Apple's Java implementation
(AKA "Java for OS X systems.")  This implementation of Java 1.6 is long
obsolete and not supported on any version of macOS past High Sierra.
Oracle no longer provides a 32-bit JVM on macOS, so it is no longer
necessary to provide a 32-bit version of the TurboJPEG Java wrapper on
macOS.
2019-10-04 16:31:46 -05:00
DRC
b7523059c1 Merge branch 'master' into dev 2019-10-04 15:53:30 -05:00
DRC
ab6c4a5db1 Travis: Fix macOS build
Java 6 has moved to homebrew/cask-versions.
2019-10-04 15:02:18 -05:00
DRC
ec4cacc16e BUILDING.md: Document need for CRB/PowerTools repo
NASM and YASM are located in these non-default repositories (CRB on
RHEL, PowerTools on CentOS.)
2019-10-02 16:42:03 -05:00
DRC
b4110b65fc Merge branch 'master' into dev 2019-09-04 18:58:12 -05:00
DRC
5db6a6819d README.md: Document memory debugger pitfalls
(more specifically, the need to disable libjpeg-turbo's SIMD extensions
when testing with valgrind, MSan, etc.)

Addresses a concern expressed in #365
(tag: 2.0.3)
2019-08-30 12:02:50 -05:00
DRC
f448279b14 README.md: Remove vestigial autotools references 2019-08-30 11:24:36 -05:00
DRC
ded5a504b4 tjDecodeYUV*: Fix err if TJ inst used for prog dec
If the TurboJPEG instance passed to tjDecodeYUV[Planes]() was previously
used to decompress a progressive JPEG image, then we need to disable the
progressive decompression parameters in the underlying libjpeg instance
before calling jinit_master_decompress().

This commit also modifies the build system so that the "tjtest" target
will test for this issue, and it corrects a previous oversight in the
build system whereby tjbenchtest did not test progressive
compression/decompression unless WITH_JAVA was true.
2019-08-15 13:57:36 -05:00
DRC
8ef53b102f Merge branch 'master' into dev 2019-08-14 22:08:59 -05:00
DRC
c0d0fe86d8 ChangeLog.md: Wordsmithing 2019-08-14 22:08:44 -05:00
DRC
a81a8c137b SSE2 SIMD: Fix prog Huffman enc. error if Sl%16==0
(regression introduced by 5b177b3cab)

The SSE2 implementation of progressive Huffman encoding performed
extraneous iterations when the scan length was a multiple of 16.

Based on:
bb7f1ef983

Fixes #335
Closes #367
2019-08-14 22:01:30 -05:00
DRC
02f7bcdbd1 Travis: Enable additional sanitizer tests
- Re-purpose the non-SIMD test to test with MSan as well.
- Re-purpose the ASan test to test with UBSan as well.
- Use the default Travis build environment rather than specifying Ubuntu
  14.04.  I think I added 'dist: trusty' back when 14.04 was newer than
  the default, but now it's older than the default.
- Enable verbose output for any unit tests that fail (so we can see the
  sanitizer output.)
2019-08-13 16:15:00 -05:00
DRC
7fbfe29c65 Merge branch 'master' into dev 2019-07-18 15:18:27 -05:00
DRC
5ced1f590b Travis: Cache Homebrew downloads
... to speed up the Mac CI builds.
2019-07-18 15:17:33 -05:00
DRC
b98ee19241 Travis: Update Homebrew/fix Mac CI build
Updating Homebrew wasn't necessary when the CI build first started using
the xcode8.3 image (see c8e52741fd), but
apparently it now is.
2019-07-18 14:14:47 -05:00
DRC
ad8330af72 TJBench.java: Remove space in method invocation
(to make checkstyle happy)
2019-07-12 17:29:27 -05:00
DRC
b25adabcfa tjbench.c: Fix minor code formatting errors
(introduced by 2a9e3bd743)
2019-07-12 17:28:15 -05:00
DRC
875e873f26 tjutil.c: Fix compiler warning with Visual Studio
Apparently windows.h includes stdlib.h, which defines the max() and
min() macros, so we need to ensure that tjutil.h is included after
that.
2019-07-11 17:06:08 -05:00
DRC
2a9e3bd743 TurboJPEG: Properly handle gigapixel images
Prevent several integer overflow issues and subsequent segfaults that
occurred when attempting to compress or decompress gigapixel images with
the TurboJPEG API:

- Modify tjBufSize(), tjBufSizeYUV2(), and tjPlaneSizeYUV() to avoid
  integer overflow when computing the return values and to return an
  error if such an overflow is unavoidable.
- Modify tjunittest to validate the above.
- Modify tjCompress2(), tjEncodeYUVPlanes(), tjDecompress2(), and
  tjDecodeYUVPlanes() to avoid integer overflow when computing the row
  pointers in the 64-bit TurboJPEG C API.
- Modify TJBench (both C and Java versions) to avoid overflowing the
  size argument to malloc()/new and to fail gracefully if such an
  overflow is unavoidable.

In general, this allows gigapixel images to be accommodated by the
64-bit TurboJPEG C API when using automatic JPEG buffer (re)allocation.
Such images cannot currently be accommodated without automatic JPEG
buffer (re)allocation, due to the fact that tjAlloc() accepts a 32-bit
integer argument (oops.)  Such images cannot be accommodated in the
TurboJPEG Java API due to the fact that Java always uses a signed 32-bit
integer as an array index.

Fixes #361
2019-07-11 16:56:50 -05:00
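The kind of overflow this commit guards against can be sketched as follows. A 40000 x 40000 image with 3 bytes per pixel needs 4.8 billion bytes, which does not fit in a signed 32-bit integer. The helper checked_image_size() below is hypothetical and is not part of the TurboJPEG API; it only shows one way to compute such a size in a wider type and report an error when the result cannot be represented.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a TurboJPEG function).  Returns the number of
   bytes needed for a width x height image with the given bytes per pixel,
   or (size_t)-1 if the arguments are invalid or the size cannot be
   represented in a size_t. */
size_t checked_image_size(int width, int height, int bytes_per_pixel)
{
  unsigned long long size;

  if (width < 1 || height < 1 || bytes_per_pixel < 1)
    return (size_t)-1;

  /* Do the arithmetic in unsigned long long, checking each multiplication
     before performing it. */
  if ((unsigned long long)height > ULLONG_MAX / (unsigned long long)width)
    return (size_t)-1;
  size = (unsigned long long)width * (unsigned long long)height;

  if (size > ULLONG_MAX / (unsigned long long)bytes_per_pixel)
    return (size_t)-1;
  size *= (unsigned long long)bytes_per_pixel;

  /* On a 32-bit platform, size_t may be narrower than unsigned long long. */
  if (size > (unsigned long long)SIZE_MAX)
    return (size_t)-1;

  return (size_t)size;
}
```

A caller would compare the result against (size_t)-1 before passing it to malloc(), rather than multiplying int dimensions directly.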
DRC
f37b7c1f96 Build: Fix build/install with Xcode IDE
Closes #355
2019-07-02 11:28:26 -05:00
Jonathan Wright
509c2680aa Use bias pattern for 4:4:0 (h1v2) fancy upsampling
This commit modifies h1v2_fancy_upsample() so that it uses an ordered
dither pattern, similar to that of h2v1_fancy_upsample(), rounding up or
down the result for alternate pixels rather than always rounding down.
This ensures that the decompression error pattern for a 4:4:0 JPEG image
will be similar to the rotated decompression error pattern for a 4:2:2
JPEG image.  Thus, the final result will be similar regardless of
whether a 4:2:2 JPEG image is rotated or transposed before or after
decompression.

Closes #356
2019-07-02 09:55:24 -05:00
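A minimal sketch of the rounding idea follows. It assumes the usual "fancy" upsampling weights of 3/4 for the nearer sample and 1/4 for the further one, and it assumes that the bias alternates between the two output rows generated from each input row. The function names and the choice of which row gets which bias are illustrative assumptions, not a copy of h1v2_fancy_upsample().

```c
/* Weighted blend of 3/4 nearer + 1/4 further sample.  A bias of 1 rounds
   the exact-half case down, and a bias of 2 rounds it up. */
static unsigned char blend(unsigned char nearer, unsigned char further,
                           int bias)
{
  return (unsigned char)((3 * nearer + further + bias) >> 2);
}

/* Hypothetical sketch: generate two output rows from the input row cur,
   using the rows above and below it as the further neighbors.  Alternating
   the bias between the two output rows avoids always rounding down. */
void upsample_rows_h1v2(const unsigned char *above, const unsigned char *cur,
                        const unsigned char *below, unsigned char *out_upper,
                        unsigned char *out_lower, int width)
{
  int col;

  for (col = 0; col < width; col++) {
    out_upper[col] = blend(cur[col], above[col], 1);
    out_lower[col] = blend(cur[col], below[col], 2);
  }
}
```

With cur = 10 and both neighbors = 12, the exact blend is 10.5, so the upper output row gets 10 and the lower output row gets 11.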
DRC
ec5adb83dd Build/packaging: Support macOS package/DMG signing 2019-05-18 17:58:50 -05:00
DRC
051f4862f9 Travis: Fix iOS build
The new libjpeg-turbo 2.1.x official build scripts expect Xcode to be
under /Applications/Xcode83.app.
2019-05-09 20:46:18 -05:00
DRC
c055c88057 Discontinue support for 32-bit iOS builds 2019-05-09 20:36:51 -05:00
DRC
85d96d4478 Merge branch 'master' into dev 2019-05-09 19:13:39 -05:00
DRC
c8e52741fd Travis: Use xcode8.3 image
xcode7.3 is based on El Capitan, which is EOL, and Homebrew no longer
provides El Cap bottles (pre-compiled binaries.)  Thus, Homebrew was
trying to build GCC 5, YASM, and the other packages we need from source,
which caused the Mac CI builds to time out.  I tried goading Homebrew
into installing GCC 5.5.0_2 and YASM 1.3.0_1, which still have El Cap
bottles available, by using the URLs of those specific versions of the
formulae (from the Homebrew GitHub repository) as package names.  This
failed, however, because 'brew bundle' converted the URLs to all
lowercase.  I then tried explicitly installing the old formulae by using
'brew install' with the aforementioned URLs (bypassing the Travis
Homebrew addon), but Homebrew still tried to build all of the
dependencies from source.

Upgrading to Xcode 8.3.x necessitated regression testing the performance
on iOS, but that proved less of a pain than figuring out how to install
all of the old Homebrew bottles we needed.  Also, this future-proofs the
CI builds against the inevitable discontinuation of the xcode7.3 image.

Note that Xcode 8.3.x improves iOS 64-bit decompression performance
significantly relative to Xcode 7.2.x or 7.3.x.  iOS 32-bit performance
unfortunately regresses by as much as 5%, but it can't be helped (32-bit
iOS apps are no longer supported on iOS 11+ anyhow, and the next major
release of libjpeg-turbo will remove support for them as well.)
2019-05-09 18:09:42 -05:00
DRC
f36d531553 Merge branch 'master' into dev 2019-04-23 14:54:23 -05:00
DRC
6399d0a699 Fix code formatting/style issues ...
... including, but not limited to:
- unused macros
- private functions not marked as static
- unprototyped global functions
- variable shadowing

(detected by various non-default GCC 8 warning options)
2019-04-23 14:15:48 -05:00
Chris Blume
aa9db61677 x86 SIMD: Check for CPUID leaf 07H before using
According to Intel's manual [1], "If a value entered for CPUID.EAX is
higher than the maximum input value for basic or extended function for
that processor then the data for the highest basic information leaf is
returned."

Right now, libjpeg-turbo doesn't first check that leaf 07H is supported
before attempting to use it, so the ostensible AVX2 bit (Bit 05) of the
CPUID result might actually be Bit 05 from a lower leaf.  That bit might
be set, even if the CPU doesn't support AVX2.

This commit modifies the x86 and x86-64 SIMD feature detection code so
that it first checks whether CPUID leaf 07H is supported before
attempting to use it to check for AVX2 instruction support.

DRC:
This commit should fix
https://bugzilla.mozilla.org/show_bug.cgi?id=1520760
However, I have not personally been able to reproduce that issue,
despite using a Nehalem (pre-AVX2) CPU on which the maximum CPUID leaf
has been limited via a BIOS setting.

Closes #348

[1]
"Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 2 (2A, 2B, 2C & 2D): Instruction Set Reference, A-Z", https://software.intel.com/sites/default/files/managed/a4/60/325383-sdm-vol-2abcd.pdf, page 3-192.
2019-04-16 17:07:28 -05:00
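The check described in this commit can be sketched in C, assuming a GCC-compatible compiler that provides <cpuid.h> with __get_cpuid_max() and __cpuid_count(). libjpeg-turbo's actual detection code is written in assembly, so this is only an illustration of the logic. The function name cpu_reports_avx2() is hypothetical, and the sketch does not check whether the operating system saves the YMM registers.

```c
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_CPUID_H 1
#endif

/* Returns 1 if CPUID leaf 07H reports AVX2 instruction support, or 0 if it
   does not (or if this is not an x86 build with <cpuid.h>.) */
int cpu_reports_avx2(void)
{
#ifdef HAVE_CPUID_H
  unsigned int eax, ebx, ecx, edx;

  /* Query the maximum supported basic leaf first.  If it is lower than 7,
     then a leaf 07H query would return data from a different leaf, and
     Bit 05 of EBX would be meaningless. */
  if (__get_cpuid_max(0, 0) < 7)
    return 0;

  /* Leaf 07H, sub-leaf 0: Bit 05 of EBX is the AVX2 feature flag. */
  __cpuid_count(7, 0, eax, ebx, ecx, edx);
  return (int)((ebx >> 5) & 1);
#else
  return 0;
#endif
}
```

On non-x86 builds the function simply returns 0, so callers can use it unconditionally.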
DRC
8a0e35b212 Merge branch 'master' into dev 2019-04-15 13:41:45 -05:00
DRC
2b05d47bc2 ChangeLog.md: Document 33011754 2019-04-15 13:38:15 -05:00
DRC
7bc9fca430 jdhuff.c: Silence UBSan signed int overflow err #2
Same as d3a3a73f64 but in the fast decode
path.  It was necessary to use a different-sized test image in order to
trigger the error in this location.

Refer to #347
2019-04-12 09:09:27 -05:00