The DSPr2 extensions have been verified to work with little endian MIPS.
CMAKE_SYSTEM_PROCESSOR is not set consistently in little endian MIPS
environments (it may be either "mips" or "mipsel"), so our build
system needs to handle both cases.
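A minimal sketch of the idea, written in C rather than CMake and not taken from the actual build system: treat both processor names as MIPS, then let the compiler's predefined macros decide endianness. The helper names are hypothetical; `__mips__` and `__MIPSEL__` are the macros GCC-compatible compilers predefine for little endian MIPS targets.

```c
#include <string.h>

/* Hypothetical helper: both spellings denote a MIPS target, so the
 * processor name alone must not be used to infer endianness. */
static int is_mips_processor_name(const char *name)
{
  return strcmp(name, "mips") == 0 || strcmp(name, "mipsel") == 0;
}

/* Hypothetical helper: returns 1 only when the compiler itself reports a
 * little endian MIPS target, regardless of what the processor name says. */
static int targets_little_endian_mips(void)
{
#if defined(__mips__) && defined(__MIPSEL__)
  return 1;
#else
  return 0;
#endif
}
```

With this split, a "mips" host that is actually little endian is still detected correctly, because the endianness decision never depends on the name.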
This is basically the same test that was performed in acinclude.m4 in
the old autotools-based build system. It was not ported to the
CMake-based build system because I previously had no way of testing
a non-DSPr2 build environment.
Fixes #248
The old Un*x (autotools-based) build system always auto-generated this
file, but that behavior was more or less a relic of the days before the
libjpeg-turbo colorspace extensions were implemented. The thinking was
that, if a particular developer wanted to change RGB_RED, RGB_GREEN,
RGB_BLUE, or RGB_PIXELSIZE in order to compress from/decompress to
different RGB pixel layouts, then the SIMD extensions should
automatically respond to those changes whenever they were made to
jmorecfg.h. The modern reality is that changing RGB_* is no longer
necessary because of the libjpeg-turbo colorspace extensions, and
changing any of the other constants in jsimdcfg.inc can't be done
without making deeper modifications to the SIMD extensions. In general,
we treat RGB_* as a de facto, immutable part of the legacy libjpeg API.
Realistically, since the values of those constants have been the same in
every Un*x distribution released in the past 20-30 years, any software
that uses a system-supplied build of libjpeg must assume that those
constants will have default values.
Furthermore, even if it made sense to auto-generate jsimdcfg.inc, it was
never possible to do so on Windows, so it was always going to be
necessary to manually generate the Windows version of the file whenever
any of the constants changed. This commit introduces a new custom CMake
target called "jsimdcfg" that can be used, on Un*x platforms, to
generate jsimdcfg.inc on demand, although this should only be necessary
when introducing new x86 SIMD instructions or making other deep
modifications, such as SIMD acceleration for 12-bit JPEGs.
For those who may be wondering why we don't do the same thing for
win/jconfig.h.in, it's because performing all of the necessary CMake
checks to populate that file is very slow on Windows.
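For readers unfamiliar with the RGB_* constants discussed above, here is a rough sketch of how they describe a pixel layout. The four values are the long-standing defaults from jmorecfg.h; the helper function is hypothetical and exists only for illustration.

```c
#include <stddef.h>

/* Default values from jmorecfg.h (R, G, B byte order, 3 bytes/pixel). */
#define RGB_RED        0
#define RGB_GREEN      1
#define RGB_BLUE       2
#define RGB_PIXELSIZE  3

/* Hypothetical helper: store one pixel at column x of a packed RGB row,
 * using the constants above as byte offsets within the pixel. */
static void store_rgb_pixel(unsigned char *row, size_t x, unsigned char r,
                            unsigned char g, unsigned char b)
{
  unsigned char *p = row + x * RGB_PIXELSIZE;

  p[RGB_RED] = r;
  p[RGB_GREEN] = g;
  p[RGB_BLUE] = b;
}
```

Code written this way would follow a change to RGB_* automatically, which is what the SIMD extensions cannot do without jsimdcfg.inc being regenerated.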
This commit adds C and SSE2 optimizations for the encode_mcu_AC_refine()
function used in progressive Huffman encoding.
The image used for testing can be retrieved from this page:
https://blog.cloudflare.com/doubling-the-speed-of-jpegtran
All timings were done on `Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz`
clang version is `Apple LLVM version 9.0.0 (clang-900.0.39.2)`
gcc-5 version is `gcc-5 (Homebrew GCC 5.5.0) 5.5.0`
gcc-7 version is `gcc-7 (Homebrew GCC 7.2.0) 7.2.0`
Here are the results in comparison to libjpeg-turbo@3c54642 using
`time ./jpegtran -outfile /dev/null -progressive -optimise -copy none print_poster_0025.jpg`
C
clang x86_64: +7%
gcc-5 x86_64: +30%
gcc-7 x86_64: +33%
clang i386: +0%
gcc-5 i386: +24%
gcc-7 i386: +23%
SSE2
clang x86_64: +42%
gcc-5 x86_64: +53%
gcc-7 x86_64: +64%
clang i386: +35%
gcc-5 i386: +46%
gcc-7 i386: +49%
Discussion in libjpeg-turbo/libjpeg-turbo#46
Newer versions of CMake (known to be the case with 3.7.x and 3.10.x)
fail to add a space between CMAKE_C_FLAGS and CMAKE_ASM_FLAGS, which
causes the build to fail when using the official build procedure.
Closes #216
Tag 1.5.2 release
* tag '1.5.2': (54 commits)
x86: Fix "short jump is out of range" w/ NASM<2.04
TurboJPEG: Document xform issue w/ big marker data
Java TJBench: Fix parsing of -warmup argument
Build: Disable warmup in TJBench regression tests
TJBench: Improve consistency of results
TurboJPEG: C API documentation buglet
TJBench: Code formatting tweaks
TJBench: Fix errors when decomp. files w/ ICC data
BUILDING.md: Include Android/x86 build recipes
Travis: Fix OS X build
Restore compatibility with older autoconf releases
Attribute ARM runtime detection code to Nokia
Honor max_memory_to_use/JPEGMEM/-maxmemory
AppVeyor: Fix CI build
TurboJPEG: Fix potential memory leaks
Always tweak EXIF w/h tags w/ lossless transforms
Fix error w/ lossless crop & libjpeg v7 emulation
Include jpeg_skip/crop_scanlines() in jpeg7.dll
libjpeg.txt: Include partial decomp. in TOC
Slightly de-confusify cjpeg, jpegtran usage info
...
YASM requires a debug format to be specified with -g. Currently the
only combination that I can make work at all is DWARF-2/ELF (YASM
doesn't support Mach-O debugging at all, and its support for CV8/MSVC
and MinGW/DWARF-2 appears to be broken), so debugging is only enabled
automatically for ELF at the moment. For other formats, we don't
specify -g at all, which is how the old build system behaved.
Fixes #125, Closes #126
- Replace CMAKE_SOURCE_DIR with CMAKE_CURRENT_SOURCE_DIR
- Replace CMAKE_BINARY_DIR with CMAKE_CURRENT_BINARY_DIR
- Don't use "libjpeg-turbo" in any of the package system filenames
(because CMAKE_PROJECT_NAME will not be the same if building LJT as
a submodule.)
Closes #122
The previous hack (adding ${CMAKE_ASM_COMPILER} to CMAKE_ASM_FLAGS)
didn't work in all cases, because more recent versions of CMake place
the includes ahead of the flags (which meant that the real assembler
wasn't the first argument to gas-preprocessor.pl.)
Previously, simd/CMakeLists.txt was hard-coded to use NASM, and it was
necessary to override the NASM variable in order to use YASM. This
commit changes the behavior such that NASM is still preferred, but YASM
will be used if it is in the PATH and NASM isn't available. This brings
the actual behavior in line with the behavior described in BUILDING.md.
Based on
b0799a1598
Closes #107
* libjpeg-turbo/master: (140 commits)
Increase severity of tjDecompressToYUV2() bug desc
Catch libjpeg errors in tjDecompressToYUV2()
BUILDING.md: Fix "... OR ..." indentation again
BUILDING.md: Fix confusing Windows build reqs
ChangeLog.md: Improve readability of plain text
change.log: Refer users to ChangeLog.md
Markdown version of ChangeLog.txt
Rename ChangeLog.txt
README.md: Link to BUILDING.md
BUILDING.md and README.md: Cosmetic tweaks
ChangeLog: "1.5 beta1" --> "1.4.90 (1.5 beta1)"
Java: Fix parallel make with autotools
Win/x64: Fix improper callee save of xmm8-xmm11
Bump TurboJPEG C API revision to 1.5
ChangeLog: Mention jpeg_crop_scanline() function
1.5 beta1
Fix v7/v8-compatible build
libjpeg API: Partial scanline decompression
Build: Make the NASM autoconf variable persistent
Use consistent/modern code formatting for dbl ptrs
...
* libjpeg-turbo/1.4.x: (94 commits)
CMakeLists.txt: Clarify that Un*x isn't supported
Catch libjpeg errors in tjDecompressToYUV2()
cjpeg: Fix buf overrun caused by bad bin PPM input
Add version/build info to global string table
Ensure that default Huffman tables are initialized
Fix memory leak when running tjunittest -yuv
Prevent overread when decoding malformed JPEG
Guard against wrap-around in alloc functions
Fix Visual C++ compiler warnings
rdppm.c: formatting tweaks
jmemmgr.c: formatting tweaks
TurboJPEG: Avoid dangling pointers
Update Android build instr. for ARMv8, PIE, etc.
Makefile.am: formatting tweak
Update build instructions for new autoconf, GitHub
1.4.3
Regression: Allow co-install of 32-bit/64-bit RPMs
Build: Use FILEPATH type for NASM CMake variable
Comment formatting tweaks
Fix 'make dist'
...
Full-color compression speedups relative to libjpeg-turbo 1.4.2:
2.8 GHz Intel Xeon W3530, Linux, 64-bit: 2.2-18% (avg. 9.5%)
2.8 GHz Intel Xeon W3530, Linux, 32-bit: 10-25% (avg. 17%)
2.3 GHz AMD A10-4600M APU, Linux, 64-bit: 4.9-17% (avg. 11%)
2.3 GHz AMD A10-4600M APU, Linux, 32-bit: 8.8-19% (avg. 15%)
3.0 GHz Intel Core i7, OS X, 64-bit: 3.5-16% (avg. 10%)
3.0 GHz Intel Core i7, OS X, 32-bit: 4.8-14% (avg. 11%)
2.6 GHz AMD Athlon 64 X2 5050e:
Performance-neutral (give or take a few percent)
Full-color compression speedups relative to IPP:
2.8 GHz Intel Xeon W3530, Linux, 64-bit: 4.8-34% (avg. 19%)
2.8 GHz Intel Xeon W3530, Linux, 32-bit: -19% to 7.0% (avg. -7.0%)
Refer to #42 for discussion. Numerous other approaches were attempted,
but this one proved to be the most performant across all platforms.
This commit also fixes #3 (works around it, really; the clang-compiled version
of jchuff.c still performs 20% worse than its GCC-compiled counterpart, but
that code is now bypassed by the new SSE2 Huffman algorithm.)
Based on:
2cb4d4133036c94e050d
* commit '73edb3d734a628fd88994bc974dc6737a58bd956': (45 commits)
Rename the ARM64 assembly file to match the C file
Fix several mathematical issues discovered in the ARM64 NEON code while running the extended regression tests introduced in r1267. Specific comments can be found in the original patches: https://sourceforge.net/p/libjpeg-turbo/patches/64/
Reformat code per Siarhei's original patch (to clearly indicate that the offset instructions are completely independent) and add Siarhei as an individual author (he no longer works for Nokia.)
Clarify forward compatibility of iOS/ARM builds
ARM64 NEON SIMD support for YCC-to-RGB565 conversion
ARM NEON SIMD support for YCC-to-RGB565 conversion, and optimizations to the existing YCC-to-RGB conversion code:
Ensure that tjFree() is used for any JPEG buffers that might have been dynamically allocated by the compress/transform functions. To keep things simple, we use tjAlloc() for the statically-allocated buffer as well, so that tjFree() can always be used to free the buffer, regardless of whether it was allocated by tjbench or by the TurboJPEG library. This fixes crashes that occurred on Windows when running tjunittest or tjbench with the -alloc flag.
Revert r1335 and r1336. It was a valiant effort, but on Windows, xmm8-xmm15 are non-volatile, and the overhead of pushing them onto the stack at the beginning of each function and popping them at the end was causing worse performance (in the neighborhood of 3-5%) than just using the work areas and limiting the register usage to xmm0-xmm7. Best to leave the SSE2 code alone. We can optimize the register usage for AVX2, once that port takes place.
Windows doesn't have setenv(). Go, go Gadget Macros.
1.4 beta1
Fix 'make dist'
Don't use sudo when building a Debian package unless the user is non-root
Add a set of undocumented environment variables and Java system properties that allow compression features of libjpeg that are not normally exposed in the TurboJPEG API to be enabled. These features are not normally exposed because, for the most part, they aren't "turbo" features, but it is still useful to be able to benchmark them without modifying the code.
.func/.endfunc are only necessary when generating STABS debug info, which basically went out of style with parachute pants and Rick Astley. At any rate, none of the platforms for which we're building the ARM code use it (DWARF is the common format these days), and the .func/.endfunc directives cause the clang integrated assembler to fail (http://llvm.org/bugs/show_bug.cgi?id=20424).
Extend tjbenchtest so that it tests the dynamic JPEG buffer allocation feature in TurboJPEG. Disable the tiling feature in TJBench whenever dynamic buffer allocation is enabled (because the tiling feature requires a separate buffer for each tile, using it successfully with dynamic buffer allocation would require a separate TurboJPEG compressor instance for each tile, and it's not worth going to that trouble right now.)
Run the TurboJPEG conformance tests out of a directory in /tmp (for improved performance, if the source directory is on a remote file share.) Fix an issue in TJBench.java that prevented it from working properly if the source image resided in a directory with a dot in the name.
Oops
Subtle point, but dest->outbuffer is a pointer to the address of the JPEG buffer, which is stored in the calling program. Thus, *(dest->outbuffer) will always equal *outbuffer. We need to compare *outbuffer with dest->buffer instead to determine if the pointer is being reused.
If the output buffer in the TurboJPEG destination manager was allocated by the destination manager and is being reused from a previous compression operation, then we need to get the buffer size from the previous operation, since the calling program doesn't know the actual buffer size.
Actually, we need to increase the size of BUFSIZE, not just the size of _buffer. The previous patch might have caused problems if, for instance, state->free_in_buffer was 127 but 129 bytes were compressed. In that case, only 127 of the 129 bytes would have been written to the file. Also document the fix.
...
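The "subtle point" about dest->outbuffer in the list above can be sketched as follows. The struct and function names are hypothetical and heavily simplified; they are not the actual TurboJPEG destination manager, only an illustration of why the two comparisons differ.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a memory destination manager. */
typedef struct {
  unsigned char **outbuffer;  /* address of the caller's buffer pointer */
  unsigned char *buffer;      /* buffer used by the previous operation */
} sketch_dest_mgr;

/* Flawed check: dest->outbuffer refers to the caller's own pointer, so
 * both sides read the same variable and the result is always true. */
static int is_reused_wrong(const sketch_dest_mgr *dest,
                           unsigned char **outbuffer)
{
  return *(dest->outbuffer) == *outbuffer;
}

/* Intended check: compare the caller's current pointer with the buffer
 * remembered from the previous operation. */
static int is_reused(const sketch_dest_mgr *dest, unsigned char **outbuffer)
{
  return dest->buffer != NULL && dest->buffer == *outbuffer;
}
```

If the calling program swaps in a different buffer between operations, only the second check notices.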
Conflicts:
CMakeLists.txt
Makefile.am
configure.ac
jcdctmgr.c
release/deb-control.tmpl
sharedlib/CMakeLists.txt
simd/CMakeLists.txt
turbojpeg.c
* commit 'b8d044a666056d4d8d28d7a5d0805ac32b619b36': (58 commits)
Big oops. wrjpgcom on Windows was being built using the rdjpgcom source.
Prevent a buffer overrun if the comment begins with a literal quote character and the string exceeds 65k characters. Also prevent comments longer than 65k characters from being written, since this will produce an incorrect JPEG file.
Remove VMS-specific code
Our copyright string is longer than JMSG_LENGTH_MAX, and this was causing a buffer overrun if output_message() was called with msg_code set to JMSG_COPYRIGHT, or if format_message() was called with msg_code set to JMSG_COPYRIGHT and with a buffer of length JMSG_LENGTH_MAX.
We don't support non-ANSI C compilers
Allow for building the MIPS DSPr2 extensions if the host is mips-* as well as mipsel-*. The DSPr2 extensions are little endian, so we still have to check that the compiler defines __MIPSEL__ before enabling them. This paves the way for supporting big-endian MIPS, and in the near term, it allows the SIMD extensions to be built with Sourcery CodeBench.
SIMD-accelerated int upsample routine for MIPS DSPr2
Fix MIPS build
libjpeg-turbo has never supported non-ANSI compilers, so get rid of the crufty SIZEOF() macro. It was not being used consistently anyhow, so it would not have been possible to build prior releases of libjpeg-turbo using the broken compilers for which that macro was designed.
Remove MS-DOS code and information, and adjust copyright headers to reflect the removal of features in r1307 and r1308. libjpeg-turbo has never supported MS-DOS, nor is it even possible for us to do so.
Further copyright header cleanup
Further copyright header cleanup
Get rid of the HAVE_PROTOTYPES configuration option, as well as the related JMETHOD and JPP macros. libjpeg-turbo has never supported compilers that don't handle prototypes. Doing so requires ansi2knr, which isn't even supported in the IJG code anymore.
Remove all of the NEED_SHORT_EXTERNAL_NAMES stuff. There is scant information available as to which linkers ever had a 15-character global symbol name limit. AFAICT, it might have been a VMS and/or a.out BSD thing, but none of those platforms have ever been supported by libjpeg-turbo (nor are such systems supported by other open source libraries of this nature.)
Clean up code formatting in the SIMD interface functions
SIMD-accelerated NULL convert routine for MIPS DSPr2
Fix build, which was broken by the checkin of the MIPS DSPr2 accelerated smooth downsampling routine. Until/unless other platforms include SIMD support for that function, it's just easier to #ifdef around it rather than adding stubs for the other platforms.
Fix error in MIPS DSPr2 accelerated smooth downsample routine
SIMD-accelerated h2v2 smooth downsampling routine for MIPS DSPr2
Minor tweak to improve code readability
...
Conflicts:
BUILDING.txt
CMakeLists.txt
Makefile.am
cdjpeg.h
cjpeg.1
cjpeg.c
configure.ac
djpeg.1
example.c
jccoefct.c
jcdctmgr.c
jchuff.c
jchuff.h
jcinit.c
jcmaster.c
jcparam.c
jcphuff.c
jidctflt.c
jpegint.h
jpeglib.h
jversion.h
libjpeg.txt
rdswitch.c
simd/CMakeLists.txt
tjbench.c
turbojpeg.c
usage.txt
wrjpgcom.c