This commit adds C and SSE2 optimizations for the encode_mcu_AC_first()
function used in progressive Huffman encoding.
The image used for testing can be retrieved from this page:
https://blog.cloudflare.com/doubling-the-speed-of-jpegtran
All timings done on `Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz`
clang version is `Apple LLVM version 9.0.0 (clang-900.0.39.2)`
gcc-5 version is `gcc-5 (Homebrew GCC 5.5.0) 5.5.0`
gcc-7 version is `gcc-7 (Homebrew GCC 7.2.0) 7.2.0`
Here are the results in comparison to libjpeg-turbo@293263c using
`time ./jpegtran -outfile /dev/null -progressive -optimise -copy none print_poster_0025.jpg`
C
clang x86_64: +19%
gcc-5 x86_64: +80%
gcc-7 x86_64: +57%
clang i386: +5%
gcc-5 i386: +59%
gcc-7 i386: +51%
SSE2
clang x86_64: +79%
gcc-5 x86_64: +158%
gcc-7 x86_64: +122%
clang i386: +71%
gcc-5 i386: +134%
gcc-7 i386: +135%
Discussion in libjpeg-turbo/libjpeg-turbo#46
This commit adds C and SSE2 optimizations for the encode_mcu_AC_refine()
function used in progressive Huffman encoding.
The image used for testing can be retrieved from this page:
https://blog.cloudflare.com/doubling-the-speed-of-jpegtran
All timings done on `Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz`
clang version is `Apple LLVM version 9.0.0 (clang-900.0.39.2)`
gcc-5 version is `gcc-5 (Homebrew GCC 5.5.0) 5.5.0`
gcc-7 version is `gcc-7 (Homebrew GCC 7.2.0) 7.2.0`
Here are the results in comparison to libjpeg-turbo@3c54642 using
`time ./jpegtran -outfile /dev/null -progressive -optimise -copy none print_poster_0025.jpg`
C
clang x86_64: +7%
gcc-5 x86_64: +30%
gcc-7 x86_64: +33%
clang i386: +0%
gcc-5 i386: +24%
gcc-7 i386: +23%
SSE2
clang x86_64: +42%
gcc-5 x86_64: +53%
gcc-7 x86_64: +64%
clang i386: +35%
gcc-5 i386: +46%
gcc-7 i386: +49%
Discussion in libjpeg-turbo/libjpeg-turbo#46
Within the libjpeg API code, it seems to be more the convention than not
to separate the macro name and value by two or more spaces, which
improves general readability. Making this consistent across all of
libjpeg-turbo is less about my individual preferences and more about
making it easy to automatically detect variations from our chosen
formatting convention. I intend to release the script I'm using to
validate this stuff, once it matures and stabilizes a bit.
With rare exceptions ...
- Always separate line continuation characters by one space from
preceding code.
- Always use two-space indentation. Never use tabs.
- Always use K&R-style conditional blocks.
- Always surround operators with spaces, except in raw assembly code.
- Always put a space after, but not before, a comma.
- Never put a space between type casts and variables/function calls.
- Never put a space between the function name and the argument list in
function declarations and prototypes.
- Always surround braces ('{' and '}') with spaces.
- Always surround statements (if, for, else, catch, while, do, switch)
with spaces.
- Always attach pointer symbols ('*' and '**') to the variable or
function name.
- Always precede pointer symbols ('*' and '**') by a space in type
casts.
- Use the MIN() macro from jpegint.h within the libjpeg and TurboJPEG
API libraries (using min() from tjutil.h is still necessary for
TJBench.)
- Where it makes sense (particularly in the TurboJPEG code), put a blank
line after variable declaration blocks.
- Always separate statements in one-liners by two spaces.
The purpose of this was to ease maintenance on my part and also to make
it easier for contributors to figure out how to format patch
submissions. This was admittedly confusing (even to me sometimes) when
we had 3 or 4 different style conventions in the same source tree. The
new convention is more consistent with the formatting of other OSS code
bases.
This commit corrects deviations from the chosen formatting style in the
libjpeg API code and reformats the TurboJPEG API code such that it
conforms to the same standard.
NOTES:
- Although it is no longer necessary for the function name in function
declarations to begin in Column 1 (this was historically necessary
because of the ansi2knr utility, which allowed libjpeg to be built
with non-ANSI compilers), we retain that formatting for the libjpeg
code because it improves readability when using libjpeg's function
attribute macros (GLOBAL(), etc.)
- This reformatting project was accomplished with the help of AStyle and
Uncrustify, although neither was completely up to the task, and thus
a great deal of manual tweaking was required. Note to developers of
code formatting utilities: the libjpeg-turbo code base is an
excellent test bed, because AFAICT, it breaks every single one of the
utilities that are currently available.
- The legacy (MMX, SSE, 3DNow!) assembly code for i386 has been
formatted to match the SSE2 code (refer to
ff5685d5344273df321eb63a005eaae19d2496e3.) I hadn't intended to
bother with this, but the Loongson MMI implementation demonstrated
that there is still academic value to the MMX implementation, as an
algorithmic model for other 64-bit vector implementations. Thus, it
is desirable to improve its readability in the same manner as that of
the SSE2 implementation.
* libjpeg-turbo/master: (140 commits)
Increase severity of tjDecompressToYUV2() bug desc
Catch libjpeg errors in tjDecompressToYUV2()
BUILDING.md: Fix "... OR ..." indentation again
BUILDING.md: Fix confusing Windows build reqs
ChangeLog.md: Improve readability of plain text
change.log: Refer users to ChangeLog.md
Markdown version of ChangeLog.txt
Rename ChangeLog.txt
README.md: Link to BUILDING.md
BUILDING.md and README.md: Cosmetic tweaks
ChangeLog: "1.5 beta1" --> "1.4.90 (1.5 beta1)"
Java: Fix parallel make with autotools
Win/x64: Fix improper callee save of xmm8-xmm11
Bump TurboJPEG C API revision to 1.5
ChangeLog: Mention jpeg_crop_scanline() function
1.5 beta1
Fix v7/v8-compatible build
libjpeg API: Partial scanline decompression
Build: Make the NASM autoconf variable persistent
Use consistent/modern code formatting for dbl ptrs
...
The convention used by libjpeg:
type * variable;
is not very common anymore, because it looks too much like
multiplication. Some (particularly C++ programmers) prefer to tuck the
pointer symbol against the type:
type* variable;
to emphasize that a pointer to a type is effectively a new type.
However, this can also be confusing, since defining multiple variables
on the same line would not work properly:
type* variable1, variable2; /* Only variable1 is actually a
pointer. */
This commit reformats the entirety of the libjpeg-turbo code base so
that it uses the same code formatting convention for pointers that the
TurboJPEG API code uses:
type *variable1, *variable2;
This seems to be the most common convention among C programmers, and
it is the convention used by other codec libraries, such as libpng and
libtiff.
Most of these involved overrunning the signed 32-bit JLONG type whenever
building libjpeg-turbo with a 32-bit compiler. These issues are not
believed to represent actual security threats, but eliminating them
makes it easier to detect such threats should they arise in the future.
These days, INT32 is a commonly-defined datatype in system headers. We
cannot eliminate the definition of that datatype from jmorecfg.h, since
the INT32 typedef has technically been part of the libjpeg API since
version 5 (1994.) However, using INT32 internally is risky, because the
inclusion of a particular header (Xmd.h, for instance) could change the
definition of INT32 from long to int on 64-bit platforms and thus change
the internal behavior of libjpeg-turbo in unexpected ways (for instance,
failing to correctly set __INT32_IS_ACTUALLY_LONG to match the INT32
typedef-- perhaps as a result of including the wrong version of
jpeglib.h-- could cause libjpeg-turbo to produce incorrect results.)
The library has always been built in environments in which INT32 is
effectively long (on Windows, long is always 32-bit, so effectively it's
the same as int), so it makes sense to turn INT32 into an explicitly
long datatype. This ensures that libjpeg-turbo will always behave
consistently, regardless of the headers included at compile time.
Addresses a concern expressed in #26.
The IJG README file has been renamed to README.ijg, in order to avoid
confusion (many people were assuming that that was our project's README
file and weren't reading README-turbo.txt) and to lay the groundwork for
markdown versions of the libjpeg-turbo README and build instructions.
For whatever reason, some of these files didn't get fully merged from
libjpeg-turbo 1.4. They still contained tab characters and other formatting
conventions from libjpeg-turbo 1.3. This patch also fixes some obvious
indentation errors in the mozjpeg-specific code. There is more formatting work
that needs to be done to the mozjpeg-specific code, to fix line overruns,
incorrect operator whitespace, and other issues that make it not consistent
with the libjpeg/libjpeg-turbo code.
* commit 'b8d044a666056d4d8d28d7a5d0805ac32b619b36': (58 commits)
Big oops. wrjpgcom on Windows was being built using the rdjpgcom source.
Prevent a buffer overrun if the comment begins with a literal quote character and the string exceeds 65k characters. Also prevent comments longer than 65k characters from being written, since this will produce an incorrect JPEG file.
Remove VMS-specific code
Our copyright string is longer than JMSG_LENGTH_MAX, and this was causing a buffer overrun if output_message() was called with msg_code set to JMSG_COPYRIGHT, or if format_message() was called with msg_code set to JMSG_COPYRIGHT and with a buffer of length JMSG_LENGTH_MAX.
We don't support non-ANSI C compilers
Allow for building the MIPS DSPr2 extensions if the host is mips-* as well as mipsel-*. The DSPr2 extensions are little endian, so we still have to check that the compiler defines __MIPSEL__ before enabling them. This paves the way for supporting big-endian MIPS, and in the near term, it allows the SIMD extensions to be built with Sourcery CodeBench.
SIMD-accelerated int upsample routine for MIPS DSPr2
Fix MIPS build
libjpeg-turbo has never supported non-ANSI compilers, so get rid of the crufty SIZEOF() macro. It was not being used consistently anyhow, so it would not have been possible to build prior releases of libjpeg-turbo using the broken compilers for which that macro was designed.
Remove MS-DOS code and information, and adjust copyright headers to reflect the removal of features in r1307 and r1308. libjpeg-turbo has never supported MS-DOS, nor is it even possible for us to do so.
Further copyright header cleanup
Further copyright header cleanup
Get rid of the HAVE_PROTOTYPES configuration option, as well as the related JMETHOD and JPP macros. libjpeg-turbo has never supported compilers that don't handle prototypes. Doing so requires ansi2knr, which isn't even supported in the IJG code anymore.
Remove all of the NEED_SHORT_EXTERNAL_NAMES stuff. There is scant information available as to which linkers ever had a 15-character global symbol name limit. AFAICT, it might have been a VMS and/or a.out BSD thing, but none of those platforms have ever been supported by libjpeg-turbo (nor are such systems supported by other open source libraries of this nature.)
Clean up code formatting in the SIMD interface functions
SIMD-accelerated NULL convert routine for MIPS DSPr2
Fix build, which was broken by the checkin of the MIPS DSPr2 accelerated smooth downsampling routine. Until/unless other platforms include SIMD support for that function, it's just easier to #ifdef around it rather than adding stubs for the other platforms.
Fix error in MIPS DSPr2 accelerated smooth downsample routine
SIMD-accelerated h2v2 smooth downsampling routine for MIPS DSPr2
Minor tweak to improve code readability
...
Conflicts:
BUILDING.txt
CMakeLists.txt
Makefile.am
cdjpeg.h
cjpeg.1
cjpeg.c
configure.ac
djpeg.1
example.c
jccoefct.c
jcdctmgr.c
jchuff.c
jchuff.h
jcinit.c
jcmaster.c
jcparam.c
jcphuff.c
jidctflt.c
jpegint.h
jpeglib.h
jversion.h
libjpeg.txt
rdswitch.c
simd/CMakeLists.txt
tjbench.c
turbojpeg.c
usage.txt
wrjpgcom.c
Derivation of sign value relies on shift right operator >> being an
arithmetic shift. It is thus not strictly portable since the C standard
defines the result of x >> y as "implementation-defined" when x is a
signed integer with a negative value.
Trellis quantization is modified:
- to work on the configurable spectral range Ss to Se
- to optionally optimize runs of EOBs
- to optionally split optimization between 2 spectral ranges
In trellis quantization passes Huffman table code optimization is
modified such as to generable a valid code length for each possible
symbol by resetting frequency counters to 1 instead of 0