Commit Graph

14 Commits

Author SHA1 Message Date
Kornel
97e2e1df43 Allow disabling fopen in SIMD 2025-01-03 10:04:01 +00:00
DRC
78a36f6dc3 Fix buffer overrun in 12-bit prog Huffman encoder
Regression introduced by 16bd984557 and
5b177b3cab

The pre-computed absolute values used in encode_mcu_AC_first() and
encode_mcu_AC_refine() were stored in a JCOEF (signed short) array.
When attempting to losslessly transform a specially-crafted malformed
12-bit JPEG image with a coefficient value of -32768 into a progressive
12-bit JPEG image, the progressive Huffman encoder attempted to store
the absolute value of -32768 in the JCOEF array, thus overflowing the
16-bit signed data type.  Therefore, at this point in the code:
8c5e78ce29/jcphuff.c (L889)
the absolute value was read as -32768, which caused the test at
8c5e78ce29/jcphuff.c (L896)
to fail, falling through to
8c5e78ce29/jcphuff.c (L908)
with an overly large value of r (46) that, when shifted left four
places, incremented, and passed to emit_symbol(), exceeded the maximum
index (255) for the derived code tables.  Fortunately, the buffer
overrun was fully contained within phuff_entropy_encoder, so the issue
did not generate a segfault or other user-visible errant behavior, but
it did cause a UBSan failure that was detected by OSS-Fuzz.

This commit introduces an unsigned JCOEF (UJCOEF) data type and uses it
to store the absolute values of DCT coefficients computed by the
AC_first_prepare() and AC_refine_prepare() methods.

Note that the changes to the Arm Neon progressive Huffman encoder
extensions cause signed 16-bit instructions to be replaced with
equivalent unsigned 16-bit instructions, so the changes should be
performance-neutral.

Based on:
bbf61c0382

Closes #628
2022-11-15 19:07:50 -06:00
DRC
f579cc11b3 Make SIMD capability variables thread-local ...
... on platforms that support TLS, which should include all
currently-supported platforms
(https://libjpeg-turbo.org/Documentation/OfficialBinaries)

Addresses a concern raised in #87

Although it is still my opinion that the data race in init_simd() was
innocuous, we can now fix it for free thanks to
ae87a95861, so why not?
2022-10-03 21:36:21 -05:00
DRC
9abeff46d8 Remove extraneous #include directives
jinclude.h already includes stdio.h, stdlib.h, and string.h.
2022-03-09 11:49:27 -06:00
DRC
035262a18d MIPS DSPr2: Work around various 'make test' errors
Referring to #408, this commit #ifdefs DSPr2 SIMD functions that only
work on little endian processors, and it completely excludes
jsimd_h2v1_downsample_dspr2() and jsimd_h2v2_downsample_dspr2().  The
latter two functions fail with the TJBench tiling regression tests, most
likely because the implementation of the functions predates those tests.
2020-02-17 18:13:31 -06:00
DRC
ed7cab47d9 MIPS DSPr2: Fix compiler warning with -mdspr2
If -mdspr2 is passed to the compiler, __mips_dsp will be defined, and
__mips_dsp_rev will be >= 2, so parse_proc_cpuinfo() will not be used.
2020-02-17 16:35:00 -06:00
DRC
42d679b9fc MIPS SIMD: Always honor JSIMD_FORCE* env vars
Previously, these environment variables were not honored unless a 74K
CPU was detected, but this detection doesn't work properly with QEMU's
user mode emulation.  With all other CPU types, libjpeg-turbo honors
JSIMD_FORCE* regardless of CPU detection.
2020-02-17 15:19:32 -06:00
DRC
3bef88f6ec Fix MIPS DSPr2 build when using soft float ABI
(for instance, when passing -msoft-float to the compiler)

The instructions used by jsimd_quantize_float_dspr2() and
jsimd_convsamp_float_dspr2() don't work with the soft float ABI, so
disable those functions when soft float is enabled.

Based on:
129a739bfa

Closes #272
2018-09-04 18:03:00 -05:00
mayeut
5b177b3cab C/SSE2 optimization of encode_mcu_AC_first()
This commit adds C and SSE2 optimizations for the encode_mcu_AC_first()
function used in progressive Huffman encoding.

The image used for testing can be retrieved from this page:
https://blog.cloudflare.com/doubling-the-speed-of-jpegtran

All timings done on `Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz`
clang version is `Apple LLVM version 9.0.0 (clang-900.0.39.2)`
gcc-5 version is `gcc-5 (Homebrew GCC 5.5.0) 5.5.0`
gcc-7 version is `gcc-7 (Homebrew GCC 7.2.0) 7.2.0`

Here are the results in comparison to libjpeg-turbo@293263c using
`time ./jpegtran -outfile /dev/null -progressive -optimise -copy none print_poster_0025.jpg`

C
clang x86_64: +19%
gcc-5 x86_64: +80%
gcc-7 x86_64: +57%
clang i386: +5%
gcc-5 i386: +59%
gcc-7 i386: +51%

SSE2
clang x86_64: +79%
gcc-5 x86_64: +158%
gcc-7 x86_64: +122%
clang i386: +71%
gcc-5 i386: +134%
gcc-7 i386: +135%

Discussion in libjpeg-turbo/libjpeg-turbo#46
2018-03-22 15:49:23 -05:00
mayeut
16bd984557 C/SSE2 optimization of encode_mcu_AC_refine()
This commit adds C and SSE2 optimizations for the encode_mcu_AC_refine()
function used in progressive Huffman encoding.

The image used for testing can be retrieved from this page:
https://blog.cloudflare.com/doubling-the-speed-of-jpegtran

All timings done on `Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz`
clang version is `Apple LLVM version 9.0.0 (clang-900.0.39.2)`
gcc-5 version is `gcc-5 (Homebrew GCC 5.5.0) 5.5.0`
gcc-7 version is `gcc-7 (Homebrew GCC 7.2.0) 7.2.0`

Here are the results in comparison to libjpeg-turbo@3c54642 using
`time ./jpegtran -outfile /dev/null -progressive -optimise -copy none print_poster_0025.jpg`

C
clang x86_64: +7%
gcc-5 x86_64: +30%
gcc-7 x86_64: +33%
clang i386: +0%
gcc-5 i386: +24%
gcc-7 i386: +23%

SSE2
clang x86_64: +42%
gcc-5 x86_64: +53%
gcc-7 x86_64: +64%
clang i386: +35%
gcc-5 i386: +46%
gcc-7 i386: +49%

Discussion in libjpeg-turbo/libjpeg-turbo#46
2018-03-22 13:05:55 -05:00
DRC
84fbd4f1ed Merge branch 'master' into dev 2018-03-17 00:27:49 -05:00
DRC
19c791cdac Improve code formatting consistency
With rare exceptions ...
- Always separate line continuation characters by one space from
  preceding code.
- Always use two-space indentation.  Never use tabs.
- Always use K&R-style conditional blocks.
- Always surround operators with spaces, except in raw assembly code.
- Always put a space after, but not before, a comma.
- Never put a space between type casts and variables/function calls.
- Never put a space between the function name and the argument list in
  function declarations and prototypes.
- Always surround braces ('{' and '}') with spaces.
- Always surround statements (if, for, else, catch, while, do, switch)
  with spaces.
- Always attach pointer symbols ('*' and '**') to the variable or
  function name.
- Always precede pointer symbols ('*' and '**') by a space in type
  casts.
- Use the MIN() macro from jpegint.h within the libjpeg and TurboJPEG
  API libraries (using min() from tjutil.h is still necessary for
  TJBench.)
- Where it makes sense (particularly in the TurboJPEG code), put a blank
  line after variable declaration blocks.
- Always separate statements in one-liners by two spaces.

The purpose of this was to ease maintenance on my part and also to make
it easier for contributors to figure out how to format patch
submissions.  This was admittedly confusing (even to me sometimes) when
we had 3 or 4 different style conventions in the same source tree.  The
new convention is more consistent with the formatting of other OSS code
bases.

This commit corrects deviations from the chosen formatting style in the
libjpeg API code and reformats the TurboJPEG API code such that it
conforms to the same standard.

NOTES:
- Although it is no longer necessary for the function name in function
  declarations to begin in Column 1 (this was historically necessary
  because of the ansi2knr utility, which allowed libjpeg to be built
  with non-ANSI compilers), we retain that formatting for the libjpeg
  code because it improves readability when using libjpeg's function
  attribute macros (GLOBAL(), etc.)
- This reformatting project was accomplished with the help of AStyle and
  Uncrustify, although neither was completely up to the task, and thus
  a great deal of manual tweaking was required.  Note to developers of
  code formatting utilities:  the libjpeg-turbo code base is an
  excellent test bed, because AFAICT, it breaks every single one of the
  utilities that are currently available.
- The legacy (MMX, SSE, 3DNow!) assembly code for i386 has been
  formatted to match the SSE2 code (refer to
  ff5685d5344273df321eb63a005eaae19d2496e3.)  I hadn't intended to
  bother with this, but the Loongson MMI implementation demonstrated
  that there is still academic value to the MMX implementation, as an
  algorithmic model for other 64-bit vector implementations.  Thus, it
  is desirable to improve its readability in the same manner as that of
  the SSE2 implementation.
2018-03-16 02:14:34 -05:00
DRC
35ed3c97b2 SIMD: Formatting tweaks + remove unnecessary code
+ "JSIMD_ARM_NEON" = "JSIMD_NEON"
+ "JSIMD_MIPS_DSPR2" = "JSIMD_DSPR2"
+ "*_mips_dspr2" = "*_dspr2"

It's obvious that "NEON" refers to Arm and "DSPr2" refers to MIPS, and
this naming convention is consistent with the other SIMD extensions.
2018-03-01 18:53:58 -06:00
DRC
6abd39160c Unified CMake-based build system
See #56 for discussion.

Fixes #21, Fixes #29, Fixes #37, Closes #56, Fixes #58, Closes #73
Obviates #82

See also:
https://sourceforge.net/p/libjpeg-turbo/feature-requests/5/
https://sourceforge.net/p/libjpeg-turbo/patches/5/
2016-11-22 13:06:30 -06:00