Commit Graph

394 Commits

Author SHA1 Message Date
DRC
35ed3c97b2 SIMD: Formatting tweaks + remove unnecessary code
+ "JSIMD_ARM_NEON" = "JSIMD_NEON"
+ "JSIMD_MIPS_DSPR2" = "JSIMD_DSPR2"
+ "*_mips_dspr2" = "*_dspr2"

It's obvious that "NEON" refers to Arm and "DSPr2" refers to MIPS, and
this naming convention is consistent with the other SIMD extensions.
2018-03-01 18:53:58 -06:00
DRC
3c54642c81 Fix iOS/ARM[-64] build w/ newer versions of CMake
Newer versions of CMake (known to be the case with 3.7.x and 3.10.x)
fail to add a space between CMAKE_C_FLAGS and CMAKE_ASM_FLAGS, which
causes the build to fail when using the official build procedure.

Closes #216
2018-02-27 11:40:05 -06:00
DRC
367a838626 Make SIMD syms private for x86[-64]/Mach-O builds
... if building with YASM.  NASM doesn't currently support the necessary
directives.

Closes #212
2018-02-26 21:02:55 -06:00
DRC
7c2bfdb040 Merge branch 'master' into dev
Closes #214
2018-02-26 18:43:40 -06:00
mayeut
0dd9a2c1fd Fix Win64 ABI conformance when using xmm8-xmm11
Referring to https://docs.microsoft.com/en-US/cpp/build/stack-usage:

"All memory beyond the current address of RSP is considered volatile:
The OS, or a debugger, may overwrite this memory during a user debug
session, or an interrupt handler.  Thus, RSP must always be set before
attempting to read or write values to a stack frame."

Basically, if, under extremely rare circumstances, a context switch were
to occur between saving the values of xmm8-xmm11 and setting the new
value of rsp, the O/S might not preserve that area of the stack.  In
general, libjpeg-turbo should not be using xmm8-xmm11 before or after
the call to jsimd_huff_encode_one_block_sse2(), so this is probably a
non-issue, but it's still a good idea to fix it.

Based on ff7d2030dd
2018-02-26 18:00:15 -06:00
mayeut
4c4dc6149b Fix Win64 ABI conformance issue in AVX2 ISLOW IDCT
xmm8-xmm11 must be saved and restored, since the function uses
ymm8-ymm11.

Closes #211
2018-02-26 12:00:56 -06:00
mayeut
feaec37d32 Fix build with YASM
vinserti128 requires all operands to be specified
2018-02-24 16:50:03 -06:00
mayeut
b6909ab3f7 Make SIMD symbols private for MIPS ELF builds
Closes #210
2018-02-23 18:49:08 -06:00
mayeut
9bef5df776 Make SIMD symbols private for iOS ARM/ARM64 builds 2018-02-23 18:39:06 -06:00
mayeut
88421563ad Make SIMD symbols private for x86[-64] ELF builds 2018-02-23 18:37:46 -06:00
DRC
9cdec16ceb 32-bit AVX2 implementation of slow int inverse DCT 2018-02-23 15:19:16 -06:00
DRC
845fe8bf80 32-bit AVX2 buglet: IS_ALIGNED_SSE=IS_ALIGNED_AVX 2018-02-23 12:24:10 -06:00
DRC
de9e9db6a5 64-bit AVX2 implementation of slow int inverse DCT 2018-02-23 11:50:11 -06:00
DRC
715b7c38a8 32-bit AVX2 implementation of int sample conv. 2018-02-19 00:24:53 -06:00
DRC
ca387e7fda 32-bit AVX2 implementation of slow int forward DCT 2018-02-19 00:00:55 -06:00
DRC
39e9e65c5b 64-bit AVX2 implementation of int sample conv. 2018-02-18 23:30:14 -06:00
DRC
264dd42a98 64-bit AVX2 implementation of slow int forward DCT 2018-02-18 23:30:08 -06:00
DRC
ff392d81ef AVX2: Introduce YMMBLOCK macro for readability 2018-02-17 17:29:38 -06:00
DRC
bf6c774305 Fix whitespace errors 2017-12-13 21:48:54 -06:00
DRC
51cc89fa7b Merge branch 'master' into dev 2017-09-01 09:02:55 -05:00
DRC
1d93541617 Build: Use -maltivec when testing AltiVec support
Doesn't seem to be necessary with recent Linux/GCC configurations, but
it is definitely necessary with OS X.
2017-09-01 08:55:33 -05:00
DRC
c0f3512d5a Merge branch 'master' into dev 2017-09-01 07:12:51 -05:00
DRC
e5c1613ccd x86: Fix "short jump is out of range" w/ NASM<2.04 2017-07-07 15:28:49 -05:00
DRC
2ac4e9d914 Merge branch 'master' into dev 2017-06-26 22:03:32 -05:00
DRC
9d64f3c60b Attribute ARM runtime detection code to Nokia
This code was submitted in the initial ARM NEON patches
(https://sourceforge.net/p/libjpeg-turbo/patches/7/) by Siarhei while he
was still a Nokia employee.
2017-04-24 14:42:58 -05:00
DRC
8a9b042b26 Merge branch 'master' into dev 2016-12-10 09:35:30 -06:00
DRC
2b29bca2a9 Build: Fix Debug/RelWithDebInfo build with YASM
YASM requires a debug format to be specified with -g.  Currently the
only combination that I can make work at all is DWARF-2/ELF (YASM
doesn't support Mach-O debugging at all, and its support for CV8/MSVC
and MinGW/DWARF-2 appears to be broken), so debugging is only enabled
automatically for ELF at the moment.  For other formats, we don't
specify -g at all, which is how the old build system behaved.

Fixes #125, Closes #126
2016-12-07 18:18:35 -06:00
DRC
786b649331 Reorg AltiVec detection code
+ advertise that full AltiVec SIMD acceleration is now available on
OpenBSD.

The relevant compilers probably all support C99 or GNU's variation of
C90 that allows variables to be declared anywhere, but our policy is to
conform to the C90 standard, if for no other reason than that it
improves code readability.
2016-12-05 13:14:19 -06:00
Donovan Watteau
f4ba09b33a Detect AltiVec support on OpenBSD 2016-12-05 12:35:15 -06:00
DRC
952191da79 Build: Fix issues when building as a Git submodule
- Replace CMAKE_SOURCE_DIR with CMAKE_CURRENT_SOURCE_DIR
- Replace CMAKE_BINARY_DIR with CMAKE_CURRENT_BINARY_DIR
- Don't use "libjpeg-turbo" in any of the package system filenames
  (because CMAKE_PROJECT_NAME will not be the same if building LJT as
  a submodule.)

Closes #122
2016-12-03 15:21:27 -06:00
DRC
059c9a5f2a Build: Fix regression in AltiVec SIMD detection
Only the SIMD source files should be built with -maltivec.  Otherwise
the detection code will not be compiled in.
2016-12-03 15:19:41 -06:00
DRC
94686e3c0f Build: Use wrapper script for gas-preprocessor.pl
The previous hack (adding ${CMAKE_ASM_COMPILER} to CMAKE_ASM_FLAGS)
didn't work in all cases, because more recent versions of CMake place
the includes ahead of the flags (which meant that the real assembler
wasn't the first argument to gas-preprocessor.pl.)
2016-11-25 18:54:55 -06:00
DRC
6abd39160c Unified CMake-based build system
See #56 for discussion.

Fixes #21, Fixes #29, Fixes #37, Closes #56, Fixes #58, Closes #73
Obviates #82

See also:
https://sourceforge.net/p/libjpeg-turbo/feature-requests/5/
https://sourceforge.net/p/libjpeg-turbo/patches/5/
2016-11-22 13:06:30 -06:00
DRC
9fdb8f8553 Merge branch 'master' into dev 2016-11-22 09:33:19 -06:00
Chris Young
4ad94b2963 Detect AltiVec support on AmigaOS 4 2016-11-18 13:03:28 -06:00
DRC
108b1cd9ba Merge branch 'master' into dev 2016-10-20 01:37:40 -05:00
DRC
13e6b151b0 Win: Use YASM if it is in the PATH and NASM isn't
Previously, simd/CMakeLists.txt was hard-coded to use NASM, and it was
necessary to override the NASM variable in order to use YASM.  This
commit changes the behavior such that NASM is still preferred, but YASM
will be used if it is in the PATH and NASM isn't available.  This brings
the actual behavior in line with the behavior described in BUILDING.md.

Based on b0799a1598

Closes #107
2016-10-11 11:58:20 -05:00
DRC
ed21f4bd03 Merge branch 'master' into dev 2016-10-05 14:41:14 -05:00
DRC
f34f2f5bc6 Fix 'make dist' 2016-10-05 13:36:35 -05:00
DRC
7bfb22af12 Fix broken MIPS build
Regression introduced by 9055fb408d

Fixes #104
2016-09-26 18:01:54 -05:00
DRC
6c36568626 Merge branch 'master' into dev 2016-09-20 18:09:15 -05:00
mayeut
cb88e5da80 ARM64 NEON: Fix another ABI conformance issue
Based on 98a5a9dc89, with wordsmithing by DRC.

In the AArch64 ABI, as in many others, it's forbidden to read/store data
below the stack pointer.  Some SIMD functions were doing just that
(stack pointer misuse) when trying to preserve callee-saved registers,
and this resulted in those registers being restored with incorrect
contents under certain circumstances.

This patch fixes that behavior, and callee-saved registers are now
stored above the stack pointer throughout the function call.  The patch
also removes register saving in places where it is unnecessary for this
ABI, or it makes use of unused scratch registers instead of callee-saved
registers.

Fixes #97.  Closes #101.

Refer also to https://bugzilla.redhat.com/show_bug.cgi?id=1368569
2016-09-20 17:38:39 -05:00
DRC
3924ebceb5 AVX2: Perform additional checks for O/S support
cpuid tells us whether the O/S uses extended state management via
XSAVE/XRSTOR, but we have to call xgetbv to verify that it is using
XSAVE/XRSTOR to manage the state of XMM/YMM registers.
2016-07-13 16:03:36 -05:00
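The check described in 3924ebceb5 can be sketched in C as a pure function over register values. This is an illustration only, not the code in jsimdcpu.asm: the function name avx2_usable and the macro names are hypothetical, and the caller is assumed to have already executed cpuid (leaf 1 and leaf 7, subleaf 0) and xgetbv (with ecx = 0). The bit positions are the ones documented in the Intel SDM.

```c
#include <stdint.h>

/* CPUID.1:ECX feature bits */
#define CPUID1_ECX_OSXSAVE  (1u << 27)  /* O/S has enabled XSAVE/XRSTOR */
#define CPUID1_ECX_AVX      (1u << 28)  /* CPU supports AVX */
/* CPUID.(EAX=7,ECX=0):EBX feature bits */
#define CPUID7_EBX_AVX2     (1u << 5)   /* CPU supports AVX2 */
/* XCR0 state-component bits, as returned by xgetbv with ecx = 0 */
#define XCR0_SSE_STATE      (1u << 1)   /* O/S saves/restores XMM state */
#define XCR0_AVX_STATE      (1u << 2)   /* O/S saves/restores upper YMM state */

/* Hypothetical helper: returns 1 if AVX2 code can safely be used, else 0.
 * A real caller must execute xgetbv only when OSXSAVE is set; otherwise the
 * instruction faults.  Pass xcr0 = 0 in that case. */
static int avx2_usable(uint32_t cpuid1_ecx, uint32_t cpuid7_ebx, uint64_t xcr0)
{
  const uint64_t ymm_state = XCR0_SSE_STATE | XCR0_AVX_STATE;

  /* The CPU itself must support AVX and AVX2. */
  if (!(cpuid1_ecx & CPUID1_ECX_AVX) || !(cpuid7_ebx & CPUID7_EBX_AVX2))
    return 0;
  /* cpuid says only that the O/S uses XSAVE/XRSTOR at all ... */
  if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE))
    return 0;
  /* ... so XCR0 must confirm that XMM and YMM state are both managed. */
  return (xcr0 & ymm_state) == ymm_state;
}
```

Keeping the decision separate from the cpuid and xgetbv instructions makes the logic easy to exercise with synthetic register values, such as an XCR0 of 0x3 for an O/S that manages only x87 and SSE state.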
DRC
1120ff29a1 Fix AArch64 ABI conformance issue in SIMD code
In the AArch64 ABI, the high (unused) DWORD of a 32-bit argument's
register is undefined, so it was incorrect to use 64-bit
instructions to transfer a JDIMENSION argument in the 64-bit NEON SIMD
functions.  The code worked thus far only because the existing compiler
optimizers weren't smart enough to do anything else with the register in
question, so the upper 32 bits happened to be all zeroes.

The latest builds of Clang/LLVM have a smarter optimizer, and under
certain circumstances, it will attempt to load-combine adjacent 32-bit
integers from one of the libjpeg structures into a single 64-bit integer
and pass that 64-bit integer as a 32-bit argument to one of the SIMD
functions (which is allowed by the ABI, since the upper 32 bits of the
32-bit argument's register are undefined.)  This caused the
libjpeg-turbo regression tests to crash.

This patch tries to use the Wn registers whenever possible.  Otherwise,
it uses a zero-extend instruction to avoid using the upper 32 bits of
the 64-bit registers, which are not guaranteed to be valid for 32-bit
arguments.

Based on 1fbae13021

Closes #91.  Refer also to android-ndk/ndk#110 and
https://llvm.org/bugs/show_bug.cgi?id=28393
2016-07-13 14:36:19 -05:00
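The failure mode described in 1120ff29a1 can be modeled in portable C. This is a sketch under stated assumptions, not the NEON assembly itself: a uint64_t stands in for an Xn register carrying a 32-bit JDIMENSION (unsigned int) argument, and both function names are hypothetical.

```c
#include <stdint.h>

/* xn models a 64-bit AArch64 register that carries a 32-bit argument.
 * Only the low 32 bits (the Wn view) are meaningful; the ABI lets the
 * caller leave anything in the upper 32 bits. */

/* Buggy pattern: consume the full 64-bit register.  This gives the right
 * answer only while the upper half happens to be zero. */
static uint64_t row_offset_full_register(uint64_t xn, uint64_t bytes_per_row)
{
  return xn * bytes_per_row;
}

/* Fixed pattern: zero-extend the low 32 bits before using the value, which
 * is what reading Wn or an explicit zero-extend instruction achieves. */
static uint64_t row_offset_zero_extended(uint64_t xn, uint64_t bytes_per_row)
{
  uint64_t wn = (uint64_t)(uint32_t)xn;  /* discard the undefined upper half */

  return wn * bytes_per_row;
}
```

With xn = 0x0000000500000010, which is what a load-combined pair of adjacent 32-bit fields (16 and 5) would look like, the zero-extended version computes an offset from 16 alone, while the full-register version computes it from the combined 64-bit value.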
DRC
3dcb85ee9a AVX2: Verify O/S support for AVX2 before enabling
This fixes crashes that would occur when attempting to use
libjpeg-turbo's AVX2 extensions on older operating systems (such as Windows XP or
RHEL 5.)  Even if the CPU supports AVX2, the O/S has to also support
saving/restoring YMM registers when switching contexts.
2016-07-11 20:26:34 -05:00
DRC
1be87b6273 Reformat jsimdcpu[-64].asm to improve readability 2016-07-11 19:42:37 -05:00
DRC
b331385e8a Merge branch 'master' into dev 2016-07-11 13:11:25 -05:00
DRC
b2921f1bcc 32-bit AVX2 implementation of integer quantization 2016-07-08 21:28:48 -05:00
DRC
eaae2cdb16 64-bit AVX2 implementation of integer quantization 2016-07-08 21:15:27 -05:00
DRC
a7c2f97939 AVX2: Avoid expensive AVX-SSE transitions
Refer to
https://software.intel.com/sites/default/files/m/d/4/1/d/8/11MC12_Avoiding_2BAVX-SSE_2BTransition_2BPenalties_2Brh_2Bfinal.pdf
for more information.  This eliminates all AVX-SSE transitions detected
with the Intel SDE tool.
2016-07-08 20:10:24 -05:00