Files
mozjpeg/simd/arm/neon-compat.h.in
DRC 2a2970af67 Neon/AArch32: Work around Clang T32 miscompilation
Referring to the C standard
(http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf,
J.2 Undefined behavior), the behavior of the compiler is undefined if
"conversion between two pointer types produces a result that is
incorrectly aligned."  Thus, the behavior of this code

  *((uint32_t *)buffer) = BUILTIN_BSWAP32(put_buffer);

in the AArch32 version of the FLUSH() macro is undefined unless 'buffer'
is 32-bit-aligned.  Referring to
https://bugs.llvm.org/show_bug.cgi?id=50785, certain versions of Clang,
when generating Thumb (T32) instructions, miscompile that code into an
assembly instruction (stm) that requires the destination to be
32-bit-aligned.  Since such alignment cannot be guaranteed within the
Huffman encoder, this reportedly led to crashes (SIGBUS: illegal
alignment) with AArch32/Thumb builds of libjpeg-turbo running on Android
devices, although thus far I have been unable to reproduce those crashes
with a plain Linux/Arm system.

The miscompilation is visible with the Compiler Explorer:
https://godbolt.org/z/rv1ccx1Pb
However, it goes away when removing the return statement from the
function.  Thus, it seems that Clang's behavior in this regard is
somewhat variable, which may explain why the crashes are only
reproducible on certain platforms.

The suggested workaround is to use memcpy(), but whereas Clang and
recent GCC releases are smart enough to compile a 4-byte memcpy() call
into a str instruction, GCC < 6 is not.  Referring to
https://godbolt.org/z/ae7Wje3P6, the only way to consistently produce
the desired str instruction across all supported compilers is to use
inline assembly.  Visual C++ presumably does not miscompile the code in
question, since no issues have been reported with it, but since the code
relies on undefined compiler behavior, prudence dictates that
e4ec23d7ae should be reverted for Visual
C++, which this commit does.  The performance impact of
e4ec23d7ae for Visual C++/Arm builds is
unknown (I have no ability to test such builds), but regardless, this
commit reverts the Visual C++/Arm performance to that of libjpeg-turbo
2.1 beta1.

Closes #529
2021-07-09 19:37:58 -05:00

38 lines
1.5 KiB
C

/*
* Copyright (C) 2020, D. R. Commander. All Rights Reserved.
* Copyright (C) 2020-2021, Arm Limited. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute it
* freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must not
* claim that you wrote the original software. If you use this software
* in a product, an acknowledgment in the product documentation would be
* appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must not be
* misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source distribution.
*/
#cmakedefine HAVE_VLD1_S16_X3
#cmakedefine HAVE_VLD1_U16_X2
#cmakedefine HAVE_VLD1Q_U8_X4
/* Define compiler-independent count-leading-zeros and byte-swap macros */
#if defined(_MSC_VER) && !defined(__clang__)
#define BUILTIN_CLZ(x) _CountLeadingZeros(x)
#define BUILTIN_CLZLL(x) _CountLeadingZeros64(x)
#define BUILTIN_BSWAP64(x) _byteswap_uint64(x)
#elif defined(__clang__) || defined(__GNUC__)
#define BUILTIN_CLZ(x) __builtin_clz(x)
#define BUILTIN_CLZLL(x) __builtin_clzll(x)
#define BUILTIN_BSWAP64(x) __builtin_bswap64(x)
#else
#error "Unknown compiler"
#endif