Merge commit '8a2cad020171184a49fa8696df0b9e267f1cf2f6'

* commit '8a2cad020171184a49fa8696df0b9e267f1cf2f6': (99 commits)
  Build: Handle CMAKE_OSX_ARCHITECTURES=(i386|ppc)
  Add Sponsor button for GitHub repository
  Build: Support CMAKE_OSX_ARCHITECTURES
  cjpeg: Fix FPE when compressing 0-width GIF
  Fix build with Visual C++ and /std:c11 or /std:c17
  Neon: Fix Huffman enc. error w/Visual Studio+Clang
  Use CLZ compiler intrinsic for Windows/Arm builds
  Build: Use correct SIMD exts w/VStudio IDE + Arm64
  jcphuff.c: Fix compiler warning with clang-cl
  Migrate from Travis CI to GitHub Actions
  tjexample.c: Fix mem leak if tjTransform() fails
  Build: Officially support Ninja
  decompress_smooth_data(): Fix another uninit. read
  LICENSE.md: Remove trailing whitespace
  Build: Test for correct AArch32 RPM/DEBARCH value
  LICENSE.md: Formatting tweak
  Fix uninitialized read in decompress_smooth_data()
  Fix buffer overrun with certain narrow prog JPEGs
  Bump revision to 2.0.91 for post-beta fixes
  Travis: Use Docker tag that matches Git branch
  ...
Kornel
2021-02-26 21:30:09 +00:00
166 changed files with 17693 additions and 12607 deletions

.gitattributes

@@ -2,3 +2,4 @@
 /appveyor.yml export-ignore
 /ci export-ignore
 /.gitattributes export-ignore
+*.ppm binary

BUILDING.md

@@ -12,10 +12,7 @@ Build Requirements
 - [NASM](http://www.nasm.us) or [YASM](http://yasm.tortall.net)
   (if building x86 or x86-64 SIMD extensions)
-  * If using NASM, 2.10 or later is required.
-  * If using NASM, 2.10 or later (except 2.11.08) is required for an x86-64 Mac
-    build (2.11.08 does not work properly with libjpeg-turbo's x86-64 SIMD code
-    when building macho64 objects.)
+  * If using NASM, 2.13 or later is required.
   * If using YASM, 1.2.0 or later is required.
   * If building on macOS, NASM or YASM can be obtained from
     [MacPorts](http://www.macports.org/) or [Homebrew](http://brew.sh/).
@@ -49,10 +46,8 @@ Build Requirements
 - If building the TurboJPEG Java wrapper, JDK or OpenJDK 1.5 or later is
   required. Most modern Linux distributions, as well as Solaris 10 and later,
-  include JDK or OpenJDK. On OS X 10.5 and 10.6, it will be necessary to
-  install the Java Developer Package, which can be downloaded from
-  <http://developer.apple.com/downloads> (Apple ID required.) For other
-  systems, you can obtain the Oracle Java Development Kit from
+  include JDK or OpenJDK. For other systems, you can obtain the Oracle Java
+  Development Kit from
   <http://www.oracle.com/technetwork/java/javase/downloads>.
   * If using JDK 11 or later, CMake 3.10.x or later must also be used.
@@ -62,22 +57,22 @@ Build Requirements
 - Microsoft Visual C++ 2005 or later
   If you don't already have Visual C++, then the easiest way to get it is by
-  installing the
-  [Windows SDK](http://msdn.microsoft.com/en-us/windows/bb980924.aspx).
-  The Windows SDK includes both 32-bit and 64-bit Visual C++ compilers and
-  everything necessary to build libjpeg-turbo.
-  * You can also use Microsoft Visual Studio Express/Community Edition, which
-    is a free download. (NOTE: versions prior to 2012 can only be used to
-    build 32-bit code.)
+  installing
+  [Visual Studio Community Edition](https://visualstudio.microsoft.com),
+  which includes everything necessary to build libjpeg-turbo.
+  * You can also download and install the standalone Windows SDK (for Windows 7
+    or later), which includes command-line versions of the 32-bit and 64-bit
+    Visual C++ compilers.
   * If you intend to build libjpeg-turbo from the command line, then add the
     appropriate compiler and SDK directories to the `INCLUDE`, `LIB`, and
     `PATH` environment variables. This is generally accomplished by
-    executing `vcvars32.bat` or `vcvars64.bat` and `SetEnv.cmd`.
-    `vcvars32.bat` and `vcvars64.bat` are part of Visual C++ and are located in
-    the same directory as the compiler. `SetEnv.cmd` is part of the Windows
-    SDK. You can pass optional arguments to `SetEnv.cmd` to specify a 32-bit
-    or 64-bit build environment.
+    executing `vcvars32.bat` or `vcvars64.bat`, which are located in the same
+    directory as the compiler.
+  * If built with Visual C++ 2015 or later, the libjpeg-turbo static libraries
+    cannot be used with earlier versions of Visual C++, and vice versa.
+  * The libjpeg API DLL (**jpeg{version}.dll**) will depend on the C run-time
+    DLLs corresponding to the version of Visual C++ that was used to build it.
 ... OR ...
@@ -120,6 +115,13 @@ directory, whereas *{source_directory}* refers to the libjpeg-turbo source
 directory. For in-tree builds, these directories are the same.
+Ninja
+-----
+In all of the procedures and recipes below, replace `make` with `ninja` and
+`Unix Makefiles` with `Ninja` if using Ninja.
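As a rough illustration of that substitution (a sketch, not text from the file itself), the basic out-of-tree build would become:

    cd {build_directory}
    cmake -G"Ninja" [additional CMake flags] {source_directory}
    ninja

The same substitution applies to the other targets (`ninja test`, `ninja install`, and so on).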
 Build Procedure
 ---------------
@@ -345,7 +347,7 @@ Build Recipes
 -------------
-### 32-bit Build on 64-bit Linux/Unix/Mac
+### 32-bit Build on 64-bit Linux/Unix
 Use export/setenv to set the following environment variables before running
 CMake:
@@ -417,103 +419,9 @@ compression/decompression. This section describes how to build libjpeg-turbo
 for these platforms.
-### Additional build requirements
-- For configurations that require [gas-preprocessor.pl]
-(https://raw.githubusercontent.com/libjpeg-turbo/gas-preprocessor/master/gas-preprocessor.pl),
-it should be installed in your `PATH`.
-### Armv7 (32-bit)
-**gas-preprocessor.pl required**
-The following scripts demonstrate how to build libjpeg-turbo to run on the
-iPhone 3GS-4S/iPad 1st-3rd Generation and newer:
-#### Xcode 4.2 and earlier (LLVM-GCC)
-IOS_PLATFORMDIR=/Developer/Platforms/iPhoneOS.platform
-IOS_SYSROOT=($IOS_PLATFORMDIR/Developer/SDKs/iPhoneOS*.sdk)
-export CFLAGS="-mfloat-abi=softfp -march=armv7 -mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -miphoneos-version-min=3.0"
-cd {build_directory}
-cat <<EOF >toolchain.cmake
-set(CMAKE_SYSTEM_NAME Darwin)
-set(CMAKE_SYSTEM_PROCESSOR arm)
-set(CMAKE_C_COMPILER ${IOS_PLATFORMDIR}/Developer/usr/bin/arm-apple-darwin10-llvm-gcc-4.2)
-EOF
-cmake -G"Unix Makefiles" -DCMAKE_TOOLCHAIN_FILE=toolchain.cmake \
-  -DCMAKE_OSX_SYSROOT=${IOS_SYSROOT[0]} \
-  [additional CMake flags] {source_directory}
-make
-#### Xcode 4.3-4.6 (LLVM-GCC)
-Same as above, but replace the first line with:
-IOS_PLATFORMDIR=/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform
-#### Xcode 5 and later (Clang)
-IOS_PLATFORMDIR=/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform
-IOS_SYSROOT=($IOS_PLATFORMDIR/Developer/SDKs/iPhoneOS*.sdk)
-export CFLAGS="-mfloat-abi=softfp -arch armv7 -miphoneos-version-min=3.0"
-export ASMFLAGS="-no-integrated-as"
-cd {build_directory}
-cat <<EOF >toolchain.cmake
-set(CMAKE_SYSTEM_NAME Darwin)
-set(CMAKE_SYSTEM_PROCESSOR arm)
-set(CMAKE_C_COMPILER /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang)
-EOF
-cmake -G"Unix Makefiles" -DCMAKE_TOOLCHAIN_FILE=toolchain.cmake \
-  -DCMAKE_OSX_SYSROOT=${IOS_SYSROOT[0]} \
-  [additional CMake flags] {source_directory}
-make
-### Armv7s (32-bit)
-**gas-preprocessor.pl required**
-The following scripts demonstrate how to build libjpeg-turbo to run on the
-iPhone 5/iPad 4th Generation and newer:
-#### Xcode 4.5-4.6 (LLVM-GCC)
-IOS_PLATFORMDIR=/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform
-IOS_SYSROOT=($IOS_PLATFORMDIR/Developer/SDKs/iPhoneOS*.sdk)
-export CFLAGS="-Wall -mfloat-abi=softfp -march=armv7s -mcpu=swift -mtune=swift -mfpu=neon -miphoneos-version-min=6.0"
-cd {build_directory}
-cat <<EOF >toolchain.cmake
-set(CMAKE_SYSTEM_NAME Darwin)
-set(CMAKE_SYSTEM_PROCESSOR arm)
-set(CMAKE_C_COMPILER ${IOS_PLATFORMDIR}/Developer/usr/bin/arm-apple-darwin10-llvm-gcc-4.2)
-EOF
-cmake -G"Unix Makefiles" -DCMAKE_TOOLCHAIN_FILE=toolchain.cmake \
-  -DCMAKE_OSX_SYSROOT=${IOS_SYSROOT[0]} \
-  [additional CMake flags] {source_directory}
-make
-#### Xcode 5 and later (Clang)
-Same as the Armv7 build procedure for Xcode 5 and later, except replace the
-compiler flags as follows:
-export CFLAGS="-Wall -mfloat-abi=softfp -arch armv7s -miphoneos-version-min=6.0"
 ### Armv8 (64-bit)
-**gas-preprocessor.pl required if using Xcode < 6**
+**Xcode 5 or later required, Xcode 6.3.x or later recommended**
 The following script demonstrates how to build libjpeg-turbo to run on the
 iPhone 5S/iPad Mini 2/iPad Air and newer.
@@ -535,9 +443,6 @@ iPhone 5S/iPad Mini 2/iPad Air and newer.
   [additional CMake flags] {source_directory}
 make
-Once built, lipo can be used to combine the Armv7, v7s, and/or v8 variants into
-a universal library.
 Building libjpeg-turbo for Android
 ----------------------------------
@@ -548,6 +453,8 @@ Building libjpeg-turbo for Android platforms requires v13b or later of the
 ### Armv7 (32-bit)
+**NDK r19 or later with Clang recommended**
 The following is a general recipe script that can be modified for your specific
 needs.
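The recipe itself is not part of this hunk; a condensed sketch of an NDK r19+/Clang configuration (assuming the NDK's bundled `android.toolchain.cmake` and placeholder values for the NDK path and minimum API level) might look like:

    NDK_PATH={full path to the NDK directory}
    cd {build_directory}
    cmake -G"Unix Makefiles" \
      -DANDROID_ABI=armeabi-v7a \
      -DANDROID_ARM_MODE=arm \
      -DANDROID_PLATFORM=android-{API level} \
      -DCMAKE_TOOLCHAIN_FILE=${NDK_PATH}/build/cmake/android.toolchain.cmake \
      [additional CMake flags] {source_directory}
    make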
@@ -573,6 +480,8 @@ needs.
 ### Armv8 (64-bit)
+**Clang recommended**
 The following is a general recipe script that can be modified for your specific
 needs.
@@ -747,44 +656,23 @@ Mac
     make dmg
 Create Mac package/disk image. This requires pkgbuild and productbuild, which
-are installed by default on OS X 10.7 and later and which can be obtained by
-installing Xcode 3.2.6 (with the "Unix Development" option) on OS X 10.6.
-Packages built in this manner can be installed on OS X 10.5 and later, but they
-must be built on OS X 10.6 or later.
+are installed by default on OS X/macOS 10.7 and later.
-    make udmg
-This creates a Mac package/disk image that contains universal x86-64/i386/Arm
-binaries. The following CMake variables control which architectures are
-included in the universal binaries. Setting any of these variables to an empty
-string excludes that architecture from the package.
+In order to create a Mac package/disk image that contains universal
+x86-64/Arm binaries, set the following CMake variable:
-* `OSX_32BIT_BUILD`: Directory containing an i386 (32-bit) Mac build of
-  libjpeg-turbo (default: *{source_directory}*/osxx86)
-* `IOS_ARMV7_BUILD`: Directory containing an Armv7 (32-bit) iOS build of
-  libjpeg-turbo (default: *{source_directory}*/iosarmv7)
-* `IOS_ARMV7S_BUILD`: Directory containing an Armv7s (32-bit) iOS build of
-  libjpeg-turbo (default: *{source_directory}*/iosarmv7s)
-* `IOS_ARMV8_BUILD`: Directory containing an Armv8 (64-bit) iOS build of
-  libjpeg-turbo (default: *{source_directory}*/iosarmv8)
+* `ARMV8_BUILD`: Directory containing an Armv8 (64-bit) iOS or macOS build of
+  libjpeg-turbo to include in the universal binaries
-You should first use CMake to configure i386, Armv7, Armv7s, and/or Armv8
-sub-builds of libjpeg-turbo (see "Build Recipes" and "Building libjpeg-turbo
-for iOS" above) in build directories that match those specified in the
-aforementioned CMake variables. Next, configure the primary build of
-libjpeg-turbo as an out-of-tree build, and build it. Once the primary build
-has been built, run `make udmg` from the build directory. The packaging system
-will build the sub-builds, use lipo to combine them into a single set of
-universal binaries, then package the universal binaries in the same manner as
-`make dmg`.
+You should first use CMake to configure an Armv8 sub-build of libjpeg-turbo
+(see "Building libjpeg-turbo for iOS" above, if applicable) in a build
+directory that matches the one specified in the aforementioned CMake variable.
+Next, configure the primary (x86-64) build of libjpeg-turbo as an out-of-tree
+build, specifying the aforementioned CMake variable, and build it. Once the
+primary build has been built, run `make dmg` from the build directory. The
+packaging system will build the sub-build, use lipo to combine it with the
+primary build into a single set of universal binaries, then package the
+universal binaries.
-Cygwin
-------
-    make cygwinpkg
-Build a Cygwin binary package.
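A worked sketch of the new Mac packaging flow described above (directory names are arbitrary, and the Armv8 sub-build could equally be configured with the iOS recipe):

    # Configure (but do not build) the Armv8 sub-build
    mkdir -p {build_directory}/arm64 && cd {build_directory}/arm64
    cmake -G"Unix Makefiles" -DCMAKE_OSX_ARCHITECTURES=arm64 \
      [additional CMake flags] {source_directory}

    # Configure and build the primary x86-64 build, then package
    cd {build_directory}
    cmake -G"Unix Makefiles" -DARMV8_BUILD={build_directory}/arm64 \
      [additional CMake flags] {source_directory}
    make
    make dmg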
 Windows

CMakeLists.txt

@@ -5,7 +5,7 @@ if(CMAKE_EXECUTABLE_SUFFIX)
 endif()
 project(mozjpeg C)
-set(VERSION 4.0.3)
+set(VERSION 4.0.4)
 string(REPLACE "." ";" VERSION_TRIPLET ${VERSION})
 list(GET VERSION_TRIPLET 0 VERSION_MAJOR)
 list(GET VERSION_TRIPLET 1 VERSION_MINOR)
@@ -41,12 +41,19 @@ message(STATUS "VERSION = ${VERSION}, BUILD = ${BUILD}")
 # Detect CPU type and whether we're building 64-bit or 32-bit code
 math(EXPR BITS "${CMAKE_SIZEOF_VOID_P} * 8")
 string(TOLOWER ${CMAKE_SYSTEM_PROCESSOR} CMAKE_SYSTEM_PROCESSOR_LC)
+set(COUNT 1)
+foreach(ARCH ${CMAKE_OSX_ARCHITECTURES})
+  if(COUNT GREATER 1)
+    message(FATAL_ERROR "The libjpeg-turbo build system does not support multiple values in CMAKE_OSX_ARCHITECTURES.")
+  endif()
+  math(EXPR COUNT "${COUNT}+1")
+endforeach()
 if(CMAKE_SYSTEM_PROCESSOR_LC MATCHES "x86_64" OR
   CMAKE_SYSTEM_PROCESSOR_LC MATCHES "amd64" OR
   CMAKE_SYSTEM_PROCESSOR_LC MATCHES "i[0-9]86" OR
   CMAKE_SYSTEM_PROCESSOR_LC MATCHES "x86" OR
   CMAKE_SYSTEM_PROCESSOR_LC MATCHES "ia32")
-  if(BITS EQUAL 64)
+  if(BITS EQUAL 64 OR CMAKE_C_COMPILER_ABI MATCHES "ELF X32")
     set(CPU_TYPE x86_64)
   else()
     set(CPU_TYPE i386)
@@ -57,9 +64,9 @@ if(CMAKE_SYSTEM_PROCESSOR_LC MATCHES "x86_64" OR
 elseif(CMAKE_SYSTEM_PROCESSOR_LC STREQUAL "aarch64" OR
   CMAKE_SYSTEM_PROCESSOR_LC MATCHES "arm*")
   if(BITS EQUAL 64)
     set(CPU_TYPE arm64)
   else()
     set(CPU_TYPE arm)
   endif()
 elseif(CMAKE_SYSTEM_PROCESSOR_LC MATCHES "ppc*" OR
   CMAKE_SYSTEM_PROCESSOR_LC MATCHES "powerpc*")
@@ -67,6 +74,18 @@ elseif(CMAKE_SYSTEM_PROCESSOR_LC MATCHES "ppc*" OR
 else()
   set(CPU_TYPE ${CMAKE_SYSTEM_PROCESSOR_LC})
 endif()
+if(CMAKE_OSX_ARCHITECTURES MATCHES "x86_64" OR
+  CMAKE_OSX_ARCHITECTURES MATCHES "arm64" OR
+  CMAKE_OSX_ARCHITECTURES MATCHES "i386")
+  set(CPU_TYPE ${CMAKE_OSX_ARCHITECTURES})
+endif()
+if(CMAKE_OSX_ARCHITECTURES MATCHES "ppc")
+  set(CPU_TYPE powerpc)
+endif()
+if(MSVC_IDE AND CMAKE_GENERATOR_PLATFORM MATCHES "arm64")
+  set(CPU_TYPE arm64)
+endif()
 message(STATUS "${BITS}-bit build (${CPU_TYPE})")
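For example, the new `CMAKE_OSX_ARCHITECTURES` handling above makes a single-architecture macOS cross-configure look like this (a sketch using the placeholder directories from BUILDING.md):

    cd {build_directory}
    cmake -G"Unix Makefiles" -DCMAKE_OSX_ARCHITECTURES=arm64 \
      [additional CMake flags] {source_directory}
    make

Passing more than one architecture (e.g. `-DCMAKE_OSX_ARCHITECTURES="arm64;x86_64"`) triggers the `FATAL_ERROR` in the loop above.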
@@ -84,7 +103,9 @@ if(WIN32)
     set(CMAKE_INSTALL_DEFAULT_PREFIX "${CMAKE_INSTALL_DEFAULT_PREFIX}64")
   endif()
 else()
+  if(NOT CMAKE_INSTALL_DEFAULT_PREFIX)
     set(CMAKE_INSTALL_DEFAULT_PREFIX /opt/${CMAKE_PROJECT_NAME})
+  endif()
 endif()
 if(CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT)
   set(CMAKE_INSTALL_PREFIX "${CMAKE_INSTALL_DEFAULT_PREFIX}" CACHE PATH
@@ -103,6 +124,8 @@ if(CMAKE_INSTALL_PREFIX STREQUAL "${CMAKE_INSTALL_DEFAULT_PREFIX}")
 if(UNIX AND NOT APPLE)
   if(BITS EQUAL 64)
     set(CMAKE_INSTALL_DEFAULT_LIBDIR "lib64")
+  elseif(CMAKE_C_COMPILER_ABI MATCHES "ELF X32")
+    set(CMAKE_INSTALL_DEFAULT_LIBDIR "libx32")
   else()
     set(CMAKE_INSTALL_DEFAULT_LIBDIR "lib32")
   endif()
@@ -135,9 +158,9 @@ endforeach()
 macro(boolean_number var)
   if(${var})
-    set(${var} 1)
+    set(${var} 1 ${ARGN})
   else()
-    set(${var} 0)
+    set(${var} 0 ${ARGN})
   endif()
 endmacro()
@@ -155,8 +178,12 @@ option(WITH_ARITH_DEC "Include arithmetic decoding support when emulating the li
 boolean_number(WITH_ARITH_DEC)
 option(WITH_ARITH_ENC "Include arithmetic encoding support when emulating the libjpeg v6b API/ABI" FALSE)
 boolean_number(WITH_ARITH_ENC)
-option(WITH_JAVA "Build Java wrapper for the TurboJPEG API library (implies ENABLE_SHARED=1)" FALSE)
-boolean_number(WITH_JAVA)
+if(CMAKE_C_COMPILER_ABI MATCHES "ELF X32")
+  set(WITH_JAVA 0)
+else()
+  option(WITH_JAVA "Build Java wrapper for the TurboJPEG API library (implies ENABLE_SHARED=1)" FALSE)
+  boolean_number(WITH_JAVA)
+endif()
 option(WITH_JPEG7 "Emulate libjpeg v7 API/ABI (this makes ${CMAKE_PROJECT_NAME} backward-incompatible with libjpeg v6b)" FALSE)
 boolean_number(WITH_JPEG7)
 option(WITH_JPEG8 "Emulate libjpeg v8 API/ABI (this makes ${CMAKE_PROJECT_NAME} backward-incompatible with libjpeg v6b)" FALSE)
@@ -418,13 +445,6 @@ if(UNIX)
   exit(is_shifting_signed(-0x7F7E80B1L));
   }" RIGHT_SHIFT_IS_UNSIGNED)
   endif()
-  if(CMAKE_CROSSCOMPILING)
-    set(__CHAR_UNSIGNED__ 0)
-  else()
-    check_c_source_runs("int main(void) { return ((char) -1 < 0); }"
-      __CHAR_UNSIGNED__)
-  endif()
 endif()
 if(MSVC)
@@ -550,6 +570,9 @@ endif()
 if(WITH_SIMD)
   add_subdirectory(simd)
+  if(NEON_INTRINSICS)
+    add_definitions(-DNEON_INTRINSICS)
+  endif()
 elseif(NOT WITH_12BIT)
   message(STATUS "SIMD extensions: None (WITH_SIMD = ${WITH_SIMD})")
 endif()
@@ -746,6 +769,8 @@ if(WITH_12BIT)
 set(MD5_PPM_RGB_ISLOW f3301d2219783b8b3d942b7239fa50c0)
 set(MD5_JPEG_422_IFAST_OPT 7322e3bd2f127f7de4b40d4480ce60e4)
 set(MD5_PPM_422_IFAST 79807fa552899e66a04708f533e16950)
+set(MD5_JPEG_440_ISLOW e25c1912e38367be505a89c410c1c2d2)
+set(MD5_PPM_440_ISLOW e7d2e26288870cfcb30f3114ad01e380)
 set(MD5_PPM_422M_IFAST 07737bfe8a7c1c87aaa393a0098d16b0)
 set(MD5_JPEG_420_IFAST_Q100_PROG 008ab68d6ddbba04a8f01deee4e0f9f8)
 set(MD5_PPM_420_Q100_IFAST 1b3730122709f53d007255e8dfd3305e)
@@ -795,6 +820,8 @@ else()
 set(MD5_BMP_RGB_ISLOW_565D 4cfa0928ef3e6bb626d7728c924cfda4)
 set(MD5_JPEG_422_IFAST_OPT 2540287b79d913f91665e660303ab2c8)
 set(MD5_PPM_422_IFAST 35bd6b3f833bad23de82acea847129fa)
+set(MD5_JPEG_440_ISLOW 538bc02bd4b4658fd85de6ece6cbeda6)
+set(MD5_PPM_440_ISLOW 11e7eab7ef7ef3276934bb7e7b6bb377)
 set(MD5_PPM_422M_IFAST 8dbc65323d62cca7c91ba02dd1cfa81d)
 set(MD5_BMP_422M_IFAST_565 3294bd4d9a1f2b3d08ea6020d0db7065)
 set(MD5_BMP_422M_IFAST_565D da98c9c7b6039511be4a79a878a9abc1)
@@ -824,29 +851,7 @@ else()
 set(MD5_PPM_3x2_IFAST fd283664b3b49127984af0a7f118fccd)
 set(MD5_JPEG_420_ISLOW_ARI e986fb0a637a8d833d96e8a6d6d84ea1)
 set(MD5_JPEG_444_ISLOW_PROGARI 0a8f1c8f66e113c3cf635df0a475a617)
-# Since v1.5.1, libjpeg-turbo uses the separate non-fancy upsampling and
-# YCbCr -> RGB color conversion routines rather than merged upsampling/color
-# conversion when fancy upsampling is disabled on platforms that have a SIMD
-# implementation of YCbCr -> RGB color conversion but no SIMD implementation
-# of merged upsampling/color conversion. This was intended to improve the
-# performance of the Arm Neon SIMD extensions, the only SIMD extensions for
-# which those circumstances currently apply. The separate non-fancy
-# upsampling and color conversion routines usually produce bitwise-identical
-# output to the merged upsampling/color conversion routines, but that is not
-# the case when skipping scanlines starting at an odd-numbered scanline. In
-# libjpeg-turbo 2.0.5 and prior, doing that while using merged h2v2
-# upsampling caused a segfault, so this test validates the fix for that
-# segfault. Unfortunately, however, the test also produces different bitwise
-# output when using the Neon SIMD extensions, because of the aforementioned
-# optimization. The easiest workaround is to use the old test from
-# libjpeg-turbo 2.0.5 and prior when using the Neon SIMD extensions. The
-# aforementioned segfault never would have occurred with the Neon SIMD
-# extensions anyhow, since merged upsampling is disabled when using them.
-if((CPU_TYPE STREQUAL "arm64" OR CPU_TYPE STREQUAL "arm") AND WITH_SIMD)
-  set(MD5_PPM_420M_IFAST_ARI 72b59a99bcf1de24c5b27d151bde2437)
-else()
 set(MD5_PPM_420M_IFAST_ARI 57251da28a35b46eecb7177d82d10e0e)
-endif()
 set(MD5_JPEG_420_ISLOW 9a68f56bc76e466aa7e52f415d0f4a5f)
 set(MD5_PPM_420M_ISLOW_2_1 9f9de8c0612f8d06869b960b05abf9c9)
 set(MD5_PPM_420M_ISLOW_15_8 b6875bc070720b899566cc06459b63b7)
@@ -954,7 +959,7 @@ if(CPU_TYPE STREQUAL "x86_64" OR CPU_TYPE STREQUAL "i386")
 endif()
 else()
 if((CPU_TYPE STREQUAL "powerpc" OR CPU_TYPE STREQUAL "arm64") AND
-  NOT CMAKE_C_COMPILER_ID STREQUAL "Clang")
+  NOT CMAKE_C_COMPILER_ID STREQUAL "Clang" AND NOT MSVC)
   set(DEFAULT_FLOATTEST fp-contract)
 else()
   set(DEFAULT_FLOATTEST no-fp-contract)
@@ -1101,6 +1106,16 @@ foreach(libtype ${TEST_LIBTYPES})
   testout_422_ifast.ppm testout_422_ifast_opt.jpg
   ${MD5_PPM_422_IFAST} cjpeg-${libtype}-422-ifast-opt)
+# CC: RGB->YCC SAMP: fullsize/h1v2 FDCT: islow ENT: huff
+add_bittest(cjpeg 440-islow "-sample;1x2;-dct;int"
+  testout_440_islow.jpg ${TESTIMAGES}/testorig.ppm
+  ${MD5_JPEG_440_ISLOW})
+# CC: YCC->RGB SAMP: fullsize/h1v2 fancy IDCT: islow ENT: huff
+add_bittest(djpeg 440-islow "-dct;int"
+  testout_440_islow.ppm testout_440_islow.jpg
+  ${MD5_PPM_440_ISLOW} cjpeg-${libtype}-440-islow)
 # CC: YCC->RGB SAMP: h2v1 merged IDCT: ifast ENT: huff
 add_bittest(djpeg 422m-ifast "-dct;fast;-nosmooth"
   testout_422m_ifast.ppm testout_422_ifast_opt.jpg
@@ -1209,17 +1224,9 @@ foreach(libtype ${TEST_LIBTYPES})
 if(WITH_ARITH_DEC)
 # CC: RGB->YCC SAMP: h2v2 merged IDCT: ifast ENT: arith
-if((CPU_TYPE STREQUAL "arm64" OR CPU_TYPE STREQUAL "arm") AND WITH_SIMD)
-  # Refer to the comment above the definition of MD5_PPM_420M_IFAST_ARI for
-  # an explanation of why this is necessary.
-  add_bittest(djpeg 420m-ifast-ari "-fast;-ppm"
-    testout_420m_ifast_ari.ppm ${TESTIMAGES}/testimgari.jpg
-    ${MD5_PPM_420M_IFAST_ARI})
-else()
 add_bittest(djpeg 420m-ifast-ari "-fast;-skip;1,20;-ppm"
   testout_420m_ifast_ari.ppm ${TESTIMAGES}/testimgari.jpg
   ${MD5_PPM_420M_IFAST_ARI})
-endif()
 add_bittest(jpegtran 420-islow "-revert"
   testout_420_islow.jpg ${TESTIMAGES}/testimgari.jpg
@@ -1425,10 +1432,13 @@ set(EXE ${CMAKE_EXECUTABLE_SUFFIX})
 if(WITH_TURBOJPEG)
   if(ENABLE_SHARED)
-    install(TARGETS turbojpeg tjbench
+    install(TARGETS turbojpeg EXPORT ${CMAKE_PROJECT_NAME}Targets
+      INCLUDES DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
       ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}
       LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}
       RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR})
+    install(TARGETS tjbench
+      RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR})
     if(NOT CMAKE_VERSION VERSION_LESS "3.1" AND MSVC AND
       CMAKE_C_LINKER_SUPPORTS_PDB)
       install(FILES "$<TARGET_PDB_FILE:turbojpeg>"
@@ -1436,8 +1446,9 @@ if(WITH_TURBOJPEG)
     endif()
   endif()
   if(ENABLE_STATIC)
-    install(TARGETS turbojpeg-static ARCHIVE
-      DESTINATION ${CMAKE_INSTALL_LIBDIR})
+    install(TARGETS turbojpeg-static EXPORT ${CMAKE_PROJECT_NAME}Targets
+      INCLUDES DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
+      ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR})
     if(NOT ENABLE_SHARED)
       if(MSVC_IDE OR XCODE)
         set(DIR "${CMAKE_CURRENT_BINARY_DIR}/\${CMAKE_INSTALL_CONFIG_NAME}")
@@ -1453,7 +1464,9 @@ if(WITH_TURBOJPEG)
 endif()
 if(ENABLE_STATIC)
-  install(TARGETS jpeg-static ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR})
+  install(TARGETS jpeg-static EXPORT ${CMAKE_PROJECT_NAME}Targets
+    INCLUDES DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
+    ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR})
   if(NOT ENABLE_SHARED)
     if(MSVC_IDE OR XCODE)
       set(DIR "${CMAKE_CURRENT_BINARY_DIR}/\${CMAKE_INSTALL_CONFIG_NAME}")
@@ -1493,6 +1506,13 @@ endif()
 install(FILES ${CMAKE_CURRENT_BINARY_DIR}/pkgscripts/libjpeg.pc
   ${CMAKE_CURRENT_BINARY_DIR}/pkgscripts/libturbojpeg.pc
   DESTINATION ${CMAKE_INSTALL_LIBDIR}/pkgconfig)
+install(FILES
+  ${CMAKE_CURRENT_BINARY_DIR}/pkgscripts/${CMAKE_PROJECT_NAME}Config.cmake
+  ${CMAKE_CURRENT_BINARY_DIR}/pkgscripts/${CMAKE_PROJECT_NAME}ConfigVersion.cmake
+  DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/${CMAKE_PROJECT_NAME})
+install(EXPORT ${CMAKE_PROJECT_NAME}Targets
+  NAMESPACE ${CMAKE_PROJECT_NAME}::
+  DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/${CMAKE_PROJECT_NAME})
 install(FILES ${CMAKE_CURRENT_BINARY_DIR}/jconfig.h
   ${CMAKE_CURRENT_SOURCE_DIR}/jerror.h ${CMAKE_CURRENT_SOURCE_DIR}/jmorecfg.h

ChangeLog.md

@@ -1,3 +1,168 @@
2.1 post-beta
=============
### Significant changes relative to 2.1 beta1
1. Fixed a regression introduced by 2.1 beta1[6(b)] whereby attempting to
decompress certain progressive JPEG images with one or more component planes of
width 8 or less caused a buffer overrun.
2. Fixed a regression introduced by 2.1 beta1[6(b)] whereby attempting to
decompress a specially-crafted malformed progressive JPEG image caused the
block smoothing algorithm to read from uninitialized memory.
3. Fixed an issue in the Arm Neon SIMD Huffman encoders that caused the
encoders to generate incorrect results when using the Clang compiler with
Visual Studio.
4. Fixed a floating point exception that occurred when attempting to compress a
specially-crafted malformed GIF image with a specified image width of 0 using
cjpeg.
2.0.90 (2.1 beta1)
==================
### Significant changes relative to 2.0.6:
1. The build system, x86-64 SIMD extensions, and accelerated Huffman codec now
support the x32 ABI on Linux, which allows for using x86-64 instructions with
32-bit pointers. The x32 ABI is generally enabled by adding `-mx32` to the
compiler flags.
Caveats:
- CMake 3.9.0 or later is required in order for the build system to
automatically detect an x32 build.
- Java does not support the x32 ABI, and thus the TurboJPEG Java API will
automatically be disabled with x32 builds.
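For example, a hypothetical x32 build on a Linux/x86-64 host might be configured as follows (flag placement is illustrative; CMake 3.9.0 or later is assumed, per the caveat above):

    cd {build_directory}
    CFLAGS=-mx32 LDFLAGS=-mx32 \
      cmake -G"Unix Makefiles" [additional CMake flags] {source_directory}
    make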
2. Added Loongson MMI SIMD implementations of the RGB-to-grayscale, 4:2:2 fancy
chroma upsampling, 4:2:2 and 4:2:0 merged chroma upsampling/color conversion,
and fast integer DCT/IDCT algorithms. Relative to libjpeg-turbo 2.0.x, this
speeds up:
- the compression of RGB source images into grayscale JPEG images by
approximately 20%
- the decompression of 4:2:2 JPEG images by approximately 40-60% when
using fancy upsampling
- the decompression of 4:2:2 and 4:2:0 JPEG images by approximately
15-20% when using merged upsampling
- the compression of RGB source images by approximately 30-45% when using
the fast integer DCT
- the decompression of JPEG images into RGB destination images by
approximately 2x when using the fast integer IDCT
The overall decompression speedup for RGB images is now approximately
2.3-3.7x (compared to 2-3.5x with libjpeg-turbo 2.0.x.)
3. 32-bit (Armv7 or Armv7s) iOS builds of libjpeg-turbo are no longer
supported, and the libjpeg-turbo build system can no longer be used to package
such builds. 32-bit iOS apps cannot run in iOS 11 and later, and the App Store
no longer allows them.
4. 32-bit (i386) OS X/macOS builds of libjpeg-turbo are no longer supported,
and the libjpeg-turbo build system can no longer be used to package such
builds. 32-bit Mac applications cannot run in macOS 10.15 "Catalina" and
later, and the App Store no longer allows them.
5. The SSE2 (x86 SIMD) and C Huffman encoding algorithms have been
significantly optimized, resulting in a measured average overall compression
speedup of 12-28% for 64-bit code and 22-52% for 32-bit code on various Intel
and AMD CPUs, as well as a measured average overall compression speedup of
0-23% on platforms that do not have a SIMD-accelerated Huffman encoding
implementation.
6. The block smoothing algorithm that is applied by default when decompressing
progressive Huffman-encoded JPEG images has been improved in the following
ways:
- The algorithm is now more fault-tolerant. Previously, if a particular
scan was incomplete, then the smoothing parameters for the incomplete scan
would be applied to the entire output image, including the parts of the image
that were generated by the prior (complete) scan. Visually, this had the
effect of removing block smoothing from lower-frequency scans if they were
followed by an incomplete higher-frequency scan. libjpeg-turbo now applies
block smoothing parameters to each iMCU row based on which scan generated the
pixels in that row, rather than always using the block smoothing parameters for
the most recent scan.
- When applying block smoothing to DC scans, a Gaussian-like kernel with a
5x5 window is used to reduce the "blocky" appearance.
7. Added SIMD acceleration for progressive Huffman encoding on Arm platforms.
This speeds up the compression of full-color progressive JPEGs by about 30-40%
on average (relative to libjpeg-turbo 2.0.x) when using modern Arm CPUs.
8. Added configure-time and run-time auto-detection of Loongson MMI SIMD
instructions, so that the Loongson MMI SIMD extensions can be included in any
MIPS64 libjpeg-turbo build.
9. Added fault tolerance features to djpeg and jpegtran, mainly to demonstrate
methods by which applications can guard against the exploits of the JPEG format
described in the report
["Two Issues with the JPEG Standard"](https://libjpeg-turbo.org/pmwiki/uploads/About/TwoIssueswiththeJPEGStandard.pdf).
- Both programs now accept a `-maxscans` argument, which can be used to
limit the number of allowable scans in the input file.
- Both programs now accept a `-strict` argument, which can be used to
treat all warnings as fatal.
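For example (hypothetical invocations; the file names are placeholders):

    djpeg -maxscans 100 -strict -outfile output.ppm suspicious.jpg
    jpegtran -maxscans 100 -strict -outfile output.jpg suspicious.jpg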
10. CMake package config files are now included for both the libjpeg and
TurboJPEG API libraries. This facilitates using libjpeg-turbo with CMake's
`find_package()` function. For example:
find_package(libjpeg-turbo CONFIG REQUIRED)
add_executable(libjpeg_program libjpeg_program.c)
target_link_libraries(libjpeg_program PUBLIC libjpeg-turbo::jpeg)
add_executable(libjpeg_program_static libjpeg_program.c)
target_link_libraries(libjpeg_program_static PUBLIC
libjpeg-turbo::jpeg-static)
add_executable(turbojpeg_program turbojpeg_program.c)
target_link_libraries(turbojpeg_program PUBLIC
libjpeg-turbo::turbojpeg)
add_executable(turbojpeg_program_static turbojpeg_program.c)
target_link_libraries(turbojpeg_program_static PUBLIC
libjpeg-turbo::turbojpeg-static)
11. Since the Unisys LZW patent has long expired, cjpeg and djpeg can now
read/write both LZW-compressed and uncompressed GIF files (feature ported from
jpeg-6a and jpeg-9d.)
12. jpegtran now includes the `-wipe` and `-drop` options from jpeg-9a and
jpeg-9d, as well as the ability to expand the image size using the `-crop`
option. Refer to jpegtran.1 or usage.txt for more details.
13. Added a complete intrinsics implementation of the Arm Neon SIMD extensions,
thus providing SIMD acceleration on Arm platforms for all of the algorithms
that are SIMD-accelerated on x86 platforms. This new implementation is
significantly faster in some cases than the old GAS implementation--
depending on the algorithms used, the type of CPU core, and the compiler. GCC,
as of this writing, does not provide a full or optimal set of Neon intrinsics,
so for performance reasons, the default when building libjpeg-turbo with GCC is
to continue using the GAS implementation of the following algorithms:
- 32-bit RGB-to-YCbCr color conversion
- 32-bit fast and accurate inverse DCT
- 64-bit RGB-to-YCbCr and YCbCr-to-RGB color conversion
- 64-bit accurate forward and inverse DCT
- 64-bit Huffman encoding
A new CMake variable (`NEON_INTRINSICS`) can be used to override this
default.
Since the new intrinsics implementation includes SIMD acceleration
for merged upsampling/color conversion, 1.5.1[5] is no longer necessary and has
been reverted.
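For example, to force the full Neon intrinsics implementation when building with GCC (a minimal sketch using the placeholder directories from BUILDING.md):

    cd {build_directory}
    cmake -G"Unix Makefiles" -DNEON_INTRINSICS=1 \
      [additional CMake flags] {source_directory}
    make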
14. The Arm Neon SIMD extensions can now be built using Visual Studio.
15. The build system can now be used to generate a universal x86-64 + Armv8
libjpeg-turbo SDK package for both iOS and macOS.
2.0.6
=====

LICENSE.md

@@ -91,7 +91,7 @@ best of our understanding.
 The Modified (3-clause) BSD License
 ===================================
-Copyright (C)2009-2020 D. R. Commander. All Rights Reserved.
+Copyright (C)2009-2021 D. R. Commander. All Rights Reserved.<br>
 Copyright (C)2015 Viktor Szathmáry. All Rights Reserved.
 Redistribution and use in source and binary forms, with or without

README.ijg

@@ -128,7 +128,7 @@ with respect to this software, its quality, accuracy, merchantability, or
 fitness for a particular purpose. This software is provided "AS IS", and you,
 its user, assume the entire risk as to its quality and accuracy.
-This software is copyright (C) 1991-2016, Thomas G. Lane, Guido Vollbeding.
+This software is copyright (C) 1991-2020, Thomas G. Lane, Guido Vollbeding.
 All Rights Reserved except as specified below.
 Permission is hereby granted to use, copy, modify, and distribute this
@@ -159,19 +159,6 @@ commercial products, provided that all warranty or liability claims are
 assumed by the product vendor.
-The IJG distribution formerly included code to read and write GIF files.
-To avoid entanglement with the Unisys LZW patent (now expired), GIF reading
-support has been removed altogether, and the GIF writer has been simplified
-to produce "uncompressed GIFs". This technique does not use the LZW
-algorithm; the resulting GIF files are larger than usual, but are readable
-by all standard GIF decoders.
-We are required to state that
-"The Graphics Interchange Format(c) is the Copyright property of
-CompuServe Incorporated. GIF(sm) is a Service Mark property of
-CompuServe Incorporated."
 REFERENCES
 ==========

cderror.h

@@ -1,9 +1,11 @@
 /*
 * cderror.h
 *
+* This file was part of the Independent JPEG Group's software:
 * Copyright (C) 1994-1997, Thomas G. Lane.
 * Modified 2009-2017 by Guido Vollbeding.
-* This file is part of the Independent JPEG Group's software.
+* libjpeg-turbo Modifications:
+* Copyright (C) 2021, D. R. Commander.
 * For conditions of distribution and use, see the accompanying README.ijg
 * file.
 *
@@ -42,7 +44,7 @@ JMESSAGE(JMSG_FIRSTADDONCODE = 1000, NULL) /* Must be first entry! */
 #ifdef BMP_SUPPORTED
 JMESSAGE(JERR_BMP_BADCMAP, "Unsupported BMP colormap format")
-JMESSAGE(JERR_BMP_BADDEPTH, "Only 8- and 24-bit BMP files are supported")
+JMESSAGE(JERR_BMP_BADDEPTH, "Only 8-, 24-, and 32-bit BMP files are supported")
 JMESSAGE(JERR_BMP_BADHEADER, "Invalid BMP file: bad header length")
 JMESSAGE(JERR_BMP_BADPLANES, "Invalid BMP file: biPlanes not equal to 1")
 JMESSAGE(JERR_BMP_COLORSPACE, "BMP output must be grayscale or RGB")
@@ -50,9 +52,9 @@ JMESSAGE(JERR_BMP_COMPRESSED, "Sorry, compressed BMPs not yet supported")
 JMESSAGE(JERR_BMP_EMPTY, "Empty BMP image")
 JMESSAGE(JERR_BMP_NOT, "Not a BMP file - does not start with BM")
 JMESSAGE(JERR_BMP_OUTOFRANGE, "Numeric value out of range in BMP file")
-JMESSAGE(JTRC_BMP, "%ux%u 24-bit BMP image")
+JMESSAGE(JTRC_BMP, "%ux%u %d-bit BMP image")
 JMESSAGE(JTRC_BMP_MAPPED, "%ux%u 8-bit colormapped BMP image")
-JMESSAGE(JTRC_BMP_OS2, "%ux%u 24-bit OS2 BMP image")
+JMESSAGE(JTRC_BMP_OS2, "%ux%u %d-bit OS2 BMP image")
 JMESSAGE(JTRC_BMP_OS2_MAPPED, "%ux%u 8-bit colormapped OS2 BMP image")
 #endif /* BMP_SUPPORTED */
@@ -60,6 +62,7 @@ JMESSAGE(JTRC_BMP_OS2_MAPPED, "%ux%u 8-bit colormapped OS2 BMP image")
 JMESSAGE(JERR_GIF_BUG, "GIF output got confused")
 JMESSAGE(JERR_GIF_CODESIZE, "Bogus GIF codesize %d")
 JMESSAGE(JERR_GIF_COLORSPACE, "GIF output must be grayscale or RGB")
+JMESSAGE(JERR_GIF_EMPTY, "Empty GIF image")
 JMESSAGE(JERR_GIF_IMAGENOTFOUND, "Too few images in GIF file")
 JMESSAGE(JERR_GIF_NOT, "Not a GIF file")
 JMESSAGE(JTRC_GIF, "%ux%ux%d GIF image")
@@ -84,23 +87,6 @@ JMESSAGE(JTRC_PPM, "%ux%u PPM image")
 JMESSAGE(JTRC_PPM_TEXT, "%ux%u text PPM image")
 #endif /* PPM_SUPPORTED */
-#ifdef RLE_SUPPORTED
-JMESSAGE(JERR_RLE_BADERROR, "Bogus error code from RLE library")
-JMESSAGE(JERR_RLE_COLORSPACE, "RLE output must be grayscale or RGB")
-JMESSAGE(JERR_RLE_DIMENSIONS, "Image dimensions (%ux%u) too large for RLE")
-JMESSAGE(JERR_RLE_EMPTY, "Empty RLE file")
-JMESSAGE(JERR_RLE_EOF, "Premature EOF in RLE header")
-JMESSAGE(JERR_RLE_MEM, "Insufficient memory for RLE header")
-JMESSAGE(JERR_RLE_NOT, "Not an RLE file")
-JMESSAGE(JERR_RLE_TOOMANYCHANNELS, "Cannot handle %d output channels for RLE")
-JMESSAGE(JERR_RLE_UNSUPPORTED, "Cannot handle this RLE setup")
-JMESSAGE(JTRC_RLE, "%ux%u full-color RLE file")
-JMESSAGE(JTRC_RLE_FULLMAP, "%ux%u full-color RLE file with map of length %d")
-JMESSAGE(JTRC_RLE_GRAY, "%ux%u grayscale RLE file")
-JMESSAGE(JTRC_RLE_MAPGRAY, "%ux%u grayscale RLE file with map of length %d")
-JMESSAGE(JTRC_RLE_MAPPED, "%ux%u colormapped RLE file with map of length %d")
-#endif /* RLE_SUPPORTED */
 #ifdef TARGA_SUPPORTED
 JMESSAGE(JERR_TGA_BADCMAP, "Unsupported Targa colormap format")
 JMESSAGE(JERR_TGA_BADPARMS, "Invalid or unsupported Targa file")

cdjpeg.c

@@ -3,8 +3,8 @@
 *
 * This file was part of the Independent JPEG Group's software:
 * Copyright (C) 1991-1997, Thomas G. Lane.
-* It was modified by The libjpeg-turbo Project to include only code relevant
-* to libjpeg-turbo.
+* libjpeg-turbo Modifications:
+* Copyright (C) 2019, D. R. Commander.
 * For conditions of distribution and use, see the accompanying README.ijg
 * file.
 *
@@ -25,26 +25,37 @@
 * Optional progress monitor: display a percent-done figure on stderr.
 */
-#ifdef PROGRESS_REPORT
 METHODDEF(void)
 progress_monitor(j_common_ptr cinfo)
 {
   cd_progress_ptr prog = (cd_progress_ptr)cinfo->progress;
-  int total_passes = prog->pub.total_passes + prog->total_extra_passes;
-  int percent_done =
-    (int)(prog->pub.pass_counter * 100L / prog->pub.pass_limit);
-  if (percent_done != prog->percent_done) {
-    prog->percent_done = percent_done;
-    if (total_passes > 1) {
-      fprintf(stderr, "\rPass %d/%d: %3d%% ",
-              prog->pub.completed_passes + prog->completed_extra_passes + 1,
-              total_passes, percent_done);
-    } else {
-      fprintf(stderr, "\r %3d%% ", percent_done);
-    }
-    fflush(stderr);
-  }
+  if (prog->max_scans != 0 && cinfo->is_decompressor) {
+    int scan_no = ((j_decompress_ptr)cinfo)->input_scan_number;
+    if (scan_no > (int)prog->max_scans) {
+      fprintf(stderr, "Scan number %d exceeds maximum scans (%d)\n", scan_no,
+              prog->max_scans);
+      exit(EXIT_FAILURE);
+    }
+  }
+  if (prog->report) {
+    int total_passes = prog->pub.total_passes + prog->total_extra_passes;
+    int percent_done =
+      (int)(prog->pub.pass_counter * 100L / prog->pub.pass_limit);
+    if (percent_done != prog->percent_done) {
+      prog->percent_done = percent_done;
+      if (total_passes > 1) {
+        fprintf(stderr, "\rPass %d/%d: %3d%% ",
+                prog->pub.completed_passes + prog->completed_extra_passes + 1,
+                total_passes, percent_done);
+      } else {
+        fprintf(stderr, "\r %3d%% ", percent_done);
+      }
+      fflush(stderr);
+    }
+  }
 }
@@ -57,6 +68,8 @@ start_progress_monitor(j_common_ptr cinfo, cd_progress_ptr progress)
 progress->pub.progress_monitor = progress_monitor;
 progress->completed_extra_passes = 0;
 progress->total_extra_passes = 0;
+progress->max_scans = 0;
+progress->report = FALSE;
 progress->percent_done = -1;
 cinfo->progress = &progress->pub;
 }
@@ -73,8 +86,6 @@ end_progress_monitor(j_common_ptr cinfo)
 }
 }
-#endif
 /*
 * Case-insensitive matching of possibly-abbreviated keyword switches.

cdjpeg.h

@@ -3,8 +3,9 @@
 *
 * This file was part of the Independent JPEG Group's software:
 * Copyright (C) 1994-1997, Thomas G. Lane.
+* Modified 2019 by Guido Vollbeding.
 * libjpeg-turbo Modifications:
-* Copyright (C) 2017, D. R. Commander.
+* Copyright (C) 2017, 2019, D. R. Commander.
 * mozjpeg Modifications:
 * Copyright (C) 2014, Mozilla Corporation.
 * For conditions of distribution and use, see the accompanying README.ijg file.
@@ -65,9 +66,9 @@ struct djpeg_dest_struct {
 void (*finish_output) (j_decompress_ptr cinfo, djpeg_dest_ptr dinfo);
 /* Re-calculate buffer dimensions based on output dimensions (for use with
 partial image decompression.) If this is NULL, then the output format
-does not support partial image decompression (BMP and RLE, in particular,
-cannot support partial decompression because they use an inversion buffer
-to write the image in bottom-up order.) */
+does not support partial image decompression (BMP, in particular, cannot
+support partial decompression because it uses an inversion buffer to write
+the image in bottom-up order.) */
 void (*calc_buffer_dimensions) (j_decompress_ptr cinfo,
   djpeg_dest_ptr dinfo);
@@ -96,6 +97,9 @@ struct cdjpeg_progress_mgr {
 struct jpeg_progress_mgr pub; /* fields known to JPEG library */
 int completed_extra_passes; /* extra passes completed */
 int total_extra_passes; /* total extra */
+JDIMENSION max_scans; /* abort if the number of scans exceeds this
+  value and the value is non-zero */
+boolean report; /* whether or not to report progress */
 /* last printed percentage stored here to avoid multiple printouts */
 int percent_done;
 };
@@ -112,21 +116,19 @@ EXTERN(cjpeg_source_ptr) jinit_read_bmp(j_compress_ptr cinfo,
 EXTERN(djpeg_dest_ptr) jinit_write_bmp(j_decompress_ptr cinfo, boolean is_os2,
   boolean use_inversion_array);
 EXTERN(cjpeg_source_ptr) jinit_read_gif(j_compress_ptr cinfo);
-EXTERN(djpeg_dest_ptr) jinit_write_gif(j_decompress_ptr cinfo);
+EXTERN(djpeg_dest_ptr) jinit_write_gif(j_decompress_ptr cinfo, boolean is_lzw);
 EXTERN(cjpeg_source_ptr) jinit_read_ppm(j_compress_ptr cinfo);
 EXTERN(djpeg_dest_ptr) jinit_write_ppm(j_decompress_ptr cinfo);
-EXTERN(cjpeg_source_ptr) jinit_read_rle(j_compress_ptr cinfo);
-EXTERN(djpeg_dest_ptr) jinit_write_rle(j_decompress_ptr cinfo);
 EXTERN(cjpeg_source_ptr) jinit_read_targa(j_compress_ptr cinfo);
 EXTERN(djpeg_dest_ptr) jinit_write_targa(j_decompress_ptr cinfo);
 /* cjpeg support routines (in rdswitch.c) */
 EXTERN(boolean) read_quant_tables(j_compress_ptr cinfo, char *filename,
   boolean force_baseline);
 EXTERN(boolean) read_scan_script(j_compress_ptr cinfo, char *filename);
 EXTERN(boolean) set_quality_ratings(j_compress_ptr cinfo, char *arg,
   boolean force_baseline);
 EXTERN(boolean) set_quant_slots(j_compress_ptr cinfo, char *arg);
 EXTERN(boolean) set_sample_factors(j_compress_ptr cinfo, char *arg);
@@ -137,7 +139,7 @@ EXTERN(void) read_color_map(j_decompress_ptr cinfo, FILE *infile);
 /* common support routines (in cdjpeg.c) */
 EXTERN(void) start_progress_monitor(j_common_ptr cinfo,
   cd_progress_ptr progress);
 EXTERN(void) end_progress_monitor(j_common_ptr cinfo);
 EXTERN(boolean) keymatch(char *arg, const char *keyword, int minchars);
 EXTERN(FILE *) read_stdin(void);

change.log

@@ -6,6 +6,25 @@ reference. Please see ChangeLog.md for information specific to libjpeg-turbo.
 CHANGE LOG for Independent JPEG Group's JPEG software
+Version 9d 12-Jan-2020
+-----------------------
+Restore GIF read and write support from libjpeg version 6a.
+Thank to Wolfgang Werner (W.W.) Heinz for suggestion.
+Add jpegtran -drop option; add options to the crop extension and wipe
+to fill the extra area with content from the source image region,
+instead of gray out.
+Version 9c 14-Jan-2018
+-----------------------
+jpegtran: add an option to the -wipe switch to fill the region
+with the average of adjacent blocks, instead of gray out.
+Thank to Caitlyn Feddock and Maddie Ziegler for inspiration.
 Version 9b 17-Jan-2016
 -----------------------
@@ -13,6 +32,13 @@ Document 'f' specifier for jpegtran -crop specification.
 Thank to Michele Martone for suggestion.
+Version 9a 19-Jan-2014
+-----------------------
+Add jpegtran -wipe option and extension for -crop.
+Thank to Andrew Senior, David Clunie, and Josef Schmid for suggestion.
 Version 9 13-Jan-2013
 ----------------------
@@ -138,11 +164,6 @@ Huffman tables being used.
 Huffman tables are checked for validity much more carefully than before.
-To avoid the Unisys LZW patent, djpeg's GIF output capability has been
-changed to produce "uncompressed GIFs", and cjpeg's GIF input capability
-has been removed altogether. We're not happy about it either, but there
-seems to be no good alternative.
 The configure script now supports building libjpeg as a shared library
 on many flavors of Unix (all the ones that GNU libtool knows how to
 build shared libraries for). Use "./configure --enable-shared" to

cjpeg.1

@@ -16,8 +16,7 @@ cjpeg \- compress an image file to a JPEG file
 compresses the named image file, or the standard input if no file is
 named, and produces a JPEG/JFIF file on the standard output.
 The currently supported input file formats are: PPM (PBMPLUS color
-format), PGM (PBMPLUS grayscale format), BMP, Targa, and RLE (Utah Raster
-Toolkit format). (RLE is supported only if the URT library is available.)
+format), PGM (PBMPLUS grayscale format), BMP, GIF, and Targa.
 .SH OPTIONS
 All switch names may be abbreviated; for example,
 .B \-grayscale
@@ -42,10 +41,10 @@ Scale quantization tables to adjust image quality. Quality is 0 (worst) to
 .TP
 .B \-grayscale
 Create monochrome JPEG file from color input. Be sure to use this switch when
-compressing a grayscale BMP file, because
+compressing a grayscale BMP or GIF file, because
 .B cjpeg
-isn't bright enough to notice whether a BMP file uses only shades of gray.
-By saying
+isn't bright enough to notice whether a BMP or GIF file uses only shades of
+gray. By saying
 .BR \-grayscale,
 you'll get a smaller JPEG file that takes less time to process.
 .TP
@@ -224,6 +223,9 @@ Compress to memory instead of a file. This feature was implemented mainly as a
 way of testing the in-memory destination manager (jpeg_mem_dest()), but it is
 also useful for benchmarking, since it reduces the I/O overhead.
 .TP
+.BI \-report
+Report compression progress.
+.TP
 .B \-verbose
 Enable debug printout. More
 .BR \-v 's
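A hypothetical invocation of the new `-report` switch (file names are placeholders):

    cjpeg -report -outfile output.jpg input.ppm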
@@ -350,11 +352,6 @@ This file was modified by The libjpeg-turbo Project to include only information
 relevant to libjpeg-turbo, to wordsmith certain sections, and to describe
 features not present in libjpeg.
 .SH ISSUES
-Support for GIF input files was removed in cjpeg v6b due to concerns over
-the Unisys LZW patent. Although this patent expired in 2006, cjpeg still
-lacks GIF support, for these historical reasons. (Conversion of GIF files to
-JPEG is usually a bad idea anyway, since GIF is a 256-color format.)
-.PP
 Not all variants of BMP and Targa file formats are supported.
 .PP
 The

cjpeg.c

@@ -5,7 +5,7 @@
 * Copyright (C) 1991-1998, Thomas G. Lane.
 * Modified 2003-2011 by Guido Vollbeding.
 * libjpeg-turbo Modifications:
-* Copyright (C) 2010, 2013-2014, 2017, 2020, D. R. Commander.
+* Copyright (C) 2010, 2013-2014, 2017, 2019-2020, D. R. Commander.
 * mozjpeg Modifications:
 * Copyright (C) 2014, Mozilla Corporation.
 * For conditions of distribution and use, see the accompanying README file.
@@ -70,9 +70,9 @@ static const char * const cdjpeg_message_table[] = {
* 2) assume we can push back more than one character (works in * 2) assume we can push back more than one character (works in
* some C implementations, but unportable); * some C implementations, but unportable);
* 3) provide our own buffering (breaks input readers that want to use * 3) provide our own buffering (breaks input readers that want to use
* stdio directly, such as the RLE library); * stdio directly);
* or 4) don't put back the data, and modify the input_init methods to assume * or 4) don't put back the data, and modify the input_init methods to assume
* they start reading after the start of file (also breaks RLE library). * they start reading after the start of file.
* #1 is attractive for MS-DOS but is untenable on Unix. * #1 is attractive for MS-DOS but is untenable on Unix.
* *
* The most portable solution for file types that can't be identified by their * The most portable solution for file types that can't be identified by their
@@ -124,10 +124,6 @@ select_file_type(j_compress_ptr cinfo, FILE *infile)
copy_markers = TRUE; copy_markers = TRUE;
return jinit_read_png(cinfo); return jinit_read_png(cinfo);
#endif #endif
#ifdef RLE_SUPPORTED
case 'R':
return jinit_read_rle(cinfo);
#endif
#ifdef TARGA_SUPPORTED #ifdef TARGA_SUPPORTED
case 0x00: case 0x00:
return jinit_read_targa(cinfo); return jinit_read_targa(cinfo);
@@ -158,6 +154,7 @@ static const char *progname; /* program name for error messages */
static char *icc_filename; /* for -icc switch */ static char *icc_filename; /* for -icc switch */
static char *outfilename; /* for -outfile switch */ static char *outfilename; /* for -outfile switch */
boolean memdst; /* for -memdst switch */ boolean memdst; /* for -memdst switch */
boolean report; /* for -report switch */
LOCAL(void) LOCAL(void)
@@ -236,6 +233,7 @@ usage(void)
#if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED) #if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)
fprintf(stderr, " -memdst Compress to memory instead of file (useful for benchmarking)\n"); fprintf(stderr, " -memdst Compress to memory instead of file (useful for benchmarking)\n");
#endif #endif
fprintf(stderr, " -report Report compression progress\n");
fprintf(stderr, " -verbose or -debug Emit debug output\n"); fprintf(stderr, " -verbose or -debug Emit debug output\n");
fprintf(stderr, " -version Print version information and exit\n"); fprintf(stderr, " -version Print version information and exit\n");
fprintf(stderr, "Switches for wizards:\n"); fprintf(stderr, "Switches for wizards:\n");
@@ -283,6 +281,7 @@ parse_switches(j_compress_ptr cinfo, int argc, char **argv,
icc_filename = NULL; icc_filename = NULL;
outfilename = NULL; outfilename = NULL;
memdst = FALSE; memdst = FALSE;
report = FALSE;
cinfo->err->trace_level = 0; cinfo->err->trace_level = 0;
/* Scan command line options, adjust parameters */ /* Scan command line options, adjust parameters */
@@ -470,6 +469,8 @@ parse_switches(j_compress_ptr cinfo, int argc, char **argv,
qtablefile = argv[argn]; qtablefile = argv[argn];
/* We postpone actually reading the file in case -quality comes later. */ /* We postpone actually reading the file in case -quality comes later. */
} else if (keymatch(arg, "report", 3)) {
report = TRUE;
} else if (keymatch(arg, "quant-table", 7)) { } else if (keymatch(arg, "quant-table", 7)) {
int val; int val;
if (++argn >= argc) /* advance to next argument */ if (++argn >= argc) /* advance to next argument */
@@ -485,7 +486,7 @@ parse_switches(j_compress_ptr cinfo, int argc, char **argv,
} else if (keymatch(arg, "quant-baseline", 7)) { } else if (keymatch(arg, "quant-baseline", 7)) {
/* Force quantization table to meet baseline requirements */ /* Force quantization table to meet baseline requirements */
force_baseline = TRUE; force_baseline = TRUE;
} else if (keymatch(arg, "restart", 1)) { } else if (keymatch(arg, "restart", 1)) {
/* Restart interval in MCU rows (or in MCUs with 'b'). */ /* Restart interval in MCU rows (or in MCUs with 'b'). */
long lval; long lval;
@@ -662,9 +663,7 @@ main(int argc, char **argv)
{ {
struct jpeg_compress_struct cinfo; struct jpeg_compress_struct cinfo;
struct jpeg_error_mgr jerr; struct jpeg_error_mgr jerr;
#ifdef PROGRESS_REPORT
struct cdjpeg_progress_mgr progress; struct cdjpeg_progress_mgr progress;
#endif
int file_index; int file_index;
cjpeg_source_ptr src_mgr; cjpeg_source_ptr src_mgr;
FILE *input_file; FILE *input_file;
@@ -785,9 +784,10 @@ main(int argc, char **argv)
fclose(icc_file); fclose(icc_file);
} }
#ifdef PROGRESS_REPORT if (report) {
start_progress_monitor((j_common_ptr)&cinfo, &progress); start_progress_monitor((j_common_ptr)&cinfo, &progress);
#endif progress.report = report;
}
/* Figure out the input file format, and set up to read it. */ /* Figure out the input file format, and set up to read it. */
src_mgr = select_file_type(&cinfo, input_file); src_mgr = select_file_type(&cinfo, input_file);
@@ -873,9 +873,8 @@ main(int argc, char **argv)
if (output_file != stdout && output_file != NULL) if (output_file != stdout && output_file != NULL)
fclose(output_file); fclose(output_file);
#ifdef PROGRESS_REPORT if (report)
end_progress_monitor((j_common_ptr)&cinfo); end_progress_monitor((j_common_ptr)&cinfo);
#endif
if (memdst) { if (memdst) {
fprintf(stderr, "Compressed size: %lu bytes\n", outsize); fprintf(stderr, "Compressed size: %lu bytes\n", outsize);
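
The \-memdst path exercised above relies on the in-memory destination manager that libjpeg-turbo (and libjpeg 8+) exposes as jpeg_mem_dest(). A bare-bones sketch of compressing to a growable memory buffer, assuming the compress object is already configured (error handling omitted; buffer names are illustrative):

    #include <stdio.h>
    #include <stdlib.h>
    #include "jpeglib.h"

    unsigned char *jpegbuf = NULL;   /* allocated/resized by the library */
    unsigned long jpegsize = 0;

    void
    compress_to_memory(j_compress_ptr cinfo, JSAMPARRAY rows)
    {
      jpeg_mem_dest(cinfo, &jpegbuf, &jpegsize);
      jpeg_start_compress(cinfo, TRUE);
      while (cinfo->next_scanline < cinfo->image_height)
        jpeg_write_scanlines(cinfo, &rows[cinfo->next_scanline], 1);
      jpeg_finish_compress(cinfo);
      /* jpegbuf now holds jpegsize bytes of JPEG data; free() it when done. */
    }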

View File

@@ -22,13 +22,15 @@ if(CMAKE_SYSTEM_NAME STREQUAL "Linux")
 set(RPMARCH ${CMAKE_SYSTEM_PROCESSOR})
 if(CPU_TYPE STREQUAL "x86_64")
 set(DEBARCH amd64)
-elseif(CMAKE_SYSTEM_PROCESSOR MATCHES "armv7*")
-set(RPMARCH armv7hl)
-set(DEBARCH armhf)
 elseif(CPU_TYPE STREQUAL "arm64")
 set(DEBARCH ${CPU_TYPE})
 elseif(CPU_TYPE STREQUAL "arm")
-if(CMAKE_C_COMPILER MATCHES "gnueabihf")
+check_c_source_compiles("
+#if __ARM_PCS_VFP != 1
+#error \"float ABI = softfp\"
+#endif
+int main(void) { return 0; }" HAVE_HARD_FLOAT)
+if(HAVE_HARD_FLOAT)
 set(RPMARCH armv7hl)
 set(DEBARCH armhf)
 else()
@@ -78,12 +80,14 @@ if(WIN32)
 if(MSVC)
 set(INST_PLATFORM "Visual C++")
-set(INST_NAME ${CMAKE_PROJECT_NAME}-${VERSION}-vc)
+set(INST_ID vc)
+set(INST_NAME ${CMAKE_PROJECT_NAME}-${VERSION}-${INST_ID})
 set(INST_REG_NAME ${CMAKE_PROJECT_NAME})
 elseif(MINGW)
 set(INST_PLATFORM GCC)
-set(INST_NAME ${CMAKE_PROJECT_NAME}-${VERSION}-gcc)
-set(INST_REG_NAME ${CMAKE_PROJECT_NAME}-gcc)
+set(INST_ID gcc)
+set(INST_NAME ${CMAKE_PROJECT_NAME}-${VERSION}-${INST_ID})
+set(INST_REG_NAME ${CMAKE_PROJECT_NAME}-${INST_ID})
 set(INST_DEFS -DGCC)
 endif()
@@ -107,6 +111,12 @@ endif()
 string(REGEX REPLACE "/" "\\\\" INST_DIR ${CMAKE_INSTALL_PREFIX})
 
 configure_file(release/installer.nsi.in installer.nsi @ONLY)
+# TODO: It would be nice to eventually switch to CPack and eliminate this mess,
+# but not today.
+configure_file(win/projectTargets.cmake.in
+win/${CMAKE_PROJECT_NAME}Targets.cmake @ONLY)
+configure_file(win/${INST_ID}/projectTargets-release.cmake.in
+win/${CMAKE_PROJECT_NAME}Targets-release.cmake @ONLY)
 
 if(WITH_JAVA)
 set(JAVA_DEPEND turbojpeg-java)
@@ -120,53 +130,28 @@ add_custom_target(installer
 endif() # WIN32
 
-###############################################################################
-# Cygwin Package
-###############################################################################
-if(CYGWIN)
-configure_file(release/makecygwinpkg.in pkgscripts/makecygwinpkg)
-add_custom_target(cygwinpkg pkgscripts/makecygwinpkg)
-endif() # CYGWIN
 
 ###############################################################################
 # Mac DMG
 ###############################################################################
 
 if(APPLE)
 
-set(DEFAULT_OSX_32BIT_BUILD ${CMAKE_SOURCE_DIR}/osxx86)
-set(OSX_32BIT_BUILD ${DEFAULT_OSX_32BIT_BUILD} CACHE PATH
-"Directory containing 32-bit (i386) Mac build to include in universal binaries (default: ${DEFAULT_OSX_32BIT_BUILD})")
-set(DEFAULT_IOS_ARMV7_BUILD ${CMAKE_SOURCE_DIR}/iosarmv7)
-set(IOS_ARMV7_BUILD ${DEFAULT_IOS_ARMV7_BUILD} CACHE PATH
-"Directory containing Armv7 iOS build to include in universal binaries (default: ${DEFAULT_IOS_ARMV7_BUILD})")
-set(DEFAULT_IOS_ARMV7S_BUILD ${CMAKE_SOURCE_DIR}/iosarmv7s)
-set(IOS_ARMV7S_BUILD ${DEFAULT_IOS_ARMV7S_BUILD} CACHE PATH
-"Directory containing Armv7s iOS build to include in universal binaries (default: ${DEFAULT_IOS_ARMV7S_BUILD})")
-set(DEFAULT_IOS_ARMV8_BUILD ${CMAKE_SOURCE_DIR}/iosarmv8)
-set(IOS_ARMV8_BUILD ${DEFAULT_IOS_ARMV8_BUILD} CACHE PATH
-"Directory containing Armv8 iOS build to include in universal binaries (default: ${DEFAULT_IOS_ARMV8_BUILD})")
-set(OSX_APP_CERT_NAME "" CACHE STRING
+set(ARMV8_BUILD "" CACHE PATH
+"Directory containing Armv8 iOS or macOS build to include in universal binaries")
+set(MACOS_APP_CERT_NAME "" CACHE STRING
 "Name of the Developer ID Application certificate (in the macOS keychain) that should be used to sign the libjpeg-turbo DMG. Leave this blank to generate an unsigned DMG.")
-set(OSX_INST_CERT_NAME "" CACHE STRING
+set(MACOS_INST_CERT_NAME "" CACHE STRING
 "Name of the Developer ID Installer certificate (in the macOS keychain) that should be used to sign the libjpeg-turbo installer package. Leave this blank to generate an unsigned package.")
 
 configure_file(release/makemacpkg.in pkgscripts/makemacpkg)
 configure_file(release/Distribution.xml.in pkgscripts/Distribution.xml)
-configure_file(release/Welcome.rtf.in pkgscripts/Welcome.rtf)
 configure_file(release/uninstall.in pkgscripts/uninstall)
 
 add_custom_target(dmg pkgscripts/makemacpkg
 SOURCES pkgscripts/makemacpkg)
 
-add_custom_target(udmg pkgscripts/makemacpkg universal
-SOURCES pkgscripts/makemacpkg)
-
 endif() # APPLE
@@ -187,3 +172,12 @@ add_custom_target(tarball pkgscripts/maketarball
 configure_file(release/libjpeg.pc.in pkgscripts/libjpeg.pc @ONLY)
 configure_file(release/libturbojpeg.pc.in pkgscripts/libturbojpeg.pc @ONLY)
+
+include(CMakePackageConfigHelpers)
+write_basic_package_version_file(
+pkgscripts/${CMAKE_PROJECT_NAME}ConfigVersion.cmake
+VERSION ${VERSION} COMPATIBILITY AnyNewerVersion)
+configure_package_config_file(release/Config.cmake.in
+pkgscripts/${CMAKE_PROJECT_NAME}Config.cmake
+INSTALL_DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/${CMAKE_PROJECT_NAME})
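
The armhf-versus-softfp decision above no longer keys off the compiler triplet; instead, check_c_source_compiles() feeds the compiler a tiny probe and lets the preprocessor decide. As a standalone C file, the probe amounts to:

    /* Compiles only when the compiler targets the hard-float Arm ABI;
     * GCC and Clang predefine __ARM_PCS_VFP=1 for -mfloat-abi=hard. */
    #if __ARM_PCS_VFP != 1
    #error "float ABI = softfp"
    #endif
    int main(void) { return 0; }

If the probe fails to compile, the packaging logic takes the else() branch instead.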

View File

@@ -118,6 +118,7 @@
 # absolute paths where necessary, using the same logic.
 #=============================================================================
+# Copyright 2018 Matthias Räncker
 # Copyright 2016, 2019 D. R. Commander
 # Copyright 2016 Dmitry Marakasov
 # Copyright 2016 Roger Leigh
@@ -259,6 +260,8 @@ if(NOT DEFINED CMAKE_INSTALL_DEFAULT_LIBDIR)
 else()
 if("${CMAKE_SIZEOF_VOID_P}" EQUAL "8")
 set(CMAKE_INSTALL_DEFAULT_LIBDIR "lib64")
+elseif(CMAKE_C_COMPILER_ABI MATCHES "ELF X32")
+set(CMAKE_INSTALL_DEFAULT_LIBDIR "libx32")
 endif()
 endif()
 endif()

djpeg.1

@@ -15,8 +15,7 @@ djpeg \- decompress a JPEG file to an image file
 .B djpeg
 decompresses the named JPEG file, or the standard input if no file is named,
 and produces an image file on the standard output. PBMPLUS (PPM/PGM), BMP,
-GIF, Targa, or RLE (Utah Raster Toolkit) output format can be selected.
-(RLE is supported only if the URT library is available.)
+GIF, or Targa output format can be selected.
 .SH OPTIONS
 All switch names may be abbreviated; for example,
 .B \-grayscale
@@ -81,9 +80,20 @@ is specified, or if the JPEG file is grayscale; otherwise, 24-bit full-color
 format is emitted.
 .TP
 .B \-gif
-Select GIF output format. Since GIF does not support more than 256 colors,
+Select GIF output format (LZW-compressed). Since GIF does not support more
+than 256 colors,
 .B \-colors 256
-is assumed (unless you specify a smaller number of colors).
+is assumed (unless you specify a smaller number of colors). If you specify
+.BR \-fast,
+the default number of colors is 216.
+.TP
+.B \-gif0
+Select GIF output format (uncompressed). Since GIF does not support more than
+256 colors,
+.B \-colors 256
+is assumed (unless you specify a smaller number of colors). If you specify
+.BR \-fast,
+the default number of colors is 216.
 .TP
 .B \-os2
 Select BMP output format (OS/2 1.x flavor). 8-bit colormapped format is
@@ -100,9 +110,6 @@ PGM is emitted if the JPEG file is grayscale or if
 .B \-grayscale
 is specified; otherwise PPM is emitted.
 .TP
-.B \-rle
-Select RLE output format. (Requires URT library.)
-.TP
 .B \-targa
 Select Targa output format. Grayscale format is emitted if the JPEG file is
 grayscale or if
@@ -198,6 +205,19 @@ number. For example,
 .B \-max 4m
 selects 4000000 bytes. If more space is needed, an error will occur.
 .TP
+.BI \-maxscans " N"
+Abort if the JPEG image contains more than
+.I N
+scans. This feature demonstrates a method by which applications can guard
+against denial-of-service attacks instigated by specially-crafted malformed
+JPEG images containing numerous scans with missing image data or image data
+consisting only of "EOB runs" (a feature of progressive JPEG images that allows
+potentially hundreds of thousands of adjoining zero-value pixels to be
+represented using only a few bytes.) Attempting to decompress such malformed
+JPEG images can cause excessive CPU activity, since the decompressor must fully
+process each scan (even if the scan is corrupt) before it can proceed to the
+next scan.
+.TP
 .BI \-outfile " name"
 Send output image to the named file, not to standard output.
 .TP
@@ -205,6 +225,9 @@ Send output image to the named file, not to standard output.
 Load input file into memory before decompressing. This feature was implemented
 mainly as a way of testing the in-memory source manager (jpeg_mem_src().)
 .TP
+.BI \-report
+Report decompression progress.
+.TP
 .BI \-skip " Y0,Y1"
 Decompress all rows of the JPEG image except those between Y0 and Y1
 (inclusive.) Note that if decompression scaling is being used, then Y0 and Y1
@@ -218,6 +241,12 @@ decompression scaling is being used, then X, Y, W, and H are relative to the
 scaled image dimensions. Currently this option only works with the
 PBMPLUS (PPM/PGM), GIF, and Targa output formats.
 .TP
+.BI \-strict
+Treat all warnings as fatal. This feature also demonstrates a method by which
+applications can guard against attacks instigated by specially-crafted
+malformed JPEG images. Enabling this option will cause the decompressor to
+abort if the JPEG image contains incomplete or corrupt image data.
+.TP
 .B \-verbose
 Enable debug printout. More
 .BR \-v 's
@@ -289,10 +318,3 @@ Independent JPEG Group
 This file was modified by The libjpeg-turbo Project to include only information
 relevant to libjpeg-turbo, to wordsmith certain sections, and to describe
 features not present in libjpeg.
-.SH ISSUES
-Support for compressed GIF output files was removed in djpeg v6b due to
-concerns over the Unisys LZW patent. Although this patent expired in 2006,
-djpeg still lacks compressed GIF support, for these historical reasons.
-(Conversion of JPEG files to GIF is usually a bad idea anyway, since GIF is a
-256-color format.) The uncompressed GIF files that djpeg generates are larger
-than they should be, but they are readable by standard GIF decoders.
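
\-maxscans works because the library keeps a running scan counter that the application can inspect while it feeds input. A hedged sketch of the same guard for programs that drive the decompressor themselves (the limit of 100 and the abort strategy are illustrative):

    #include <stdio.h>
    #include "jpeglib.h"

    /* Nonzero once the decompressor has started more than max_scans scans;
     * input_scan_number is maintained by the library. */
    static int
    too_many_scans(j_decompress_ptr cinfo, int max_scans)
    {
      return max_scans > 0 && cinfo->input_scan_number > max_scans;
    }

    /* Typical use while buffering a progressive image:
     *   while (jpeg_consume_input(&cinfo) != JPEG_REACHED_EOI)
     *     if (too_many_scans(&cinfo, 100))
     *       break;   // or longjmp() out and destroy the decompress object
     */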

djpeg.c

@@ -3,9 +3,9 @@
  *
  * This file was part of the Independent JPEG Group's software:
  * Copyright (C) 1991-1997, Thomas G. Lane.
- * Modified 2013 by Guido Vollbeding.
+ * Modified 2013-2019 by Guido Vollbeding.
  * libjpeg-turbo Modifications:
- * Copyright (C) 2010-2011, 2013-2017, 2020, D. R. Commander.
+ * Copyright (C) 2010-2011, 2013-2017, 2019-2020, D. R. Commander.
  * Copyright (C) 2015, Google, Inc.
  * For conditions of distribution and use, see the accompanying README.ijg
  * file.
@@ -68,10 +68,10 @@ static const char * const cdjpeg_message_table[] = {
 typedef enum {
 FMT_BMP, /* BMP format (Windows flavor) */
-FMT_GIF, /* GIF format */
+FMT_GIF, /* GIF format (LZW-compressed) */
+FMT_GIF0, /* GIF format (uncompressed) */
 FMT_OS2, /* BMP format (OS/2 flavor) */
 FMT_PPM, /* PPM/PGM (PBMPLUS formats) */
-FMT_RLE, /* RLE format */
 FMT_TARGA, /* Targa format */
 FMT_TIFF /* TIFF format */
 } IMAGE_FORMATS;
@@ -94,11 +94,14 @@ static IMAGE_FORMATS requested_fmt;
 static const char *progname; /* program name for error messages */
 static char *icc_filename; /* for -icc switch */
+JDIMENSION max_scans; /* for -maxscans switch */
 static char *outfilename; /* for -outfile switch */
 boolean memsrc; /* for -memsrc switch */
+boolean report; /* for -report switch */
 boolean skip, crop;
 JDIMENSION skip_start, skip_end;
 JDIMENSION crop_x, crop_y, crop_width, crop_height;
+boolean strict; /* for -strict switch */
 
 #define INPUT_BUF_SIZE 4096
@@ -127,8 +130,10 @@ usage(void)
 (DEFAULT_FMT == FMT_BMP ? " (default)" : ""));
 #endif
 #ifdef GIF_SUPPORTED
-fprintf(stderr, " -gif Select GIF output format%s\n",
+fprintf(stderr, " -gif Select GIF output format (LZW-compressed)%s\n",
 (DEFAULT_FMT == FMT_GIF ? " (default)" : ""));
+fprintf(stderr, " -gif0 Select GIF output format (uncompressed)%s\n",
+(DEFAULT_FMT == FMT_GIF0 ? " (default)" : ""));
 #endif
 #ifdef BMP_SUPPORTED
 fprintf(stderr, " -os2 Select BMP output format (OS/2 style)%s\n",
@@ -138,10 +143,6 @@ usage(void)
 fprintf(stderr, " -pnm Select PBMPLUS (PPM/PGM) output format%s\n",
 (DEFAULT_FMT == FMT_PPM ? " (default)" : ""));
 #endif
-#ifdef RLE_SUPPORTED
-fprintf(stderr, " -rle Select Utah RLE output format%s\n",
-(DEFAULT_FMT == FMT_RLE ? " (default)" : ""));
-#endif
 #ifdef TARGA_SUPPORTED
 fprintf(stderr, " -targa Select Targa output format%s\n",
 (DEFAULT_FMT == FMT_TARGA ? " (default)" : ""));
@@ -171,14 +172,16 @@ usage(void)
 fprintf(stderr, " -onepass Use 1-pass quantization (fast, low quality)\n");
 #endif
 fprintf(stderr, " -maxmemory N Maximum memory to use (in kbytes)\n");
+fprintf(stderr, " -maxscans N Maximum number of scans to allow in input file\n");
 fprintf(stderr, " -outfile name Specify name for output file\n");
 #if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)
 fprintf(stderr, " -memsrc Load input file into memory before decompressing\n");
 #endif
+fprintf(stderr, " -report Report decompression progress\n");
 fprintf(stderr, " -skip Y0,Y1 Decompress all rows except those between Y0 and Y1 (inclusive)\n");
 fprintf(stderr, " -crop WxH+X+Y Decompress only a rectangular subregion of the image\n");
 fprintf(stderr, " [requires PBMPLUS (PPM/PGM), GIF, or Targa output format]\n");
+fprintf(stderr, " -strict Treat all warnings as fatal\n");
 fprintf(stderr, " -verbose or -debug Emit debug output\n");
 fprintf(stderr, " -version Print version information and exit\n");
 exit(EXIT_FAILURE);
@@ -203,10 +206,13 @@ parse_switches(j_decompress_ptr cinfo, int argc, char **argv,
 /* Set up default JPEG parameters. */
 requested_fmt = DEFAULT_FMT; /* set default output file format */
 icc_filename = NULL;
+max_scans = 0;
 outfilename = NULL;
 memsrc = FALSE;
+report = FALSE;
 skip = FALSE;
 crop = FALSE;
+strict = FALSE;
 cinfo->err->trace_level = 0;
 
 /* Scan command line options, adjust parameters */
@@ -224,7 +230,7 @@ parse_switches(j_decompress_ptr cinfo, int argc, char **argv,
 arg++; /* advance past switch marker character */
 if (keymatch(arg, "bmp", 1)) {
-/* BMP output format. */
+/* BMP output format (Windows flavor). */
 requested_fmt = FMT_BMP;
 } else if (keymatch(arg, "colors", 1) || keymatch(arg, "colours", 1) ||
@@ -295,9 +301,13 @@ parse_switches(j_decompress_ptr cinfo, int argc, char **argv,
 cinfo->do_fancy_upsampling = FALSE;
 } else if (keymatch(arg, "gif", 1)) {
-/* GIF output format. */
+/* GIF output format (LZW-compressed). */
 requested_fmt = FMT_GIF;
+} else if (keymatch(arg, "gif0", 4)) {
+/* GIF output format (uncompressed). */
+requested_fmt = FMT_GIF0;
 } else if (keymatch(arg, "grayscale", 2) ||
 keymatch(arg, "greyscale", 2)) {
 /* Force monochrome output. */
@@ -351,6 +361,12 @@ parse_switches(j_decompress_ptr cinfo, int argc, char **argv,
 lval *= 1000L;
 cinfo->mem->max_memory_to_use = lval * 1000L;
+} else if (keymatch(arg, "maxscans", 4)) {
+if (++argn >= argc) /* advance to next argument */
+usage();
+if (sscanf(argv[argn], "%u", &max_scans) != 1)
+usage();
 } else if (keymatch(arg, "nosmooth", 3)) {
 /* Suppress fancy upsampling */
 cinfo->do_fancy_upsampling = FALSE;
@@ -383,9 +399,8 @@ parse_switches(j_decompress_ptr cinfo, int argc, char **argv,
 /* PPM/PGM output format. */
 requested_fmt = FMT_PPM;
-} else if (keymatch(arg, "rle", 1)) {
-/* RLE output format. */
-requested_fmt = FMT_RLE;
+} else if (keymatch(arg, "report", 2)) {
+report = TRUE;
 } else if (keymatch(arg, "scale", 2)) {
 /* Scale the output image by a fraction M/N. */
@@ -413,6 +428,9 @@ parse_switches(j_decompress_ptr cinfo, int argc, char **argv,
 usage();
 crop = TRUE;
+} else if (keymatch(arg, "strict", 2)) {
+strict = TRUE;
 } else if (keymatch(arg, "targa", 1)) {
 /* Targa output format. */
 requested_fmt = FMT_TARGA;
@@ -444,7 +462,7 @@ jpeg_getc(j_decompress_ptr cinfo)
 ERREXIT(cinfo, JERR_CANT_SUSPEND);
 }
 datasrc->bytes_in_buffer--;
-return GETJOCTET(*datasrc->next_input_byte++);
+return *datasrc->next_input_byte++;
 }
@@ -499,6 +517,19 @@ print_text_marker(j_decompress_ptr cinfo)
 }
 
+METHODDEF(void)
+my_emit_message(j_common_ptr cinfo, int msg_level)
+{
+if (msg_level < 0) {
+/* Treat warning as fatal */
+cinfo->err->error_exit(cinfo);
+} else {
+if (cinfo->err->trace_level >= msg_level)
+cinfo->err->output_message(cinfo);
+}
+}
+
 /*
  * The main program.
  */
@@ -508,9 +539,7 @@ main(int argc, char **argv)
 {
 struct jpeg_decompress_struct cinfo;
 struct jpeg_error_mgr jerr;
-#ifdef PROGRESS_REPORT
 struct cdjpeg_progress_mgr progress;
-#endif
 int file_index;
 djpeg_dest_ptr dest_mgr = NULL;
 FILE *input_file;
@@ -557,6 +586,9 @@ main(int argc, char **argv)
 file_index = parse_switches(&cinfo, argc, argv, 0, FALSE);
 
+if (strict)
+jerr.emit_message = my_emit_message;
+
 #ifdef TWO_FILE_COMMANDLINE
 /* Must have either -outfile switch or explicit output file name */
 if (outfilename == NULL) {
@@ -603,9 +635,11 @@ main(int argc, char **argv)
 output_file = write_stdout();
 }
 
-#ifdef PROGRESS_REPORT
+if (report || max_scans != 0) {
 start_progress_monitor((j_common_ptr)&cinfo, &progress);
-#endif
+progress.report = report;
+progress.max_scans = max_scans;
+}
 
 /* Specify data source for decompression */
 #if JPEG_LIB_VERSION >= 80 || defined(MEM_SRCDST_SUPPORTED)
@@ -653,7 +687,10 @@ main(int argc, char **argv)
 #endif
 #ifdef GIF_SUPPORTED
 case FMT_GIF:
-dest_mgr = jinit_write_gif(&cinfo);
+dest_mgr = jinit_write_gif(&cinfo, TRUE);
+break;
+case FMT_GIF0:
+dest_mgr = jinit_write_gif(&cinfo, FALSE);
 break;
 #endif
 #ifdef PPM_SUPPORTED
@@ -661,11 +698,6 @@ main(int argc, char **argv)
 dest_mgr = jinit_write_ppm(&cinfo);
 break;
 #endif
-#ifdef RLE_SUPPORTED
-case FMT_RLE:
-dest_mgr = jinit_write_rle(&cinfo);
-break;
-#endif
 #ifdef TARGA_SUPPORTED
 case FMT_TARGA:
 dest_mgr = jinit_write_targa(&cinfo);
@@ -781,12 +813,11 @@ main(int argc, char **argv)
 }
 }
 
-#ifdef PROGRESS_REPORT
 /* Hack: count final pass as done in case finish_output does an extra pass.
  * The library won't have updated completed_passes.
  */
-progress.pub.completed_passes = progress.pub.total_passes;
-#endif
+if (report || max_scans != 0)
+progress.pub.completed_passes = progress.pub.total_passes;
 
 if (icc_filename != NULL) {
 FILE *icc_file;
@@ -825,9 +856,8 @@ main(int argc, char **argv)
 if (output_file != stdout)
 fclose(output_file);
 
-#ifdef PROGRESS_REPORT
+if (report || max_scans != 0)
 end_progress_monitor((j_common_ptr)&cinfo);
-#endif
 
 if (memsrc)
 free(inbuffer);
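
my_emit_message() above converts warnings into calls to error_exit(), and the stock error_exit() terminates the process. Library users who want \-strict-style behavior without exiting usually pair it with the setjmp-based error manager from the libjpeg example code; a sketch (struct layout as in example.c, other names illustrative):

    #include <setjmp.h>
    #include <stdio.h>
    #include "jpeglib.h"

    struct my_error_mgr {
      struct jpeg_error_mgr pub;    /* "public" fields; must come first */
      jmp_buf setjmp_buffer;        /* return point on fatal error */
    };

    static void
    my_error_exit(j_common_ptr cinfo)
    {
      struct my_error_mgr *myerr = (struct my_error_mgr *)cinfo->err;
      (*cinfo->err->output_message)(cinfo);   /* report, then recover */
      longjmp(myerr->setjmp_buffer, 1);
    }

    /* Setup:
     *   struct my_error_mgr jerr;
     *   cinfo.err = jpeg_std_error(&jerr.pub);
     *   jerr.pub.error_exit = my_error_exit;
     *   if (setjmp(jerr.setjmp_buffer)) { jpeg_destroy_decompress(&cinfo); ... }
     */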

View File

@@ -38,7 +38,7 @@ Installation Directory
 ----------------------
 
 The TurboJPEG Java Wrapper will look for the TurboJPEG JNI library
-(libturbojpeg.so, libturbojpeg.jnilib, or turbojpeg.dll) in the system library
+(libturbojpeg.so, libturbojpeg.dylib, or turbojpeg.dll) in the system library
 paths or in any paths specified in LD_LIBRARY_PATH (Un*x), DYLD_LIBRARY_PATH
 (Mac), or PATH (Windows.) Failing this, on Un*x and Mac systems, the wrapper
 will look for the JNI library under the library directory configured when

View File

@@ -1,5 +1,5 @@
 /*
- * Copyright (C)2011-2013, 2016 D. R. Commander. All Rights Reserved.
+ * Copyright (C)2011-2013, 2016, 2020 D. R. Commander. All Rights Reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are met:
@@ -36,9 +36,9 @@ final class TJLoader {
 String os = System.getProperty("os.name").toLowerCase();
 if (os.indexOf("mac") >= 0) {
 try {
-System.load("@CMAKE_INSTALL_FULL_LIBDIR@/libturbojpeg.jnilib");
+System.load("@CMAKE_INSTALL_FULL_LIBDIR@/libturbojpeg.dylib");
 } catch (java.lang.UnsatisfiedLinkError e2) {
-System.load("/usr/lib/libturbojpeg.jnilib");
+System.load("/usr/lib/libturbojpeg.dylib");
 }
 } else {
 try {

View File

@@ -48,9 +48,9 @@ rgb_ycc_convert_internal(j_compress_ptr cinfo, JSAMPARRAY input_buf,
 outptr2 = output_buf[2][output_row];
 output_row++;
 for (col = 0; col < num_cols; col++) {
-r = GETJSAMPLE(inptr[RGB_RED]);
-g = GETJSAMPLE(inptr[RGB_GREEN]);
-b = GETJSAMPLE(inptr[RGB_BLUE]);
+r = inptr[RGB_RED];
+g = inptr[RGB_GREEN];
+b = inptr[RGB_BLUE];
 inptr += RGB_PIXELSIZE;
 /* If the inputs are 0..MAXJSAMPLE, the outputs of these equations
  * must be too; we do not need an explicit range-limiting operation.
@@ -100,9 +100,9 @@ rgb_gray_convert_internal(j_compress_ptr cinfo, JSAMPARRAY input_buf,
 outptr = output_buf[0][output_row];
 output_row++;
 for (col = 0; col < num_cols; col++) {
-r = GETJSAMPLE(inptr[RGB_RED]);
-g = GETJSAMPLE(inptr[RGB_GREEN]);
-b = GETJSAMPLE(inptr[RGB_BLUE]);
+r = inptr[RGB_RED];
+g = inptr[RGB_GREEN];
+b = inptr[RGB_BLUE];
 inptr += RGB_PIXELSIZE;
 /* Y */
 outptr[col] = (JSAMPLE)((ctab[r + R_Y_OFF] + ctab[g + G_Y_OFF] +
@@ -135,9 +135,9 @@ rgb_rgb_convert_internal(j_compress_ptr cinfo, JSAMPARRAY input_buf,
 outptr2 = output_buf[2][output_row];
 output_row++;
 for (col = 0; col < num_cols; col++) {
-outptr0[col] = GETJSAMPLE(inptr[RGB_RED]);
-outptr1[col] = GETJSAMPLE(inptr[RGB_GREEN]);
-outptr2[col] = GETJSAMPLE(inptr[RGB_BLUE]);
+outptr0[col] = inptr[RGB_RED];
+outptr1[col] = inptr[RGB_GREEN];
+outptr2[col] = inptr[RGB_BLUE];
 inptr += RGB_PIXELSIZE;
 }
 }
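
For context, the ctab[] lookups in these loops implement the usual fixed-point BT.601 equations; GETJSAMPLE() could be dropped because libjpeg-turbo now treats JSAMPLE as a plain unsigned 8-bit type. A table-free sketch of just the Y term, using the same constants and scale factor as jccolor.c:

    #define SCALEBITS  16
    #define ONE_HALF   ((long)1 << (SCALEBITS - 1))
    #define FIX(x)     ((long)((x) * (1L << SCALEBITS) + 0.5))

    /* Y = 0.29900 R + 0.58700 G + 0.11400 B, in integer arithmetic */
    static unsigned char
    rgb_to_y(int r, int g, int b)
    {
      return (unsigned char)((FIX(0.29900) * r + FIX(0.58700) * g +
                              FIX(0.11400) * b + ONE_HALF) >> SCALEBITS);
    }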

View File

@@ -392,11 +392,11 @@ cmyk_ycck_convert(j_compress_ptr cinfo, JSAMPARRAY input_buf,
 outptr3 = output_buf[3][output_row];
 output_row++;
 for (col = 0; col < num_cols; col++) {
-r = MAXJSAMPLE - GETJSAMPLE(inptr[0]);
-g = MAXJSAMPLE - GETJSAMPLE(inptr[1]);
-b = MAXJSAMPLE - GETJSAMPLE(inptr[2]);
+r = MAXJSAMPLE - inptr[0];
+g = MAXJSAMPLE - inptr[1];
+b = MAXJSAMPLE - inptr[2];
 /* K passes through as-is */
-outptr3[col] = inptr[3]; /* don't need GETJSAMPLE here */
+outptr3[col] = inptr[3];
 inptr += 4;
 /* If the inputs are 0..MAXJSAMPLE, the outputs of these equations
  * must be too; we do not need an explicit range-limiting operation.
@@ -438,7 +438,7 @@ grayscale_convert(j_compress_ptr cinfo, JSAMPARRAY input_buf,
 outptr = output_buf[0][output_row];
 output_row++;
 for (col = 0; col < num_cols; col++) {
-outptr[col] = inptr[0]; /* don't need GETJSAMPLE() here */
+outptr[col] = inptr[0];
 inptr += instride;
 }
 }
@@ -497,7 +497,7 @@ null_convert(j_compress_ptr cinfo, JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
 inptr = *input_buf;
 outptr = output_buf[ci][output_row];
 for (col = 0; col < num_cols; col++) {
-outptr[col] = inptr[ci]; /* don't need GETJSAMPLE() here */
+outptr[col] = inptr[ci];
 inptr += nc;
 }
 }
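
The CMYK-to-YCCK path above is just the RGB conversion applied to inverted CMY values, with K copied through. A compact sketch of one pixel, assuming rgb_to_y() from the previous example plus analogous (hypothetical) rgb_to_cb()/rgb_to_cr() helpers:

    /* One 8-bit CMYK pixel -> one YCCK pixel (MAXJSAMPLE == 255) */
    static void
    cmyk_to_ycck(int c, int m, int y, int k, unsigned char out[4])
    {
      int r = 255 - c, g = 255 - m, b = 255 - y;   /* invert CMY to RGB */
      out[0] = rgb_to_y(r, g, b);
      out[1] = rgb_to_cb(r, g, b);   /* hypothetical helper */
      out[2] = rgb_to_cr(r, g, b);   /* hypothetical helper */
      out[3] = (unsigned char)k;     /* K passes through as-is */
    }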

View File

@@ -574,19 +574,19 @@ convsamp (JSAMPARRAY sample_data, JDIMENSION start_col, DCTELEM *workspace)
 elemptr = sample_data[elemr] + start_col;
 #if DCTSIZE == 8 /* unroll the inner loop */
-*workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
-*workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
-*workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
-*workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
-*workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
-*workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
-*workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
-*workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
+*workspaceptr++ = (*elemptr++) - CENTERJSAMPLE;
+*workspaceptr++ = (*elemptr++) - CENTERJSAMPLE;
+*workspaceptr++ = (*elemptr++) - CENTERJSAMPLE;
+*workspaceptr++ = (*elemptr++) - CENTERJSAMPLE;
+*workspaceptr++ = (*elemptr++) - CENTERJSAMPLE;
+*workspaceptr++ = (*elemptr++) - CENTERJSAMPLE;
+*workspaceptr++ = (*elemptr++) - CENTERJSAMPLE;
+*workspaceptr++ = (*elemptr++) - CENTERJSAMPLE;
 #else
 {
 register int elemc;
 for (elemc = DCTSIZE; elemc > 0; elemc--)
-*workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
+*workspaceptr++ = (*elemptr++) - CENTERJSAMPLE;
 }
 #endif
 }
@@ -774,20 +774,19 @@ convsamp_float(JSAMPARRAY sample_data, JDIMENSION start_col,
 for (elemr = 0; elemr < DCTSIZE; elemr++) {
 elemptr = sample_data[elemr] + start_col;
 #if DCTSIZE == 8 /* unroll the inner loop */
-*workspaceptr++ = (FAST_FLOAT)(GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
-*workspaceptr++ = (FAST_FLOAT)(GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
-*workspaceptr++ = (FAST_FLOAT)(GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
-*workspaceptr++ = (FAST_FLOAT)(GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
-*workspaceptr++ = (FAST_FLOAT)(GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
-*workspaceptr++ = (FAST_FLOAT)(GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
-*workspaceptr++ = (FAST_FLOAT)(GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
-*workspaceptr++ = (FAST_FLOAT)(GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
+*workspaceptr++ = (FAST_FLOAT)((*elemptr++) - CENTERJSAMPLE);
+*workspaceptr++ = (FAST_FLOAT)((*elemptr++) - CENTERJSAMPLE);
+*workspaceptr++ = (FAST_FLOAT)((*elemptr++) - CENTERJSAMPLE);
+*workspaceptr++ = (FAST_FLOAT)((*elemptr++) - CENTERJSAMPLE);
+*workspaceptr++ = (FAST_FLOAT)((*elemptr++) - CENTERJSAMPLE);
+*workspaceptr++ = (FAST_FLOAT)((*elemptr++) - CENTERJSAMPLE);
+*workspaceptr++ = (FAST_FLOAT)((*elemptr++) - CENTERJSAMPLE);
+*workspaceptr++ = (FAST_FLOAT)((*elemptr++) - CENTERJSAMPLE);
 #else
 {
 register int elemc;
 for (elemc = DCTSIZE; elemc > 0; elemc--)
-*workspaceptr++ = (FAST_FLOAT)
-(GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
+*workspaceptr++ = (FAST_FLOAT)((*elemptr++) - CENTERJSAMPLE);
 }
 #endif
 }
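
convsamp() only copies one 8x8 block into the DCT workspace while subtracting CENTERJSAMPLE (128 for 8-bit samples), so the forward DCT operates on values centered on zero; the unrolling above is purely a speed tweak. The plain form of the loop, assuming the usual libjpeg types (DCTELEM comes from the library's internal jdct.h header):

    /* Level-shift one 8x8 block of samples into a DCT workspace. */
    static void
    convsamp_plain(JSAMPARRAY sample_data, JDIMENSION start_col,
                   DCTELEM *workspace)
    {
      int row, col;

      for (row = 0; row < DCTSIZE; row++) {
        JSAMPROW elemptr = sample_data[row] + start_col;
        for (col = 0; col < DCTSIZE; col++)
          *workspace++ = (DCTELEM)(*elemptr++) - CENTERJSAMPLE;
      }
    }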

jchuff.c

@@ -4,8 +4,10 @@
  * This file was part of the Independent JPEG Group's software:
  * Copyright (C) 1991-1997, Thomas G. Lane.
  * libjpeg-turbo Modifications:
- * Copyright (C) 2009-2011, 2014-2016, 2018-2019, D. R. Commander.
+ * Copyright (C) 2009-2011, 2014-2016, 2018-2020, D. R. Commander.
  * Copyright (C) 2015, Matthieu Darbois.
+ * Copyright (C) 2018, Matthias Räncker.
+ * Copyright (C) 2020, Arm Limited.
  * For conditions of distribution and use, see the accompanying README.ijg
  * file.
  *
@@ -42,15 +44,19 @@
  * flags (this defines __thumb__).
  */
 
-/* NOTE: Both GCC and Clang define __GNUC__ */
-#if defined(__GNUC__) && (defined(__arm__) || defined(__aarch64__))
+#if defined(__arm__) || defined(__aarch64__) || defined(_M_ARM) || \
+    defined(_M_ARM64)
 #if !defined(__thumb__) || defined(__thumb2__)
 #define USE_CLZ_INTRINSIC
 #endif
 #endif
 
 #ifdef USE_CLZ_INTRINSIC
+#if defined(_MSC_VER) && !defined(__clang__)
+#define JPEG_NBITS_NONZERO(x) (32 - _CountLeadingZeros(x))
+#else
 #define JPEG_NBITS_NONZERO(x) (32 - __builtin_clz(x))
+#endif
 #define JPEG_NBITS(x) (x ? JPEG_NBITS_NONZERO(x) : 0)
 #else
 #include "jpeg_nbits_table.h"
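
JPEG_NBITS() computes the magnitude category of a coefficient, i.e. how many significant bits it has, and with a count-leading-zeros instruction that is essentially one subtraction. A function-shaped sketch of what the macros above select between (the portable fallback stands in for the table lookup used when no intrinsic is available):

    /* Number of significant bits in x: smallest n such that x < 2^n. */
    static int
    jpeg_nbits(unsigned int x)
    {
      if (x == 0)
        return 0;
    #if defined(_MSC_VER) && !defined(__clang__) && \
        (defined(_M_ARM) || defined(_M_ARM64))
      return 32 - _CountLeadingZeros(x);   /* MSVC Arm intrinsic, <intrin.h> */
    #elif defined(__GNUC__) || defined(__clang__)
      return 32 - __builtin_clz(x);        /* GCC/Clang builtin */
    #else
      {
        int n = 0;
        while (x) { n++; x >>= 1; }        /* portable fallback */
        return n;
      }
    #endif
    }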
@@ -65,32 +71,43 @@
* but must not be updated permanently until we complete the MCU. * but must not be updated permanently until we complete the MCU.
*/ */
#if defined(__x86_64__) && defined(__ILP32__)
typedef unsigned long long bit_buf_type;
#else
typedef size_t bit_buf_type;
#endif
/* NOTE: The more optimal Huffman encoding algorithm is only used by the
* intrinsics implementation of the Arm Neon SIMD extensions, which is why we
* retain the old Huffman encoder behavior when using the GAS implementation.
*/
#if defined(WITH_SIMD) && !(defined(__arm__) || defined(__aarch64__) || \
defined(_M_ARM) || defined(_M_ARM64))
typedef unsigned long long simd_bit_buf_type;
#else
typedef bit_buf_type simd_bit_buf_type;
#endif
#if (defined(SIZEOF_SIZE_T) && SIZEOF_SIZE_T == 8) || defined(_WIN64) || \
(defined(__x86_64__) && defined(__ILP32__))
#define BIT_BUF_SIZE 64
#elif (defined(SIZEOF_SIZE_T) && SIZEOF_SIZE_T == 4) || defined(_WIN32)
#define BIT_BUF_SIZE 32
#else
#error Cannot determine word size
#endif
#define SIMD_BIT_BUF_SIZE (sizeof(simd_bit_buf_type) * 8)
typedef struct { typedef struct {
size_t put_buffer; /* current bit-accumulation buffer */ union {
int put_bits; /* # of bits now in it */ bit_buf_type c;
simd_bit_buf_type simd;
} put_buffer; /* current bit accumulation buffer */
int free_bits; /* # of bits available in it */
/* (Neon GAS: # of bits now in it) */
int last_dc_val[MAX_COMPS_IN_SCAN]; /* last DC coef for each component */ int last_dc_val[MAX_COMPS_IN_SCAN]; /* last DC coef for each component */
} savable_state; } savable_state;
/* This macro is to work around compilers with missing or broken
* structure assignment. You'll need to fix this code if you have
* such a compiler and you change MAX_COMPS_IN_SCAN.
*/
#ifndef NO_STRUCT_ASSIGN
#define ASSIGN_STATE(dest, src) ((dest) = (src))
#else
#if MAX_COMPS_IN_SCAN == 4
#define ASSIGN_STATE(dest, src) \
((dest).put_buffer = (src).put_buffer, \
(dest).put_bits = (src).put_bits, \
(dest).last_dc_val[0] = (src).last_dc_val[0], \
(dest).last_dc_val[1] = (src).last_dc_val[1], \
(dest).last_dc_val[2] = (src).last_dc_val[2], \
(dest).last_dc_val[3] = (src).last_dc_val[3])
#endif
#endif
typedef struct { typedef struct {
struct jpeg_entropy_encoder pub; /* public fields */ struct jpeg_entropy_encoder pub; /* public fields */
@@ -123,6 +140,7 @@ typedef struct {
size_t free_in_buffer; /* # of byte spaces remaining in buffer */ size_t free_in_buffer; /* # of byte spaces remaining in buffer */
savable_state cur; /* Current bit buffer & DC state */ savable_state cur; /* Current bit buffer & DC state */
j_compress_ptr cinfo; /* dump_buffer needs access to this */ j_compress_ptr cinfo; /* dump_buffer needs access to this */
int simd;
} working_state; } working_state;
@@ -201,8 +219,17 @@ start_pass_huff(j_compress_ptr cinfo, boolean gather_statistics)
} }
/* Initialize bit buffer to empty */ /* Initialize bit buffer to empty */
entropy->saved.put_buffer = 0; if (entropy->simd) {
entropy->saved.put_bits = 0; entropy->saved.put_buffer.simd = 0;
#if defined(__aarch64__) && !defined(NEON_INTRINSICS)
entropy->saved.free_bits = 0;
#else
entropy->saved.free_bits = SIMD_BIT_BUF_SIZE;
#endif
} else {
entropy->saved.put_buffer.c = 0;
entropy->saved.free_bits = BIT_BUF_SIZE;
}
/* Initialize restart stuff */ /* Initialize restart stuff */
entropy->restarts_to_go = cinfo->restart_interval; entropy->restarts_to_go = cinfo->restart_interval;
@@ -334,94 +361,94 @@ dump_buffer(working_state *state)
/* Outputting bits to the file */ /* Outputting bits to the file */
/* These macros perform the same task as the emit_bits() function in the /* Output byte b and, speculatively, an additional 0 byte. 0xFF must be
* original libjpeg code. In addition to reducing overhead by explicitly * encoded as 0xFF 0x00, so the output buffer pointer is advanced by 2 if the
* inlining the code, additional performance is achieved by taking into * byte is 0xFF. Otherwise, the output buffer pointer is advanced by 1, and
* account the size of the bit buffer and waiting until it is almost full * the speculative 0 byte will be overwritten by the next byte.
* before emptying it. This mostly benefits 64-bit platforms, since 6
* bytes can be stored in a 64-bit bit buffer before it has to be emptied.
*/ */
#define EMIT_BYTE(b) { \
#define EMIT_BYTE() { \ buffer[0] = (JOCTET)(b); \
JOCTET c; \ buffer[1] = 0; \
put_bits -= 8; \ buffer -= -2 + ((JOCTET)(b) < 0xFF); \
c = (JOCTET)GETJOCTET(put_buffer >> put_bits); \
*buffer++ = c; \
if (c == 0xFF) /* need to stuff a zero byte? */ \
*buffer++ = 0; \
} }
#define PUT_BITS(code, size) { \ /* Output the entire bit buffer. If there are no 0xFF bytes in it, then write
put_bits += size; \ * directly to the output buffer. Otherwise, use the EMIT_BYTE() macro to
put_buffer = (put_buffer << size) | code; \ * encode 0xFF as 0xFF 0x00.
} */
#if BIT_BUF_SIZE == 64
#if SIZEOF_SIZE_T != 8 && !defined(_WIN64) #define FLUSH() { \
if (put_buffer & 0x8080808080808080 & ~(put_buffer + 0x0101010101010101)) { \
#define CHECKBUF15() { \ EMIT_BYTE(put_buffer >> 56) \
if (put_bits > 15) { \ EMIT_BYTE(put_buffer >> 48) \
EMIT_BYTE() \ EMIT_BYTE(put_buffer >> 40) \
EMIT_BYTE() \ EMIT_BYTE(put_buffer >> 32) \
EMIT_BYTE(put_buffer >> 24) \
EMIT_BYTE(put_buffer >> 16) \
EMIT_BYTE(put_buffer >> 8) \
EMIT_BYTE(put_buffer ) \
} else { \
buffer[0] = (JOCTET)(put_buffer >> 56); \
buffer[1] = (JOCTET)(put_buffer >> 48); \
buffer[2] = (JOCTET)(put_buffer >> 40); \
buffer[3] = (JOCTET)(put_buffer >> 32); \
buffer[4] = (JOCTET)(put_buffer >> 24); \
buffer[5] = (JOCTET)(put_buffer >> 16); \
buffer[6] = (JOCTET)(put_buffer >> 8); \
buffer[7] = (JOCTET)(put_buffer); \
buffer += 8; \
} \ } \
} }
#endif
#define CHECKBUF31() { \
if (put_bits > 31) { \
EMIT_BYTE() \
EMIT_BYTE() \
EMIT_BYTE() \
EMIT_BYTE() \
} \
}
#define CHECKBUF47() { \
if (put_bits > 47) { \
EMIT_BYTE() \
EMIT_BYTE() \
EMIT_BYTE() \
EMIT_BYTE() \
EMIT_BYTE() \
EMIT_BYTE() \
} \
}
#if !defined(_WIN32) && !defined(SIZEOF_SIZE_T)
#error Cannot determine word size
#endif
#if SIZEOF_SIZE_T == 8 || defined(_WIN64)
#define EMIT_BITS(code, size) { \
CHECKBUF47() \
PUT_BITS(code, size) \
}
#define EMIT_CODE(code, size) { \
temp2 &= (((JLONG)1) << nbits) - 1; \
CHECKBUF31() \
PUT_BITS(code, size) \
PUT_BITS(temp2, nbits) \
}
#else #else
#define EMIT_BITS(code, size) { \ #define FLUSH() { \
PUT_BITS(code, size) \ if (put_buffer & 0x80808080 & ~(put_buffer + 0x01010101)) { \
CHECKBUF15() \ EMIT_BYTE(put_buffer >> 24) \
} EMIT_BYTE(put_buffer >> 16) \
EMIT_BYTE(put_buffer >> 8) \
#define EMIT_CODE(code, size) { \ EMIT_BYTE(put_buffer ) \
temp2 &= (((JLONG)1) << nbits) - 1; \ } else { \
PUT_BITS(code, size) \ buffer[0] = (JOCTET)(put_buffer >> 24); \
CHECKBUF15() \ buffer[1] = (JOCTET)(put_buffer >> 16); \
PUT_BITS(temp2, nbits) \ buffer[2] = (JOCTET)(put_buffer >> 8); \
CHECKBUF15() \ buffer[3] = (JOCTET)(put_buffer); \
buffer += 4; \
} \
} }
#endif #endif
/* Fill the bit buffer to capacity with the leading bits from code, then output
* the bit buffer and put the remaining bits from code into the bit buffer.
*/
#define PUT_AND_FLUSH(code, size) { \
put_buffer = (put_buffer << (size + free_bits)) | (code >> -free_bits); \
FLUSH() \
free_bits += BIT_BUF_SIZE; \
put_buffer = code; \
}
/* Insert code into the bit buffer and output the bit buffer if needed.
* NOTE: We can't flush with free_bits == 0, since the left shift in
* PUT_AND_FLUSH() would have undefined behavior.
*/
#define PUT_BITS(code, size) { \
free_bits -= size; \
if (free_bits < 0) \
PUT_AND_FLUSH(code, size) \
else \
put_buffer = (put_buffer << size) | code; \
}
#define PUT_CODE(code, size) { \
temp &= (((JLONG)1) << nbits) - 1; \
temp |= code << nbits; \
nbits += size; \
PUT_BITS(temp, nbits) \
}
/* Although it is exceedingly rare, it is possible for a Huffman-encoded /* Although it is exceedingly rare, it is possible for a Huffman-encoded
* coefficient block to be larger than the 128-byte unencoded block. For each * coefficient block to be larger than the 128-byte unencoded block. For each
@@ -444,6 +471,7 @@ dump_buffer(working_state *state)
#define STORE_BUFFER() { \ #define STORE_BUFFER() { \
if (localbuf) { \ if (localbuf) { \
size_t bytes, bytestocopy; \
bytes = buffer - _buffer; \ bytes = buffer - _buffer; \
buffer = _buffer; \ buffer = _buffer; \
while (bytes > 0) { \ while (bytes > 0) { \
@@ -466,20 +494,46 @@ dump_buffer(working_state *state)
LOCAL(boolean) LOCAL(boolean)
flush_bits(working_state *state) flush_bits(working_state *state)
{ {
JOCTET _buffer[BUFSIZE], *buffer; JOCTET _buffer[BUFSIZE], *buffer, temp;
size_t put_buffer; int put_bits; simd_bit_buf_type put_buffer; int put_bits;
size_t bytes, bytestocopy; int localbuf = 0; int localbuf = 0;
if (state->simd) {
#if defined(__aarch64__) && !defined(NEON_INTRINSICS)
put_bits = state->cur.free_bits;
#else
put_bits = SIMD_BIT_BUF_SIZE - state->cur.free_bits;
#endif
put_buffer = state->cur.put_buffer.simd;
} else {
put_bits = BIT_BUF_SIZE - state->cur.free_bits;
put_buffer = state->cur.put_buffer.c;
}
put_buffer = state->cur.put_buffer;
put_bits = state->cur.put_bits;
LOAD_BUFFER() LOAD_BUFFER()
/* fill any partial byte with ones */ while (put_bits >= 8) {
PUT_BITS(0x7F, 7) put_bits -= 8;
while (put_bits >= 8) EMIT_BYTE() temp = (JOCTET)(put_buffer >> put_bits);
EMIT_BYTE(temp)
}
if (put_bits) {
/* fill partial byte with ones */
temp = (JOCTET)((put_buffer << (8 - put_bits)) | (0xFF >> put_bits));
EMIT_BYTE(temp)
}
state->cur.put_buffer = 0; /* and reset bit-buffer to empty */ if (state->simd) { /* and reset bit buffer to empty */
state->cur.put_bits = 0; state->cur.put_buffer.simd = 0;
#if defined(__aarch64__) && !defined(NEON_INTRINSICS)
state->cur.free_bits = 0;
#else
state->cur.free_bits = SIMD_BIT_BUF_SIZE;
#endif
} else {
state->cur.put_buffer.c = 0;
state->cur.free_bits = BIT_BUF_SIZE;
}
STORE_BUFFER() STORE_BUFFER()
return TRUE; return TRUE;
@@ -493,7 +547,7 @@ encode_one_block_simd(working_state *state, JCOEFPTR block, int last_dc_val,
c_derived_tbl *dctbl, c_derived_tbl *actbl) c_derived_tbl *dctbl, c_derived_tbl *actbl)
{ {
JOCTET _buffer[BUFSIZE], *buffer; JOCTET _buffer[BUFSIZE], *buffer;
size_t bytes, bytestocopy; int localbuf = 0; int localbuf = 0;
LOAD_BUFFER() LOAD_BUFFER()
@@ -509,53 +563,41 @@ LOCAL(boolean)
encode_one_block(working_state *state, JCOEFPTR block, int last_dc_val, encode_one_block(working_state *state, JCOEFPTR block, int last_dc_val,
c_derived_tbl *dctbl, c_derived_tbl *actbl) c_derived_tbl *dctbl, c_derived_tbl *actbl)
{ {
int temp, temp2, temp3; int temp, nbits, free_bits;
int nbits; bit_buf_type put_buffer;
int r, code, size;
JOCTET _buffer[BUFSIZE], *buffer; JOCTET _buffer[BUFSIZE], *buffer;
size_t put_buffer; int put_bits; int localbuf = 0;
int code_0xf0 = actbl->ehufco[0xf0], size_0xf0 = actbl->ehufsi[0xf0];
size_t bytes, bytestocopy; int localbuf = 0;
put_buffer = state->cur.put_buffer; free_bits = state->cur.free_bits;
put_bits = state->cur.put_bits; put_buffer = state->cur.put_buffer.c;
LOAD_BUFFER() LOAD_BUFFER()
/* Encode the DC coefficient difference per section F.1.2.1 */ /* Encode the DC coefficient difference per section F.1.2.1 */
temp = temp2 = block[0] - last_dc_val; temp = block[0] - last_dc_val;
/* This is a well-known technique for obtaining the absolute value without a /* This is a well-known technique for obtaining the absolute value without a
* branch. It is derived from an assembly language technique presented in * branch. It is derived from an assembly language technique presented in
* "How to Optimize for the Pentium Processors", Copyright (c) 1996, 1997 by * "How to Optimize for the Pentium Processors", Copyright (c) 1996, 1997 by
* Agner Fog. * Agner Fog. This code assumes we are on a two's complement machine.
*/ */
temp3 = temp >> (CHAR_BIT * sizeof(int) - 1); nbits = temp >> (CHAR_BIT * sizeof(int) - 1);
temp ^= temp3; temp += nbits;
temp -= temp3; nbits ^= temp;
/* For a negative input, want temp2 = bitwise complement of abs(input) */
/* This code assumes we are on a two's complement machine */
temp2 += temp3;
/* Find the number of bits needed for the magnitude of the coefficient */ /* Find the number of bits needed for the magnitude of the coefficient */
nbits = JPEG_NBITS(temp); nbits = JPEG_NBITS(nbits);
/* Emit the Huffman-coded symbol for the number of bits */ /* Emit the Huffman-coded symbol for the number of bits.
code = dctbl->ehufco[nbits]; * Emit that number of bits of the value, if positive,
size = dctbl->ehufsi[nbits]; * or the complement of its magnitude, if negative.
EMIT_BITS(code, size) */
PUT_CODE(dctbl->ehufco[nbits], dctbl->ehufsi[nbits])
/* Mask off any extra bits in code */
temp2 &= (((JLONG)1) << nbits) - 1;
/* Emit that number of bits of the value, if positive, */
/* or the complement of its magnitude, if negative. */
EMIT_BITS(temp2, nbits)
/* Encode the AC coefficients per section F.1.2.2 */ /* Encode the AC coefficients per section F.1.2.2 */
r = 0; /* r = run length of zeros */ {
int r = 0; /* r = run length of zeros */
/* Manually unroll the k loop to eliminate the counter variable. This /* Manually unroll the k loop to eliminate the counter variable. This
* improves performance greatly on systems with a limited number of * improves performance greatly on systems with a limited number of
@@ -563,51 +605,46 @@ encode_one_block(working_state *state, JCOEFPTR block, int last_dc_val,
*/ */
#define kloop(jpeg_natural_order_of_k) { \ #define kloop(jpeg_natural_order_of_k) { \
if ((temp = block[jpeg_natural_order_of_k]) == 0) { \ if ((temp = block[jpeg_natural_order_of_k]) == 0) { \
r++; \ r += 16; \
} else { \ } else { \
temp2 = temp; \
/* Branch-less absolute value, bitwise complement, etc., same as above */ \ /* Branch-less absolute value, bitwise complement, etc., same as above */ \
temp3 = temp >> (CHAR_BIT * sizeof(int) - 1); \ nbits = temp >> (CHAR_BIT * sizeof(int) - 1); \
temp ^= temp3; \ temp += nbits; \
temp -= temp3; \ nbits ^= temp; \
temp2 += temp3; \ nbits = JPEG_NBITS_NONZERO(nbits); \
nbits = JPEG_NBITS_NONZERO(temp); \
/* if run length > 15, must emit special run-length-16 codes (0xF0) */ \ /* if run length > 15, must emit special run-length-16 codes (0xF0) */ \
while (r > 15) { \ while (r >= 16 * 16) { \
EMIT_BITS(code_0xf0, size_0xf0) \ r -= 16 * 16; \
r -= 16; \ PUT_BITS(actbl->ehufco[0xf0], actbl->ehufsi[0xf0]) \
} \ } \
/* Emit Huffman symbol for run length / number of bits */ \ /* Emit Huffman symbol for run length / number of bits */ \
temp3 = (r << 4) + nbits; \ r += nbits; \
code = actbl->ehufco[temp3]; \ PUT_CODE(actbl->ehufco[r], actbl->ehufsi[r]) \
size = actbl->ehufsi[temp3]; \
EMIT_CODE(code, size) \
r = 0; \ r = 0; \
} \ } \
} }
/* One iteration for each value in jpeg_natural_order[] */ /* One iteration for each value in jpeg_natural_order[] */
kloop(1); kloop(8); kloop(16); kloop(9); kloop(2); kloop(3); kloop(1); kloop(8); kloop(16); kloop(9); kloop(2); kloop(3);
kloop(10); kloop(17); kloop(24); kloop(32); kloop(25); kloop(18); kloop(10); kloop(17); kloop(24); kloop(32); kloop(25); kloop(18);
kloop(11); kloop(4); kloop(5); kloop(12); kloop(19); kloop(26); kloop(11); kloop(4); kloop(5); kloop(12); kloop(19); kloop(26);
kloop(33); kloop(40); kloop(48); kloop(41); kloop(34); kloop(27); kloop(33); kloop(40); kloop(48); kloop(41); kloop(34); kloop(27);
kloop(20); kloop(13); kloop(6); kloop(7); kloop(14); kloop(21); kloop(20); kloop(13); kloop(6); kloop(7); kloop(14); kloop(21);
kloop(28); kloop(35); kloop(42); kloop(49); kloop(56); kloop(57); kloop(28); kloop(35); kloop(42); kloop(49); kloop(56); kloop(57);
kloop(50); kloop(43); kloop(36); kloop(29); kloop(22); kloop(15); kloop(50); kloop(43); kloop(36); kloop(29); kloop(22); kloop(15);
kloop(23); kloop(30); kloop(37); kloop(44); kloop(51); kloop(58); kloop(23); kloop(30); kloop(37); kloop(44); kloop(51); kloop(58);
kloop(59); kloop(52); kloop(45); kloop(38); kloop(31); kloop(39); kloop(59); kloop(52); kloop(45); kloop(38); kloop(31); kloop(39);
kloop(46); kloop(53); kloop(60); kloop(61); kloop(54); kloop(47); kloop(46); kloop(53); kloop(60); kloop(61); kloop(54); kloop(47);
kloop(55); kloop(62); kloop(63); kloop(55); kloop(62); kloop(63);
/* If the last coef(s) were zero, emit an end-of-block code */ /* If the last coef(s) were zero, emit an end-of-block code */
if (r > 0) { if (r > 0) {
code = actbl->ehufco[0]; PUT_BITS(actbl->ehufco[0], actbl->ehufsi[0])
size = actbl->ehufsi[0]; }
EMIT_BITS(code, size)
} }
state->cur.put_buffer = put_buffer; state->cur.put_buffer.c = put_buffer;
state->cur.put_bits = put_bits; state->cur.free_bits = free_bits;
STORE_BUFFER() STORE_BUFFER()
return TRUE; return TRUE;
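Note on the hunk above: the DC difference is coded as a Huffman symbol for its magnitude category (`nbits`) followed by `nbits` appended bits, and each nonzero AC coefficient as a run/size symbol followed by its appended bits. A minimal sketch of how a single coefficient value maps to that (category, appended bits) pair, assuming two's-complement int; `code_coefficient` and `nbits_of` are illustrative names, not library functions:

/* Illustrative only: how one coefficient value maps to a magnitude
 * category ("nbits") and the bits appended after the Huffman symbol.
 */
static int nbits_of(int magnitude)
{
  int n = 0;
  while (magnitude) { n++; magnitude >>= 1; }
  return n;
}

static void code_coefficient(int value, int *nbits, unsigned int *appended)
{
  int magnitude = value < 0 ? -value : value;
  *nbits = nbits_of(magnitude);
  /* Positive values are appended as-is; negative values as the one's
   * complement of their magnitude, truncated to nbits bits.
   */
  *appended = (unsigned int)(value < 0 ? value - 1 : value) &
              ((1u << *nbits) - 1);
}

For example, a value of -3 gives nbits == 2 and appended bits 00 (the one's complement of 3 in two bits), which is what the masking of temp2 in the code above produces.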
@@ -654,8 +691,9 @@ encode_mcu_huff(j_compress_ptr cinfo, JBLOCKROW *MCU_data)
/* Load up working state */ /* Load up working state */
state.next_output_byte = cinfo->dest->next_output_byte; state.next_output_byte = cinfo->dest->next_output_byte;
state.free_in_buffer = cinfo->dest->free_in_buffer; state.free_in_buffer = cinfo->dest->free_in_buffer;
ASSIGN_STATE(state.cur, entropy->saved); state.cur = entropy->saved;
state.cinfo = cinfo; state.cinfo = cinfo;
state.simd = entropy->simd;
/* Emit restart marker if needed */ /* Emit restart marker if needed */
if (cinfo->restart_interval) { if (cinfo->restart_interval) {
@@ -694,7 +732,7 @@ encode_mcu_huff(j_compress_ptr cinfo, JBLOCKROW *MCU_data)
/* Completed MCU, so update state */ /* Completed MCU, so update state */
cinfo->dest->next_output_byte = state.next_output_byte; cinfo->dest->next_output_byte = state.next_output_byte;
cinfo->dest->free_in_buffer = state.free_in_buffer; cinfo->dest->free_in_buffer = state.free_in_buffer;
ASSIGN_STATE(entropy->saved, state.cur); entropy->saved = state.cur;
/* Update restart-interval state too */ /* Update restart-interval state too */
if (cinfo->restart_interval) { if (cinfo->restart_interval) {
@@ -723,8 +761,9 @@ finish_pass_huff(j_compress_ptr cinfo)
/* Load up working state ... flush_bits needs it */ /* Load up working state ... flush_bits needs it */
state.next_output_byte = cinfo->dest->next_output_byte; state.next_output_byte = cinfo->dest->next_output_byte;
state.free_in_buffer = cinfo->dest->free_in_buffer; state.free_in_buffer = cinfo->dest->free_in_buffer;
ASSIGN_STATE(state.cur, entropy->saved); state.cur = entropy->saved;
state.cinfo = cinfo; state.cinfo = cinfo;
state.simd = entropy->simd;
/* Flush out the last data */ /* Flush out the last data */
if (!flush_bits(&state)) if (!flush_bits(&state))
@@ -733,7 +772,7 @@ finish_pass_huff(j_compress_ptr cinfo)
/* Update state */ /* Update state */
cinfo->dest->next_output_byte = state.next_output_byte; cinfo->dest->next_output_byte = state.next_output_byte;
cinfo->dest->free_in_buffer = state.free_in_buffer; cinfo->dest->free_in_buffer = state.free_in_buffer;
ASSIGN_STATE(entropy->saved, state.cur); entropy->saved = state.cur;
} }


@@ -61,11 +61,6 @@
unsigned. */ unsigned. */
#cmakedefine RIGHT_SHIFT_IS_UNSIGNED 1 #cmakedefine RIGHT_SHIFT_IS_UNSIGNED 1
/* Define to 1 if type `char' is unsigned and you are not using gcc. */
#ifndef __CHAR_UNSIGNED__
#cmakedefine __CHAR_UNSIGNED__ 1
#endif
/* Define to empty if `const' does not conform to ANSI C. */ /* Define to empty if `const' does not conform to ANSI C. */
/* #undef const */ /* #undef const */


@@ -42,12 +42,6 @@
*/ */
/* #define const */ /* #define const */
/* Define this if an ordinary "char" type is unsigned.
* If you're not sure, leaving it undefined will work at some cost in speed.
* If you defined HAVE_UNSIGNED_CHAR then the speed difference is minimal.
*/
#undef __CHAR_UNSIGNED__
/* Define this if your system has an ANSI-conforming <stddef.h> file. /* Define this if your system has an ANSI-conforming <stddef.h> file.
*/ */
#define HAVE_STDDEF_H #define HAVE_STDDEF_H
@@ -119,7 +113,6 @@ typedef unsigned char boolean;
#define BMP_SUPPORTED /* BMP image file format */ #define BMP_SUPPORTED /* BMP image file format */
#define GIF_SUPPORTED /* GIF image file format */ #define GIF_SUPPORTED /* GIF image file format */
#define PPM_SUPPORTED /* PBMPLUS PPM/PGM image file format */ #define PPM_SUPPORTED /* PBMPLUS PPM/PGM image file format */
#undef RLE_SUPPORTED /* Utah RLE image file format */
#define TARGA_SUPPORTED /* Targa image file format */ #define TARGA_SUPPORTED /* Targa image file format */
/* Define this if you want to name both input and output files on the command /* Define this if you want to name both input and output files on the command


@@ -4,8 +4,9 @@
* This file was part of the Independent JPEG Group's software: * This file was part of the Independent JPEG Group's software:
* Copyright (C) 1995-1997, Thomas G. Lane. * Copyright (C) 1995-1997, Thomas G. Lane.
* libjpeg-turbo Modifications: * libjpeg-turbo Modifications:
* Copyright (C) 2011, 2015, 2018, D. R. Commander. * Copyright (C) 2011, 2015, 2018, 2021, D. R. Commander.
* Copyright (C) 2016, 2018, Matthieu Darbois. * Copyright (C) 2016, 2018, Matthieu Darbois.
* Copyright (C) 2020, Arm Limited.
* Copyright (C) 2014, Mozilla Corporation. * Copyright (C) 2014, Mozilla Corporation.
* For conditions of distribution and use, see the accompanying README.ijg * For conditions of distribution and use, see the accompanying README.ijg
* file. * file.
@@ -52,15 +53,19 @@
* flags (this defines __thumb__). * flags (this defines __thumb__).
*/ */
/* NOTE: Both GCC and Clang define __GNUC__ */ #if defined(__arm__) || defined(__aarch64__) || defined(_M_ARM) || \
#if defined(__GNUC__) && (defined(__arm__) || defined(__aarch64__)) defined(_M_ARM64)
#if !defined(__thumb__) || defined(__thumb2__) #if !defined(__thumb__) || defined(__thumb2__)
#define USE_CLZ_INTRINSIC #define USE_CLZ_INTRINSIC
#endif #endif
#endif #endif
#ifdef USE_CLZ_INTRINSIC #ifdef USE_CLZ_INTRINSIC
#if defined(_MSC_VER) && !defined(__clang__)
#define JPEG_NBITS_NONZERO(x) (32 - _CountLeadingZeros(x))
#else
#define JPEG_NBITS_NONZERO(x) (32 - __builtin_clz(x)) #define JPEG_NBITS_NONZERO(x) (32 - __builtin_clz(x))
#endif
#define JPEG_NBITS(x) (x ? JPEG_NBITS_NONZERO(x) : 0) #define JPEG_NBITS(x) (x ? JPEG_NBITS_NONZERO(x) : 0)
#else #else
#include "jpeg_nbits_table.h" #include "jpeg_nbits_table.h"
@@ -136,9 +141,9 @@ typedef phuff_entropy_encoder *phuff_entropy_ptr;
#ifdef RIGHT_SHIFT_IS_UNSIGNED #ifdef RIGHT_SHIFT_IS_UNSIGNED
#define ISHIFT_TEMPS int ishift_temp; #define ISHIFT_TEMPS int ishift_temp;
#define IRIGHT_SHIFT(x,shft) \ #define IRIGHT_SHIFT(x,shft) \
((ishift_temp = (x)) < 0 ? \ ((ishift_temp = (x)) < 0 ? \
(ishift_temp >> (shft)) | ((~0) << (16-(shft))) : \ (ishift_temp >> (shft)) | ((~0) << (16-(shft))) : \
(ishift_temp >> (shft))) (ishift_temp >> (shft)))
#else #else
#define ISHIFT_TEMPS #define ISHIFT_TEMPS
#define IRIGHT_SHIFT(x,shft) ((x) >> (shft)) #define IRIGHT_SHIFT(x,shft) ((x) >> (shft))
@@ -148,19 +153,19 @@ typedef phuff_entropy_encoder *phuff_entropy_ptr;
/* Forward declarations */ /* Forward declarations */
METHODDEF(boolean) encode_mcu_DC_first (j_compress_ptr cinfo, METHODDEF(boolean) encode_mcu_DC_first (j_compress_ptr cinfo,
JBLOCKROW *MCU_data); JBLOCKROW *MCU_data);
METHODDEF(void) encode_mcu_AC_first_prepare METHODDEF(void) encode_mcu_AC_first_prepare
(const JCOEF *block, const int *jpeg_natural_order_start, int Sl, int Al, (const JCOEF *block, const int *jpeg_natural_order_start, int Sl, int Al,
JCOEF *values, size_t *zerobits); JCOEF *values, size_t *zerobits);
METHODDEF(boolean) encode_mcu_AC_first (j_compress_ptr cinfo, METHODDEF(boolean) encode_mcu_AC_first (j_compress_ptr cinfo,
JBLOCKROW *MCU_data); JBLOCKROW *MCU_data);
METHODDEF(boolean) encode_mcu_DC_refine (j_compress_ptr cinfo, METHODDEF(boolean) encode_mcu_DC_refine (j_compress_ptr cinfo,
JBLOCKROW *MCU_data); JBLOCKROW *MCU_data);
METHODDEF(int) encode_mcu_AC_refine_prepare METHODDEF(int) encode_mcu_AC_refine_prepare
(const JCOEF *block, const int *jpeg_natural_order_start, int Sl, int Al, (const JCOEF *block, const int *jpeg_natural_order_start, int Sl, int Al,
JCOEF *absvalues, size_t *bits); JCOEF *absvalues, size_t *bits);
METHODDEF(boolean) encode_mcu_AC_refine (j_compress_ptr cinfo, METHODDEF(boolean) encode_mcu_AC_refine (j_compress_ptr cinfo,
JBLOCKROW *MCU_data); JBLOCKROW *MCU_data);
METHODDEF(void) finish_pass_phuff (j_compress_ptr cinfo); METHODDEF(void) finish_pass_phuff (j_compress_ptr cinfo);
METHODDEF(void) finish_pass_gather_phuff (j_compress_ptr cinfo); METHODDEF(void) finish_pass_gather_phuff (j_compress_ptr cinfo);
@@ -170,24 +175,26 @@ INLINE
METHODDEF(int) METHODDEF(int)
count_zeroes(size_t *x) count_zeroes(size_t *x)
{ {
int result;
#if defined(HAVE_BUILTIN_CTZL) #if defined(HAVE_BUILTIN_CTZL)
int result;
result = __builtin_ctzl(*x); result = __builtin_ctzl(*x);
*x >>= result; *x >>= result;
#elif defined(HAVE_BITSCANFORWARD64) #elif defined(HAVE_BITSCANFORWARD64)
unsigned long result;
_BitScanForward64(&result, *x); _BitScanForward64(&result, *x);
*x >>= result; *x >>= result;
#elif defined(HAVE_BITSCANFORWARD) #elif defined(HAVE_BITSCANFORWARD)
unsigned long result;
_BitScanForward(&result, *x); _BitScanForward(&result, *x);
*x >>= result; *x >>= result;
#else #else
result = 0; int result = 0;
while ((*x & 1) == 0) { while ((*x & 1) == 0) {
++result; ++result;
*x >>= 1; *x >>= 1;
} }
#endif #endif
return result; return (int)result;
} }
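count_zeroes() returns the number of trailing zero bits in *x and shifts them out, so its callers can jump straight from one set bit of a coefficient bitmap to the next. A hypothetical usage sketch (not the encoder's actual loop), reusing the portable fallback shown above:

/* Hypothetical caller: walk the set bits of a bitmap, much as the
 * progressive encoder consumes its "zerobits" maps.  count_zeroes() must
 * not be called with *x == 0, hence the while (bits) guard.
 */
static void visit_set_bits(size_t bits)
{
  int index = 0;

  while (bits) {
    index += count_zeroes(&bits); /* index of the next set bit */
    /* ... process the coefficient at 'index' here ... */
    bits >>= 1;                   /* consume that set bit */
    index++;
  }
}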
@@ -306,7 +313,7 @@ start_pass_phuff (j_compress_ptr cinfo, boolean gather_statistics)
/* Emit a byte */ /* Emit a byte */
#define emit_byte(entropy, val) { \ #define emit_byte(entropy, val) { \
*(entropy)->next_output_byte++ = (JOCTET)(val); \ *(entropy)->next_output_byte++ = (JOCTET)(val); \
if (--(entropy)->free_in_buffer == 0) \ if (--(entropy)->free_in_buffer == 0) \
dump_buffer(entropy); \ dump_buffer(entropy); \
} }
@@ -403,7 +410,7 @@ emit_symbol (phuff_entropy_ptr entropy, int tbl_no, int symbol)
LOCAL(void) LOCAL(void)
emit_buffered_bits (phuff_entropy_ptr entropy, char *bufstart, emit_buffered_bits (phuff_entropy_ptr entropy, char *bufstart,
unsigned int nbits) unsigned int nbits)
{ {
if (entropy->gather_statistics) if (entropy->gather_statistics)
return; /* no real work */ return; /* no real work */
@@ -524,7 +531,7 @@ encode_mcu_DC_first (j_compress_ptr cinfo, JBLOCKROW *MCU_data)
temp3 = temp >> (CHAR_BIT * sizeof(int) - 1); temp3 = temp >> (CHAR_BIT * sizeof(int) - 1);
temp ^= temp3; temp ^= temp3;
temp -= temp3; /* temp is abs value of input */ temp -= temp3; /* temp is abs value of input */
/* For a negative input, want temp2 = bitwise complement of abs(input) */ /* For a negative input, want temp2 = bitwise complement of abs(input) */
temp2 = temp ^ temp3; temp2 = temp ^ temp3;
/* Find the number of bits needed for the magnitude of the coefficient */ /* Find the number of bits needed for the magnitude of the coefficient */
@@ -696,9 +703,9 @@ encode_mcu_AC_first (j_compress_ptr cinfo, JBLOCKROW *MCU_data)
zerobits |= bits[1]; zerobits |= bits[1];
#endif #endif
/* Emit any pending EOBRUN */ /* Emit any pending EOBRUN */
if (zerobits && (entropy->EOBRUN > 0)) if (zerobits && (entropy->EOBRUN > 0))
emit_eobrun(entropy); emit_eobrun(entropy);
#if SIZEOF_SIZE_T == 4 #if SIZEOF_SIZE_T == 4
zerobits = bits[0]; zerobits = bits[0];
@@ -983,7 +990,7 @@ encode_mcu_AC_refine (j_compress_ptr cinfo, JBLOCKROW *MCU_data)
r += idx; r += idx;
cabsvalue += idx; cabsvalue += idx;
goto first_iter_ac_refine; goto first_iter_ac_refine;
} }
ENCODE_COEFS_AC_REFINE(first_iter_ac_refine:); ENCODE_COEFS_AC_REFINE(first_iter_ac_refine:);
#endif #endif


@@ -6,7 +6,7 @@
* libjpeg-turbo Modifications: * libjpeg-turbo Modifications:
* Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
* Copyright (C) 2014, MIPS Technologies, Inc., California. * Copyright (C) 2014, MIPS Technologies, Inc., California.
* Copyright (C) 2015, D. R. Commander. * Copyright (C) 2015, 2019, D. R. Commander.
* For conditions of distribution and use, see the accompanying README.ijg * For conditions of distribution and use, see the accompanying README.ijg
* file. * file.
* *
@@ -103,7 +103,7 @@ expand_right_edge(JSAMPARRAY image_data, int num_rows, JDIMENSION input_cols,
if (numcols > 0) { if (numcols > 0) {
for (row = 0; row < num_rows; row++) { for (row = 0; row < num_rows; row++) {
ptr = image_data[row] + input_cols; ptr = image_data[row] + input_cols;
pixval = ptr[-1]; /* don't need GETJSAMPLE() here */ pixval = ptr[-1];
for (count = numcols; count > 0; count--) for (count = numcols; count > 0; count--)
*ptr++ = pixval; *ptr++ = pixval;
} }
@@ -174,7 +174,7 @@ int_downsample(j_compress_ptr cinfo, jpeg_component_info *compptr,
for (v = 0; v < v_expand; v++) { for (v = 0; v < v_expand; v++) {
inptr = input_data[inrow + v] + outcol_h; inptr = input_data[inrow + v] + outcol_h;
for (h = 0; h < h_expand; h++) { for (h = 0; h < h_expand; h++) {
outvalue += (JLONG)GETJSAMPLE(*inptr++); outvalue += (JLONG)(*inptr++);
} }
} }
*outptr++ = (JSAMPLE)((outvalue + numpix2) / numpix); *outptr++ = (JSAMPLE)((outvalue + numpix2) / numpix);
@@ -237,8 +237,7 @@ h2v1_downsample(j_compress_ptr cinfo, jpeg_component_info *compptr,
inptr = input_data[outrow]; inptr = input_data[outrow];
bias = 0; /* bias = 0,1,0,1,... for successive samples */ bias = 0; /* bias = 0,1,0,1,... for successive samples */
for (outcol = 0; outcol < output_cols; outcol++) { for (outcol = 0; outcol < output_cols; outcol++) {
*outptr++ = *outptr++ = (JSAMPLE)((inptr[0] + inptr[1] + bias) >> 1);
(JSAMPLE)((GETJSAMPLE(*inptr) + GETJSAMPLE(inptr[1]) + bias) >> 1);
bias ^= 1; /* 0=>1, 1=>0 */ bias ^= 1; /* 0=>1, 1=>0 */
inptr += 2; inptr += 2;
} }
@@ -277,8 +276,7 @@ h2v2_downsample(j_compress_ptr cinfo, jpeg_component_info *compptr,
bias = 1; /* bias = 1,2,1,2,... for successive samples */ bias = 1; /* bias = 1,2,1,2,... for successive samples */
for (outcol = 0; outcol < output_cols; outcol++) { for (outcol = 0; outcol < output_cols; outcol++) {
*outptr++ = *outptr++ =
(JSAMPLE)((GETJSAMPLE(*inptr0) + GETJSAMPLE(inptr0[1]) + (JSAMPLE)((inptr0[0] + inptr0[1] + inptr1[0] + inptr1[1] + bias) >> 2);
GETJSAMPLE(*inptr1) + GETJSAMPLE(inptr1[1]) + bias) >> 2);
bias ^= 3; /* 1=>2, 2=>1 */ bias ^= 3; /* 1=>2, 2=>1 */
inptr0 += 2; inptr1 += 2; inptr0 += 2; inptr1 += 2;
} }
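The h2v2 downsampler above averages each 2x2 input neighborhood into one output sample, with a bias that alternates between 1 and 2 on successive columns so exact ties do not always round the same way. A small sketch of that arithmetic (illustrative helper, not library code):

static int h2v2_average(int p00, int p01, int p10, int p11, int bias)
{
  /* bias alternates 1,2,1,2,... across output columns */
  return (p00 + p01 + p10 + p11 + bias) >> 2;
}
/* e.g. h2v2_average(10, 11, 10, 11, 1) == 10
 *      h2v2_average(10, 11, 10, 11, 2) == 11
 */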
@@ -337,33 +335,25 @@ h2v2_smooth_downsample(j_compress_ptr cinfo, jpeg_component_info *compptr,
below_ptr = input_data[inrow + 2]; below_ptr = input_data[inrow + 2];
/* Special case for first column: pretend column -1 is same as column 0 */ /* Special case for first column: pretend column -1 is same as column 0 */
membersum = GETJSAMPLE(*inptr0) + GETJSAMPLE(inptr0[1]) + membersum = inptr0[0] + inptr0[1] + inptr1[0] + inptr1[1];
GETJSAMPLE(*inptr1) + GETJSAMPLE(inptr1[1]); neighsum = above_ptr[0] + above_ptr[1] + below_ptr[0] + below_ptr[1] +
neighsum = GETJSAMPLE(*above_ptr) + GETJSAMPLE(above_ptr[1]) + inptr0[0] + inptr0[2] + inptr1[0] + inptr1[2];
GETJSAMPLE(*below_ptr) + GETJSAMPLE(below_ptr[1]) +
GETJSAMPLE(*inptr0) + GETJSAMPLE(inptr0[2]) +
GETJSAMPLE(*inptr1) + GETJSAMPLE(inptr1[2]);
neighsum += neighsum; neighsum += neighsum;
neighsum += GETJSAMPLE(*above_ptr) + GETJSAMPLE(above_ptr[2]) + neighsum += above_ptr[0] + above_ptr[2] + below_ptr[0] + below_ptr[2];
GETJSAMPLE(*below_ptr) + GETJSAMPLE(below_ptr[2]);
membersum = membersum * memberscale + neighsum * neighscale; membersum = membersum * memberscale + neighsum * neighscale;
*outptr++ = (JSAMPLE)((membersum + 32768) >> 16); *outptr++ = (JSAMPLE)((membersum + 32768) >> 16);
inptr0 += 2; inptr1 += 2; above_ptr += 2; below_ptr += 2; inptr0 += 2; inptr1 += 2; above_ptr += 2; below_ptr += 2;
for (colctr = output_cols - 2; colctr > 0; colctr--) { for (colctr = output_cols - 2; colctr > 0; colctr--) {
/* sum of pixels directly mapped to this output element */ /* sum of pixels directly mapped to this output element */
membersum = GETJSAMPLE(*inptr0) + GETJSAMPLE(inptr0[1]) + membersum = inptr0[0] + inptr0[1] + inptr1[0] + inptr1[1];
GETJSAMPLE(*inptr1) + GETJSAMPLE(inptr1[1]);
/* sum of edge-neighbor pixels */ /* sum of edge-neighbor pixels */
neighsum = GETJSAMPLE(*above_ptr) + GETJSAMPLE(above_ptr[1]) + neighsum = above_ptr[0] + above_ptr[1] + below_ptr[0] + below_ptr[1] +
GETJSAMPLE(*below_ptr) + GETJSAMPLE(below_ptr[1]) + inptr0[-1] + inptr0[2] + inptr1[-1] + inptr1[2];
GETJSAMPLE(inptr0[-1]) + GETJSAMPLE(inptr0[2]) +
GETJSAMPLE(inptr1[-1]) + GETJSAMPLE(inptr1[2]);
/* The edge-neighbors count twice as much as corner-neighbors */ /* The edge-neighbors count twice as much as corner-neighbors */
neighsum += neighsum; neighsum += neighsum;
/* Add in the corner-neighbors */ /* Add in the corner-neighbors */
neighsum += GETJSAMPLE(above_ptr[-1]) + GETJSAMPLE(above_ptr[2]) + neighsum += above_ptr[-1] + above_ptr[2] + below_ptr[-1] + below_ptr[2];
GETJSAMPLE(below_ptr[-1]) + GETJSAMPLE(below_ptr[2]);
/* form final output scaled up by 2^16 */ /* form final output scaled up by 2^16 */
membersum = membersum * memberscale + neighsum * neighscale; membersum = membersum * memberscale + neighsum * neighscale;
/* round, descale and output it */ /* round, descale and output it */
@@ -372,15 +362,11 @@ h2v2_smooth_downsample(j_compress_ptr cinfo, jpeg_component_info *compptr,
} }
/* Special case for last column */ /* Special case for last column */
membersum = GETJSAMPLE(*inptr0) + GETJSAMPLE(inptr0[1]) + membersum = inptr0[0] + inptr0[1] + inptr1[0] + inptr1[1];
GETJSAMPLE(*inptr1) + GETJSAMPLE(inptr1[1]); neighsum = above_ptr[0] + above_ptr[1] + below_ptr[0] + below_ptr[1] +
neighsum = GETJSAMPLE(*above_ptr) + GETJSAMPLE(above_ptr[1]) + inptr0[-1] + inptr0[1] + inptr1[-1] + inptr1[1];
GETJSAMPLE(*below_ptr) + GETJSAMPLE(below_ptr[1]) +
GETJSAMPLE(inptr0[-1]) + GETJSAMPLE(inptr0[1]) +
GETJSAMPLE(inptr1[-1]) + GETJSAMPLE(inptr1[1]);
neighsum += neighsum; neighsum += neighsum;
neighsum += GETJSAMPLE(above_ptr[-1]) + GETJSAMPLE(above_ptr[1]) + neighsum += above_ptr[-1] + above_ptr[1] + below_ptr[-1] + below_ptr[1];
GETJSAMPLE(below_ptr[-1]) + GETJSAMPLE(below_ptr[1]);
membersum = membersum * memberscale + neighsum * neighscale; membersum = membersum * memberscale + neighsum * neighscale;
*outptr = (JSAMPLE)((membersum + 32768) >> 16); *outptr = (JSAMPLE)((membersum + 32768) >> 16);
@@ -429,21 +415,18 @@ fullsize_smooth_downsample(j_compress_ptr cinfo, jpeg_component_info *compptr,
below_ptr = input_data[outrow + 1]; below_ptr = input_data[outrow + 1];
/* Special case for first column */ /* Special case for first column */
colsum = GETJSAMPLE(*above_ptr++) + GETJSAMPLE(*below_ptr++) + colsum = (*above_ptr++) + (*below_ptr++) + inptr[0];
GETJSAMPLE(*inptr); membersum = *inptr++;
membersum = GETJSAMPLE(*inptr++); nextcolsum = above_ptr[0] + below_ptr[0] + inptr[0];
nextcolsum = GETJSAMPLE(*above_ptr) + GETJSAMPLE(*below_ptr) +
GETJSAMPLE(*inptr);
neighsum = colsum + (colsum - membersum) + nextcolsum; neighsum = colsum + (colsum - membersum) + nextcolsum;
membersum = membersum * memberscale + neighsum * neighscale; membersum = membersum * memberscale + neighsum * neighscale;
*outptr++ = (JSAMPLE)((membersum + 32768) >> 16); *outptr++ = (JSAMPLE)((membersum + 32768) >> 16);
lastcolsum = colsum; colsum = nextcolsum; lastcolsum = colsum; colsum = nextcolsum;
for (colctr = output_cols - 2; colctr > 0; colctr--) { for (colctr = output_cols - 2; colctr > 0; colctr--) {
membersum = GETJSAMPLE(*inptr++); membersum = *inptr++;
above_ptr++; below_ptr++; above_ptr++; below_ptr++;
nextcolsum = GETJSAMPLE(*above_ptr) + GETJSAMPLE(*below_ptr) + nextcolsum = above_ptr[0] + below_ptr[0] + inptr[0];
GETJSAMPLE(*inptr);
neighsum = lastcolsum + (colsum - membersum) + nextcolsum; neighsum = lastcolsum + (colsum - membersum) + nextcolsum;
membersum = membersum * memberscale + neighsum * neighscale; membersum = membersum * memberscale + neighsum * neighscale;
*outptr++ = (JSAMPLE)((membersum + 32768) >> 16); *outptr++ = (JSAMPLE)((membersum + 32768) >> 16);
@@ -451,7 +434,7 @@ fullsize_smooth_downsample(j_compress_ptr cinfo, jpeg_component_info *compptr,
} }
/* Special case for last column */ /* Special case for last column */
membersum = GETJSAMPLE(*inptr); membersum = *inptr;
neighsum = lastcolsum + (colsum - membersum) + colsum; neighsum = lastcolsum + (colsum - membersum) + colsum;
membersum = membersum * memberscale + neighsum * neighscale; membersum = membersum * memberscale + neighsum * neighscale;
*outptr = (JSAMPLE)((membersum + 32768) >> 16); *outptr = (JSAMPLE)((membersum + 32768) >> 16);
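The smoothing downsamplers above blend each sample group with its neighbors using integer weights arranged so that the total weight is 2^16; adding 32768 and shifting right by 16 then rounds to nearest and descales in one step. A sketch of that fixed-point pattern (hypothetical helper, assuming the weights total 2^16 as in the code above):

static int blend_q16(JLONG membersum, JLONG memberscale,
                     JLONG neighsum, JLONG neighscale)
{
  /* member and neighbor weights are assumed to total 2^16 */
  JLONG sum = membersum * memberscale + neighsum * neighscale;
  return (int)((sum + 32768) >> 16);
}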


@@ -4,7 +4,7 @@
* This file was part of the Independent JPEG Group's software: * This file was part of the Independent JPEG Group's software:
* Copyright (C) 1994-1996, Thomas G. Lane. * Copyright (C) 1994-1996, Thomas G. Lane.
* libjpeg-turbo Modifications: * libjpeg-turbo Modifications:
* Copyright (C) 2010, 2015-2018, 2020, D. R. Commander. * Copyright (C) 2010, 2015-2020, D. R. Commander.
* Copyright (C) 2015, Google, Inc. * Copyright (C) 2015, Google, Inc.
* For conditions of distribution and use, see the accompanying README.ijg * For conditions of distribution and use, see the accompanying README.ijg
* file. * file.
@@ -319,6 +319,8 @@ read_and_discard_scanlines(j_decompress_ptr cinfo, JDIMENSION num_lines)
{ {
JDIMENSION n; JDIMENSION n;
my_master_ptr master = (my_master_ptr)cinfo->master; my_master_ptr master = (my_master_ptr)cinfo->master;
JSAMPLE dummy_sample[1] = { 0 };
JSAMPROW dummy_row = dummy_sample;
JSAMPARRAY scanlines = NULL; JSAMPARRAY scanlines = NULL;
void (*color_convert) (j_decompress_ptr cinfo, JSAMPIMAGE input_buf, void (*color_convert) (j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
JDIMENSION input_row, JSAMPARRAY output_buf, JDIMENSION input_row, JSAMPARRAY output_buf,
@@ -329,6 +331,10 @@ read_and_discard_scanlines(j_decompress_ptr cinfo, JDIMENSION num_lines)
if (cinfo->cconvert && cinfo->cconvert->color_convert) { if (cinfo->cconvert && cinfo->cconvert->color_convert) {
color_convert = cinfo->cconvert->color_convert; color_convert = cinfo->cconvert->color_convert;
cinfo->cconvert->color_convert = noop_convert; cinfo->cconvert->color_convert = noop_convert;
/* This just prevents UBSan from complaining about adding 0 to a NULL
* pointer. The pointer isn't actually used.
*/
scanlines = &dummy_row;
} }
if (cinfo->cquantize && cinfo->cquantize->color_quantize) { if (cinfo->cquantize && cinfo->cquantize->color_quantize) {
@@ -532,6 +538,8 @@ jpeg_skip_scanlines(j_decompress_ptr cinfo, JDIMENSION num_lines)
* decoded coefficients. This is ~5% faster for large subsets, but * decoded coefficients. This is ~5% faster for large subsets, but
* it's tough to tell a difference for smaller images. * it's tough to tell a difference for smaller images.
*/ */
if (!cinfo->entropy->insufficient_data)
cinfo->master->last_good_iMCU_row = cinfo->input_iMCU_row;
(*cinfo->entropy->decode_mcu) (cinfo, NULL); (*cinfo->entropy->decode_mcu) (cinfo, NULL);
} }
} }


@@ -4,7 +4,7 @@
* This file was part of the Independent JPEG Group's software: * This file was part of the Independent JPEG Group's software:
* Developed 1997-2015 by Guido Vollbeding. * Developed 1997-2015 by Guido Vollbeding.
* libjpeg-turbo Modifications: * libjpeg-turbo Modifications:
* Copyright (C) 2015-2018, D. R. Commander. * Copyright (C) 2015-2020, D. R. Commander.
* For conditions of distribution and use, see the accompanying README.ijg * For conditions of distribution and use, see the accompanying README.ijg
* file. * file.
* *
@@ -80,7 +80,7 @@ get_byte(j_decompress_ptr cinfo)
if (!(*src->fill_input_buffer) (cinfo)) if (!(*src->fill_input_buffer) (cinfo))
ERREXIT(cinfo, JERR_CANT_SUSPEND); ERREXIT(cinfo, JERR_CANT_SUSPEND);
src->bytes_in_buffer--; src->bytes_in_buffer--;
return GETJOCTET(*src->next_input_byte++); return *src->next_input_byte++;
} }
@@ -665,8 +665,16 @@ bad:
for (ci = 0; ci < cinfo->comps_in_scan; ci++) { for (ci = 0; ci < cinfo->comps_in_scan; ci++) {
int coefi, cindex = cinfo->cur_comp_info[ci]->component_index; int coefi, cindex = cinfo->cur_comp_info[ci]->component_index;
int *coef_bit_ptr = &cinfo->coef_bits[cindex][0]; int *coef_bit_ptr = &cinfo->coef_bits[cindex][0];
int *prev_coef_bit_ptr =
&cinfo->coef_bits[cindex + cinfo->num_components][0];
if (cinfo->Ss && coef_bit_ptr[0] < 0) /* AC without prior DC scan */ if (cinfo->Ss && coef_bit_ptr[0] < 0) /* AC without prior DC scan */
WARNMS2(cinfo, JWRN_BOGUS_PROGRESSION, cindex, 0); WARNMS2(cinfo, JWRN_BOGUS_PROGRESSION, cindex, 0);
for (coefi = MIN(cinfo->Ss, 1); coefi <= MAX(cinfo->Se, 9); coefi++) {
if (cinfo->input_scan_number > 1)
prev_coef_bit_ptr[coefi] = coef_bit_ptr[coefi];
else
prev_coef_bit_ptr[coefi] = 0;
}
for (coefi = cinfo->Ss; coefi <= cinfo->Se; coefi++) { for (coefi = cinfo->Ss; coefi <= cinfo->Se; coefi++) {
int expected = (coef_bit_ptr[coefi] < 0) ? 0 : coef_bit_ptr[coefi]; int expected = (coef_bit_ptr[coefi] < 0) ? 0 : coef_bit_ptr[coefi];
if (cinfo->Ah != expected) if (cinfo->Ah != expected)
@@ -727,6 +735,7 @@ bad:
entropy->c = 0; entropy->c = 0;
entropy->a = 0; entropy->a = 0;
entropy->ct = -16; /* force reading 2 initial bytes to fill C */ entropy->ct = -16; /* force reading 2 initial bytes to fill C */
entropy->pub.insufficient_data = FALSE;
/* Initialize restart counter */ /* Initialize restart counter */
entropy->restarts_to_go = cinfo->restart_interval; entropy->restarts_to_go = cinfo->restart_interval;
@@ -763,7 +772,7 @@ jinit_arith_decoder(j_decompress_ptr cinfo)
int *coef_bit_ptr, ci; int *coef_bit_ptr, ci;
cinfo->coef_bits = (int (*)[DCTSIZE2]) cinfo->coef_bits = (int (*)[DCTSIZE2])
(*cinfo->mem->alloc_small) ((j_common_ptr)cinfo, JPOOL_IMAGE, (*cinfo->mem->alloc_small) ((j_common_ptr)cinfo, JPOOL_IMAGE,
cinfo->num_components * DCTSIZE2 * cinfo->num_components * 2 * DCTSIZE2 *
sizeof(int)); sizeof(int));
coef_bit_ptr = &cinfo->coef_bits[0][0]; coef_bit_ptr = &cinfo->coef_bits[0][0];
for (ci = 0; ci < cinfo->num_components; ci++) for (ci = 0; ci < cinfo->num_components; ci++)


@@ -5,7 +5,7 @@
* Copyright (C) 1994-1997, Thomas G. Lane. * Copyright (C) 1994-1997, Thomas G. Lane.
* libjpeg-turbo Modifications: * libjpeg-turbo Modifications:
* Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
* Copyright (C) 2010, 2015-2016, D. R. Commander. * Copyright (C) 2010, 2015-2016, 2019-2020, D. R. Commander.
* Copyright (C) 2015, 2020, Google, Inc. * Copyright (C) 2015, 2020, Google, Inc.
* For conditions of distribution and use, see the accompanying README.ijg * For conditions of distribution and use, see the accompanying README.ijg
* file. * file.
@@ -102,6 +102,8 @@ decompress_onepass(j_decompress_ptr cinfo, JSAMPIMAGE output_buf)
/* Try to fetch an MCU. Entropy decoder expects buffer to be zeroed. */ /* Try to fetch an MCU. Entropy decoder expects buffer to be zeroed. */
jzero_far((void *)coef->MCU_buffer[0], jzero_far((void *)coef->MCU_buffer[0],
(size_t)(cinfo->blocks_in_MCU * sizeof(JBLOCK))); (size_t)(cinfo->blocks_in_MCU * sizeof(JBLOCK)));
if (!cinfo->entropy->insufficient_data)
cinfo->master->last_good_iMCU_row = cinfo->input_iMCU_row;
if (!(*cinfo->entropy->decode_mcu) (cinfo, coef->MCU_buffer)) { if (!(*cinfo->entropy->decode_mcu) (cinfo, coef->MCU_buffer)) {
/* Suspension forced; update state counters and exit */ /* Suspension forced; update state counters and exit */
coef->MCU_vert_offset = yoffset; coef->MCU_vert_offset = yoffset;
@@ -227,6 +229,8 @@ consume_data(j_decompress_ptr cinfo)
} }
} }
} }
if (!cinfo->entropy->insufficient_data)
cinfo->master->last_good_iMCU_row = cinfo->input_iMCU_row;
/* Try to fetch the MCU. */ /* Try to fetch the MCU. */
if (!(*cinfo->entropy->decode_mcu) (cinfo, coef->MCU_buffer)) { if (!(*cinfo->entropy->decode_mcu) (cinfo, coef->MCU_buffer)) {
/* Suspension forced; update state counters and exit */ /* Suspension forced; update state counters and exit */
@@ -326,19 +330,22 @@ decompress_data(j_decompress_ptr cinfo, JSAMPIMAGE output_buf)
#ifdef BLOCK_SMOOTHING_SUPPORTED #ifdef BLOCK_SMOOTHING_SUPPORTED
/* /*
* This code applies interblock smoothing as described by section K.8 * This code applies interblock smoothing; the first 9 AC coefficients are
* of the JPEG standard: the first 5 AC coefficients are estimated from * estimated from the DC values of a DCT block and its 24 neighboring blocks.
* the DC values of a DCT block and its 8 neighboring blocks.
* We apply smoothing only for progressive JPEG decoding, and only if * We apply smoothing only for progressive JPEG decoding, and only if
* the coefficients it can estimate are not yet known to full precision. * the coefficients it can estimate are not yet known to full precision.
*/ */
/* Natural-order array positions of the first 5 zigzag-order coefficients */ /* Natural-order array positions of the first 9 zigzag-order coefficients */
#define Q01_POS 1 #define Q01_POS 1
#define Q10_POS 8 #define Q10_POS 8
#define Q20_POS 16 #define Q20_POS 16
#define Q11_POS 9 #define Q11_POS 9
#define Q02_POS 2 #define Q02_POS 2
#define Q03_POS 3
#define Q12_POS 10
#define Q21_POS 17
#define Q30_POS 24
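Every AC estimate in the hunks below follows one pattern: num is Q00 times a weighted sum of cached DC values, and the prediction is num divided by 256 times that coefficient's quantizer, rounded to nearest, clamped to the (1 << Al) - 1 range of bits not yet decoded, and given the sign of num. A sketch of that shared step (hypothetical helper; the real code inlines it separately for each coefficient):

static JCOEF estimate_coef(JLONG num, JLONG q, int Al)
{
  int pred;

  if (num >= 0) {
    pred = (int)(((q << 7) + num) / (q << 8));  /* round num / (256 * q) */
    if (Al > 0 && pred >= (1 << Al))
      pred = (1 << Al) - 1;                     /* clamp to undecoded bits */
  } else {
    pred = (int)(((q << 7) - num) / (q << 8));
    if (Al > 0 && pred >= (1 << Al))
      pred = (1 << Al) - 1;
    pred = -pred;
  }
  return (JCOEF)pred;
}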
/* /*
* Determine whether block smoothing is applicable and safe. * Determine whether block smoothing is applicable and safe.
@@ -356,8 +363,8 @@ smoothing_ok(j_decompress_ptr cinfo)
int ci, coefi; int ci, coefi;
jpeg_component_info *compptr; jpeg_component_info *compptr;
JQUANT_TBL *qtable; JQUANT_TBL *qtable;
int *coef_bits; int *coef_bits, *prev_coef_bits;
int *coef_bits_latch; int *coef_bits_latch, *prev_coef_bits_latch;
if (!cinfo->progressive_mode || cinfo->coef_bits == NULL) if (!cinfo->progressive_mode || cinfo->coef_bits == NULL)
return FALSE; return FALSE;
@@ -366,34 +373,47 @@ smoothing_ok(j_decompress_ptr cinfo)
if (coef->coef_bits_latch == NULL) if (coef->coef_bits_latch == NULL)
coef->coef_bits_latch = (int *) coef->coef_bits_latch = (int *)
(*cinfo->mem->alloc_small) ((j_common_ptr)cinfo, JPOOL_IMAGE, (*cinfo->mem->alloc_small) ((j_common_ptr)cinfo, JPOOL_IMAGE,
cinfo->num_components * cinfo->num_components * 2 *
(SAVED_COEFS * sizeof(int))); (SAVED_COEFS * sizeof(int)));
coef_bits_latch = coef->coef_bits_latch; coef_bits_latch = coef->coef_bits_latch;
prev_coef_bits_latch =
&coef->coef_bits_latch[cinfo->num_components * SAVED_COEFS];
for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components; for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;
ci++, compptr++) { ci++, compptr++) {
/* All components' quantization values must already be latched. */ /* All components' quantization values must already be latched. */
if ((qtable = compptr->quant_table) == NULL) if ((qtable = compptr->quant_table) == NULL)
return FALSE; return FALSE;
/* Verify DC & first 5 AC quantizers are nonzero to avoid zero-divide. */ /* Verify DC & first 9 AC quantizers are nonzero to avoid zero-divide. */
if (qtable->quantval[0] == 0 || if (qtable->quantval[0] == 0 ||
qtable->quantval[Q01_POS] == 0 || qtable->quantval[Q01_POS] == 0 ||
qtable->quantval[Q10_POS] == 0 || qtable->quantval[Q10_POS] == 0 ||
qtable->quantval[Q20_POS] == 0 || qtable->quantval[Q20_POS] == 0 ||
qtable->quantval[Q11_POS] == 0 || qtable->quantval[Q11_POS] == 0 ||
qtable->quantval[Q02_POS] == 0) qtable->quantval[Q02_POS] == 0 ||
qtable->quantval[Q03_POS] == 0 ||
qtable->quantval[Q12_POS] == 0 ||
qtable->quantval[Q21_POS] == 0 ||
qtable->quantval[Q30_POS] == 0)
return FALSE; return FALSE;
/* DC values must be at least partly known for all components. */ /* DC values must be at least partly known for all components. */
coef_bits = cinfo->coef_bits[ci]; coef_bits = cinfo->coef_bits[ci];
prev_coef_bits = cinfo->coef_bits[ci + cinfo->num_components];
if (coef_bits[0] < 0) if (coef_bits[0] < 0)
return FALSE; return FALSE;
coef_bits_latch[0] = coef_bits[0];
/* Block smoothing is helpful if some AC coefficients remain inaccurate. */ /* Block smoothing is helpful if some AC coefficients remain inaccurate. */
for (coefi = 1; coefi <= 5; coefi++) { for (coefi = 1; coefi < SAVED_COEFS; coefi++) {
if (cinfo->input_scan_number > 1)
prev_coef_bits_latch[coefi] = prev_coef_bits[coefi];
else
prev_coef_bits_latch[coefi] = -1;
coef_bits_latch[coefi] = coef_bits[coefi]; coef_bits_latch[coefi] = coef_bits[coefi];
if (coef_bits[coefi] != 0) if (coef_bits[coefi] != 0)
smoothing_useful = TRUE; smoothing_useful = TRUE;
} }
coef_bits_latch += SAVED_COEFS; coef_bits_latch += SAVED_COEFS;
prev_coef_bits_latch += SAVED_COEFS;
} }
return smoothing_useful; return smoothing_useful;
@@ -412,17 +432,20 @@ decompress_smooth_data(j_decompress_ptr cinfo, JSAMPIMAGE output_buf)
JDIMENSION block_num, last_block_column; JDIMENSION block_num, last_block_column;
int ci, block_row, block_rows, access_rows; int ci, block_row, block_rows, access_rows;
JBLOCKARRAY buffer; JBLOCKARRAY buffer;
JBLOCKROW buffer_ptr, prev_block_row, next_block_row; JBLOCKROW buffer_ptr, prev_prev_block_row, prev_block_row;
JBLOCKROW next_block_row, next_next_block_row;
JSAMPARRAY output_ptr; JSAMPARRAY output_ptr;
JDIMENSION output_col; JDIMENSION output_col;
jpeg_component_info *compptr; jpeg_component_info *compptr;
inverse_DCT_method_ptr inverse_DCT; inverse_DCT_method_ptr inverse_DCT;
boolean first_row, last_row; boolean change_dc;
JCOEF *workspace; JCOEF *workspace;
int *coef_bits; int *coef_bits;
JQUANT_TBL *quanttbl; JQUANT_TBL *quanttbl;
JLONG Q00, Q01, Q02, Q10, Q11, Q20, num; JLONG Q00, Q01, Q02, Q03 = 0, Q10, Q11, Q12 = 0, Q20, Q21 = 0, Q30 = 0, num;
int DC1, DC2, DC3, DC4, DC5, DC6, DC7, DC8, DC9; int DC01, DC02, DC03, DC04, DC05, DC06, DC07, DC08, DC09, DC10, DC11, DC12,
DC13, DC14, DC15, DC16, DC17, DC18, DC19, DC20, DC21, DC22, DC23, DC24,
DC25;
int Al, pred; int Al, pred;
/* Keep a local variable to avoid looking it up more than once */ /* Keep a local variable to avoid looking it up more than once */
@@ -434,10 +457,10 @@ decompress_smooth_data(j_decompress_ptr cinfo, JSAMPIMAGE output_buf)
if (cinfo->input_scan_number == cinfo->output_scan_number) { if (cinfo->input_scan_number == cinfo->output_scan_number) {
/* If input is working on current scan, we ordinarily want it to /* If input is working on current scan, we ordinarily want it to
* have completed the current row. But if input scan is DC, * have completed the current row. But if input scan is DC,
* we want it to keep one row ahead so that next block row's DC * we want it to keep two rows ahead so that next two block rows' DC
* values are up to date. * values are up to date.
*/ */
JDIMENSION delta = (cinfo->Ss == 0) ? 1 : 0; JDIMENSION delta = (cinfo->Ss == 0) ? 2 : 0;
if (cinfo->input_iMCU_row > cinfo->output_iMCU_row + delta) if (cinfo->input_iMCU_row > cinfo->output_iMCU_row + delta)
break; break;
} }
@@ -452,34 +475,53 @@ decompress_smooth_data(j_decompress_ptr cinfo, JSAMPIMAGE output_buf)
if (!compptr->component_needed) if (!compptr->component_needed)
continue; continue;
/* Count non-dummy DCT block rows in this iMCU row. */ /* Count non-dummy DCT block rows in this iMCU row. */
if (cinfo->output_iMCU_row < last_iMCU_row) { if (cinfo->output_iMCU_row < last_iMCU_row - 1) {
block_rows = compptr->v_samp_factor;
access_rows = block_rows * 3; /* this and next two iMCU rows */
} else if (cinfo->output_iMCU_row < last_iMCU_row) {
block_rows = compptr->v_samp_factor; block_rows = compptr->v_samp_factor;
access_rows = block_rows * 2; /* this and next iMCU row */ access_rows = block_rows * 2; /* this and next iMCU row */
last_row = FALSE;
} else { } else {
/* NB: can't use last_row_height here; it is input-side-dependent! */ /* NB: can't use last_row_height here; it is input-side-dependent! */
block_rows = (int)(compptr->height_in_blocks % compptr->v_samp_factor); block_rows = (int)(compptr->height_in_blocks % compptr->v_samp_factor);
if (block_rows == 0) block_rows = compptr->v_samp_factor; if (block_rows == 0) block_rows = compptr->v_samp_factor;
access_rows = block_rows; /* this iMCU row only */ access_rows = block_rows; /* this iMCU row only */
last_row = TRUE;
} }
/* Align the virtual buffer for this component. */ /* Align the virtual buffer for this component. */
if (cinfo->output_iMCU_row > 0) { if (cinfo->output_iMCU_row > 1) {
access_rows += compptr->v_samp_factor; /* prior iMCU row too */ access_rows += 2 * compptr->v_samp_factor; /* prior two iMCU rows too */
buffer = (*cinfo->mem->access_virt_barray)
((j_common_ptr)cinfo, coef->whole_image[ci],
(cinfo->output_iMCU_row - 2) * compptr->v_samp_factor,
(JDIMENSION)access_rows, FALSE);
buffer += 2 * compptr->v_samp_factor; /* point to current iMCU row */
} else if (cinfo->output_iMCU_row > 0) {
buffer = (*cinfo->mem->access_virt_barray) buffer = (*cinfo->mem->access_virt_barray)
((j_common_ptr)cinfo, coef->whole_image[ci], ((j_common_ptr)cinfo, coef->whole_image[ci],
(cinfo->output_iMCU_row - 1) * compptr->v_samp_factor, (cinfo->output_iMCU_row - 1) * compptr->v_samp_factor,
(JDIMENSION)access_rows, FALSE); (JDIMENSION)access_rows, FALSE);
buffer += compptr->v_samp_factor; /* point to current iMCU row */ buffer += compptr->v_samp_factor; /* point to current iMCU row */
first_row = FALSE;
} else { } else {
buffer = (*cinfo->mem->access_virt_barray) buffer = (*cinfo->mem->access_virt_barray)
((j_common_ptr)cinfo, coef->whole_image[ci], ((j_common_ptr)cinfo, coef->whole_image[ci],
(JDIMENSION)0, (JDIMENSION)access_rows, FALSE); (JDIMENSION)0, (JDIMENSION)access_rows, FALSE);
first_row = TRUE;
} }
/* Fetch component-dependent info */ /* Fetch component-dependent info.
coef_bits = coef->coef_bits_latch + (ci * SAVED_COEFS); * If the current scan is incomplete, then we use the component-dependent
* info from the previous scan.
*/
if (cinfo->output_iMCU_row > cinfo->master->last_good_iMCU_row)
coef_bits =
coef->coef_bits_latch + ((ci + cinfo->num_components) * SAVED_COEFS);
else
coef_bits = coef->coef_bits_latch + (ci * SAVED_COEFS);
/* We only do DC interpolation if no AC coefficient data is available. */
change_dc =
coef_bits[1] == -1 && coef_bits[2] == -1 && coef_bits[3] == -1 &&
coef_bits[4] == -1 && coef_bits[5] == -1 && coef_bits[6] == -1 &&
coef_bits[7] == -1 && coef_bits[8] == -1 && coef_bits[9] == -1;
quanttbl = compptr->quant_table; quanttbl = compptr->quant_table;
Q00 = quanttbl->quantval[0]; Q00 = quanttbl->quantval[0];
Q01 = quanttbl->quantval[Q01_POS]; Q01 = quanttbl->quantval[Q01_POS];
@@ -487,27 +529,51 @@ decompress_smooth_data(j_decompress_ptr cinfo, JSAMPIMAGE output_buf)
Q20 = quanttbl->quantval[Q20_POS]; Q20 = quanttbl->quantval[Q20_POS];
Q11 = quanttbl->quantval[Q11_POS]; Q11 = quanttbl->quantval[Q11_POS];
Q02 = quanttbl->quantval[Q02_POS]; Q02 = quanttbl->quantval[Q02_POS];
if (change_dc) {
Q03 = quanttbl->quantval[Q03_POS];
Q12 = quanttbl->quantval[Q12_POS];
Q21 = quanttbl->quantval[Q21_POS];
Q30 = quanttbl->quantval[Q30_POS];
}
inverse_DCT = cinfo->idct->inverse_DCT[ci]; inverse_DCT = cinfo->idct->inverse_DCT[ci];
output_ptr = output_buf[ci]; output_ptr = output_buf[ci];
/* Loop over all DCT blocks to be processed. */ /* Loop over all DCT blocks to be processed. */
for (block_row = 0; block_row < block_rows; block_row++) { for (block_row = 0; block_row < block_rows; block_row++) {
buffer_ptr = buffer[block_row] + cinfo->master->first_MCU_col[ci]; buffer_ptr = buffer[block_row] + cinfo->master->first_MCU_col[ci];
if (first_row && block_row == 0)
if (block_row > 0 || cinfo->output_iMCU_row > 0)
prev_block_row =
buffer[block_row - 1] + cinfo->master->first_MCU_col[ci];
else
prev_block_row = buffer_ptr; prev_block_row = buffer_ptr;
if (block_row > 1 || cinfo->output_iMCU_row > 1)
prev_prev_block_row =
buffer[block_row - 2] + cinfo->master->first_MCU_col[ci];
else
prev_prev_block_row = prev_block_row;
if (block_row < block_rows - 1 || cinfo->output_iMCU_row < last_iMCU_row)
next_block_row =
buffer[block_row + 1] + cinfo->master->first_MCU_col[ci];
else else
prev_block_row = buffer[block_row - 1] +
cinfo->master->first_MCU_col[ci];
if (last_row && block_row == block_rows - 1)
next_block_row = buffer_ptr; next_block_row = buffer_ptr;
if (block_row < block_rows - 2 ||
cinfo->output_iMCU_row < last_iMCU_row - 1)
next_next_block_row =
buffer[block_row + 2] + cinfo->master->first_MCU_col[ci];
else else
next_block_row = buffer[block_row + 1] + next_next_block_row = next_block_row;
cinfo->master->first_MCU_col[ci];
/* We fetch the surrounding DC values using a sliding-register approach. /* We fetch the surrounding DC values using a sliding-register approach.
* Initialize all nine here so as to do the right thing on narrow pics. * Initialize all 25 here so as to do the right thing on narrow pics.
*/ */
DC1 = DC2 = DC3 = (int)prev_block_row[0][0]; DC01 = DC02 = DC03 = DC04 = DC05 = (int)prev_prev_block_row[0][0];
DC4 = DC5 = DC6 = (int)buffer_ptr[0][0]; DC06 = DC07 = DC08 = DC09 = DC10 = (int)prev_block_row[0][0];
DC7 = DC8 = DC9 = (int)next_block_row[0][0]; DC11 = DC12 = DC13 = DC14 = DC15 = (int)buffer_ptr[0][0];
DC16 = DC17 = DC18 = DC19 = DC20 = (int)next_block_row[0][0];
DC21 = DC22 = DC23 = DC24 = DC25 = (int)next_next_block_row[0][0];
output_col = 0; output_col = 0;
last_block_column = compptr->width_in_blocks - 1; last_block_column = compptr->width_in_blocks - 1;
for (block_num = cinfo->master->first_MCU_col[ci]; for (block_num = cinfo->master->first_MCU_col[ci];
@@ -515,18 +581,39 @@ decompress_smooth_data(j_decompress_ptr cinfo, JSAMPIMAGE output_buf)
/* Fetch current DCT block into workspace so we can modify it. */ /* Fetch current DCT block into workspace so we can modify it. */
jcopy_block_row(buffer_ptr, (JBLOCKROW)workspace, (JDIMENSION)1); jcopy_block_row(buffer_ptr, (JBLOCKROW)workspace, (JDIMENSION)1);
/* Update DC values */ /* Update DC values */
if (block_num < last_block_column) { if (block_num == cinfo->master->first_MCU_col[ci] &&
DC3 = (int)prev_block_row[1][0]; block_num < last_block_column) {
DC6 = (int)buffer_ptr[1][0]; DC04 = (int)prev_prev_block_row[1][0];
DC9 = (int)next_block_row[1][0]; DC09 = (int)prev_block_row[1][0];
DC14 = (int)buffer_ptr[1][0];
DC19 = (int)next_block_row[1][0];
DC24 = (int)next_next_block_row[1][0];
} }
/* Compute coefficient estimates per K.8. if (block_num + 1 < last_block_column) {
* An estimate is applied only if coefficient is still zero, DC05 = (int)prev_prev_block_row[2][0];
* and is not known to be fully accurate. DC10 = (int)prev_block_row[2][0];
DC15 = (int)buffer_ptr[2][0];
DC20 = (int)next_block_row[2][0];
DC25 = (int)next_next_block_row[2][0];
}
/* If DC interpolation is enabled, compute coefficient estimates using
* a Gaussian-like kernel, keeping the averages of the DC values.
*
* If DC interpolation is disabled, compute coefficient estimates using
* an algorithm similar to the one described in Section K.8 of the JPEG
* standard, except applied to a 5x5 window rather than a 3x3 window.
*
* An estimate is applied only if the coefficient is still zero and is
* not known to be fully accurate.
*/ */
/* AC01 */ /* AC01 */
if ((Al = coef_bits[1]) != 0 && workspace[1] == 0) { if ((Al = coef_bits[1]) != 0 && workspace[1] == 0) {
num = 36 * Q00 * (DC4 - DC6); num = Q00 * (change_dc ?
(-DC01 - DC02 + DC04 + DC05 - 3 * DC06 + 13 * DC07 -
13 * DC09 + 3 * DC10 - 3 * DC11 + 38 * DC12 - 38 * DC14 +
3 * DC15 - 3 * DC16 + 13 * DC17 - 13 * DC19 + 3 * DC20 -
DC21 - DC22 + DC24 + DC25) :
(-7 * DC11 + 50 * DC12 - 50 * DC14 + 7 * DC15));
if (num >= 0) { if (num >= 0) {
pred = (int)(((Q01 << 7) + num) / (Q01 << 8)); pred = (int)(((Q01 << 7) + num) / (Q01 << 8));
if (Al > 0 && pred >= (1 << Al)) if (Al > 0 && pred >= (1 << Al))
@@ -541,7 +628,12 @@ decompress_smooth_data(j_decompress_ptr cinfo, JSAMPIMAGE output_buf)
} }
/* AC10 */ /* AC10 */
if ((Al = coef_bits[2]) != 0 && workspace[8] == 0) { if ((Al = coef_bits[2]) != 0 && workspace[8] == 0) {
num = 36 * Q00 * (DC2 - DC8); num = Q00 * (change_dc ?
(-DC01 - 3 * DC02 - 3 * DC03 - 3 * DC04 - DC05 - DC06 +
13 * DC07 + 38 * DC08 + 13 * DC09 - DC10 + DC16 -
13 * DC17 - 38 * DC18 - 13 * DC19 + DC20 + DC21 +
3 * DC22 + 3 * DC23 + 3 * DC24 + DC25) :
(-7 * DC03 + 50 * DC08 - 50 * DC18 + 7 * DC23));
if (num >= 0) { if (num >= 0) {
pred = (int)(((Q10 << 7) + num) / (Q10 << 8)); pred = (int)(((Q10 << 7) + num) / (Q10 << 8));
if (Al > 0 && pred >= (1 << Al)) if (Al > 0 && pred >= (1 << Al))
@@ -556,7 +648,10 @@ decompress_smooth_data(j_decompress_ptr cinfo, JSAMPIMAGE output_buf)
} }
/* AC20 */ /* AC20 */
if ((Al = coef_bits[3]) != 0 && workspace[16] == 0) { if ((Al = coef_bits[3]) != 0 && workspace[16] == 0) {
num = 9 * Q00 * (DC2 + DC8 - 2 * DC5); num = Q00 * (change_dc ?
(DC03 + 2 * DC07 + 7 * DC08 + 2 * DC09 - 5 * DC12 - 14 * DC13 -
5 * DC14 + 2 * DC17 + 7 * DC18 + 2 * DC19 + DC23) :
(-DC03 + 13 * DC08 - 24 * DC13 + 13 * DC18 - DC23));
if (num >= 0) { if (num >= 0) {
pred = (int)(((Q20 << 7) + num) / (Q20 << 8)); pred = (int)(((Q20 << 7) + num) / (Q20 << 8));
if (Al > 0 && pred >= (1 << Al)) if (Al > 0 && pred >= (1 << Al))
@@ -571,7 +666,11 @@ decompress_smooth_data(j_decompress_ptr cinfo, JSAMPIMAGE output_buf)
} }
/* AC11 */ /* AC11 */
if ((Al = coef_bits[4]) != 0 && workspace[9] == 0) { if ((Al = coef_bits[4]) != 0 && workspace[9] == 0) {
num = 5 * Q00 * (DC1 - DC3 - DC7 + DC9); num = Q00 * (change_dc ?
(-DC01 + DC05 + 9 * DC07 - 9 * DC09 - 9 * DC17 +
9 * DC19 + DC21 - DC25) :
(DC10 + DC16 - 10 * DC17 + 10 * DC19 - DC02 - DC20 + DC22 -
DC24 + DC04 - DC06 + 10 * DC07 - 10 * DC09));
if (num >= 0) { if (num >= 0) {
pred = (int)(((Q11 << 7) + num) / (Q11 << 8)); pred = (int)(((Q11 << 7) + num) / (Q11 << 8));
if (Al > 0 && pred >= (1 << Al)) if (Al > 0 && pred >= (1 << Al))
@@ -586,7 +685,10 @@ decompress_smooth_data(j_decompress_ptr cinfo, JSAMPIMAGE output_buf)
} }
/* AC02 */ /* AC02 */
if ((Al = coef_bits[5]) != 0 && workspace[2] == 0) { if ((Al = coef_bits[5]) != 0 && workspace[2] == 0) {
num = 9 * Q00 * (DC4 + DC6 - 2 * DC5); num = Q00 * (change_dc ?
(2 * DC07 - 5 * DC08 + 2 * DC09 + DC11 + 7 * DC12 - 14 * DC13 +
7 * DC14 + DC15 + 2 * DC17 - 5 * DC18 + 2 * DC19) :
(-DC11 + 13 * DC12 - 24 * DC13 + 13 * DC14 - DC15));
if (num >= 0) { if (num >= 0) {
pred = (int)(((Q02 << 7) + num) / (Q02 << 8)); pred = (int)(((Q02 << 7) + num) / (Q02 << 8));
if (Al > 0 && pred >= (1 << Al)) if (Al > 0 && pred >= (1 << Al))
@@ -599,14 +701,96 @@ decompress_smooth_data(j_decompress_ptr cinfo, JSAMPIMAGE output_buf)
} }
workspace[2] = (JCOEF)pred; workspace[2] = (JCOEF)pred;
} }
if (change_dc) {
/* AC03 */
if ((Al = coef_bits[6]) != 0 && workspace[3] == 0) {
num = Q00 * (DC07 - DC09 + 2 * DC12 - 2 * DC14 + DC17 - DC19);
if (num >= 0) {
pred = (int)(((Q03 << 7) + num) / (Q03 << 8));
if (Al > 0 && pred >= (1 << Al))
pred = (1 << Al) - 1;
} else {
pred = (int)(((Q03 << 7) - num) / (Q03 << 8));
if (Al > 0 && pred >= (1 << Al))
pred = (1 << Al) - 1;
pred = -pred;
}
workspace[3] = (JCOEF)pred;
}
/* AC12 */
if ((Al = coef_bits[7]) != 0 && workspace[10] == 0) {
num = Q00 * (DC07 - 3 * DC08 + DC09 - DC17 + 3 * DC18 - DC19);
if (num >= 0) {
pred = (int)(((Q12 << 7) + num) / (Q12 << 8));
if (Al > 0 && pred >= (1 << Al))
pred = (1 << Al) - 1;
} else {
pred = (int)(((Q12 << 7) - num) / (Q12 << 8));
if (Al > 0 && pred >= (1 << Al))
pred = (1 << Al) - 1;
pred = -pred;
}
workspace[10] = (JCOEF)pred;
}
/* AC21 */
if ((Al = coef_bits[8]) != 0 && workspace[17] == 0) {
num = Q00 * (DC07 - DC09 - 3 * DC12 + 3 * DC14 + DC17 - DC19);
if (num >= 0) {
pred = (int)(((Q21 << 7) + num) / (Q21 << 8));
if (Al > 0 && pred >= (1 << Al))
pred = (1 << Al) - 1;
} else {
pred = (int)(((Q21 << 7) - num) / (Q21 << 8));
if (Al > 0 && pred >= (1 << Al))
pred = (1 << Al) - 1;
pred = -pred;
}
workspace[17] = (JCOEF)pred;
}
/* AC30 */
if ((Al = coef_bits[9]) != 0 && workspace[24] == 0) {
num = Q00 * (DC07 + 2 * DC08 + DC09 - DC17 - 2 * DC18 - DC19);
if (num >= 0) {
pred = (int)(((Q30 << 7) + num) / (Q30 << 8));
if (Al > 0 && pred >= (1 << Al))
pred = (1 << Al) - 1;
} else {
pred = (int)(((Q30 << 7) - num) / (Q30 << 8));
if (Al > 0 && pred >= (1 << Al))
pred = (1 << Al) - 1;
pred = -pred;
}
workspace[24] = (JCOEF)pred;
}
/* coef_bits[0] is non-negative. Otherwise this function would not
* be called.
*/
num = Q00 *
(-2 * DC01 - 6 * DC02 - 8 * DC03 - 6 * DC04 - 2 * DC05 -
6 * DC06 + 6 * DC07 + 42 * DC08 + 6 * DC09 - 6 * DC10 -
8 * DC11 + 42 * DC12 + 152 * DC13 + 42 * DC14 - 8 * DC15 -
6 * DC16 + 6 * DC17 + 42 * DC18 + 6 * DC19 - 6 * DC20 -
2 * DC21 - 6 * DC22 - 8 * DC23 - 6 * DC24 - 2 * DC25);
if (num >= 0) {
pred = (int)(((Q00 << 7) + num) / (Q00 << 8));
} else {
pred = (int)(((Q00 << 7) - num) / (Q00 << 8));
pred = -pred;
}
workspace[0] = (JCOEF)pred;
} /* change_dc */
/* OK, do the IDCT */ /* OK, do the IDCT */
(*inverse_DCT) (cinfo, compptr, (JCOEFPTR)workspace, output_ptr, (*inverse_DCT) (cinfo, compptr, (JCOEFPTR)workspace, output_ptr,
output_col); output_col);
/* Advance for next column */ /* Advance for next column */
DC1 = DC2; DC2 = DC3; DC01 = DC02; DC02 = DC03; DC03 = DC04; DC04 = DC05;
DC4 = DC5; DC5 = DC6; DC06 = DC07; DC07 = DC08; DC08 = DC09; DC09 = DC10;
DC7 = DC8; DC8 = DC9; DC11 = DC12; DC12 = DC13; DC13 = DC14; DC14 = DC15;
buffer_ptr++, prev_block_row++, next_block_row++; DC16 = DC17; DC17 = DC18; DC18 = DC19; DC19 = DC20;
DC21 = DC22; DC22 = DC23; DC23 = DC24; DC24 = DC25;
buffer_ptr++, prev_block_row++, next_block_row++,
prev_prev_block_row++, next_next_block_row++;
output_col += compptr->_DCT_scaled_size; output_col += compptr->_DCT_scaled_size;
} }
output_ptr += compptr->_DCT_scaled_size; output_ptr += compptr->_DCT_scaled_size;
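The per-block loop above keeps the DC values of the surrounding 5x5 block neighborhood in the registers DC01..DC25 and, after each block, shifts every row of that window left by one column before refilling the rightmost entries (duplicating edge values for narrow images). An array-form sketch of that sliding update (illustrative, not the unrolled code above):

static void slide_dc_window(int dc[5][5], const int next_col[5])
{
  int row, col;

  for (row = 0; row < 5; row++) {
    for (col = 0; col < 4; col++)
      dc[row][col] = dc[row][col + 1];  /* shift the row left by one */
    dc[row][4] = next_col[row];         /* pull in the next block column */
  }
}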
@@ -655,7 +839,7 @@ jinit_d_coef_controller(j_decompress_ptr cinfo, boolean need_full_buffer)
#ifdef BLOCK_SMOOTHING_SUPPORTED #ifdef BLOCK_SMOOTHING_SUPPORTED
/* If block smoothing could be used, need a bigger window */ /* If block smoothing could be used, need a bigger window */
if (cinfo->progressive_mode) if (cinfo->progressive_mode)
access_rows *= 3; access_rows *= 5;
#endif #endif
coef->whole_image[ci] = (*cinfo->mem->request_virt_barray) coef->whole_image[ci] = (*cinfo->mem->request_virt_barray)
((j_common_ptr)cinfo, JPOOL_IMAGE, TRUE, ((j_common_ptr)cinfo, JPOOL_IMAGE, TRUE,


@@ -5,6 +5,7 @@
* Copyright (C) 1994-1997, Thomas G. Lane. * Copyright (C) 1994-1997, Thomas G. Lane.
* libjpeg-turbo Modifications: * libjpeg-turbo Modifications:
* Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
* Copyright (C) 2020, Google, Inc.
* For conditions of distribution and use, see the accompanying README.ijg * For conditions of distribution and use, see the accompanying README.ijg
* file. * file.
*/ */
@@ -51,7 +52,7 @@ typedef struct {
#ifdef BLOCK_SMOOTHING_SUPPORTED #ifdef BLOCK_SMOOTHING_SUPPORTED
/* When doing block smoothing, we latch coefficient Al values here */ /* When doing block smoothing, we latch coefficient Al values here */
int *coef_bits_latch; int *coef_bits_latch;
#define SAVED_COEFS 6 /* we save coef_bits[0..5] */ #define SAVED_COEFS 10 /* we save coef_bits[0..9] */
#endif #endif
} my_coef_controller; } my_coef_controller;


@@ -45,9 +45,9 @@ ycc_rgb565_convert_internal(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
outptr = *output_buf++; outptr = *output_buf++;
if (PACK_NEED_ALIGNMENT(outptr)) { if (PACK_NEED_ALIGNMENT(outptr)) {
y = GETJSAMPLE(*inptr0++); y = *inptr0++;
cb = GETJSAMPLE(*inptr1++); cb = *inptr1++;
cr = GETJSAMPLE(*inptr2++); cr = *inptr2++;
r = range_limit[y + Crrtab[cr]]; r = range_limit[y + Crrtab[cr]];
g = range_limit[y + ((int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], g = range_limit[y + ((int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr],
SCALEBITS))]; SCALEBITS))];
@@ -58,18 +58,18 @@ ycc_rgb565_convert_internal(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
num_cols--; num_cols--;
} }
for (col = 0; col < (num_cols >> 1); col++) { for (col = 0; col < (num_cols >> 1); col++) {
y = GETJSAMPLE(*inptr0++); y = *inptr0++;
cb = GETJSAMPLE(*inptr1++); cb = *inptr1++;
cr = GETJSAMPLE(*inptr2++); cr = *inptr2++;
r = range_limit[y + Crrtab[cr]]; r = range_limit[y + Crrtab[cr]];
g = range_limit[y + ((int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], g = range_limit[y + ((int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr],
SCALEBITS))]; SCALEBITS))];
b = range_limit[y + Cbbtab[cb]]; b = range_limit[y + Cbbtab[cb]];
rgb = PACK_SHORT_565(r, g, b); rgb = PACK_SHORT_565(r, g, b);
y = GETJSAMPLE(*inptr0++); y = *inptr0++;
cb = GETJSAMPLE(*inptr1++); cb = *inptr1++;
cr = GETJSAMPLE(*inptr2++); cr = *inptr2++;
r = range_limit[y + Crrtab[cr]]; r = range_limit[y + Crrtab[cr]];
g = range_limit[y + ((int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], g = range_limit[y + ((int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr],
SCALEBITS))]; SCALEBITS))];
@@ -80,9 +80,9 @@ ycc_rgb565_convert_internal(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
outptr += 4; outptr += 4;
} }
if (num_cols & 1) { if (num_cols & 1) {
y = GETJSAMPLE(*inptr0); y = *inptr0;
cb = GETJSAMPLE(*inptr1); cb = *inptr1;
cr = GETJSAMPLE(*inptr2); cr = *inptr2;
r = range_limit[y + Crrtab[cr]]; r = range_limit[y + Crrtab[cr]];
g = range_limit[y + ((int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], g = range_limit[y + ((int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr],
SCALEBITS))]; SCALEBITS))];
@@ -125,9 +125,9 @@ ycc_rgb565D_convert_internal(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
input_row++; input_row++;
outptr = *output_buf++; outptr = *output_buf++;
if (PACK_NEED_ALIGNMENT(outptr)) { if (PACK_NEED_ALIGNMENT(outptr)) {
y = GETJSAMPLE(*inptr0++); y = *inptr0++;
cb = GETJSAMPLE(*inptr1++); cb = *inptr1++;
cr = GETJSAMPLE(*inptr2++); cr = *inptr2++;
r = range_limit[DITHER_565_R(y + Crrtab[cr], d0)]; r = range_limit[DITHER_565_R(y + Crrtab[cr], d0)];
g = range_limit[DITHER_565_G(y + g = range_limit[DITHER_565_G(y +
((int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], ((int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr],
@@ -139,9 +139,9 @@ ycc_rgb565D_convert_internal(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
num_cols--; num_cols--;
} }
for (col = 0; col < (num_cols >> 1); col++) { for (col = 0; col < (num_cols >> 1); col++) {
y = GETJSAMPLE(*inptr0++); y = *inptr0++;
cb = GETJSAMPLE(*inptr1++); cb = *inptr1++;
cr = GETJSAMPLE(*inptr2++); cr = *inptr2++;
r = range_limit[DITHER_565_R(y + Crrtab[cr], d0)]; r = range_limit[DITHER_565_R(y + Crrtab[cr], d0)];
g = range_limit[DITHER_565_G(y + g = range_limit[DITHER_565_G(y +
((int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], ((int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr],
@@ -150,9 +150,9 @@ ycc_rgb565D_convert_internal(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
d0 = DITHER_ROTATE(d0); d0 = DITHER_ROTATE(d0);
rgb = PACK_SHORT_565(r, g, b); rgb = PACK_SHORT_565(r, g, b);
y = GETJSAMPLE(*inptr0++); y = *inptr0++;
cb = GETJSAMPLE(*inptr1++); cb = *inptr1++;
cr = GETJSAMPLE(*inptr2++); cr = *inptr2++;
r = range_limit[DITHER_565_R(y + Crrtab[cr], d0)]; r = range_limit[DITHER_565_R(y + Crrtab[cr], d0)];
g = range_limit[DITHER_565_G(y + g = range_limit[DITHER_565_G(y +
((int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], ((int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr],
@@ -165,9 +165,9 @@ ycc_rgb565D_convert_internal(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
outptr += 4; outptr += 4;
} }
if (num_cols & 1) { if (num_cols & 1) {
y = GETJSAMPLE(*inptr0); y = *inptr0;
cb = GETJSAMPLE(*inptr1); cb = *inptr1;
cr = GETJSAMPLE(*inptr2); cr = *inptr2;
r = range_limit[DITHER_565_R(y + Crrtab[cr], d0)]; r = range_limit[DITHER_565_R(y + Crrtab[cr], d0)];
g = range_limit[DITHER_565_G(y + g = range_limit[DITHER_565_G(y +
((int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], ((int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr],
@@ -202,32 +202,32 @@ rgb_rgb565_convert_internal(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
input_row++; input_row++;
outptr = *output_buf++; outptr = *output_buf++;
if (PACK_NEED_ALIGNMENT(outptr)) { if (PACK_NEED_ALIGNMENT(outptr)) {
r = GETJSAMPLE(*inptr0++); r = *inptr0++;
g = GETJSAMPLE(*inptr1++); g = *inptr1++;
b = GETJSAMPLE(*inptr2++); b = *inptr2++;
rgb = PACK_SHORT_565(r, g, b); rgb = PACK_SHORT_565(r, g, b);
*(INT16 *)outptr = (INT16)rgb; *(INT16 *)outptr = (INT16)rgb;
outptr += 2; outptr += 2;
num_cols--; num_cols--;
} }
for (col = 0; col < (num_cols >> 1); col++) { for (col = 0; col < (num_cols >> 1); col++) {
r = GETJSAMPLE(*inptr0++); r = *inptr0++;
g = GETJSAMPLE(*inptr1++); g = *inptr1++;
b = GETJSAMPLE(*inptr2++); b = *inptr2++;
rgb = PACK_SHORT_565(r, g, b); rgb = PACK_SHORT_565(r, g, b);
r = GETJSAMPLE(*inptr0++); r = *inptr0++;
g = GETJSAMPLE(*inptr1++); g = *inptr1++;
b = GETJSAMPLE(*inptr2++); b = *inptr2++;
rgb = PACK_TWO_PIXELS(rgb, PACK_SHORT_565(r, g, b)); rgb = PACK_TWO_PIXELS(rgb, PACK_SHORT_565(r, g, b));
WRITE_TWO_ALIGNED_PIXELS(outptr, rgb); WRITE_TWO_ALIGNED_PIXELS(outptr, rgb);
outptr += 4; outptr += 4;
} }
if (num_cols & 1) { if (num_cols & 1) {
r = GETJSAMPLE(*inptr0); r = *inptr0;
g = GETJSAMPLE(*inptr1); g = *inptr1;
b = GETJSAMPLE(*inptr2); b = *inptr2;
rgb = PACK_SHORT_565(r, g, b); rgb = PACK_SHORT_565(r, g, b);
*(INT16 *)outptr = (INT16)rgb; *(INT16 *)outptr = (INT16)rgb;
} }
@@ -259,24 +259,24 @@ rgb_rgb565D_convert_internal(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
input_row++; input_row++;
outptr = *output_buf++; outptr = *output_buf++;
if (PACK_NEED_ALIGNMENT(outptr)) { if (PACK_NEED_ALIGNMENT(outptr)) {
r = range_limit[DITHER_565_R(GETJSAMPLE(*inptr0++), d0)]; r = range_limit[DITHER_565_R(*inptr0++, d0)];
g = range_limit[DITHER_565_G(GETJSAMPLE(*inptr1++), d0)]; g = range_limit[DITHER_565_G(*inptr1++, d0)];
b = range_limit[DITHER_565_B(GETJSAMPLE(*inptr2++), d0)]; b = range_limit[DITHER_565_B(*inptr2++, d0)];
rgb = PACK_SHORT_565(r, g, b); rgb = PACK_SHORT_565(r, g, b);
*(INT16 *)outptr = (INT16)rgb; *(INT16 *)outptr = (INT16)rgb;
outptr += 2; outptr += 2;
num_cols--; num_cols--;
} }
for (col = 0; col < (num_cols >> 1); col++) { for (col = 0; col < (num_cols >> 1); col++) {
r = range_limit[DITHER_565_R(GETJSAMPLE(*inptr0++), d0)]; r = range_limit[DITHER_565_R(*inptr0++, d0)];
g = range_limit[DITHER_565_G(GETJSAMPLE(*inptr1++), d0)]; g = range_limit[DITHER_565_G(*inptr1++, d0)];
b = range_limit[DITHER_565_B(GETJSAMPLE(*inptr2++), d0)]; b = range_limit[DITHER_565_B(*inptr2++, d0)];
d0 = DITHER_ROTATE(d0); d0 = DITHER_ROTATE(d0);
rgb = PACK_SHORT_565(r, g, b); rgb = PACK_SHORT_565(r, g, b);
r = range_limit[DITHER_565_R(GETJSAMPLE(*inptr0++), d0)]; r = range_limit[DITHER_565_R(*inptr0++, d0)];
g = range_limit[DITHER_565_G(GETJSAMPLE(*inptr1++), d0)]; g = range_limit[DITHER_565_G(*inptr1++, d0)];
b = range_limit[DITHER_565_B(GETJSAMPLE(*inptr2++), d0)]; b = range_limit[DITHER_565_B(*inptr2++, d0)];
d0 = DITHER_ROTATE(d0); d0 = DITHER_ROTATE(d0);
rgb = PACK_TWO_PIXELS(rgb, PACK_SHORT_565(r, g, b)); rgb = PACK_TWO_PIXELS(rgb, PACK_SHORT_565(r, g, b));
@@ -284,9 +284,9 @@ rgb_rgb565D_convert_internal(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
outptr += 4; outptr += 4;
} }
if (num_cols & 1) { if (num_cols & 1) {
r = range_limit[DITHER_565_R(GETJSAMPLE(*inptr0), d0)]; r = range_limit[DITHER_565_R(*inptr0, d0)];
g = range_limit[DITHER_565_G(GETJSAMPLE(*inptr1), d0)]; g = range_limit[DITHER_565_G(*inptr1, d0)];
b = range_limit[DITHER_565_B(GETJSAMPLE(*inptr2), d0)]; b = range_limit[DITHER_565_B(*inptr2, d0)];
rgb = PACK_SHORT_565(r, g, b); rgb = PACK_SHORT_565(r, g, b);
*(INT16 *)outptr = (INT16)rgb; *(INT16 *)outptr = (INT16)rgb;
} }


@@ -53,9 +53,9 @@ ycc_rgb_convert_internal(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
input_row++; input_row++;
outptr = *output_buf++; outptr = *output_buf++;
for (col = 0; col < num_cols; col++) { for (col = 0; col < num_cols; col++) {
y = GETJSAMPLE(inptr0[col]); y = inptr0[col];
cb = GETJSAMPLE(inptr1[col]); cb = inptr1[col];
cr = GETJSAMPLE(inptr2[col]); cr = inptr2[col];
/* Range-limiting is essential due to noise introduced by DCT losses. */ /* Range-limiting is essential due to noise introduced by DCT losses. */
outptr[RGB_RED] = range_limit[y + Crrtab[cr]]; outptr[RGB_RED] = range_limit[y + Crrtab[cr]];
outptr[RGB_GREEN] = range_limit[y + outptr[RGB_GREEN] = range_limit[y +
@@ -93,7 +93,6 @@ gray_rgb_convert_internal(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
inptr = input_buf[0][input_row++]; inptr = input_buf[0][input_row++];
outptr = *output_buf++; outptr = *output_buf++;
for (col = 0; col < num_cols; col++) { for (col = 0; col < num_cols; col++) {
/* We can dispense with GETJSAMPLE() here */
outptr[RGB_RED] = outptr[RGB_GREEN] = outptr[RGB_BLUE] = inptr[col]; outptr[RGB_RED] = outptr[RGB_GREEN] = outptr[RGB_BLUE] = inptr[col];
/* Set unused byte to 0xFF so it can be interpreted as an opaque */ /* Set unused byte to 0xFF so it can be interpreted as an opaque */
/* alpha channel value */ /* alpha channel value */
@@ -128,7 +127,6 @@ rgb_rgb_convert_internal(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
input_row++; input_row++;
outptr = *output_buf++; outptr = *output_buf++;
for (col = 0; col < num_cols; col++) { for (col = 0; col < num_cols; col++) {
/* We can dispense with GETJSAMPLE() here */
outptr[RGB_RED] = inptr0[col]; outptr[RGB_RED] = inptr0[col];
outptr[RGB_GREEN] = inptr1[col]; outptr[RGB_GREEN] = inptr1[col];
outptr[RGB_BLUE] = inptr2[col]; outptr[RGB_BLUE] = inptr2[col];
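The color-conversion hunks above again only remove GETJSAMPLE(); the table-driven YCbCr-to-RGB math is untouched. Below is a floating-point sketch of that math, assuming the standard JFIF coefficients — the library's actual Crrtab/Cbgtab/Crgtab/Cbbtab tables are SCALEBITS fixed-point and the clamp is a range_limit[] lookup, so the names and float arithmetic here are simplifications.

```c
#include <stdio.h>

/* Floating-point sketch of the table-driven YCbCr->RGB conversion performed
 * by ycc_rgb_convert_internal() above.  The chroma contributions are
 * precomputed per 0..255 chroma value, so each pixel costs a few lookups,
 * adds, and a clamp. */
static double crr[256], cbb[256], crg[256], cbg[256];

static void build_tables(void)
{
  for (int i = 0; i < 256; i++) {
    double x = i - 128.0;          /* chroma samples are centered on 128 */
    crr[i] = 1.40200 * x;          /* Cr contribution to R */
    cbb[i] = 1.77200 * x;          /* Cb contribution to B */
    crg[i] = -0.71414 * x;         /* Cr contribution to G */
    cbg[i] = -0.34414 * x;         /* Cb contribution to G */
  }
}

static int clamp(double v) { return v < 0 ? 0 : (v > 255 ? 255 : (int)(v + 0.5)); }

static void ycc_to_rgb(int y, int cb, int cr, int rgb[3])
{
  rgb[0] = clamp(y + crr[cr]);
  rgb[1] = clamp(y + cbg[cb] + crg[cr]);
  rgb[2] = clamp(y + cbb[cb]);
}

int main(void)
{
  int rgb[3];
  build_tables();
  ycc_to_rgb(128, 128, 200, rgb);  /* mid gray pushed toward red */
  printf("R=%d G=%d B=%d\n", rgb[0], rgb[1], rgb[2]);
  return 0;
}
```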


@@ -341,9 +341,9 @@ rgb_gray_convert(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
input_row++; input_row++;
outptr = *output_buf++; outptr = *output_buf++;
for (col = 0; col < num_cols; col++) { for (col = 0; col < num_cols; col++) {
r = GETJSAMPLE(inptr0[col]); r = inptr0[col];
g = GETJSAMPLE(inptr1[col]); g = inptr1[col];
b = GETJSAMPLE(inptr2[col]); b = inptr2[col];
/* Y */ /* Y */
outptr[col] = (JSAMPLE)((ctab[r + R_Y_OFF] + ctab[g + G_Y_OFF] + outptr[col] = (JSAMPLE)((ctab[r + R_Y_OFF] + ctab[g + G_Y_OFF] +
ctab[b + B_Y_OFF]) >> SCALEBITS); ctab[b + B_Y_OFF]) >> SCALEBITS);
@@ -550,9 +550,9 @@ ycck_cmyk_convert(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
input_row++; input_row++;
outptr = *output_buf++; outptr = *output_buf++;
for (col = 0; col < num_cols; col++) { for (col = 0; col < num_cols; col++) {
y = GETJSAMPLE(inptr0[col]); y = inptr0[col];
cb = GETJSAMPLE(inptr1[col]); cb = inptr1[col];
cr = GETJSAMPLE(inptr2[col]); cr = inptr2[col];
/* Range-limiting is essential due to noise introduced by DCT losses. */ /* Range-limiting is essential due to noise introduced by DCT losses. */
outptr[0] = range_limit[MAXJSAMPLE - (y + Crrtab[cr])]; /* red */ outptr[0] = range_limit[MAXJSAMPLE - (y + Crrtab[cr])]; /* red */
outptr[1] = range_limit[MAXJSAMPLE - (y + /* green */ outptr[1] = range_limit[MAXJSAMPLE - (y + /* green */
@@ -560,7 +560,7 @@ ycck_cmyk_convert(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
SCALEBITS)))]; SCALEBITS)))];
outptr[2] = range_limit[MAXJSAMPLE - (y + Cbbtab[cb])]; /* blue */ outptr[2] = range_limit[MAXJSAMPLE - (y + Cbbtab[cb])]; /* blue */
/* K passes through unchanged */ /* K passes through unchanged */
outptr[3] = inptr3[col]; /* don't need GETJSAMPLE here */ outptr[3] = inptr3[col];
outptr += 4; outptr += 4;
} }
} }
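rgb_gray_convert() above computes luma as three table lookups plus a shift. Here is a sketch of the underlying fixed-point arithmetic, assuming the JFIF weights; the single ctab[] layout indexed via R_Y_OFF/G_Y_OFF/B_Y_OFF in the real code is collapsed into three multiplies.

```c
#include <stdio.h>

/* Sketch of the fixed-point luma computation used by rgb_gray_convert():
 * Y = 0.29900 R + 0.58700 G + 0.11400 B, with the weights prescaled by
 * 2^SCALEBITS so the per-pixel work is table lookups (here, multiplies)
 * and one shift. */
#define SCALEBITS 16
#define FIX(x)    ((long)((x) * (1L << SCALEBITS) + 0.5))

static int rgb_to_gray(int r, int g, int b)
{
  long y = FIX(0.29900) * r + FIX(0.58700) * g + FIX(0.11400) * b;
  return (int)((y + (1L << (SCALEBITS - 1))) >> SCALEBITS);   /* round */
}

int main(void)
{
  printf("%d\n", rgb_to_gray(255, 255, 255));   /* 255: the weights sum to 1 */
  printf("%d\n", rgb_to_gray(255, 0, 0));       /* 76 */
  return 0;
}
```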


@@ -5,6 +5,7 @@
* Copyright (C) 1991-1997, Thomas G. Lane. * Copyright (C) 1991-1997, Thomas G. Lane.
* libjpeg-turbo Modifications: * libjpeg-turbo Modifications:
* Copyright (C) 2009-2011, 2016, 2018-2019, D. R. Commander. * Copyright (C) 2009-2011, 2016, 2018-2019, D. R. Commander.
* Copyright (C) 2018, Matthias Räncker.
* For conditions of distribution and use, see the accompanying README.ijg * For conditions of distribution and use, see the accompanying README.ijg
* file. * file.
* *
@@ -39,24 +40,6 @@ typedef struct {
int last_dc_val[MAX_COMPS_IN_SCAN]; /* last DC coef for each component */ int last_dc_val[MAX_COMPS_IN_SCAN]; /* last DC coef for each component */
} savable_state; } savable_state;
/* This macro is to work around compilers with missing or broken
* structure assignment. You'll need to fix this code if you have
* such a compiler and you change MAX_COMPS_IN_SCAN.
*/
#ifndef NO_STRUCT_ASSIGN
#define ASSIGN_STATE(dest, src) ((dest) = (src))
#else
#if MAX_COMPS_IN_SCAN == 4
#define ASSIGN_STATE(dest, src) \
((dest).last_dc_val[0] = (src).last_dc_val[0], \
(dest).last_dc_val[1] = (src).last_dc_val[1], \
(dest).last_dc_val[2] = (src).last_dc_val[2], \
(dest).last_dc_val[3] = (src).last_dc_val[3])
#endif
#endif
typedef struct { typedef struct {
struct jpeg_entropy_decoder pub; /* public fields */ struct jpeg_entropy_decoder pub; /* public fields */
@@ -325,7 +308,7 @@ jpeg_fill_bit_buffer(bitread_working_state *state,
bytes_in_buffer = cinfo->src->bytes_in_buffer; bytes_in_buffer = cinfo->src->bytes_in_buffer;
} }
bytes_in_buffer--; bytes_in_buffer--;
c = GETJOCTET(*next_input_byte++); c = *next_input_byte++;
/* If it's 0xFF, check and discard stuffed zero byte */ /* If it's 0xFF, check and discard stuffed zero byte */
if (c == 0xFF) { if (c == 0xFF) {
@@ -342,7 +325,7 @@ jpeg_fill_bit_buffer(bitread_working_state *state,
bytes_in_buffer = cinfo->src->bytes_in_buffer; bytes_in_buffer = cinfo->src->bytes_in_buffer;
} }
bytes_in_buffer--; bytes_in_buffer--;
c = GETJOCTET(*next_input_byte++); c = *next_input_byte++;
} while (c == 0xFF); } while (c == 0xFF);
if (c == 0) { if (c == 0) {
@@ -405,8 +388,8 @@ no_more_bytes:
#define GET_BYTE { \ #define GET_BYTE { \
register int c0, c1; \ register int c0, c1; \
c0 = GETJOCTET(*buffer++); \ c0 = *buffer++; \
c1 = GETJOCTET(*buffer); \ c1 = *buffer; \
/* Pre-execute most common case */ \ /* Pre-execute most common case */ \
get_buffer = (get_buffer << 8) | c0; \ get_buffer = (get_buffer << 8) | c0; \
bits_left += 8; \ bits_left += 8; \
@@ -423,7 +406,7 @@ no_more_bytes:
} \ } \
} }
#if SIZEOF_SIZE_T == 8 || defined(_WIN64) #if SIZEOF_SIZE_T == 8 || defined(_WIN64) || (defined(__x86_64__) && defined(__ILP32__))
/* Pre-fetch 48 bytes, because the holding register is 64-bit */ /* Pre-fetch 48 bytes, because the holding register is 64-bit */
#define FILL_BIT_BUFFER_FAST \ #define FILL_BIT_BUFFER_FAST \
@@ -568,7 +551,7 @@ decode_mcu_slow(j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
/* Load up working state */ /* Load up working state */
BITREAD_LOAD_STATE(cinfo, entropy->bitstate); BITREAD_LOAD_STATE(cinfo, entropy->bitstate);
ASSIGN_STATE(state, entropy->saved); state = entropy->saved;
for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) { for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) {
JBLOCKROW block = MCU_data ? MCU_data[blkn] : NULL; JBLOCKROW block = MCU_data ? MCU_data[blkn] : NULL;
@@ -653,7 +636,7 @@ decode_mcu_slow(j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
/* Completed MCU, so update state */ /* Completed MCU, so update state */
BITREAD_SAVE_STATE(cinfo, entropy->bitstate); BITREAD_SAVE_STATE(cinfo, entropy->bitstate);
ASSIGN_STATE(entropy->saved, state); entropy->saved = state;
return TRUE; return TRUE;
} }
@@ -671,7 +654,7 @@ decode_mcu_fast(j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
/* Load up working state */ /* Load up working state */
BITREAD_LOAD_STATE(cinfo, entropy->bitstate); BITREAD_LOAD_STATE(cinfo, entropy->bitstate);
buffer = (JOCTET *)br_state.next_input_byte; buffer = (JOCTET *)br_state.next_input_byte;
ASSIGN_STATE(state, entropy->saved); state = entropy->saved;
for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) { for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) {
JBLOCKROW block = MCU_data ? MCU_data[blkn] : NULL; JBLOCKROW block = MCU_data ? MCU_data[blkn] : NULL;
@@ -740,7 +723,7 @@ decode_mcu_fast(j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
br_state.bytes_in_buffer -= (buffer - br_state.next_input_byte); br_state.bytes_in_buffer -= (buffer - br_state.next_input_byte);
br_state.next_input_byte = buffer; br_state.next_input_byte = buffer;
BITREAD_SAVE_STATE(cinfo, entropy->bitstate); BITREAD_SAVE_STATE(cinfo, entropy->bitstate);
ASSIGN_STATE(entropy->saved, state); entropy->saved = state;
return TRUE; return TRUE;
} }
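The deleted ASSIGN_STATE() macro existed only for ancient compilers that could not assign structs; the hunks above replace it with plain struct assignment. A tiny demonstration, with MAX_COMPS standing in for MAX_COMPS_IN_SCAN:

```c
#include <stdio.h>

/* Plain assignment copies the whole savable_state, exactly what the removed
 * ASSIGN_STATE() macro spelled out field by field for broken compilers. */
#define MAX_COMPS 4

typedef struct {
  unsigned int EOBRUN;          /* progressive decoder only */
  int last_dc_val[MAX_COMPS];   /* last DC coefficient per component */
} savable_state;

int main(void)
{
  savable_state saved = { 0, { 1, 2, 3, 4 } };
  savable_state working = saved;        /* whole-struct copy, no macro */
  working.last_dc_val[0] = 42;          /* does not affect 'saved' */
  printf("%d %d\n", saved.last_dc_val[0], working.last_dc_val[0]);
  return 0;
}
```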


@@ -5,6 +5,7 @@
* Copyright (C) 1991-1997, Thomas G. Lane. * Copyright (C) 1991-1997, Thomas G. Lane.
* libjpeg-turbo Modifications: * libjpeg-turbo Modifications:
* Copyright (C) 2010-2011, 2015-2016, D. R. Commander. * Copyright (C) 2010-2011, 2015-2016, D. R. Commander.
* Copyright (C) 2018, Matthias Räncker.
* For conditions of distribution and use, see the accompanying README.ijg * For conditions of distribution and use, see the accompanying README.ijg
* file. * file.
* *
@@ -78,6 +79,11 @@ EXTERN(void) jpeg_make_d_derived_tbl(j_decompress_ptr cinfo, boolean isDC,
typedef size_t bit_buf_type; /* type of bit-extraction buffer */ typedef size_t bit_buf_type; /* type of bit-extraction buffer */
#define BIT_BUF_SIZE 64 /* size of buffer in bits */ #define BIT_BUF_SIZE 64 /* size of buffer in bits */
#elif defined(__x86_64__) && defined(__ILP32__)
typedef unsigned long long bit_buf_type; /* type of bit-extraction buffer */
#define BIT_BUF_SIZE 64 /* size of buffer in bits */
#else #else
typedef unsigned long bit_buf_type; /* type of bit-extraction buffer */ typedef unsigned long bit_buf_type; /* type of bit-extraction buffer */
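The new branch above keeps a 64-bit Huffman bit buffer on the x32 ABI, where unsigned long and size_t are only 32 bits wide despite the 64-bit registers. A sketch of the same selection follows; the first condition is a simplification of the library's SIZEOF_SIZE_T configure check.

```c
#include <stdio.h>

/* Sketch of the bit_buf_type selection: prefer a 64-bit holding register
 * whenever the target can provide one, including x32 (__x86_64__ with
 * __ILP32__), where the "natural" word types are only 32 bits. */
#if defined(_WIN64) || (defined(__SIZEOF_SIZE_T__) && __SIZEOF_SIZE_T__ == 8)
typedef size_t bit_buf_type;              /* naturally 64-bit */
#elif defined(__x86_64__) && defined(__ILP32__)
typedef unsigned long long bit_buf_type;  /* x32: force 64 bits */
#else
typedef unsigned long bit_buf_type;       /* fall back to the word size */
#endif

int main(void)
{
  printf("bit buffer holds %u bits\n", (unsigned)(sizeof(bit_buf_type) * 8));
  return 0;
}
```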

jdicc.c

@@ -38,18 +38,18 @@ marker_is_icc(jpeg_saved_marker_ptr marker)
marker->marker == ICC_MARKER && marker->marker == ICC_MARKER &&
marker->data_length >= ICC_OVERHEAD_LEN && marker->data_length >= ICC_OVERHEAD_LEN &&
/* verify the identifying string */ /* verify the identifying string */
GETJOCTET(marker->data[0]) == 0x49 && marker->data[0] == 0x49 &&
GETJOCTET(marker->data[1]) == 0x43 && marker->data[1] == 0x43 &&
GETJOCTET(marker->data[2]) == 0x43 && marker->data[2] == 0x43 &&
GETJOCTET(marker->data[3]) == 0x5F && marker->data[3] == 0x5F &&
GETJOCTET(marker->data[4]) == 0x50 && marker->data[4] == 0x50 &&
GETJOCTET(marker->data[5]) == 0x52 && marker->data[5] == 0x52 &&
GETJOCTET(marker->data[6]) == 0x4F && marker->data[6] == 0x4F &&
GETJOCTET(marker->data[7]) == 0x46 && marker->data[7] == 0x46 &&
GETJOCTET(marker->data[8]) == 0x49 && marker->data[8] == 0x49 &&
GETJOCTET(marker->data[9]) == 0x4C && marker->data[9] == 0x4C &&
GETJOCTET(marker->data[10]) == 0x45 && marker->data[10] == 0x45 &&
GETJOCTET(marker->data[11]) == 0x0; marker->data[11] == 0x0;
} }
@@ -102,12 +102,12 @@ jpeg_read_icc_profile(j_decompress_ptr cinfo, JOCTET **icc_data_ptr,
for (marker = cinfo->marker_list; marker != NULL; marker = marker->next) { for (marker = cinfo->marker_list; marker != NULL; marker = marker->next) {
if (marker_is_icc(marker)) { if (marker_is_icc(marker)) {
if (num_markers == 0) if (num_markers == 0)
num_markers = GETJOCTET(marker->data[13]); num_markers = marker->data[13];
else if (num_markers != GETJOCTET(marker->data[13])) { else if (num_markers != marker->data[13]) {
WARNMS(cinfo, JWRN_BOGUS_ICC); /* inconsistent num_markers fields */ WARNMS(cinfo, JWRN_BOGUS_ICC); /* inconsistent num_markers fields */
return FALSE; return FALSE;
} }
seq_no = GETJOCTET(marker->data[12]); seq_no = marker->data[12];
if (seq_no <= 0 || seq_no > num_markers) { if (seq_no <= 0 || seq_no > num_markers) {
WARNMS(cinfo, JWRN_BOGUS_ICC); /* bogus sequence number */ WARNMS(cinfo, JWRN_BOGUS_ICC); /* bogus sequence number */
return FALSE; return FALSE;
@@ -154,7 +154,7 @@ jpeg_read_icc_profile(j_decompress_ptr cinfo, JOCTET **icc_data_ptr,
JOCTET FAR *src_ptr; JOCTET FAR *src_ptr;
JOCTET *dst_ptr; JOCTET *dst_ptr;
unsigned int length; unsigned int length;
seq_no = GETJOCTET(marker->data[12]); seq_no = marker->data[12];
dst_ptr = icc_data + data_offset[seq_no]; dst_ptr = icc_data + data_offset[seq_no];
src_ptr = marker->data + ICC_OVERHEAD_LEN; src_ptr = marker->data + ICC_OVERHEAD_LEN;
length = data_length[seq_no]; length = data_length[seq_no];
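marker_is_icc() above checks the fixed ICC APP2 header byte by byte; those twelve bytes are simply "ICC_PROFILE" plus a NUL, followed by a 1-based chunk sequence number and the total chunk count. A standalone sketch of the same check (not the library's own API):

```c
#include <stdio.h>
#include <string.h>

/* Illustrative check mirroring marker_is_icc(): an ICC APP2 payload starts
 * with the 12-byte identifier "ICC_PROFILE\0", then seq_no and num_markers.
 * Offsets follow the ICC spec; the buffer is hand-made. */
#define ICC_OVERHEAD_LEN 14

static int looks_like_icc(const unsigned char *data, size_t len)
{
  return len >= ICC_OVERHEAD_LEN &&
         memcmp(data, "ICC_PROFILE\0", 12) == 0;
}

int main(void)
{
  unsigned char payload[ICC_OVERHEAD_LEN] = "ICC_PROFILE";
  payload[12] = 1;   /* seq_no: this is chunk 1 ... */
  payload[13] = 1;   /* num_markers: ... of 1 total */
  printf("ICC? %d, chunk %d of %d\n",
         looks_like_icc(payload, sizeof(payload)), payload[12], payload[13]);
  return 0;
}
```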


@@ -151,7 +151,7 @@ typedef my_marker_reader *my_marker_ptr;
#define INPUT_BYTE(cinfo, V, action) \ #define INPUT_BYTE(cinfo, V, action) \
MAKESTMT( MAKE_BYTE_AVAIL(cinfo, action); \ MAKESTMT( MAKE_BYTE_AVAIL(cinfo, action); \
bytes_in_buffer--; \ bytes_in_buffer--; \
V = GETJOCTET(*next_input_byte++); ) V = *next_input_byte++; )
/* As above, but read two bytes interpreted as an unsigned 16-bit integer. /* As above, but read two bytes interpreted as an unsigned 16-bit integer.
* V should be declared unsigned int or perhaps JLONG. * V should be declared unsigned int or perhaps JLONG.
@@ -159,10 +159,10 @@ typedef my_marker_reader *my_marker_ptr;
#define INPUT_2BYTES(cinfo, V, action) \ #define INPUT_2BYTES(cinfo, V, action) \
MAKESTMT( MAKE_BYTE_AVAIL(cinfo, action); \ MAKESTMT( MAKE_BYTE_AVAIL(cinfo, action); \
bytes_in_buffer--; \ bytes_in_buffer--; \
V = ((unsigned int)GETJOCTET(*next_input_byte++)) << 8; \ V = ((unsigned int)(*next_input_byte++)) << 8; \
MAKE_BYTE_AVAIL(cinfo, action); \ MAKE_BYTE_AVAIL(cinfo, action); \
bytes_in_buffer--; \ bytes_in_buffer--; \
V += GETJOCTET(*next_input_byte++); ) V += *next_input_byte++; )
/* /*
@@ -608,18 +608,18 @@ examine_app0(j_decompress_ptr cinfo, JOCTET *data, unsigned int datalen,
JLONG totallen = (JLONG)datalen + remaining; JLONG totallen = (JLONG)datalen + remaining;
if (datalen >= APP0_DATA_LEN && if (datalen >= APP0_DATA_LEN &&
GETJOCTET(data[0]) == 0x4A && data[0] == 0x4A &&
GETJOCTET(data[1]) == 0x46 && data[1] == 0x46 &&
GETJOCTET(data[2]) == 0x49 && data[2] == 0x49 &&
GETJOCTET(data[3]) == 0x46 && data[3] == 0x46 &&
GETJOCTET(data[4]) == 0) { data[4] == 0) {
/* Found JFIF APP0 marker: save info */ /* Found JFIF APP0 marker: save info */
cinfo->saw_JFIF_marker = TRUE; cinfo->saw_JFIF_marker = TRUE;
cinfo->JFIF_major_version = GETJOCTET(data[5]); cinfo->JFIF_major_version = data[5];
cinfo->JFIF_minor_version = GETJOCTET(data[6]); cinfo->JFIF_minor_version = data[6];
cinfo->density_unit = GETJOCTET(data[7]); cinfo->density_unit = data[7];
cinfo->X_density = (GETJOCTET(data[8]) << 8) + GETJOCTET(data[9]); cinfo->X_density = (data[8] << 8) + data[9];
cinfo->Y_density = (GETJOCTET(data[10]) << 8) + GETJOCTET(data[11]); cinfo->Y_density = (data[10] << 8) + data[11];
/* Check version. /* Check version.
* Major version must be 1, anything else signals an incompatible change. * Major version must be 1, anything else signals an incompatible change.
* (We used to treat this as an error, but now it's a nonfatal warning, * (We used to treat this as an error, but now it's a nonfatal warning,
@@ -634,24 +634,22 @@ examine_app0(j_decompress_ptr cinfo, JOCTET *data, unsigned int datalen,
cinfo->JFIF_major_version, cinfo->JFIF_minor_version, cinfo->JFIF_major_version, cinfo->JFIF_minor_version,
cinfo->X_density, cinfo->Y_density, cinfo->density_unit); cinfo->X_density, cinfo->Y_density, cinfo->density_unit);
/* Validate thumbnail dimensions and issue appropriate messages */ /* Validate thumbnail dimensions and issue appropriate messages */
if (GETJOCTET(data[12]) | GETJOCTET(data[13])) if (data[12] | data[13])
TRACEMS2(cinfo, 1, JTRC_JFIF_THUMBNAIL, TRACEMS2(cinfo, 1, JTRC_JFIF_THUMBNAIL, data[12], data[13]);
GETJOCTET(data[12]), GETJOCTET(data[13]));
totallen -= APP0_DATA_LEN; totallen -= APP0_DATA_LEN;
if (totallen != if (totallen != ((JLONG)data[12] * (JLONG)data[13] * (JLONG)3))
((JLONG)GETJOCTET(data[12]) * (JLONG)GETJOCTET(data[13]) * (JLONG)3))
TRACEMS1(cinfo, 1, JTRC_JFIF_BADTHUMBNAILSIZE, (int)totallen); TRACEMS1(cinfo, 1, JTRC_JFIF_BADTHUMBNAILSIZE, (int)totallen);
} else if (datalen >= 6 && } else if (datalen >= 6 &&
GETJOCTET(data[0]) == 0x4A && data[0] == 0x4A &&
GETJOCTET(data[1]) == 0x46 && data[1] == 0x46 &&
GETJOCTET(data[2]) == 0x58 && data[2] == 0x58 &&
GETJOCTET(data[3]) == 0x58 && data[3] == 0x58 &&
GETJOCTET(data[4]) == 0) { data[4] == 0) {
/* Found JFIF "JFXX" extension APP0 marker */ /* Found JFIF "JFXX" extension APP0 marker */
/* The library doesn't actually do anything with these, /* The library doesn't actually do anything with these,
* but we try to produce a helpful trace message. * but we try to produce a helpful trace message.
*/ */
switch (GETJOCTET(data[5])) { switch (data[5]) {
case 0x10: case 0x10:
TRACEMS1(cinfo, 1, JTRC_THUMB_JPEG, (int)totallen); TRACEMS1(cinfo, 1, JTRC_THUMB_JPEG, (int)totallen);
break; break;
@@ -662,8 +660,7 @@ examine_app0(j_decompress_ptr cinfo, JOCTET *data, unsigned int datalen,
TRACEMS1(cinfo, 1, JTRC_THUMB_RGB, (int)totallen); TRACEMS1(cinfo, 1, JTRC_THUMB_RGB, (int)totallen);
break; break;
default: default:
TRACEMS2(cinfo, 1, JTRC_JFIF_EXTENSION, TRACEMS2(cinfo, 1, JTRC_JFIF_EXTENSION, data[5], (int)totallen);
GETJOCTET(data[5]), (int)totallen);
break; break;
} }
} else { } else {
@@ -684,16 +681,16 @@ examine_app14(j_decompress_ptr cinfo, JOCTET *data, unsigned int datalen,
unsigned int version, flags0, flags1, transform; unsigned int version, flags0, flags1, transform;
if (datalen >= APP14_DATA_LEN && if (datalen >= APP14_DATA_LEN &&
GETJOCTET(data[0]) == 0x41 && data[0] == 0x41 &&
GETJOCTET(data[1]) == 0x64 && data[1] == 0x64 &&
GETJOCTET(data[2]) == 0x6F && data[2] == 0x6F &&
GETJOCTET(data[3]) == 0x62 && data[3] == 0x62 &&
GETJOCTET(data[4]) == 0x65) { data[4] == 0x65) {
/* Found Adobe APP14 marker */ /* Found Adobe APP14 marker */
version = (GETJOCTET(data[5]) << 8) + GETJOCTET(data[6]); version = (data[5] << 8) + data[6];
flags0 = (GETJOCTET(data[7]) << 8) + GETJOCTET(data[8]); flags0 = (data[7] << 8) + data[8];
flags1 = (GETJOCTET(data[9]) << 8) + GETJOCTET(data[10]); flags1 = (data[9] << 8) + data[10];
transform = GETJOCTET(data[11]); transform = data[11];
TRACEMS4(cinfo, 1, JTRC_ADOBE, version, flags0, flags1, transform); TRACEMS4(cinfo, 1, JTRC_ADOBE, version, flags0, flags1, transform);
cinfo->saw_Adobe_marker = TRUE; cinfo->saw_Adobe_marker = TRUE;
cinfo->Adobe_transform = (UINT8)transform; cinfo->Adobe_transform = (UINT8)transform;
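examine_app0() and examine_app14() above read multi-byte fields big-endian, one octet at a time. A small sketch of that extraction on a hand-made JFIF APP0 payload; the offsets follow the JFIF spec, and the buffer is illustrative rather than parsed from a real file.

```c
#include <stdio.h>

/* Big-endian field extraction as done in examine_app0(): X_density lives at
 * payload bytes 8-9 and Y_density at bytes 10-11, counted from the start of
 * the "JFIF\0" identifier. */
static unsigned int be16(const unsigned char *p)
{
  return ((unsigned int)p[0] << 8) | p[1];
}

int main(void)
{
  /* "JFIF\0", version 1.02, density unit 1 (dpi), 300 x 300, no thumbnail */
  unsigned char app0[14] = {
    0x4A, 0x46, 0x49, 0x46, 0x00,  /* J F I F \0 */
    1, 2,                          /* major, minor version */
    1,                             /* density unit */
    0x01, 0x2C, 0x01, 0x2C,        /* X_density, Y_density = 300 */
    0, 0                           /* thumbnail width, height */
  };
  printf("JFIF %d.%02d, %u x %u\n", app0[5], app0[6],
         be16(app0 + 8), be16(app0 + 10));
  return 0;
}
```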


@@ -5,7 +5,7 @@
* Copyright (C) 1991-1997, Thomas G. Lane. * Copyright (C) 1991-1997, Thomas G. Lane.
* Modified 2002-2009 by Guido Vollbeding. * Modified 2002-2009 by Guido Vollbeding.
* libjpeg-turbo Modifications: * libjpeg-turbo Modifications:
* Copyright (C) 2009-2011, 2016, D. R. Commander. * Copyright (C) 2009-2011, 2016, 2019, D. R. Commander.
* Copyright (C) 2013, Linaro Limited. * Copyright (C) 2013, Linaro Limited.
* Copyright (C) 2015, Google, Inc. * Copyright (C) 2015, Google, Inc.
* For conditions of distribution and use, see the accompanying README.ijg * For conditions of distribution and use, see the accompanying README.ijg
@@ -22,7 +22,6 @@
#include "jpeglib.h" #include "jpeglib.h"
#include "jpegcomp.h" #include "jpegcomp.h"
#include "jdmaster.h" #include "jdmaster.h"
#include "jsimd.h"
/* /*
@@ -70,17 +69,6 @@ use_merged_upsample(j_decompress_ptr cinfo)
cinfo->comp_info[1]._DCT_scaled_size != cinfo->_min_DCT_scaled_size || cinfo->comp_info[1]._DCT_scaled_size != cinfo->_min_DCT_scaled_size ||
cinfo->comp_info[2]._DCT_scaled_size != cinfo->_min_DCT_scaled_size) cinfo->comp_info[2]._DCT_scaled_size != cinfo->_min_DCT_scaled_size)
return FALSE; return FALSE;
#ifdef WITH_SIMD
/* If YCbCr-to-RGB color conversion is SIMD-accelerated but merged upsampling
isn't, then disabling merged upsampling is likely to be faster when
decompressing YCbCr JPEG images. */
if (!jsimd_can_h2v2_merged_upsample() && !jsimd_can_h2v1_merged_upsample() &&
jsimd_can_ycc_rgb() && cinfo->jpeg_color_space == JCS_YCbCr &&
(cinfo->out_color_space == JCS_RGB ||
(cinfo->out_color_space >= JCS_EXT_RGB &&
cinfo->out_color_space <= JCS_EXT_ARGB)))
return FALSE;
#endif
/* ??? also need to test for upsample-time rescaling, when & if supported */ /* ??? also need to test for upsample-time rescaling, when & if supported */
return TRUE; /* by golly, it'll work... */ return TRUE; /* by golly, it'll work... */
#else #else
@@ -580,6 +568,7 @@ master_selection(j_decompress_ptr cinfo)
*/ */
cinfo->master->first_iMCU_col = 0; cinfo->master->first_iMCU_col = 0;
cinfo->master->last_iMCU_col = cinfo->MCUs_per_row - 1; cinfo->master->last_iMCU_col = cinfo->MCUs_per_row - 1;
cinfo->master->last_good_iMCU_row = 0;
#ifdef D_MULTISCAN_FILES_SUPPORTED #ifdef D_MULTISCAN_FILES_SUPPORTED
/* If jpeg_start_decompress will read the whole file, initialize /* If jpeg_start_decompress will read the whole file, initialize


@@ -43,20 +43,20 @@ h2v1_merged_upsample_565_internal(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
/* Loop for each pair of output pixels */ /* Loop for each pair of output pixels */
for (col = cinfo->output_width >> 1; col > 0; col--) { for (col = cinfo->output_width >> 1; col > 0; col--) {
/* Do the chroma part of the calculation */ /* Do the chroma part of the calculation */
cb = GETJSAMPLE(*inptr1++); cb = *inptr1++;
cr = GETJSAMPLE(*inptr2++); cr = *inptr2++;
cred = Crrtab[cr]; cred = Crrtab[cr];
cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS);
cblue = Cbbtab[cb]; cblue = Cbbtab[cb];
/* Fetch 2 Y values and emit 2 pixels */ /* Fetch 2 Y values and emit 2 pixels */
y = GETJSAMPLE(*inptr0++); y = *inptr0++;
r = range_limit[y + cred]; r = range_limit[y + cred];
g = range_limit[y + cgreen]; g = range_limit[y + cgreen];
b = range_limit[y + cblue]; b = range_limit[y + cblue];
rgb = PACK_SHORT_565(r, g, b); rgb = PACK_SHORT_565(r, g, b);
y = GETJSAMPLE(*inptr0++); y = *inptr0++;
r = range_limit[y + cred]; r = range_limit[y + cred];
g = range_limit[y + cgreen]; g = range_limit[y + cgreen];
b = range_limit[y + cblue]; b = range_limit[y + cblue];
@@ -68,12 +68,12 @@ h2v1_merged_upsample_565_internal(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
/* If image width is odd, do the last output column separately */ /* If image width is odd, do the last output column separately */
if (cinfo->output_width & 1) { if (cinfo->output_width & 1) {
cb = GETJSAMPLE(*inptr1); cb = *inptr1;
cr = GETJSAMPLE(*inptr2); cr = *inptr2;
cred = Crrtab[cr]; cred = Crrtab[cr];
cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS);
cblue = Cbbtab[cb]; cblue = Cbbtab[cb];
y = GETJSAMPLE(*inptr0); y = *inptr0;
r = range_limit[y + cred]; r = range_limit[y + cred];
g = range_limit[y + cgreen]; g = range_limit[y + cgreen];
b = range_limit[y + cblue]; b = range_limit[y + cblue];
@@ -115,21 +115,21 @@ h2v1_merged_upsample_565D_internal(j_decompress_ptr cinfo,
/* Loop for each pair of output pixels */ /* Loop for each pair of output pixels */
for (col = cinfo->output_width >> 1; col > 0; col--) { for (col = cinfo->output_width >> 1; col > 0; col--) {
/* Do the chroma part of the calculation */ /* Do the chroma part of the calculation */
cb = GETJSAMPLE(*inptr1++); cb = *inptr1++;
cr = GETJSAMPLE(*inptr2++); cr = *inptr2++;
cred = Crrtab[cr]; cred = Crrtab[cr];
cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS);
cblue = Cbbtab[cb]; cblue = Cbbtab[cb];
/* Fetch 2 Y values and emit 2 pixels */ /* Fetch 2 Y values and emit 2 pixels */
y = GETJSAMPLE(*inptr0++); y = *inptr0++;
r = range_limit[DITHER_565_R(y + cred, d0)]; r = range_limit[DITHER_565_R(y + cred, d0)];
g = range_limit[DITHER_565_G(y + cgreen, d0)]; g = range_limit[DITHER_565_G(y + cgreen, d0)];
b = range_limit[DITHER_565_B(y + cblue, d0)]; b = range_limit[DITHER_565_B(y + cblue, d0)];
d0 = DITHER_ROTATE(d0); d0 = DITHER_ROTATE(d0);
rgb = PACK_SHORT_565(r, g, b); rgb = PACK_SHORT_565(r, g, b);
y = GETJSAMPLE(*inptr0++); y = *inptr0++;
r = range_limit[DITHER_565_R(y + cred, d0)]; r = range_limit[DITHER_565_R(y + cred, d0)];
g = range_limit[DITHER_565_G(y + cgreen, d0)]; g = range_limit[DITHER_565_G(y + cgreen, d0)];
b = range_limit[DITHER_565_B(y + cblue, d0)]; b = range_limit[DITHER_565_B(y + cblue, d0)];
@@ -142,12 +142,12 @@ h2v1_merged_upsample_565D_internal(j_decompress_ptr cinfo,
/* If image width is odd, do the last output column separately */ /* If image width is odd, do the last output column separately */
if (cinfo->output_width & 1) { if (cinfo->output_width & 1) {
cb = GETJSAMPLE(*inptr1); cb = *inptr1;
cr = GETJSAMPLE(*inptr2); cr = *inptr2;
cred = Crrtab[cr]; cred = Crrtab[cr];
cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS);
cblue = Cbbtab[cb]; cblue = Cbbtab[cb];
y = GETJSAMPLE(*inptr0); y = *inptr0;
r = range_limit[DITHER_565_R(y + cred, d0)]; r = range_limit[DITHER_565_R(y + cred, d0)];
g = range_limit[DITHER_565_G(y + cgreen, d0)]; g = range_limit[DITHER_565_G(y + cgreen, d0)];
b = range_limit[DITHER_565_B(y + cblue, d0)]; b = range_limit[DITHER_565_B(y + cblue, d0)];
@@ -189,20 +189,20 @@ h2v2_merged_upsample_565_internal(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
/* Loop for each group of output pixels */ /* Loop for each group of output pixels */
for (col = cinfo->output_width >> 1; col > 0; col--) { for (col = cinfo->output_width >> 1; col > 0; col--) {
/* Do the chroma part of the calculation */ /* Do the chroma part of the calculation */
cb = GETJSAMPLE(*inptr1++); cb = *inptr1++;
cr = GETJSAMPLE(*inptr2++); cr = *inptr2++;
cred = Crrtab[cr]; cred = Crrtab[cr];
cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS);
cblue = Cbbtab[cb]; cblue = Cbbtab[cb];
/* Fetch 4 Y values and emit 4 pixels */ /* Fetch 4 Y values and emit 4 pixels */
y = GETJSAMPLE(*inptr00++); y = *inptr00++;
r = range_limit[y + cred]; r = range_limit[y + cred];
g = range_limit[y + cgreen]; g = range_limit[y + cgreen];
b = range_limit[y + cblue]; b = range_limit[y + cblue];
rgb = PACK_SHORT_565(r, g, b); rgb = PACK_SHORT_565(r, g, b);
y = GETJSAMPLE(*inptr00++); y = *inptr00++;
r = range_limit[y + cred]; r = range_limit[y + cred];
g = range_limit[y + cgreen]; g = range_limit[y + cgreen];
b = range_limit[y + cblue]; b = range_limit[y + cblue];
@@ -211,13 +211,13 @@ h2v2_merged_upsample_565_internal(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
WRITE_TWO_PIXELS(outptr0, rgb); WRITE_TWO_PIXELS(outptr0, rgb);
outptr0 += 4; outptr0 += 4;
y = GETJSAMPLE(*inptr01++); y = *inptr01++;
r = range_limit[y + cred]; r = range_limit[y + cred];
g = range_limit[y + cgreen]; g = range_limit[y + cgreen];
b = range_limit[y + cblue]; b = range_limit[y + cblue];
rgb = PACK_SHORT_565(r, g, b); rgb = PACK_SHORT_565(r, g, b);
y = GETJSAMPLE(*inptr01++); y = *inptr01++;
r = range_limit[y + cred]; r = range_limit[y + cred];
g = range_limit[y + cgreen]; g = range_limit[y + cgreen];
b = range_limit[y + cblue]; b = range_limit[y + cblue];
@@ -229,20 +229,20 @@ h2v2_merged_upsample_565_internal(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
/* If image width is odd, do the last output column separately */ /* If image width is odd, do the last output column separately */
if (cinfo->output_width & 1) { if (cinfo->output_width & 1) {
cb = GETJSAMPLE(*inptr1); cb = *inptr1;
cr = GETJSAMPLE(*inptr2); cr = *inptr2;
cred = Crrtab[cr]; cred = Crrtab[cr];
cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS);
cblue = Cbbtab[cb]; cblue = Cbbtab[cb];
y = GETJSAMPLE(*inptr00); y = *inptr00;
r = range_limit[y + cred]; r = range_limit[y + cred];
g = range_limit[y + cgreen]; g = range_limit[y + cgreen];
b = range_limit[y + cblue]; b = range_limit[y + cblue];
rgb = PACK_SHORT_565(r, g, b); rgb = PACK_SHORT_565(r, g, b);
*(INT16 *)outptr0 = (INT16)rgb; *(INT16 *)outptr0 = (INT16)rgb;
y = GETJSAMPLE(*inptr01); y = *inptr01;
r = range_limit[y + cred]; r = range_limit[y + cred];
g = range_limit[y + cgreen]; g = range_limit[y + cgreen];
b = range_limit[y + cblue]; b = range_limit[y + cblue];
@@ -287,21 +287,21 @@ h2v2_merged_upsample_565D_internal(j_decompress_ptr cinfo,
/* Loop for each group of output pixels */ /* Loop for each group of output pixels */
for (col = cinfo->output_width >> 1; col > 0; col--) { for (col = cinfo->output_width >> 1; col > 0; col--) {
/* Do the chroma part of the calculation */ /* Do the chroma part of the calculation */
cb = GETJSAMPLE(*inptr1++); cb = *inptr1++;
cr = GETJSAMPLE(*inptr2++); cr = *inptr2++;
cred = Crrtab[cr]; cred = Crrtab[cr];
cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS);
cblue = Cbbtab[cb]; cblue = Cbbtab[cb];
/* Fetch 4 Y values and emit 4 pixels */ /* Fetch 4 Y values and emit 4 pixels */
y = GETJSAMPLE(*inptr00++); y = *inptr00++;
r = range_limit[DITHER_565_R(y + cred, d0)]; r = range_limit[DITHER_565_R(y + cred, d0)];
g = range_limit[DITHER_565_G(y + cgreen, d0)]; g = range_limit[DITHER_565_G(y + cgreen, d0)];
b = range_limit[DITHER_565_B(y + cblue, d0)]; b = range_limit[DITHER_565_B(y + cblue, d0)];
d0 = DITHER_ROTATE(d0); d0 = DITHER_ROTATE(d0);
rgb = PACK_SHORT_565(r, g, b); rgb = PACK_SHORT_565(r, g, b);
y = GETJSAMPLE(*inptr00++); y = *inptr00++;
r = range_limit[DITHER_565_R(y + cred, d0)]; r = range_limit[DITHER_565_R(y + cred, d0)];
g = range_limit[DITHER_565_G(y + cgreen, d0)]; g = range_limit[DITHER_565_G(y + cgreen, d0)];
b = range_limit[DITHER_565_B(y + cblue, d0)]; b = range_limit[DITHER_565_B(y + cblue, d0)];
@@ -311,14 +311,14 @@ h2v2_merged_upsample_565D_internal(j_decompress_ptr cinfo,
WRITE_TWO_PIXELS(outptr0, rgb); WRITE_TWO_PIXELS(outptr0, rgb);
outptr0 += 4; outptr0 += 4;
y = GETJSAMPLE(*inptr01++); y = *inptr01++;
r = range_limit[DITHER_565_R(y + cred, d1)]; r = range_limit[DITHER_565_R(y + cred, d1)];
g = range_limit[DITHER_565_G(y + cgreen, d1)]; g = range_limit[DITHER_565_G(y + cgreen, d1)];
b = range_limit[DITHER_565_B(y + cblue, d1)]; b = range_limit[DITHER_565_B(y + cblue, d1)];
d1 = DITHER_ROTATE(d1); d1 = DITHER_ROTATE(d1);
rgb = PACK_SHORT_565(r, g, b); rgb = PACK_SHORT_565(r, g, b);
y = GETJSAMPLE(*inptr01++); y = *inptr01++;
r = range_limit[DITHER_565_R(y + cred, d1)]; r = range_limit[DITHER_565_R(y + cred, d1)];
g = range_limit[DITHER_565_G(y + cgreen, d1)]; g = range_limit[DITHER_565_G(y + cgreen, d1)];
b = range_limit[DITHER_565_B(y + cblue, d1)]; b = range_limit[DITHER_565_B(y + cblue, d1)];
@@ -331,20 +331,20 @@ h2v2_merged_upsample_565D_internal(j_decompress_ptr cinfo,
/* If image width is odd, do the last output column separately */ /* If image width is odd, do the last output column separately */
if (cinfo->output_width & 1) { if (cinfo->output_width & 1) {
cb = GETJSAMPLE(*inptr1); cb = *inptr1;
cr = GETJSAMPLE(*inptr2); cr = *inptr2;
cred = Crrtab[cr]; cred = Crrtab[cr];
cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS);
cblue = Cbbtab[cb]; cblue = Cbbtab[cb];
y = GETJSAMPLE(*inptr00); y = *inptr00;
r = range_limit[DITHER_565_R(y + cred, d0)]; r = range_limit[DITHER_565_R(y + cred, d0)];
g = range_limit[DITHER_565_G(y + cgreen, d0)]; g = range_limit[DITHER_565_G(y + cgreen, d0)];
b = range_limit[DITHER_565_B(y + cblue, d0)]; b = range_limit[DITHER_565_B(y + cblue, d0)];
rgb = PACK_SHORT_565(r, g, b); rgb = PACK_SHORT_565(r, g, b);
*(INT16 *)outptr0 = (INT16)rgb; *(INT16 *)outptr0 = (INT16)rgb;
y = GETJSAMPLE(*inptr01); y = *inptr01;
r = range_limit[DITHER_565_R(y + cred, d1)]; r = range_limit[DITHER_565_R(y + cred, d1)];
g = range_limit[DITHER_565_G(y + cgreen, d1)]; g = range_limit[DITHER_565_G(y + cgreen, d1)];
b = range_limit[DITHER_565_B(y + cblue, d1)]; b = range_limit[DITHER_565_B(y + cblue, d1)];
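The DITHER_565_R/G/B steps above add a position-dependent offset before each channel is truncated to 5 or 6 bits, so the truncation error is spread across neighbouring pixels. A generic ordered-dither sketch of that idea — the 2x2 matrix and the to5bits() helper are mine; the library uses its own 4x4 tables and rotates them per row via DITHER_ROTATE.

```c
#include <stdio.h>

/* Ordered dithering: add a small Bayer-matrix offset before truncating an
 * 8-bit channel to 5 bits, so a flat area becomes a fine pattern of the two
 * nearest 5-bit codes instead of a single banded value. */
static const int dither2x2[2][2] = { { 0, 4 }, { 6, 2 } };  /* 2x2 Bayer * 2 */

static int clamp255(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

static unsigned char to5bits(int value, int x, int y)
{
  return (unsigned char)(clamp255(value + dither2x2[y & 1][x & 1]) >> 3);
}

int main(void)
{
  /* A flat gray of 133 maps to a checkerboard of the 5-bit codes 16 and 17 */
  for (int y = 0; y < 2; y++) {
    for (int x = 0; x < 4; x++)
      printf("%d ", to5bits(133, x, y));
    printf("\n");
  }
  return 0;
}
```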


@@ -46,13 +46,13 @@ h2v1_merged_upsample_internal(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
/* Loop for each pair of output pixels */ /* Loop for each pair of output pixels */
for (col = cinfo->output_width >> 1; col > 0; col--) { for (col = cinfo->output_width >> 1; col > 0; col--) {
/* Do the chroma part of the calculation */ /* Do the chroma part of the calculation */
cb = GETJSAMPLE(*inptr1++); cb = *inptr1++;
cr = GETJSAMPLE(*inptr2++); cr = *inptr2++;
cred = Crrtab[cr]; cred = Crrtab[cr];
cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS);
cblue = Cbbtab[cb]; cblue = Cbbtab[cb];
/* Fetch 2 Y values and emit 2 pixels */ /* Fetch 2 Y values and emit 2 pixels */
y = GETJSAMPLE(*inptr0++); y = *inptr0++;
outptr[RGB_RED] = range_limit[y + cred]; outptr[RGB_RED] = range_limit[y + cred];
outptr[RGB_GREEN] = range_limit[y + cgreen]; outptr[RGB_GREEN] = range_limit[y + cgreen];
outptr[RGB_BLUE] = range_limit[y + cblue]; outptr[RGB_BLUE] = range_limit[y + cblue];
@@ -60,7 +60,7 @@ h2v1_merged_upsample_internal(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
outptr[RGB_ALPHA] = 0xFF; outptr[RGB_ALPHA] = 0xFF;
#endif #endif
outptr += RGB_PIXELSIZE; outptr += RGB_PIXELSIZE;
y = GETJSAMPLE(*inptr0++); y = *inptr0++;
outptr[RGB_RED] = range_limit[y + cred]; outptr[RGB_RED] = range_limit[y + cred];
outptr[RGB_GREEN] = range_limit[y + cgreen]; outptr[RGB_GREEN] = range_limit[y + cgreen];
outptr[RGB_BLUE] = range_limit[y + cblue]; outptr[RGB_BLUE] = range_limit[y + cblue];
@@ -71,12 +71,12 @@ h2v1_merged_upsample_internal(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
} }
/* If image width is odd, do the last output column separately */ /* If image width is odd, do the last output column separately */
if (cinfo->output_width & 1) { if (cinfo->output_width & 1) {
cb = GETJSAMPLE(*inptr1); cb = *inptr1;
cr = GETJSAMPLE(*inptr2); cr = *inptr2;
cred = Crrtab[cr]; cred = Crrtab[cr];
cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS);
cblue = Cbbtab[cb]; cblue = Cbbtab[cb];
y = GETJSAMPLE(*inptr0); y = *inptr0;
outptr[RGB_RED] = range_limit[y + cred]; outptr[RGB_RED] = range_limit[y + cred];
outptr[RGB_GREEN] = range_limit[y + cgreen]; outptr[RGB_GREEN] = range_limit[y + cgreen];
outptr[RGB_BLUE] = range_limit[y + cblue]; outptr[RGB_BLUE] = range_limit[y + cblue];
@@ -120,13 +120,13 @@ h2v2_merged_upsample_internal(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
/* Loop for each group of output pixels */ /* Loop for each group of output pixels */
for (col = cinfo->output_width >> 1; col > 0; col--) { for (col = cinfo->output_width >> 1; col > 0; col--) {
/* Do the chroma part of the calculation */ /* Do the chroma part of the calculation */
cb = GETJSAMPLE(*inptr1++); cb = *inptr1++;
cr = GETJSAMPLE(*inptr2++); cr = *inptr2++;
cred = Crrtab[cr]; cred = Crrtab[cr];
cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS);
cblue = Cbbtab[cb]; cblue = Cbbtab[cb];
/* Fetch 4 Y values and emit 4 pixels */ /* Fetch 4 Y values and emit 4 pixels */
y = GETJSAMPLE(*inptr00++); y = *inptr00++;
outptr0[RGB_RED] = range_limit[y + cred]; outptr0[RGB_RED] = range_limit[y + cred];
outptr0[RGB_GREEN] = range_limit[y + cgreen]; outptr0[RGB_GREEN] = range_limit[y + cgreen];
outptr0[RGB_BLUE] = range_limit[y + cblue]; outptr0[RGB_BLUE] = range_limit[y + cblue];
@@ -134,7 +134,7 @@ h2v2_merged_upsample_internal(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
outptr0[RGB_ALPHA] = 0xFF; outptr0[RGB_ALPHA] = 0xFF;
#endif #endif
outptr0 += RGB_PIXELSIZE; outptr0 += RGB_PIXELSIZE;
y = GETJSAMPLE(*inptr00++); y = *inptr00++;
outptr0[RGB_RED] = range_limit[y + cred]; outptr0[RGB_RED] = range_limit[y + cred];
outptr0[RGB_GREEN] = range_limit[y + cgreen]; outptr0[RGB_GREEN] = range_limit[y + cgreen];
outptr0[RGB_BLUE] = range_limit[y + cblue]; outptr0[RGB_BLUE] = range_limit[y + cblue];
@@ -142,7 +142,7 @@ h2v2_merged_upsample_internal(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
outptr0[RGB_ALPHA] = 0xFF; outptr0[RGB_ALPHA] = 0xFF;
#endif #endif
outptr0 += RGB_PIXELSIZE; outptr0 += RGB_PIXELSIZE;
y = GETJSAMPLE(*inptr01++); y = *inptr01++;
outptr1[RGB_RED] = range_limit[y + cred]; outptr1[RGB_RED] = range_limit[y + cred];
outptr1[RGB_GREEN] = range_limit[y + cgreen]; outptr1[RGB_GREEN] = range_limit[y + cgreen];
outptr1[RGB_BLUE] = range_limit[y + cblue]; outptr1[RGB_BLUE] = range_limit[y + cblue];
@@ -150,7 +150,7 @@ h2v2_merged_upsample_internal(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
outptr1[RGB_ALPHA] = 0xFF; outptr1[RGB_ALPHA] = 0xFF;
#endif #endif
outptr1 += RGB_PIXELSIZE; outptr1 += RGB_PIXELSIZE;
y = GETJSAMPLE(*inptr01++); y = *inptr01++;
outptr1[RGB_RED] = range_limit[y + cred]; outptr1[RGB_RED] = range_limit[y + cred];
outptr1[RGB_GREEN] = range_limit[y + cgreen]; outptr1[RGB_GREEN] = range_limit[y + cgreen];
outptr1[RGB_BLUE] = range_limit[y + cblue]; outptr1[RGB_BLUE] = range_limit[y + cblue];
@@ -161,19 +161,19 @@ h2v2_merged_upsample_internal(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
} }
/* If image width is odd, do the last output column separately */ /* If image width is odd, do the last output column separately */
if (cinfo->output_width & 1) { if (cinfo->output_width & 1) {
cb = GETJSAMPLE(*inptr1); cb = *inptr1;
cr = GETJSAMPLE(*inptr2); cr = *inptr2;
cred = Crrtab[cr]; cred = Crrtab[cr];
cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS); cgreen = (int)RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr], SCALEBITS);
cblue = Cbbtab[cb]; cblue = Cbbtab[cb];
y = GETJSAMPLE(*inptr00); y = *inptr00;
outptr0[RGB_RED] = range_limit[y + cred]; outptr0[RGB_RED] = range_limit[y + cred];
outptr0[RGB_GREEN] = range_limit[y + cgreen]; outptr0[RGB_GREEN] = range_limit[y + cgreen];
outptr0[RGB_BLUE] = range_limit[y + cblue]; outptr0[RGB_BLUE] = range_limit[y + cblue];
#ifdef RGB_ALPHA #ifdef RGB_ALPHA
outptr0[RGB_ALPHA] = 0xFF; outptr0[RGB_ALPHA] = 0xFF;
#endif #endif
y = GETJSAMPLE(*inptr01); y = *inptr01;
outptr1[RGB_RED] = range_limit[y + cred]; outptr1[RGB_RED] = range_limit[y + cred];
outptr1[RGB_GREEN] = range_limit[y + cgreen]; outptr1[RGB_GREEN] = range_limit[y + cgreen];
outptr1[RGB_BLUE] = range_limit[y + cblue]; outptr1[RGB_BLUE] = range_limit[y + cblue];
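Merged upsampling, as in h2v1_merged_upsample_internal() above, computes the chroma half of the YCbCr-to-RGB math once per Cb/Cr pair and reuses it for the two (or, in the h2v2 case, four) Y samples that pair covers. A floating-point sketch of the h2v1 case; the library uses fixed-point tables plus range_limit[] instead of the clamp below.

```c
#include <stdio.h>

/* Merged upsampling sketch (h2v1): for 4:2:2 data, one Cb/Cr pair covers
 * two Y samples, so cred/cgreen/cblue are computed once per pair. */
static int clamp(double v) { return v < 0 ? 0 : (v > 255 ? 255 : (int)(v + 0.5)); }

static void h2v1_pair(const int y[2], int cb, int cr, int rgb[2][3])
{
  double cred   =  1.40200 * (cr - 128);                        /* once per pair */
  double cgreen = -0.34414 * (cb - 128) - 0.71414 * (cr - 128);
  double cblue  =  1.77200 * (cb - 128);

  for (int i = 0; i < 2; i++) {                                 /* per pixel */
    rgb[i][0] = clamp(y[i] + cred);
    rgb[i][1] = clamp(y[i] + cgreen);
    rgb[i][2] = clamp(y[i] + cblue);
  }
}

int main(void)
{
  int y[2] = { 90, 200 }, rgb[2][3];
  h2v1_pair(y, 100, 180, rgb);
  printf("%d,%d,%d  %d,%d,%d\n", rgb[0][0], rgb[0][1], rgb[0][2],
         rgb[1][0], rgb[1][1], rgb[1][2]);
  return 0;
}
```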


@@ -4,7 +4,7 @@
* This file was part of the Independent JPEG Group's software: * This file was part of the Independent JPEG Group's software:
* Copyright (C) 1995-1997, Thomas G. Lane. * Copyright (C) 1995-1997, Thomas G. Lane.
* libjpeg-turbo Modifications: * libjpeg-turbo Modifications:
* Copyright (C) 2015-2016, 2018, D. R. Commander. * Copyright (C) 2015-2016, 2018-2020, D. R. Commander.
* For conditions of distribution and use, see the accompanying README.ijg * For conditions of distribution and use, see the accompanying README.ijg
* file. * file.
* *
@@ -41,25 +41,6 @@ typedef struct {
int last_dc_val[MAX_COMPS_IN_SCAN]; /* last DC coef for each component */ int last_dc_val[MAX_COMPS_IN_SCAN]; /* last DC coef for each component */
} savable_state; } savable_state;
/* This macro is to work around compilers with missing or broken
* structure assignment. You'll need to fix this code if you have
* such a compiler and you change MAX_COMPS_IN_SCAN.
*/
#ifndef NO_STRUCT_ASSIGN
#define ASSIGN_STATE(dest, src) ((dest) = (src))
#else
#if MAX_COMPS_IN_SCAN == 4
#define ASSIGN_STATE(dest, src) \
((dest).EOBRUN = (src).EOBRUN, \
(dest).last_dc_val[0] = (src).last_dc_val[0], \
(dest).last_dc_val[1] = (src).last_dc_val[1], \
(dest).last_dc_val[2] = (src).last_dc_val[2], \
(dest).last_dc_val[3] = (src).last_dc_val[3])
#endif
#endif
typedef struct { typedef struct {
struct jpeg_entropy_decoder pub; /* public fields */ struct jpeg_entropy_decoder pub; /* public fields */
@@ -102,7 +83,7 @@ start_pass_phuff_decoder(j_decompress_ptr cinfo)
boolean is_DC_band, bad; boolean is_DC_band, bad;
int ci, coefi, tbl; int ci, coefi, tbl;
d_derived_tbl **pdtbl; d_derived_tbl **pdtbl;
int *coef_bit_ptr; int *coef_bit_ptr, *prev_coef_bit_ptr;
jpeg_component_info *compptr; jpeg_component_info *compptr;
is_DC_band = (cinfo->Ss == 0); is_DC_band = (cinfo->Ss == 0);
@@ -143,8 +124,15 @@ start_pass_phuff_decoder(j_decompress_ptr cinfo)
for (ci = 0; ci < cinfo->comps_in_scan; ci++) { for (ci = 0; ci < cinfo->comps_in_scan; ci++) {
int cindex = cinfo->cur_comp_info[ci]->component_index; int cindex = cinfo->cur_comp_info[ci]->component_index;
coef_bit_ptr = &cinfo->coef_bits[cindex][0]; coef_bit_ptr = &cinfo->coef_bits[cindex][0];
prev_coef_bit_ptr = &cinfo->coef_bits[cindex + cinfo->num_components][0];
if (!is_DC_band && coef_bit_ptr[0] < 0) /* AC without prior DC scan */ if (!is_DC_band && coef_bit_ptr[0] < 0) /* AC without prior DC scan */
WARNMS2(cinfo, JWRN_BOGUS_PROGRESSION, cindex, 0); WARNMS2(cinfo, JWRN_BOGUS_PROGRESSION, cindex, 0);
for (coefi = MIN(cinfo->Ss, 1); coefi <= MAX(cinfo->Se, 9); coefi++) {
if (cinfo->input_scan_number > 1)
prev_coef_bit_ptr[coefi] = coef_bit_ptr[coefi];
else
prev_coef_bit_ptr[coefi] = 0;
}
for (coefi = cinfo->Ss; coefi <= cinfo->Se; coefi++) { for (coefi = cinfo->Ss; coefi <= cinfo->Se; coefi++) {
int expected = (coef_bit_ptr[coefi] < 0) ? 0 : coef_bit_ptr[coefi]; int expected = (coef_bit_ptr[coefi] < 0) ? 0 : coef_bit_ptr[coefi];
if (cinfo->Ah != expected) if (cinfo->Ah != expected)
@@ -323,7 +311,7 @@ decode_mcu_DC_first(j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
/* Load up working state */ /* Load up working state */
BITREAD_LOAD_STATE(cinfo, entropy->bitstate); BITREAD_LOAD_STATE(cinfo, entropy->bitstate);
ASSIGN_STATE(state, entropy->saved); state = entropy->saved;
/* Outer loop handles each block in the MCU */ /* Outer loop handles each block in the MCU */
@@ -356,7 +344,7 @@ decode_mcu_DC_first(j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
/* Completed MCU, so update state */ /* Completed MCU, so update state */
BITREAD_SAVE_STATE(cinfo, entropy->bitstate); BITREAD_SAVE_STATE(cinfo, entropy->bitstate);
ASSIGN_STATE(entropy->saved, state); entropy->saved = state;
} }
/* Account for restart interval (no-op if not using restarts) */ /* Account for restart interval (no-op if not using restarts) */
@@ -676,7 +664,7 @@ jinit_phuff_decoder(j_decompress_ptr cinfo)
/* Create progression status table */ /* Create progression status table */
cinfo->coef_bits = (int (*)[DCTSIZE2]) cinfo->coef_bits = (int (*)[DCTSIZE2])
(*cinfo->mem->alloc_small) ((j_common_ptr)cinfo, JPOOL_IMAGE, (*cinfo->mem->alloc_small) ((j_common_ptr)cinfo, JPOOL_IMAGE,
cinfo->num_components * DCTSIZE2 * cinfo->num_components * 2 * DCTSIZE2 *
sizeof(int)); sizeof(int));
coef_bit_ptr = &cinfo->coef_bits[0][0]; coef_bit_ptr = &cinfo->coef_bits[0][0];
for (ci = 0; ci < cinfo->num_components; ci++) for (ci = 0; ci < cinfo->num_components; ci++)
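The allocation change above doubles coef_bits: as I read the hunks, rows [0, num_components) hold the current scan's Al values while the new rows [num_components, 2*num_components) keep the previous scan's values, consulted via prev_coef_bit_ptr. A sketch of that two-halves indexing only, not the real decoder state:

```c
#include <stdlib.h>
#include <stdio.h>

/* One allocation, two tables of DCTSIZE2 ints per component: the second
 * half is addressed by offsetting the component index by num_components. */
#define DCTSIZE2 64

int main(void)
{
  int num_components = 3;
  int (*coef_bits)[DCTSIZE2] =
      malloc(sizeof(int) * num_components * 2 * DCTSIZE2);
  if (!coef_bits)
    return 1;

  int ci = 1, coefi = 5;
  coef_bits[ci][coefi] = 2;                        /* current scan */
  coef_bits[ci + num_components][coefi] = 3;       /* previous scan */
  printf("cur=%d prev=%d\n", coef_bits[ci][coefi],
         coef_bits[ci + num_components][coefi]);
  free(coef_bits);
  return 0;
}
```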


@@ -8,7 +8,7 @@
* Copyright (C) 2010, 2015-2016, D. R. Commander. * Copyright (C) 2010, 2015-2016, D. R. Commander.
* Copyright (C) 2014, MIPS Technologies, Inc., California. * Copyright (C) 2014, MIPS Technologies, Inc., California.
* Copyright (C) 2015, Google, Inc. * Copyright (C) 2015, Google, Inc.
* Copyright (C) 2019, Arm Limited. * Copyright (C) 2019-2020, Arm Limited.
* For conditions of distribution and use, see the accompanying README.ijg * For conditions of distribution and use, see the accompanying README.ijg
* file. * file.
* *
@@ -177,7 +177,7 @@ int_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr,
outptr = output_data[outrow]; outptr = output_data[outrow];
outend = outptr + cinfo->output_width; outend = outptr + cinfo->output_width;
while (outptr < outend) { while (outptr < outend) {
invalue = *inptr++; /* don't need GETJSAMPLE() here */ invalue = *inptr++;
for (h = h_expand; h > 0; h--) { for (h = h_expand; h > 0; h--) {
*outptr++ = invalue; *outptr++ = invalue;
} }
@@ -213,7 +213,7 @@ h2v1_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr,
outptr = output_data[inrow]; outptr = output_data[inrow];
outend = outptr + cinfo->output_width; outend = outptr + cinfo->output_width;
while (outptr < outend) { while (outptr < outend) {
invalue = *inptr++; /* don't need GETJSAMPLE() here */ invalue = *inptr++;
*outptr++ = invalue; *outptr++ = invalue;
*outptr++ = invalue; *outptr++ = invalue;
} }
@@ -242,7 +242,7 @@ h2v2_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr,
outptr = output_data[outrow]; outptr = output_data[outrow];
outend = outptr + cinfo->output_width; outend = outptr + cinfo->output_width;
while (outptr < outend) { while (outptr < outend) {
invalue = *inptr++; /* don't need GETJSAMPLE() here */ invalue = *inptr++;
*outptr++ = invalue; *outptr++ = invalue;
*outptr++ = invalue; *outptr++ = invalue;
} }
@@ -283,20 +283,20 @@ h2v1_fancy_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr,
inptr = input_data[inrow]; inptr = input_data[inrow];
outptr = output_data[inrow]; outptr = output_data[inrow];
/* Special case for first column */ /* Special case for first column */
invalue = GETJSAMPLE(*inptr++); invalue = *inptr++;
*outptr++ = (JSAMPLE)invalue; *outptr++ = (JSAMPLE)invalue;
*outptr++ = (JSAMPLE)((invalue * 3 + GETJSAMPLE(*inptr) + 2) >> 2); *outptr++ = (JSAMPLE)((invalue * 3 + inptr[0] + 2) >> 2);
for (colctr = compptr->downsampled_width - 2; colctr > 0; colctr--) { for (colctr = compptr->downsampled_width - 2; colctr > 0; colctr--) {
/* General case: 3/4 * nearer pixel + 1/4 * further pixel */ /* General case: 3/4 * nearer pixel + 1/4 * further pixel */
invalue = GETJSAMPLE(*inptr++) * 3; invalue = (*inptr++) * 3;
*outptr++ = (JSAMPLE)((invalue + GETJSAMPLE(inptr[-2]) + 1) >> 2); *outptr++ = (JSAMPLE)((invalue + inptr[-2] + 1) >> 2);
*outptr++ = (JSAMPLE)((invalue + GETJSAMPLE(*inptr) + 2) >> 2); *outptr++ = (JSAMPLE)((invalue + inptr[0] + 2) >> 2);
} }
/* Special case for last column */ /* Special case for last column */
invalue = GETJSAMPLE(*inptr); invalue = *inptr;
*outptr++ = (JSAMPLE)((invalue * 3 + GETJSAMPLE(inptr[-1]) + 1) >> 2); *outptr++ = (JSAMPLE)((invalue * 3 + inptr[-1] + 1) >> 2);
*outptr++ = (JSAMPLE)invalue; *outptr++ = (JSAMPLE)invalue;
} }
} }
@@ -338,7 +338,7 @@ h1v2_fancy_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr,
outptr = output_data[outrow++]; outptr = output_data[outrow++];
for (colctr = 0; colctr < compptr->downsampled_width; colctr++) { for (colctr = 0; colctr < compptr->downsampled_width; colctr++) {
thiscolsum = GETJSAMPLE(*inptr0++) * 3 + GETJSAMPLE(*inptr1++); thiscolsum = (*inptr0++) * 3 + (*inptr1++);
*outptr++ = (JSAMPLE)((thiscolsum + bias) >> 2); *outptr++ = (JSAMPLE)((thiscolsum + bias) >> 2);
} }
} }
@@ -381,8 +381,8 @@ h2v2_fancy_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr,
outptr = output_data[outrow++]; outptr = output_data[outrow++];
/* Special case for first column */ /* Special case for first column */
thiscolsum = GETJSAMPLE(*inptr0++) * 3 + GETJSAMPLE(*inptr1++); thiscolsum = (*inptr0++) * 3 + (*inptr1++);
nextcolsum = GETJSAMPLE(*inptr0++) * 3 + GETJSAMPLE(*inptr1++); nextcolsum = (*inptr0++) * 3 + (*inptr1++);
*outptr++ = (JSAMPLE)((thiscolsum * 4 + 8) >> 4); *outptr++ = (JSAMPLE)((thiscolsum * 4 + 8) >> 4);
*outptr++ = (JSAMPLE)((thiscolsum * 3 + nextcolsum + 7) >> 4); *outptr++ = (JSAMPLE)((thiscolsum * 3 + nextcolsum + 7) >> 4);
lastcolsum = thiscolsum; thiscolsum = nextcolsum; lastcolsum = thiscolsum; thiscolsum = nextcolsum;
@@ -390,7 +390,7 @@ h2v2_fancy_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr,
for (colctr = compptr->downsampled_width - 2; colctr > 0; colctr--) { for (colctr = compptr->downsampled_width - 2; colctr > 0; colctr--) {
/* General case: 3/4 * nearer pixel + 1/4 * further pixel in each */ /* General case: 3/4 * nearer pixel + 1/4 * further pixel in each */
/* dimension, thus 9/16, 3/16, 3/16, 1/16 overall */ /* dimension, thus 9/16, 3/16, 3/16, 1/16 overall */
nextcolsum = GETJSAMPLE(*inptr0++) * 3 + GETJSAMPLE(*inptr1++); nextcolsum = (*inptr0++) * 3 + (*inptr1++);
*outptr++ = (JSAMPLE)((thiscolsum * 3 + lastcolsum + 8) >> 4); *outptr++ = (JSAMPLE)((thiscolsum * 3 + lastcolsum + 8) >> 4);
*outptr++ = (JSAMPLE)((thiscolsum * 3 + nextcolsum + 7) >> 4); *outptr++ = (JSAMPLE)((thiscolsum * 3 + nextcolsum + 7) >> 4);
lastcolsum = thiscolsum; thiscolsum = nextcolsum; lastcolsum = thiscolsum; thiscolsum = nextcolsum;
@@ -477,7 +477,13 @@ jinit_upsampler(j_decompress_ptr cinfo)
} else if (h_in_group == h_out_group && } else if (h_in_group == h_out_group &&
v_in_group * 2 == v_out_group && do_fancy) { v_in_group * 2 == v_out_group && do_fancy) {
/* Non-fancy upsampling is handled by the generic method */ /* Non-fancy upsampling is handled by the generic method */
upsample->methods[ci] = h1v2_fancy_upsample; #if defined(__arm__) || defined(__aarch64__) || \
defined(_M_ARM) || defined(_M_ARM64)
if (jsimd_can_h1v2_fancy_upsample())
upsample->methods[ci] = jsimd_h1v2_fancy_upsample;
else
#endif
upsample->methods[ci] = h1v2_fancy_upsample;
upsample->pub.need_context_rows = TRUE; upsample->pub.need_context_rows = TRUE;
} else if (h_in_group * 2 == h_out_group && } else if (h_in_group * 2 == h_out_group &&
v_in_group * 2 == v_out_group) { v_in_group * 2 == v_out_group) {
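h2v1_fancy_upsample() above doubles each row horizontally with a triangle filter: 3/4 of the nearer input sample plus 1/4 of the further one, with the edge columns simply replicated. The sketch below reuses the same arithmetic as the hunks, minus the JSAMPLE typedefs and row plumbing; it assumes at least two input samples.

```c
#include <stdio.h>

/* h2v1 "fancy" upsampling of one row: each input sample yields two outputs,
 * weighted 3/4 toward the nearer input and 1/4 toward the further one.
 * Assumes w >= 2. */
static void h2v1_fancy(const unsigned char *in, unsigned char *out, int w)
{
  int invalue;

  /* first column */
  invalue = *in++;
  *out++ = (unsigned char)invalue;
  *out++ = (unsigned char)((invalue * 3 + in[0] + 2) >> 2);

  for (int col = w - 2; col > 0; col--) {
    invalue = (*in++) * 3;
    *out++ = (unsigned char)((invalue + in[-2] + 1) >> 2);
    *out++ = (unsigned char)((invalue + in[0] + 2) >> 2);
  }

  /* last column */
  invalue = *in;
  *out++ = (unsigned char)((invalue * 3 + in[-1] + 1) >> 2);
  *out++ = (unsigned char)invalue;
}

int main(void)
{
  unsigned char in[4] = { 0, 100, 200, 40 }, out[8];
  h2v1_fancy(in, out, 4);
  for (int i = 0; i < 8; i++)
    printf("%d ", out[i]);
  printf("\n");
  return 0;
}
```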


@@ -211,6 +211,10 @@ JMESSAGE(JERR_BAD_PARAM_VALUE, "Bogus parameter value")
JMESSAGE(JERR_UNSUPPORTED_SUSPEND, "I/O suspension not supported in scan optimization") JMESSAGE(JERR_UNSUPPORTED_SUSPEND, "I/O suspension not supported in scan optimization")
JMESSAGE(JWRN_BOGUS_ICC, "Corrupt JPEG data: bad ICC marker") JMESSAGE(JWRN_BOGUS_ICC, "Corrupt JPEG data: bad ICC marker")
#if JPEG_LIB_VERSION < 70
JMESSAGE(JERR_BAD_DROP_SAMPLING,
"Component index %d: mismatching sampling ratio %d:%d, %d:%d, %c")
#endif
#ifdef JMAKE_ENUM_LIST #ifdef JMAKE_ENUM_LIST
@@ -255,8 +259,17 @@ JMESSAGE(JWRN_BOGUS_ICC, "Corrupt JPEG data: bad ICC marker")
(cinfo)->err->msg_parm.i[1] = (p2), \ (cinfo)->err->msg_parm.i[1] = (p2), \
(cinfo)->err->msg_parm.i[2] = (p3), \ (cinfo)->err->msg_parm.i[2] = (p3), \
(cinfo)->err->msg_parm.i[3] = (p4), \ (cinfo)->err->msg_parm.i[3] = (p4), \
(*(cinfo)->err->error_exit) ((j_common_ptr) (cinfo))) (*(cinfo)->err->error_exit) ((j_common_ptr)(cinfo)))
#define ERREXITS(cinfo,code,str) \ #define ERREXIT6(cinfo, code, p1, p2, p3, p4, p5, p6) \
((cinfo)->err->msg_code = (code), \
(cinfo)->err->msg_parm.i[0] = (p1), \
(cinfo)->err->msg_parm.i[1] = (p2), \
(cinfo)->err->msg_parm.i[2] = (p3), \
(cinfo)->err->msg_parm.i[3] = (p4), \
(cinfo)->err->msg_parm.i[4] = (p5), \
(cinfo)->err->msg_parm.i[5] = (p6), \
(*(cinfo)->err->error_exit) ((j_common_ptr)(cinfo)))
#define ERREXITS(cinfo, code, str) \
((cinfo)->err->msg_code = (code), \ ((cinfo)->err->msg_code = (code), \
strncpy((cinfo)->err->msg_parm.s, (str), JMSG_STR_PARM_MAX), \ strncpy((cinfo)->err->msg_parm.s, (str), JMSG_STR_PARM_MAX), \
(*(cinfo)->err->error_exit) ((j_common_ptr) (cinfo))) (*(cinfo)->err->error_exit) ((j_common_ptr) (cinfo)))
@@ -292,24 +305,24 @@ JMESSAGE(JWRN_BOGUS_ICC, "Corrupt JPEG data: bad ICC marker")
(*(cinfo)->err->emit_message) ((j_common_ptr) (cinfo), (lvl))) (*(cinfo)->err->emit_message) ((j_common_ptr) (cinfo), (lvl)))
#define TRACEMS3(cinfo,lvl,code,p1,p2,p3) \ #define TRACEMS3(cinfo,lvl,code,p1,p2,p3) \
MAKESTMT(int * _mp = (cinfo)->err->msg_parm.i; \ MAKESTMT(int * _mp = (cinfo)->err->msg_parm.i; \
_mp[0] = (p1); _mp[1] = (p2); _mp[2] = (p3); \ _mp[0] = (p1); _mp[1] = (p2); _mp[2] = (p3); \
(cinfo)->err->msg_code = (code); \ (cinfo)->err->msg_code = (code); \
(*(cinfo)->err->emit_message) ((j_common_ptr) (cinfo), (lvl)); ) (*(cinfo)->err->emit_message) ((j_common_ptr) (cinfo), (lvl)); )
#define TRACEMS4(cinfo,lvl,code,p1,p2,p3,p4) \ #define TRACEMS4(cinfo,lvl,code,p1,p2,p3,p4) \
MAKESTMT(int * _mp = (cinfo)->err->msg_parm.i; \ MAKESTMT(int * _mp = (cinfo)->err->msg_parm.i; \
_mp[0] = (p1); _mp[1] = (p2); _mp[2] = (p3); _mp[3] = (p4); \ _mp[0] = (p1); _mp[1] = (p2); _mp[2] = (p3); _mp[3] = (p4); \
(cinfo)->err->msg_code = (code); \ (cinfo)->err->msg_code = (code); \
(*(cinfo)->err->emit_message) ((j_common_ptr) (cinfo), (lvl)); ) (*(cinfo)->err->emit_message) ((j_common_ptr) (cinfo), (lvl)); )
#define TRACEMS5(cinfo,lvl,code,p1,p2,p3,p4,p5) \ #define TRACEMS5(cinfo,lvl,code,p1,p2,p3,p4,p5) \
MAKESTMT(int * _mp = (cinfo)->err->msg_parm.i; \ MAKESTMT(int * _mp = (cinfo)->err->msg_parm.i; \
_mp[0] = (p1); _mp[1] = (p2); _mp[2] = (p3); _mp[3] = (p4); \ _mp[0] = (p1); _mp[1] = (p2); _mp[2] = (p3); _mp[3] = (p4); \
_mp[4] = (p5); \ _mp[4] = (p5); \
(cinfo)->err->msg_code = (code); \ (cinfo)->err->msg_code = (code); \
(*(cinfo)->err->emit_message) ((j_common_ptr) (cinfo), (lvl)); ) (*(cinfo)->err->emit_message) ((j_common_ptr) (cinfo), (lvl)); )
#define TRACEMS8(cinfo,lvl,code,p1,p2,p3,p4,p5,p6,p7,p8) \ #define TRACEMS8(cinfo,lvl,code,p1,p2,p3,p4,p5,p6,p7,p8) \
MAKESTMT(int * _mp = (cinfo)->err->msg_parm.i; \ MAKESTMT(int * _mp = (cinfo)->err->msg_parm.i; \
_mp[0] = (p1); _mp[1] = (p2); _mp[2] = (p3); _mp[3] = (p4); \ _mp[0] = (p1); _mp[1] = (p2); _mp[2] = (p3); _mp[3] = (p4); \
_mp[4] = (p5); _mp[5] = (p6); _mp[6] = (p7); _mp[7] = (p8); \ _mp[4] = (p5); _mp[5] = (p6); _mp[6] = (p7); _mp[7] = (p8); \
(cinfo)->err->msg_code = (code); \ (cinfo)->err->msg_code = (code); \
(*(cinfo)->err->emit_message) ((j_common_ptr) (cinfo), (lvl)); ) (*(cinfo)->err->emit_message) ((j_common_ptr) (cinfo), (lvl)); )
#define TRACEMSS(cinfo,lvl,code,str) \ #define TRACEMSS(cinfo,lvl,code,str) \
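ERREXIT6() is added above because the new JERR_BAD_DROP_SAMPLING message consumes six parameters, one more than ERREXIT5 can supply. A standalone illustration of formatting that message from a six-entry parameter block; the array stands in for msg_parm.i[], and the format string is the one from the diff.

```c
#include <stdio.h>

/* Six parameters feed the new message template, which is why a sixth slot
 * (and hence ERREXIT6) is needed. */
int main(void)
{
  int msg_parm[6] = { 2, 2, 1, 1, 2, 'h' };  /* p1..p6 as ERREXIT6 stores them */

  printf("Component index %d: mismatching sampling ratio %d:%d, %d:%d, %c\n",
         msg_parm[0], msg_parm[1], msg_parm[2], msg_parm[3], msg_parm[4],
         msg_parm[5]);
  return 0;
}
```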


@@ -3,7 +3,7 @@
* *
* This file was part of the Independent JPEG Group's software: * This file was part of the Independent JPEG Group's software:
* Copyright (C) 1991-1998, Thomas G. Lane. * Copyright (C) 1991-1998, Thomas G. Lane.
* Modification developed 2002-2009 by Guido Vollbeding. * Modification developed 2002-2018 by Guido Vollbeding.
* libjpeg-turbo Modifications: * libjpeg-turbo Modifications:
* Copyright (C) 2015, 2020, D. R. Commander. * Copyright (C) 2015, 2020, D. R. Commander.
* For conditions of distribution and use, see the accompanying README.ijg * For conditions of distribution and use, see the accompanying README.ijg
@@ -417,7 +417,7 @@ jpeg_idct_islow(j_decompress_ptr cinfo, jpeg_component_info *compptr,
/* /*
* Perform dequantization and inverse DCT on one block of coefficients, * Perform dequantization and inverse DCT on one block of coefficients,
* producing a 7x7 output block. * producing a reduced-size 7x7 output block.
* *
* Optimized algorithm with 12 multiplications in the 1-D kernel. * Optimized algorithm with 12 multiplications in the 1-D kernel.
* cK represents sqrt(2) * cos(K*pi/14). * cK represents sqrt(2) * cos(K*pi/14).
@@ -1258,7 +1258,7 @@ jpeg_idct_10x10(j_decompress_ptr cinfo, jpeg_component_info *compptr,
/* /*
* Perform dequantization and inverse DCT on one block of coefficients, * Perform dequantization and inverse DCT on one block of coefficients,
* producing a 11x11 output block. * producing an 11x11 output block.
* *
* Optimized algorithm with 24 multiplications in the 1-D kernel. * Optimized algorithm with 24 multiplications in the 1-D kernel.
* cK represents sqrt(2) * cos(K*pi/22). * cK represents sqrt(2) * cos(K*pi/22).
@@ -2398,7 +2398,7 @@ jpeg_idct_16x16(j_decompress_ptr cinfo, jpeg_component_info *compptr,
tmp0 = DEQUANTIZE(inptr[DCTSIZE * 0], quantptr[DCTSIZE * 0]); tmp0 = DEQUANTIZE(inptr[DCTSIZE * 0], quantptr[DCTSIZE * 0]);
tmp0 = LEFT_SHIFT(tmp0, CONST_BITS); tmp0 = LEFT_SHIFT(tmp0, CONST_BITS);
/* Add fudge factor here for final descale. */ /* Add fudge factor here for final descale. */
tmp0 += 1 << (CONST_BITS - PASS1_BITS - 1); tmp0 += ONE << (CONST_BITS - PASS1_BITS - 1);
z1 = DEQUANTIZE(inptr[DCTSIZE * 4], quantptr[DCTSIZE * 4]); z1 = DEQUANTIZE(inptr[DCTSIZE * 4], quantptr[DCTSIZE * 4]);
tmp1 = MULTIPLY(z1, FIX(1.306562965)); /* c4[16] = c2[8] */ tmp1 = MULTIPLY(z1, FIX(1.306562965)); /* c4[16] = c2[8] */
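These reduced-size IDCT kernels are selected when the application requests scaled decompression: with 8x8 DCT blocks, a 7/8 scale factor engages the 7x7 routine, 11/8 the 11x11 routine, and so on. A minimal sketch of requesting such a scale through the standard libjpeg API (illustrative only; error handling omitted):

#include <stdio.h>
#include "jpeglib.h"

/* Decode the header and compute output dimensions at 7/8 scale, so that
 * each 8x8 DCT block would be reduced to a 7x7 output block. */
void calc_7_8_scale_dimensions(FILE *infile)
{
  struct jpeg_decompress_struct cinfo;
  struct jpeg_error_mgr jerr;

  cinfo.err = jpeg_std_error(&jerr);
  jpeg_create_decompress(&cinfo);
  jpeg_stdio_src(&cinfo, infile);
  (void)jpeg_read_header(&cinfo, TRUE);

  cinfo.scale_num = 7;                  /* request 7/8 of the nominal size */
  cinfo.scale_denom = 8;
  jpeg_calc_output_dimensions(&cinfo);  /* output_width/output_height now reflect it */

  jpeg_destroy_decompress(&cinfo);
}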


@@ -43,25 +43,11 @@
#if BITS_IN_JSAMPLE == 8 #if BITS_IN_JSAMPLE == 8
/* JSAMPLE should be the smallest type that will hold the values 0..255. /* JSAMPLE should be the smallest type that will hold the values 0..255.
* You can use a signed char by having GETJSAMPLE mask it with 0xFF.
*/ */
#ifdef HAVE_UNSIGNED_CHAR
typedef unsigned char JSAMPLE; typedef unsigned char JSAMPLE;
#define GETJSAMPLE(value) ((int)(value)) #define GETJSAMPLE(value) ((int)(value))
#else /* not HAVE_UNSIGNED_CHAR */
typedef char JSAMPLE;
#ifdef __CHAR_UNSIGNED__
#define GETJSAMPLE(value) ((int)(value))
#else
#define GETJSAMPLE(value) ((int)(value) & 0xFF)
#endif /* __CHAR_UNSIGNED__ */
#endif /* HAVE_UNSIGNED_CHAR */
#define MAXJSAMPLE 255 #define MAXJSAMPLE 255
#define CENTERJSAMPLE 128 #define CENTERJSAMPLE 128
@@ -97,22 +83,9 @@ typedef short JCOEF;
* managers, this is also the data type passed to fread/fwrite. * managers, this is also the data type passed to fread/fwrite.
*/ */
#ifdef HAVE_UNSIGNED_CHAR
typedef unsigned char JOCTET; typedef unsigned char JOCTET;
#define GETJOCTET(value) (value) #define GETJOCTET(value) (value)
#else /* not HAVE_UNSIGNED_CHAR */
typedef char JOCTET;
#ifdef __CHAR_UNSIGNED__
#define GETJOCTET(value) (value)
#else
#define GETJOCTET(value) ((value) & 0xFF)
#endif /* __CHAR_UNSIGNED__ */
#endif /* HAVE_UNSIGNED_CHAR */
/* These typedefs are used for various table entries and so forth. /* These typedefs are used for various table entries and so forth.
* They must be at least as wide as specified; but making them too big * They must be at least as wide as specified; but making them too big
@@ -123,15 +96,7 @@ typedef char JOCTET;
/* UINT8 must hold at least the values 0..255. */ /* UINT8 must hold at least the values 0..255. */
#ifdef HAVE_UNSIGNED_CHAR
typedef unsigned char UINT8; typedef unsigned char UINT8;
#else /* not HAVE_UNSIGNED_CHAR */
#ifdef __CHAR_UNSIGNED__
typedef char UINT8;
#else /* not __CHAR_UNSIGNED__ */
typedef short UINT8;
#endif /* __CHAR_UNSIGNED__ */
#endif /* HAVE_UNSIGNED_CHAR */
/* UINT16 must hold at least the values 0..65535. */ /* UINT16 must hold at least the values 0..65535. */
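With the HAVE_UNSIGNED_CHAR and __CHAR_UNSIGNED__ fallbacks removed, JSAMPLE is always unsigned char and GETJSAMPLE() is nothing more than a cast to int, which is why the jquant1.c, jquant2.c, and rdbmp.c hunks later in this commit drop the macro altogether. A small sketch of the equivalence (illustrative only):

#include <stdio.h>
#include "jpeglib.h"

/* With JSAMPLE == unsigned char, the macro and ordinary integer promotion
 * give the same value, so this always returns 1. */
static int getjsample_is_just_a_cast(JSAMPROW row)
{
  int via_macro = GETJSAMPLE(row[0]);
  int via_promotion = row[0];
  return via_macro == via_promotion;
}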


@@ -5,7 +5,7 @@
* Copyright (C) 1991-1997, Thomas G. Lane. * Copyright (C) 1991-1997, Thomas G. Lane.
* Modified 1997-2009 by Guido Vollbeding. * Modified 1997-2009 by Guido Vollbeding.
* libjpeg-turbo Modifications: * libjpeg-turbo Modifications:
* Copyright (C) 2015-2016, D. R. Commander. * Copyright (C) 2015-2016, 2019, D. R. Commander.
* Copyright (C) 2015, Google, Inc. * Copyright (C) 2015, Google, Inc.
* mozjpeg Modifications: * mozjpeg Modifications:
* Copyright (C) 2014, Mozilla Corporation. * Copyright (C) 2014, Mozilla Corporation.
@@ -220,6 +220,9 @@ struct jpeg_decomp_master {
JDIMENSION first_MCU_col[MAX_COMPONENTS]; JDIMENSION first_MCU_col[MAX_COMPONENTS];
JDIMENSION last_MCU_col[MAX_COMPONENTS]; JDIMENSION last_MCU_col[MAX_COMPONENTS];
boolean jinit_upsampler_no_alloc; boolean jinit_upsampler_no_alloc;
/* Last iMCU row that was successfully decoded */
JDIMENSION last_good_iMCU_row;
}; };
/* Input control module */ /* Input control module */


@@ -1,4 +1,4 @@
.TH JPEGTRAN 1 "18 March 2017" .TH JPEGTRAN 1 "26 October 2020"
.SH NAME .SH NAME
jpegtran \- lossless transformation of JPEG files jpegtran \- lossless transformation of JPEG files
.SH SYNOPSIS .SH SYNOPSIS
@@ -180,6 +180,47 @@ left corner of the selected region must fall on an iMCU boundary. If it
doesn't, then it is silently moved up and/or left to the nearest iMCU boundary doesn't, then it is silently moved up and/or left to the nearest iMCU boundary
(the lower right corner is unchanged.) (the lower right corner is unchanged.)
.PP .PP
If W or H is larger than the width/height of the input image, then the output
image is expanded in size, and the expanded region is filled in with zeros
(neutral gray). Attaching an 'f' character ("flatten") to the width number
will cause each block in the expanded region to be filled in with the DC
coefficient of the nearest block in the input image rather than grayed out.
Attaching an 'r' character ("reflect") to the width number will cause the
expanded region to be filled in with repeated reflections of the input image
rather than grayed out.
.PP
A complementary lossless wipe option is provided to discard (gray out) data
inside a given image region while losslessly preserving what is outside:
.TP
.B \-wipe WxH+X+Y
Wipe (gray out) a rectangular region of width W and height H from the input
image, starting at point X,Y.
.PP
Attaching an 'f' character ("flatten") to the width number will cause the
region to be filled with the average of adjacent blocks rather than grayed out.
If the wipe region and the region outside the wipe region, when adjusted to the
nearest iMCU boundary, form two horizontally adjacent rectangles, then
attaching an 'r' character ("reflect") to the width number will cause the wipe
region to be filled with repeated reflections of the outside region rather than
grayed out.
.PP
A lossless drop option is also provided, which allows another JPEG image to be
inserted ("dropped") into the input image data at a given position, replacing
the existing image data at that position:
.TP
.B \-drop +X+Y filename
Drop (insert) another image at point X,Y
.PP
Both the input image and the drop image must have the same subsampling level.
It is best if they also have the same quantization (quality.) Otherwise, the
quantization of the output image will be adapted to accommodate the higher of
the input image quality and the drop image quality. The trim option can be
used with the drop option to requantize the drop image to match the input
image. Note that a grayscale image can be dropped into a full-color image or
vice versa, as long as the full-color image has no vertical subsampling. If
the input image is grayscale and the drop image is full-color, then the
chrominance channels from the drop image will be discarded.
.PP
Other not-strictly-lossless transformation switches are: Other not-strictly-lossless transformation switches are:
.TP .TP
.B \-grayscale .B \-grayscale
@@ -229,9 +270,31 @@ number. For example,
.B \-max 4m .B \-max 4m
selects 4000000 bytes. If more space is needed, an error will occur. selects 4000000 bytes. If more space is needed, an error will occur.
.TP .TP
.BI \-maxscans " N"
Abort if the input image contains more than
.I N
scans. This feature demonstrates a method by which applications can guard
against denial-of-service attacks instigated by specially-crafted malformed
JPEG images containing numerous scans with missing image data or image data
consisting only of "EOB runs" (a feature of progressive JPEG images that allows
potentially hundreds of thousands of adjoining zero-value pixels to be
represented using only a few bytes.) Attempting to transform such malformed
JPEG images can cause excessive CPU activity, since the decompressor must fully
process each scan (even if the scan is corrupt) before it can proceed to the
next scan.
.TP
.BI \-outfile " name" .BI \-outfile " name"
Send output image to the named file, not to standard output. Send output image to the named file, not to standard output.
.TP .TP
.BI \-report
Report transformation progress.
.TP
.BI \-strict
Treat all warnings as fatal. This feature also demonstrates a method by which
applications can guard against attacks instigated by specially-crafted
malformed JPEG images. Enabling this option will cause the decompressor to
abort if the input image contains incomplete or corrupt image data.
.TP
.B \-verbose .B \-verbose
Enable debug printout. More Enable debug printout. More
.BR \-v 's .BR \-v 's
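The new drop and wipe switches are thin wrappers around the lossless-transformation interface in transupp.h, which applications can also drive directly. A hedged sketch of wiping a 64x64 region at offset +16+16 programmatically, following the same call sequence that jpegtran's main() uses below (function names from transupp.h; marker copying and error handling omitted):

#include <stdio.h>
#include <string.h>
#include "jpeglib.h"
#include "transupp.h"

/* Illustrative sketch: lossless wipe of a 64x64 region at +16+16. */
void wipe_region(FILE *infile, FILE *outfile)
{
  struct jpeg_decompress_struct srcinfo;
  struct jpeg_compress_struct dstinfo;
  struct jpeg_error_mgr jsrcerr, jdsterr;
  jpeg_transform_info info;
  jvirt_barray_ptr *src_coef_arrays, *dst_coef_arrays;

  memset(&info, 0, sizeof(info));
  info.transform = JXFORM_WIPE;
  (void)jtransform_parse_crop_spec(&info, "64x64+16+16");

  srcinfo.err = jpeg_std_error(&jsrcerr);
  jpeg_create_decompress(&srcinfo);
  dstinfo.err = jpeg_std_error(&jdsterr);
  jpeg_create_compress(&dstinfo);

  jpeg_stdio_src(&srcinfo, infile);
  (void)jpeg_read_header(&srcinfo, TRUE);
  jtransform_request_workspace(&srcinfo, &info);

  src_coef_arrays = jpeg_read_coefficients(&srcinfo);
  jpeg_copy_critical_parameters(&srcinfo, &dstinfo);
  dst_coef_arrays = jtransform_adjust_parameters(&srcinfo, &dstinfo,
                                                 src_coef_arrays, &info);

  jpeg_stdio_dest(&dstinfo, outfile);
  jpeg_write_coefficients(&dstinfo, dst_coef_arrays);
  jtransform_execute_transformation(&srcinfo, &dstinfo, src_coef_arrays,
                                    &info);

  jpeg_finish_compress(&dstinfo);
  jpeg_destroy_compress(&dstinfo);
  (void)jpeg_finish_decompress(&srcinfo);
  jpeg_destroy_decompress(&srcinfo);
}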


@@ -2,9 +2,9 @@
* jpegtran.c * jpegtran.c
* *
* This file was part of the Independent JPEG Group's software: * This file was part of the Independent JPEG Group's software:
* Copyright (C) 1995-2010, Thomas G. Lane, Guido Vollbeding. * Copyright (C) 1995-2019, Thomas G. Lane, Guido Vollbeding.
* libjpeg-turbo Modifications: * libjpeg-turbo Modifications:
* Copyright (C) 2010, 2014, 2017, 2020, D. R. Commander. * Copyright (C) 2010, 2014, 2017, 2019-2020, D. R. Commander.
* mozjpeg Modifications: * mozjpeg Modifications:
* Copyright (C) 2014, Mozilla Corporation. * Copyright (C) 2014, Mozilla Corporation.
* For conditions of distribution and use, see the accompanying README file. * For conditions of distribution and use, see the accompanying README file.
@@ -42,7 +42,11 @@
static const char *progname; /* program name for error messages */ static const char *progname; /* program name for error messages */
static char *icc_filename; /* for -icc switch */ static char *icc_filename; /* for -icc switch */
JDIMENSION max_scans; /* for -maxscans switch */
static char *outfilename; /* for -outfile switch */ static char *outfilename; /* for -outfile switch */
static char *dropfilename; /* for -drop switch */
boolean report; /* for -report switch */
boolean strict; /* for -strict switch */
static boolean prefer_smallest; /* use smallest of input or result file (if no image-changing options supplied) */ static boolean prefer_smallest; /* use smallest of input or result file (if no image-changing options supplied) */
static JCOPY_OPTION copyoption; /* -copy switch */ static JCOPY_OPTION copyoption; /* -copy switch */
static jpeg_transform_info transformoption; /* image transformation options */ static jpeg_transform_info transformoption; /* image transformation options */
@@ -76,8 +80,9 @@ usage(void)
fprintf(stderr, "Switches for modifying the image:\n"); fprintf(stderr, "Switches for modifying the image:\n");
#if TRANSFORMS_SUPPORTED #if TRANSFORMS_SUPPORTED
fprintf(stderr, " -crop WxH+X+Y Crop to a rectangular region\n"); fprintf(stderr, " -crop WxH+X+Y Crop to a rectangular region\n");
fprintf(stderr, " -grayscale Reduce to grayscale (omit color data)\n"); fprintf(stderr, " -drop +X+Y filename Drop (insert) another image\n");
fprintf(stderr, " -flip [horizontal|vertical] Mirror image (left-right or top-bottom)\n"); fprintf(stderr, " -flip [horizontal|vertical] Mirror image (left-right or top-bottom)\n");
fprintf(stderr, " -grayscale Reduce to grayscale (omit color data)\n");
fprintf(stderr, " -perfect Fail if there is non-transformable edge blocks\n"); fprintf(stderr, " -perfect Fail if there is non-transformable edge blocks\n");
fprintf(stderr, " -rotate [90|180|270] Rotate image (degrees clockwise)\n"); fprintf(stderr, " -rotate [90|180|270] Rotate image (degrees clockwise)\n");
#endif #endif
@@ -85,6 +90,8 @@ usage(void)
fprintf(stderr, " -transpose Transpose image\n"); fprintf(stderr, " -transpose Transpose image\n");
fprintf(stderr, " -transverse Transverse transpose image\n"); fprintf(stderr, " -transverse Transverse transpose image\n");
fprintf(stderr, " -trim Drop non-transformable edge blocks\n"); fprintf(stderr, " -trim Drop non-transformable edge blocks\n");
fprintf(stderr, " with -drop: Requantize drop file to match source file\n");
fprintf(stderr, " -wipe WxH+X+Y Wipe (gray out) a rectangular region\n");
#endif #endif
fprintf(stderr, "Switches for advanced users:\n"); fprintf(stderr, "Switches for advanced users:\n");
#ifdef C_ARITH_CODING_SUPPORTED #ifdef C_ARITH_CODING_SUPPORTED
@@ -93,7 +100,10 @@ usage(void)
fprintf(stderr, " -icc FILE Embed ICC profile contained in FILE\n"); fprintf(stderr, " -icc FILE Embed ICC profile contained in FILE\n");
fprintf(stderr, " -restart N Set restart interval in rows, or in blocks with B\n"); fprintf(stderr, " -restart N Set restart interval in rows, or in blocks with B\n");
fprintf(stderr, " -maxmemory N Maximum memory to use (in kbytes)\n"); fprintf(stderr, " -maxmemory N Maximum memory to use (in kbytes)\n");
fprintf(stderr, " -maxscans N Maximum number of scans to allow in input file\n");
fprintf(stderr, " -outfile name Specify name for output file\n"); fprintf(stderr, " -outfile name Specify name for output file\n");
fprintf(stderr, " -report Report transformation progress\n");
fprintf(stderr, " -strict Treat all warnings as fatal\n");
fprintf(stderr, " -verbose or -debug Emit debug output\n"); fprintf(stderr, " -verbose or -debug Emit debug output\n");
fprintf(stderr, " -version Print version information and exit\n"); fprintf(stderr, " -version Print version information and exit\n");
fprintf(stderr, "Switches for wizards:\n"); fprintf(stderr, "Switches for wizards:\n");
@@ -151,7 +161,10 @@ parse_switches(j_compress_ptr cinfo, int argc, char **argv,
simple_progressive = FALSE; simple_progressive = FALSE;
#endif #endif
icc_filename = NULL; icc_filename = NULL;
max_scans = 0;
outfilename = NULL; outfilename = NULL;
report = FALSE;
strict = FALSE;
copyoption = JCOPYOPT_DEFAULT; copyoption = JCOPYOPT_DEFAULT;
transformoption.transform = JXFORM_NONE; transformoption.transform = JXFORM_NONE;
transformoption.perfect = FALSE; transformoption.perfect = FALSE;
@@ -208,7 +221,8 @@ parse_switches(j_compress_ptr cinfo, int argc, char **argv,
#if TRANSFORMS_SUPPORTED #if TRANSFORMS_SUPPORTED
if (++argn >= argc) /* advance to next argument */ if (++argn >= argc) /* advance to next argument */
usage(); usage();
if (!jtransform_parse_crop_spec(&transformoption, argv[argn])) { if (transformoption.crop /* reject multiple crop/drop/wipe requests */ ||
!jtransform_parse_crop_spec(&transformoption, argv[argn])) {
fprintf(stderr, "%s: bogus -crop argument '%s'\n", fprintf(stderr, "%s: bogus -crop argument '%s'\n",
progname, argv[argn]); progname, argv[argn]);
exit(EXIT_FAILURE); exit(EXIT_FAILURE);
@@ -218,6 +232,26 @@ parse_switches(j_compress_ptr cinfo, int argc, char **argv,
select_transform(JXFORM_NONE); /* force an error */ select_transform(JXFORM_NONE); /* force an error */
#endif #endif
} else if (keymatch(arg, "drop", 2)) {
#if TRANSFORMS_SUPPORTED
if (++argn >= argc) /* advance to next argument */
usage();
if (transformoption.crop /* reject multiple crop/drop/wipe requests */ ||
!jtransform_parse_crop_spec(&transformoption, argv[argn]) ||
transformoption.crop_width_set != JCROP_UNSET ||
transformoption.crop_height_set != JCROP_UNSET) {
fprintf(stderr, "%s: bogus -drop argument '%s'\n",
progname, argv[argn]);
exit(EXIT_FAILURE);
}
if (++argn >= argc) /* advance to next argument */
usage();
dropfilename = argv[argn];
select_transform(JXFORM_DROP);
#else
select_transform(JXFORM_NONE); /* force an error */
#endif
} else if (keymatch(arg, "debug", 1) || keymatch(arg, "verbose", 1)) { } else if (keymatch(arg, "debug", 1) || keymatch(arg, "verbose", 1)) {
/* Enable debug printouts. */ /* Enable debug printouts. */
/* On first -d, print version identification */ /* On first -d, print version identification */
@@ -282,6 +316,12 @@ parse_switches(j_compress_ptr cinfo, int argc, char **argv,
lval *= 1000L; lval *= 1000L;
cinfo->mem->max_memory_to_use = lval * 1000L; cinfo->mem->max_memory_to_use = lval * 1000L;
} else if (keymatch(arg, "maxscans", 4)) {
if (++argn >= argc) /* advance to next argument */
usage();
if (sscanf(argv[argn], "%u", &max_scans) != 1)
usage();
} else if (keymatch(arg, "optimize", 1) || keymatch(arg, "optimise", 1)) { } else if (keymatch(arg, "optimize", 1) || keymatch(arg, "optimise", 1)) {
/* Enable entropy parm optimization. */ /* Enable entropy parm optimization. */
#ifdef ENTROPY_OPT_SUPPORTED #ifdef ENTROPY_OPT_SUPPORTED
@@ -315,6 +355,9 @@ parse_switches(j_compress_ptr cinfo, int argc, char **argv,
exit(EXIT_FAILURE); exit(EXIT_FAILURE);
#endif #endif
} else if (keymatch(arg, "report", 3)) {
report = TRUE;
} else if (keymatch(arg, "restart", 1)) { } else if (keymatch(arg, "restart", 1)) {
/* Restart interval in MCU rows (or in MCUs with 'b'). */ /* Restart interval in MCU rows (or in MCUs with 'b'). */
long lval; long lval;
@@ -368,6 +411,9 @@ parse_switches(j_compress_ptr cinfo, int argc, char **argv,
exit(EXIT_FAILURE); exit(EXIT_FAILURE);
#endif #endif
} else if (keymatch(arg, "strict", 2)) {
strict = TRUE;
} else if (keymatch(arg, "transpose", 1)) { } else if (keymatch(arg, "transpose", 1)) {
/* Transpose (across UL-to-LR axis). */ /* Transpose (across UL-to-LR axis). */
select_transform(JXFORM_TRANSPOSE); select_transform(JXFORM_TRANSPOSE);
@@ -383,6 +429,21 @@ parse_switches(j_compress_ptr cinfo, int argc, char **argv,
transformoption.trim = TRUE; transformoption.trim = TRUE;
prefer_smallest = FALSE; prefer_smallest = FALSE;
} else if (keymatch(arg, "wipe", 1)) {
#if TRANSFORMS_SUPPORTED
if (++argn >= argc) /* advance to next argument */
usage();
if (transformoption.crop /* reject multiple crop/drop/wipe requests */ ||
!jtransform_parse_crop_spec(&transformoption, argv[argn])) {
fprintf(stderr, "%s: bogus -wipe argument '%s'\n",
progname, argv[argn]);
exit(EXIT_FAILURE);
}
select_transform(JXFORM_WIPE);
#else
select_transform(JXFORM_NONE); /* force an error */
#endif
} else { } else {
usage(); /* bogus switch */ usage(); /* bogus switch */
} }
@@ -409,6 +470,19 @@ parse_switches(j_compress_ptr cinfo, int argc, char **argv,
} }
METHODDEF(void)
my_emit_message(j_common_ptr cinfo, int msg_level)
{
if (msg_level < 0) {
/* Treat warning as fatal */
cinfo->err->error_exit(cinfo);
} else {
if (cinfo->err->trace_level >= msg_level)
cinfo->err->output_message(cinfo);
}
}
/* /*
* The main program. * The main program.
*/ */
@@ -417,11 +491,14 @@ int
main(int argc, char **argv) main(int argc, char **argv)
{ {
struct jpeg_decompress_struct srcinfo; struct jpeg_decompress_struct srcinfo;
#if TRANSFORMS_SUPPORTED
struct jpeg_decompress_struct dropinfo;
struct jpeg_error_mgr jdroperr;
FILE *drop_file;
#endif
struct jpeg_compress_struct dstinfo; struct jpeg_compress_struct dstinfo;
struct jpeg_error_mgr jsrcerr, jdsterr; struct jpeg_error_mgr jsrcerr, jdsterr;
#ifdef PROGRESS_REPORT struct cdjpeg_progress_mgr src_progress, dst_progress;
struct cdjpeg_progress_mgr progress;
#endif
jvirt_barray_ptr *src_coef_arrays; jvirt_barray_ptr *src_coef_arrays;
jvirt_barray_ptr *dst_coef_arrays; jvirt_barray_ptr *dst_coef_arrays;
int file_index; int file_index;
@@ -458,13 +535,16 @@ main(int argc, char **argv)
* values read here are mostly ignored; we will rescan the switches after * values read here are mostly ignored; we will rescan the switches after
* opening the input file. Also note that most of the switches affect the * opening the input file. Also note that most of the switches affect the
* destination JPEG object, so we parse into that and then copy over what * destination JPEG object, so we parse into that and then copy over what
* needs to affects the source too. * needs to affect the source too.
*/ */
file_index = parse_switches(&dstinfo, argc, argv, 0, FALSE); file_index = parse_switches(&dstinfo, argc, argv, 0, FALSE);
jsrcerr.trace_level = jdsterr.trace_level; jsrcerr.trace_level = jdsterr.trace_level;
srcinfo.mem->max_memory_to_use = dstinfo.mem->max_memory_to_use; srcinfo.mem->max_memory_to_use = dstinfo.mem->max_memory_to_use;
if (strict)
jsrcerr.emit_message = my_emit_message;
#ifdef TWO_FILE_COMMANDLINE #ifdef TWO_FILE_COMMANDLINE
/* Must have either -outfile switch or explicit output file name */ /* Must have either -outfile switch or explicit output file name */
if (outfilename == NULL) { if (outfilename == NULL) {
@@ -530,8 +610,29 @@ main(int argc, char **argv)
copyoption = JCOPYOPT_ALL_EXCEPT_ICC; copyoption = JCOPYOPT_ALL_EXCEPT_ICC;
} }
#ifdef PROGRESS_REPORT if (report) {
start_progress_monitor((j_common_ptr)&dstinfo, &progress); start_progress_monitor((j_common_ptr)&dstinfo, &dst_progress);
dst_progress.report = report;
}
if (report || max_scans != 0) {
start_progress_monitor((j_common_ptr)&srcinfo, &src_progress);
src_progress.report = report;
src_progress.max_scans = max_scans;
}
#if TRANSFORMS_SUPPORTED
/* Open the drop file. */
if (dropfilename != NULL) {
if ((drop_file = fopen(dropfilename, READ_BINARY)) == NULL) {
fprintf(stderr, "%s: can't open %s for reading\n", progname,
dropfilename);
exit(EXIT_FAILURE);
}
dropinfo.err = jpeg_std_error(&jdroperr);
jpeg_create_decompress(&dropinfo);
jpeg_stdio_src(&dropinfo, drop_file);
} else {
drop_file = NULL;
}
#endif #endif
/* Specify data source for decompression */ /* Specify data source for decompression */
@@ -569,6 +670,17 @@ main(int argc, char **argv)
/* Read file header */ /* Read file header */
(void)jpeg_read_header(&srcinfo, TRUE); (void)jpeg_read_header(&srcinfo, TRUE);
#if TRANSFORMS_SUPPORTED
if (dropfilename != NULL) {
(void)jpeg_read_header(&dropinfo, TRUE);
transformoption.crop_width = dropinfo.image_width;
transformoption.crop_width_set = JCROP_POS;
transformoption.crop_height = dropinfo.image_height;
transformoption.crop_height_set = JCROP_POS;
transformoption.drop_ptr = &dropinfo;
}
#endif
/* Any space needed by a transform option must be requested before /* Any space needed by a transform option must be requested before
* jpeg_read_coefficients so that memory allocation will be done right. * jpeg_read_coefficients so that memory allocation will be done right.
*/ */
@@ -584,6 +696,12 @@ main(int argc, char **argv)
/* Read source file as DCT coefficients */ /* Read source file as DCT coefficients */
src_coef_arrays = jpeg_read_coefficients(&srcinfo); src_coef_arrays = jpeg_read_coefficients(&srcinfo);
#if TRANSFORMS_SUPPORTED
if (dropfilename != NULL) {
transformoption.drop_coef_arrays = jpeg_read_coefficients(&dropinfo);
}
#endif
/* Initialize destination compression parameters from source values */ /* Initialize destination compression parameters from source values */
jpeg_copy_critical_parameters(&srcinfo, &dstinfo); jpeg_copy_critical_parameters(&srcinfo, &dstinfo);
@@ -672,27 +790,40 @@ main(int argc, char **argv)
else else
fprintf(stderr, "%s: can't write to stdout\n", progname); fprintf(stderr, "%s: can't write to stdout\n", progname);
} }
jpeg_destroy_compress(&dstinfo);
#if TRANSFORMS_SUPPORTED
if (dropfilename != NULL) {
(void)jpeg_finish_decompress(&dropinfo);
jpeg_destroy_decompress(&dropinfo);
} }
#endif #endif
jpeg_destroy_compress(&dstinfo);
(void)jpeg_finish_decompress(&srcinfo); (void)jpeg_finish_decompress(&srcinfo);
jpeg_destroy_decompress(&srcinfo); jpeg_destroy_decompress(&srcinfo);
/* Close output file, if we opened it */ /* Close output file, if we opened it */
if (fp != stdout) if (fp != stdout)
fclose(fp); fclose(fp);
#if TRANSFORMS_SUPPORTED
#ifdef PROGRESS_REPORT if (drop_file != NULL)
end_progress_monitor((j_common_ptr)&dstinfo); fclose(drop_file);
#endif #endif
if (report)
end_progress_monitor((j_common_ptr)&dstinfo);
if (report || max_scans != 0)
end_progress_monitor((j_common_ptr)&srcinfo);
free(inbuffer); free(inbuffer);
free(outbuffer); free(outbuffer);
free(icc_profile); free(icc_profile);
/* All done. */ /* All done. */
#if TRANSFORMS_SUPPORTED
if (dropfilename != NULL)
exit(jsrcerr.num_warnings + jdroperr.num_warnings +
jdsterr.num_warnings ? EXIT_WARNING : EXIT_SUCCESS);
#endif
exit(jsrcerr.num_warnings + jdsterr.num_warnings ? exit(jsrcerr.num_warnings + jdsterr.num_warnings ?
EXIT_WARNING : EXIT_SUCCESS); EXIT_WARNING : EXIT_SUCCESS);
return 0; /* suppress no-return-value warnings */ return 0; /* suppress no-return-value warnings */


@@ -479,7 +479,7 @@ color_quantize(j_decompress_ptr cinfo, JSAMPARRAY input_buf,
for (col = width; col > 0; col--) { for (col = width; col > 0; col--) {
pixcode = 0; pixcode = 0;
for (ci = 0; ci < nc; ci++) { for (ci = 0; ci < nc; ci++) {
pixcode += GETJSAMPLE(colorindex[ci][GETJSAMPLE(*ptrin++)]); pixcode += colorindex[ci][*ptrin++];
} }
*ptrout++ = (JSAMPLE)pixcode; *ptrout++ = (JSAMPLE)pixcode;
} }
@@ -506,9 +506,9 @@ color_quantize3(j_decompress_ptr cinfo, JSAMPARRAY input_buf,
ptrin = input_buf[row]; ptrin = input_buf[row];
ptrout = output_buf[row]; ptrout = output_buf[row];
for (col = width; col > 0; col--) { for (col = width; col > 0; col--) {
pixcode = GETJSAMPLE(colorindex0[GETJSAMPLE(*ptrin++)]); pixcode = colorindex0[*ptrin++];
pixcode += GETJSAMPLE(colorindex1[GETJSAMPLE(*ptrin++)]); pixcode += colorindex1[*ptrin++];
pixcode += GETJSAMPLE(colorindex2[GETJSAMPLE(*ptrin++)]); pixcode += colorindex2[*ptrin++];
*ptrout++ = (JSAMPLE)pixcode; *ptrout++ = (JSAMPLE)pixcode;
} }
} }
@@ -552,7 +552,7 @@ quantize_ord_dither(j_decompress_ptr cinfo, JSAMPARRAY input_buf,
* required amount of padding. * required amount of padding.
*/ */
*output_ptr += *output_ptr +=
colorindex_ci[GETJSAMPLE(*input_ptr) + dither[col_index]]; colorindex_ci[*input_ptr + dither[col_index]];
input_ptr += nc; input_ptr += nc;
output_ptr++; output_ptr++;
col_index = (col_index + 1) & ODITHER_MASK; col_index = (col_index + 1) & ODITHER_MASK;
@@ -595,12 +595,9 @@ quantize3_ord_dither(j_decompress_ptr cinfo, JSAMPARRAY input_buf,
col_index = 0; col_index = 0;
for (col = width; col > 0; col--) { for (col = width; col > 0; col--) {
pixcode = pixcode = colorindex0[(*input_ptr++) + dither0[col_index]];
GETJSAMPLE(colorindex0[GETJSAMPLE(*input_ptr++) + dither0[col_index]]); pixcode += colorindex1[(*input_ptr++) + dither1[col_index]];
pixcode += pixcode += colorindex2[(*input_ptr++) + dither2[col_index]];
GETJSAMPLE(colorindex1[GETJSAMPLE(*input_ptr++) + dither1[col_index]]);
pixcode +=
GETJSAMPLE(colorindex2[GETJSAMPLE(*input_ptr++) + dither2[col_index]]);
*output_ptr++ = (JSAMPLE)pixcode; *output_ptr++ = (JSAMPLE)pixcode;
col_index = (col_index + 1) & ODITHER_MASK; col_index = (col_index + 1) & ODITHER_MASK;
} }
@@ -677,15 +674,15 @@ quantize_fs_dither(j_decompress_ptr cinfo, JSAMPARRAY input_buf,
* The maximum error is +- MAXJSAMPLE; this sets the required size * The maximum error is +- MAXJSAMPLE; this sets the required size
* of the range_limit array. * of the range_limit array.
*/ */
cur += GETJSAMPLE(*input_ptr); cur += *input_ptr;
cur = GETJSAMPLE(range_limit[cur]); cur = range_limit[cur];
/* Select output value, accumulate into output code for this pixel */ /* Select output value, accumulate into output code for this pixel */
pixcode = GETJSAMPLE(colorindex_ci[cur]); pixcode = colorindex_ci[cur];
*output_ptr += (JSAMPLE)pixcode; *output_ptr += (JSAMPLE)pixcode;
/* Compute actual representation error at this pixel */ /* Compute actual representation error at this pixel */
/* Note: we can do this even though we don't have the final */ /* Note: we can do this even though we don't have the final */
/* pixel code, because the colormap is orthogonal. */ /* pixel code, because the colormap is orthogonal. */
cur -= GETJSAMPLE(colormap_ci[pixcode]); cur -= colormap_ci[pixcode];
/* Compute error fractions to be propagated to adjacent pixels. /* Compute error fractions to be propagated to adjacent pixels.
* Add these into the running sums, and simultaneously shift the * Add these into the running sums, and simultaneously shift the
* next-line error sums left by 1 column. * next-line error sums left by 1 column.


@@ -215,9 +215,9 @@ prescan_quantize(j_decompress_ptr cinfo, JSAMPARRAY input_buf,
ptr = input_buf[row]; ptr = input_buf[row];
for (col = width; col > 0; col--) { for (col = width; col > 0; col--) {
/* get pixel value and index into the histogram */ /* get pixel value and index into the histogram */
histp = &histogram[GETJSAMPLE(ptr[0]) >> C0_SHIFT] histp = &histogram[ptr[0] >> C0_SHIFT]
[GETJSAMPLE(ptr[1]) >> C1_SHIFT] [ptr[1] >> C1_SHIFT]
[GETJSAMPLE(ptr[2]) >> C2_SHIFT]; [ptr[2] >> C2_SHIFT];
/* increment, check for overflow and undo increment if so. */ /* increment, check for overflow and undo increment if so. */
if (++(*histp) <= 0) if (++(*histp) <= 0)
(*histp)--; (*histp)--;
@@ -665,7 +665,7 @@ find_nearby_colors(j_decompress_ptr cinfo, int minc0, int minc1, int minc2,
for (i = 0; i < numcolors; i++) { for (i = 0; i < numcolors; i++) {
/* We compute the squared-c0-distance term, then add in the other two. */ /* We compute the squared-c0-distance term, then add in the other two. */
x = GETJSAMPLE(cinfo->colormap[0][i]); x = cinfo->colormap[0][i];
if (x < minc0) { if (x < minc0) {
tdist = (x - minc0) * C0_SCALE; tdist = (x - minc0) * C0_SCALE;
min_dist = tdist * tdist; min_dist = tdist * tdist;
@@ -688,7 +688,7 @@ find_nearby_colors(j_decompress_ptr cinfo, int minc0, int minc1, int minc2,
} }
} }
x = GETJSAMPLE(cinfo->colormap[1][i]); x = cinfo->colormap[1][i];
if (x < minc1) { if (x < minc1) {
tdist = (x - minc1) * C1_SCALE; tdist = (x - minc1) * C1_SCALE;
min_dist += tdist * tdist; min_dist += tdist * tdist;
@@ -710,7 +710,7 @@ find_nearby_colors(j_decompress_ptr cinfo, int minc0, int minc1, int minc2,
} }
} }
x = GETJSAMPLE(cinfo->colormap[2][i]); x = cinfo->colormap[2][i];
if (x < minc2) { if (x < minc2) {
tdist = (x - minc2) * C2_SCALE; tdist = (x - minc2) * C2_SCALE;
min_dist += tdist * tdist; min_dist += tdist * tdist;
@@ -788,13 +788,13 @@ find_best_colors(j_decompress_ptr cinfo, int minc0, int minc1, int minc2,
#define STEP_C2 ((1 << C2_SHIFT) * C2_SCALE) #define STEP_C2 ((1 << C2_SHIFT) * C2_SCALE)
for (i = 0; i < numcolors; i++) { for (i = 0; i < numcolors; i++) {
icolor = GETJSAMPLE(colorlist[i]); icolor = colorlist[i];
/* Compute (square of) distance from minc0/c1/c2 to this color */ /* Compute (square of) distance from minc0/c1/c2 to this color */
inc0 = (minc0 - GETJSAMPLE(cinfo->colormap[0][icolor])) * C0_SCALE; inc0 = (minc0 - cinfo->colormap[0][icolor]) * C0_SCALE;
dist0 = inc0 * inc0; dist0 = inc0 * inc0;
inc1 = (minc1 - GETJSAMPLE(cinfo->colormap[1][icolor])) * C1_SCALE; inc1 = (minc1 - cinfo->colormap[1][icolor]) * C1_SCALE;
dist0 += inc1 * inc1; dist0 += inc1 * inc1;
inc2 = (minc2 - GETJSAMPLE(cinfo->colormap[2][icolor])) * C2_SCALE; inc2 = (minc2 - cinfo->colormap[2][icolor]) * C2_SCALE;
dist0 += inc2 * inc2; dist0 += inc2 * inc2;
/* Form the initial difference increments */ /* Form the initial difference increments */
inc0 = inc0 * (2 * STEP_C0) + STEP_C0 * STEP_C0; inc0 = inc0 * (2 * STEP_C0) + STEP_C0 * STEP_C0;
@@ -879,7 +879,7 @@ fill_inverse_cmap(j_decompress_ptr cinfo, int c0, int c1, int c2)
for (ic1 = 0; ic1 < BOX_C1_ELEMS; ic1++) { for (ic1 = 0; ic1 < BOX_C1_ELEMS; ic1++) {
cachep = &histogram[c0 + ic0][c1 + ic1][c2]; cachep = &histogram[c0 + ic0][c1 + ic1][c2];
for (ic2 = 0; ic2 < BOX_C2_ELEMS; ic2++) { for (ic2 = 0; ic2 < BOX_C2_ELEMS; ic2++) {
*cachep++ = (histcell)(GETJSAMPLE(*cptr++) + 1); *cachep++ = (histcell)((*cptr++) + 1);
} }
} }
} }
@@ -909,9 +909,9 @@ pass2_no_dither(j_decompress_ptr cinfo, JSAMPARRAY input_buf,
outptr = output_buf[row]; outptr = output_buf[row];
for (col = width; col > 0; col--) { for (col = width; col > 0; col--) {
/* get pixel value and index into the cache */ /* get pixel value and index into the cache */
c0 = GETJSAMPLE(*inptr++) >> C0_SHIFT; c0 = (*inptr++) >> C0_SHIFT;
c1 = GETJSAMPLE(*inptr++) >> C1_SHIFT; c1 = (*inptr++) >> C1_SHIFT;
c2 = GETJSAMPLE(*inptr++) >> C2_SHIFT; c2 = (*inptr++) >> C2_SHIFT;
cachep = &histogram[c0][c1][c2]; cachep = &histogram[c0][c1][c2];
/* If we have not seen this color before, find nearest colormap entry */ /* If we have not seen this color before, find nearest colormap entry */
/* and update the cache */ /* and update the cache */
@@ -996,12 +996,12 @@ pass2_fs_dither(j_decompress_ptr cinfo, JSAMPARRAY input_buf,
* The maximum error is +- MAXJSAMPLE (or less with error limiting); * The maximum error is +- MAXJSAMPLE (or less with error limiting);
* this sets the required size of the range_limit array. * this sets the required size of the range_limit array.
*/ */
cur0 += GETJSAMPLE(inptr[0]); cur0 += inptr[0];
cur1 += GETJSAMPLE(inptr[1]); cur1 += inptr[1];
cur2 += GETJSAMPLE(inptr[2]); cur2 += inptr[2];
cur0 = GETJSAMPLE(range_limit[cur0]); cur0 = range_limit[cur0];
cur1 = GETJSAMPLE(range_limit[cur1]); cur1 = range_limit[cur1];
cur2 = GETJSAMPLE(range_limit[cur2]); cur2 = range_limit[cur2];
/* Index into the cache with adjusted pixel value */ /* Index into the cache with adjusted pixel value */
cachep = cachep =
&histogram[cur0 >> C0_SHIFT][cur1 >> C1_SHIFT][cur2 >> C2_SHIFT]; &histogram[cur0 >> C0_SHIFT][cur1 >> C1_SHIFT][cur2 >> C2_SHIFT];
@@ -1015,9 +1015,9 @@ pass2_fs_dither(j_decompress_ptr cinfo, JSAMPARRAY input_buf,
register int pixcode = *cachep - 1; register int pixcode = *cachep - 1;
*outptr = (JSAMPLE)pixcode; *outptr = (JSAMPLE)pixcode;
/* Compute representation error for this pixel */ /* Compute representation error for this pixel */
cur0 -= GETJSAMPLE(colormap0[pixcode]); cur0 -= colormap0[pixcode];
cur1 -= GETJSAMPLE(colormap1[pixcode]); cur1 -= colormap1[pixcode];
cur2 -= GETJSAMPLE(colormap2[pixcode]); cur2 -= colormap2[pixcode];
} }
/* Compute error fractions to be propagated to adjacent pixels. /* Compute error fractions to be propagated to adjacent pixels.
* Add these into the running sums, and simultaneously shift the * Add these into the running sums, and simultaneously shift the


@@ -4,6 +4,7 @@
* Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
* Copyright (C) 2011, 2014, D. R. Commander. * Copyright (C) 2011, 2014, D. R. Commander.
* Copyright (C) 2015-2016, 2018, Matthieu Darbois. * Copyright (C) 2015-2016, 2018, Matthieu Darbois.
* Copyright (C) 2020, Arm Limited.
* *
* Based on the x86 SIMD extension for IJG JPEG library, * Based on the x86 SIMD extension for IJG JPEG library,
* Copyright (C) 1999-2006, MIYASAKA Masaru. * Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -75,6 +76,7 @@ EXTERN(void) jsimd_int_upsample(j_decompress_ptr cinfo,
EXTERN(int) jsimd_can_h2v2_fancy_upsample(void); EXTERN(int) jsimd_can_h2v2_fancy_upsample(void);
EXTERN(int) jsimd_can_h2v1_fancy_upsample(void); EXTERN(int) jsimd_can_h2v1_fancy_upsample(void);
EXTERN(int) jsimd_can_h1v2_fancy_upsample(void);
EXTERN(void) jsimd_h2v2_fancy_upsample(j_decompress_ptr cinfo, EXTERN(void) jsimd_h2v2_fancy_upsample(j_decompress_ptr cinfo,
jpeg_component_info *compptr, jpeg_component_info *compptr,
@@ -84,6 +86,10 @@ EXTERN(void) jsimd_h2v1_fancy_upsample(j_decompress_ptr cinfo,
jpeg_component_info *compptr, jpeg_component_info *compptr,
JSAMPARRAY input_data, JSAMPARRAY input_data,
JSAMPARRAY *output_data_ptr); JSAMPARRAY *output_data_ptr);
EXTERN(void) jsimd_h1v2_fancy_upsample(j_decompress_ptr cinfo,
jpeg_component_info *compptr,
JSAMPARRAY input_data,
JSAMPARRAY *output_data_ptr);
EXTERN(int) jsimd_can_h2v2_merged_upsample(void); EXTERN(int) jsimd_can_h2v2_merged_upsample(void);
EXTERN(int) jsimd_can_h2v1_merged_upsample(void); EXTERN(int) jsimd_can_h2v1_merged_upsample(void);


@@ -4,6 +4,7 @@
* Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
* Copyright (C) 2009-2011, 2014, D. R. Commander. * Copyright (C) 2009-2011, 2014, D. R. Commander.
* Copyright (C) 2015-2016, 2018, Matthieu Darbois. * Copyright (C) 2015-2016, 2018, Matthieu Darbois.
* Copyright (C) 2020, Arm Limited.
* *
* Based on the x86 SIMD extension for IJG JPEG library, * Based on the x86 SIMD extension for IJG JPEG library,
* Copyright (C) 1999-2006, MIYASAKA Masaru. * Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -169,6 +170,12 @@ jsimd_can_h2v1_fancy_upsample(void)
return 0; return 0;
} }
GLOBAL(int)
jsimd_can_h1v2_fancy_upsample(void)
{
return 0;
}
GLOBAL(void) GLOBAL(void)
jsimd_h2v2_fancy_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr, jsimd_h2v2_fancy_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr,
JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr) JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr)
@@ -181,6 +188,12 @@ jsimd_h2v1_fancy_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr,
{ {
} }
GLOBAL(void)
jsimd_h1v2_fancy_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr,
JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr)
{
}
GLOBAL(int) GLOBAL(int)
jsimd_can_h2v2_merged_upsample(void) jsimd_can_h2v2_merged_upsample(void)
{ {


@@ -2,9 +2,9 @@
* jversion.h * jversion.h
* *
* This file was part of the Independent JPEG Group's software: * This file was part of the Independent JPEG Group's software:
* Copyright (C) 1991-2012, Thomas G. Lane, Guido Vollbeding. * Copyright (C) 1991-2020, Thomas G. Lane, Guido Vollbeding.
* libjpeg-turbo Modifications: * libjpeg-turbo Modifications:
* Copyright (C) 2010, 2012-2020, D. R. Commander. * Copyright (C) 2010, 2012-2021, D. R. Commander.
* mozjpeg Modifications: * mozjpeg Modifications:
* Copyright (C) 2014, Mozilla Corporation. * Copyright (C) 2014, Mozilla Corporation.
* For conditions of distribution and use, see the accompanying README file. * For conditions of distribution and use, see the accompanying README file.
@@ -38,9 +38,9 @@
*/ */
#define JCOPYRIGHT \ #define JCOPYRIGHT \
"Copyright (C) 2009-2020 D. R. Commander\n" \ "Copyright (C) 2009-2021 D. R. Commander\n" \
"Copyright (C) 2015, 2020 Google, Inc.\n" \ "Copyright (C) 2015, 2020 Google, Inc.\n" \
"Copyright (C) 2019 Arm Limited\n" \ "Copyright (C) 2019-2020 Arm Limited\n" \
"Copyright (C) 2015-2016, 2018 Matthieu Darbois\n" \ "Copyright (C) 2015-2016, 2018 Matthieu Darbois\n" \
"Copyright (C) 2011-2016 Siarhei Siamashka\n" \ "Copyright (C) 2011-2016 Siarhei Siamashka\n" \
"Copyright (C) 2015 Intel Corporation\n" \ "Copyright (C) 2015 Intel Corporation\n" \
@@ -49,7 +49,7 @@
"Copyright (C) 2009, 2012 Pierre Ossman for Cendio AB\n" \ "Copyright (C) 2009, 2012 Pierre Ossman for Cendio AB\n" \
"Copyright (C) 2009-2011 Nokia Corporation and/or its subsidiary(-ies)\n" \ "Copyright (C) 2009-2011 Nokia Corporation and/or its subsidiary(-ies)\n" \
"Copyright (C) 1999-2006 MIYASAKA Masaru\n" \ "Copyright (C) 1999-2006 MIYASAKA Masaru\n" \
"Copyright (C) 1991-2017 Thomas G. Lane, Guido Vollbeding" "Copyright (C) 1991-2020 Thomas G. Lane, Guido Vollbeding"
#define JCOPYRIGHT_SHORT \ #define JCOPYRIGHT_SHORT \
"Copyright (C) 1991-2020 The libjpeg-turbo Project and many others" "Copyright (C) 1991-2021 The libjpeg-turbo Project and many others"

rdbmp.c

@@ -12,7 +12,7 @@
* *
* This file contains routines to read input images in Microsoft "BMP" * This file contains routines to read input images in Microsoft "BMP"
* format (MS Windows 3.x, OS/2 1.x, and OS/2 2.x flavors). * format (MS Windows 3.x, OS/2 1.x, and OS/2 2.x flavors).
* Currently, only 8-bit and 24-bit images are supported, not 1-bit or * Currently, only 8-, 24-, and 32-bit images are supported, not 1-bit or
* 4-bit (feeding such low-depth images into JPEG would be silly anyway). * 4-bit (feeding such low-depth images into JPEG would be silly anyway).
* Also, we don't support RLE-compressed files. * Also, we don't support RLE-compressed files.
* *
@@ -34,18 +34,8 @@
/* Macros to deal with unsigned chars as efficiently as compiler allows */ /* Macros to deal with unsigned chars as efficiently as compiler allows */
#ifdef HAVE_UNSIGNED_CHAR
typedef unsigned char U_CHAR; typedef unsigned char U_CHAR;
#define UCH(x) ((int)(x)) #define UCH(x) ((int)(x))
#else /* !HAVE_UNSIGNED_CHAR */
#ifdef __CHAR_UNSIGNED__
typedef char U_CHAR;
#define UCH(x) ((int)(x))
#else
typedef char U_CHAR;
#define UCH(x) ((int)(x) & 0xFF)
#endif
#endif /* HAVE_UNSIGNED_CHAR */
#define ReadOK(file, buffer, len) \ #define ReadOK(file, buffer, len) \
@@ -71,7 +61,7 @@ typedef struct _bmp_source_struct {
JDIMENSION source_row; /* Current source row number */ JDIMENSION source_row; /* Current source row number */
JDIMENSION row_width; /* Physical width of scanlines in file */ JDIMENSION row_width; /* Physical width of scanlines in file */
int bits_per_pixel; /* remembers 8- or 24-bit format */ int bits_per_pixel; /* remembers 8-, 24-, or 32-bit format */
int cmap_length; /* colormap length */ int cmap_length; /* colormap length */
boolean use_inversion_array; /* TRUE = preload the whole image, which is boolean use_inversion_array; /* TRUE = preload the whole image, which is
@@ -179,14 +169,14 @@ get_8bit_row(j_compress_ptr cinfo, cjpeg_source_ptr sinfo)
outptr = source->pub.buffer[0]; outptr = source->pub.buffer[0];
if (cinfo->in_color_space == JCS_GRAYSCALE) { if (cinfo->in_color_space == JCS_GRAYSCALE) {
for (col = cinfo->image_width; col > 0; col--) { for (col = cinfo->image_width; col > 0; col--) {
t = GETJSAMPLE(*inptr++); t = *inptr++;
if (t >= cmaplen) if (t >= cmaplen)
ERREXIT(cinfo, JERR_BMP_OUTOFRANGE); ERREXIT(cinfo, JERR_BMP_OUTOFRANGE);
*outptr++ = colormap[0][t]; *outptr++ = colormap[0][t];
} }
} else if (cinfo->in_color_space == JCS_CMYK) { } else if (cinfo->in_color_space == JCS_CMYK) {
for (col = cinfo->image_width; col > 0; col--) { for (col = cinfo->image_width; col > 0; col--) {
t = GETJSAMPLE(*inptr++); t = *inptr++;
if (t >= cmaplen) if (t >= cmaplen)
ERREXIT(cinfo, JERR_BMP_OUTOFRANGE); ERREXIT(cinfo, JERR_BMP_OUTOFRANGE);
rgb_to_cmyk(colormap[0][t], colormap[1][t], colormap[2][t], outptr, rgb_to_cmyk(colormap[0][t], colormap[1][t], colormap[2][t], outptr,
@@ -202,7 +192,7 @@ get_8bit_row(j_compress_ptr cinfo, cjpeg_source_ptr sinfo)
if (aindex >= 0) { if (aindex >= 0) {
for (col = cinfo->image_width; col > 0; col--) { for (col = cinfo->image_width; col > 0; col--) {
t = GETJSAMPLE(*inptr++); t = *inptr++;
if (t >= cmaplen) if (t >= cmaplen)
ERREXIT(cinfo, JERR_BMP_OUTOFRANGE); ERREXIT(cinfo, JERR_BMP_OUTOFRANGE);
outptr[rindex] = colormap[0][t]; outptr[rindex] = colormap[0][t];
@@ -213,7 +203,7 @@ get_8bit_row(j_compress_ptr cinfo, cjpeg_source_ptr sinfo)
} }
} else { } else {
for (col = cinfo->image_width; col > 0; col--) { for (col = cinfo->image_width; col > 0; col--) {
t = GETJSAMPLE(*inptr++); t = *inptr++;
if (t >= cmaplen) if (t >= cmaplen)
ERREXIT(cinfo, JERR_BMP_OUTOFRANGE); ERREXIT(cinfo, JERR_BMP_OUTOFRANGE);
outptr[rindex] = colormap[0][t]; outptr[rindex] = colormap[0][t];
@@ -258,7 +248,6 @@ get_24bit_row(j_compress_ptr cinfo, cjpeg_source_ptr sinfo)
MEMCOPY(outptr, inptr, source->row_width); MEMCOPY(outptr, inptr, source->row_width);
} else if (cinfo->in_color_space == JCS_CMYK) { } else if (cinfo->in_color_space == JCS_CMYK) {
for (col = cinfo->image_width; col > 0; col--) { for (col = cinfo->image_width; col > 0; col--) {
/* can omit GETJSAMPLE() safely */
JSAMPLE b = *inptr++, g = *inptr++, r = *inptr++; JSAMPLE b = *inptr++, g = *inptr++, r = *inptr++;
rgb_to_cmyk(r, g, b, outptr, outptr + 1, outptr + 2, outptr + 3); rgb_to_cmyk(r, g, b, outptr, outptr + 1, outptr + 2, outptr + 3);
outptr += 4; outptr += 4;
@@ -272,7 +261,7 @@ get_24bit_row(j_compress_ptr cinfo, cjpeg_source_ptr sinfo)
if (aindex >= 0) { if (aindex >= 0) {
for (col = cinfo->image_width; col > 0; col--) { for (col = cinfo->image_width; col > 0; col--) {
outptr[bindex] = *inptr++; /* can omit GETJSAMPLE() safely */ outptr[bindex] = *inptr++;
outptr[gindex] = *inptr++; outptr[gindex] = *inptr++;
outptr[rindex] = *inptr++; outptr[rindex] = *inptr++;
outptr[aindex] = 0xFF; outptr[aindex] = 0xFF;
@@ -280,7 +269,7 @@ get_24bit_row(j_compress_ptr cinfo, cjpeg_source_ptr sinfo)
} }
} else { } else {
for (col = cinfo->image_width; col > 0; col--) { for (col = cinfo->image_width; col > 0; col--) {
outptr[bindex] = *inptr++; /* can omit GETJSAMPLE() safely */ outptr[bindex] = *inptr++;
outptr[gindex] = *inptr++; outptr[gindex] = *inptr++;
outptr[rindex] = *inptr++; outptr[rindex] = *inptr++;
outptr += ps; outptr += ps;
@@ -323,7 +312,6 @@ get_32bit_row(j_compress_ptr cinfo, cjpeg_source_ptr sinfo)
MEMCOPY(outptr, inptr, source->row_width); MEMCOPY(outptr, inptr, source->row_width);
} else if (cinfo->in_color_space == JCS_CMYK) { } else if (cinfo->in_color_space == JCS_CMYK) {
for (col = cinfo->image_width; col > 0; col--) { for (col = cinfo->image_width; col > 0; col--) {
/* can omit GETJSAMPLE() safely */
JSAMPLE b = *inptr++, g = *inptr++, r = *inptr++; JSAMPLE b = *inptr++, g = *inptr++, r = *inptr++;
rgb_to_cmyk(r, g, b, outptr, outptr + 1, outptr + 2, outptr + 3); rgb_to_cmyk(r, g, b, outptr, outptr + 1, outptr + 2, outptr + 3);
inptr++; /* skip the 4th byte (Alpha channel) */ inptr++; /* skip the 4th byte (Alpha channel) */
@@ -338,7 +326,7 @@ get_32bit_row(j_compress_ptr cinfo, cjpeg_source_ptr sinfo)
if (aindex >= 0) { if (aindex >= 0) {
for (col = cinfo->image_width; col > 0; col--) { for (col = cinfo->image_width; col > 0; col--) {
outptr[bindex] = *inptr++; /* can omit GETJSAMPLE() safely */ outptr[bindex] = *inptr++;
outptr[gindex] = *inptr++; outptr[gindex] = *inptr++;
outptr[rindex] = *inptr++; outptr[rindex] = *inptr++;
outptr[aindex] = *inptr++; outptr[aindex] = *inptr++;
@@ -346,7 +334,7 @@ get_32bit_row(j_compress_ptr cinfo, cjpeg_source_ptr sinfo)
} }
} else { } else {
for (col = cinfo->image_width; col > 0; col--) { for (col = cinfo->image_width; col > 0; col--) {
outptr[bindex] = *inptr++; /* can omit GETJSAMPLE() safely */ outptr[bindex] = *inptr++;
outptr[gindex] = *inptr++; outptr[gindex] = *inptr++;
outptr[rindex] = *inptr++; outptr[rindex] = *inptr++;
inptr++; /* skip the 4th byte (Alpha channel) */ inptr++; /* skip the 4th byte (Alpha channel) */
@@ -481,7 +469,9 @@ start_input_bmp(j_compress_ptr cinfo, cjpeg_source_ptr sinfo)
TRACEMS2(cinfo, 1, JTRC_BMP_OS2_MAPPED, biWidth, biHeight); TRACEMS2(cinfo, 1, JTRC_BMP_OS2_MAPPED, biWidth, biHeight);
break; break;
case 24: /* RGB image */ case 24: /* RGB image */
TRACEMS2(cinfo, 1, JTRC_BMP_OS2, biWidth, biHeight); case 32: /* RGB image + Alpha channel */
TRACEMS3(cinfo, 1, JTRC_BMP_OS2, biWidth, biHeight,
source->bits_per_pixel);
break; break;
default: default:
ERREXIT(cinfo, JERR_BMP_BADDEPTH); ERREXIT(cinfo, JERR_BMP_BADDEPTH);
@@ -508,10 +498,8 @@ start_input_bmp(j_compress_ptr cinfo, cjpeg_source_ptr sinfo)
TRACEMS2(cinfo, 1, JTRC_BMP_MAPPED, biWidth, biHeight); TRACEMS2(cinfo, 1, JTRC_BMP_MAPPED, biWidth, biHeight);
break; break;
case 24: /* RGB image */ case 24: /* RGB image */
TRACEMS2(cinfo, 1, JTRC_BMP, biWidth, biHeight);
break;
case 32: /* RGB image + Alpha channel */ case 32: /* RGB image + Alpha channel */
TRACEMS2(cinfo, 1, JTRC_BMP, biWidth, biHeight); TRACEMS3(cinfo, 1, JTRC_BMP, biWidth, biHeight, source->bits_per_pixel);
break; break;
default: default:
ERREXIT(cinfo, JERR_BMP_BADDEPTH); ERREXIT(cinfo, JERR_BMP_BADDEPTH);


@@ -54,9 +54,8 @@ add_map_entry(j_decompress_ptr cinfo, int R, int G, int B)
/* Check for duplicate color. */ /* Check for duplicate color. */
for (index = 0; index < ncolors; index++) { for (index = 0; index < ncolors; index++) {
if (GETJSAMPLE(colormap0[index]) == R && if (colormap0[index] == R && colormap1[index] == G &&
GETJSAMPLE(colormap1[index]) == G && colormap2[index] == B)
GETJSAMPLE(colormap2[index]) == B)
return; /* color is already in map */ return; /* color is already in map */
} }

rdgif.c

@@ -1,29 +1,663 @@
/* /*
* rdgif.c * rdgif.c
* *
* This file was part of the Independent JPEG Group's software:
* Copyright (C) 1991-1997, Thomas G. Lane. * Copyright (C) 1991-1997, Thomas G. Lane.
* This file is part of the Independent JPEG Group's software. * Modified 2019 by Guido Vollbeding.
* libjpeg-turbo Modifications:
* Copyright (C) 2021, D. R. Commander.
* For conditions of distribution and use, see the accompanying README.ijg * For conditions of distribution and use, see the accompanying README.ijg
* file. * file.
* *
* This file contains routines to read input images in GIF format. * This file contains routines to read input images in GIF format.
* *
***************************************************************************** * These routines may need modification for non-Unix environments or
* NOTE: to avoid entanglements with Unisys' patent on LZW compression, * * specialized applications. As they stand, they assume input from
* the ability to read GIF files has been removed from the IJG distribution. * * an ordinary stdio stream. They further assume that reading begins
* Sorry about that. * * at the start of the file; start_input may need work if the
***************************************************************************** * user interface has already read some data (e.g., to determine that
* * the file is indeed GIF format).
* We are required to state that */
* "The Graphics Interchange Format(c) is the Copyright property of
* CompuServe Incorporated. GIF(sm) is a Service Mark property of /*
* CompuServe Incorporated." * This code is loosely based on giftoppm from the PBMPLUS distribution
* of Feb. 1991. That file contains the following copyright notice:
* +-------------------------------------------------------------------+
* | Copyright 1990, David Koblas. |
* | Permission to use, copy, modify, and distribute this software |
* | and its documentation for any purpose and without fee is hereby |
* | granted, provided that the above copyright notice appear in all |
* | copies and that both that copyright notice and this permission |
* | notice appear in supporting documentation. This software is |
* | provided "as is" without express or implied warranty. |
* +-------------------------------------------------------------------+
*/ */
#include "cdjpeg.h" /* Common decls for cjpeg/djpeg applications */ #include "cdjpeg.h" /* Common decls for cjpeg/djpeg applications */
#ifdef GIF_SUPPORTED #ifdef GIF_SUPPORTED
/* Macros to deal with unsigned chars as efficiently as compiler allows */
typedef unsigned char U_CHAR;
#define UCH(x) ((int)(x))
#define ReadOK(file, buffer, len) \
(JFREAD(file, buffer, len) == ((size_t)(len)))
#define MAXCOLORMAPSIZE 256 /* max # of colors in a GIF colormap */
#define NUMCOLORS 3 /* # of colors */
#define CM_RED 0 /* color component numbers */
#define CM_GREEN 1
#define CM_BLUE 2
#define MAX_LZW_BITS 12 /* maximum LZW code size */
#define LZW_TABLE_SIZE (1 << MAX_LZW_BITS) /* # of possible LZW symbols */
/* Macros for extracting header data --- note we assume chars may be signed */
#define LM_to_uint(array, offset) \
((unsigned int)UCH(array[offset]) + \
(((unsigned int)UCH(array[offset + 1])) << 8))
#define BitSet(byte, bit) ((byte) & (bit))
#define INTERLACE 0x40 /* mask for bit signifying interlaced image */
#define COLORMAPFLAG 0x80 /* mask for bit signifying colormap presence */
/*
* LZW decompression tables look like this:
* symbol_head[K] = prefix symbol of any LZW symbol K (0..LZW_TABLE_SIZE-1)
* symbol_tail[K] = suffix byte of any LZW symbol K (0..LZW_TABLE_SIZE-1)
* Note that entries 0..end_code of the above tables are not used,
* since those symbols represent raw bytes or special codes.
*
* The stack represents the not-yet-used expansion of the last LZW symbol.
* In the worst case, a symbol could expand to as many bytes as there are
* LZW symbols, so we allocate LZW_TABLE_SIZE bytes for the stack.
* (This is conservative since that number includes the raw-byte symbols.)
*/
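/*
 * Worked example (illustration only, assumed values): with
 * input_code_size = 2 we get clear_code = 4, end_code = 5, and the first
 * free table slot 6.  If the stream has defined symbol 6 with
 * symbol_head[6] = 1 and symbol_tail[6] = 3, then reading symbol 6
 * pushes the tail byte 3 onto symbol_stack, follows symbol_head[6] back
 * to the raw byte 1, and LZWReadByte() returns 1 immediately and 3 on
 * the next call (popped from the stack).
 */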
/* Private version of data source object */
typedef struct {
struct cjpeg_source_struct pub; /* public fields */
j_compress_ptr cinfo; /* back link saves passing separate parm */
JSAMPARRAY colormap; /* GIF colormap (converted to my format) */
/* State for GetCode and LZWReadByte */
U_CHAR code_buf[256 + 4]; /* current input data block */
int last_byte; /* # of bytes in code_buf */
int last_bit; /* # of bits in code_buf */
int cur_bit; /* next bit index to read */
boolean first_time; /* flags first call to GetCode */
boolean out_of_blocks; /* TRUE if hit terminator data block */
int input_code_size; /* codesize given in GIF file */
int clear_code, end_code; /* values for Clear and End codes */
int code_size; /* current actual code size */
int limit_code; /* 2^code_size */
int max_code; /* first unused code value */
/* Private state for LZWReadByte */
int oldcode; /* previous LZW symbol */
int firstcode; /* first byte of oldcode's expansion */
/* LZW symbol table and expansion stack */
UINT16 *symbol_head; /* => table of prefix symbols */
UINT8 *symbol_tail; /* => table of suffix bytes */
UINT8 *symbol_stack; /* => stack for symbol expansions */
UINT8 *sp; /* stack pointer */
/* State for interlaced image processing */
boolean is_interlaced; /* TRUE if have interlaced image */
jvirt_sarray_ptr interlaced_image; /* full image in interlaced order */
JDIMENSION cur_row_number; /* need to know actual row number */
JDIMENSION pass2_offset; /* # of pixel rows in pass 1 */
JDIMENSION pass3_offset; /* # of pixel rows in passes 1&2 */
JDIMENSION pass4_offset; /* # of pixel rows in passes 1,2,3 */
} gif_source_struct;
typedef gif_source_struct *gif_source_ptr;
/* Forward declarations */
METHODDEF(JDIMENSION) get_pixel_rows(j_compress_ptr cinfo,
cjpeg_source_ptr sinfo);
METHODDEF(JDIMENSION) load_interlaced_image(j_compress_ptr cinfo,
cjpeg_source_ptr sinfo);
METHODDEF(JDIMENSION) get_interlaced_row(j_compress_ptr cinfo,
cjpeg_source_ptr sinfo);
LOCAL(int)
ReadByte(gif_source_ptr sinfo)
/* Read next byte from GIF file */
{
register FILE *infile = sinfo->pub.input_file;
register int c;
if ((c = getc(infile)) == EOF)
ERREXIT(sinfo->cinfo, JERR_INPUT_EOF);
return c;
}
LOCAL(int)
GetDataBlock(gif_source_ptr sinfo, U_CHAR *buf)
/* Read a GIF data block, which has a leading count byte */
/* A zero-length block marks the end of a data block sequence */
{
int count;
count = ReadByte(sinfo);
if (count > 0) {
if (!ReadOK(sinfo->pub.input_file, buf, count))
ERREXIT(sinfo->cinfo, JERR_INPUT_EOF);
}
return count;
}
LOCAL(void)
SkipDataBlocks(gif_source_ptr sinfo)
/* Skip a series of data blocks, until a block terminator is found */
{
U_CHAR buf[256];
while (GetDataBlock(sinfo, buf) > 0)
/* skip */;
}
LOCAL(void)
ReInitLZW(gif_source_ptr sinfo)
/* (Re)initialize LZW state; shared code for startup and Clear processing */
{
sinfo->code_size = sinfo->input_code_size + 1;
sinfo->limit_code = sinfo->clear_code << 1; /* 2^code_size */
sinfo->max_code = sinfo->clear_code + 2; /* first unused code value */
sinfo->sp = sinfo->symbol_stack; /* init stack to empty */
}
LOCAL(void)
InitLZWCode(gif_source_ptr sinfo)
/* Initialize for a series of LZWReadByte (and hence GetCode) calls */
{
/* GetCode initialization */
sinfo->last_byte = 2; /* make safe to "recopy last two bytes" */
sinfo->code_buf[0] = 0;
sinfo->code_buf[1] = 0;
sinfo->last_bit = 0; /* nothing in the buffer */
sinfo->cur_bit = 0; /* force buffer load on first call */
sinfo->first_time = TRUE;
sinfo->out_of_blocks = FALSE;
/* LZWReadByte initialization: */
/* compute special code values (note that these do not change later) */
sinfo->clear_code = 1 << sinfo->input_code_size;
sinfo->end_code = sinfo->clear_code + 1;
ReInitLZW(sinfo);
}
LOCAL(int)
GetCode(gif_source_ptr sinfo)
/* Fetch the next code_size bits from the GIF data */
/* We assume code_size is less than 16 */
{
register int accum;
int offs, count;
while (sinfo->cur_bit + sinfo->code_size > sinfo->last_bit) {
/* Time to reload the buffer */
/* First time, share code with Clear case */
if (sinfo->first_time) {
sinfo->first_time = FALSE;
return sinfo->clear_code;
}
if (sinfo->out_of_blocks) {
WARNMS(sinfo->cinfo, JWRN_GIF_NOMOREDATA);
return sinfo->end_code; /* fake something useful */
}
/* preserve last two bytes of what we have -- assume code_size <= 16 */
sinfo->code_buf[0] = sinfo->code_buf[sinfo->last_byte-2];
sinfo->code_buf[1] = sinfo->code_buf[sinfo->last_byte-1];
/* Load more bytes; set flag if we reach the terminator block */
if ((count = GetDataBlock(sinfo, &sinfo->code_buf[2])) == 0) {
sinfo->out_of_blocks = TRUE;
WARNMS(sinfo->cinfo, JWRN_GIF_NOMOREDATA);
return sinfo->end_code; /* fake something useful */
}
/* Reset counters */
sinfo->cur_bit = (sinfo->cur_bit - sinfo->last_bit) + 16;
sinfo->last_byte = 2 + count;
sinfo->last_bit = sinfo->last_byte * 8;
}
/* Form up next 24 bits in accum */
offs = sinfo->cur_bit >> 3; /* byte containing cur_bit */
accum = UCH(sinfo->code_buf[offs + 2]);
accum <<= 8;
accum |= UCH(sinfo->code_buf[offs + 1]);
accum <<= 8;
accum |= UCH(sinfo->code_buf[offs]);
/* Right-align cur_bit in accum, then mask off desired number of bits */
accum >>= (sinfo->cur_bit & 7);
sinfo->cur_bit += sinfo->code_size;
return accum & ((1 << sinfo->code_size) - 1);
}
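/* Worked example (illustrative): GIF packs codes least-significant-bit first.
 * Suppose code_size = 9, cur_bit = 22, and code_buf[2..4] = 0xA4 0x5F 0x83
 * (so offs = cur_bit >> 3 = 2).  Then
 *
 *   accum = (0x83 << 16) | (0x5F << 8) | 0xA4  = 0x835FA4
 *   accum >>= (22 & 7) = 6                    -> 0x20D7E
 *   accum & ((1 << 9) - 1)                    -> 0x17E
 *
 * i.e. the 9 bits starting at absolute bit 22 of the buffer, taken LSB-first.
 */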
LOCAL(int)
LZWReadByte(gif_source_ptr sinfo)
/* Read an LZW-compressed byte */
{
register int code; /* current working code */
int incode; /* saves actual input code */
/* If any codes are stacked from a previously read symbol, return them */
if (sinfo->sp > sinfo->symbol_stack)
return (int)(*(--sinfo->sp));
/* Time to read a new symbol */
code = GetCode(sinfo);
if (code == sinfo->clear_code) {
/* Reinit state, swallow any extra Clear codes, and */
/* return next code, which is expected to be a raw byte. */
ReInitLZW(sinfo);
do {
code = GetCode(sinfo);
} while (code == sinfo->clear_code);
if (code > sinfo->clear_code) { /* make sure it is a raw byte */
WARNMS(sinfo->cinfo, JWRN_GIF_BADDATA);
code = 0; /* use something valid */
}
/* make firstcode, oldcode valid! */
sinfo->firstcode = sinfo->oldcode = code;
return code;
}
if (code == sinfo->end_code) {
/* Skip the rest of the image, unless GetCode already read terminator */
if (!sinfo->out_of_blocks) {
SkipDataBlocks(sinfo);
sinfo->out_of_blocks = TRUE;
}
/* Complain that there's not enough data */
WARNMS(sinfo->cinfo, JWRN_GIF_ENDCODE);
/* Pad data with 0's */
return 0; /* fake something usable */
}
/* Got normal raw byte or LZW symbol */
incode = code; /* save for a moment */
if (code >= sinfo->max_code) { /* special case for not-yet-defined symbol */
/* code == max_code is OK; anything bigger is bad data */
if (code > sinfo->max_code) {
WARNMS(sinfo->cinfo, JWRN_GIF_BADDATA);
incode = 0; /* prevent creation of loops in symbol table */
}
/* this symbol will be defined as oldcode/firstcode */
*(sinfo->sp++) = (UINT8)sinfo->firstcode;
code = sinfo->oldcode;
}
/* If it's a symbol, expand it into the stack */
while (code >= sinfo->clear_code) {
*(sinfo->sp++) = sinfo->symbol_tail[code]; /* tail is a byte value */
code = sinfo->symbol_head[code]; /* head is another LZW symbol */
}
/* At this point code just represents a raw byte */
sinfo->firstcode = code; /* save for possible future use */
/* If there's room in table... */
if ((code = sinfo->max_code) < LZW_TABLE_SIZE) {
/* Define a new symbol = prev sym + head of this sym's expansion */
sinfo->symbol_head[code] = (UINT16)sinfo->oldcode;
sinfo->symbol_tail[code] = (UINT8)sinfo->firstcode;
sinfo->max_code++;
/* Is it time to increase code_size? */
if (sinfo->max_code >= sinfo->limit_code &&
sinfo->code_size < MAX_LZW_BITS) {
sinfo->code_size++;
sinfo->limit_code <<= 1; /* keep equal to 2^code_size */
}
}
sinfo->oldcode = incode; /* save last input symbol for future use */
return sinfo->firstcode; /* return first byte of symbol's expansion */
}
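/* Worked example (illustrative): if the previous symbol (oldcode) expands to
 * "AB" and the symbol just read expands to a string beginning with 'A', the
 * table entry created above has head = oldcode and tail = 'A', i.e. it
 * expands to "ABA".  The code == max_code branch is the classic LZW "KwKwK"
 * case: the stream may legally reference that entry one step before it is
 * defined, so the decoder synthesizes it from oldcode and firstcode.
 */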
LOCAL(void)
ReadColorMap(gif_source_ptr sinfo, int cmaplen, JSAMPARRAY cmap)
/* Read a GIF colormap */
{
int i;
for (i = 0; i < cmaplen; i++) {
#if BITS_IN_JSAMPLE == 8
#define UPSCALE(x) (x)
#else
#define UPSCALE(x) ((x) << (BITS_IN_JSAMPLE - 8))
#endif
cmap[CM_RED][i] = (JSAMPLE)UPSCALE(ReadByte(sinfo));
cmap[CM_GREEN][i] = (JSAMPLE)UPSCALE(ReadByte(sinfo));
cmap[CM_BLUE][i] = (JSAMPLE)UPSCALE(ReadByte(sinfo));
}
}
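/* Illustrative note: GIF colormap entries are always 8-bit, so when libjpeg
 * is built with BITS_IN_JSAMPLE == 12, UPSCALE() shifts each component left
 * by 4 bits (e.g. 255 becomes 4080) so that the full 12-bit sample range is
 * used.  With the usual 8-bit build, UPSCALE() is a no-op.
 */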
LOCAL(void)
DoExtension(gif_source_ptr sinfo)
/* Process an extension block */
/* Currently we ignore 'em all */
{
int extlabel;
/* Read extension label byte */
extlabel = ReadByte(sinfo);
TRACEMS1(sinfo->cinfo, 1, JTRC_GIF_EXTENSION, extlabel);
/* Skip the data block(s) associated with the extension */
SkipDataBlocks(sinfo);
}
/*
* Read the file header; return image size and component count.
*/
METHODDEF(void)
start_input_gif(j_compress_ptr cinfo, cjpeg_source_ptr sinfo)
{
gif_source_ptr source = (gif_source_ptr)sinfo;
U_CHAR hdrbuf[10]; /* workspace for reading control blocks */
unsigned int width, height; /* image dimensions */
int colormaplen, aspectRatio;
int c;
/* Read and verify GIF Header */
if (!ReadOK(source->pub.input_file, hdrbuf, 6))
ERREXIT(cinfo, JERR_GIF_NOT);
if (hdrbuf[0] != 'G' || hdrbuf[1] != 'I' || hdrbuf[2] != 'F')
ERREXIT(cinfo, JERR_GIF_NOT);
/* Check for expected version numbers.
* If unknown version, give warning and try to process anyway;
* this is per recommendation in GIF89a standard.
*/
if ((hdrbuf[3] != '8' || hdrbuf[4] != '7' || hdrbuf[5] != 'a') &&
(hdrbuf[3] != '8' || hdrbuf[4] != '9' || hdrbuf[5] != 'a'))
TRACEMS3(cinfo, 1, JTRC_GIF_BADVERSION, hdrbuf[3], hdrbuf[4], hdrbuf[5]);
/* Read and decipher Logical Screen Descriptor */
if (!ReadOK(source->pub.input_file, hdrbuf, 7))
ERREXIT(cinfo, JERR_INPUT_EOF);
width = LM_to_uint(hdrbuf, 0);
height = LM_to_uint(hdrbuf, 2);
if (width == 0 || height == 0)
ERREXIT(cinfo, JERR_GIF_EMPTY);
/* we ignore the color resolution, sort flag, and background color index */
aspectRatio = UCH(hdrbuf[6]);
if (aspectRatio != 0 && aspectRatio != 49)
TRACEMS(cinfo, 1, JTRC_GIF_NONSQUARE);
/* Allocate space to store the colormap */
source->colormap = (*cinfo->mem->alloc_sarray)
((j_common_ptr)cinfo, JPOOL_IMAGE, (JDIMENSION)MAXCOLORMAPSIZE,
(JDIMENSION)NUMCOLORS);
colormaplen = 0; /* indicate initialization */
/* Read global colormap if header indicates it is present */
if (BitSet(hdrbuf[4], COLORMAPFLAG)) {
colormaplen = 2 << (hdrbuf[4] & 0x07);
ReadColorMap(source, colormaplen, source->colormap);
}
/* Scan until we reach start of desired image.
* We don't currently support skipping images, but could add it easily.
*/
for (;;) {
c = ReadByte(source);
if (c == ';') /* GIF terminator?? */
ERREXIT(cinfo, JERR_GIF_IMAGENOTFOUND);
if (c == '!') { /* Extension */
DoExtension(source);
continue;
}
if (c != ',') { /* Not an image separator? */
WARNMS1(cinfo, JWRN_GIF_CHAR, c);
continue;
}
/* Read and decipher Local Image Descriptor */
if (!ReadOK(source->pub.input_file, hdrbuf, 9))
ERREXIT(cinfo, JERR_INPUT_EOF);
/* we ignore top/left position info, also sort flag */
width = LM_to_uint(hdrbuf, 4);
height = LM_to_uint(hdrbuf, 6);
if (width == 0 || height == 0)
ERREXIT(cinfo, JERR_GIF_EMPTY);
source->is_interlaced = (BitSet(hdrbuf[8], INTERLACE) != 0);
/* Read local colormap if header indicates it is present */
/* Note: if we wanted to support skipping images, */
/* we'd need to skip rather than read colormap for ignored images */
if (BitSet(hdrbuf[8], COLORMAPFLAG)) {
colormaplen = 2 << (hdrbuf[8] & 0x07);
ReadColorMap(source, colormaplen, source->colormap);
}
source->input_code_size = ReadByte(source); /* get min-code-size byte */
if (source->input_code_size < 2 || source->input_code_size > 8)
ERREXIT1(cinfo, JERR_GIF_CODESIZE, source->input_code_size);
/* Reached desired image, so break out of loop */
/* If we wanted to skip this image, */
/* we'd call SkipDataBlocks and then continue the loop */
break;
}
/* Prepare to read selected image: first initialize LZW decompressor */
source->symbol_head = (UINT16 *)
(*cinfo->mem->alloc_large) ((j_common_ptr)cinfo, JPOOL_IMAGE,
LZW_TABLE_SIZE * sizeof(UINT16));
source->symbol_tail = (UINT8 *)
(*cinfo->mem->alloc_large) ((j_common_ptr)cinfo, JPOOL_IMAGE,
LZW_TABLE_SIZE * sizeof(UINT8));
source->symbol_stack = (UINT8 *)
(*cinfo->mem->alloc_large) ((j_common_ptr)cinfo, JPOOL_IMAGE,
LZW_TABLE_SIZE * sizeof(UINT8));
InitLZWCode(source);
/*
* If image is interlaced, we read it into a full-size sample array,
* decompressing as we go; then get_interlaced_row selects rows from the
* sample array in the proper order.
*/
if (source->is_interlaced) {
/* We request the virtual array now, but can't access it until virtual
* arrays have been allocated. Hence, the actual work of reading the
* image is postponed until the first call to get_pixel_rows.
*/
source->interlaced_image = (*cinfo->mem->request_virt_sarray)
((j_common_ptr)cinfo, JPOOL_IMAGE, FALSE,
(JDIMENSION)width, (JDIMENSION)height, (JDIMENSION)1);
if (cinfo->progress != NULL) {
cd_progress_ptr progress = (cd_progress_ptr)cinfo->progress;
progress->total_extra_passes++; /* count file input as separate pass */
}
source->pub.get_pixel_rows = load_interlaced_image;
} else {
source->pub.get_pixel_rows = get_pixel_rows;
}
/* Create compressor input buffer. */
source->pub.buffer = (*cinfo->mem->alloc_sarray)
((j_common_ptr)cinfo, JPOOL_IMAGE, (JDIMENSION)width * NUMCOLORS,
(JDIMENSION)1);
source->pub.buffer_height = 1;
/* Pad colormap for safety. */
for (c = colormaplen; c < source->clear_code; c++) {
source->colormap[CM_RED][c] =
source->colormap[CM_GREEN][c] =
source->colormap[CM_BLUE][c] = CENTERJSAMPLE;
}
/* Return info about the image. */
cinfo->in_color_space = JCS_RGB;
cinfo->input_components = NUMCOLORS;
cinfo->data_precision = BITS_IN_JSAMPLE; /* we always rescale data to this */
cinfo->image_width = width;
cinfo->image_height = height;
TRACEMS3(cinfo, 1, JTRC_GIF, width, height, colormaplen);
}
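/* Illustrative note: the Logical Screen Descriptor and Local Image Descriptor
 * store dimensions as little-endian 16-bit values, so LM_to_uint(hdrbuf, 0)
 * (defined earlier in this file) is equivalent to
 *
 *   (unsigned int)UCH(hdrbuf[0]) | ((unsigned int)UCH(hdrbuf[1]) << 8)
 *
 * e.g. the bytes 0x80 0x02 decode to a width of 640.  The colormap length is
 * likewise derived from the low 3 bits of the flags byte:
 * 2 << (flags & 0x07) gives 2, 4, 8, ..., 256 entries.
 */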
/*
* Read one row of pixels.
* This version is used for noninterlaced GIF images:
* we read directly from the GIF file.
*/
METHODDEF(JDIMENSION)
get_pixel_rows(j_compress_ptr cinfo, cjpeg_source_ptr sinfo)
{
gif_source_ptr source = (gif_source_ptr)sinfo;
register int c;
register JSAMPROW ptr;
register JDIMENSION col;
register JSAMPARRAY colormap = source->colormap;
ptr = source->pub.buffer[0];
for (col = cinfo->image_width; col > 0; col--) {
c = LZWReadByte(source);
*ptr++ = colormap[CM_RED][c];
*ptr++ = colormap[CM_GREEN][c];
*ptr++ = colormap[CM_BLUE][c];
}
return 1;
}
/*
* Read one row of pixels.
* This version is used for the first call on get_pixel_rows when
* reading an interlaced GIF file: we read the whole image into memory.
*/
METHODDEF(JDIMENSION)
load_interlaced_image(j_compress_ptr cinfo, cjpeg_source_ptr sinfo)
{
gif_source_ptr source = (gif_source_ptr)sinfo;
register JSAMPROW sptr;
register JDIMENSION col;
JDIMENSION row;
cd_progress_ptr progress = (cd_progress_ptr)cinfo->progress;
/* Read the interlaced image into the virtual array we've created. */
for (row = 0; row < cinfo->image_height; row++) {
if (progress != NULL) {
progress->pub.pass_counter = (long)row;
progress->pub.pass_limit = (long)cinfo->image_height;
(*progress->pub.progress_monitor) ((j_common_ptr)cinfo);
}
sptr = *(*cinfo->mem->access_virt_sarray)
((j_common_ptr)cinfo, source->interlaced_image, row, (JDIMENSION)1,
TRUE);
for (col = cinfo->image_width; col > 0; col--) {
*sptr++ = (JSAMPLE)LZWReadByte(source);
}
}
if (progress != NULL)
progress->completed_extra_passes++;
/* Replace method pointer so subsequent calls don't come here. */
source->pub.get_pixel_rows = get_interlaced_row;
/* Initialize for get_interlaced_row, and perform first call on it. */
source->cur_row_number = 0;
source->pass2_offset = (cinfo->image_height + 7) / 8;
source->pass3_offset = source->pass2_offset + (cinfo->image_height + 3) / 8;
source->pass4_offset = source->pass3_offset + (cinfo->image_height + 1) / 4;
return get_interlaced_row(cinfo, sinfo);
}
/*
* Read one row of pixels.
* This version is used for interlaced GIF images:
* we read from the virtual array.
*/
METHODDEF(JDIMENSION)
get_interlaced_row(j_compress_ptr cinfo, cjpeg_source_ptr sinfo)
{
gif_source_ptr source = (gif_source_ptr)sinfo;
register int c;
register JSAMPROW sptr, ptr;
register JDIMENSION col;
register JSAMPARRAY colormap = source->colormap;
JDIMENSION irow;
/* Figure out which row of interlaced image is needed, and access it. */
switch ((int)(source->cur_row_number & 7)) {
case 0: /* first-pass row */
irow = source->cur_row_number >> 3;
break;
case 4: /* second-pass row */
irow = (source->cur_row_number >> 3) + source->pass2_offset;
break;
case 2: /* third-pass row */
case 6:
irow = (source->cur_row_number >> 2) + source->pass3_offset;
break;
default: /* fourth-pass row */
irow = (source->cur_row_number >> 1) + source->pass4_offset;
}
sptr = *(*cinfo->mem->access_virt_sarray)
((j_common_ptr)cinfo, source->interlaced_image, irow, (JDIMENSION)1,
FALSE);
/* Scan the row, expand colormap, and output */
ptr = source->pub.buffer[0];
for (col = cinfo->image_width; col > 0; col--) {
c = *sptr++;
*ptr++ = colormap[CM_RED][c];
*ptr++ = colormap[CM_GREEN][c];
*ptr++ = colormap[CM_BLUE][c];
}
source->cur_row_number++; /* for next time */
return 1;
}
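/* Worked example (illustrative): GIF interlacing stores rows in four passes
 * (every 8th row from 0, every 8th from 4, every 4th from 2, every 2nd from
 * 1).  For image_height = 16, load_interlaced_image computes
 *
 *   pass2_offset = (16 + 7) / 8 = 2
 *   pass3_offset = 2 + (16 + 3) / 8 = 4
 *   pass4_offset = 4 + (16 + 1) / 4 = 8
 *
 * so display row 4 (cur_row_number & 7 == 4, a second-pass row) maps to
 * virtual-array row (4 >> 3) + 2 = 2, and display row 3 (a fourth-pass row)
 * maps to (3 >> 1) + 8 = 9.
 */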
/*
* Finish up at the end of the file.
*/
METHODDEF(void)
finish_input_gif(j_compress_ptr cinfo, cjpeg_source_ptr sinfo)
{
/* no work */
}
/*
 * The module selection routine for GIF format input.
 */

@@ -31,9 +665,18 @@

 GLOBAL(cjpeg_source_ptr)
 jinit_read_gif(j_compress_ptr cinfo)
 {
-fprintf(stderr, "GIF input is unsupported for legal reasons. Sorry.\n");
-exit(EXIT_FAILURE);
-return NULL; /* keep compiler happy */
+gif_source_ptr source;
+
+/* Create module interface object */
+source = (gif_source_ptr)
+(*cinfo->mem->alloc_small) ((j_common_ptr)cinfo, JPOOL_IMAGE,
+sizeof(gif_source_struct));
+source->cinfo = cinfo; /* make back link for subroutines */
+
+/* Fill in method ptrs, except get_pixel_rows which start_input sets */
+source->pub.start_input = start_input_gif;
+source->pub.finish_input = finish_input_gif;
+
+return (cjpeg_source_ptr)source;
 }

 #endif /* GIF_SUPPORTED */

rdppm.c

@@ -43,18 +43,8 @@
 /* Macros to deal with unsigned chars as efficiently as compiler allows */
-#ifdef HAVE_UNSIGNED_CHAR
 typedef unsigned char U_CHAR;
 #define UCH(x) ((int)(x))
-#else /* !HAVE_UNSIGNED_CHAR */
-#ifdef __CHAR_UNSIGNED__
-typedef char U_CHAR;
-#define UCH(x) ((int)(x))
-#else
-typedef char U_CHAR;
-#define UCH(x) ((int)(x) & 0xFF)
-#endif
-#endif /* HAVE_UNSIGNED_CHAR */
 #define ReadOK(file, buffer, len) \

rdrle.c

@@ -1,389 +0,0 @@
/*
* rdrle.c
*
* This file was part of the Independent JPEG Group's software:
* Copyright (C) 1991-1996, Thomas G. Lane.
* It was modified by The libjpeg-turbo Project to include only code and
* information relevant to libjpeg-turbo.
* For conditions of distribution and use, see the accompanying README.ijg
* file.
*
* This file contains routines to read input images in Utah RLE format.
* The Utah Raster Toolkit library is required (version 3.1 or later).
*
* These routines may need modification for non-Unix environments or
* specialized applications. As they stand, they assume input from
* an ordinary stdio stream. They further assume that reading begins
* at the start of the file; start_input may need work if the
* user interface has already read some data (e.g., to determine that
* the file is indeed RLE format).
*
* Based on code contributed by Mike Lijewski,
* with updates from Robert Hutchinson.
*/
#include "cdjpeg.h" /* Common decls for cjpeg/djpeg applications */
#ifdef RLE_SUPPORTED
/* rle.h is provided by the Utah Raster Toolkit. */
#include <rle.h>
/*
* We assume that JSAMPLE has the same representation as rle_pixel,
* to wit, "unsigned char". Hence we can't cope with 12- or 16-bit samples.
*/
#if BITS_IN_JSAMPLE != 8
Sorry, this code only copes with 8-bit JSAMPLEs. /* deliberate syntax err */
#endif
/*
* We support the following types of RLE files:
*
* GRAYSCALE - 8 bits, no colormap
* MAPPEDGRAY - 8 bits, 1 channel colomap
* PSEUDOCOLOR - 8 bits, 3 channel colormap
* TRUECOLOR - 24 bits, 3 channel colormap
* DIRECTCOLOR - 24 bits, no colormap
*
* For now, we ignore any alpha channel in the image.
*/
typedef enum
{ GRAYSCALE, MAPPEDGRAY, PSEUDOCOLOR, TRUECOLOR, DIRECTCOLOR } rle_kind;
/*
* Since RLE stores scanlines bottom-to-top, we have to invert the image
* to conform to JPEG's top-to-bottom order. To do this, we read the
* incoming image into a virtual array on the first get_pixel_rows call,
* then fetch the required row from the virtual array on subsequent calls.
*/
typedef struct _rle_source_struct *rle_source_ptr;
typedef struct _rle_source_struct {
struct cjpeg_source_struct pub; /* public fields */
rle_kind visual; /* actual type of input file */
jvirt_sarray_ptr image; /* virtual array to hold the image */
JDIMENSION row; /* current row # in the virtual array */
rle_hdr header; /* Input file information */
rle_pixel **rle_row; /* holds a row returned by rle_getrow() */
} rle_source_struct;
/*
* Read the file header; return image size and component count.
*/
METHODDEF(void)
start_input_rle(j_compress_ptr cinfo, cjpeg_source_ptr sinfo)
{
rle_source_ptr source = (rle_source_ptr)sinfo;
JDIMENSION width, height;
#ifdef PROGRESS_REPORT
cd_progress_ptr progress = (cd_progress_ptr)cinfo->progress;
#endif
/* Use RLE library routine to get the header info */
source->header = *rle_hdr_init(NULL);
source->header.rle_file = source->pub.input_file;
switch (rle_get_setup(&(source->header))) {
case RLE_SUCCESS:
/* A-OK */
break;
case RLE_NOT_RLE:
ERREXIT(cinfo, JERR_RLE_NOT);
break;
case RLE_NO_SPACE:
ERREXIT(cinfo, JERR_RLE_MEM);
break;
case RLE_EMPTY:
ERREXIT(cinfo, JERR_RLE_EMPTY);
break;
case RLE_EOF:
ERREXIT(cinfo, JERR_RLE_EOF);
break;
default:
ERREXIT(cinfo, JERR_RLE_BADERROR);
break;
}
/* Figure out what we have, set private vars and return values accordingly */
width = source->header.xmax - source->header.xmin + 1;
height = source->header.ymax - source->header.ymin + 1;
source->header.xmin = 0; /* realign horizontally */
source->header.xmax = width - 1;
cinfo->image_width = width;
cinfo->image_height = height;
cinfo->data_precision = 8; /* we can only handle 8 bit data */
if (source->header.ncolors == 1 && source->header.ncmap == 0) {
source->visual = GRAYSCALE;
TRACEMS2(cinfo, 1, JTRC_RLE_GRAY, width, height);
} else if (source->header.ncolors == 1 && source->header.ncmap == 1) {
source->visual = MAPPEDGRAY;
TRACEMS3(cinfo, 1, JTRC_RLE_MAPGRAY, width, height,
1 << source->header.cmaplen);
} else if (source->header.ncolors == 1 && source->header.ncmap == 3) {
source->visual = PSEUDOCOLOR;
TRACEMS3(cinfo, 1, JTRC_RLE_MAPPED, width, height,
1 << source->header.cmaplen);
} else if (source->header.ncolors == 3 && source->header.ncmap == 3) {
source->visual = TRUECOLOR;
TRACEMS3(cinfo, 1, JTRC_RLE_FULLMAP, width, height,
1 << source->header.cmaplen);
} else if (source->header.ncolors == 3 && source->header.ncmap == 0) {
source->visual = DIRECTCOLOR;
TRACEMS2(cinfo, 1, JTRC_RLE, width, height);
} else
ERREXIT(cinfo, JERR_RLE_UNSUPPORTED);
if (source->visual == GRAYSCALE || source->visual == MAPPEDGRAY) {
cinfo->in_color_space = JCS_GRAYSCALE;
cinfo->input_components = 1;
} else {
cinfo->in_color_space = JCS_RGB;
cinfo->input_components = 3;
}
/*
* A place to hold each scanline while it's converted.
* (GRAYSCALE scanlines don't need converting)
*/
if (source->visual != GRAYSCALE) {
source->rle_row = (rle_pixel **)(*cinfo->mem->alloc_sarray)
((j_common_ptr)cinfo, JPOOL_IMAGE,
(JDIMENSION)width, (JDIMENSION)cinfo->input_components);
}
/* request a virtual array to hold the image */
source->image = (*cinfo->mem->request_virt_sarray)
((j_common_ptr)cinfo, JPOOL_IMAGE, FALSE,
(JDIMENSION)(width * source->header.ncolors),
(JDIMENSION)height, (JDIMENSION)1);
#ifdef PROGRESS_REPORT
if (progress != NULL) {
/* count file input as separate pass */
progress->total_extra_passes++;
}
#endif
source->pub.buffer_height = 1;
}
/*
* Read one row of pixels.
* Called only after load_image has read the image into the virtual array.
* Used for GRAYSCALE, MAPPEDGRAY, TRUECOLOR, and DIRECTCOLOR images.
*/
METHODDEF(JDIMENSION)
get_rle_row(j_compress_ptr cinfo, cjpeg_source_ptr sinfo)
{
rle_source_ptr source = (rle_source_ptr)sinfo;
source->row--;
source->pub.buffer = (*cinfo->mem->access_virt_sarray)
((j_common_ptr)cinfo, source->image, source->row, (JDIMENSION)1, FALSE);
return 1;
}
/*
* Read one row of pixels.
* Called only after load_image has read the image into the virtual array.
* Used for PSEUDOCOLOR images.
*/
METHODDEF(JDIMENSION)
get_pseudocolor_row(j_compress_ptr cinfo, cjpeg_source_ptr sinfo)
{
rle_source_ptr source = (rle_source_ptr)sinfo;
JSAMPROW src_row, dest_row;
JDIMENSION col;
rle_map *colormap;
int val;
colormap = source->header.cmap;
dest_row = source->pub.buffer[0];
source->row--;
src_row = *(*cinfo->mem->access_virt_sarray)
((j_common_ptr)cinfo, source->image, source->row, (JDIMENSION)1, FALSE);
for (col = cinfo->image_width; col > 0; col--) {
val = GETJSAMPLE(*src_row++);
*dest_row++ = (JSAMPLE)(colormap[val ] >> 8);
*dest_row++ = (JSAMPLE)(colormap[val + 256] >> 8);
*dest_row++ = (JSAMPLE)(colormap[val + 512] >> 8);
}
return 1;
}
/*
* Load the image into a virtual array. We have to do this because RLE
* files start at the lower left while the JPEG standard has them starting
* in the upper left. This is called the first time we want to get a row
* of input. What we do is load the RLE data into the array and then call
* the appropriate routine to read one row from the array. Before returning,
* we set source->pub.get_pixel_rows so that subsequent calls go straight to
* the appropriate row-reading routine.
*/
METHODDEF(JDIMENSION)
load_image(j_compress_ptr cinfo, cjpeg_source_ptr sinfo)
{
rle_source_ptr source = (rle_source_ptr)sinfo;
JDIMENSION row, col;
JSAMPROW scanline, red_ptr, green_ptr, blue_ptr;
rle_pixel **rle_row;
rle_map *colormap;
char channel;
#ifdef PROGRESS_REPORT
cd_progress_ptr progress = (cd_progress_ptr)cinfo->progress;
#endif
colormap = source->header.cmap;
rle_row = source->rle_row;
/* Read the RLE data into our virtual array.
* We assume here that rle_pixel is represented the same as JSAMPLE.
*/
RLE_CLR_BIT(source->header, RLE_ALPHA); /* don't read the alpha channel */
#ifdef PROGRESS_REPORT
if (progress != NULL) {
progress->pub.pass_limit = cinfo->image_height;
progress->pub.pass_counter = 0;
(*progress->pub.progress_monitor) ((j_common_ptr)cinfo);
}
#endif
switch (source->visual) {
case GRAYSCALE:
case PSEUDOCOLOR:
for (row = 0; row < cinfo->image_height; row++) {
rle_row = (rle_pixel **)(*cinfo->mem->access_virt_sarray)
((j_common_ptr)cinfo, source->image, row, (JDIMENSION)1, TRUE);
rle_getrow(&source->header, rle_row);
#ifdef PROGRESS_REPORT
if (progress != NULL) {
progress->pub.pass_counter++;
(*progress->pub.progress_monitor) ((j_common_ptr)cinfo);
}
#endif
}
break;
case MAPPEDGRAY:
case TRUECOLOR:
for (row = 0; row < cinfo->image_height; row++) {
scanline = *(*cinfo->mem->access_virt_sarray)
((j_common_ptr)cinfo, source->image, row, (JDIMENSION)1, TRUE);
rle_row = source->rle_row;
rle_getrow(&source->header, rle_row);
for (col = 0; col < cinfo->image_width; col++) {
for (channel = 0; channel < source->header.ncolors; channel++) {
*scanline++ = (JSAMPLE)
(colormap[GETJSAMPLE(rle_row[channel][col]) + 256 * channel] >> 8);
}
}
#ifdef PROGRESS_REPORT
if (progress != NULL) {
progress->pub.pass_counter++;
(*progress->pub.progress_monitor) ((j_common_ptr)cinfo);
}
#endif
}
break;
case DIRECTCOLOR:
for (row = 0; row < cinfo->image_height; row++) {
scanline = *(*cinfo->mem->access_virt_sarray)
((j_common_ptr)cinfo, source->image, row, (JDIMENSION)1, TRUE);
rle_getrow(&source->header, rle_row);
red_ptr = rle_row[0];
green_ptr = rle_row[1];
blue_ptr = rle_row[2];
for (col = cinfo->image_width; col > 0; col--) {
*scanline++ = *red_ptr++;
*scanline++ = *green_ptr++;
*scanline++ = *blue_ptr++;
}
#ifdef PROGRESS_REPORT
if (progress != NULL) {
progress->pub.pass_counter++;
(*progress->pub.progress_monitor) ((j_common_ptr)cinfo);
}
#endif
}
}
#ifdef PROGRESS_REPORT
if (progress != NULL)
progress->completed_extra_passes++;
#endif
/* Set up to call proper row-extraction routine in future */
if (source->visual == PSEUDOCOLOR) {
source->pub.buffer = source->rle_row;
source->pub.get_pixel_rows = get_pseudocolor_row;
} else {
source->pub.get_pixel_rows = get_rle_row;
}
source->row = cinfo->image_height;
/* And fetch the topmost (bottommost) row */
return (*source->pub.get_pixel_rows) (cinfo, sinfo);
}
/*
* Finish up at the end of the file.
*/
METHODDEF(void)
finish_input_rle(j_compress_ptr cinfo, cjpeg_source_ptr sinfo)
{
/* no work */
}
/*
* The module selection routine for RLE format input.
*/
GLOBAL(cjpeg_source_ptr)
jinit_read_rle(j_compress_ptr cinfo)
{
rle_source_ptr source;
/* Create module interface object */
source = (rle_source_ptr)
(*cinfo->mem->alloc_small) ((j_common_ptr)cinfo, JPOOL_IMAGE,
sizeof(rle_source_struct));
/* Fill in method ptrs */
source->pub.start_input = start_input_rle;
source->pub.finish_input = finish_input_rle;
source->pub.get_pixel_rows = load_image;
return (cjpeg_source_ptr)source;
}
#endif /* RLE_SUPPORTED */

View File

@@ -28,18 +28,8 @@
 /* Macros to deal with unsigned chars as efficiently as compiler allows */
-#ifdef HAVE_UNSIGNED_CHAR
 typedef unsigned char U_CHAR;
 #define UCH(x) ((int)(x))
-#else /* !HAVE_UNSIGNED_CHAR */
-#ifdef __CHAR_UNSIGNED__
-typedef char U_CHAR;
-#define UCH(x) ((int)(x))
-#else
-typedef char U_CHAR;
-#define UCH(x) ((int)(x) & 0xFF)
-#endif
-#endif /* HAVE_UNSIGNED_CHAR */
 #define ReadOK(file, buffer, len) \

release/Config.cmake.in (new file)

@@ -0,0 +1,4 @@
@PACKAGE_INIT@
include("${CMAKE_CURRENT_LIST_DIR}/@CMAKE_PROJECT_NAME@Targets.cmake")
check_required_components("@CMAKE_PROJECT_NAME@")

View File

@@ -1,4 +1,4 @@
-libjpeg-turbo is a JPEG image codec that uses SIMD instructions to accelerate baseline JPEG compression and decompression on x86, x86-64, Arm, PowerPC, and MIPS systems, as well as progressive JPEG compression on x86 and x86-64 systems. On such systems, libjpeg-turbo is generally 2-6x as fast as libjpeg, all else being equal. On other types of systems, libjpeg-turbo can still outperform libjpeg by a significant amount, by virtue of its highly-optimized Huffman coding routines. In many cases, the performance of libjpeg-turbo rivals that of proprietary high-speed JPEG codecs.
+libjpeg-turbo is a JPEG image codec that uses SIMD instructions to accelerate baseline JPEG compression and decompression on x86, x86-64, Arm, PowerPC, and MIPS systems, as well as progressive JPEG compression on x86, x86-64, and Arm systems. On such systems, libjpeg-turbo is generally 2-6x as fast as libjpeg, all else being equal. On other types of systems, libjpeg-turbo can still outperform libjpeg by a significant amount, by virtue of its highly-optimized Huffman coding routines. In many cases, the performance of libjpeg-turbo rivals that of proprietary high-speed JPEG codecs.

 libjpeg-turbo implements both the traditional libjpeg API as well as the less powerful but more straightforward TurboJPEG API. libjpeg-turbo also features colorspace extensions that allow it to compress from/decompress to 32-bit and big-endian pixel buffers (RGBX, XBGR, etc.), as well as a full-featured Java interface.

View File

@@ -1,18 +0,0 @@
{\rtf1\ansi\ansicpg1252\cocoartf1404\cocoasubrtf460
{\fonttbl\f0\fnil\fcharset0 Menlo-Regular;}
{\colortbl;\red255\green255\blue255;\red0\green0\blue0;\red203\green233\blue242;}
\deftab720
\pard\tx529\tx1059\tx1589\tx2119\tx2649\tx3178\tx3708\tx4238\tx4768\tx5298\tx5827\tx6357\tx6887\tx7417\tx7947\tx8476\tx9006\tx9536\tx10066\tx10596\tx11125\tx11655\tx12185\tx12715\tx13245\tx13774\tx14304\tx14834\tx15364\tx15894\tx16423\tx16953\tx17483\tx18013\tx18543\tx19072\tx19602\tx20132\tx20662\tx21192\tx21722\tx22251\tx22781\tx23311\tx23841\tx24371\tx24900\tx25430\tx25960\tx26490\tx27020\tx27549\tx28079\tx28609\tx29139\tx29669\tx30198\tx30728\tx31258\tx31788\tx32318\tx32847\tx33377\tx33907\tx34437\tx34967\tx35496\tx36026\tx36556\tx37086\tx37616\tx38145\tx38675\tx39205\tx39735\tx40265\tx40794\tx41324\tx41854\tx42384\tx42914\tx43443\tx43973\tx44503\tx45033\tx45563\tx46093\tx46622\tx47152\tx47682\tx48212\tx48742\tx49271\tx49801\tx50331\tx50861\tx51391\tx51920\tx52450\tx52980\pardeftab720\li529\fi-530\partightenfactor0
\f0\fs22 \cf2 \CocoaLigature0 TThi /opt/mozjpeg/bin/uninstall\
\pard\tx529\tx1059\tx1589\tx2119\tx2649\tx3178\tx3708\tx4238\tx4768\tx5298\tx5827\tx6357\tx6887\tx7417\tx7947\tx8476\tx9006\tx9536\tx10066\tx10596\tx11125\tx11655\tx12185\tx12715\tx13245\tx13774\tx14304\tx14834\tx15364\tx15894\tx16423\tx16953\tx17483\tx18013\tx18543\tx19072\tx19602\tx20132\tx20662\tx21192\tx21722\tx22251\tx22781\tx23311\tx23841\tx24371\tx24900\tx25430\tx25960\tx26490\tx27020\tx27549\tx28079\tx28609\tx29139\tx29669\tx30198\tx30728\tx31258\tx31788\tx32318\tx32847\tx33377\tx33907\tx34437\tx34967\tx35496\tx36026\tx36556\tx37086\tx37616\tx38145\tx38675\tx39205\tx39735\tx40265\tx40794\tx41324\tx41854\tx42384\tx42914\tx43443\tx43973\tx44503\tx45033\tx45563\tx46093\tx46622\tx47152\tx47682\tx48212\tx48742\tx49271\tx49801\tx50331\tx50861\tx51391\tx51920\tx52450\tx52980\pardeftab720\li662\fi-663\partightenfactor0
\cf2 installer will install the mozjpeg SDK and run-time libraries onto your computer so that you can use mozjpeg to build new applications. To remove the mozjpeg package, run\
\pard\tx529\tx1059\tx1589\tx2119\tx2649\tx3178\tx3708\tx4238\tx4768\tx5298\tx5827\tx6357\tx6887\tx7417\tx7947\tx8476\tx9006\tx9536\tx10066\tx10596\tx11125\tx11655\tx12185\tx12715\tx13245\tx13774\tx14304\tx14834\tx15364\tx15894\tx16423\tx16953\tx17483\tx18013\tx18543\tx19072\tx19602\tx20132\tx20662\tx21192\tx21722\tx22251\tx22781\tx23311\tx23841\tx24371\tx24900\tx25430\tx25960\tx26490\tx27020\tx27549\tx28079\tx28609\tx29139\tx29669\tx30198\tx30728\tx31258\tx31788\tx32318\tx32847\tx33377\tx33907\tx34437\tx34967\tx35496\tx36026\tx36556\tx37086\tx37616\tx38145\tx38675\tx39205\tx39735\tx40265\tx40794\tx41324\tx41854\tx42384\tx42914\tx43443\tx43973\tx44503\tx45033\tx45563\tx46093\tx46622\tx47152\tx47682\tx48212\tx48742\tx49271\tx49801\tx50331\tx50861\tx51391\tx51920\tx52450\tx52980\pardeftab720\li529\fi-530\partightenfactor0
\cf2 is installer will install the \cb3 mozjpeg\cb1 SDK and run-time libraries onto your computer so that you can use \cb3 mozjpeg\cb1 to build new applications. To remove the \cb3 mozjpeg\cb1 package, run\
\
\pard\tx529\tx1059\tx1589\tx2119\tx2649\tx3178\tx3708\tx4238\tx4768\tx5298\tx5827\tx6357\tx6887\tx7417\tx7947\tx8476\tx9006\tx9536\tx10066\tx10596\tx11125\tx11655\tx12185\tx12715\tx13245\tx13774\tx14304\tx14834\tx15364\tx15894\tx16423\tx16953\tx17483\tx18013\tx18543\tx19072\tx19602\tx20132\tx20662\tx21192\tx21722\tx22251\tx22781\tx23311\tx23841\tx24371\tx24900\tx25430\tx25960\tx26490\tx27020\tx27549\tx28079\tx28609\tx29139\tx29669\tx30198\tx30728\tx31258\tx31788\tx32318\tx32847\tx33377\tx33907\tx34437\tx34967\tx35496\tx36026\tx36556\tx37086\tx37616\tx38145\tx38675\tx39205\tx39735\tx40265\tx40794\tx41324\tx41854\tx42384\tx42914\tx43443\tx43973\tx44503\tx45033\tx45563\tx46093\tx46622\tx47152\tx47682\tx48212\tx48742\tx49271\tx49801\tx50331\tx50861\tx51391\tx51920\tx52450\tx52980\pardeftab720\li794\fi-795\partightenfactor0
\cf2 /opt/\cb3 mozjpeg\cb1 /bin/uninstall\
\pard\tx529\tx1059\tx1589\tx2119\tx2649\tx3178\tx3708\tx4238\tx4768\tx5298\tx5827\tx6357\tx6887\tx7417\tx7947\tx8476\tx9006\tx9536\tx10066\tx10596\tx11125\tx11655\tx12185\tx12715\tx13245\tx13774\tx14304\tx14834\tx15364\tx15894\tx16423\tx16953\tx17483\tx18013\tx18543\tx19072\tx19602\tx20132\tx20662\tx21192\tx21722\tx22251\tx22781\tx23311\tx23841\tx24371\tx24900\tx25430\tx25960\tx26490\tx27020\tx27549\tx28079\tx28609\tx29139\tx29669\tx30198\tx30728\tx31258\tx31788\tx32318\tx32847\tx33377\tx33907\tx34437\tx34967\tx35496\tx36026\tx36556\tx37086\tx37616\tx38145\tx38675\tx39205\tx39735\tx40265\tx40794\tx41324\tx41854\tx42384\tx42914\tx43443\tx43973\tx44503\tx45033\tx45563\tx46093\tx46622\tx47152\tx47682\tx48212\tx48742\tx49271\tx49801\tx50331\tx50861\tx51391\tx51920\tx52450\tx52980\pardeftab720\li560\fi-561\partightenfactor0
\cf2 \
from the command line.\
}

release/Welcome.rtf.in (new file)

@@ -0,0 +1,17 @@
{\rtf1\ansi\ansicpg1252\cocoartf1038\cocoasubrtf360
{\fonttbl\f0\fswiss\fcharset0 Helvetica;\f1\fmodern\fcharset0 CourierNewPSMT;}
{\colortbl;\red255\green255\blue255;}
\margl1440\margr1440\vieww9000\viewh8400\viewkind0
\deftab720
\pard\pardeftab720\ql\qnatural
\f0\fs24 \cf0 This installer will install the libjpeg-turbo SDK and run-time libraries onto your computer so that you can use libjpeg-turbo to build new applications or accelerate existing ones. To remove the libjpeg-turbo package, run\
\
\pard\pardeftab720\ql\qnatural
\f1 \cf0 @CMAKE_INSTALL_FULL_BINDIR@/uninstall\
\pard\pardeftab720\ql\qnatural
\f0 \cf0 \
from the command line.\
}

View File

@@ -71,6 +71,11 @@ Section "@CMAKE_PROJECT_NAME@ SDK for @INST_PLATFORM@ (required)"
 SetOutPath $INSTDIR\lib\pkgconfig
 File "@CMAKE_CURRENT_BINARY_DIR@\pkgscripts\libjpeg.pc"
 File "@CMAKE_CURRENT_BINARY_DIR@\pkgscripts\libturbojpeg.pc"
+SetOutPath $INSTDIR\lib\cmake\@CMAKE_PROJECT_NAME@
+File "@CMAKE_CURRENT_BINARY_DIR@\pkgscripts\@CMAKE_PROJECT_NAME@Config.cmake"
+File "@CMAKE_CURRENT_BINARY_DIR@\pkgscripts\@CMAKE_PROJECT_NAME@ConfigVersion.cmake"
+File "@CMAKE_CURRENT_BINARY_DIR@\win\@CMAKE_PROJECT_NAME@Targets.cmake"
+File "@CMAKE_CURRENT_BINARY_DIR@\win\@CMAKE_PROJECT_NAME@Targets-release.cmake"
 !ifdef JAVA
 SetOutPath $INSTDIR\classes
 File "@CMAKE_CURRENT_BINARY_DIR@\java\turbojpeg.jar"

@@ -141,6 +146,10 @@ Section "Uninstall"

 !endif
 Delete $INSTDIR\lib\pkgconfig\libjpeg.pc
 Delete $INSTDIR\lib\pkgconfig\libturbojpeg.pc
+Delete $INSTDIR\lib\cmake\@CMAKE_PROJECT_NAME@\@CMAKE_PROJECT_NAME@Config.cmake
+Delete $INSTDIR\lib\cmake\@CMAKE_PROJECT_NAME@\@CMAKE_PROJECT_NAME@ConfigVersion.cmake
+Delete $INSTDIR\lib\cmake\@CMAKE_PROJECT_NAME@\@CMAKE_PROJECT_NAME@Targets.cmake
+Delete $INSTDIR\lib\cmake\@CMAKE_PROJECT_NAME@\@CMAKE_PROJECT_NAME@Targets-release.cmake
 !ifdef JAVA
 Delete $INSTDIR\classes\turbojpeg.jar
 !endif

@@ -176,6 +185,8 @@ Section "Uninstall"

 RMDir "$INSTDIR\include"
 RMDir "$INSTDIR\lib\pkgconfig"
+RMDir "$INSTDIR\lib\cmake\@CMAKE_PROJECT_NAME@"
+RMDir "$INSTDIR\lib\cmake"
 RMDir "$INSTDIR\lib"
 RMDir "$INSTDIR\doc"
 !ifdef GCC

View File

@@ -1,66 +0,0 @@
#!/bin/sh
set -u
set -e
trap onexit INT
trap onexit TERM
trap onexit EXIT
TMPDIR=
onexit()
{
if [ ! "$TMPDIR" = "" ]; then
rm -rf $TMPDIR
fi
}
safedirmove ()
{
if [ "$1" = "$2" ]; then
return 0
fi
if [ "$1" = "" -o ! -d "$1" ]; then
echo safedirmove: source dir $1 is not valid
return 1
fi
if [ "$2" = "" -o -e "$2" ]; then
echo safedirmove: dest dir $2 is not valid
return 1
fi
if [ "$3" = "" -o -e "$3" ]; then
echo safedirmove: tmp dir $3 is not valid
return 1
fi
mkdir -p $3
mv $1/* $3/
rmdir $1
mkdir -p $2
mv $3/* $2/
rmdir $3
return 0
}
PKGNAME=@PKGNAME@
VERSION=@VERSION@
BUILD=@BUILD@
PREFIX=@CMAKE_INSTALL_PREFIX@
DOCDIR=@CMAKE_INSTALL_FULL_DOCDIR@
LIBDIR=@CMAKE_INSTALL_FULL_LIBDIR@
umask 022
rm -f $PKGNAME-$VERSION-$BUILD.tar.bz2
TMPDIR=`mktemp -d /tmp/ljtbuild.XXXXXX`
__PWD=`pwd`
make install DESTDIR=$TMPDIR/pkg
if [ "$PREFIX" = "@CMAKE_INSTALL_DEFAULT_PREFIX@" -a "$DOCDIR" = "@CMAKE_INSTALL_DEFAULT_PREFIX@/doc" ]; then
safedirmove $TMPDIR/pkg$DOCDIR $TMPDIR/pkg/usr/share/doc/$PKGNAME-$VERSION $TMPDIR/__tmpdoc
ln -fs /usr/share/doc/$PKGNAME-$VERSION $TMPDIR/pkg$DOCDIR
fi
cd $TMPDIR/pkg
tar cfj ../$PKGNAME-$VERSION-$BUILD.tar.bz2 *
cd $__PWD
mv $TMPDIR/*.tar.bz2 .
exit 0

View File

@@ -67,7 +67,7 @@ makedeb()
 mkdir $TMPDIR/DEBIAN
 if [ $SUPPLEMENT = 1 ]; then
-make install DESTDIR=$TMPDIR
+DESTDIR=$TMPDIR @CMAKE_MAKE_PROGRAM@ install
 rm -rf $TMPDIR$BINDIR
 if [ "$DATAROOTDIR" != "$PREFIX" ]; then
 rm -rf $TMPDIR$DATAROOTDIR

@@ -79,7 +79,7 @@ makedeb()

 rm -rf $TMPDIR$INCLUDEDIR
 rm -rf $TMPDIR$MANDIR
 else
-make install DESTDIR=$TMPDIR
+DESTDIR=$TMPDIR @CMAKE_MAKE_PROGRAM@ install
 if [ "$PREFIX" = "@CMAKE_INSTALL_DEFAULT_PREFIX@" -a "$DOCDIR" = "@CMAKE_INSTALL_DEFAULT_PREFIX@/doc" ]; then
 safedirmove $TMPDIR/$DOCDIR $TMPDIR/usr/share/doc/$PKGNAME-$VERSION $TMPDIR/__tmpdoc
 ln -fs /usr/share/doc/$DIRNAME-$VERSION $TMPDIR$DOCDIR

View File

@@ -43,23 +43,18 @@ safedirmove ()
 usage()
 {
-echo "$0 [universal] [-lipo [path to lipo]]"
+echo "$0 [-lipo [path to lipo]]"
 exit 1
 }
-UNIVERSAL=0
 PKGNAME=@PKGNAME@
 VERSION=@VERSION@
 BUILD=@BUILD@
 SRCDIR=@CMAKE_CURRENT_SOURCE_DIR@
-BUILDDIR32=@OSX_32BIT_BUILD@
-BUILDDIRARMV7=@IOS_ARMV7_BUILD@
-BUILDDIRARMV7S=@IOS_ARMV7S_BUILD@
-BUILDDIRARMV8=@IOS_ARMV8_BUILD@
+BUILDDIRARMV8=@ARMV8_BUILD@
 WITH_JAVA=@WITH_JAVA@
-OSX_APP_CERT_NAME="@OSX_APP_CERT_NAME@"
-OSX_INST_CERT_NAME="@OSX_INST_CERT_NAME@"
+MACOS_APP_CERT_NAME="@MACOS_APP_CERT_NAME@"
+MACOS_INST_CERT_NAME="@MACOS_INST_CERT_NAME@"
 LIPO=lipo
 PREFIX=@CMAKE_INSTALL_PREFIX@
@@ -82,9 +77,6 @@ while [ $# -gt 0 ]; do
 fi
 fi
 ;;
-universal)
-UNIVERSAL=1
-;;
 esac
 shift
 done
@@ -98,7 +90,7 @@ TMPDIR=`mktemp -d /tmp/$PKGNAME-build.XXXXXX`
 PKGROOT=$TMPDIR/pkg/Package_Root
 mkdir -p $PKGROOT
-make install DESTDIR=$PKGROOT
+DESTDIR=$PKGROOT @CMAKE_MAKE_PROGRAM@ install
 if [ "$PREFIX" = "@CMAKE_INSTALL_DEFAULT_PREFIX@" -a "$DOCDIR" = "@CMAKE_INSTALL_DEFAULT_PREFIX@/doc" ]; then
 mkdir -p $PKGROOT/Library/Documentation

@@ -106,62 +98,7 @@ if [ "$PREFIX" = "@CMAKE_INSTALL_DEFAULT_PREFIX@" -a "$DOCDIR" = "@CMAKE_INSTALL

 ln -fs /Library/Documentation/$PKGNAME $PKGROOT$DOCDIR
 fi
-if [ $UNIVERSAL = 1 -a "$BUILDDIR32" != "" ]; then
if [ ! -d $BUILDDIR32 ]; then
echo ERROR: 32-bit build directory $BUILDDIR32 does not exist
exit 1
fi
if [ ! -f $BUILDDIR32/Makefile ]; then
echo ERROR: 32-bit build directory $BUILDDIR32 is not configured
exit 1
fi
mkdir -p $TMPDIR/dist.x86
pushd $BUILDDIR32
make install DESTDIR=$TMPDIR/dist.x86
popd
$LIPO -create \
-arch i386 $TMPDIR/dist.x86/$LIBDIR/$LIBJPEG_DSO_NAME \
-arch x86_64 $PKGROOT/$LIBDIR/$LIBJPEG_DSO_NAME \
-output $PKGROOT/$LIBDIR/$LIBJPEG_DSO_NAME
$LIPO -create \
-arch i386 $TMPDIR/dist.x86/$LIBDIR/libjpeg.a \
-arch x86_64 $PKGROOT/$LIBDIR/libjpeg.a \
-output $PKGROOT/$LIBDIR/libjpeg.a
$LIPO -create \
-arch i386 $TMPDIR/dist.x86/$LIBDIR/$TURBOJPEG_DSO_NAME \
-arch x86_64 $PKGROOT/$LIBDIR/$TURBOJPEG_DSO_NAME \
-output $PKGROOT/$LIBDIR/$TURBOJPEG_DSO_NAME
$LIPO -create \
-arch i386 $TMPDIR/dist.x86/$LIBDIR/libturbojpeg.a \
-arch x86_64 $PKGROOT/$LIBDIR/libturbojpeg.a \
-output $PKGROOT/$LIBDIR/libturbojpeg.a
$LIPO -create \
-arch i386 $TMPDIR/dist.x86/$BINDIR/cjpeg \
-arch x86_64 $PKGROOT/$BINDIR/cjpeg \
-output $PKGROOT/$BINDIR/cjpeg
$LIPO -create \
-arch i386 $TMPDIR/dist.x86/$BINDIR/djpeg \
-arch x86_64 $PKGROOT/$BINDIR/djpeg \
-output $PKGROOT/$BINDIR/djpeg
$LIPO -create \
-arch i386 $TMPDIR/dist.x86/$BINDIR/jpegtran \
-arch x86_64 $PKGROOT/$BINDIR/jpegtran \
-output $PKGROOT/$BINDIR/jpegtran
$LIPO -create \
-arch i386 $TMPDIR/dist.x86/$BINDIR/tjbench \
-arch x86_64 $PKGROOT/$BINDIR/tjbench \
-output $PKGROOT/$BINDIR/tjbench
$LIPO -create \
-arch i386 $TMPDIR/dist.x86/$BINDIR/rdjpgcom \
-arch x86_64 $PKGROOT/$BINDIR/rdjpgcom \
-output $PKGROOT/$BINDIR/rdjpgcom
$LIPO -create \
-arch i386 $TMPDIR/dist.x86/$BINDIR/wrjpgcom \
-arch x86_64 $PKGROOT/$BINDIR/wrjpgcom \
-output $PKGROOT/$BINDIR/wrjpgcom
fi
-install_ios()
+install_subbuild()
 {
 BUILDDIR=$1
 ARCHNAME=$2
@@ -172,13 +109,13 @@ install_ios()
 echo ERROR: $ARCHNAME build directory $BUILDDIR does not exist
 exit 1
 fi
-if [ ! -f $BUILDDIR/Makefile ]; then
+if [ ! -f $BUILDDIR/Makefile -a ! -f $BUILDDIR/build.ninja ]; then
 echo ERROR: $ARCHNAME build directory $BUILDDIR is not configured
 exit 1
 fi
 mkdir -p $TMPDIR/dist.$DIRNAME
 pushd $BUILDDIR
-make install DESTDIR=$TMPDIR/dist.$DIRNAME
+DESTDIR=$TMPDIR/dist.$DIRNAME @CMAKE_MAKE_PROGRAM@ install
 popd
 $LIPO -create \
 $PKGROOT/$LIBDIR/$LIBJPEG_DSO_NAME \
@@ -222,28 +159,14 @@ install_ios()
 -output $PKGROOT/$BINDIR/wrjpgcom
 }
-if [ $UNIVERSAL = 1 -a "$BUILDDIRARMV7" != "" ]; then
-install_ios $BUILDDIRARMV7 Armv7 armv7 arm
-fi
-if [ $UNIVERSAL = 1 -a "$BUILDDIRARMV7S" != "" ]; then
-install_ios $BUILDDIRARMV7S Armv7s armv7s arm
-fi
-if [ $UNIVERSAL = 1 -a "$BUILDDIRARMV8" != "" ]; then
-install_ios $BUILDDIRARMV8 Armv8 armv8 arm64
+if [ "$BUILDDIRARMV8" != "" ]; then
+install_subbuild $BUILDDIRARMV8 Armv8 armv8 arm64
 fi
 install_name_tool -id $LIBDIR/$LIBJPEG_DSO_NAME $PKGROOT/$LIBDIR/$LIBJPEG_DSO_NAME
 install_name_tool -id $LIBDIR/$TURBOJPEG_DSO_NAME $PKGROOT/$LIBDIR/$TURBOJPEG_DSO_NAME
-if [ $WITH_JAVA = 1 ]; then
-ln -fs $TURBOJPEG_DSO_NAME $PKGROOT/$LIBDIR/libturbojpeg.jnilib
-fi
 if [ "$PREFIX" = "@CMAKE_INSTALL_DEFAULT_PREFIX@" -a "$LIBDIR" = "@CMAKE_INSTALL_DEFAULT_PREFIX@/lib" ]; then
-if [ ! -h $PKGROOT/$PREFIX/lib32 ]; then
-ln -fs lib $PKGROOT/$PREFIX/lib32
-fi
 if [ ! -h $PKGROOT/$PREFIX/lib64 ]; then
 ln -fs lib $PKGROOT/$PREFIX/lib64
 fi
@@ -255,28 +178,28 @@ install -m 755 pkgscripts/uninstall $PKGROOT/$BINDIR/
 find $PKGROOT -type f | while read file; do xattr -c $file; done
-cp $SRCDIR/release/License.rtf $SRCDIR/release/Welcome.rtf $SRCDIR/release/ReadMe.txt $TMPDIR/pkg/
+cp $SRCDIR/release/License.rtf pkgscripts/Welcome.rtf $SRCDIR/release/ReadMe.txt $TMPDIR/pkg/
 mkdir $TMPDIR/dmg
 pkgbuild --root $PKGROOT --version $VERSION.$BUILD --identifier @PKGID@ \
 $TMPDIR/pkg/$PKGNAME.pkg
 SUFFIX=
-if [ "$OSX_INST_CERT_NAME" != "" ]; then
+if [ "$MACOS_INST_CERT_NAME" != "" ]; then
 SUFFIX=-unsigned
 fi
 productbuild --distribution pkgscripts/Distribution.xml \
 --package-path $TMPDIR/pkg/ --resources $TMPDIR/pkg/ \
 $TMPDIR/dmg/$PKGNAME$SUFFIX.pkg
-if [ "$OSX_INST_CERT_NAME" != "" ]; then
-productsign --sign "$OSX_INST_CERT_NAME" --timestamp \
+if [ "$MACOS_INST_CERT_NAME" != "" ]; then
+productsign --sign "$MACOS_INST_CERT_NAME" --timestamp \
 $TMPDIR/dmg/$PKGNAME$SUFFIX.pkg $TMPDIR/dmg/$PKGNAME.pkg
 rm -r $TMPDIR/dmg/$PKGNAME$SUFFIX.pkg
 pkgutil --check-signature $TMPDIR/dmg/$PKGNAME.pkg
 fi
 hdiutil create -fs HFS+ -volname $PKGNAME-$VERSION \
 -srcfolder "$TMPDIR/dmg" $TMPDIR/$PKGNAME-$VERSION.dmg
-if [ "$OSX_APP_CERT_NAME" != "" ]; then
-codesign -s "$OSX_APP_CERT_NAME" --timestamp $TMPDIR/$PKGNAME-$VERSION.dmg
+if [ "$MACOS_APP_CERT_NAME" != "" ]; then
+codesign -s "$MACOS_APP_CERT_NAME" --timestamp $TMPDIR/$PKGNAME-$VERSION.dmg
 codesign -vv $TMPDIR/$PKGNAME-$VERSION.dmg
 fi
 cp $TMPDIR/$PKGNAME-$VERSION.dmg .

View File

@@ -32,7 +32,7 @@ rm -f $PKGNAME-$VERSION-$OS-$ARCH.tar.bz2
 TMPDIR=`mktemp -d /tmp/$PKGNAME-build.XXXXXX`
 mkdir -p $TMPDIR/install
-make install DESTDIR=$TMPDIR/install
+DESTDIR=$TMPDIR/install @CMAKE_MAKE_PROGRAM@ install
 echo tartest >$TMPDIR/tartest
 GNUTAR=0
 BSDTAR=0

View File

@@ -53,7 +53,7 @@ Provides: %{name} = %{version}-%{release}, @CMAKE_PROJECT_NAME@ = %{version}-%{r
 %description
 libjpeg-turbo is a JPEG image codec that uses SIMD instructions to accelerate
 baseline JPEG compression and decompression on x86, x86-64, Arm, PowerPC, and
-MIPS systems, as well as progressive JPEG compression on x86 and x86-64
+MIPS systems, as well as progressive JPEG compression on x86, x86-64, and Arm
 systems. On such systems, libjpeg-turbo is generally 2-6x as fast as libjpeg,
 all else being equal. On other types of systems, libjpeg-turbo can still
 outperform libjpeg by a significant amount, by virtue of its highly-optimized

@@ -102,7 +102,7 @@ broader range of users and developers.

 %install
 rm -rf $RPM_BUILD_ROOT
-make install DESTDIR=$RPM_BUILD_ROOT
+DESTDIR=$RPM_BUILD_ROOT @CMAKE_MAKE_PROGRAM@ install
 /sbin/ldconfig -n $RPM_BUILD_ROOT%{_libdir}
 #-->%if 0

@@ -184,6 +184,9 @@ rm -rf $RPM_BUILD_ROOT

 %endif
 %dir %{_libdir}/pkgconfig
 %{_libdir}/pkgconfig/libjpeg.pc
+%dir %{_libdir}/cmake
+%dir %{_libdir}/cmake/@CMAKE_PROJECT_NAME@
+%{_libdir}/cmake/@CMAKE_PROJECT_NAME@
 %if "%{_with_turbojpeg}" == "1"
 %if "%{_enable_shared}" == "1" || "%{_with_java}" == "1"
 %{_libdir}/libturbojpeg.so.@TURBOJPEG_SO_VERSION@

View File

@@ -1,4 +1,5 @@
-# Copyright (C)2009-2011, 2013, 2016 D. R. Commander. All Rights Reserved.
+# Copyright (C)2009-2011, 2013, 2016, 2020 D. R. Commander.
+# All Rights Reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions are met:

@@ -70,6 +71,12 @@ fi

 if [ -d $LIBDIR/pkgconfig ]; then
 rmdir $LIBDIR/pkgconfig 2>&1 || EXITSTATUS=-1
 fi
+if [ -d $LIBDIR/cmake/@CMAKE_PROJECT_NAME@ ]; then
+rmdir $LIBDIR/cmake/@CMAKE_PROJECT_NAME@ || EXITSTATUS=-1
+fi
+if [ -d $LIBDIR/cmake ]; then
+rmdir $LIBDIR/cmake || EXITSTATUS=-1
+fi
 if [ -d $LIBDIR ]; then
 rmdir $LIBDIR 2>&1 || EXITSTATUS=-1
 fi

@@ -90,7 +97,7 @@ fi

 if [ -d $MANDIR ]; then
 rmdir $MANDIR 2>&1 || EXITSTATUS=-1
 fi
-if [ -d $JAVADIR ]; then
+if [ -d "$JAVADIR" ]; then
 rmdir $JAVADIR 2>&1 || EXITSTATUS=-1
 fi
 if [ -d $DATAROOTDIR -a "$DATAROOTDIR" != "$PREFIX" ]; then

View File

@@ -112,10 +112,13 @@ set_property(TARGET jpegtran PROPERTY COMPILE_FLAGS "${USE_SETMODE}")
 add_executable(jcstest ../jcstest.c)
 target_link_libraries(jcstest jpeg)

-install(TARGETS jpeg cjpeg djpeg jpegtran
+install(TARGETS jpeg EXPORT ${CMAKE_PROJECT_NAME}Targets
+INCLUDES DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
 ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}
 LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}
 RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR})
+install(TARGETS cjpeg djpeg jpegtran
+RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR})

 if(NOT CMAKE_VERSION VERSION_LESS "3.1" AND MSVC AND
 CMAKE_C_LINKER_SUPPORTS_PDB)
 install(FILES "$<TARGET_PDB_FILE:jpeg>"

View File

@@ -30,6 +30,9 @@ if(CPU_TYPE STREQUAL "x86_64")
 if(CYGWIN)
 set(CMAKE_ASM_NASM_OBJECT_FORMAT win64)
 endif()
+if(CMAKE_C_COMPILER_ABI MATCHES "ELF X32")
+set(CMAKE_ASM_NASM_OBJECT_FORMAT elfx32)
+endif()
 elseif(CPU_TYPE STREQUAL "i386")
 if(BORLAND)
 set(CMAKE_ASM_NASM_OBJECT_FORMAT obj)
@@ -205,64 +208,76 @@ endif()
 ###############################################################################
-# Arm (GAS)
+# Arm (Intrinsics or GAS)
 ###############################################################################

 elseif(CPU_TYPE STREQUAL "arm64" OR CPU_TYPE STREQUAL "arm")

-enable_language(ASM)
-set(CMAKE_ASM_FLAGS "${CMAKE_C_FLAGS} ${CMAKE_ASM_FLAGS}")
-string(TOUPPER ${CMAKE_BUILD_TYPE} CMAKE_BUILD_TYPE_UC)
-set(EFFECTIVE_ASM_FLAGS "${CMAKE_ASM_FLAGS} ${CMAKE_ASM_FLAGS_${CMAKE_BUILD_TYPE_UC}}")
-message(STATUS "CMAKE_ASM_FLAGS = ${EFFECTIVE_ASM_FLAGS}")
-# Test whether we need gas-preprocessor.pl
-if(CPU_TYPE STREQUAL "arm")
-file(WRITE ${CMAKE_CURRENT_BINARY_DIR}/gastest.S "
-.text
-.fpu neon
-.arch armv7a
-.object_arch armv4
-.arm
-pld [r0]
-vmovn.u16 d0, q0")
-else()
-file(WRITE ${CMAKE_CURRENT_BINARY_DIR}/gastest.S "
-.text
-MYVAR .req x0
-movi v0.16b, #100
-mov MYVAR, #100
-.unreq MYVAR")
-endif()
-separate_arguments(CMAKE_ASM_FLAGS_SEP UNIX_COMMAND "${CMAKE_ASM_FLAGS}")
-execute_process(COMMAND ${CMAKE_ASM_COMPILER} ${CMAKE_ASM_FLAGS_SEP}
--x assembler-with-cpp -c ${CMAKE_CURRENT_BINARY_DIR}/gastest.S
-RESULT_VARIABLE RESULT OUTPUT_VARIABLE OUTPUT ERROR_VARIABLE ERROR)
-if(NOT RESULT EQUAL 0)
-message(STATUS "GAS appears to be broken. Trying gas-preprocessor.pl ...")
-execute_process(COMMAND gas-preprocessor.pl ${CMAKE_ASM_COMPILER}
-${CMAKE_ASM_FLAGS_SEP} -x assembler-with-cpp -c
-${CMAKE_CURRENT_BINARY_DIR}/gastest.S
-RESULT_VARIABLE RESULT OUTPUT_VARIABLE OUTPUT ERROR_VARIABLE ERROR)
-if(NOT RESULT EQUAL 0)
-simd_fail("SIMD extensions disabled: GAS is not working properly")
-return()
-else()
-message(STATUS "Using gas-preprocessor.pl")
-configure_file(gas-preprocessor.in gas-preprocessor @ONLY)
-set(CMAKE_ASM_COMPILER ${CMAKE_CURRENT_BINARY_DIR}/gas-preprocessor)
-endif()
-else()
-message(STATUS "GAS is working properly")
-endif()
-file(REMOVE ${CMAKE_CURRENT_BINARY_DIR}/gastest.S)
-add_library(simd OBJECT ${CPU_TYPE}/jsimd_neon.S ${CPU_TYPE}/jsimd.c)
+include(CheckSymbolExists)
+if(BITS EQUAL 32)
+set(CMAKE_REQUIRED_FLAGS -mfpu=neon)
+endif()
+check_symbol_exists(vld1_s16_x3 arm_neon.h HAVE_VLD1_S16_X3)
+check_symbol_exists(vld1_u16_x2 arm_neon.h HAVE_VLD1_U16_X2)
+check_symbol_exists(vld1q_u8_x4 arm_neon.h HAVE_VLD1Q_U8_X4)
+if(BITS EQUAL 32)
+unset(CMAKE_REQUIRED_FLAGS)
+endif()
+configure_file(arm/neon-compat.h.in arm/neon-compat.h @ONLY)
+include_directories(${CMAKE_CURRENT_BINARY_DIR}/arm)
+# GCC (as of this writing) and some older versions of Clang do not have a full
+# or optimal set of Neon intrinsics, so for performance reasons, when using
+# those compilers, we default to using the older GAS implementation of the Neon
+# SIMD extensions for certain algorithms. The presence or absence of the three
+# intrinsics we tested above is a reasonable proxy for this. We always default
+# to using the full Neon intrinsics implementation when building for macOS or
+# iOS, to avoid the need for gas-preprocessor.
+if((HAVE_VLD1_S16_X3 AND HAVE_VLD1_U16_X2 AND HAVE_VLD1Q_U8_X4) OR APPLE)
+set(DEFAULT_NEON_INTRINSICS 1)
+else()
+set(DEFAULT_NEON_INTRINSICS 0)
+endif()
+option(NEON_INTRINSICS
+"Because GCC (as of this writing) and some older versions of Clang do not have a full or optimal set of Neon intrinsics, for performance reasons, the default when building libjpeg-turbo with those compilers is to continue using the older GAS implementation of the Neon SIMD extensions for certain algorithms. Setting this option forces the full Neon intrinsics implementation to be used with all compilers. Unsetting this option forces the hybrid GAS/intrinsics implementation to be used with all compilers."
+${DEFAULT_NEON_INTRINSICS})
+boolean_number(NEON_INTRINSICS PARENT_SCOPE)
+if(NEON_INTRINSICS)
+add_definitions(-DNEON_INTRINSICS)
+message(STATUS "Use full Neon SIMD intrinsics implementation (NEON_INTRINSICS = ${NEON_INTRINSICS})")
+else()
+message(STATUS "Use partial Neon SIMD intrinsics implementation (NEON_INTRINSICS = ${NEON_INTRINSICS})")
+endif()
+set(SIMD_SOURCES arm/jcgray-neon.c arm/jcphuff-neon.c arm/jcsample-neon.c
+arm/jdmerge-neon.c arm/jdsample-neon.c arm/jfdctfst-neon.c
+arm/jidctred-neon.c arm/jquanti-neon.c)
+if(NEON_INTRINSICS)
+set(SIMD_SOURCES ${SIMD_SOURCES} arm/jccolor-neon.c arm/jidctint-neon.c)
+endif()
+if(NEON_INTRINSICS OR BITS EQUAL 64)
+set(SIMD_SOURCES ${SIMD_SOURCES} arm/jidctfst-neon.c)
+endif()
+if(NEON_INTRINSICS OR BITS EQUAL 32)
+set(SIMD_SOURCES ${SIMD_SOURCES} arm/aarch${BITS}/jchuff-neon.c
+arm/jdcolor-neon.c arm/jfdctint-neon.c)
+endif()
+if(BITS EQUAL 32)
+set_source_files_properties(${SIMD_SOURCES} COMPILE_FLAGS -mfpu=neon)
+endif()
+if(NOT NEON_INTRINSICS)
+enable_language(ASM)
+set(CMAKE_ASM_FLAGS "${CMAKE_C_FLAGS} ${CMAKE_ASM_FLAGS}")
+string(TOUPPER ${CMAKE_BUILD_TYPE} CMAKE_BUILD_TYPE_UC)
+set(EFFECTIVE_ASM_FLAGS "${CMAKE_ASM_FLAGS} ${CMAKE_ASM_FLAGS_${CMAKE_BUILD_TYPE_UC}}")
+message(STATUS "CMAKE_ASM_FLAGS = ${EFFECTIVE_ASM_FLAGS}")
+set(SIMD_SOURCES ${SIMD_SOURCES} arm/aarch${BITS}/jsimd_neon.S)
+endif()
+add_library(simd OBJECT ${SIMD_SOURCES} arm/aarch${BITS}/jsimd.c)

 if(CMAKE_POSITION_INDEPENDENT_CODE OR ENABLE_SHARED)
 set_target_properties(simd PROPERTIES POSITION_INDEPENDENT_CODE 1)
@@ -311,14 +326,35 @@ if(CMAKE_POSITION_INDEPENDENT_CODE OR ENABLE_SHARED)
 endif()

 ###############################################################################
-# Loongson (Intrinsics)
+# MIPS64 (Intrinsics)
 ###############################################################################

-elseif(CPU_TYPE STREQUAL "loongson")
+elseif(CPU_TYPE STREQUAL "loongson" OR CPU_TYPE MATCHES "mips64*")

-set(SIMD_SOURCES loongson/jccolor-mmi.c loongson/jcsample-mmi.c
-loongson/jdcolor-mmi.c loongson/jdsample-mmi.c loongson/jfdctint-mmi.c
-loongson/jidctint-mmi.c loongson/jquanti-mmi.c)
+set(CMAKE_REQUIRED_FLAGS -Wa,-mloongson-mmi,-mloongson-ext)
+check_c_source_compiles("
+int main(void) {
+int c = 0, a = 0, b = 0;
+asm (
+\"paddb %0, %1, %2\"
+: \"=f\" (c)
+: \"f\" (a), \"f\" (b)
+);
+return c;
+}" HAVE_MMI)
+unset(CMAKE_REQUIRED_FLAGS)
+if(NOT HAVE_MMI)
+simd_fail("SIMD extensions not available for this CPU")
+return()
+endif()
+set(SIMD_SOURCES mips64/jccolor-mmi.c mips64/jcgray-mmi.c mips64/jcsample-mmi.c
+mips64/jdcolor-mmi.c mips64/jdmerge-mmi.c mips64/jdsample-mmi.c
+mips64/jfdctfst-mmi.c mips64/jfdctint-mmi.c mips64/jidctfst-mmi.c
+mips64/jidctint-mmi.c mips64/jquanti-mmi.c)

 if(CMAKE_COMPILER_IS_GNUCC)
 foreach(file ${SIMD_SOURCES})
@@ -326,8 +362,12 @@ if(CMAKE_COMPILER_IS_GNUCC)
" -fno-strict-aliasing") " -fno-strict-aliasing")
endforeach() endforeach()
endif() endif()
foreach(file ${SIMD_SOURCES})
set_property(SOURCE ${file} APPEND_STRING PROPERTY COMPILE_FLAGS
" -Wa,-mloongson-mmi,-mloongson-ext")
endforeach()
add_library(simd OBJECT ${SIMD_SOURCES} loongson/jsimd.c) add_library(simd OBJECT ${SIMD_SOURCES} mips64/jsimd.c)
if(CMAKE_POSITION_INDEPENDENT_CODE OR ENABLE_SHARED) if(CMAKE_POSITION_INDEPENDENT_CODE OR ENABLE_SHARED)
set_target_properties(simd PROPERTIES POSITION_INDEPENDENT_CODE 1) set_target_properties(simd PROPERTIES POSITION_INDEPENDENT_CODE 1)
@@ -0,0 +1,148 @@
/*
* jccolext-neon.c - colorspace conversion (32-bit Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
* Copyright (C) 2020, D. R. Commander. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute it
* freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must not
* claim that you wrote the original software. If you use this software
* in a product, an acknowledgment in the product documentation would be
* appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must not be
* misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source distribution.
*/
/* This file is included by jccolor-neon.c */
/* RGB -> YCbCr conversion is defined by the following equations:
* Y = 0.29900 * R + 0.58700 * G + 0.11400 * B
* Cb = -0.16874 * R - 0.33126 * G + 0.50000 * B + 128
* Cr = 0.50000 * R - 0.41869 * G - 0.08131 * B + 128
*
* Avoid floating point arithmetic by using shifted integer constants:
* 0.29899597 = 19595 * 2^-16
* 0.58700561 = 38470 * 2^-16
* 0.11399841 = 7471 * 2^-16
* 0.16874695 = 11059 * 2^-16
* 0.33125305 = 21709 * 2^-16
* 0.50000000 = 32768 * 2^-16
* 0.41868592 = 27439 * 2^-16
* 0.08131409 = 5329 * 2^-16
* These constants are defined in jccolor-neon.c
*
* We add the fixed-point equivalent of 0.5 to Cb and Cr, which effectively
* rounds up or down the result via integer truncation.
*/
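/* As a plain-C illustration, the fixed-point arithmetic described above is
 * equivalent to the following scalar routine (a minimal sketch only; the
 * constant and function names are hypothetical, and the real constants are
 * the jsimd_rgb_ycc_neon_consts table defined in jccolor-neon.c):
 */
#include <stdint.h>
static void rgb_ycc_pixel_sketch(uint8_t r, uint8_t g, uint8_t b,
                                 uint8_t *y, uint8_t *cb, uint8_t *cr)
{
  const uint32_t F_0_299 = 19595, F_0_587 = 38470, F_0_114 = 7471;
  const uint32_t F_0_168 = 11059, F_0_331 = 21709, F_0_500 = 32768;
  const uint32_t F_0_418 = 27439, F_0_081 = 5329;
  /* 128 in 16.16 fixed point plus approximately 0.5 (32767/65536), so that
   * the plain right shift below rounds Cb and Cr to the nearest integer. */
  const uint32_t scaled_128_5 = (128 << 16) + 32767;
  /* Y uses a rounding right shift (the vector code uses vrshrn_n_u32). */
  *y  = (uint8_t)((F_0_299 * r + F_0_587 * g + F_0_114 * b + 32768) >> 16);
  /* Cb and Cr fold the rounding constant into scaled_128_5 and truncate. */
  *cb = (uint8_t)((scaled_128_5 - F_0_168 * r - F_0_331 * g + F_0_500 * b) >> 16);
  *cr = (uint8_t)((scaled_128_5 + F_0_500 * r - F_0_418 * g - F_0_081 * b) >> 16);
}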
void jsimd_rgb_ycc_convert_neon(JDIMENSION image_width, JSAMPARRAY input_buf,
JSAMPIMAGE output_buf, JDIMENSION output_row,
int num_rows)
{
/* Pointer to RGB(X/A) input data */
JSAMPROW inptr;
/* Pointers to Y, Cb, and Cr output data */
JSAMPROW outptr0, outptr1, outptr2;
/* Allocate temporary buffer for final (image_width % 8) pixels in row. */
ALIGN(16) uint8_t tmp_buf[8 * RGB_PIXELSIZE];
/* Set up conversion constants. */
#ifdef HAVE_VLD1_U16_X2
const uint16x4x2_t consts = vld1_u16_x2(jsimd_rgb_ycc_neon_consts);
#else
/* GCC does not currently support the intrinsic vld1_<type>_x2(). */
const uint16x4_t consts1 = vld1_u16(jsimd_rgb_ycc_neon_consts);
const uint16x4_t consts2 = vld1_u16(jsimd_rgb_ycc_neon_consts + 4);
const uint16x4x2_t consts = { { consts1, consts2 } };
#endif
const uint32x4_t scaled_128_5 = vdupq_n_u32((128 << 16) + 32767);
while (--num_rows >= 0) {
inptr = *input_buf++;
outptr0 = output_buf[0][output_row];
outptr1 = output_buf[1][output_row];
outptr2 = output_buf[2][output_row];
output_row++;
int cols_remaining = image_width;
for (; cols_remaining > 0; cols_remaining -= 8) {
/* To prevent buffer overread by the vector load instructions, the last
* (image_width % 8) columns of data are first memcopied to a temporary
* buffer large enough to accommodate the vector load.
*/
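/* Illustrative example: for a 13-pixel-wide extRGB row, the final loop
 * iteration has cols_remaining = 5, so 5 * 3 = 15 bytes are copied into
 * tmp_buf and the 8-pixel vld3_u8() below reads from the padded buffer
 * instead of running past the end of the input row. */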
if (cols_remaining < 8) {
memcpy(tmp_buf, inptr, cols_remaining * RGB_PIXELSIZE);
inptr = tmp_buf;
}
#if RGB_PIXELSIZE == 4
uint8x8x4_t input_pixels = vld4_u8(inptr);
#else
uint8x8x3_t input_pixels = vld3_u8(inptr);
#endif
uint16x8_t r = vmovl_u8(input_pixels.val[RGB_RED]);
uint16x8_t g = vmovl_u8(input_pixels.val[RGB_GREEN]);
uint16x8_t b = vmovl_u8(input_pixels.val[RGB_BLUE]);
/* Compute Y = 0.29900 * R + 0.58700 * G + 0.11400 * B */
uint32x4_t y_low = vmull_lane_u16(vget_low_u16(r), consts.val[0], 0);
y_low = vmlal_lane_u16(y_low, vget_low_u16(g), consts.val[0], 1);
y_low = vmlal_lane_u16(y_low, vget_low_u16(b), consts.val[0], 2);
uint32x4_t y_high = vmull_lane_u16(vget_high_u16(r), consts.val[0], 0);
y_high = vmlal_lane_u16(y_high, vget_high_u16(g), consts.val[0], 1);
y_high = vmlal_lane_u16(y_high, vget_high_u16(b), consts.val[0], 2);
/* Compute Cb = -0.16874 * R - 0.33126 * G + 0.50000 * B + 128 */
uint32x4_t cb_low = scaled_128_5;
cb_low = vmlsl_lane_u16(cb_low, vget_low_u16(r), consts.val[0], 3);
cb_low = vmlsl_lane_u16(cb_low, vget_low_u16(g), consts.val[1], 0);
cb_low = vmlal_lane_u16(cb_low, vget_low_u16(b), consts.val[1], 1);
uint32x4_t cb_high = scaled_128_5;
cb_high = vmlsl_lane_u16(cb_high, vget_high_u16(r), consts.val[0], 3);
cb_high = vmlsl_lane_u16(cb_high, vget_high_u16(g), consts.val[1], 0);
cb_high = vmlal_lane_u16(cb_high, vget_high_u16(b), consts.val[1], 1);
/* Compute Cr = 0.50000 * R - 0.41869 * G - 0.08131 * B + 128 */
uint32x4_t cr_low = scaled_128_5;
cr_low = vmlal_lane_u16(cr_low, vget_low_u16(r), consts.val[1], 1);
cr_low = vmlsl_lane_u16(cr_low, vget_low_u16(g), consts.val[1], 2);
cr_low = vmlsl_lane_u16(cr_low, vget_low_u16(b), consts.val[1], 3);
uint32x4_t cr_high = scaled_128_5;
cr_high = vmlal_lane_u16(cr_high, vget_high_u16(r), consts.val[1], 1);
cr_high = vmlsl_lane_u16(cr_high, vget_high_u16(g), consts.val[1], 2);
cr_high = vmlsl_lane_u16(cr_high, vget_high_u16(b), consts.val[1], 3);
/* Descale Y values (rounding right shift) and narrow to 16-bit. */
uint16x8_t y_u16 = vcombine_u16(vrshrn_n_u32(y_low, 16),
vrshrn_n_u32(y_high, 16));
/* Descale Cb values (right shift) and narrow to 16-bit. */
uint16x8_t cb_u16 = vcombine_u16(vshrn_n_u32(cb_low, 16),
vshrn_n_u32(cb_high, 16));
/* Descale Cr values (right shift) and narrow to 16-bit. */
uint16x8_t cr_u16 = vcombine_u16(vshrn_n_u32(cr_low, 16),
vshrn_n_u32(cr_high, 16));
/* Narrow Y, Cb, and Cr values to 8-bit and store to memory. Buffer
* overwrite is permitted up to the next multiple of ALIGN_SIZE bytes.
*/
vst1_u8(outptr0, vmovn_u16(y_u16));
vst1_u8(outptr1, vmovn_u16(cb_u16));
vst1_u8(outptr2, vmovn_u16(cr_u16));
/* Increment pointers. */
inptr += (8 * RGB_PIXELSIZE);
outptr0 += 8;
outptr1 += 8;
outptr2 += 8;
}
}
}
@@ -0,0 +1,334 @@
/*
* jchuff-neon.c - Huffman entropy encoding (32-bit Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute it
* freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must not
* claim that you wrote the original software. If you use this software
* in a product, an acknowledgment in the product documentation would be
* appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must not be
* misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source distribution.
*
* NOTE: All referenced figures are from
* Recommendation ITU-T T.81 (1992) | ISO/IEC 10918-1:1994.
*/
#define JPEG_INTERNALS
#include "../../../jinclude.h"
#include "../../../jpeglib.h"
#include "../../../jsimd.h"
#include "../../../jdct.h"
#include "../../../jsimddct.h"
#include "../../jsimd.h"
#include "../jchuff.h"
#include "neon-compat.h"
#include <limits.h>
#include <arm_neon.h>
JOCTET *jsimd_huff_encode_one_block_neon(void *state, JOCTET *buffer,
JCOEFPTR block, int last_dc_val,
c_derived_tbl *dctbl,
c_derived_tbl *actbl)
{
uint8_t block_nbits[DCTSIZE2];
uint16_t block_diff[DCTSIZE2];
/* Load rows of coefficients from DCT block in zig-zag order. */
/* Compute DC coefficient difference value. (F.1.1.5.1) */
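/* (Illustrative: if the previous block's quantized DC value was 10 and this
 * block's is 12, the value encoded here is the difference, 2.) */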
int16x8_t row0 = vdupq_n_s16(block[0] - last_dc_val);
row0 = vld1q_lane_s16(block + 1, row0, 1);
row0 = vld1q_lane_s16(block + 8, row0, 2);
row0 = vld1q_lane_s16(block + 16, row0, 3);
row0 = vld1q_lane_s16(block + 9, row0, 4);
row0 = vld1q_lane_s16(block + 2, row0, 5);
row0 = vld1q_lane_s16(block + 3, row0, 6);
row0 = vld1q_lane_s16(block + 10, row0, 7);
int16x8_t row1 = vld1q_dup_s16(block + 17);
row1 = vld1q_lane_s16(block + 24, row1, 1);
row1 = vld1q_lane_s16(block + 32, row1, 2);
row1 = vld1q_lane_s16(block + 25, row1, 3);
row1 = vld1q_lane_s16(block + 18, row1, 4);
row1 = vld1q_lane_s16(block + 11, row1, 5);
row1 = vld1q_lane_s16(block + 4, row1, 6);
row1 = vld1q_lane_s16(block + 5, row1, 7);
int16x8_t row2 = vld1q_dup_s16(block + 12);
row2 = vld1q_lane_s16(block + 19, row2, 1);
row2 = vld1q_lane_s16(block + 26, row2, 2);
row2 = vld1q_lane_s16(block + 33, row2, 3);
row2 = vld1q_lane_s16(block + 40, row2, 4);
row2 = vld1q_lane_s16(block + 48, row2, 5);
row2 = vld1q_lane_s16(block + 41, row2, 6);
row2 = vld1q_lane_s16(block + 34, row2, 7);
int16x8_t row3 = vld1q_dup_s16(block + 27);
row3 = vld1q_lane_s16(block + 20, row3, 1);
row3 = vld1q_lane_s16(block + 13, row3, 2);
row3 = vld1q_lane_s16(block + 6, row3, 3);
row3 = vld1q_lane_s16(block + 7, row3, 4);
row3 = vld1q_lane_s16(block + 14, row3, 5);
row3 = vld1q_lane_s16(block + 21, row3, 6);
row3 = vld1q_lane_s16(block + 28, row3, 7);
int16x8_t abs_row0 = vabsq_s16(row0);
int16x8_t abs_row1 = vabsq_s16(row1);
int16x8_t abs_row2 = vabsq_s16(row2);
int16x8_t abs_row3 = vabsq_s16(row3);
int16x8_t row0_lz = vclzq_s16(abs_row0);
int16x8_t row1_lz = vclzq_s16(abs_row1);
int16x8_t row2_lz = vclzq_s16(abs_row2);
int16x8_t row3_lz = vclzq_s16(abs_row3);
/* Compute number of bits required to represent each coefficient. */
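/* For example (illustrative): an absolute value of 5 = 0b101 occupies a
 * 16-bit lane with 13 leading zeros, so nbits = 16 - 13 = 3. */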
uint8x8_t row0_nbits = vsub_u8(vdup_n_u8(16),
vmovn_u16(vreinterpretq_u16_s16(row0_lz)));
uint8x8_t row1_nbits = vsub_u8(vdup_n_u8(16),
vmovn_u16(vreinterpretq_u16_s16(row1_lz)));
uint8x8_t row2_nbits = vsub_u8(vdup_n_u8(16),
vmovn_u16(vreinterpretq_u16_s16(row2_lz)));
uint8x8_t row3_nbits = vsub_u8(vdup_n_u8(16),
vmovn_u16(vreinterpretq_u16_s16(row3_lz)));
vst1_u8(block_nbits + 0 * DCTSIZE, row0_nbits);
vst1_u8(block_nbits + 1 * DCTSIZE, row1_nbits);
vst1_u8(block_nbits + 2 * DCTSIZE, row2_nbits);
vst1_u8(block_nbits + 3 * DCTSIZE, row3_nbits);
uint16x8_t row0_mask =
vshlq_u16(vreinterpretq_u16_s16(vshrq_n_s16(row0, 15)),
vnegq_s16(row0_lz));
uint16x8_t row1_mask =
vshlq_u16(vreinterpretq_u16_s16(vshrq_n_s16(row1, 15)),
vnegq_s16(row1_lz));
uint16x8_t row2_mask =
vshlq_u16(vreinterpretq_u16_s16(vshrq_n_s16(row2, 15)),
vnegq_s16(row2_lz));
uint16x8_t row3_mask =
vshlq_u16(vreinterpretq_u16_s16(vshrq_n_s16(row3, 15)),
vnegq_s16(row3_lz));
uint16x8_t row0_diff = veorq_u16(vreinterpretq_u16_s16(abs_row0), row0_mask);
uint16x8_t row1_diff = veorq_u16(vreinterpretq_u16_s16(abs_row1), row1_mask);
uint16x8_t row2_diff = veorq_u16(vreinterpretq_u16_s16(abs_row2), row2_mask);
uint16x8_t row3_diff = veorq_u16(vreinterpretq_u16_s16(abs_row3), row3_mask);
/* Store diff values for rows 0, 1, 2, and 3. */
vst1q_u16(block_diff + 0 * DCTSIZE, row0_diff);
vst1q_u16(block_diff + 1 * DCTSIZE, row1_diff);
vst1q_u16(block_diff + 2 * DCTSIZE, row2_diff);
vst1q_u16(block_diff + 3 * DCTSIZE, row3_diff);
/* Load last four rows of coefficients from DCT block in zig-zag order. */
int16x8_t row4 = vld1q_dup_s16(block + 35);
row4 = vld1q_lane_s16(block + 42, row4, 1);
row4 = vld1q_lane_s16(block + 49, row4, 2);
row4 = vld1q_lane_s16(block + 56, row4, 3);
row4 = vld1q_lane_s16(block + 57, row4, 4);
row4 = vld1q_lane_s16(block + 50, row4, 5);
row4 = vld1q_lane_s16(block + 43, row4, 6);
row4 = vld1q_lane_s16(block + 36, row4, 7);
int16x8_t row5 = vld1q_dup_s16(block + 29);
row5 = vld1q_lane_s16(block + 22, row5, 1);
row5 = vld1q_lane_s16(block + 15, row5, 2);
row5 = vld1q_lane_s16(block + 23, row5, 3);
row5 = vld1q_lane_s16(block + 30, row5, 4);
row5 = vld1q_lane_s16(block + 37, row5, 5);
row5 = vld1q_lane_s16(block + 44, row5, 6);
row5 = vld1q_lane_s16(block + 51, row5, 7);
int16x8_t row6 = vld1q_dup_s16(block + 58);
row6 = vld1q_lane_s16(block + 59, row6, 1);
row6 = vld1q_lane_s16(block + 52, row6, 2);
row6 = vld1q_lane_s16(block + 45, row6, 3);
row6 = vld1q_lane_s16(block + 38, row6, 4);
row6 = vld1q_lane_s16(block + 31, row6, 5);
row6 = vld1q_lane_s16(block + 39, row6, 6);
row6 = vld1q_lane_s16(block + 46, row6, 7);
int16x8_t row7 = vld1q_dup_s16(block + 53);
row7 = vld1q_lane_s16(block + 60, row7, 1);
row7 = vld1q_lane_s16(block + 61, row7, 2);
row7 = vld1q_lane_s16(block + 54, row7, 3);
row7 = vld1q_lane_s16(block + 47, row7, 4);
row7 = vld1q_lane_s16(block + 55, row7, 5);
row7 = vld1q_lane_s16(block + 62, row7, 6);
row7 = vld1q_lane_s16(block + 63, row7, 7);
int16x8_t abs_row4 = vabsq_s16(row4);
int16x8_t abs_row5 = vabsq_s16(row5);
int16x8_t abs_row6 = vabsq_s16(row6);
int16x8_t abs_row7 = vabsq_s16(row7);
int16x8_t row4_lz = vclzq_s16(abs_row4);
int16x8_t row5_lz = vclzq_s16(abs_row5);
int16x8_t row6_lz = vclzq_s16(abs_row6);
int16x8_t row7_lz = vclzq_s16(abs_row7);
/* Compute number of bits required to represent each coefficient. */
uint8x8_t row4_nbits = vsub_u8(vdup_n_u8(16),
vmovn_u16(vreinterpretq_u16_s16(row4_lz)));
uint8x8_t row5_nbits = vsub_u8(vdup_n_u8(16),
vmovn_u16(vreinterpretq_u16_s16(row5_lz)));
uint8x8_t row6_nbits = vsub_u8(vdup_n_u8(16),
vmovn_u16(vreinterpretq_u16_s16(row6_lz)));
uint8x8_t row7_nbits = vsub_u8(vdup_n_u8(16),
vmovn_u16(vreinterpretq_u16_s16(row7_lz)));
vst1_u8(block_nbits + 4 * DCTSIZE, row4_nbits);
vst1_u8(block_nbits + 5 * DCTSIZE, row5_nbits);
vst1_u8(block_nbits + 6 * DCTSIZE, row6_nbits);
vst1_u8(block_nbits + 7 * DCTSIZE, row7_nbits);
uint16x8_t row4_mask =
vshlq_u16(vreinterpretq_u16_s16(vshrq_n_s16(row4, 15)),
vnegq_s16(row4_lz));
uint16x8_t row5_mask =
vshlq_u16(vreinterpretq_u16_s16(vshrq_n_s16(row5, 15)),
vnegq_s16(row5_lz));
uint16x8_t row6_mask =
vshlq_u16(vreinterpretq_u16_s16(vshrq_n_s16(row6, 15)),
vnegq_s16(row6_lz));
uint16x8_t row7_mask =
vshlq_u16(vreinterpretq_u16_s16(vshrq_n_s16(row7, 15)),
vnegq_s16(row7_lz));
uint16x8_t row4_diff = veorq_u16(vreinterpretq_u16_s16(abs_row4), row4_mask);
uint16x8_t row5_diff = veorq_u16(vreinterpretq_u16_s16(abs_row5), row5_mask);
uint16x8_t row6_diff = veorq_u16(vreinterpretq_u16_s16(abs_row6), row6_mask);
uint16x8_t row7_diff = veorq_u16(vreinterpretq_u16_s16(abs_row7), row7_mask);
/* Store diff values for rows 4, 5, 6, and 7. */
vst1q_u16(block_diff + 4 * DCTSIZE, row4_diff);
vst1q_u16(block_diff + 5 * DCTSIZE, row5_diff);
vst1q_u16(block_diff + 6 * DCTSIZE, row6_diff);
vst1q_u16(block_diff + 7 * DCTSIZE, row7_diff);
/* Construct bitmap to accelerate encoding of AC coefficients. A set bit
* means that the corresponding coefficient != 0.
*/
uint8x8_t row0_nbits_gt0 = vcgt_u8(row0_nbits, vdup_n_u8(0));
uint8x8_t row1_nbits_gt0 = vcgt_u8(row1_nbits, vdup_n_u8(0));
uint8x8_t row2_nbits_gt0 = vcgt_u8(row2_nbits, vdup_n_u8(0));
uint8x8_t row3_nbits_gt0 = vcgt_u8(row3_nbits, vdup_n_u8(0));
uint8x8_t row4_nbits_gt0 = vcgt_u8(row4_nbits, vdup_n_u8(0));
uint8x8_t row5_nbits_gt0 = vcgt_u8(row5_nbits, vdup_n_u8(0));
uint8x8_t row6_nbits_gt0 = vcgt_u8(row6_nbits, vdup_n_u8(0));
uint8x8_t row7_nbits_gt0 = vcgt_u8(row7_nbits, vdup_n_u8(0));
/* { 0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01 } */
const uint8x8_t bitmap_mask =
vreinterpret_u8_u64(vmov_n_u64(0x0102040810204080));
row0_nbits_gt0 = vand_u8(row0_nbits_gt0, bitmap_mask);
row1_nbits_gt0 = vand_u8(row1_nbits_gt0, bitmap_mask);
row2_nbits_gt0 = vand_u8(row2_nbits_gt0, bitmap_mask);
row3_nbits_gt0 = vand_u8(row3_nbits_gt0, bitmap_mask);
row4_nbits_gt0 = vand_u8(row4_nbits_gt0, bitmap_mask);
row5_nbits_gt0 = vand_u8(row5_nbits_gt0, bitmap_mask);
row6_nbits_gt0 = vand_u8(row6_nbits_gt0, bitmap_mask);
row7_nbits_gt0 = vand_u8(row7_nbits_gt0, bitmap_mask);
uint8x8_t bitmap_rows_10 = vpadd_u8(row1_nbits_gt0, row0_nbits_gt0);
uint8x8_t bitmap_rows_32 = vpadd_u8(row3_nbits_gt0, row2_nbits_gt0);
uint8x8_t bitmap_rows_54 = vpadd_u8(row5_nbits_gt0, row4_nbits_gt0);
uint8x8_t bitmap_rows_76 = vpadd_u8(row7_nbits_gt0, row6_nbits_gt0);
uint8x8_t bitmap_rows_3210 = vpadd_u8(bitmap_rows_32, bitmap_rows_10);
uint8x8_t bitmap_rows_7654 = vpadd_u8(bitmap_rows_76, bitmap_rows_54);
uint8x8_t bitmap = vpadd_u8(bitmap_rows_7654, bitmap_rows_3210);
/* Shift left to remove DC bit. */
bitmap = vreinterpret_u8_u64(vshl_n_u64(vreinterpret_u64_u8(bitmap), 1));
/* Move bitmap to 32-bit scalar registers. */
uint32_t bitmap_1_32 = vget_lane_u32(vreinterpret_u32_u8(bitmap), 1);
uint32_t bitmap_33_63 = vget_lane_u32(vreinterpret_u32_u8(bitmap), 0);
/* Set up state and bit buffer for output bitstream. */
working_state *state_ptr = (working_state *)state;
int free_bits = state_ptr->cur.free_bits;
size_t put_buffer = state_ptr->cur.put_buffer;
/* Encode DC coefficient. */
unsigned int nbits = block_nbits[0];
/* Emit Huffman-coded symbol and additional diff bits. */
unsigned int diff = block_diff[0];
PUT_CODE(dctbl->ehufco[nbits], dctbl->ehufsi[nbits], diff)
/* Encode AC coefficients. */
unsigned int r = 0; /* r = run length of zeros */
unsigned int i = 1; /* i = number of coefficients encoded */
/* Code and size information for a run length of 16 zero coefficients */
const unsigned int code_0xf0 = actbl->ehufco[0xf0];
const unsigned int size_0xf0 = actbl->ehufsi[0xf0];
while (bitmap_1_32 != 0) {
r = BUILTIN_CLZ(bitmap_1_32);
i += r;
bitmap_1_32 <<= r;
nbits = block_nbits[i];
diff = block_diff[i];
while (r > 15) {
/* If run length > 15, emit special run-length-16 codes. */
PUT_BITS(code_0xf0, size_0xf0)
r -= 16;
}
/* Emit Huffman symbol for run length / number of bits. (F.1.2.2.1) */
unsigned int rs = (r << 4) + nbits;
PUT_CODE(actbl->ehufco[rs], actbl->ehufsi[rs], diff)
i++;
bitmap_1_32 <<= 1;
}
r = 33 - i;
i = 33;
while (bitmap_33_63 != 0) {
unsigned int leading_zeros = BUILTIN_CLZ(bitmap_33_63);
r += leading_zeros;
i += leading_zeros;
bitmap_33_63 <<= leading_zeros;
nbits = block_nbits[i];
diff = block_diff[i];
while (r > 15) {
/* If run length > 15, emit special run-length-16 codes. */
PUT_BITS(code_0xf0, size_0xf0)
r -= 16;
}
/* Emit Huffman symbol for run length / number of bits. (F.1.2.2.1) */
unsigned int rs = (r << 4) + nbits;
PUT_CODE(actbl->ehufco[rs], actbl->ehufsi[rs], diff)
r = 0;
i++;
bitmap_33_63 <<= 1;
}
/* If the last coefficient(s) were zero, emit an end-of-block (EOB) code.
* The value of RS for the EOB code is 0.
*/
if (i != 64) {
PUT_BITS(actbl->ehufco[0], actbl->ehufsi[0])
}
state_ptr->cur.put_buffer = put_buffer;
state_ptr->cur.free_bits = free_bits;
return buffer;
}
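In scalar terms, the bitmap-driven loops above implement the run-length coding of Annex F (F.1.2.2): count leading zeros to skip directly to the next non-zero coefficient, emit ZRL codes for runs longer than 15, then emit the run/size symbol followed by the amplitude bits. A condensed sketch, assuming a single 64-bit bitmap as in the AArch64 variant, a hypothetical emit_bits() helper in place of the PUT_BITS()/PUT_CODE() macros, and the GCC/Clang __builtin_clzll() builtin in place of BUILTIN_CLZ():

#include <stdint.h>

/* Hypothetical bit emitter standing in for PUT_BITS()/PUT_CODE(). */
extern void emit_bits(unsigned int code, unsigned int size);

/* Scalar sketch of the AC-coefficient loop.  bitmap has bit 63 set when
 * coefficient 1 is non-zero (the DC bit has already been shifted out);
 * nbits[]/diff[] and ehufco[]/ehufsi[] mirror block_nbits/block_diff and the
 * fields of c_derived_tbl. */
static void encode_ac_sketch(uint64_t bitmap, const uint8_t nbits[64],
                             const uint16_t diff[64],
                             const unsigned int ehufco[256],
                             const unsigned int ehufsi[256])
{
  unsigned int i = 1;                          /* index of next coefficient */
  while (bitmap != 0) {
    unsigned int r = __builtin_clzll(bitmap);  /* run of zero coefficients */
    i += r;
    bitmap <<= r;
    while (r > 15) {                           /* ZRL: run-length-16 code */
      emit_bits(ehufco[0xf0], ehufsi[0xf0]);
      r -= 16;
    }
    unsigned int rs = (r << 4) + nbits[i];     /* run/size symbol (F.1.2.2.1) */
    emit_bits(ehufco[rs], ehufsi[rs]);
    emit_bits(diff[i], nbits[i]);              /* appended amplitude bits */
    i++;
    bitmap <<= 1;
  }
  if (i != 64)                                 /* trailing zeros: emit EOB */
    emit_bits(ehufco[0], ehufsi[0]);
}

The point of the bitmap is that the encoder jumps straight from one non-zero coefficient to the next instead of testing all 63 AC positions individually.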
@@ -6,6 +6,7 @@
* Copyright (C) 2009-2011, 2013-2014, 2016, 2018, D. R. Commander. * Copyright (C) 2009-2011, 2013-2014, 2016, 2018, D. R. Commander.
* Copyright (C) 2015-2016, 2018, Matthieu Darbois. * Copyright (C) 2015-2016, 2018, Matthieu Darbois.
* Copyright (C) 2019, Google LLC. * Copyright (C) 2019, Google LLC.
* Copyright (C) 2020, Arm Limited.
* *
* Based on the x86 SIMD extension for IJG JPEG library, * Based on the x86 SIMD extension for IJG JPEG library,
* Copyright (C) 1999-2006, MIYASAKA Masaru. * Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -17,12 +18,12 @@
*/ */
#define JPEG_INTERNALS #define JPEG_INTERNALS
#include "../../jinclude.h" #include "../../../jinclude.h"
#include "../../jpeglib.h" #include "../../../jpeglib.h"
#include "../../../jsimd.h"
#include "../../../jdct.h"
#include "../../../jsimddct.h"
#include "../../jsimd.h" #include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../jsimd.h"
#include <stdio.h> #include <stdio.h>
#include <string.h> #include <string.h>
@@ -164,6 +165,19 @@ jsimd_can_rgb_ycc(void)
GLOBAL(int) GLOBAL(int)
jsimd_can_rgb_gray(void) jsimd_can_rgb_gray(void)
{ {
init_simd();
/* The code is optimised for these values only */
if (BITS_IN_JSAMPLE != 8)
return 0;
if (sizeof(JDIMENSION) != 4)
return 0;
if ((RGB_PIXELSIZE != 3) && (RGB_PIXELSIZE != 4))
return 0;
if (simd_support & JSIMD_NEON)
return 1;
return 0; return 0;
} }
@@ -246,6 +260,37 @@ jsimd_rgb_gray_convert(j_compress_ptr cinfo, JSAMPARRAY input_buf,
JSAMPIMAGE output_buf, JDIMENSION output_row, JSAMPIMAGE output_buf, JDIMENSION output_row,
int num_rows) int num_rows)
{ {
void (*neonfct) (JDIMENSION, JSAMPARRAY, JSAMPIMAGE, JDIMENSION, int);
switch (cinfo->in_color_space) {
case JCS_EXT_RGB:
neonfct = jsimd_extrgb_gray_convert_neon;
break;
case JCS_EXT_RGBX:
case JCS_EXT_RGBA:
neonfct = jsimd_extrgbx_gray_convert_neon;
break;
case JCS_EXT_BGR:
neonfct = jsimd_extbgr_gray_convert_neon;
break;
case JCS_EXT_BGRX:
case JCS_EXT_BGRA:
neonfct = jsimd_extbgrx_gray_convert_neon;
break;
case JCS_EXT_XBGR:
case JCS_EXT_ABGR:
neonfct = jsimd_extxbgr_gray_convert_neon;
break;
case JCS_EXT_XRGB:
case JCS_EXT_ARGB:
neonfct = jsimd_extxrgb_gray_convert_neon;
break;
default:
neonfct = jsimd_extrgb_gray_convert_neon;
break;
}
neonfct(cinfo->image_width, input_buf, output_buf, output_row, num_rows);
} }
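These jsimd_can_*() predicates gate the dispatch wrappers: the core library only installs a SIMD method when the matching predicate returns nonzero. A minimal sketch of that selection, assuming the internal headers are on the include path; select_rgb_gray_converter() is a hypothetical helper and rgb_gray_convert_c is a stand-in for the library's portable C routine:

#define JPEG_INTERNALS
#include "jinclude.h"
#include "jpeglib.h"
#include "jsimd.h"

/* Hypothetical stand-in for the library's portable C converter. */
extern void rgb_gray_convert_c(j_compress_ptr cinfo, JSAMPARRAY input_buf,
                               JSAMPIMAGE output_buf, JDIMENSION output_row,
                               int num_rows);

typedef void (*color_convert_fn) (j_compress_ptr cinfo, JSAMPARRAY input_buf,
                                  JSAMPIMAGE output_buf, JDIMENSION output_row,
                                  int num_rows);

/* Illustrative sketch: prefer the Neon wrapper when the compile-time and
 * runtime checks in jsimd_can_rgb_gray() pass; otherwise fall back to C. */
static color_convert_fn select_rgb_gray_converter(void)
{
  return jsimd_can_rgb_gray() ? jsimd_rgb_gray_convert : rgb_gray_convert_c;
}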
GLOBAL(void) GLOBAL(void)
@@ -298,12 +343,38 @@ jsimd_ycc_rgb565_convert(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
GLOBAL(int) GLOBAL(int)
jsimd_can_h2v2_downsample(void) jsimd_can_h2v2_downsample(void)
{ {
init_simd();
/* The code is optimised for these values only */
if (BITS_IN_JSAMPLE != 8)
return 0;
if (DCTSIZE != 8)
return 0;
if (sizeof(JDIMENSION) != 4)
return 0;
if (simd_support & JSIMD_NEON)
return 1;
return 0; return 0;
} }
GLOBAL(int) GLOBAL(int)
jsimd_can_h2v1_downsample(void) jsimd_can_h2v1_downsample(void)
{ {
init_simd();
/* The code is optimised for these values only */
if (BITS_IN_JSAMPLE != 8)
return 0;
if (DCTSIZE != 8)
return 0;
if (sizeof(JDIMENSION) != 4)
return 0;
if (simd_support & JSIMD_NEON)
return 1;
return 0; return 0;
} }
@@ -311,23 +382,50 @@ GLOBAL(void)
jsimd_h2v2_downsample(j_compress_ptr cinfo, jpeg_component_info *compptr, jsimd_h2v2_downsample(j_compress_ptr cinfo, jpeg_component_info *compptr,
JSAMPARRAY input_data, JSAMPARRAY output_data) JSAMPARRAY input_data, JSAMPARRAY output_data)
{ {
jsimd_h2v2_downsample_neon(cinfo->image_width, cinfo->max_v_samp_factor,
compptr->v_samp_factor, compptr->width_in_blocks,
input_data, output_data);
} }
GLOBAL(void) GLOBAL(void)
jsimd_h2v1_downsample(j_compress_ptr cinfo, jpeg_component_info *compptr, jsimd_h2v1_downsample(j_compress_ptr cinfo, jpeg_component_info *compptr,
JSAMPARRAY input_data, JSAMPARRAY output_data) JSAMPARRAY input_data, JSAMPARRAY output_data)
{ {
jsimd_h2v1_downsample_neon(cinfo->image_width, cinfo->max_v_samp_factor,
compptr->v_samp_factor, compptr->width_in_blocks,
input_data, output_data);
} }
GLOBAL(int) GLOBAL(int)
jsimd_can_h2v2_upsample(void) jsimd_can_h2v2_upsample(void)
{ {
init_simd();
/* The code is optimised for these values only */
if (BITS_IN_JSAMPLE != 8)
return 0;
if (sizeof(JDIMENSION) != 4)
return 0;
if (simd_support & JSIMD_NEON)
return 1;
return 0; return 0;
} }
GLOBAL(int) GLOBAL(int)
jsimd_can_h2v1_upsample(void) jsimd_can_h2v1_upsample(void)
{ {
init_simd();
/* The code is optimised for these values only */
if (BITS_IN_JSAMPLE != 8)
return 0;
if (sizeof(JDIMENSION) != 4)
return 0;
if (simd_support & JSIMD_NEON)
return 1;
return 0; return 0;
} }
@@ -335,17 +433,32 @@ GLOBAL(void)
jsimd_h2v2_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr, jsimd_h2v2_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr,
JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr) JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr)
{ {
jsimd_h2v2_upsample_neon(cinfo->max_v_samp_factor, cinfo->output_width,
input_data, output_data_ptr);
} }
GLOBAL(void) GLOBAL(void)
jsimd_h2v1_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr, jsimd_h2v1_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr,
JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr) JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr)
{ {
jsimd_h2v1_upsample_neon(cinfo->max_v_samp_factor, cinfo->output_width,
input_data, output_data_ptr);
} }
GLOBAL(int) GLOBAL(int)
jsimd_can_h2v2_fancy_upsample(void) jsimd_can_h2v2_fancy_upsample(void)
{ {
init_simd();
/* The code is optimised for these values only */
if (BITS_IN_JSAMPLE != 8)
return 0;
if (sizeof(JDIMENSION) != 4)
return 0;
if (simd_support & JSIMD_NEON)
return 1;
return 0; return 0;
} }
@@ -366,10 +479,30 @@ jsimd_can_h2v1_fancy_upsample(void)
return 0; return 0;
} }
GLOBAL(int)
jsimd_can_h1v2_fancy_upsample(void)
{
init_simd();
/* The code is optimised for these values only */
if (BITS_IN_JSAMPLE != 8)
return 0;
if (sizeof(JDIMENSION) != 4)
return 0;
if (simd_support & JSIMD_NEON)
return 1;
return 0;
}
GLOBAL(void) GLOBAL(void)
jsimd_h2v2_fancy_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr, jsimd_h2v2_fancy_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr,
JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr) JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr)
{ {
jsimd_h2v2_fancy_upsample_neon(cinfo->max_v_samp_factor,
compptr->downsampled_width, input_data,
output_data_ptr);
} }
GLOBAL(void) GLOBAL(void)
@@ -381,15 +514,46 @@ jsimd_h2v1_fancy_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr,
output_data_ptr); output_data_ptr);
} }
GLOBAL(void)
jsimd_h1v2_fancy_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr,
JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr)
{
jsimd_h1v2_fancy_upsample_neon(cinfo->max_v_samp_factor,
compptr->downsampled_width, input_data,
output_data_ptr);
}
GLOBAL(int) GLOBAL(int)
jsimd_can_h2v2_merged_upsample(void) jsimd_can_h2v2_merged_upsample(void)
{ {
init_simd();
/* The code is optimised for these values only */
if (BITS_IN_JSAMPLE != 8)
return 0;
if (sizeof(JDIMENSION) != 4)
return 0;
if (simd_support & JSIMD_NEON)
return 1;
return 0; return 0;
} }
GLOBAL(int) GLOBAL(int)
jsimd_can_h2v1_merged_upsample(void) jsimd_can_h2v1_merged_upsample(void)
{ {
init_simd();
/* The code is optimised for these values only */
if (BITS_IN_JSAMPLE != 8)
return 0;
if (sizeof(JDIMENSION) != 4)
return 0;
if (simd_support & JSIMD_NEON)
return 1;
return 0; return 0;
} }
@@ -397,12 +561,74 @@ GLOBAL(void)
jsimd_h2v2_merged_upsample(j_decompress_ptr cinfo, JSAMPIMAGE input_buf, jsimd_h2v2_merged_upsample(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf) JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf)
{ {
void (*neonfct) (JDIMENSION, JSAMPIMAGE, JDIMENSION, JSAMPARRAY);
switch (cinfo->out_color_space) {
case JCS_EXT_RGB:
neonfct = jsimd_h2v2_extrgb_merged_upsample_neon;
break;
case JCS_EXT_RGBX:
case JCS_EXT_RGBA:
neonfct = jsimd_h2v2_extrgbx_merged_upsample_neon;
break;
case JCS_EXT_BGR:
neonfct = jsimd_h2v2_extbgr_merged_upsample_neon;
break;
case JCS_EXT_BGRX:
case JCS_EXT_BGRA:
neonfct = jsimd_h2v2_extbgrx_merged_upsample_neon;
break;
case JCS_EXT_XBGR:
case JCS_EXT_ABGR:
neonfct = jsimd_h2v2_extxbgr_merged_upsample_neon;
break;
case JCS_EXT_XRGB:
case JCS_EXT_ARGB:
neonfct = jsimd_h2v2_extxrgb_merged_upsample_neon;
break;
default:
neonfct = jsimd_h2v2_extrgb_merged_upsample_neon;
break;
}
neonfct(cinfo->output_width, input_buf, in_row_group_ctr, output_buf);
} }
GLOBAL(void) GLOBAL(void)
jsimd_h2v1_merged_upsample(j_decompress_ptr cinfo, JSAMPIMAGE input_buf, jsimd_h2v1_merged_upsample(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf) JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf)
{ {
void (*neonfct) (JDIMENSION, JSAMPIMAGE, JDIMENSION, JSAMPARRAY);
switch (cinfo->out_color_space) {
case JCS_EXT_RGB:
neonfct = jsimd_h2v1_extrgb_merged_upsample_neon;
break;
case JCS_EXT_RGBX:
case JCS_EXT_RGBA:
neonfct = jsimd_h2v1_extrgbx_merged_upsample_neon;
break;
case JCS_EXT_BGR:
neonfct = jsimd_h2v1_extbgr_merged_upsample_neon;
break;
case JCS_EXT_BGRX:
case JCS_EXT_BGRA:
neonfct = jsimd_h2v1_extbgrx_merged_upsample_neon;
break;
case JCS_EXT_XBGR:
case JCS_EXT_ABGR:
neonfct = jsimd_h2v1_extxbgr_merged_upsample_neon;
break;
case JCS_EXT_XRGB:
case JCS_EXT_ARGB:
neonfct = jsimd_h2v1_extxrgb_merged_upsample_neon;
break;
default:
neonfct = jsimd_h2v1_extrgb_merged_upsample_neon;
break;
}
neonfct(cinfo->output_width, input_buf, in_row_group_ctr, output_buf);
} }
GLOBAL(int) GLOBAL(int)
@@ -448,6 +674,17 @@ jsimd_convsamp_float(JSAMPARRAY sample_data, JDIMENSION start_col,
GLOBAL(int) GLOBAL(int)
jsimd_can_fdct_islow(void) jsimd_can_fdct_islow(void)
{ {
init_simd();
/* The code is optimised for these values only */
if (DCTSIZE != 8)
return 0;
if (sizeof(DCTELEM) != 2)
return 0;
if (simd_support & JSIMD_NEON)
return 1;
return 0; return 0;
} }
@@ -477,6 +714,7 @@ jsimd_can_fdct_float(void)
GLOBAL(void) GLOBAL(void)
jsimd_fdct_islow(DCTELEM *data) jsimd_fdct_islow(DCTELEM *data)
{ {
jsimd_fdct_islow_neon(data);
} }
GLOBAL(void) GLOBAL(void)
@@ -696,6 +934,16 @@ jsimd_huff_encode_one_block(void *state, JOCTET *buffer, JCOEFPTR block,
GLOBAL(int) GLOBAL(int)
jsimd_can_encode_mcu_AC_first_prepare(void) jsimd_can_encode_mcu_AC_first_prepare(void)
{ {
init_simd();
if (DCTSIZE != 8)
return 0;
if (sizeof(JCOEF) != 2)
return 0;
if (simd_support & JSIMD_NEON)
return 1;
return 0; return 0;
} }
@@ -704,11 +952,23 @@ jsimd_encode_mcu_AC_first_prepare(const JCOEF *block,
const int *jpeg_natural_order_start, int Sl, const int *jpeg_natural_order_start, int Sl,
int Al, JCOEF *values, size_t *zerobits) int Al, JCOEF *values, size_t *zerobits)
{ {
jsimd_encode_mcu_AC_first_prepare_neon(block, jpeg_natural_order_start,
Sl, Al, values, zerobits);
} }
GLOBAL(int) GLOBAL(int)
jsimd_can_encode_mcu_AC_refine_prepare(void) jsimd_can_encode_mcu_AC_refine_prepare(void)
{ {
init_simd();
if (DCTSIZE != 8)
return 0;
if (sizeof(JCOEF) != 2)
return 0;
if (simd_support & JSIMD_NEON)
return 1;
return 0; return 0;
} }
@@ -717,5 +977,7 @@ jsimd_encode_mcu_AC_refine_prepare(const JCOEF *block,
const int *jpeg_natural_order_start, int Sl, const int *jpeg_natural_order_start, int Sl,
int Al, JCOEF *absvalues, size_t *bits) int Al, JCOEF *absvalues, size_t *bits)
{ {
return 0; return jsimd_encode_mcu_AC_refine_prepare_neon(block,
jpeg_natural_order_start, Sl,
Al, absvalues, bits);
} }
@@ -0,0 +1,316 @@
/*
* jccolext-neon.c - colorspace conversion (64-bit Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute it
* freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must not
* claim that you wrote the original software. If you use this software
* in a product, an acknowledgment in the product documentation would be
* appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must not be
* misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source distribution.
*/
/* This file is included by jccolor-neon.c */
/* RGB -> YCbCr conversion is defined by the following equations:
* Y = 0.29900 * R + 0.58700 * G + 0.11400 * B
* Cb = -0.16874 * R - 0.33126 * G + 0.50000 * B + 128
* Cr = 0.50000 * R - 0.41869 * G - 0.08131 * B + 128
*
* Avoid floating point arithmetic by using shifted integer constants:
* 0.29899597 = 19595 * 2^-16
* 0.58700561 = 38470 * 2^-16
* 0.11399841 = 7471 * 2^-16
* 0.16874695 = 11059 * 2^-16
* 0.33125305 = 21709 * 2^-16
* 0.50000000 = 32768 * 2^-16
* 0.41868592 = 27439 * 2^-16
* 0.08131409 = 5329 * 2^-16
* These constants are defined in jccolor-neon.c
*
* We add the fixed-point equivalent of 0.5 to Cb and Cr, which effectively
* rounds up or down the result via integer truncation.
*/
void jsimd_rgb_ycc_convert_neon(JDIMENSION image_width, JSAMPARRAY input_buf,
JSAMPIMAGE output_buf, JDIMENSION output_row,
int num_rows)
{
/* Pointer to RGB(X/A) input data */
JSAMPROW inptr;
/* Pointers to Y, Cb, and Cr output data */
JSAMPROW outptr0, outptr1, outptr2;
/* Allocate temporary buffer for final (image_width % 16) pixels in row. */
ALIGN(16) uint8_t tmp_buf[16 * RGB_PIXELSIZE];
/* Set up conversion constants. */
const uint16x8_t consts = vld1q_u16(jsimd_rgb_ycc_neon_consts);
const uint32x4_t scaled_128_5 = vdupq_n_u32((128 << 16) + 32767);
while (--num_rows >= 0) {
inptr = *input_buf++;
outptr0 = output_buf[0][output_row];
outptr1 = output_buf[1][output_row];
outptr2 = output_buf[2][output_row];
output_row++;
int cols_remaining = image_width;
for (; cols_remaining >= 16; cols_remaining -= 16) {
#if RGB_PIXELSIZE == 4
uint8x16x4_t input_pixels = vld4q_u8(inptr);
#else
uint8x16x3_t input_pixels = vld3q_u8(inptr);
#endif
uint16x8_t r_l = vmovl_u8(vget_low_u8(input_pixels.val[RGB_RED]));
uint16x8_t g_l = vmovl_u8(vget_low_u8(input_pixels.val[RGB_GREEN]));
uint16x8_t b_l = vmovl_u8(vget_low_u8(input_pixels.val[RGB_BLUE]));
uint16x8_t r_h = vmovl_u8(vget_high_u8(input_pixels.val[RGB_RED]));
uint16x8_t g_h = vmovl_u8(vget_high_u8(input_pixels.val[RGB_GREEN]));
uint16x8_t b_h = vmovl_u8(vget_high_u8(input_pixels.val[RGB_BLUE]));
/* Compute Y = 0.29900 * R + 0.58700 * G + 0.11400 * B */
uint32x4_t y_ll = vmull_laneq_u16(vget_low_u16(r_l), consts, 0);
y_ll = vmlal_laneq_u16(y_ll, vget_low_u16(g_l), consts, 1);
y_ll = vmlal_laneq_u16(y_ll, vget_low_u16(b_l), consts, 2);
uint32x4_t y_lh = vmull_laneq_u16(vget_high_u16(r_l), consts, 0);
y_lh = vmlal_laneq_u16(y_lh, vget_high_u16(g_l), consts, 1);
y_lh = vmlal_laneq_u16(y_lh, vget_high_u16(b_l), consts, 2);
uint32x4_t y_hl = vmull_laneq_u16(vget_low_u16(r_h), consts, 0);
y_hl = vmlal_laneq_u16(y_hl, vget_low_u16(g_h), consts, 1);
y_hl = vmlal_laneq_u16(y_hl, vget_low_u16(b_h), consts, 2);
uint32x4_t y_hh = vmull_laneq_u16(vget_high_u16(r_h), consts, 0);
y_hh = vmlal_laneq_u16(y_hh, vget_high_u16(g_h), consts, 1);
y_hh = vmlal_laneq_u16(y_hh, vget_high_u16(b_h), consts, 2);
/* Compute Cb = -0.16874 * R - 0.33126 * G + 0.50000 * B + 128 */
uint32x4_t cb_ll = scaled_128_5;
cb_ll = vmlsl_laneq_u16(cb_ll, vget_low_u16(r_l), consts, 3);
cb_ll = vmlsl_laneq_u16(cb_ll, vget_low_u16(g_l), consts, 4);
cb_ll = vmlal_laneq_u16(cb_ll, vget_low_u16(b_l), consts, 5);
uint32x4_t cb_lh = scaled_128_5;
cb_lh = vmlsl_laneq_u16(cb_lh, vget_high_u16(r_l), consts, 3);
cb_lh = vmlsl_laneq_u16(cb_lh, vget_high_u16(g_l), consts, 4);
cb_lh = vmlal_laneq_u16(cb_lh, vget_high_u16(b_l), consts, 5);
uint32x4_t cb_hl = scaled_128_5;
cb_hl = vmlsl_laneq_u16(cb_hl, vget_low_u16(r_h), consts, 3);
cb_hl = vmlsl_laneq_u16(cb_hl, vget_low_u16(g_h), consts, 4);
cb_hl = vmlal_laneq_u16(cb_hl, vget_low_u16(b_h), consts, 5);
uint32x4_t cb_hh = scaled_128_5;
cb_hh = vmlsl_laneq_u16(cb_hh, vget_high_u16(r_h), consts, 3);
cb_hh = vmlsl_laneq_u16(cb_hh, vget_high_u16(g_h), consts, 4);
cb_hh = vmlal_laneq_u16(cb_hh, vget_high_u16(b_h), consts, 5);
/* Compute Cr = 0.50000 * R - 0.41869 * G - 0.08131 * B + 128 */
uint32x4_t cr_ll = scaled_128_5;
cr_ll = vmlal_laneq_u16(cr_ll, vget_low_u16(r_l), consts, 5);
cr_ll = vmlsl_laneq_u16(cr_ll, vget_low_u16(g_l), consts, 6);
cr_ll = vmlsl_laneq_u16(cr_ll, vget_low_u16(b_l), consts, 7);
uint32x4_t cr_lh = scaled_128_5;
cr_lh = vmlal_laneq_u16(cr_lh, vget_high_u16(r_l), consts, 5);
cr_lh = vmlsl_laneq_u16(cr_lh, vget_high_u16(g_l), consts, 6);
cr_lh = vmlsl_laneq_u16(cr_lh, vget_high_u16(b_l), consts, 7);
uint32x4_t cr_hl = scaled_128_5;
cr_hl = vmlal_laneq_u16(cr_hl, vget_low_u16(r_h), consts, 5);
cr_hl = vmlsl_laneq_u16(cr_hl, vget_low_u16(g_h), consts, 6);
cr_hl = vmlsl_laneq_u16(cr_hl, vget_low_u16(b_h), consts, 7);
uint32x4_t cr_hh = scaled_128_5;
cr_hh = vmlal_laneq_u16(cr_hh, vget_high_u16(r_h), consts, 5);
cr_hh = vmlsl_laneq_u16(cr_hh, vget_high_u16(g_h), consts, 6);
cr_hh = vmlsl_laneq_u16(cr_hh, vget_high_u16(b_h), consts, 7);
/* Descale Y values (rounding right shift) and narrow to 16-bit. */
uint16x8_t y_l = vcombine_u16(vrshrn_n_u32(y_ll, 16),
vrshrn_n_u32(y_lh, 16));
uint16x8_t y_h = vcombine_u16(vrshrn_n_u32(y_hl, 16),
vrshrn_n_u32(y_hh, 16));
/* Descale Cb values (right shift) and narrow to 16-bit. */
uint16x8_t cb_l = vcombine_u16(vshrn_n_u32(cb_ll, 16),
vshrn_n_u32(cb_lh, 16));
uint16x8_t cb_h = vcombine_u16(vshrn_n_u32(cb_hl, 16),
vshrn_n_u32(cb_hh, 16));
/* Descale Cr values (right shift) and narrow to 16-bit. */
uint16x8_t cr_l = vcombine_u16(vshrn_n_u32(cr_ll, 16),
vshrn_n_u32(cr_lh, 16));
uint16x8_t cr_h = vcombine_u16(vshrn_n_u32(cr_hl, 16),
vshrn_n_u32(cr_hh, 16));
/* Narrow Y, Cb, and Cr values to 8-bit and store to memory. Buffer
* overwrite is permitted up to the next multiple of ALIGN_SIZE bytes.
*/
vst1q_u8(outptr0, vcombine_u8(vmovn_u16(y_l), vmovn_u16(y_h)));
vst1q_u8(outptr1, vcombine_u8(vmovn_u16(cb_l), vmovn_u16(cb_h)));
vst1q_u8(outptr2, vcombine_u8(vmovn_u16(cr_l), vmovn_u16(cr_h)));
/* Increment pointers. */
inptr += (16 * RGB_PIXELSIZE);
outptr0 += 16;
outptr1 += 16;
outptr2 += 16;
}
if (cols_remaining > 8) {
/* To prevent buffer overread by the vector load instructions, the last
* (image_width % 16) columns of data are first memcopied to a temporary
* buffer large enough to accommodate the vector load.
*/
memcpy(tmp_buf, inptr, cols_remaining * RGB_PIXELSIZE);
inptr = tmp_buf;
#if RGB_PIXELSIZE == 4
uint8x16x4_t input_pixels = vld4q_u8(inptr);
#else
uint8x16x3_t input_pixels = vld3q_u8(inptr);
#endif
uint16x8_t r_l = vmovl_u8(vget_low_u8(input_pixels.val[RGB_RED]));
uint16x8_t g_l = vmovl_u8(vget_low_u8(input_pixels.val[RGB_GREEN]));
uint16x8_t b_l = vmovl_u8(vget_low_u8(input_pixels.val[RGB_BLUE]));
uint16x8_t r_h = vmovl_u8(vget_high_u8(input_pixels.val[RGB_RED]));
uint16x8_t g_h = vmovl_u8(vget_high_u8(input_pixels.val[RGB_GREEN]));
uint16x8_t b_h = vmovl_u8(vget_high_u8(input_pixels.val[RGB_BLUE]));
/* Compute Y = 0.29900 * R + 0.58700 * G + 0.11400 * B */
uint32x4_t y_ll = vmull_laneq_u16(vget_low_u16(r_l), consts, 0);
y_ll = vmlal_laneq_u16(y_ll, vget_low_u16(g_l), consts, 1);
y_ll = vmlal_laneq_u16(y_ll, vget_low_u16(b_l), consts, 2);
uint32x4_t y_lh = vmull_laneq_u16(vget_high_u16(r_l), consts, 0);
y_lh = vmlal_laneq_u16(y_lh, vget_high_u16(g_l), consts, 1);
y_lh = vmlal_laneq_u16(y_lh, vget_high_u16(b_l), consts, 2);
uint32x4_t y_hl = vmull_laneq_u16(vget_low_u16(r_h), consts, 0);
y_hl = vmlal_laneq_u16(y_hl, vget_low_u16(g_h), consts, 1);
y_hl = vmlal_laneq_u16(y_hl, vget_low_u16(b_h), consts, 2);
uint32x4_t y_hh = vmull_laneq_u16(vget_high_u16(r_h), consts, 0);
y_hh = vmlal_laneq_u16(y_hh, vget_high_u16(g_h), consts, 1);
y_hh = vmlal_laneq_u16(y_hh, vget_high_u16(b_h), consts, 2);
/* Compute Cb = -0.16874 * R - 0.33126 * G + 0.50000 * B + 128 */
uint32x4_t cb_ll = scaled_128_5;
cb_ll = vmlsl_laneq_u16(cb_ll, vget_low_u16(r_l), consts, 3);
cb_ll = vmlsl_laneq_u16(cb_ll, vget_low_u16(g_l), consts, 4);
cb_ll = vmlal_laneq_u16(cb_ll, vget_low_u16(b_l), consts, 5);
uint32x4_t cb_lh = scaled_128_5;
cb_lh = vmlsl_laneq_u16(cb_lh, vget_high_u16(r_l), consts, 3);
cb_lh = vmlsl_laneq_u16(cb_lh, vget_high_u16(g_l), consts, 4);
cb_lh = vmlal_laneq_u16(cb_lh, vget_high_u16(b_l), consts, 5);
uint32x4_t cb_hl = scaled_128_5;
cb_hl = vmlsl_laneq_u16(cb_hl, vget_low_u16(r_h), consts, 3);
cb_hl = vmlsl_laneq_u16(cb_hl, vget_low_u16(g_h), consts, 4);
cb_hl = vmlal_laneq_u16(cb_hl, vget_low_u16(b_h), consts, 5);
uint32x4_t cb_hh = scaled_128_5;
cb_hh = vmlsl_laneq_u16(cb_hh, vget_high_u16(r_h), consts, 3);
cb_hh = vmlsl_laneq_u16(cb_hh, vget_high_u16(g_h), consts, 4);
cb_hh = vmlal_laneq_u16(cb_hh, vget_high_u16(b_h), consts, 5);
/* Compute Cr = 0.50000 * R - 0.41869 * G - 0.08131 * B + 128 */
uint32x4_t cr_ll = scaled_128_5;
cr_ll = vmlal_laneq_u16(cr_ll, vget_low_u16(r_l), consts, 5);
cr_ll = vmlsl_laneq_u16(cr_ll, vget_low_u16(g_l), consts, 6);
cr_ll = vmlsl_laneq_u16(cr_ll, vget_low_u16(b_l), consts, 7);
uint32x4_t cr_lh = scaled_128_5;
cr_lh = vmlal_laneq_u16(cr_lh, vget_high_u16(r_l), consts, 5);
cr_lh = vmlsl_laneq_u16(cr_lh, vget_high_u16(g_l), consts, 6);
cr_lh = vmlsl_laneq_u16(cr_lh, vget_high_u16(b_l), consts, 7);
uint32x4_t cr_hl = scaled_128_5;
cr_hl = vmlal_laneq_u16(cr_hl, vget_low_u16(r_h), consts, 5);
cr_hl = vmlsl_laneq_u16(cr_hl, vget_low_u16(g_h), consts, 6);
cr_hl = vmlsl_laneq_u16(cr_hl, vget_low_u16(b_h), consts, 7);
uint32x4_t cr_hh = scaled_128_5;
cr_hh = vmlal_laneq_u16(cr_hh, vget_high_u16(r_h), consts, 5);
cr_hh = vmlsl_laneq_u16(cr_hh, vget_high_u16(g_h), consts, 6);
cr_hh = vmlsl_laneq_u16(cr_hh, vget_high_u16(b_h), consts, 7);
/* Descale Y values (rounding right shift) and narrow to 16-bit. */
uint16x8_t y_l = vcombine_u16(vrshrn_n_u32(y_ll, 16),
vrshrn_n_u32(y_lh, 16));
uint16x8_t y_h = vcombine_u16(vrshrn_n_u32(y_hl, 16),
vrshrn_n_u32(y_hh, 16));
/* Descale Cb values (right shift) and narrow to 16-bit. */
uint16x8_t cb_l = vcombine_u16(vshrn_n_u32(cb_ll, 16),
vshrn_n_u32(cb_lh, 16));
uint16x8_t cb_h = vcombine_u16(vshrn_n_u32(cb_hl, 16),
vshrn_n_u32(cb_hh, 16));
/* Descale Cr values (right shift) and narrow to 16-bit. */
uint16x8_t cr_l = vcombine_u16(vshrn_n_u32(cr_ll, 16),
vshrn_n_u32(cr_lh, 16));
uint16x8_t cr_h = vcombine_u16(vshrn_n_u32(cr_hl, 16),
vshrn_n_u32(cr_hh, 16));
/* Narrow Y, Cb, and Cr values to 8-bit and store to memory. Buffer
* overwrite is permitted up to the next multiple of ALIGN_SIZE bytes.
*/
vst1q_u8(outptr0, vcombine_u8(vmovn_u16(y_l), vmovn_u16(y_h)));
vst1q_u8(outptr1, vcombine_u8(vmovn_u16(cb_l), vmovn_u16(cb_h)));
vst1q_u8(outptr2, vcombine_u8(vmovn_u16(cr_l), vmovn_u16(cr_h)));
} else if (cols_remaining > 0) {
/* To prevent buffer overread by the vector load instructions, the last
* (image_width % 8) columns of data are first memcopied to a temporary
* buffer large enough to accommodate the vector load.
*/
memcpy(tmp_buf, inptr, cols_remaining * RGB_PIXELSIZE);
inptr = tmp_buf;
#if RGB_PIXELSIZE == 4
uint8x8x4_t input_pixels = vld4_u8(inptr);
#else
uint8x8x3_t input_pixels = vld3_u8(inptr);
#endif
uint16x8_t r = vmovl_u8(input_pixels.val[RGB_RED]);
uint16x8_t g = vmovl_u8(input_pixels.val[RGB_GREEN]);
uint16x8_t b = vmovl_u8(input_pixels.val[RGB_BLUE]);
/* Compute Y = 0.29900 * R + 0.58700 * G + 0.11400 * B */
uint32x4_t y_l = vmull_laneq_u16(vget_low_u16(r), consts, 0);
y_l = vmlal_laneq_u16(y_l, vget_low_u16(g), consts, 1);
y_l = vmlal_laneq_u16(y_l, vget_low_u16(b), consts, 2);
uint32x4_t y_h = vmull_laneq_u16(vget_high_u16(r), consts, 0);
y_h = vmlal_laneq_u16(y_h, vget_high_u16(g), consts, 1);
y_h = vmlal_laneq_u16(y_h, vget_high_u16(b), consts, 2);
/* Compute Cb = -0.16874 * R - 0.33126 * G + 0.50000 * B + 128 */
uint32x4_t cb_l = scaled_128_5;
cb_l = vmlsl_laneq_u16(cb_l, vget_low_u16(r), consts, 3);
cb_l = vmlsl_laneq_u16(cb_l, vget_low_u16(g), consts, 4);
cb_l = vmlal_laneq_u16(cb_l, vget_low_u16(b), consts, 5);
uint32x4_t cb_h = scaled_128_5;
cb_h = vmlsl_laneq_u16(cb_h, vget_high_u16(r), consts, 3);
cb_h = vmlsl_laneq_u16(cb_h, vget_high_u16(g), consts, 4);
cb_h = vmlal_laneq_u16(cb_h, vget_high_u16(b), consts, 5);
/* Compute Cr = 0.50000 * R - 0.41869 * G - 0.08131 * B + 128 */
uint32x4_t cr_l = scaled_128_5;
cr_l = vmlal_laneq_u16(cr_l, vget_low_u16(r), consts, 5);
cr_l = vmlsl_laneq_u16(cr_l, vget_low_u16(g), consts, 6);
cr_l = vmlsl_laneq_u16(cr_l, vget_low_u16(b), consts, 7);
uint32x4_t cr_h = scaled_128_5;
cr_h = vmlal_laneq_u16(cr_h, vget_high_u16(r), consts, 5);
cr_h = vmlsl_laneq_u16(cr_h, vget_high_u16(g), consts, 6);
cr_h = vmlsl_laneq_u16(cr_h, vget_high_u16(b), consts, 7);
/* Descale Y values (rounding right shift) and narrow to 16-bit. */
uint16x8_t y_u16 = vcombine_u16(vrshrn_n_u32(y_l, 16),
vrshrn_n_u32(y_h, 16));
/* Descale Cb values (right shift) and narrow to 16-bit. */
uint16x8_t cb_u16 = vcombine_u16(vshrn_n_u32(cb_l, 16),
vshrn_n_u32(cb_h, 16));
/* Descale Cr values (right shift) and narrow to 16-bit. */
uint16x8_t cr_u16 = vcombine_u16(vshrn_n_u32(cr_l, 16),
vshrn_n_u32(cr_h, 16));
/* Narrow Y, Cb, and Cr values to 8-bit and store to memory. Buffer
* overwrite is permitted up to the next multiple of ALIGN_SIZE bytes.
*/
vst1_u8(outptr0, vmovn_u16(y_u16));
vst1_u8(outptr1, vmovn_u16(cb_u16));
vst1_u8(outptr2, vmovn_u16(cr_u16));
}
}
}
@@ -0,0 +1,403 @@
/*
* jchuff-neon.c - Huffman entropy encoding (64-bit Arm Neon)
*
* Copyright (C) 2020-2021, Arm Limited. All Rights Reserved.
* Copyright (C) 2020, D. R. Commander. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute it
* freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must not
* claim that you wrote the original software. If you use this software
* in a product, an acknowledgment in the product documentation would be
* appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must not be
* misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source distribution.
*
* NOTE: All referenced figures are from
* Recommendation ITU-T T.81 (1992) | ISO/IEC 10918-1:1994.
*/
#define JPEG_INTERNALS
#include "../../../jinclude.h"
#include "../../../jpeglib.h"
#include "../../../jsimd.h"
#include "../../../jdct.h"
#include "../../../jsimddct.h"
#include "../../jsimd.h"
#include "../align.h"
#include "../jchuff.h"
#include "neon-compat.h"
#include <limits.h>
#include <arm_neon.h>
ALIGN(16) static const uint8_t jsimd_huff_encode_one_block_consts[] = {
0, 1, 2, 3, 16, 17, 32, 33,
18, 19, 4, 5, 6, 7, 20, 21,
34, 35, 48, 49, 255, 255, 50, 51,
36, 37, 22, 23, 8, 9, 10, 11,
255, 255, 6, 7, 20, 21, 34, 35,
48, 49, 255, 255, 50, 51, 36, 37,
54, 55, 40, 41, 26, 27, 12, 13,
14, 15, 28, 29, 42, 43, 56, 57,
6, 7, 20, 21, 34, 35, 48, 49,
50, 51, 36, 37, 22, 23, 8, 9,
26, 27, 12, 13, 255, 255, 14, 15,
28, 29, 42, 43, 56, 57, 255, 255,
52, 53, 54, 55, 40, 41, 26, 27,
12, 13, 255, 255, 14, 15, 28, 29,
26, 27, 40, 41, 42, 43, 28, 29,
14, 15, 30, 31, 44, 45, 46, 47
};
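/* Each byte above is an index into a 64-byte group of table registers holding
 * one half of the DCT block; consecutive index pairs pick out the two bytes of
 * an int16_t coefficient in zig-zag order.  The value 255 is deliberately out
 * of range: vqtbl4q_s8()/vqtbl3q_s8() write zero to such lanes, and the few
 * coefficients that cannot be reached through a single lookup are patched in
 * afterward with vsetq_lane_s16() (see "Initialize AC coefficient lanes not
 * reachable by lookup tables" below). */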
JOCTET *jsimd_huff_encode_one_block_neon(void *state, JOCTET *buffer,
JCOEFPTR block, int last_dc_val,
c_derived_tbl *dctbl,
c_derived_tbl *actbl)
{
uint16_t block_diff[DCTSIZE2];
/* Load lookup table indices for rows of zig-zag ordering. */
#ifdef HAVE_VLD1Q_U8_X4
const uint8x16x4_t idx_rows_0123 =
vld1q_u8_x4(jsimd_huff_encode_one_block_consts + 0 * DCTSIZE);
const uint8x16x4_t idx_rows_4567 =
vld1q_u8_x4(jsimd_huff_encode_one_block_consts + 8 * DCTSIZE);
#else
/* GCC does not currently support the vld1q_<type>_x4() intrinsics. */
const uint8x16x4_t idx_rows_0123 = { {
vld1q_u8(jsimd_huff_encode_one_block_consts + 0 * DCTSIZE),
vld1q_u8(jsimd_huff_encode_one_block_consts + 2 * DCTSIZE),
vld1q_u8(jsimd_huff_encode_one_block_consts + 4 * DCTSIZE),
vld1q_u8(jsimd_huff_encode_one_block_consts + 6 * DCTSIZE)
} };
const uint8x16x4_t idx_rows_4567 = { {
vld1q_u8(jsimd_huff_encode_one_block_consts + 8 * DCTSIZE),
vld1q_u8(jsimd_huff_encode_one_block_consts + 10 * DCTSIZE),
vld1q_u8(jsimd_huff_encode_one_block_consts + 12 * DCTSIZE),
vld1q_u8(jsimd_huff_encode_one_block_consts + 14 * DCTSIZE)
} };
#endif
/* Load 8x8 block of DCT coefficients. */
#ifdef HAVE_VLD1Q_U8_X4
const int8x16x4_t tbl_rows_0123 =
vld1q_s8_x4((int8_t *)(block + 0 * DCTSIZE));
const int8x16x4_t tbl_rows_4567 =
vld1q_s8_x4((int8_t *)(block + 4 * DCTSIZE));
#else
const int8x16x4_t tbl_rows_0123 = { {
vld1q_s8((int8_t *)(block + 0 * DCTSIZE)),
vld1q_s8((int8_t *)(block + 1 * DCTSIZE)),
vld1q_s8((int8_t *)(block + 2 * DCTSIZE)),
vld1q_s8((int8_t *)(block + 3 * DCTSIZE))
} };
const int8x16x4_t tbl_rows_4567 = { {
vld1q_s8((int8_t *)(block + 4 * DCTSIZE)),
vld1q_s8((int8_t *)(block + 5 * DCTSIZE)),
vld1q_s8((int8_t *)(block + 6 * DCTSIZE)),
vld1q_s8((int8_t *)(block + 7 * DCTSIZE))
} };
#endif
/* Initialise extra lookup tables. */
const int8x16x4_t tbl_rows_2345 = { {
tbl_rows_0123.val[2], tbl_rows_0123.val[3],
tbl_rows_4567.val[0], tbl_rows_4567.val[1]
} };
const int8x16x3_t tbl_rows_567 =
{ { tbl_rows_4567.val[1], tbl_rows_4567.val[2], tbl_rows_4567.val[3] } };
/* Shuffle coefficients into zig-zag order. */
int16x8_t row0 =
vreinterpretq_s16_s8(vqtbl4q_s8(tbl_rows_0123, idx_rows_0123.val[0]));
int16x8_t row1 =
vreinterpretq_s16_s8(vqtbl4q_s8(tbl_rows_0123, idx_rows_0123.val[1]));
int16x8_t row2 =
vreinterpretq_s16_s8(vqtbl4q_s8(tbl_rows_2345, idx_rows_0123.val[2]));
int16x8_t row3 =
vreinterpretq_s16_s8(vqtbl4q_s8(tbl_rows_0123, idx_rows_0123.val[3]));
int16x8_t row4 =
vreinterpretq_s16_s8(vqtbl4q_s8(tbl_rows_4567, idx_rows_4567.val[0]));
int16x8_t row5 =
vreinterpretq_s16_s8(vqtbl4q_s8(tbl_rows_2345, idx_rows_4567.val[1]));
int16x8_t row6 =
vreinterpretq_s16_s8(vqtbl4q_s8(tbl_rows_4567, idx_rows_4567.val[2]));
int16x8_t row7 =
vreinterpretq_s16_s8(vqtbl3q_s8(tbl_rows_567, idx_rows_4567.val[3]));
/* Compute DC coefficient difference value (F.1.1.5.1). */
row0 = vsetq_lane_s16(block[0] - last_dc_val, row0, 0);
/* Initialize AC coefficient lanes not reachable by lookup tables. */
row1 =
vsetq_lane_s16(vgetq_lane_s16(vreinterpretq_s16_s8(tbl_rows_4567.val[0]),
0), row1, 2);
row2 =
vsetq_lane_s16(vgetq_lane_s16(vreinterpretq_s16_s8(tbl_rows_0123.val[1]),
4), row2, 0);
row2 =
vsetq_lane_s16(vgetq_lane_s16(vreinterpretq_s16_s8(tbl_rows_4567.val[2]),
0), row2, 5);
row5 =
vsetq_lane_s16(vgetq_lane_s16(vreinterpretq_s16_s8(tbl_rows_0123.val[1]),
7), row5, 2);
row5 =
vsetq_lane_s16(vgetq_lane_s16(vreinterpretq_s16_s8(tbl_rows_4567.val[2]),
3), row5, 7);
row6 =
vsetq_lane_s16(vgetq_lane_s16(vreinterpretq_s16_s8(tbl_rows_0123.val[3]),
7), row6, 5);
/* DCT block is now in zig-zag order; start Huffman encoding process. */
int16x8_t abs_row0 = vabsq_s16(row0);
int16x8_t abs_row1 = vabsq_s16(row1);
int16x8_t abs_row2 = vabsq_s16(row2);
int16x8_t abs_row3 = vabsq_s16(row3);
int16x8_t abs_row4 = vabsq_s16(row4);
int16x8_t abs_row5 = vabsq_s16(row5);
int16x8_t abs_row6 = vabsq_s16(row6);
int16x8_t abs_row7 = vabsq_s16(row7);
/* For negative coeffs, the encoded diff is coeff - 1, which in two's
 * complement equals ~abs(coeff); non-negative coeffs pass through unchanged. */
uint16x8_t row0_diff =
vreinterpretq_u16_s16(veorq_s16(abs_row0, vshrq_n_s16(row0, 15)));
uint16x8_t row1_diff =
vreinterpretq_u16_s16(veorq_s16(abs_row1, vshrq_n_s16(row1, 15)));
uint16x8_t row2_diff =
vreinterpretq_u16_s16(veorq_s16(abs_row2, vshrq_n_s16(row2, 15)));
uint16x8_t row3_diff =
vreinterpretq_u16_s16(veorq_s16(abs_row3, vshrq_n_s16(row3, 15)));
uint16x8_t row4_diff =
vreinterpretq_u16_s16(veorq_s16(abs_row4, vshrq_n_s16(row4, 15)));
uint16x8_t row5_diff =
vreinterpretq_u16_s16(veorq_s16(abs_row5, vshrq_n_s16(row5, 15)));
uint16x8_t row6_diff =
vreinterpretq_u16_s16(veorq_s16(abs_row6, vshrq_n_s16(row6, 15)));
uint16x8_t row7_diff =
vreinterpretq_u16_s16(veorq_s16(abs_row7, vshrq_n_s16(row7, 15)));
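/* Worked example (illustrative): for a coefficient of -3, abs_row holds 3 and
 * the sign mask is 0xFFFF, so diff = 3 ^ 0xFFFF = 0xFFFC.  The magnitude 3
 * needs nbits = 2, and the low 2 bits of 0xFFFC are 00 -- exactly the appended
 * bits that Annex F prescribes for -3 (the low SSSS bits of V - 1). */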
/* Construct bitmap to accelerate encoding of AC coefficients. A set bit
* means that the corresponding coefficient != 0.
*/
uint8x8_t abs_row0_gt0 = vmovn_u16(vcgtq_u16(vreinterpretq_u16_s16(abs_row0),
vdupq_n_u16(0)));
uint8x8_t abs_row1_gt0 = vmovn_u16(vcgtq_u16(vreinterpretq_u16_s16(abs_row1),
vdupq_n_u16(0)));
uint8x8_t abs_row2_gt0 = vmovn_u16(vcgtq_u16(vreinterpretq_u16_s16(abs_row2),
vdupq_n_u16(0)));
uint8x8_t abs_row3_gt0 = vmovn_u16(vcgtq_u16(vreinterpretq_u16_s16(abs_row3),
vdupq_n_u16(0)));
uint8x8_t abs_row4_gt0 = vmovn_u16(vcgtq_u16(vreinterpretq_u16_s16(abs_row4),
vdupq_n_u16(0)));
uint8x8_t abs_row5_gt0 = vmovn_u16(vcgtq_u16(vreinterpretq_u16_s16(abs_row5),
vdupq_n_u16(0)));
uint8x8_t abs_row6_gt0 = vmovn_u16(vcgtq_u16(vreinterpretq_u16_s16(abs_row6),
vdupq_n_u16(0)));
uint8x8_t abs_row7_gt0 = vmovn_u16(vcgtq_u16(vreinterpretq_u16_s16(abs_row7),
vdupq_n_u16(0)));
/* { 0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01 } */
const uint8x8_t bitmap_mask =
vreinterpret_u8_u64(vmov_n_u64(0x0102040810204080));
abs_row0_gt0 = vand_u8(abs_row0_gt0, bitmap_mask);
abs_row1_gt0 = vand_u8(abs_row1_gt0, bitmap_mask);
abs_row2_gt0 = vand_u8(abs_row2_gt0, bitmap_mask);
abs_row3_gt0 = vand_u8(abs_row3_gt0, bitmap_mask);
abs_row4_gt0 = vand_u8(abs_row4_gt0, bitmap_mask);
abs_row5_gt0 = vand_u8(abs_row5_gt0, bitmap_mask);
abs_row6_gt0 = vand_u8(abs_row6_gt0, bitmap_mask);
abs_row7_gt0 = vand_u8(abs_row7_gt0, bitmap_mask);
uint8x8_t bitmap_rows_10 = vpadd_u8(abs_row1_gt0, abs_row0_gt0);
uint8x8_t bitmap_rows_32 = vpadd_u8(abs_row3_gt0, abs_row2_gt0);
uint8x8_t bitmap_rows_54 = vpadd_u8(abs_row5_gt0, abs_row4_gt0);
uint8x8_t bitmap_rows_76 = vpadd_u8(abs_row7_gt0, abs_row6_gt0);
uint8x8_t bitmap_rows_3210 = vpadd_u8(bitmap_rows_32, bitmap_rows_10);
uint8x8_t bitmap_rows_7654 = vpadd_u8(bitmap_rows_76, bitmap_rows_54);
uint8x8_t bitmap_all = vpadd_u8(bitmap_rows_7654, bitmap_rows_3210);
/* Shift left to remove DC bit. */
bitmap_all =
vreinterpret_u8_u64(vshl_n_u64(vreinterpret_u64_u8(bitmap_all), 1));
/* Count bits set (number of non-zero coefficients) in bitmap. */
unsigned int non_zero_coefficients = vaddv_u8(vcnt_u8(bitmap_all));
/* Move bitmap to 64-bit scalar register. */
uint64_t bitmap = vget_lane_u64(vreinterpret_u64_u8(bitmap_all), 0);
/* Set up state and bit buffer for output bitstream. */
working_state *state_ptr = (working_state *)state;
int free_bits = state_ptr->cur.free_bits;
size_t put_buffer = state_ptr->cur.put_buffer;
/* Encode DC coefficient. */
/* Find nbits required to specify sign and amplitude of coefficient. */
#if defined(_MSC_VER) && !defined(__clang__)
unsigned int lz = BUILTIN_CLZ(vgetq_lane_s16(abs_row0, 0));
#else
unsigned int lz;
__asm__("clz %w0, %w1" : "=r"(lz) : "r"(vgetq_lane_s16(abs_row0, 0)));
#endif
unsigned int nbits = 32 - lz;
/* Emit Huffman-coded symbol and additional diff bits. */
unsigned int diff = (unsigned int)(vgetq_lane_u16(row0_diff, 0) << lz) >> lz;
PUT_CODE(dctbl->ehufco[nbits], dctbl->ehufsi[nbits], diff)
/* Encode AC coefficients. */
unsigned int r = 0; /* r = run length of zeros */
unsigned int i = 1; /* i = number of coefficients encoded */
/* Code and size information for a run length of 16 zero coefficients */
const unsigned int code_0xf0 = actbl->ehufco[0xf0];
const unsigned int size_0xf0 = actbl->ehufsi[0xf0];
/* The most efficient method of computing nbits and diff depends on the
* number of non-zero coefficients. If the bitmap is not too sparse (> 8
* non-zero AC coefficients), it is beneficial to use Neon; else we compute
* nbits and diff on demand using scalar code.
*/
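/* (A scalar equivalent of this density test, for illustration, would be
 * __builtin_popcountll(bitmap) > 8.) */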
if (non_zero_coefficients > 8) {
uint8_t block_nbits[DCTSIZE2];
int16x8_t row0_lz = vclzq_s16(abs_row0);
int16x8_t row1_lz = vclzq_s16(abs_row1);
int16x8_t row2_lz = vclzq_s16(abs_row2);
int16x8_t row3_lz = vclzq_s16(abs_row3);
int16x8_t row4_lz = vclzq_s16(abs_row4);
int16x8_t row5_lz = vclzq_s16(abs_row5);
int16x8_t row6_lz = vclzq_s16(abs_row6);
int16x8_t row7_lz = vclzq_s16(abs_row7);
/* Compute nbits needed to specify magnitude of each coefficient. */
uint8x8_t row0_nbits = vsub_u8(vdup_n_u8(16),
vmovn_u16(vreinterpretq_u16_s16(row0_lz)));
uint8x8_t row1_nbits = vsub_u8(vdup_n_u8(16),
vmovn_u16(vreinterpretq_u16_s16(row1_lz)));
uint8x8_t row2_nbits = vsub_u8(vdup_n_u8(16),
vmovn_u16(vreinterpretq_u16_s16(row2_lz)));
uint8x8_t row3_nbits = vsub_u8(vdup_n_u8(16),
vmovn_u16(vreinterpretq_u16_s16(row3_lz)));
uint8x8_t row4_nbits = vsub_u8(vdup_n_u8(16),
vmovn_u16(vreinterpretq_u16_s16(row4_lz)));
uint8x8_t row5_nbits = vsub_u8(vdup_n_u8(16),
vmovn_u16(vreinterpretq_u16_s16(row5_lz)));
uint8x8_t row6_nbits = vsub_u8(vdup_n_u8(16),
vmovn_u16(vreinterpretq_u16_s16(row6_lz)));
uint8x8_t row7_nbits = vsub_u8(vdup_n_u8(16),
vmovn_u16(vreinterpretq_u16_s16(row7_lz)));
/* Store nbits. */
vst1_u8(block_nbits + 0 * DCTSIZE, row0_nbits);
vst1_u8(block_nbits + 1 * DCTSIZE, row1_nbits);
vst1_u8(block_nbits + 2 * DCTSIZE, row2_nbits);
vst1_u8(block_nbits + 3 * DCTSIZE, row3_nbits);
vst1_u8(block_nbits + 4 * DCTSIZE, row4_nbits);
vst1_u8(block_nbits + 5 * DCTSIZE, row5_nbits);
vst1_u8(block_nbits + 6 * DCTSIZE, row6_nbits);
vst1_u8(block_nbits + 7 * DCTSIZE, row7_nbits);
/* Mask bits not required to specify sign and amplitude of diff. */
row0_diff = vshlq_u16(row0_diff, row0_lz);
row1_diff = vshlq_u16(row1_diff, row1_lz);
row2_diff = vshlq_u16(row2_diff, row2_lz);
row3_diff = vshlq_u16(row3_diff, row3_lz);
row4_diff = vshlq_u16(row4_diff, row4_lz);
row5_diff = vshlq_u16(row5_diff, row5_lz);
row6_diff = vshlq_u16(row6_diff, row6_lz);
row7_diff = vshlq_u16(row7_diff, row7_lz);
row0_diff = vshlq_u16(row0_diff, vnegq_s16(row0_lz));
row1_diff = vshlq_u16(row1_diff, vnegq_s16(row1_lz));
row2_diff = vshlq_u16(row2_diff, vnegq_s16(row2_lz));
row3_diff = vshlq_u16(row3_diff, vnegq_s16(row3_lz));
row4_diff = vshlq_u16(row4_diff, vnegq_s16(row4_lz));
row5_diff = vshlq_u16(row5_diff, vnegq_s16(row5_lz));
row6_diff = vshlq_u16(row6_diff, vnegq_s16(row6_lz));
row7_diff = vshlq_u16(row7_diff, vnegq_s16(row7_lz));
/* Store diff bits. */
vst1q_u16(block_diff + 0 * DCTSIZE, row0_diff);
vst1q_u16(block_diff + 1 * DCTSIZE, row1_diff);
vst1q_u16(block_diff + 2 * DCTSIZE, row2_diff);
vst1q_u16(block_diff + 3 * DCTSIZE, row3_diff);
vst1q_u16(block_diff + 4 * DCTSIZE, row4_diff);
vst1q_u16(block_diff + 5 * DCTSIZE, row5_diff);
vst1q_u16(block_diff + 6 * DCTSIZE, row6_diff);
vst1q_u16(block_diff + 7 * DCTSIZE, row7_diff);
while (bitmap != 0) {
r = BUILTIN_CLZLL(bitmap);
i += r;
bitmap <<= r;
nbits = block_nbits[i];
diff = block_diff[i];
while (r > 15) {
/* If run length > 15, emit special run-length-16 codes. */
PUT_BITS(code_0xf0, size_0xf0)
r -= 16;
}
/* Emit Huffman symbol for run length / number of bits. (F.1.2.2.1) */
unsigned int rs = (r << 4) + nbits;
PUT_CODE(actbl->ehufco[rs], actbl->ehufsi[rs], diff)
i++;
bitmap <<= 1;
}
} else if (bitmap != 0) {
uint16_t block_abs[DCTSIZE2];
/* Store absolute value of coefficients. */
vst1q_u16(block_abs + 0 * DCTSIZE, vreinterpretq_u16_s16(abs_row0));
vst1q_u16(block_abs + 1 * DCTSIZE, vreinterpretq_u16_s16(abs_row1));
vst1q_u16(block_abs + 2 * DCTSIZE, vreinterpretq_u16_s16(abs_row2));
vst1q_u16(block_abs + 3 * DCTSIZE, vreinterpretq_u16_s16(abs_row3));
vst1q_u16(block_abs + 4 * DCTSIZE, vreinterpretq_u16_s16(abs_row4));
vst1q_u16(block_abs + 5 * DCTSIZE, vreinterpretq_u16_s16(abs_row5));
vst1q_u16(block_abs + 6 * DCTSIZE, vreinterpretq_u16_s16(abs_row6));
vst1q_u16(block_abs + 7 * DCTSIZE, vreinterpretq_u16_s16(abs_row7));
/* Store diff bits. */
vst1q_u16(block_diff + 0 * DCTSIZE, row0_diff);
vst1q_u16(block_diff + 1 * DCTSIZE, row1_diff);
vst1q_u16(block_diff + 2 * DCTSIZE, row2_diff);
vst1q_u16(block_diff + 3 * DCTSIZE, row3_diff);
vst1q_u16(block_diff + 4 * DCTSIZE, row4_diff);
vst1q_u16(block_diff + 5 * DCTSIZE, row5_diff);
vst1q_u16(block_diff + 6 * DCTSIZE, row6_diff);
vst1q_u16(block_diff + 7 * DCTSIZE, row7_diff);
/* Same as above but must mask diff bits and compute nbits on demand. */
while (bitmap != 0) {
r = BUILTIN_CLZLL(bitmap);
i += r;
bitmap <<= r;
lz = BUILTIN_CLZ(block_abs[i]);
nbits = 32 - lz;
diff = (unsigned int)(block_diff[i] << lz) >> lz;
while (r > 15) {
/* If run length > 15, emit special run-length-16 codes. */
PUT_BITS(code_0xf0, size_0xf0)
r -= 16;
}
/* Emit Huffman symbol for run length / number of bits. (F.1.2.2.1) */
unsigned int rs = (r << 4) + nbits;
PUT_CODE(actbl->ehufco[rs], actbl->ehufsi[rs], diff)
i++;
bitmap <<= 1;
}
}
/* If the last coefficient(s) were zero, emit an end-of-block (EOB) code.
* The value of RS for the EOB code is 0.
*/
if (i != 64) {
PUT_BITS(actbl->ehufco[0], actbl->ehufsi[0])
}
state_ptr->cur.put_buffer = put_buffer;
state_ptr->cur.free_bits = free_bits;
return buffer;
}


@@ -3,8 +3,9 @@
  *
  * Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
  * Copyright (C) 2011, Nokia Corporation and/or its subsidiary(-ies).
- * Copyright (C) 2009-2011, 2013-2014, 2016, 2018, D. R. Commander.
+ * Copyright (C) 2009-2011, 2013-2014, 2016, 2018, 2020, D. R. Commander.
  * Copyright (C) 2015-2016, 2018, Matthieu Darbois.
+ * Copyright (C) 2020, Arm Limited.
  *
  * Based on the x86 SIMD extension for IJG JPEG library,
  * Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -16,12 +17,13 @@
  */
 #define JPEG_INTERNALS
-#include "../../jinclude.h"
-#include "../../jpeglib.h"
+#include "../../../jinclude.h"
+#include "../../../jpeglib.h"
+#include "../../../jsimd.h"
+#include "../../../jdct.h"
+#include "../../../jsimddct.h"
 #include "../../jsimd.h"
-#include "../../jdct.h"
-#include "../../jsimddct.h"
-#include "../jsimd.h"
+#include "jconfigint.h"
 #include <stdio.h>
 #include <string.h>
@@ -189,6 +191,19 @@ jsimd_can_rgb_ycc(void)
 GLOBAL(int)
 jsimd_can_rgb_gray(void)
 {
+  init_simd();
+  /* The code is optimised for these values only */
+  if (BITS_IN_JSAMPLE != 8)
+    return 0;
+  if (sizeof(JDIMENSION) != 4)
+    return 0;
+  if ((RGB_PIXELSIZE != 3) && (RGB_PIXELSIZE != 4))
+    return 0;
+  if (simd_support & JSIMD_NEON)
+    return 1;
   return 0;
 }
@@ -237,20 +252,28 @@ jsimd_rgb_ycc_convert(j_compress_ptr cinfo, JSAMPARRAY input_buf,
   switch (cinfo->in_color_space) {
   case JCS_EXT_RGB:
+#ifndef NEON_INTRINSICS
     if (simd_features & JSIMD_FASTLD3)
+#endif
       neonfct = jsimd_extrgb_ycc_convert_neon;
+#ifndef NEON_INTRINSICS
     else
       neonfct = jsimd_extrgb_ycc_convert_neon_slowld3;
+#endif
     break;
   case JCS_EXT_RGBX:
   case JCS_EXT_RGBA:
     neonfct = jsimd_extrgbx_ycc_convert_neon;
     break;
   case JCS_EXT_BGR:
+#ifndef NEON_INTRINSICS
     if (simd_features & JSIMD_FASTLD3)
+#endif
       neonfct = jsimd_extbgr_ycc_convert_neon;
+#ifndef NEON_INTRINSICS
     else
       neonfct = jsimd_extbgr_ycc_convert_neon_slowld3;
+#endif
     break;
   case JCS_EXT_BGRX:
   case JCS_EXT_BGRA:
@@ -265,10 +288,14 @@ jsimd_rgb_ycc_convert(j_compress_ptr cinfo, JSAMPARRAY input_buf,
     neonfct = jsimd_extxrgb_ycc_convert_neon;
     break;
   default:
+#ifndef NEON_INTRINSICS
     if (simd_features & JSIMD_FASTLD3)
+#endif
       neonfct = jsimd_extrgb_ycc_convert_neon;
+#ifndef NEON_INTRINSICS
     else
       neonfct = jsimd_extrgb_ycc_convert_neon_slowld3;
+#endif
     break;
   }
@@ -280,6 +307,37 @@ jsimd_rgb_gray_convert(j_compress_ptr cinfo, JSAMPARRAY input_buf,
                        JSAMPIMAGE output_buf, JDIMENSION output_row,
                        int num_rows)
 {
+  void (*neonfct) (JDIMENSION, JSAMPARRAY, JSAMPIMAGE, JDIMENSION, int);
+  switch (cinfo->in_color_space) {
+  case JCS_EXT_RGB:
+    neonfct = jsimd_extrgb_gray_convert_neon;
+    break;
+  case JCS_EXT_RGBX:
+  case JCS_EXT_RGBA:
+    neonfct = jsimd_extrgbx_gray_convert_neon;
+    break;
+  case JCS_EXT_BGR:
+    neonfct = jsimd_extbgr_gray_convert_neon;
+    break;
+  case JCS_EXT_BGRX:
+  case JCS_EXT_BGRA:
+    neonfct = jsimd_extbgrx_gray_convert_neon;
+    break;
+  case JCS_EXT_XBGR:
+  case JCS_EXT_ABGR:
+    neonfct = jsimd_extxbgr_gray_convert_neon;
+    break;
+  case JCS_EXT_XRGB:
+  case JCS_EXT_ARGB:
+    neonfct = jsimd_extxrgb_gray_convert_neon;
+    break;
+  default:
+    neonfct = jsimd_extrgb_gray_convert_neon;
+    break;
+  }
+  neonfct(cinfo->image_width, input_buf, output_buf, output_row, num_rows);
 }
 GLOBAL(void)
@@ -291,20 +349,28 @@ jsimd_ycc_rgb_convert(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
   switch (cinfo->out_color_space) {
   case JCS_EXT_RGB:
+#ifndef NEON_INTRINSICS
     if (simd_features & JSIMD_FASTST3)
+#endif
       neonfct = jsimd_ycc_extrgb_convert_neon;
+#ifndef NEON_INTRINSICS
     else
       neonfct = jsimd_ycc_extrgb_convert_neon_slowst3;
+#endif
     break;
   case JCS_EXT_RGBX:
   case JCS_EXT_RGBA:
     neonfct = jsimd_ycc_extrgbx_convert_neon;
     break;
   case JCS_EXT_BGR:
+#ifndef NEON_INTRINSICS
     if (simd_features & JSIMD_FASTST3)
+#endif
       neonfct = jsimd_ycc_extbgr_convert_neon;
+#ifndef NEON_INTRINSICS
     else
       neonfct = jsimd_ycc_extbgr_convert_neon_slowst3;
+#endif
     break;
   case JCS_EXT_BGRX:
   case JCS_EXT_BGRA:
@@ -319,10 +385,14 @@ jsimd_ycc_rgb_convert(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
     neonfct = jsimd_ycc_extxrgb_convert_neon;
     break;
   default:
+#ifndef NEON_INTRINSICS
     if (simd_features & JSIMD_FASTST3)
+#endif
       neonfct = jsimd_ycc_extrgb_convert_neon;
+#ifndef NEON_INTRINSICS
     else
       neonfct = jsimd_ycc_extrgb_convert_neon_slowst3;
+#endif
     break;
   }
@@ -397,12 +467,33 @@ jsimd_h2v1_downsample(j_compress_ptr cinfo, jpeg_component_info *compptr,
 GLOBAL(int)
 jsimd_can_h2v2_upsample(void)
 {
+  init_simd();
+  /* The code is optimised for these values only */
+  if (BITS_IN_JSAMPLE != 8)
+    return 0;
+  if (sizeof(JDIMENSION) != 4)
+    return 0;
+  if (simd_support & JSIMD_NEON)
+    return 1;
   return 0;
 }
 GLOBAL(int)
 jsimd_can_h2v1_upsample(void)
 {
+  init_simd();
+  /* The code is optimised for these values only */
+  if (BITS_IN_JSAMPLE != 8)
+    return 0;
+  if (sizeof(JDIMENSION) != 4)
+    return 0;
+  if (simd_support & JSIMD_NEON)
+    return 1;
   return 0;
 }
@@ -410,23 +501,66 @@ GLOBAL(void)
 jsimd_h2v2_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr,
                     JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr)
 {
+  jsimd_h2v2_upsample_neon(cinfo->max_v_samp_factor, cinfo->output_width,
+                           input_data, output_data_ptr);
 }
 GLOBAL(void)
 jsimd_h2v1_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr,
                     JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr)
 {
+  jsimd_h2v1_upsample_neon(cinfo->max_v_samp_factor, cinfo->output_width,
+                           input_data, output_data_ptr);
 }
 GLOBAL(int)
 jsimd_can_h2v2_fancy_upsample(void)
 {
+  init_simd();
+  /* The code is optimised for these values only */
+  if (BITS_IN_JSAMPLE != 8)
+    return 0;
+  if (sizeof(JDIMENSION) != 4)
+    return 0;
+  if (simd_support & JSIMD_NEON)
+    return 1;
   return 0;
 }
 GLOBAL(int)
 jsimd_can_h2v1_fancy_upsample(void)
 {
+  init_simd();
+  /* The code is optimised for these values only */
+  if (BITS_IN_JSAMPLE != 8)
+    return 0;
+  if (sizeof(JDIMENSION) != 4)
+    return 0;
+  if (simd_support & JSIMD_NEON)
+    return 1;
+  return 0;
+}
+GLOBAL(int)
+jsimd_can_h1v2_fancy_upsample(void)
+{
+  init_simd();
+  /* The code is optimised for these values only */
+  if (BITS_IN_JSAMPLE != 8)
+    return 0;
+  if (sizeof(JDIMENSION) != 4)
+    return 0;
+  if (simd_support & JSIMD_NEON)
+    return 1;
   return 0;
 }
@@ -434,23 +568,60 @@ GLOBAL(void)
 jsimd_h2v2_fancy_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr,
                           JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr)
 {
+  jsimd_h2v2_fancy_upsample_neon(cinfo->max_v_samp_factor,
+                                 compptr->downsampled_width, input_data,
+                                 output_data_ptr);
 }
 GLOBAL(void)
 jsimd_h2v1_fancy_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr,
                           JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr)
 {
+  jsimd_h2v1_fancy_upsample_neon(cinfo->max_v_samp_factor,
+                                 compptr->downsampled_width, input_data,
+                                 output_data_ptr);
+}
+GLOBAL(void)
+jsimd_h1v2_fancy_upsample(j_decompress_ptr cinfo, jpeg_component_info *compptr,
+                          JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr)
+{
+  jsimd_h1v2_fancy_upsample_neon(cinfo->max_v_samp_factor,
+                                 compptr->downsampled_width, input_data,
+                                 output_data_ptr);
 }
 GLOBAL(int)
 jsimd_can_h2v2_merged_upsample(void)
 {
+  init_simd();
+  /* The code is optimised for these values only */
+  if (BITS_IN_JSAMPLE != 8)
+    return 0;
+  if (sizeof(JDIMENSION) != 4)
+    return 0;
+  if (simd_support & JSIMD_NEON)
+    return 1;
   return 0;
 }
 GLOBAL(int)
 jsimd_can_h2v1_merged_upsample(void)
 {
+  init_simd();
+  /* The code is optimised for these values only */
+  if (BITS_IN_JSAMPLE != 8)
+    return 0;
+  if (sizeof(JDIMENSION) != 4)
+    return 0;
+  if (simd_support & JSIMD_NEON)
+    return 1;
   return 0;
 }
@@ -458,12 +629,74 @@ GLOBAL(void)
 jsimd_h2v2_merged_upsample(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
                            JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf)
 {
+  void (*neonfct) (JDIMENSION, JSAMPIMAGE, JDIMENSION, JSAMPARRAY);
+  switch (cinfo->out_color_space) {
+  case JCS_EXT_RGB:
+    neonfct = jsimd_h2v2_extrgb_merged_upsample_neon;
+    break;
+  case JCS_EXT_RGBX:
+  case JCS_EXT_RGBA:
+    neonfct = jsimd_h2v2_extrgbx_merged_upsample_neon;
+    break;
+  case JCS_EXT_BGR:
+    neonfct = jsimd_h2v2_extbgr_merged_upsample_neon;
+    break;
+  case JCS_EXT_BGRX:
+  case JCS_EXT_BGRA:
+    neonfct = jsimd_h2v2_extbgrx_merged_upsample_neon;
+    break;
+  case JCS_EXT_XBGR:
+  case JCS_EXT_ABGR:
+    neonfct = jsimd_h2v2_extxbgr_merged_upsample_neon;
+    break;
+  case JCS_EXT_XRGB:
+  case JCS_EXT_ARGB:
+    neonfct = jsimd_h2v2_extxrgb_merged_upsample_neon;
+    break;
+  default:
+    neonfct = jsimd_h2v2_extrgb_merged_upsample_neon;
+    break;
+  }
+  neonfct(cinfo->output_width, input_buf, in_row_group_ctr, output_buf);
 }
 GLOBAL(void)
 jsimd_h2v1_merged_upsample(j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
                            JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf)
 {
+  void (*neonfct) (JDIMENSION, JSAMPIMAGE, JDIMENSION, JSAMPARRAY);
+  switch (cinfo->out_color_space) {
+  case JCS_EXT_RGB:
+    neonfct = jsimd_h2v1_extrgb_merged_upsample_neon;
+    break;
+  case JCS_EXT_RGBX:
+  case JCS_EXT_RGBA:
+    neonfct = jsimd_h2v1_extrgbx_merged_upsample_neon;
+    break;
+  case JCS_EXT_BGR:
+    neonfct = jsimd_h2v1_extbgr_merged_upsample_neon;
+    break;
+  case JCS_EXT_BGRX:
+  case JCS_EXT_BGRA:
+    neonfct = jsimd_h2v1_extbgrx_merged_upsample_neon;
+    break;
+  case JCS_EXT_XBGR:
+  case JCS_EXT_ABGR:
+    neonfct = jsimd_h2v1_extxbgr_merged_upsample_neon;
+    break;
+  case JCS_EXT_XRGB:
+  case JCS_EXT_ARGB:
+    neonfct = jsimd_h2v1_extxrgb_merged_upsample_neon;
+    break;
+  default:
+    neonfct = jsimd_h2v1_extrgb_merged_upsample_neon;
+    break;
+  }
+  neonfct(cinfo->output_width, input_buf, in_row_group_ctr, output_buf);
 }
 GLOBAL(int)
@@ -762,17 +995,33 @@ jsimd_huff_encode_one_block(void *state, JOCTET *buffer, JCOEFPTR block,
                             int last_dc_val, c_derived_tbl *dctbl,
                             c_derived_tbl *actbl)
 {
+#ifndef NEON_INTRINSICS
   if (simd_features & JSIMD_FASTTBL)
+#endif
     return jsimd_huff_encode_one_block_neon(state, buffer, block, last_dc_val,
                                             dctbl, actbl);
+#ifndef NEON_INTRINSICS
   else
     return jsimd_huff_encode_one_block_neon_slowtbl(state, buffer, block,
                                                     last_dc_val, dctbl, actbl);
+#endif
 }
 GLOBAL(int)
 jsimd_can_encode_mcu_AC_first_prepare(void)
 {
+  init_simd();
+  if (DCTSIZE != 8)
+    return 0;
+  if (sizeof(JCOEF) != 2)
+    return 0;
+  if (SIZEOF_SIZE_T != 8)
+    return 0;
+  if (simd_support & JSIMD_NEON)
+    return 1;
   return 0;
 }
@@ -781,11 +1030,25 @@ jsimd_encode_mcu_AC_first_prepare(const JCOEF *block,
                                   const int *jpeg_natural_order_start, int Sl,
                                   int Al, JCOEF *values, size_t *zerobits)
 {
+  jsimd_encode_mcu_AC_first_prepare_neon(block, jpeg_natural_order_start,
+                                         Sl, Al, values, zerobits);
 }
 GLOBAL(int)
 jsimd_can_encode_mcu_AC_refine_prepare(void)
 {
+  init_simd();
+  if (DCTSIZE != 8)
+    return 0;
+  if (sizeof(JCOEF) != 2)
+    return 0;
+  if (SIZEOF_SIZE_T != 8)
+    return 0;
+  if (simd_support & JSIMD_NEON)
+    return 1;
   return 0;
 }
@@ -794,5 +1057,7 @@ jsimd_encode_mcu_AC_refine_prepare(const JCOEF *block,
                                    const int *jpeg_natural_order_start, int Sl,
                                    int Al, JCOEF *absvalues, size_t *bits)
 {
-  return 0;
+  return jsimd_encode_mcu_AC_refine_prepare_neon(block,
+                                                 jpeg_natural_order_start,
+                                                 Sl, Al, absvalues, bits);
 }

File diff suppressed because it is too large.

simd/arm/align.h (new file, 28 lines)

@@ -0,0 +1,28 @@
/*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute it
* freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must not
* claim that you wrote the original software. If you use this software
* in a product, an acknowledgment in the product documentation would be
* appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must not be
* misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source distribution.
*/
/* How to obtain memory alignment for structures and variables */
#if defined(_MSC_VER)
#define ALIGN(alignment) __declspec(align(alignment))
#elif defined(__clang__) || defined(__GNUC__)
#define ALIGN(alignment) __attribute__((aligned(alignment)))
#else
#error "Unknown compiler"
#endif
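/* Added usage sketch (illustrative, not part of the original header): place
 * the macro before a definition to request the alignment needed by 128-bit
 * Neon loads, e.g.
 *
 *   ALIGN(16) static const uint16_t consts[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
 */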

simd/arm/jccolor-neon.c (new file, 160 lines)

@@ -0,0 +1,160 @@
/*
* jccolor-neon.c - colorspace conversion (Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
* Copyright (C) 2020, D. R. Commander. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute it
* freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must not
* claim that you wrote the original software. If you use this software
* in a product, an acknowledgment in the product documentation would be
* appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must not be
* misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source distribution.
*/
#define JPEG_INTERNALS
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../jsimd.h"
#include "align.h"
#include "neon-compat.h"
#include <arm_neon.h>
/* RGB -> YCbCr conversion constants */
#define F_0_298 19595
#define F_0_587 38470
#define F_0_113 7471
#define F_0_168 11059
#define F_0_331 21709
#define F_0_500 32768
#define F_0_418 27439
#define F_0_081 5329
ALIGN(16) static const uint16_t jsimd_rgb_ycc_neon_consts[] = {
F_0_298, F_0_587, F_0_113, F_0_168,
F_0_331, F_0_500, F_0_418, F_0_081
};
/* Include inline routines for colorspace extensions. */
#if defined(__aarch64__) || defined(_M_ARM64)
#include "aarch64/jccolext-neon.c"
#else
#include "aarch32/jccolext-neon.c"
#endif
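/* Added note (illustrative): jccolext-neon.c acts as a template that is
 * re-included once per pixel format; each block below redefines RGB_RED,
 * RGB_GREEN, RGB_BLUE, RGB_PIXELSIZE and the function name so that the same
 * inline code generates the whole jsimd_ext*_ycc_convert_neon family.
 */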
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_PIXELSIZE
#define RGB_RED EXT_RGB_RED
#define RGB_GREEN EXT_RGB_GREEN
#define RGB_BLUE EXT_RGB_BLUE
#define RGB_PIXELSIZE EXT_RGB_PIXELSIZE
#define jsimd_rgb_ycc_convert_neon jsimd_extrgb_ycc_convert_neon
#if defined(__aarch64__) || defined(_M_ARM64)
#include "aarch64/jccolext-neon.c"
#else
#include "aarch32/jccolext-neon.c"
#endif
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_PIXELSIZE
#undef jsimd_rgb_ycc_convert_neon
#define RGB_RED EXT_RGBX_RED
#define RGB_GREEN EXT_RGBX_GREEN
#define RGB_BLUE EXT_RGBX_BLUE
#define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE
#define jsimd_rgb_ycc_convert_neon jsimd_extrgbx_ycc_convert_neon
#if defined(__aarch64__) || defined(_M_ARM64)
#include "aarch64/jccolext-neon.c"
#else
#include "aarch32/jccolext-neon.c"
#endif
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_PIXELSIZE
#undef jsimd_rgb_ycc_convert_neon
#define RGB_RED EXT_BGR_RED
#define RGB_GREEN EXT_BGR_GREEN
#define RGB_BLUE EXT_BGR_BLUE
#define RGB_PIXELSIZE EXT_BGR_PIXELSIZE
#define jsimd_rgb_ycc_convert_neon jsimd_extbgr_ycc_convert_neon
#if defined(__aarch64__) || defined(_M_ARM64)
#include "aarch64/jccolext-neon.c"
#else
#include "aarch32/jccolext-neon.c"
#endif
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_PIXELSIZE
#undef jsimd_rgb_ycc_convert_neon
#define RGB_RED EXT_BGRX_RED
#define RGB_GREEN EXT_BGRX_GREEN
#define RGB_BLUE EXT_BGRX_BLUE
#define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE
#define jsimd_rgb_ycc_convert_neon jsimd_extbgrx_ycc_convert_neon
#if defined(__aarch64__) || defined(_M_ARM64)
#include "aarch64/jccolext-neon.c"
#else
#include "aarch32/jccolext-neon.c"
#endif
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_PIXELSIZE
#undef jsimd_rgb_ycc_convert_neon
#define RGB_RED EXT_XBGR_RED
#define RGB_GREEN EXT_XBGR_GREEN
#define RGB_BLUE EXT_XBGR_BLUE
#define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE
#define jsimd_rgb_ycc_convert_neon jsimd_extxbgr_ycc_convert_neon
#if defined(__aarch64__) || defined(_M_ARM64)
#include "aarch64/jccolext-neon.c"
#else
#include "aarch32/jccolext-neon.c"
#endif
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_PIXELSIZE
#undef jsimd_rgb_ycc_convert_neon
#define RGB_RED EXT_XRGB_RED
#define RGB_GREEN EXT_XRGB_GREEN
#define RGB_BLUE EXT_XRGB_BLUE
#define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE
#define jsimd_rgb_ycc_convert_neon jsimd_extxrgb_ycc_convert_neon
#if defined(__aarch64__) || defined(_M_ARM64)
#include "aarch64/jccolext-neon.c"
#else
#include "aarch32/jccolext-neon.c"
#endif
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_PIXELSIZE
#undef jsimd_rgb_ycc_convert_neon

simd/arm/jcgray-neon.c (new file, 120 lines)

@@ -0,0 +1,120 @@
/*
* jcgray-neon.c - grayscale colorspace conversion (Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute it
* freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must not
* claim that you wrote the original software. If you use this software
* in a product, an acknowledgment in the product documentation would be
* appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must not be
* misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source distribution.
*/
#define JPEG_INTERNALS
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../jsimd.h"
#include "align.h"
#include <arm_neon.h>
/* RGB -> Grayscale conversion constants */
#define F_0_298 19595
#define F_0_587 38470
#define F_0_113 7471
/* Include inline routines for colorspace extensions. */
#include "jcgryext-neon.c"
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_PIXELSIZE
#define RGB_RED EXT_RGB_RED
#define RGB_GREEN EXT_RGB_GREEN
#define RGB_BLUE EXT_RGB_BLUE
#define RGB_PIXELSIZE EXT_RGB_PIXELSIZE
#define jsimd_rgb_gray_convert_neon jsimd_extrgb_gray_convert_neon
#include "jcgryext-neon.c"
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_PIXELSIZE
#undef jsimd_rgb_gray_convert_neon
#define RGB_RED EXT_RGBX_RED
#define RGB_GREEN EXT_RGBX_GREEN
#define RGB_BLUE EXT_RGBX_BLUE
#define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE
#define jsimd_rgb_gray_convert_neon jsimd_extrgbx_gray_convert_neon
#include "jcgryext-neon.c"
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_PIXELSIZE
#undef jsimd_rgb_gray_convert_neon
#define RGB_RED EXT_BGR_RED
#define RGB_GREEN EXT_BGR_GREEN
#define RGB_BLUE EXT_BGR_BLUE
#define RGB_PIXELSIZE EXT_BGR_PIXELSIZE
#define jsimd_rgb_gray_convert_neon jsimd_extbgr_gray_convert_neon
#include "jcgryext-neon.c"
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_PIXELSIZE
#undef jsimd_rgb_gray_convert_neon
#define RGB_RED EXT_BGRX_RED
#define RGB_GREEN EXT_BGRX_GREEN
#define RGB_BLUE EXT_BGRX_BLUE
#define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE
#define jsimd_rgb_gray_convert_neon jsimd_extbgrx_gray_convert_neon
#include "jcgryext-neon.c"
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_PIXELSIZE
#undef jsimd_rgb_gray_convert_neon
#define RGB_RED EXT_XBGR_RED
#define RGB_GREEN EXT_XBGR_GREEN
#define RGB_BLUE EXT_XBGR_BLUE
#define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE
#define jsimd_rgb_gray_convert_neon jsimd_extxbgr_gray_convert_neon
#include "jcgryext-neon.c"
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_PIXELSIZE
#undef jsimd_rgb_gray_convert_neon
#define RGB_RED EXT_XRGB_RED
#define RGB_GREEN EXT_XRGB_GREEN
#define RGB_BLUE EXT_XRGB_BLUE
#define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE
#define jsimd_rgb_gray_convert_neon jsimd_extxrgb_gray_convert_neon
#include "jcgryext-neon.c"
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_PIXELSIZE
#undef jsimd_rgb_gray_convert_neon

simd/arm/jcgryext-neon.c (new file, 106 lines)

@@ -0,0 +1,106 @@
/*
* jcgryext-neon.c - grayscale colorspace conversion (Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute it
* freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must not
* claim that you wrote the original software. If you use this software
* in a product, an acknowledgment in the product documentation would be
* appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must not be
* misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source distribution.
*/
/* This file is included by jcgray-neon.c */
/* RGB -> Grayscale conversion is defined by the following equation:
* Y = 0.29900 * R + 0.58700 * G + 0.11400 * B
*
* Avoid floating point arithmetic by using shifted integer constants:
* 0.29899597 = 19595 * 2^-16
* 0.58700561 = 38470 * 2^-16
* 0.11399841 = 7471 * 2^-16
* These constants are defined in jcgray-neon.c
*
* This is the same computation as the RGB -> Y portion of RGB -> YCbCr.
*/
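/* Added note (illustrative): 19595 + 38470 + 7471 = 65536, so the three
 * weights sum to exactly 1.0 in Q16 fixed point; for a neutral input with
 * R = G = B = v the rounded right shift below yields
 * Y = (65536 * v + 32768) >> 16 = v, i.e. grays are preserved exactly.
 */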
void jsimd_rgb_gray_convert_neon(JDIMENSION image_width, JSAMPARRAY input_buf,
JSAMPIMAGE output_buf, JDIMENSION output_row,
int num_rows)
{
JSAMPROW inptr;
JSAMPROW outptr;
/* Allocate temporary buffer for final (image_width % 16) pixels in row. */
ALIGN(16) uint8_t tmp_buf[16 * RGB_PIXELSIZE];
while (--num_rows >= 0) {
inptr = *input_buf++;
outptr = output_buf[0][output_row];
output_row++;
int cols_remaining = image_width;
for (; cols_remaining > 0; cols_remaining -= 16) {
/* To prevent buffer overread by the vector load instructions, the last
* (image_width % 16) columns of data are first memcopied to a temporary
* buffer large enough to accommodate the vector load.
*/
if (cols_remaining < 16) {
memcpy(tmp_buf, inptr, cols_remaining * RGB_PIXELSIZE);
inptr = tmp_buf;
}
#if RGB_PIXELSIZE == 4
uint8x16x4_t input_pixels = vld4q_u8(inptr);
#else
uint8x16x3_t input_pixels = vld3q_u8(inptr);
#endif
uint16x8_t r_l = vmovl_u8(vget_low_u8(input_pixels.val[RGB_RED]));
uint16x8_t r_h = vmovl_u8(vget_high_u8(input_pixels.val[RGB_RED]));
uint16x8_t g_l = vmovl_u8(vget_low_u8(input_pixels.val[RGB_GREEN]));
uint16x8_t g_h = vmovl_u8(vget_high_u8(input_pixels.val[RGB_GREEN]));
uint16x8_t b_l = vmovl_u8(vget_low_u8(input_pixels.val[RGB_BLUE]));
uint16x8_t b_h = vmovl_u8(vget_high_u8(input_pixels.val[RGB_BLUE]));
/* Compute Y = 0.29900 * R + 0.58700 * G + 0.11400 * B */
uint32x4_t y_ll = vmull_n_u16(vget_low_u16(r_l), F_0_298);
uint32x4_t y_lh = vmull_n_u16(vget_high_u16(r_l), F_0_298);
uint32x4_t y_hl = vmull_n_u16(vget_low_u16(r_h), F_0_298);
uint32x4_t y_hh = vmull_n_u16(vget_high_u16(r_h), F_0_298);
y_ll = vmlal_n_u16(y_ll, vget_low_u16(g_l), F_0_587);
y_lh = vmlal_n_u16(y_lh, vget_high_u16(g_l), F_0_587);
y_hl = vmlal_n_u16(y_hl, vget_low_u16(g_h), F_0_587);
y_hh = vmlal_n_u16(y_hh, vget_high_u16(g_h), F_0_587);
y_ll = vmlal_n_u16(y_ll, vget_low_u16(b_l), F_0_113);
y_lh = vmlal_n_u16(y_lh, vget_high_u16(b_l), F_0_113);
y_hl = vmlal_n_u16(y_hl, vget_low_u16(b_h), F_0_113);
y_hh = vmlal_n_u16(y_hh, vget_high_u16(b_h), F_0_113);
/* Descale Y values (rounding right shift) and narrow to 16-bit. */
uint16x8_t y_l = vcombine_u16(vrshrn_n_u32(y_ll, 16),
vrshrn_n_u32(y_lh, 16));
uint16x8_t y_h = vcombine_u16(vrshrn_n_u32(y_hl, 16),
vrshrn_n_u32(y_hh, 16));
/* Narrow Y values to 8-bit and store to memory. Buffer overwrite is
* permitted up to the next multiple of ALIGN_SIZE bytes.
*/
vst1q_u8(outptr, vcombine_u8(vmovn_u16(y_l), vmovn_u16(y_h)));
/* Increment pointers. */
inptr += (16 * RGB_PIXELSIZE);
outptr += 16;
}
}
}

simd/arm/jchuff.h (new file, 149 lines)

@@ -0,0 +1,149 @@
/*
* jchuff.h
*
* This file was part of the Independent JPEG Group's software:
* Copyright (C) 1991-1997, Thomas G. Lane.
* libjpeg-turbo Modifications:
* Copyright (C) 2009, 2018, D. R. Commander.
* Copyright (C) 2018, Matthias Räncker.
* Copyright (C) 2020, Arm Limited.
* For conditions of distribution and use, see the accompanying README.ijg
* file.
*/
/* Expanded entropy encoder object for Huffman encoding.
*
* The savable_state subrecord contains fields that change within an MCU,
* but must not be updated permanently until we complete the MCU.
*/
#if defined(__aarch64__) || defined(_M_ARM64)
#define BIT_BUF_SIZE 64
#else
#define BIT_BUF_SIZE 32
#endif
typedef struct {
size_t put_buffer; /* current bit accumulation buffer */
int free_bits; /* # of bits available in it */
int last_dc_val[MAX_COMPS_IN_SCAN]; /* last DC coef for each component */
} savable_state;
typedef struct {
JOCTET *next_output_byte; /* => next byte to write in buffer */
size_t free_in_buffer; /* # of byte spaces remaining in buffer */
savable_state cur; /* Current bit buffer & DC state */
j_compress_ptr cinfo; /* dump_buffer needs access to this */
int simd;
} working_state;
/* Outputting bits to the file */
/* Output byte b and, speculatively, an additional 0 byte. 0xFF must be encoded
* as 0xFF 0x00, so the output buffer pointer is advanced by 2 if the byte is
* 0xFF. Otherwise, the output buffer pointer is advanced by 1, and the
* speculative 0 byte will be overwritten by the next byte.
*/
#define EMIT_BYTE(b) { \
buffer[0] = (JOCTET)(b); \
buffer[1] = 0; \
buffer -= -2 + ((JOCTET)(b) < 0xFF); \
}
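/* Added note (illustrative): buffer -= -2 + ((JOCTET)(b) < 0xFF) is equivalent
 * to buffer += 2 - (b < 0xFF), i.e. the pointer advances by 2 when b == 0xFF
 * (keeping the stuffed 0x00) and by 1 otherwise, so the speculative 0x00 is
 * overwritten by the next byte emitted.
 */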
/* Output the entire bit buffer. If there are no 0xFF bytes in it, then write
* directly to the output buffer. Otherwise, use the EMIT_BYTE() macro to
* encode 0xFF as 0xFF 0x00.
*/
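/* Added note (illustrative): the test below,
 * put_buffer & 0x80..80 & ~(put_buffer + 0x01..01), is a SWAR check that is
 * nonzero when at least one byte of put_buffer equals 0xFF (the only byte
 * value whose top bit is set and whose increment clears it), so the
 * byte-stuffing path is taken only when stuffing may actually be needed.
 */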
#if defined(__aarch64__) || defined(_M_ARM64)
#if defined(_MSC_VER) && !defined(__clang__)
#define SPLAT() { \
buffer[0] = (JOCTET)(put_buffer >> 56); \
buffer[1] = (JOCTET)(put_buffer >> 48); \
buffer[2] = (JOCTET)(put_buffer >> 40); \
buffer[3] = (JOCTET)(put_buffer >> 32); \
buffer[4] = (JOCTET)(put_buffer >> 24); \
buffer[5] = (JOCTET)(put_buffer >> 16); \
buffer[6] = (JOCTET)(put_buffer >> 8); \
buffer[7] = (JOCTET)(put_buffer ); \
}
#else
#define SPLAT() { \
__asm__("rev %x0, %x1" : "=r"(put_buffer) : "r"(put_buffer)); \
*((uint64_t *)buffer) = put_buffer; \
}
#endif
#define FLUSH() { \
if (put_buffer & 0x8080808080808080 & ~(put_buffer + 0x0101010101010101)) { \
EMIT_BYTE(put_buffer >> 56) \
EMIT_BYTE(put_buffer >> 48) \
EMIT_BYTE(put_buffer >> 40) \
EMIT_BYTE(put_buffer >> 32) \
EMIT_BYTE(put_buffer >> 24) \
EMIT_BYTE(put_buffer >> 16) \
EMIT_BYTE(put_buffer >> 8) \
EMIT_BYTE(put_buffer ) \
} else { \
SPLAT() \
buffer += 8; \
} \
}
#else
#if defined(_MSC_VER) && !defined(__clang__)
#define SPLAT() { \
buffer[0] = (JOCTET)(put_buffer >> 24); \
buffer[1] = (JOCTET)(put_buffer >> 16); \
buffer[2] = (JOCTET)(put_buffer >> 8); \
buffer[3] = (JOCTET)(put_buffer ); \
}
#else
#define SPLAT() { \
__asm__("rev %0, %1" : "=r"(put_buffer) : "r"(put_buffer)); \
*((uint32_t *)buffer) = put_buffer; \
}
#endif
#define FLUSH() { \
if (put_buffer & 0x80808080 & ~(put_buffer + 0x01010101)) { \
EMIT_BYTE(put_buffer >> 24) \
EMIT_BYTE(put_buffer >> 16) \
EMIT_BYTE(put_buffer >> 8) \
EMIT_BYTE(put_buffer ) \
} else { \
SPLAT() \
buffer += 4; \
} \
}
#endif
/* Fill the bit buffer to capacity with the leading bits from code, then output
* the bit buffer and put the remaining bits from code into the bit buffer.
*/
#define PUT_AND_FLUSH(code, size) { \
put_buffer = (put_buffer << (size + free_bits)) | (code >> -free_bits); \
FLUSH() \
free_bits += BIT_BUF_SIZE; \
put_buffer = code; \
}
/* Insert code into the bit buffer and output the bit buffer if needed.
* NOTE: We can't flush with free_bits == 0, since the left shift in
* PUT_AND_FLUSH() would have undefined behavior.
*/
#define PUT_BITS(code, size) { \
free_bits -= size; \
if (free_bits < 0) \
PUT_AND_FLUSH(code, size) \
else \
put_buffer = (put_buffer << size) | code; \
}
#define PUT_CODE(code, size, diff) { \
diff |= code << nbits; \
nbits += size; \
PUT_BITS(diff, nbits) \
}
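/* Added usage note (illustrative): with free_bits = BIT_BUF_SIZE and, say, an
 * 8-bit Huffman code followed by 3 diff bits, PUT_CODE() first places the code
 * above the diff bits (nbits becomes 11), then PUT_BITS() shifts that 11-bit
 * group into put_buffer and decrements free_bits; PUT_AND_FLUSH() runs only
 * once a group no longer fits in the bit buffer.
 */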

simd/arm/jcphuff-neon.c (new file, 591 lines)

@@ -0,0 +1,591 @@
/*
* jcphuff-neon.c - prepare data for progressive Huffman encoding (Arm Neon)
*
* Copyright (C) 2020-2021, Arm Limited. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute it
* freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must not
* claim that you wrote the original software. If you use this software
* in a product, an acknowledgment in the product documentation would be
* appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must not be
* misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source distribution.
*/
#define JPEG_INTERNALS
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../jsimd.h"
#include "neon-compat.h"
#include <arm_neon.h>
/* Data preparation for encode_mcu_AC_first().
*
* The equivalent scalar C function (encode_mcu_AC_first_prepare()) can be
* found in jcphuff.c.
*/
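/* Added summary (illustrative): the loop below walks Sl coefficients in
 * natural order, writes their point-transformed magnitudes to values[0..63]
 * and the corresponding diff bit patterns (the magnitude, one's-complemented
 * for negative coefficients) to values[64..127], then assembles a zerobits
 * bitmap with one bit set per nonzero coefficient.
 */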
void jsimd_encode_mcu_AC_first_prepare_neon
(const JCOEF *block, const int *jpeg_natural_order_start, int Sl, int Al,
JCOEF *values, size_t *zerobits)
{
JCOEF *values_ptr = values;
JCOEF *diff_values_ptr = values + DCTSIZE2;
/* Rows of coefficients to zero (since they haven't been processed) */
int i, rows_to_zero = 8;
for (i = 0; i < Sl / 16; i++) {
int16x8_t coefs1 = vld1q_dup_s16(block + jpeg_natural_order_start[0]);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[1], coefs1, 1);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[2], coefs1, 2);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[3], coefs1, 3);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[4], coefs1, 4);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[5], coefs1, 5);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[6], coefs1, 6);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[7], coefs1, 7);
int16x8_t coefs2 = vld1q_dup_s16(block + jpeg_natural_order_start[8]);
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[9], coefs2, 1);
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[10], coefs2, 2);
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[11], coefs2, 3);
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[12], coefs2, 4);
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[13], coefs2, 5);
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[14], coefs2, 6);
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[15], coefs2, 7);
/* Isolate sign of coefficients. */
int16x8_t sign_coefs1 = vshrq_n_s16(coefs1, 15);
int16x8_t sign_coefs2 = vshrq_n_s16(coefs2, 15);
/* Compute absolute value of coefficients and apply point transform Al. */
int16x8_t abs_coefs1 = vabsq_s16(coefs1);
int16x8_t abs_coefs2 = vabsq_s16(coefs2);
coefs1 = vshlq_s16(abs_coefs1, vdupq_n_s16(-Al));
coefs2 = vshlq_s16(abs_coefs2, vdupq_n_s16(-Al));
/* Compute diff values. */
int16x8_t diff1 = veorq_s16(coefs1, sign_coefs1);
int16x8_t diff2 = veorq_s16(coefs2, sign_coefs2);
/* Store transformed coefficients and diff values. */
vst1q_s16(values_ptr, coefs1);
vst1q_s16(values_ptr + DCTSIZE, coefs2);
vst1q_s16(diff_values_ptr, diff1);
vst1q_s16(diff_values_ptr + DCTSIZE, diff2);
values_ptr += 16;
diff_values_ptr += 16;
jpeg_natural_order_start += 16;
rows_to_zero -= 2;
}
/* Same operation but for remaining partial vector */
int remaining_coefs = Sl % 16;
if (remaining_coefs > 8) {
int16x8_t coefs1 = vld1q_dup_s16(block + jpeg_natural_order_start[0]);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[1], coefs1, 1);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[2], coefs1, 2);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[3], coefs1, 3);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[4], coefs1, 4);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[5], coefs1, 5);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[6], coefs1, 6);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[7], coefs1, 7);
int16x8_t coefs2 = vdupq_n_s16(0);
switch (remaining_coefs) {
case 15:
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[14], coefs2, 6);
case 14:
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[13], coefs2, 5);
case 13:
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[12], coefs2, 4);
case 12:
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[11], coefs2, 3);
case 11:
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[10], coefs2, 2);
case 10:
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[9], coefs2, 1);
case 9:
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[8], coefs2, 0);
default:
break;
}
/* Isolate sign of coefficients. */
int16x8_t sign_coefs1 = vshrq_n_s16(coefs1, 15);
int16x8_t sign_coefs2 = vshrq_n_s16(coefs2, 15);
/* Compute absolute value of coefficients and apply point transform Al. */
int16x8_t abs_coefs1 = vabsq_s16(coefs1);
int16x8_t abs_coefs2 = vabsq_s16(coefs2);
coefs1 = vshlq_s16(abs_coefs1, vdupq_n_s16(-Al));
coefs2 = vshlq_s16(abs_coefs2, vdupq_n_s16(-Al));
/* Compute diff values. */
int16x8_t diff1 = veorq_s16(coefs1, sign_coefs1);
int16x8_t diff2 = veorq_s16(coefs2, sign_coefs2);
/* Store transformed coefficients and diff values. */
vst1q_s16(values_ptr, coefs1);
vst1q_s16(values_ptr + DCTSIZE, coefs2);
vst1q_s16(diff_values_ptr, diff1);
vst1q_s16(diff_values_ptr + DCTSIZE, diff2);
values_ptr += 16;
diff_values_ptr += 16;
rows_to_zero -= 2;
} else if (remaining_coefs > 0) {
int16x8_t coefs = vdupq_n_s16(0);
switch (remaining_coefs) {
case 8:
coefs = vld1q_lane_s16(block + jpeg_natural_order_start[7], coefs, 7);
case 7:
coefs = vld1q_lane_s16(block + jpeg_natural_order_start[6], coefs, 6);
case 6:
coefs = vld1q_lane_s16(block + jpeg_natural_order_start[5], coefs, 5);
case 5:
coefs = vld1q_lane_s16(block + jpeg_natural_order_start[4], coefs, 4);
case 4:
coefs = vld1q_lane_s16(block + jpeg_natural_order_start[3], coefs, 3);
case 3:
coefs = vld1q_lane_s16(block + jpeg_natural_order_start[2], coefs, 2);
case 2:
coefs = vld1q_lane_s16(block + jpeg_natural_order_start[1], coefs, 1);
case 1:
coefs = vld1q_lane_s16(block + jpeg_natural_order_start[0], coefs, 0);
default:
break;
}
/* Isolate sign of coefficients. */
int16x8_t sign_coefs = vshrq_n_s16(coefs, 15);
/* Compute absolute value of coefficients and apply point transform Al. */
int16x8_t abs_coefs = vabsq_s16(coefs);
coefs = vshlq_s16(abs_coefs, vdupq_n_s16(-Al));
/* Compute diff values. */
int16x8_t diff = veorq_s16(coefs, sign_coefs);
/* Store transformed coefficients and diff values. */
vst1q_s16(values_ptr, coefs);
vst1q_s16(diff_values_ptr, diff);
values_ptr += 8;
diff_values_ptr += 8;
rows_to_zero--;
}
/* Zero remaining memory in the values and diff_values blocks. */
for (i = 0; i < rows_to_zero; i++) {
vst1q_s16(values_ptr, vdupq_n_s16(0));
vst1q_s16(diff_values_ptr, vdupq_n_s16(0));
values_ptr += 8;
diff_values_ptr += 8;
}
/* Construct zerobits bitmap. A set bit means that the corresponding
* coefficient != 0.
*/
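/* Added note (illustrative): each row is compared with zero, masked with
 * { 0x01, 0x02, ..., 0x80 } and reduced with pairwise adds, so every row
 * collapses into one byte whose bit n is set when coefficient n of that row
 * is zero; for a row whose only nonzero entries are at positions 0 and 5 the
 * byte is 0xDE, and the assembled bitmap is inverted below to give the
 * "coefficient != 0" bits.
 */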
int16x8_t row0 = vld1q_s16(values + 0 * DCTSIZE);
int16x8_t row1 = vld1q_s16(values + 1 * DCTSIZE);
int16x8_t row2 = vld1q_s16(values + 2 * DCTSIZE);
int16x8_t row3 = vld1q_s16(values + 3 * DCTSIZE);
int16x8_t row4 = vld1q_s16(values + 4 * DCTSIZE);
int16x8_t row5 = vld1q_s16(values + 5 * DCTSIZE);
int16x8_t row6 = vld1q_s16(values + 6 * DCTSIZE);
int16x8_t row7 = vld1q_s16(values + 7 * DCTSIZE);
uint8x8_t row0_eq0 = vmovn_u16(vceqq_s16(row0, vdupq_n_s16(0)));
uint8x8_t row1_eq0 = vmovn_u16(vceqq_s16(row1, vdupq_n_s16(0)));
uint8x8_t row2_eq0 = vmovn_u16(vceqq_s16(row2, vdupq_n_s16(0)));
uint8x8_t row3_eq0 = vmovn_u16(vceqq_s16(row3, vdupq_n_s16(0)));
uint8x8_t row4_eq0 = vmovn_u16(vceqq_s16(row4, vdupq_n_s16(0)));
uint8x8_t row5_eq0 = vmovn_u16(vceqq_s16(row5, vdupq_n_s16(0)));
uint8x8_t row6_eq0 = vmovn_u16(vceqq_s16(row6, vdupq_n_s16(0)));
uint8x8_t row7_eq0 = vmovn_u16(vceqq_s16(row7, vdupq_n_s16(0)));
/* { 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80 } */
const uint8x8_t bitmap_mask =
vreinterpret_u8_u64(vmov_n_u64(0x8040201008040201));
row0_eq0 = vand_u8(row0_eq0, bitmap_mask);
row1_eq0 = vand_u8(row1_eq0, bitmap_mask);
row2_eq0 = vand_u8(row2_eq0, bitmap_mask);
row3_eq0 = vand_u8(row3_eq0, bitmap_mask);
row4_eq0 = vand_u8(row4_eq0, bitmap_mask);
row5_eq0 = vand_u8(row5_eq0, bitmap_mask);
row6_eq0 = vand_u8(row6_eq0, bitmap_mask);
row7_eq0 = vand_u8(row7_eq0, bitmap_mask);
uint8x8_t bitmap_rows_01 = vpadd_u8(row0_eq0, row1_eq0);
uint8x8_t bitmap_rows_23 = vpadd_u8(row2_eq0, row3_eq0);
uint8x8_t bitmap_rows_45 = vpadd_u8(row4_eq0, row5_eq0);
uint8x8_t bitmap_rows_67 = vpadd_u8(row6_eq0, row7_eq0);
uint8x8_t bitmap_rows_0123 = vpadd_u8(bitmap_rows_01, bitmap_rows_23);
uint8x8_t bitmap_rows_4567 = vpadd_u8(bitmap_rows_45, bitmap_rows_67);
uint8x8_t bitmap_all = vpadd_u8(bitmap_rows_0123, bitmap_rows_4567);
#if defined(__aarch64__) || defined(_M_ARM64)
/* Move bitmap to a 64-bit scalar register. */
uint64_t bitmap = vget_lane_u64(vreinterpret_u64_u8(bitmap_all), 0);
/* Store zerobits bitmap. */
*zerobits = ~bitmap;
#else
/* Move bitmap to two 32-bit scalar registers. */
uint32_t bitmap0 = vget_lane_u32(vreinterpret_u32_u8(bitmap_all), 0);
uint32_t bitmap1 = vget_lane_u32(vreinterpret_u32_u8(bitmap_all), 1);
/* Store zerobits bitmap. */
zerobits[0] = ~bitmap0;
zerobits[1] = ~bitmap1;
#endif
}
/* Data preparation for encode_mcu_AC_refine().
*
* The equivalent scalar C function (encode_mcu_AC_refine_prepare()) can be
* found in jcphuff.c.
*/
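/* Added summary (illustrative): in addition to the absolute values and the
 * zerobits bitmap, this routine collects a signbits bitmap derived from the
 * coefficient signs and a "coefficient == 1" bitmap, from which the return
 * value, the EOB position, is derived at the end.
 */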
int jsimd_encode_mcu_AC_refine_prepare_neon
(const JCOEF *block, const int *jpeg_natural_order_start, int Sl, int Al,
JCOEF *absvalues, size_t *bits)
{
/* Temporary storage buffers for data used to compute the signbits bitmap and
* the end-of-block (EOB) position
*/
uint8_t coef_sign_bits[64];
uint8_t coef_eq1_bits[64];
JCOEF *absvalues_ptr = absvalues;
uint8_t *coef_sign_bits_ptr = coef_sign_bits;
uint8_t *eq1_bits_ptr = coef_eq1_bits;
/* Rows of coefficients to zero (since they haven't been processed) */
int i, rows_to_zero = 8;
for (i = 0; i < Sl / 16; i++) {
int16x8_t coefs1 = vld1q_dup_s16(block + jpeg_natural_order_start[0]);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[1], coefs1, 1);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[2], coefs1, 2);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[3], coefs1, 3);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[4], coefs1, 4);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[5], coefs1, 5);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[6], coefs1, 6);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[7], coefs1, 7);
int16x8_t coefs2 = vld1q_dup_s16(block + jpeg_natural_order_start[8]);
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[9], coefs2, 1);
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[10], coefs2, 2);
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[11], coefs2, 3);
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[12], coefs2, 4);
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[13], coefs2, 5);
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[14], coefs2, 6);
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[15], coefs2, 7);
/* Compute and store data for signbits bitmap. */
uint8x8_t sign_coefs1 =
vmovn_u16(vreinterpretq_u16_s16(vshrq_n_s16(coefs1, 15)));
uint8x8_t sign_coefs2 =
vmovn_u16(vreinterpretq_u16_s16(vshrq_n_s16(coefs2, 15)));
vst1_u8(coef_sign_bits_ptr, sign_coefs1);
vst1_u8(coef_sign_bits_ptr + DCTSIZE, sign_coefs2);
/* Compute absolute value of coefficients and apply point transform Al. */
int16x8_t abs_coefs1 = vabsq_s16(coefs1);
int16x8_t abs_coefs2 = vabsq_s16(coefs2);
coefs1 = vshlq_s16(abs_coefs1, vdupq_n_s16(-Al));
coefs2 = vshlq_s16(abs_coefs2, vdupq_n_s16(-Al));
vst1q_s16(absvalues_ptr, coefs1);
vst1q_s16(absvalues_ptr + DCTSIZE, coefs2);
/* Test whether transformed coefficient values == 1 (used to find EOB
* position.)
*/
uint8x8_t coefs_eq11 = vmovn_u16(vceqq_s16(coefs1, vdupq_n_s16(1)));
uint8x8_t coefs_eq12 = vmovn_u16(vceqq_s16(coefs2, vdupq_n_s16(1)));
vst1_u8(eq1_bits_ptr, coefs_eq11);
vst1_u8(eq1_bits_ptr + DCTSIZE, coefs_eq12);
absvalues_ptr += 16;
coef_sign_bits_ptr += 16;
eq1_bits_ptr += 16;
jpeg_natural_order_start += 16;
rows_to_zero -= 2;
}
/* Same operation but for remaining partial vector */
int remaining_coefs = Sl % 16;
if (remaining_coefs > 8) {
int16x8_t coefs1 = vld1q_dup_s16(block + jpeg_natural_order_start[0]);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[1], coefs1, 1);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[2], coefs1, 2);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[3], coefs1, 3);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[4], coefs1, 4);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[5], coefs1, 5);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[6], coefs1, 6);
coefs1 = vld1q_lane_s16(block + jpeg_natural_order_start[7], coefs1, 7);
int16x8_t coefs2 = vdupq_n_s16(0);
switch (remaining_coefs) {
case 15:
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[14], coefs2, 6);
case 14:
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[13], coefs2, 5);
case 13:
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[12], coefs2, 4);
case 12:
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[11], coefs2, 3);
case 11:
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[10], coefs2, 2);
case 10:
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[9], coefs2, 1);
case 9:
coefs2 = vld1q_lane_s16(block + jpeg_natural_order_start[8], coefs2, 0);
default:
break;
}
/* Compute and store data for signbits bitmap. */
uint8x8_t sign_coefs1 =
vmovn_u16(vreinterpretq_u16_s16(vshrq_n_s16(coefs1, 15)));
uint8x8_t sign_coefs2 =
vmovn_u16(vreinterpretq_u16_s16(vshrq_n_s16(coefs2, 15)));
vst1_u8(coef_sign_bits_ptr, sign_coefs1);
vst1_u8(coef_sign_bits_ptr + DCTSIZE, sign_coefs2);
/* Compute absolute value of coefficients and apply point transform Al. */
int16x8_t abs_coefs1 = vabsq_s16(coefs1);
int16x8_t abs_coefs2 = vabsq_s16(coefs2);
coefs1 = vshlq_s16(abs_coefs1, vdupq_n_s16(-Al));
coefs2 = vshlq_s16(abs_coefs2, vdupq_n_s16(-Al));
vst1q_s16(absvalues_ptr, coefs1);
vst1q_s16(absvalues_ptr + DCTSIZE, coefs2);
/* Test whether transformed coefficient values == 1 (used to find EOB
* position.)
*/
uint8x8_t coefs_eq11 = vmovn_u16(vceqq_s16(coefs1, vdupq_n_s16(1)));
uint8x8_t coefs_eq12 = vmovn_u16(vceqq_s16(coefs2, vdupq_n_s16(1)));
vst1_u8(eq1_bits_ptr, coefs_eq11);
vst1_u8(eq1_bits_ptr + DCTSIZE, coefs_eq12);
absvalues_ptr += 16;
coef_sign_bits_ptr += 16;
eq1_bits_ptr += 16;
jpeg_natural_order_start += 16;
rows_to_zero -= 2;
} else if (remaining_coefs > 0) {
int16x8_t coefs = vdupq_n_s16(0);
switch (remaining_coefs) {
case 8:
coefs = vld1q_lane_s16(block + jpeg_natural_order_start[7], coefs, 7);
case 7:
coefs = vld1q_lane_s16(block + jpeg_natural_order_start[6], coefs, 6);
case 6:
coefs = vld1q_lane_s16(block + jpeg_natural_order_start[5], coefs, 5);
case 5:
coefs = vld1q_lane_s16(block + jpeg_natural_order_start[4], coefs, 4);
case 4:
coefs = vld1q_lane_s16(block + jpeg_natural_order_start[3], coefs, 3);
case 3:
coefs = vld1q_lane_s16(block + jpeg_natural_order_start[2], coefs, 2);
case 2:
coefs = vld1q_lane_s16(block + jpeg_natural_order_start[1], coefs, 1);
case 1:
coefs = vld1q_lane_s16(block + jpeg_natural_order_start[0], coefs, 0);
default:
break;
}
/* Compute and store data for signbits bitmap. */
uint8x8_t sign_coefs =
vmovn_u16(vreinterpretq_u16_s16(vshrq_n_s16(coefs, 15)));
vst1_u8(coef_sign_bits_ptr, sign_coefs);
/* Compute absolute value of coefficients and apply point transform Al. */
int16x8_t abs_coefs = vabsq_s16(coefs);
coefs = vshlq_s16(abs_coefs, vdupq_n_s16(-Al));
vst1q_s16(absvalues_ptr, coefs);
/* Test whether transformed coefficient values == 1 (used to find EOB
* position.)
*/
uint8x8_t coefs_eq1 = vmovn_u16(vceqq_s16(coefs, vdupq_n_s16(1)));
vst1_u8(eq1_bits_ptr, coefs_eq1);
absvalues_ptr += 8;
coef_sign_bits_ptr += 8;
eq1_bits_ptr += 8;
rows_to_zero--;
}
/* Zero remaining memory in blocks. */
for (i = 0; i < rows_to_zero; i++) {
vst1q_s16(absvalues_ptr, vdupq_n_s16(0));
vst1_u8(coef_sign_bits_ptr, vdup_n_u8(0));
vst1_u8(eq1_bits_ptr, vdup_n_u8(0));
absvalues_ptr += 8;
coef_sign_bits_ptr += 8;
eq1_bits_ptr += 8;
}
/* Construct zerobits bitmap. */
int16x8_t abs_row0 = vld1q_s16(absvalues + 0 * DCTSIZE);
int16x8_t abs_row1 = vld1q_s16(absvalues + 1 * DCTSIZE);
int16x8_t abs_row2 = vld1q_s16(absvalues + 2 * DCTSIZE);
int16x8_t abs_row3 = vld1q_s16(absvalues + 3 * DCTSIZE);
int16x8_t abs_row4 = vld1q_s16(absvalues + 4 * DCTSIZE);
int16x8_t abs_row5 = vld1q_s16(absvalues + 5 * DCTSIZE);
int16x8_t abs_row6 = vld1q_s16(absvalues + 6 * DCTSIZE);
int16x8_t abs_row7 = vld1q_s16(absvalues + 7 * DCTSIZE);
uint8x8_t abs_row0_eq0 = vmovn_u16(vceqq_s16(abs_row0, vdupq_n_s16(0)));
uint8x8_t abs_row1_eq0 = vmovn_u16(vceqq_s16(abs_row1, vdupq_n_s16(0)));
uint8x8_t abs_row2_eq0 = vmovn_u16(vceqq_s16(abs_row2, vdupq_n_s16(0)));
uint8x8_t abs_row3_eq0 = vmovn_u16(vceqq_s16(abs_row3, vdupq_n_s16(0)));
uint8x8_t abs_row4_eq0 = vmovn_u16(vceqq_s16(abs_row4, vdupq_n_s16(0)));
uint8x8_t abs_row5_eq0 = vmovn_u16(vceqq_s16(abs_row5, vdupq_n_s16(0)));
uint8x8_t abs_row6_eq0 = vmovn_u16(vceqq_s16(abs_row6, vdupq_n_s16(0)));
uint8x8_t abs_row7_eq0 = vmovn_u16(vceqq_s16(abs_row7, vdupq_n_s16(0)));
/* { 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80 } */
const uint8x8_t bitmap_mask =
vreinterpret_u8_u64(vmov_n_u64(0x8040201008040201));
abs_row0_eq0 = vand_u8(abs_row0_eq0, bitmap_mask);
abs_row1_eq0 = vand_u8(abs_row1_eq0, bitmap_mask);
abs_row2_eq0 = vand_u8(abs_row2_eq0, bitmap_mask);
abs_row3_eq0 = vand_u8(abs_row3_eq0, bitmap_mask);
abs_row4_eq0 = vand_u8(abs_row4_eq0, bitmap_mask);
abs_row5_eq0 = vand_u8(abs_row5_eq0, bitmap_mask);
abs_row6_eq0 = vand_u8(abs_row6_eq0, bitmap_mask);
abs_row7_eq0 = vand_u8(abs_row7_eq0, bitmap_mask);
uint8x8_t bitmap_rows_01 = vpadd_u8(abs_row0_eq0, abs_row1_eq0);
uint8x8_t bitmap_rows_23 = vpadd_u8(abs_row2_eq0, abs_row3_eq0);
uint8x8_t bitmap_rows_45 = vpadd_u8(abs_row4_eq0, abs_row5_eq0);
uint8x8_t bitmap_rows_67 = vpadd_u8(abs_row6_eq0, abs_row7_eq0);
uint8x8_t bitmap_rows_0123 = vpadd_u8(bitmap_rows_01, bitmap_rows_23);
uint8x8_t bitmap_rows_4567 = vpadd_u8(bitmap_rows_45, bitmap_rows_67);
uint8x8_t bitmap_all = vpadd_u8(bitmap_rows_0123, bitmap_rows_4567);
#if defined(__aarch64__) || defined(_M_ARM64)
/* Move bitmap to a 64-bit scalar register. */
uint64_t bitmap = vget_lane_u64(vreinterpret_u64_u8(bitmap_all), 0);
/* Store zerobits bitmap. */
bits[0] = ~bitmap;
#else
/* Move bitmap to two 32-bit scalar registers. */
uint32_t bitmap0 = vget_lane_u32(vreinterpret_u32_u8(bitmap_all), 0);
uint32_t bitmap1 = vget_lane_u32(vreinterpret_u32_u8(bitmap_all), 1);
/* Store zerobits bitmap. */
bits[0] = ~bitmap0;
bits[1] = ~bitmap1;
#endif
/* Construct signbits bitmap. */
uint8x8_t signbits_row0 = vld1_u8(coef_sign_bits + 0 * DCTSIZE);
uint8x8_t signbits_row1 = vld1_u8(coef_sign_bits + 1 * DCTSIZE);
uint8x8_t signbits_row2 = vld1_u8(coef_sign_bits + 2 * DCTSIZE);
uint8x8_t signbits_row3 = vld1_u8(coef_sign_bits + 3 * DCTSIZE);
uint8x8_t signbits_row4 = vld1_u8(coef_sign_bits + 4 * DCTSIZE);
uint8x8_t signbits_row5 = vld1_u8(coef_sign_bits + 5 * DCTSIZE);
uint8x8_t signbits_row6 = vld1_u8(coef_sign_bits + 6 * DCTSIZE);
uint8x8_t signbits_row7 = vld1_u8(coef_sign_bits + 7 * DCTSIZE);
signbits_row0 = vand_u8(signbits_row0, bitmap_mask);
signbits_row1 = vand_u8(signbits_row1, bitmap_mask);
signbits_row2 = vand_u8(signbits_row2, bitmap_mask);
signbits_row3 = vand_u8(signbits_row3, bitmap_mask);
signbits_row4 = vand_u8(signbits_row4, bitmap_mask);
signbits_row5 = vand_u8(signbits_row5, bitmap_mask);
signbits_row6 = vand_u8(signbits_row6, bitmap_mask);
signbits_row7 = vand_u8(signbits_row7, bitmap_mask);
bitmap_rows_01 = vpadd_u8(signbits_row0, signbits_row1);
bitmap_rows_23 = vpadd_u8(signbits_row2, signbits_row3);
bitmap_rows_45 = vpadd_u8(signbits_row4, signbits_row5);
bitmap_rows_67 = vpadd_u8(signbits_row6, signbits_row7);
bitmap_rows_0123 = vpadd_u8(bitmap_rows_01, bitmap_rows_23);
bitmap_rows_4567 = vpadd_u8(bitmap_rows_45, bitmap_rows_67);
bitmap_all = vpadd_u8(bitmap_rows_0123, bitmap_rows_4567);
#if defined(__aarch64__) || defined(_M_ARM64)
/* Move bitmap to a 64-bit scalar register. */
bitmap = vget_lane_u64(vreinterpret_u64_u8(bitmap_all), 0);
/* Store signbits bitmap. */
bits[1] = ~bitmap;
#else
/* Move bitmap to two 32-bit scalar registers. */
bitmap0 = vget_lane_u32(vreinterpret_u32_u8(bitmap_all), 0);
bitmap1 = vget_lane_u32(vreinterpret_u32_u8(bitmap_all), 1);
/* Store signbits bitmap. */
bits[2] = ~bitmap0;
bits[3] = ~bitmap1;
#endif
/* Construct bitmap to find EOB position (the index of the last coefficient
* equal to 1.)
*/
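/* Added note (illustrative): bit n of the assembled bitmap corresponds to
 * coefficient n, so the index of the last coefficient equal to 1 is the
 * position of the highest set bit, computed as 63 - CLZ(bitmap) (or from the
 * upper/lower 32-bit halves on AArch32); a zero bitmap means no coefficient
 * equals 1 and the EOB position is reported as 0.
 */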
uint8x8_t row0_eq1 = vld1_u8(coef_eq1_bits + 0 * DCTSIZE);
uint8x8_t row1_eq1 = vld1_u8(coef_eq1_bits + 1 * DCTSIZE);
uint8x8_t row2_eq1 = vld1_u8(coef_eq1_bits + 2 * DCTSIZE);
uint8x8_t row3_eq1 = vld1_u8(coef_eq1_bits + 3 * DCTSIZE);
uint8x8_t row4_eq1 = vld1_u8(coef_eq1_bits + 4 * DCTSIZE);
uint8x8_t row5_eq1 = vld1_u8(coef_eq1_bits + 5 * DCTSIZE);
uint8x8_t row6_eq1 = vld1_u8(coef_eq1_bits + 6 * DCTSIZE);
uint8x8_t row7_eq1 = vld1_u8(coef_eq1_bits + 7 * DCTSIZE);
row0_eq1 = vand_u8(row0_eq1, bitmap_mask);
row1_eq1 = vand_u8(row1_eq1, bitmap_mask);
row2_eq1 = vand_u8(row2_eq1, bitmap_mask);
row3_eq1 = vand_u8(row3_eq1, bitmap_mask);
row4_eq1 = vand_u8(row4_eq1, bitmap_mask);
row5_eq1 = vand_u8(row5_eq1, bitmap_mask);
row6_eq1 = vand_u8(row6_eq1, bitmap_mask);
row7_eq1 = vand_u8(row7_eq1, bitmap_mask);
bitmap_rows_01 = vpadd_u8(row0_eq1, row1_eq1);
bitmap_rows_23 = vpadd_u8(row2_eq1, row3_eq1);
bitmap_rows_45 = vpadd_u8(row4_eq1, row5_eq1);
bitmap_rows_67 = vpadd_u8(row6_eq1, row7_eq1);
bitmap_rows_0123 = vpadd_u8(bitmap_rows_01, bitmap_rows_23);
bitmap_rows_4567 = vpadd_u8(bitmap_rows_45, bitmap_rows_67);
bitmap_all = vpadd_u8(bitmap_rows_0123, bitmap_rows_4567);
#if defined(__aarch64__) || defined(_M_ARM64)
/* Move bitmap to a 64-bit scalar register. */
bitmap = vget_lane_u64(vreinterpret_u64_u8(bitmap_all), 0);
/* Return EOB position. */
if (bitmap == 0) {
/* EOB position is defined to be 0 if all coefficients != 1. */
return 0;
} else {
return 63 - BUILTIN_CLZLL(bitmap);
}
#else
/* Move bitmap to two 32-bit scalar registers. */
bitmap0 = vget_lane_u32(vreinterpret_u32_u8(bitmap_all), 0);
bitmap1 = vget_lane_u32(vreinterpret_u32_u8(bitmap_all), 1);
/* Return EOB position. */
if (bitmap0 == 0 && bitmap1 == 0) {
return 0;
} else if (bitmap1 != 0) {
return 63 - BUILTIN_CLZ(bitmap1);
} else {
return 31 - BUILTIN_CLZ(bitmap0);
}
#endif
}

192
simd/arm/jcsample-neon.c Normal file

@@ -0,0 +1,192 @@
/*
* jcsample-neon.c - downsampling (Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute it
* freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must not
* claim that you wrote the original software. If you use this software
* in a product, an acknowledgment in the product documentation would be
* appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must not be
* misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source distribution.
*/
#define JPEG_INTERNALS
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../jsimd.h"
#include "align.h"
#include <arm_neon.h>
ALIGN(16) static const uint8_t jsimd_h2_downsample_consts[] = {
0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 0 */
0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 1 */
0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0E,
0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 2 */
0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0D, 0x0D,
0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 3 */
0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0C, 0x0C, 0x0C,
0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 4 */
0x08, 0x09, 0x0A, 0x0B, 0x0B, 0x0B, 0x0B, 0x0B,
0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 5 */
0x08, 0x09, 0x0A, 0x0A, 0x0A, 0x0A, 0x0A, 0x0A,
0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 6 */
0x08, 0x09, 0x09, 0x09, 0x09, 0x09, 0x09, 0x09,
0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 7 */
0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08,
0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, /* Pad 8 */
0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x06, /* Pad 9 */
0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x05, 0x05, /* Pad 10 */
0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
0x00, 0x01, 0x02, 0x03, 0x04, 0x04, 0x04, 0x04, /* Pad 11 */
0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
0x00, 0x01, 0x02, 0x03, 0x03, 0x03, 0x03, 0x03, /* Pad 12 */
0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03,
0x00, 0x01, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, /* Pad 13 */
0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02,
0x00, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, /* Pad 14 */
0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* Pad 15 */
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
};
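/* In each "Pad N" row above, the last N byte indices are clamped to the index
 * of the last valid pixel, so the table lookups below (vqtbl1q_u8() or
 * vtbl2_u8()) replicate that pixel into the N padding positions of the final,
 * partially filled DCT block.
 */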
/* Downsample pixel values of a single component.
* This version handles the common case of 2:1 horizontal and 1:1 vertical,
* without smoothing.
*/
void jsimd_h2v1_downsample_neon(JDIMENSION image_width, int max_v_samp_factor,
JDIMENSION v_samp_factor,
JDIMENSION width_in_blocks,
JSAMPARRAY input_data, JSAMPARRAY output_data)
{
JSAMPROW inptr, outptr;
/* Load expansion mask to pad remaining elements of last DCT block. */
const int mask_offset = 16 * ((width_in_blocks * 2 * DCTSIZE) - image_width);
const uint8x16_t expand_mask =
vld1q_u8(&jsimd_h2_downsample_consts[mask_offset]);
/* Load bias pattern (alternating every pixel.) */
/* { 0, 1, 0, 1, 0, 1, 0, 1 } */
const uint16x8_t bias = vreinterpretq_u16_u32(vdupq_n_u32(0x00010000));
unsigned i, outrow;
for (outrow = 0; outrow < v_samp_factor; outrow++) {
outptr = output_data[outrow];
inptr = input_data[outrow];
/* Downsample all but the last DCT block of pixels. */
for (i = 0; i < width_in_blocks - 1; i++) {
uint8x16_t pixels = vld1q_u8(inptr + i * 2 * DCTSIZE);
/* Add adjacent pixel values, widen to 16-bit, and add bias. */
uint16x8_t samples_u16 = vpadalq_u8(bias, pixels);
/* Divide total by 2 and narrow to 8-bit. */
uint8x8_t samples_u8 = vshrn_n_u16(samples_u16, 1);
/* Store samples to memory. */
vst1_u8(outptr + i * DCTSIZE, samples_u8);
}
/* Load pixels in last DCT block into a table. */
uint8x16_t pixels = vld1q_u8(inptr + (width_in_blocks - 1) * 2 * DCTSIZE);
#if defined(__aarch64__) || defined(_M_ARM64)
/* Pad the empty elements with the value of the last pixel. */
pixels = vqtbl1q_u8(pixels, expand_mask);
#else
uint8x8x2_t table = { { vget_low_u8(pixels), vget_high_u8(pixels) } };
pixels = vcombine_u8(vtbl2_u8(table, vget_low_u8(expand_mask)),
vtbl2_u8(table, vget_high_u8(expand_mask)));
#endif
/* Add adjacent pixel values, widen to 16-bit, and add bias. */
uint16x8_t samples_u16 = vpadalq_u8(bias, pixels);
/* Divide total by 2, narrow to 8-bit, and store. */
uint8x8_t samples_u8 = vshrn_n_u16(samples_u16, 1);
vst1_u8(outptr + (width_in_blocks - 1) * DCTSIZE, samples_u8);
}
}
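/* Editor's sketch (not part of libjpeg-turbo; the function name is
 * hypothetical): a minimal scalar equivalent of the vectorized h2v1 arithmetic
 * above, assuming the input row is padded to a whole number of output samples.
 * The (j & 1) term reproduces the { 0, 1, 0, 1, ... } ordered-dither bias
 * added via vpadalq_u8() before the halving shift.
 */
static void h2v1_downsample_scalar_sketch(const uint8_t *in, uint8_t *out,
                                          unsigned num_output_samples)
{
  unsigned j;

  for (j = 0; j < num_output_samples; j++)
    out[j] = (uint8_t)((in[2 * j] + in[2 * j + 1] + (j & 1)) >> 1);
}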
/* Downsample pixel values of a single component.
* This version handles the standard case of 2:1 horizontal and 2:1 vertical,
* without smoothing.
*/
void jsimd_h2v2_downsample_neon(JDIMENSION image_width, int max_v_samp_factor,
JDIMENSION v_samp_factor,
JDIMENSION width_in_blocks,
JSAMPARRAY input_data, JSAMPARRAY output_data)
{
JSAMPROW inptr0, inptr1, outptr;
/* Load expansion mask to pad remaining elements of last DCT block. */
const int mask_offset = 16 * ((width_in_blocks * 2 * DCTSIZE) - image_width);
const uint8x16_t expand_mask =
vld1q_u8(&jsimd_h2_downsample_consts[mask_offset]);
/* Load bias pattern (alternating every pixel.) */
/* { 1, 2, 1, 2, 1, 2, 1, 2 } */
const uint16x8_t bias = vreinterpretq_u16_u32(vdupq_n_u32(0x00020001));
unsigned i, outrow;
for (outrow = 0; outrow < v_samp_factor; outrow++) {
outptr = output_data[outrow];
inptr0 = input_data[outrow];
inptr1 = input_data[outrow + 1];
/* Downsample all but the last DCT block of pixels. */
for (i = 0; i < width_in_blocks - 1; i++) {
uint8x16_t pixels_r0 = vld1q_u8(inptr0 + i * 2 * DCTSIZE);
uint8x16_t pixels_r1 = vld1q_u8(inptr1 + i * 2 * DCTSIZE);
/* Add adjacent pixel values in row 0, widen to 16-bit, and add bias. */
uint16x8_t samples_u16 = vpadalq_u8(bias, pixels_r0);
/* Add adjacent pixel values in row 1, widen to 16-bit, and accumulate.
*/
samples_u16 = vpadalq_u8(samples_u16, pixels_r1);
/* Divide total by 4 and narrow to 8-bit. */
uint8x8_t samples_u8 = vshrn_n_u16(samples_u16, 2);
/* Store samples to memory. */
vst1_u8(outptr + i * DCTSIZE, samples_u8);
}
/* Load pixels in last DCT block into a table. */
uint8x16_t pixels_r0 =
vld1q_u8(inptr0 + (width_in_blocks - 1) * 2 * DCTSIZE);
uint8x16_t pixels_r1 =
vld1q_u8(inptr1 + (width_in_blocks - 1) * 2 * DCTSIZE);
#if defined(__aarch64__) || defined(_M_ARM64)
/* Pad the empty elements with the value of the last pixel. */
pixels_r0 = vqtbl1q_u8(pixels_r0, expand_mask);
pixels_r1 = vqtbl1q_u8(pixels_r1, expand_mask);
#else
uint8x8x2_t table_r0 =
{ { vget_low_u8(pixels_r0), vget_high_u8(pixels_r0) } };
uint8x8x2_t table_r1 =
{ { vget_low_u8(pixels_r1), vget_high_u8(pixels_r1) } };
pixels_r0 = vcombine_u8(vtbl2_u8(table_r0, vget_low_u8(expand_mask)),
vtbl2_u8(table_r0, vget_high_u8(expand_mask)));
pixels_r1 = vcombine_u8(vtbl2_u8(table_r1, vget_low_u8(expand_mask)),
vtbl2_u8(table_r1, vget_high_u8(expand_mask)));
#endif
/* Add adjacent pixel values in row 0, widen to 16-bit, and add bias. */
uint16x8_t samples_u16 = vpadalq_u8(bias, pixels_r0);
/* Add adjacent pixel values in row 1, widen to 16-bit, and accumulate. */
samples_u16 = vpadalq_u8(samples_u16, pixels_r1);
/* Divide total by 4, narrow to 8-bit, and store. */
uint8x8_t samples_u8 = vshrn_n_u16(samples_u16, 2);
vst1_u8(outptr + (width_in_blocks - 1) * DCTSIZE, samples_u8);
}
}
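/* Editor's note: in scalar form, the h2v2 loop above computes
 *   out[j] = (r0[2*j] + r0[2*j+1] + r1[2*j] + r1[2*j+1] + 1 + (j & 1)) >> 2
 * where r0 and r1 are the two input rows and 1 + (j & 1) is the
 * { 1, 2, 1, 2, ... } ordered-dither bias loaded above.
 */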

353
simd/arm/jdcolext-neon.c Normal file

@@ -0,0 +1,353 @@
/*
* jdcolext-neon.c - colorspace conversion (Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
* Copyright (C) 2020, D. R. Commander. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute it
* freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must not
* claim that you wrote the original software. If you use this software
* in a product, an acknowledgment in the product documentation would be
* appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must not be
* misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source distribution.
*/
/* This file is included by jdcolor-neon.c. */
/* YCbCr -> RGB conversion is defined by the following equations:
* R = Y + 1.40200 * (Cr - 128)
* G = Y - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128)
* B = Y + 1.77200 * (Cb - 128)
*
* Scaled integer constants are used to avoid floating-point arithmetic:
* 0.3441467 = 11277 * 2^-15
* 0.7141418 = 23401 * 2^-15
* 1.4020386 = 22971 * 2^-14
* 1.7720337 = 29033 * 2^-14
* These constants are defined in jdcolor-neon.c.
*
* To ensure correct results, rounding is used when descaling.
*/
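/* (Worked scalar form of the fixed-point math used below:
 *   R = Y + ((22971 * (Cr - 128) + (1 << 13)) >> 14)
 *   G = Y + ((-11277 * (Cb - 128) - 23401 * (Cr - 128) + (1 << 14)) >> 15)
 *   B = Y + ((29033 * (Cb - 128) + (1 << 13)) >> 14)
 * Adding half of the divisor before the arithmetic right shift is the rounding
 * performed by vrshrn_n_s32() and vqrdmulhq_lane_s16().)
 */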
/* Notes on safe memory access for YCbCr -> RGB conversion routines:
*
* Input memory buffers can be safely overread up to the next multiple of
* ALIGN_SIZE bytes, since they are always allocated by alloc_sarray() in
* jmemmgr.c.
*
* The output buffer cannot safely be written beyond output_width, since
* output_buf points to a possibly unpadded row in the decompressed image
* buffer allocated by the calling program.
*/
void jsimd_ycc_rgb_convert_neon(JDIMENSION output_width, JSAMPIMAGE input_buf,
JDIMENSION input_row, JSAMPARRAY output_buf,
int num_rows)
{
JSAMPROW outptr;
/* Pointers to Y, Cb, and Cr data */
JSAMPROW inptr0, inptr1, inptr2;
const int16x4_t consts = vld1_s16(jsimd_ycc_rgb_convert_neon_consts);
const int16x8_t neg_128 = vdupq_n_s16(-128);
while (--num_rows >= 0) {
inptr0 = input_buf[0][input_row];
inptr1 = input_buf[1][input_row];
inptr2 = input_buf[2][input_row];
input_row++;
outptr = *output_buf++;
int cols_remaining = output_width;
for (; cols_remaining >= 16; cols_remaining -= 16) {
uint8x16_t y = vld1q_u8(inptr0);
uint8x16_t cb = vld1q_u8(inptr1);
uint8x16_t cr = vld1q_u8(inptr2);
/* Subtract 128 from Cb and Cr. */
int16x8_t cr_128_l =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128),
vget_low_u8(cr)));
int16x8_t cr_128_h =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128),
vget_high_u8(cr)));
int16x8_t cb_128_l =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128),
vget_low_u8(cb)));
int16x8_t cb_128_h =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128),
vget_high_u8(cb)));
/* Compute G-Y: - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128) */
int32x4_t g_sub_y_ll = vmull_lane_s16(vget_low_s16(cb_128_l), consts, 0);
int32x4_t g_sub_y_lh = vmull_lane_s16(vget_high_s16(cb_128_l),
consts, 0);
int32x4_t g_sub_y_hl = vmull_lane_s16(vget_low_s16(cb_128_h), consts, 0);
int32x4_t g_sub_y_hh = vmull_lane_s16(vget_high_s16(cb_128_h),
consts, 0);
g_sub_y_ll = vmlsl_lane_s16(g_sub_y_ll, vget_low_s16(cr_128_l),
consts, 1);
g_sub_y_lh = vmlsl_lane_s16(g_sub_y_lh, vget_high_s16(cr_128_l),
consts, 1);
g_sub_y_hl = vmlsl_lane_s16(g_sub_y_hl, vget_low_s16(cr_128_h),
consts, 1);
g_sub_y_hh = vmlsl_lane_s16(g_sub_y_hh, vget_high_s16(cr_128_h),
consts, 1);
/* Descale G components: shift right 15, round, and narrow to 16-bit. */
int16x8_t g_sub_y_l = vcombine_s16(vrshrn_n_s32(g_sub_y_ll, 15),
vrshrn_n_s32(g_sub_y_lh, 15));
int16x8_t g_sub_y_h = vcombine_s16(vrshrn_n_s32(g_sub_y_hl, 15),
vrshrn_n_s32(g_sub_y_hh, 15));
/* Compute R-Y: 1.40200 * (Cr - 128) */
int16x8_t r_sub_y_l = vqrdmulhq_lane_s16(vshlq_n_s16(cr_128_l, 1),
consts, 2);
int16x8_t r_sub_y_h = vqrdmulhq_lane_s16(vshlq_n_s16(cr_128_h, 1),
consts, 2);
/* Compute B-Y: 1.77200 * (Cb - 128) */
int16x8_t b_sub_y_l = vqrdmulhq_lane_s16(vshlq_n_s16(cb_128_l, 1),
consts, 3);
int16x8_t b_sub_y_h = vqrdmulhq_lane_s16(vshlq_n_s16(cb_128_h, 1),
consts, 3);
/* Add Y. */
int16x8_t r_l =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y_l),
vget_low_u8(y)));
int16x8_t r_h =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y_h),
vget_high_u8(y)));
int16x8_t b_l =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y_l),
vget_low_u8(y)));
int16x8_t b_h =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y_h),
vget_high_u8(y)));
int16x8_t g_l =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y_l),
vget_low_u8(y)));
int16x8_t g_h =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y_h),
vget_high_u8(y)));
#if RGB_PIXELSIZE == 4
uint8x16x4_t rgba;
/* Convert each component to unsigned and narrow, clamping to [0-255]. */
rgba.val[RGB_RED] = vcombine_u8(vqmovun_s16(r_l), vqmovun_s16(r_h));
rgba.val[RGB_GREEN] = vcombine_u8(vqmovun_s16(g_l), vqmovun_s16(g_h));
rgba.val[RGB_BLUE] = vcombine_u8(vqmovun_s16(b_l), vqmovun_s16(b_h));
/* Set alpha channel to opaque (0xFF). */
rgba.val[RGB_ALPHA] = vdupq_n_u8(0xFF);
/* Store RGBA pixel data to memory. */
vst4q_u8(outptr, rgba);
#elif RGB_PIXELSIZE == 3
uint8x16x3_t rgb;
/* Convert each component to unsigned and narrow, clamping to [0-255]. */
rgb.val[RGB_RED] = vcombine_u8(vqmovun_s16(r_l), vqmovun_s16(r_h));
rgb.val[RGB_GREEN] = vcombine_u8(vqmovun_s16(g_l), vqmovun_s16(g_h));
rgb.val[RGB_BLUE] = vcombine_u8(vqmovun_s16(b_l), vqmovun_s16(b_h));
/* Store RGB pixel data to memory. */
vst3q_u8(outptr, rgb);
#else
/* Pack R, G, and B values in ratio 5:6:5. */
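/* (Equivalently, per pixel: rgb565 = ((R >> 3) << 11) | ((G >> 2) << 5) |
 * (B >> 3), with each component first saturated to [0-255] by
 * vqshluq_n_s16().)
 */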
uint16x8_t rgb565_l = vqshluq_n_s16(r_l, 8);
rgb565_l = vsriq_n_u16(rgb565_l, vqshluq_n_s16(g_l, 8), 5);
rgb565_l = vsriq_n_u16(rgb565_l, vqshluq_n_s16(b_l, 8), 11);
uint16x8_t rgb565_h = vqshluq_n_s16(r_h, 8);
rgb565_h = vsriq_n_u16(rgb565_h, vqshluq_n_s16(g_h, 8), 5);
rgb565_h = vsriq_n_u16(rgb565_h, vqshluq_n_s16(b_h, 8), 11);
/* Store RGB pixel data to memory. */
vst1q_u16((uint16_t *)outptr, rgb565_l);
vst1q_u16(((uint16_t *)outptr) + 8, rgb565_h);
#endif
/* Increment pointers. */
inptr0 += 16;
inptr1 += 16;
inptr2 += 16;
outptr += (RGB_PIXELSIZE * 16);
}
if (cols_remaining >= 8) {
uint8x8_t y = vld1_u8(inptr0);
uint8x8_t cb = vld1_u8(inptr1);
uint8x8_t cr = vld1_u8(inptr2);
/* Subtract 128 from Cb and Cr. */
int16x8_t cr_128 =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), cr));
int16x8_t cb_128 =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), cb));
/* Compute G-Y: - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128) */
int32x4_t g_sub_y_l = vmull_lane_s16(vget_low_s16(cb_128), consts, 0);
int32x4_t g_sub_y_h = vmull_lane_s16(vget_high_s16(cb_128), consts, 0);
g_sub_y_l = vmlsl_lane_s16(g_sub_y_l, vget_low_s16(cr_128), consts, 1);
g_sub_y_h = vmlsl_lane_s16(g_sub_y_h, vget_high_s16(cr_128), consts, 1);
/* Descale G components: shift right 15, round, and narrow to 16-bit. */
int16x8_t g_sub_y = vcombine_s16(vrshrn_n_s32(g_sub_y_l, 15),
vrshrn_n_s32(g_sub_y_h, 15));
/* Compute R-Y: 1.40200 * (Cr - 128) */
int16x8_t r_sub_y = vqrdmulhq_lane_s16(vshlq_n_s16(cr_128, 1),
consts, 2);
/* Compute B-Y: 1.77200 * (Cb - 128) */
int16x8_t b_sub_y = vqrdmulhq_lane_s16(vshlq_n_s16(cb_128, 1),
consts, 3);
/* Add Y. */
int16x8_t r =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y), y));
int16x8_t b =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y), y));
int16x8_t g =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y), y));
#if RGB_PIXELSIZE == 4
uint8x8x4_t rgba;
/* Convert each component to unsigned and narrow, clamping to [0-255]. */
rgba.val[RGB_RED] = vqmovun_s16(r);
rgba.val[RGB_GREEN] = vqmovun_s16(g);
rgba.val[RGB_BLUE] = vqmovun_s16(b);
/* Set alpha channel to opaque (0xFF). */
rgba.val[RGB_ALPHA] = vdup_n_u8(0xFF);
/* Store RGBA pixel data to memory. */
vst4_u8(outptr, rgba);
#elif RGB_PIXELSIZE == 3
uint8x8x3_t rgb;
/* Convert each component to unsigned and narrow, clamping to [0-255]. */
rgb.val[RGB_RED] = vqmovun_s16(r);
rgb.val[RGB_GREEN] = vqmovun_s16(g);
rgb.val[RGB_BLUE] = vqmovun_s16(b);
/* Store RGB pixel data to memory. */
vst3_u8(outptr, rgb);
#else
/* Pack R, G, and B values in ratio 5:6:5. */
uint16x8_t rgb565 = vqshluq_n_s16(r, 8);
rgb565 = vsriq_n_u16(rgb565, vqshluq_n_s16(g, 8), 5);
rgb565 = vsriq_n_u16(rgb565, vqshluq_n_s16(b, 8), 11);
/* Store RGB pixel data to memory. */
vst1q_u16((uint16_t *)outptr, rgb565);
#endif
/* Increment pointers. */
inptr0 += 8;
inptr1 += 8;
inptr2 += 8;
outptr += (RGB_PIXELSIZE * 8);
cols_remaining -= 8;
}
/* Handle the tail elements. */
if (cols_remaining > 0) {
uint8x8_t y = vld1_u8(inptr0);
uint8x8_t cb = vld1_u8(inptr1);
uint8x8_t cr = vld1_u8(inptr2);
/* Subtract 128 from Cb and Cr. */
int16x8_t cr_128 =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), cr));
int16x8_t cb_128 =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), cb));
/* Compute G-Y: - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128) */
int32x4_t g_sub_y_l = vmull_lane_s16(vget_low_s16(cb_128), consts, 0);
int32x4_t g_sub_y_h = vmull_lane_s16(vget_high_s16(cb_128), consts, 0);
g_sub_y_l = vmlsl_lane_s16(g_sub_y_l, vget_low_s16(cr_128), consts, 1);
g_sub_y_h = vmlsl_lane_s16(g_sub_y_h, vget_high_s16(cr_128), consts, 1);
/* Descale G components: shift right 15, round, and narrow to 16-bit. */
int16x8_t g_sub_y = vcombine_s16(vrshrn_n_s32(g_sub_y_l, 15),
vrshrn_n_s32(g_sub_y_h, 15));
/* Compute R-Y: 1.40200 * (Cr - 128) */
int16x8_t r_sub_y = vqrdmulhq_lane_s16(vshlq_n_s16(cr_128, 1),
consts, 2);
/* Compute B-Y: 1.77200 * (Cb - 128) */
int16x8_t b_sub_y = vqrdmulhq_lane_s16(vshlq_n_s16(cb_128, 1),
consts, 3);
/* Add Y. */
int16x8_t r =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y), y));
int16x8_t b =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y), y));
int16x8_t g =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y), y));
#if RGB_PIXELSIZE == 4
uint8x8x4_t rgba;
/* Convert each component to unsigned and narrow, clamping to [0-255]. */
rgba.val[RGB_RED] = vqmovun_s16(r);
rgba.val[RGB_GREEN] = vqmovun_s16(g);
rgba.val[RGB_BLUE] = vqmovun_s16(b);
/* Set alpha channel to opaque (0xFF). */
rgba.val[RGB_ALPHA] = vdup_n_u8(0xFF);
/* Store RGBA pixel data to memory. */
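/* (Each case below intentionally falls through, so that every remaining
 * lower-indexed pixel is also stored.)
 */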
switch (cols_remaining) {
case 7:
vst4_lane_u8(outptr + 6 * RGB_PIXELSIZE, rgba, 6);
case 6:
vst4_lane_u8(outptr + 5 * RGB_PIXELSIZE, rgba, 5);
case 5:
vst4_lane_u8(outptr + 4 * RGB_PIXELSIZE, rgba, 4);
case 4:
vst4_lane_u8(outptr + 3 * RGB_PIXELSIZE, rgba, 3);
case 3:
vst4_lane_u8(outptr + 2 * RGB_PIXELSIZE, rgba, 2);
case 2:
vst4_lane_u8(outptr + RGB_PIXELSIZE, rgba, 1);
case 1:
vst4_lane_u8(outptr, rgba, 0);
default:
break;
}
#elif RGB_PIXELSIZE == 3
uint8x8x3_t rgb;
/* Convert each component to unsigned and narrow, clamping to [0-255]. */
rgb.val[RGB_RED] = vqmovun_s16(r);
rgb.val[RGB_GREEN] = vqmovun_s16(g);
rgb.val[RGB_BLUE] = vqmovun_s16(b);
/* Store RGB pixel data to memory. */
switch (cols_remaining) {
case 7:
vst3_lane_u8(outptr + 6 * RGB_PIXELSIZE, rgb, 6);
case 6:
vst3_lane_u8(outptr + 5 * RGB_PIXELSIZE, rgb, 5);
case 5:
vst3_lane_u8(outptr + 4 * RGB_PIXELSIZE, rgb, 4);
case 4:
vst3_lane_u8(outptr + 3 * RGB_PIXELSIZE, rgb, 3);
case 3:
vst3_lane_u8(outptr + 2 * RGB_PIXELSIZE, rgb, 2);
case 2:
vst3_lane_u8(outptr + RGB_PIXELSIZE, rgb, 1);
case 1:
vst3_lane_u8(outptr, rgb, 0);
default:
break;
}
#else
/* Pack R, G, and B values in ratio 5:6:5. */
uint16x8_t rgb565 = vqshluq_n_s16(r, 8);
rgb565 = vsriq_n_u16(rgb565, vqshluq_n_s16(g, 8), 5);
rgb565 = vsriq_n_u16(rgb565, vqshluq_n_s16(b, 8), 11);
/* Store RGB565 pixel data to memory. */
switch (cols_remaining) {
case 7:
vst1q_lane_u16((uint16_t *)(outptr + 6 * RGB_PIXELSIZE), rgb565, 6);
case 6:
vst1q_lane_u16((uint16_t *)(outptr + 5 * RGB_PIXELSIZE), rgb565, 5);
case 5:
vst1q_lane_u16((uint16_t *)(outptr + 4 * RGB_PIXELSIZE), rgb565, 4);
case 4:
vst1q_lane_u16((uint16_t *)(outptr + 3 * RGB_PIXELSIZE), rgb565, 3);
case 3:
vst1q_lane_u16((uint16_t *)(outptr + 2 * RGB_PIXELSIZE), rgb565, 2);
case 2:
vst1q_lane_u16((uint16_t *)(outptr + RGB_PIXELSIZE), rgb565, 1);
case 1:
vst1q_lane_u16((uint16_t *)outptr, rgb565, 0);
default:
break;
}
#endif
}
}
}

141
simd/arm/jdcolor-neon.c Normal file

@@ -0,0 +1,141 @@
/*
* jdcolor-neon.c - colorspace conversion (Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute it
* freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must not
* claim that you wrote the original software. If you use this software
* in a product, an acknowledgment in the product documentation would be
* appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must not be
* misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source distribution.
*/
#define JPEG_INTERNALS
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../jsimd.h"
#include "align.h"
#include <arm_neon.h>
/* YCbCr -> RGB conversion constants */
#define F_0_344 11277 /* 0.3441467 = 11277 * 2^-15 */
#define F_0_714 23401 /* 0.7141418 = 23401 * 2^-15 */
#define F_1_402 22971 /* 1.4020386 = 22971 * 2^-14 */
#define F_1_772 29033 /* 1.7720337 = 29033 * 2^-14 */
ALIGN(16) static const int16_t jsimd_ycc_rgb_convert_neon_consts[] = {
-F_0_344, F_0_714, F_1_402, F_1_772
};
/* Include inline routines for colorspace extensions. */
#include "jdcolext-neon.c"
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_PIXELSIZE
#define RGB_RED EXT_RGB_RED
#define RGB_GREEN EXT_RGB_GREEN
#define RGB_BLUE EXT_RGB_BLUE
#define RGB_PIXELSIZE EXT_RGB_PIXELSIZE
#define jsimd_ycc_rgb_convert_neon jsimd_ycc_extrgb_convert_neon
#include "jdcolext-neon.c"
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_PIXELSIZE
#undef jsimd_ycc_rgb_convert_neon
#define RGB_RED EXT_RGBX_RED
#define RGB_GREEN EXT_RGBX_GREEN
#define RGB_BLUE EXT_RGBX_BLUE
#define RGB_ALPHA 3
#define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE
#define jsimd_ycc_rgb_convert_neon jsimd_ycc_extrgbx_convert_neon
#include "jdcolext-neon.c"
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_ALPHA
#undef RGB_PIXELSIZE
#undef jsimd_ycc_rgb_convert_neon
#define RGB_RED EXT_BGR_RED
#define RGB_GREEN EXT_BGR_GREEN
#define RGB_BLUE EXT_BGR_BLUE
#define RGB_PIXELSIZE EXT_BGR_PIXELSIZE
#define jsimd_ycc_rgb_convert_neon jsimd_ycc_extbgr_convert_neon
#include "jdcolext-neon.c"
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_PIXELSIZE
#undef jsimd_ycc_rgb_convert_neon
#define RGB_RED EXT_BGRX_RED
#define RGB_GREEN EXT_BGRX_GREEN
#define RGB_BLUE EXT_BGRX_BLUE
#define RGB_ALPHA 3
#define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE
#define jsimd_ycc_rgb_convert_neon jsimd_ycc_extbgrx_convert_neon
#include "jdcolext-neon.c"
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_ALPHA
#undef RGB_PIXELSIZE
#undef jsimd_ycc_rgb_convert_neon
#define RGB_RED EXT_XBGR_RED
#define RGB_GREEN EXT_XBGR_GREEN
#define RGB_BLUE EXT_XBGR_BLUE
#define RGB_ALPHA 0
#define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE
#define jsimd_ycc_rgb_convert_neon jsimd_ycc_extxbgr_convert_neon
#include "jdcolext-neon.c"
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_ALPHA
#undef RGB_PIXELSIZE
#undef jsimd_ycc_rgb_convert_neon
#define RGB_RED EXT_XRGB_RED
#define RGB_GREEN EXT_XRGB_GREEN
#define RGB_BLUE EXT_XRGB_BLUE
#define RGB_ALPHA 0
#define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE
#define jsimd_ycc_rgb_convert_neon jsimd_ycc_extxrgb_convert_neon
#include "jdcolext-neon.c"
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_ALPHA
#undef RGB_PIXELSIZE
#undef jsimd_ycc_rgb_convert_neon
/* YCbCr -> RGB565 Conversion */
#define RGB_PIXELSIZE 2
#define jsimd_ycc_rgb_convert_neon jsimd_ycc_rgb565_convert_neon
#include "jdcolext-neon.c"
#undef RGB_PIXELSIZE
#undef jsimd_ycc_rgb_convert_neon

144
simd/arm/jdmerge-neon.c Normal file

@@ -0,0 +1,144 @@
/*
* jdmerge-neon.c - merged upsampling/color conversion (Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute it
* freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must not
* claim that you wrote the original software. If you use this software
* in a product, an acknowledgment in the product documentation would be
* appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must not be
* misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source distribution.
*/
#define JPEG_INTERNALS
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../jsimd.h"
#include "align.h"
#include <arm_neon.h>
/* YCbCr -> RGB conversion constants */
#define F_0_344 11277 /* 0.3441467 = 11277 * 2^-15 */
#define F_0_714 23401 /* 0.7141418 = 23401 * 2^-15 */
#define F_1_402 22971 /* 1.4020386 = 22971 * 2^-14 */
#define F_1_772 29033 /* 1.7720337 = 29033 * 2^-14 */
ALIGN(16) static const int16_t jsimd_ycc_rgb_convert_neon_consts[] = {
-F_0_344, F_0_714, F_1_402, F_1_772
};
/* Include inline routines for colorspace extensions. */
#include "jdmrgext-neon.c"
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_PIXELSIZE
#define RGB_RED EXT_RGB_RED
#define RGB_GREEN EXT_RGB_GREEN
#define RGB_BLUE EXT_RGB_BLUE
#define RGB_PIXELSIZE EXT_RGB_PIXELSIZE
#define jsimd_h2v1_merged_upsample_neon jsimd_h2v1_extrgb_merged_upsample_neon
#define jsimd_h2v2_merged_upsample_neon jsimd_h2v2_extrgb_merged_upsample_neon
#include "jdmrgext-neon.c"
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_PIXELSIZE
#undef jsimd_h2v1_merged_upsample_neon
#undef jsimd_h2v2_merged_upsample_neon
#define RGB_RED EXT_RGBX_RED
#define RGB_GREEN EXT_RGBX_GREEN
#define RGB_BLUE EXT_RGBX_BLUE
#define RGB_ALPHA 3
#define RGB_PIXELSIZE EXT_RGBX_PIXELSIZE
#define jsimd_h2v1_merged_upsample_neon jsimd_h2v1_extrgbx_merged_upsample_neon
#define jsimd_h2v2_merged_upsample_neon jsimd_h2v2_extrgbx_merged_upsample_neon
#include "jdmrgext-neon.c"
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_ALPHA
#undef RGB_PIXELSIZE
#undef jsimd_h2v1_merged_upsample_neon
#undef jsimd_h2v2_merged_upsample_neon
#define RGB_RED EXT_BGR_RED
#define RGB_GREEN EXT_BGR_GREEN
#define RGB_BLUE EXT_BGR_BLUE
#define RGB_PIXELSIZE EXT_BGR_PIXELSIZE
#define jsimd_h2v1_merged_upsample_neon jsimd_h2v1_extbgr_merged_upsample_neon
#define jsimd_h2v2_merged_upsample_neon jsimd_h2v2_extbgr_merged_upsample_neon
#include "jdmrgext-neon.c"
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_PIXELSIZE
#undef jsimd_h2v1_merged_upsample_neon
#undef jsimd_h2v2_merged_upsample_neon
#define RGB_RED EXT_BGRX_RED
#define RGB_GREEN EXT_BGRX_GREEN
#define RGB_BLUE EXT_BGRX_BLUE
#define RGB_ALPHA 3
#define RGB_PIXELSIZE EXT_BGRX_PIXELSIZE
#define jsimd_h2v1_merged_upsample_neon jsimd_h2v1_extbgrx_merged_upsample_neon
#define jsimd_h2v2_merged_upsample_neon jsimd_h2v2_extbgrx_merged_upsample_neon
#include "jdmrgext-neon.c"
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_ALPHA
#undef RGB_PIXELSIZE
#undef jsimd_h2v1_merged_upsample_neon
#undef jsimd_h2v2_merged_upsample_neon
#define RGB_RED EXT_XBGR_RED
#define RGB_GREEN EXT_XBGR_GREEN
#define RGB_BLUE EXT_XBGR_BLUE
#define RGB_ALPHA 0
#define RGB_PIXELSIZE EXT_XBGR_PIXELSIZE
#define jsimd_h2v1_merged_upsample_neon jsimd_h2v1_extxbgr_merged_upsample_neon
#define jsimd_h2v2_merged_upsample_neon jsimd_h2v2_extxbgr_merged_upsample_neon
#include "jdmrgext-neon.c"
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_ALPHA
#undef RGB_PIXELSIZE
#undef jsimd_h2v1_merged_upsample_neon
#undef jsimd_h2v2_merged_upsample_neon
#define RGB_RED EXT_XRGB_RED
#define RGB_GREEN EXT_XRGB_GREEN
#define RGB_BLUE EXT_XRGB_BLUE
#define RGB_ALPHA 0
#define RGB_PIXELSIZE EXT_XRGB_PIXELSIZE
#define jsimd_h2v1_merged_upsample_neon jsimd_h2v1_extxrgb_merged_upsample_neon
#define jsimd_h2v2_merged_upsample_neon jsimd_h2v2_extxrgb_merged_upsample_neon
#include "jdmrgext-neon.c"
#undef RGB_RED
#undef RGB_GREEN
#undef RGB_BLUE
#undef RGB_ALPHA
#undef RGB_PIXELSIZE
#undef jsimd_h2v1_merged_upsample_neon

667
simd/arm/jdmrgext-neon.c Normal file

@@ -0,0 +1,667 @@
/*
* jdmrgext-neon.c - merged upsampling/color conversion (Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
* Copyright (C) 2020, D. R. Commander. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute it
* freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must not
* claim that you wrote the original software. If you use this software
* in a product, an acknowledgment in the product documentation would be
* appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must not be
* misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source distribution.
*/
/* This file is included by jdmerge-neon.c. */
/* These routines combine simple (non-fancy, i.e. non-smooth) h2v1 or h2v2
* chroma upsampling and YCbCr -> RGB color conversion into a single function.
*
* As with the standalone functions, YCbCr -> RGB conversion is defined by the
* following equations:
* R = Y + 1.40200 * (Cr - 128)
* G = Y - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128)
* B = Y + 1.77200 * (Cb - 128)
*
* Scaled integer constants are used to avoid floating-point arithmetic:
* 0.3441467 = 11277 * 2^-15
* 0.7141418 = 23401 * 2^-15
* 1.4020386 = 22971 * 2^-14
* 1.7720337 = 29033 * 2^-14
* These constants are defined in jdmerge-neon.c.
*
* To ensure correct results, rounding is used when descaling.
*/
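/* (For example, with simple h2v1 upsampling, output pixels 2i and 2i + 1 share
 * chroma sample i, so
 *   R[2i]     = Y[2i]     + 1.40200 * (Cr[i] - 128)
 *   R[2i + 1] = Y[2i + 1] + 1.40200 * (Cr[i] - 128)
 * and likewise for G and B.  With h2v2 upsampling, each chroma sample is
 * additionally shared between two output rows.)
 */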
/* Notes on safe memory access for merged upsampling/YCbCr -> RGB conversion
* routines:
*
* Input memory buffers can be safely overread up to the next multiple of
* ALIGN_SIZE bytes, since they are always allocated by alloc_sarray() in
* jmemmgr.c.
*
* The output buffer cannot safely be written beyond output_width, since
* output_buf points to a possibly unpadded row in the decompressed image
* buffer allocated by the calling program.
*/
/* Upsample and color convert for the case of 2:1 horizontal and 1:1 vertical.
*/
void jsimd_h2v1_merged_upsample_neon(JDIMENSION output_width,
JSAMPIMAGE input_buf,
JDIMENSION in_row_group_ctr,
JSAMPARRAY output_buf)
{
JSAMPROW outptr;
/* Pointers to Y, Cb, and Cr data */
JSAMPROW inptr0, inptr1, inptr2;
const int16x4_t consts = vld1_s16(jsimd_ycc_rgb_convert_neon_consts);
const int16x8_t neg_128 = vdupq_n_s16(-128);
inptr0 = input_buf[0][in_row_group_ctr];
inptr1 = input_buf[1][in_row_group_ctr];
inptr2 = input_buf[2][in_row_group_ctr];
outptr = output_buf[0];
int cols_remaining = output_width;
for (; cols_remaining >= 16; cols_remaining -= 16) {
/* De-interleave Y component values into two separate vectors, one
* containing the component values with even-numbered indices and one
* containing the component values with odd-numbered indices.
*/
uint8x8x2_t y = vld2_u8(inptr0);
uint8x8_t cb = vld1_u8(inptr1);
uint8x8_t cr = vld1_u8(inptr2);
/* Subtract 128 from Cb and Cr. */
int16x8_t cr_128 =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), cr));
int16x8_t cb_128 =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), cb));
/* Compute G-Y: - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128) */
int32x4_t g_sub_y_l = vmull_lane_s16(vget_low_s16(cb_128), consts, 0);
int32x4_t g_sub_y_h = vmull_lane_s16(vget_high_s16(cb_128), consts, 0);
g_sub_y_l = vmlsl_lane_s16(g_sub_y_l, vget_low_s16(cr_128), consts, 1);
g_sub_y_h = vmlsl_lane_s16(g_sub_y_h, vget_high_s16(cr_128), consts, 1);
/* Descale G components: shift right 15, round, and narrow to 16-bit. */
int16x8_t g_sub_y = vcombine_s16(vrshrn_n_s32(g_sub_y_l, 15),
vrshrn_n_s32(g_sub_y_h, 15));
/* Compute R-Y: 1.40200 * (Cr - 128) */
int16x8_t r_sub_y = vqrdmulhq_lane_s16(vshlq_n_s16(cr_128, 1), consts, 2);
/* Compute B-Y: 1.77200 * (Cb - 128) */
int16x8_t b_sub_y = vqrdmulhq_lane_s16(vshlq_n_s16(cb_128, 1), consts, 3);
/* Add the chroma-derived values (G-Y, R-Y, and B-Y) to both the "even" and
* "odd" Y component values. This effectively upsamples the chroma
* components horizontally.
*/
int16x8_t g_even =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y),
y.val[0]));
int16x8_t r_even =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y),
y.val[0]));
int16x8_t b_even =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y),
y.val[0]));
int16x8_t g_odd =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y),
y.val[1]));
int16x8_t r_odd =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y),
y.val[1]));
int16x8_t b_odd =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y),
y.val[1]));
/* Convert each component to unsigned and narrow, clamping to [0-255].
* Re-interleave the "even" and "odd" component values.
*/
uint8x8x2_t r = vzip_u8(vqmovun_s16(r_even), vqmovun_s16(r_odd));
uint8x8x2_t g = vzip_u8(vqmovun_s16(g_even), vqmovun_s16(g_odd));
uint8x8x2_t b = vzip_u8(vqmovun_s16(b_even), vqmovun_s16(b_odd));
#ifdef RGB_ALPHA
uint8x16x4_t rgba;
rgba.val[RGB_RED] = vcombine_u8(r.val[0], r.val[1]);
rgba.val[RGB_GREEN] = vcombine_u8(g.val[0], g.val[1]);
rgba.val[RGB_BLUE] = vcombine_u8(b.val[0], b.val[1]);
/* Set alpha channel to opaque (0xFF). */
rgba.val[RGB_ALPHA] = vdupq_n_u8(0xFF);
/* Store RGBA pixel data to memory. */
vst4q_u8(outptr, rgba);
#else
uint8x16x3_t rgb;
rgb.val[RGB_RED] = vcombine_u8(r.val[0], r.val[1]);
rgb.val[RGB_GREEN] = vcombine_u8(g.val[0], g.val[1]);
rgb.val[RGB_BLUE] = vcombine_u8(b.val[0], b.val[1]);
/* Store RGB pixel data to memory. */
vst3q_u8(outptr, rgb);
#endif
/* Increment pointers. */
inptr0 += 16;
inptr1 += 8;
inptr2 += 8;
outptr += (RGB_PIXELSIZE * 16);
}
if (cols_remaining > 0) {
/* De-interleave Y component values into two separate vectors, one
* containing the component values with even-numbered indices and one
* containing the component values with odd-numbered indices.
*/
uint8x8x2_t y = vld2_u8(inptr0);
uint8x8_t cb = vld1_u8(inptr1);
uint8x8_t cr = vld1_u8(inptr2);
/* Subtract 128 from Cb and Cr. */
int16x8_t cr_128 =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), cr));
int16x8_t cb_128 =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), cb));
/* Compute G-Y: - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128) */
int32x4_t g_sub_y_l = vmull_lane_s16(vget_low_s16(cb_128), consts, 0);
int32x4_t g_sub_y_h = vmull_lane_s16(vget_high_s16(cb_128), consts, 0);
g_sub_y_l = vmlsl_lane_s16(g_sub_y_l, vget_low_s16(cr_128), consts, 1);
g_sub_y_h = vmlsl_lane_s16(g_sub_y_h, vget_high_s16(cr_128), consts, 1);
/* Descale G components: shift right 15, round, and narrow to 16-bit. */
int16x8_t g_sub_y = vcombine_s16(vrshrn_n_s32(g_sub_y_l, 15),
vrshrn_n_s32(g_sub_y_h, 15));
/* Compute R-Y: 1.40200 * (Cr - 128) */
int16x8_t r_sub_y = vqrdmulhq_lane_s16(vshlq_n_s16(cr_128, 1), consts, 2);
/* Compute B-Y: 1.77200 * (Cb - 128) */
int16x8_t b_sub_y = vqrdmulhq_lane_s16(vshlq_n_s16(cb_128, 1), consts, 3);
/* Add the chroma-derived values (G-Y, R-Y, and B-Y) to both the "even" and
* "odd" Y component values. This effectively upsamples the chroma
* components horizontally.
*/
int16x8_t g_even =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y),
y.val[0]));
int16x8_t r_even =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y),
y.val[0]));
int16x8_t b_even =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y),
y.val[0]));
int16x8_t g_odd =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y),
y.val[1]));
int16x8_t r_odd =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y),
y.val[1]));
int16x8_t b_odd =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y),
y.val[1]));
/* Convert each component to unsigned and narrow, clamping to [0-255].
* Re-interleave the "even" and "odd" component values.
*/
uint8x8x2_t r = vzip_u8(vqmovun_s16(r_even), vqmovun_s16(r_odd));
uint8x8x2_t g = vzip_u8(vqmovun_s16(g_even), vqmovun_s16(g_odd));
uint8x8x2_t b = vzip_u8(vqmovun_s16(b_even), vqmovun_s16(b_odd));
#ifdef RGB_ALPHA
uint8x8x4_t rgba_h;
rgba_h.val[RGB_RED] = r.val[1];
rgba_h.val[RGB_GREEN] = g.val[1];
rgba_h.val[RGB_BLUE] = b.val[1];
/* Set alpha channel to opaque (0xFF). */
rgba_h.val[RGB_ALPHA] = vdup_n_u8(0xFF);
uint8x8x4_t rgba_l;
rgba_l.val[RGB_RED] = r.val[0];
rgba_l.val[RGB_GREEN] = g.val[0];
rgba_l.val[RGB_BLUE] = b.val[0];
/* Set alpha channel to opaque (0xFF). */
rgba_l.val[RGB_ALPHA] = vdup_n_u8(0xFF);
/* Store RGBA pixel data to memory. */
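/* (Cases 15-9 below intentionally fall through, storing pixels 14 down to 8
 * lane by lane from the upper-half vectors; case 8 then stores pixels 0-7 with
 * a single vst4_u8() and breaks.  Cases 7-1 likewise fall through, storing
 * pixels 6 down to 0 from the lower-half vectors.)
 */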
switch (cols_remaining) {
case 15:
vst4_lane_u8(outptr + 14 * RGB_PIXELSIZE, rgba_h, 6);
case 14:
vst4_lane_u8(outptr + 13 * RGB_PIXELSIZE, rgba_h, 5);
case 13:
vst4_lane_u8(outptr + 12 * RGB_PIXELSIZE, rgba_h, 4);
case 12:
vst4_lane_u8(outptr + 11 * RGB_PIXELSIZE, rgba_h, 3);
case 11:
vst4_lane_u8(outptr + 10 * RGB_PIXELSIZE, rgba_h, 2);
case 10:
vst4_lane_u8(outptr + 9 * RGB_PIXELSIZE, rgba_h, 1);
case 9:
vst4_lane_u8(outptr + 8 * RGB_PIXELSIZE, rgba_h, 0);
case 8:
vst4_u8(outptr, rgba_l);
break;
case 7:
vst4_lane_u8(outptr + 6 * RGB_PIXELSIZE, rgba_l, 6);
case 6:
vst4_lane_u8(outptr + 5 * RGB_PIXELSIZE, rgba_l, 5);
case 5:
vst4_lane_u8(outptr + 4 * RGB_PIXELSIZE, rgba_l, 4);
case 4:
vst4_lane_u8(outptr + 3 * RGB_PIXELSIZE, rgba_l, 3);
case 3:
vst4_lane_u8(outptr + 2 * RGB_PIXELSIZE, rgba_l, 2);
case 2:
vst4_lane_u8(outptr + RGB_PIXELSIZE, rgba_l, 1);
case 1:
vst4_lane_u8(outptr, rgba_l, 0);
default:
break;
}
#else
uint8x8x3_t rgb_h;
rgb_h.val[RGB_RED] = r.val[1];
rgb_h.val[RGB_GREEN] = g.val[1];
rgb_h.val[RGB_BLUE] = b.val[1];
uint8x8x3_t rgb_l;
rgb_l.val[RGB_RED] = r.val[0];
rgb_l.val[RGB_GREEN] = g.val[0];
rgb_l.val[RGB_BLUE] = b.val[0];
/* Store RGB pixel data to memory. */
switch (cols_remaining) {
case 15:
vst3_lane_u8(outptr + 14 * RGB_PIXELSIZE, rgb_h, 6);
case 14:
vst3_lane_u8(outptr + 13 * RGB_PIXELSIZE, rgb_h, 5);
case 13:
vst3_lane_u8(outptr + 12 * RGB_PIXELSIZE, rgb_h, 4);
case 12:
vst3_lane_u8(outptr + 11 * RGB_PIXELSIZE, rgb_h, 3);
case 11:
vst3_lane_u8(outptr + 10 * RGB_PIXELSIZE, rgb_h, 2);
case 10:
vst3_lane_u8(outptr + 9 * RGB_PIXELSIZE, rgb_h, 1);
case 9:
vst3_lane_u8(outptr + 8 * RGB_PIXELSIZE, rgb_h, 0);
case 8:
vst3_u8(outptr, rgb_l);
break;
case 7:
vst3_lane_u8(outptr + 6 * RGB_PIXELSIZE, rgb_l, 6);
case 6:
vst3_lane_u8(outptr + 5 * RGB_PIXELSIZE, rgb_l, 5);
case 5:
vst3_lane_u8(outptr + 4 * RGB_PIXELSIZE, rgb_l, 4);
case 4:
vst3_lane_u8(outptr + 3 * RGB_PIXELSIZE, rgb_l, 3);
case 3:
vst3_lane_u8(outptr + 2 * RGB_PIXELSIZE, rgb_l, 2);
case 2:
vst3_lane_u8(outptr + RGB_PIXELSIZE, rgb_l, 1);
case 1:
vst3_lane_u8(outptr, rgb_l, 0);
default:
break;
}
#endif
}
}
/* Upsample and color convert for the case of 2:1 horizontal and 2:1 vertical.
*
* See comments above for details regarding color conversion and safe memory
* access.
*/
void jsimd_h2v2_merged_upsample_neon(JDIMENSION output_width,
JSAMPIMAGE input_buf,
JDIMENSION in_row_group_ctr,
JSAMPARRAY output_buf)
{
JSAMPROW outptr0, outptr1;
/* Pointers to Y (both rows), Cb, and Cr data */
JSAMPROW inptr0_0, inptr0_1, inptr1, inptr2;
const int16x4_t consts = vld1_s16(jsimd_ycc_rgb_convert_neon_consts);
const int16x8_t neg_128 = vdupq_n_s16(-128);
inptr0_0 = input_buf[0][in_row_group_ctr * 2];
inptr0_1 = input_buf[0][in_row_group_ctr * 2 + 1];
inptr1 = input_buf[1][in_row_group_ctr];
inptr2 = input_buf[2][in_row_group_ctr];
outptr0 = output_buf[0];
outptr1 = output_buf[1];
int cols_remaining = output_width;
for (; cols_remaining >= 16; cols_remaining -= 16) {
/* For each row, de-interleave Y component values into two separate
* vectors, one containing the component values with even-numbered indices
* and one containing the component values with odd-numbered indices.
*/
uint8x8x2_t y0 = vld2_u8(inptr0_0);
uint8x8x2_t y1 = vld2_u8(inptr0_1);
uint8x8_t cb = vld1_u8(inptr1);
uint8x8_t cr = vld1_u8(inptr2);
/* Subtract 128 from Cb and Cr. */
int16x8_t cr_128 =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), cr));
int16x8_t cb_128 =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), cb));
/* Compute G-Y: - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128) */
int32x4_t g_sub_y_l = vmull_lane_s16(vget_low_s16(cb_128), consts, 0);
int32x4_t g_sub_y_h = vmull_lane_s16(vget_high_s16(cb_128), consts, 0);
g_sub_y_l = vmlsl_lane_s16(g_sub_y_l, vget_low_s16(cr_128), consts, 1);
g_sub_y_h = vmlsl_lane_s16(g_sub_y_h, vget_high_s16(cr_128), consts, 1);
/* Descale G components: shift right 15, round, and narrow to 16-bit. */
int16x8_t g_sub_y = vcombine_s16(vrshrn_n_s32(g_sub_y_l, 15),
vrshrn_n_s32(g_sub_y_h, 15));
/* Compute R-Y: 1.40200 * (Cr - 128) */
int16x8_t r_sub_y = vqrdmulhq_lane_s16(vshlq_n_s16(cr_128, 1), consts, 2);
/* Compute B-Y: 1.77200 * (Cb - 128) */
int16x8_t b_sub_y = vqrdmulhq_lane_s16(vshlq_n_s16(cb_128, 1), consts, 3);
/* For each row, add the chroma-derived values (G-Y, R-Y, and B-Y) to both
* the "even" and "odd" Y component values. This effectively upsamples the
* chroma components both horizontally and vertically.
*/
int16x8_t g0_even =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y),
y0.val[0]));
int16x8_t r0_even =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y),
y0.val[0]));
int16x8_t b0_even =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y),
y0.val[0]));
int16x8_t g0_odd =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y),
y0.val[1]));
int16x8_t r0_odd =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y),
y0.val[1]));
int16x8_t b0_odd =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y),
y0.val[1]));
int16x8_t g1_even =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y),
y1.val[0]));
int16x8_t r1_even =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y),
y1.val[0]));
int16x8_t b1_even =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y),
y1.val[0]));
int16x8_t g1_odd =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y),
y1.val[1]));
int16x8_t r1_odd =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y),
y1.val[1]));
int16x8_t b1_odd =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y),
y1.val[1]));
/* Convert each component to unsigned and narrow, clamping to [0-255].
* Re-interleave the "even" and "odd" component values.
*/
uint8x8x2_t r0 = vzip_u8(vqmovun_s16(r0_even), vqmovun_s16(r0_odd));
uint8x8x2_t r1 = vzip_u8(vqmovun_s16(r1_even), vqmovun_s16(r1_odd));
uint8x8x2_t g0 = vzip_u8(vqmovun_s16(g0_even), vqmovun_s16(g0_odd));
uint8x8x2_t g1 = vzip_u8(vqmovun_s16(g1_even), vqmovun_s16(g1_odd));
uint8x8x2_t b0 = vzip_u8(vqmovun_s16(b0_even), vqmovun_s16(b0_odd));
uint8x8x2_t b1 = vzip_u8(vqmovun_s16(b1_even), vqmovun_s16(b1_odd));
#ifdef RGB_ALPHA
uint8x16x4_t rgba0, rgba1;
rgba0.val[RGB_RED] = vcombine_u8(r0.val[0], r0.val[1]);
rgba1.val[RGB_RED] = vcombine_u8(r1.val[0], r1.val[1]);
rgba0.val[RGB_GREEN] = vcombine_u8(g0.val[0], g0.val[1]);
rgba1.val[RGB_GREEN] = vcombine_u8(g1.val[0], g1.val[1]);
rgba0.val[RGB_BLUE] = vcombine_u8(b0.val[0], b0.val[1]);
rgba1.val[RGB_BLUE] = vcombine_u8(b1.val[0], b1.val[1]);
/* Set alpha channel to opaque (0xFF). */
rgba0.val[RGB_ALPHA] = vdupq_n_u8(0xFF);
rgba1.val[RGB_ALPHA] = vdupq_n_u8(0xFF);
/* Store RGBA pixel data to memory. */
vst4q_u8(outptr0, rgba0);
vst4q_u8(outptr1, rgba1);
#else
uint8x16x3_t rgb0, rgb1;
rgb0.val[RGB_RED] = vcombine_u8(r0.val[0], r0.val[1]);
rgb1.val[RGB_RED] = vcombine_u8(r1.val[0], r1.val[1]);
rgb0.val[RGB_GREEN] = vcombine_u8(g0.val[0], g0.val[1]);
rgb1.val[RGB_GREEN] = vcombine_u8(g1.val[0], g1.val[1]);
rgb0.val[RGB_BLUE] = vcombine_u8(b0.val[0], b0.val[1]);
rgb1.val[RGB_BLUE] = vcombine_u8(b1.val[0], b1.val[1]);
/* Store RGB pixel data to memory. */
vst3q_u8(outptr0, rgb0);
vst3q_u8(outptr1, rgb1);
#endif
/* Increment pointers. */
inptr0_0 += 16;
inptr0_1 += 16;
inptr1 += 8;
inptr2 += 8;
outptr0 += (RGB_PIXELSIZE * 16);
outptr1 += (RGB_PIXELSIZE * 16);
}
if (cols_remaining > 0) {
/* For each row, de-interleave Y component values into two separate
* vectors, one containing the component values with even-numbered indices
* and one containing the component values with odd-numbered indices.
*/
uint8x8x2_t y0 = vld2_u8(inptr0_0);
uint8x8x2_t y1 = vld2_u8(inptr0_1);
uint8x8_t cb = vld1_u8(inptr1);
uint8x8_t cr = vld1_u8(inptr2);
/* Subtract 128 from Cb and Cr. */
int16x8_t cr_128 =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), cr));
int16x8_t cb_128 =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(neg_128), cb));
/* Compute G-Y: - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128) */
int32x4_t g_sub_y_l = vmull_lane_s16(vget_low_s16(cb_128), consts, 0);
int32x4_t g_sub_y_h = vmull_lane_s16(vget_high_s16(cb_128), consts, 0);
g_sub_y_l = vmlsl_lane_s16(g_sub_y_l, vget_low_s16(cr_128), consts, 1);
g_sub_y_h = vmlsl_lane_s16(g_sub_y_h, vget_high_s16(cr_128), consts, 1);
/* Descale G components: shift right 15, round, and narrow to 16-bit. */
int16x8_t g_sub_y = vcombine_s16(vrshrn_n_s32(g_sub_y_l, 15),
vrshrn_n_s32(g_sub_y_h, 15));
/* Compute R-Y: 1.40200 * (Cr - 128) */
int16x8_t r_sub_y = vqrdmulhq_lane_s16(vshlq_n_s16(cr_128, 1), consts, 2);
/* Compute B-Y: 1.77200 * (Cb - 128) */
int16x8_t b_sub_y = vqrdmulhq_lane_s16(vshlq_n_s16(cb_128, 1), consts, 3);
/* For each row, add the chroma-derived values (G-Y, R-Y, and B-Y) to both
* the "even" and "odd" Y component values. This effectively upsamples the
* chroma components both horizontally and vertically.
*/
int16x8_t g0_even =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y),
y0.val[0]));
int16x8_t r0_even =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y),
y0.val[0]));
int16x8_t b0_even =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y),
y0.val[0]));
int16x8_t g0_odd =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y),
y0.val[1]));
int16x8_t r0_odd =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y),
y0.val[1]));
int16x8_t b0_odd =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y),
y0.val[1]));
int16x8_t g1_even =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y),
y1.val[0]));
int16x8_t r1_even =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y),
y1.val[0]));
int16x8_t b1_even =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y),
y1.val[0]));
int16x8_t g1_odd =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(g_sub_y),
y1.val[1]));
int16x8_t r1_odd =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(r_sub_y),
y1.val[1]));
int16x8_t b1_odd =
vreinterpretq_s16_u16(vaddw_u8(vreinterpretq_u16_s16(b_sub_y),
y1.val[1]));
/* Convert each component to unsigned and narrow, clamping to [0-255].
* Re-interleave the "even" and "odd" component values.
*/
uint8x8x2_t r0 = vzip_u8(vqmovun_s16(r0_even), vqmovun_s16(r0_odd));
uint8x8x2_t r1 = vzip_u8(vqmovun_s16(r1_even), vqmovun_s16(r1_odd));
uint8x8x2_t g0 = vzip_u8(vqmovun_s16(g0_even), vqmovun_s16(g0_odd));
uint8x8x2_t g1 = vzip_u8(vqmovun_s16(g1_even), vqmovun_s16(g1_odd));
uint8x8x2_t b0 = vzip_u8(vqmovun_s16(b0_even), vqmovun_s16(b0_odd));
uint8x8x2_t b1 = vzip_u8(vqmovun_s16(b1_even), vqmovun_s16(b1_odd));
#ifdef RGB_ALPHA
uint8x8x4_t rgba0_h, rgba1_h;
rgba0_h.val[RGB_RED] = r0.val[1];
rgba1_h.val[RGB_RED] = r1.val[1];
rgba0_h.val[RGB_GREEN] = g0.val[1];
rgba1_h.val[RGB_GREEN] = g1.val[1];
rgba0_h.val[RGB_BLUE] = b0.val[1];
rgba1_h.val[RGB_BLUE] = b1.val[1];
/* Set alpha channel to opaque (0xFF). */
rgba0_h.val[RGB_ALPHA] = vdup_n_u8(0xFF);
rgba1_h.val[RGB_ALPHA] = vdup_n_u8(0xFF);
uint8x8x4_t rgba0_l, rgba1_l;
rgba0_l.val[RGB_RED] = r0.val[0];
rgba1_l.val[RGB_RED] = r1.val[0];
rgba0_l.val[RGB_GREEN] = g0.val[0];
rgba1_l.val[RGB_GREEN] = g1.val[0];
rgba0_l.val[RGB_BLUE] = b0.val[0];
rgba1_l.val[RGB_BLUE] = b1.val[0];
/* Set alpha channel to opaque (0xFF). */
rgba0_l.val[RGB_ALPHA] = vdup_n_u8(0xFF);
rgba1_l.val[RGB_ALPHA] = vdup_n_u8(0xFF);
/* Store RGBA pixel data to memory. */
switch (cols_remaining) {
case 15:
vst4_lane_u8(outptr0 + 14 * RGB_PIXELSIZE, rgba0_h, 6);
vst4_lane_u8(outptr1 + 14 * RGB_PIXELSIZE, rgba1_h, 6);
case 14:
vst4_lane_u8(outptr0 + 13 * RGB_PIXELSIZE, rgba0_h, 5);
vst4_lane_u8(outptr1 + 13 * RGB_PIXELSIZE, rgba1_h, 5);
case 13:
vst4_lane_u8(outptr0 + 12 * RGB_PIXELSIZE, rgba0_h, 4);
vst4_lane_u8(outptr1 + 12 * RGB_PIXELSIZE, rgba1_h, 4);
case 12:
vst4_lane_u8(outptr0 + 11 * RGB_PIXELSIZE, rgba0_h, 3);
vst4_lane_u8(outptr1 + 11 * RGB_PIXELSIZE, rgba1_h, 3);
case 11:
vst4_lane_u8(outptr0 + 10 * RGB_PIXELSIZE, rgba0_h, 2);
vst4_lane_u8(outptr1 + 10 * RGB_PIXELSIZE, rgba1_h, 2);
case 10:
vst4_lane_u8(outptr0 + 9 * RGB_PIXELSIZE, rgba0_h, 1);
vst4_lane_u8(outptr1 + 9 * RGB_PIXELSIZE, rgba1_h, 1);
case 9:
vst4_lane_u8(outptr0 + 8 * RGB_PIXELSIZE, rgba0_h, 0);
vst4_lane_u8(outptr1 + 8 * RGB_PIXELSIZE, rgba1_h, 0);
case 8:
vst4_u8(outptr0, rgba0_l);
vst4_u8(outptr1, rgba1_l);
break;
case 7:
vst4_lane_u8(outptr0 + 6 * RGB_PIXELSIZE, rgba0_l, 6);
vst4_lane_u8(outptr1 + 6 * RGB_PIXELSIZE, rgba1_l, 6);
case 6:
vst4_lane_u8(outptr0 + 5 * RGB_PIXELSIZE, rgba0_l, 5);
vst4_lane_u8(outptr1 + 5 * RGB_PIXELSIZE, rgba1_l, 5);
case 5:
vst4_lane_u8(outptr0 + 4 * RGB_PIXELSIZE, rgba0_l, 4);
vst4_lane_u8(outptr1 + 4 * RGB_PIXELSIZE, rgba1_l, 4);
case 4:
vst4_lane_u8(outptr0 + 3 * RGB_PIXELSIZE, rgba0_l, 3);
vst4_lane_u8(outptr1 + 3 * RGB_PIXELSIZE, rgba1_l, 3);
case 3:
vst4_lane_u8(outptr0 + 2 * RGB_PIXELSIZE, rgba0_l, 2);
vst4_lane_u8(outptr1 + 2 * RGB_PIXELSIZE, rgba1_l, 2);
case 2:
vst4_lane_u8(outptr0 + 1 * RGB_PIXELSIZE, rgba0_l, 1);
vst4_lane_u8(outptr1 + 1 * RGB_PIXELSIZE, rgba1_l, 1);
case 1:
vst4_lane_u8(outptr0, rgba0_l, 0);
vst4_lane_u8(outptr1, rgba1_l, 0);
default:
break;
}
#else
uint8x8x3_t rgb0_h, rgb1_h;
rgb0_h.val[RGB_RED] = r0.val[1];
rgb1_h.val[RGB_RED] = r1.val[1];
rgb0_h.val[RGB_GREEN] = g0.val[1];
rgb1_h.val[RGB_GREEN] = g1.val[1];
rgb0_h.val[RGB_BLUE] = b0.val[1];
rgb1_h.val[RGB_BLUE] = b1.val[1];
uint8x8x3_t rgb0_l, rgb1_l;
rgb0_l.val[RGB_RED] = r0.val[0];
rgb1_l.val[RGB_RED] = r1.val[0];
rgb0_l.val[RGB_GREEN] = g0.val[0];
rgb1_l.val[RGB_GREEN] = g1.val[0];
rgb0_l.val[RGB_BLUE] = b0.val[0];
rgb1_l.val[RGB_BLUE] = b1.val[0];
/* Store RGB pixel data to memory. */
switch (cols_remaining) {
case 15:
vst3_lane_u8(outptr0 + 14 * RGB_PIXELSIZE, rgb0_h, 6);
vst3_lane_u8(outptr1 + 14 * RGB_PIXELSIZE, rgb1_h, 6);
case 14:
vst3_lane_u8(outptr0 + 13 * RGB_PIXELSIZE, rgb0_h, 5);
vst3_lane_u8(outptr1 + 13 * RGB_PIXELSIZE, rgb1_h, 5);
case 13:
vst3_lane_u8(outptr0 + 12 * RGB_PIXELSIZE, rgb0_h, 4);
vst3_lane_u8(outptr1 + 12 * RGB_PIXELSIZE, rgb1_h, 4);
case 12:
vst3_lane_u8(outptr0 + 11 * RGB_PIXELSIZE, rgb0_h, 3);
vst3_lane_u8(outptr1 + 11 * RGB_PIXELSIZE, rgb1_h, 3);
case 11:
vst3_lane_u8(outptr0 + 10 * RGB_PIXELSIZE, rgb0_h, 2);
vst3_lane_u8(outptr1 + 10 * RGB_PIXELSIZE, rgb1_h, 2);
case 10:
vst3_lane_u8(outptr0 + 9 * RGB_PIXELSIZE, rgb0_h, 1);
vst3_lane_u8(outptr1 + 9 * RGB_PIXELSIZE, rgb1_h, 1);
case 9:
vst3_lane_u8(outptr0 + 8 * RGB_PIXELSIZE, rgb0_h, 0);
vst3_lane_u8(outptr1 + 8 * RGB_PIXELSIZE, rgb1_h, 0);
case 8:
vst3_u8(outptr0, rgb0_l);
vst3_u8(outptr1, rgb1_l);
break;
case 7:
vst3_lane_u8(outptr0 + 6 * RGB_PIXELSIZE, rgb0_l, 6);
vst3_lane_u8(outptr1 + 6 * RGB_PIXELSIZE, rgb1_l, 6);
case 6:
vst3_lane_u8(outptr0 + 5 * RGB_PIXELSIZE, rgb0_l, 5);
vst3_lane_u8(outptr1 + 5 * RGB_PIXELSIZE, rgb1_l, 5);
case 5:
vst3_lane_u8(outptr0 + 4 * RGB_PIXELSIZE, rgb0_l, 4);
vst3_lane_u8(outptr1 + 4 * RGB_PIXELSIZE, rgb1_l, 4);
case 4:
vst3_lane_u8(outptr0 + 3 * RGB_PIXELSIZE, rgb0_l, 3);
vst3_lane_u8(outptr1 + 3 * RGB_PIXELSIZE, rgb1_l, 3);
case 3:
vst3_lane_u8(outptr0 + 2 * RGB_PIXELSIZE, rgb0_l, 2);
vst3_lane_u8(outptr1 + 2 * RGB_PIXELSIZE, rgb1_l, 2);
case 2:
vst3_lane_u8(outptr0 + 1 * RGB_PIXELSIZE, rgb0_l, 1);
vst3_lane_u8(outptr1 + 1 * RGB_PIXELSIZE, rgb1_l, 1);
case 1:
vst3_lane_u8(outptr0, rgb0_l, 0);
vst3_lane_u8(outptr1, rgb1_l, 0);
default:
break;
}
#endif
}
}

569
simd/arm/jdsample-neon.c Normal file

@@ -0,0 +1,569 @@
/*
* jdsample-neon.c - upsampling (Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
* Copyright (C) 2020, D. R. Commander. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute it
* freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must not
* claim that you wrote the original software. If you use this software
* in a product, an acknowledgment in the product documentation would be
* appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must not be
* misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source distribution.
*/
#define JPEG_INTERNALS
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../jsimd.h"
#include <arm_neon.h>
/* The diagram below shows a row of samples produced by h2v1 downsampling.
*
* s0 s1 s2
* +---------+---------+---------+
* | | | |
* | p0 p1 | p2 p3 | p4 p5 |
* | | | |
* +---------+---------+---------+
*
* Samples s0-s2 were created by averaging the original pixel component values
* centered at positions p0-p5 above. To approximate those original pixel
* component values, we proportionally blend the adjacent samples in each row.
*
* An upsampled pixel component value is computed by blending the sample
* containing the pixel center with the nearest neighboring sample, in the
* ratio 3:1. For example:
* p1(upsampled) = 3/4 * s0 + 1/4 * s1
* p2(upsampled) = 3/4 * s1 + 1/4 * s0
* When computing the first and last pixel component values in the row, there
* is no adjacent sample to blend, so:
* p0(upsampled) = s0
* p5(upsampled) = s2
*/
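/* (In integer form, the 3:1 blends are computed below as
 *   p1 = (3 * s0 + s1 + 2) >> 2
 *   p2 = (3 * s1 + s0 + 1) >> 2
 * where the alternating +2/+1 bias provides the ordered-dither rounding used
 * by the scalar implementation in jdsample.c.)
 */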
void jsimd_h2v1_fancy_upsample_neon(int max_v_samp_factor,
JDIMENSION downsampled_width,
JSAMPARRAY input_data,
JSAMPARRAY *output_data_ptr)
{
JSAMPARRAY output_data = *output_data_ptr;
JSAMPROW inptr, outptr;
int inrow;
unsigned colctr;
/* Set up constants. */
const uint16x8_t one_u16 = vdupq_n_u16(1);
const uint8x8_t three_u8 = vdup_n_u8(3);
for (inrow = 0; inrow < max_v_samp_factor; inrow++) {
inptr = input_data[inrow];
outptr = output_data[inrow];
/* First pixel component value in this row of the original image */
*outptr = (JSAMPLE)GETJSAMPLE(*inptr);
/* 3/4 * containing sample + 1/4 * nearest neighboring sample
* For p1: containing sample = s0, nearest neighboring sample = s1
* For p2: containing sample = s1, nearest neighboring sample = s0
*/
uint8x16_t s0 = vld1q_u8(inptr);
uint8x16_t s1 = vld1q_u8(inptr + 1);
/* Multiplication makes vectors twice as wide. '_l' and '_h' suffixes
* denote low half and high half respectively.
*/
uint16x8_t s1_add_3s0_l =
vmlal_u8(vmovl_u8(vget_low_u8(s1)), vget_low_u8(s0), three_u8);
uint16x8_t s1_add_3s0_h =
vmlal_u8(vmovl_u8(vget_high_u8(s1)), vget_high_u8(s0), three_u8);
uint16x8_t s0_add_3s1_l =
vmlal_u8(vmovl_u8(vget_low_u8(s0)), vget_low_u8(s1), three_u8);
uint16x8_t s0_add_3s1_h =
vmlal_u8(vmovl_u8(vget_high_u8(s0)), vget_high_u8(s1), three_u8);
/* Add ordered dithering bias to odd pixel values. */
s0_add_3s1_l = vaddq_u16(s0_add_3s1_l, one_u16);
s0_add_3s1_h = vaddq_u16(s0_add_3s1_h, one_u16);
/* The offset is initially 1, because the first pixel component has already
* been stored. However, in subsequent iterations of the SIMD loop, this
* offset is (2 * colctr - 1) to stay within the bounds of the sample
* buffers without having to resort to a slow scalar tail case for the last
* (downsampled_width % 16) samples. See "Creation of 2-D sample arrays"
* in jmemmgr.c for more details.
*/
unsigned outptr_offset = 1;
uint8x16x2_t output_pixels;
/* We use software pipelining to maximise performance. The code indented
* an extra two spaces begins the next iteration of the loop.
*/
for (colctr = 16; colctr < downsampled_width; colctr += 16) {
s0 = vld1q_u8(inptr + colctr - 1);
s1 = vld1q_u8(inptr + colctr);
/* Right-shift by 2 (divide by 4), narrow to 8-bit, and combine. */
output_pixels.val[0] = vcombine_u8(vrshrn_n_u16(s1_add_3s0_l, 2),
vrshrn_n_u16(s1_add_3s0_h, 2));
output_pixels.val[1] = vcombine_u8(vshrn_n_u16(s0_add_3s1_l, 2),
vshrn_n_u16(s0_add_3s1_h, 2));
/* Multiplication makes vectors twice as wide. '_l' and '_h' suffixes
* denote low half and high half respectively.
*/
s1_add_3s0_l =
vmlal_u8(vmovl_u8(vget_low_u8(s1)), vget_low_u8(s0), three_u8);
s1_add_3s0_h =
vmlal_u8(vmovl_u8(vget_high_u8(s1)), vget_high_u8(s0), three_u8);
s0_add_3s1_l =
vmlal_u8(vmovl_u8(vget_low_u8(s0)), vget_low_u8(s1), three_u8);
s0_add_3s1_h =
vmlal_u8(vmovl_u8(vget_high_u8(s0)), vget_high_u8(s1), three_u8);
/* Add ordered dithering bias to odd pixel values. */
s0_add_3s1_l = vaddq_u16(s0_add_3s1_l, one_u16);
s0_add_3s1_h = vaddq_u16(s0_add_3s1_h, one_u16);
/* Store pixel component values to memory. */
vst2q_u8(outptr + outptr_offset, output_pixels);
outptr_offset = 2 * colctr - 1;
}
/* Complete the last iteration of the loop. */
/* Right-shift by 2 (divide by 4), narrow to 8-bit, and combine. */
output_pixels.val[0] = vcombine_u8(vrshrn_n_u16(s1_add_3s0_l, 2),
vrshrn_n_u16(s1_add_3s0_h, 2));
output_pixels.val[1] = vcombine_u8(vshrn_n_u16(s0_add_3s1_l, 2),
vshrn_n_u16(s0_add_3s1_h, 2));
/* Store pixel component values to memory. */
vst2q_u8(outptr + outptr_offset, output_pixels);
/* Last pixel component value in this row of the original image */
outptr[2 * downsampled_width - 1] =
GETJSAMPLE(inptr[downsampled_width - 1]);
}
}
/* The diagram below shows an array of samples produced by h2v2 downsampling.
*
* s0 s1 s2
* +---------+---------+---------+
* | p0 p1 | p2 p3 | p4 p5 |
* sA | | | |
* | p6 p7 | p8 p9 | p10 p11|
* +---------+---------+---------+
* | p12 p13| p14 p15| p16 p17|
* sB | | | |
* | p18 p19| p20 p21| p22 p23|
* +---------+---------+---------+
* | p24 p25| p26 p27| p28 p29|
* sC | | | |
* | p30 p31| p32 p33| p34 p35|
* +---------+---------+---------+
*
* Samples s0A-s2C were created by averaging the original pixel component
* values centered at positions p0-p35 above. To approximate one of those
* original pixel component values, we proportionally blend the sample
* containing the pixel center with the nearest neighboring samples in each
* row, column, and diagonal.
*
* An upsampled pixel component value is computed by first blending the sample
* containing the pixel center with the nearest neighboring samples in the
* same column, in the ratio 3:1, and then blending each column sum with the
* nearest neighboring column sum, in the ratio 3:1. For example:
* p14(upsampled) = 3/4 * (3/4 * s1B + 1/4 * s1A) +
* 1/4 * (3/4 * s0B + 1/4 * s0A)
* = 9/16 * s1B + 3/16 * s1A + 3/16 * s0B + 1/16 * s0A
* When computing the first and last pixel component values in the row, there
* is no horizontally adjacent sample to blend, so:
* p12(upsampled) = 3/4 * s0B + 1/4 * s0A
* p23(upsampled) = 3/4 * s2B + 1/4 * s2C
* When computing the first and last pixel component values in the column,
* there is no vertically adjacent sample to blend, so:
* p2(upsampled) = 3/4 * s1A + 1/4 * s0A
* p33(upsampled) = 3/4 * s1C + 1/4 * s2C
* When computing the corner pixel component values, there is no adjacent
* sample to blend, so:
* p0(upsampled) = s0A
* p35(upsampled) = s2C
*/
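/* Illustrative, non-normative worked example (hypothetical sample values
 * s1B = 100, s1A = 80, s0B = 60, s0A = 40): the fixed-point form used below is
 *   p14(upsampled) = (3 * (3 * s1B + s1A) + (3 * s0B + s0A) + 8) >> 4
 *                  = (3 * 380 + 220 + 8) >> 4 = 1368 >> 4 = 85
 * which matches 9/16 * 100 + 3/16 * 80 + 3/16 * 60 + 1/16 * 40 = 85.  The
 * inner sums (3 * sB + sA) are the column sums computed in Step 1 below, and
 * the +8 (or +7) term is the rounding/dithering bias applied before the
 * final right-shift by 4.
 */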
void jsimd_h2v2_fancy_upsample_neon(int max_v_samp_factor,
JDIMENSION downsampled_width,
JSAMPARRAY input_data,
JSAMPARRAY *output_data_ptr)
{
JSAMPARRAY output_data = *output_data_ptr;
JSAMPROW inptr0, inptr1, inptr2, outptr0, outptr1;
int inrow, outrow;
unsigned colctr;
/* Set up constants. */
const uint16x8_t seven_u16 = vdupq_n_u16(7);
const uint8x8_t three_u8 = vdup_n_u8(3);
const uint16x8_t three_u16 = vdupq_n_u16(3);
inrow = outrow = 0;
while (outrow < max_v_samp_factor) {
inptr0 = input_data[inrow - 1];
inptr1 = input_data[inrow];
inptr2 = input_data[inrow + 1];
/* Suffixes 0 and 1 denote the upper and lower rows of output pixels,
* respectively.
*/
outptr0 = output_data[outrow++];
outptr1 = output_data[outrow++];
/* First pixel component value in this row of the original image */
int s0colsum0 = GETJSAMPLE(*inptr1) * 3 + GETJSAMPLE(*inptr0);
*outptr0 = (JSAMPLE)((s0colsum0 * 4 + 8) >> 4);
int s0colsum1 = GETJSAMPLE(*inptr1) * 3 + GETJSAMPLE(*inptr2);
*outptr1 = (JSAMPLE)((s0colsum1 * 4 + 8) >> 4);
/* Step 1: Blend samples vertically in columns s0 and s1.
* Leave the divide by 4 until the end, when it can be done for both
* dimensions at once, right-shifting by 4.
*/
/* Load and compute s0colsum0 and s0colsum1. */
uint8x16_t s0A = vld1q_u8(inptr0);
uint8x16_t s0B = vld1q_u8(inptr1);
uint8x16_t s0C = vld1q_u8(inptr2);
/* Multiplication makes vectors twice as wide. '_l' and '_h' suffixes
* denote low half and high half respectively.
*/
uint16x8_t s0colsum0_l = vmlal_u8(vmovl_u8(vget_low_u8(s0A)),
vget_low_u8(s0B), three_u8);
uint16x8_t s0colsum0_h = vmlal_u8(vmovl_u8(vget_high_u8(s0A)),
vget_high_u8(s0B), three_u8);
uint16x8_t s0colsum1_l = vmlal_u8(vmovl_u8(vget_low_u8(s0C)),
vget_low_u8(s0B), three_u8);
uint16x8_t s0colsum1_h = vmlal_u8(vmovl_u8(vget_high_u8(s0C)),
vget_high_u8(s0B), three_u8);
/* Load and compute s1colsum0 and s1colsum1. */
uint8x16_t s1A = vld1q_u8(inptr0 + 1);
uint8x16_t s1B = vld1q_u8(inptr1 + 1);
uint8x16_t s1C = vld1q_u8(inptr2 + 1);
uint16x8_t s1colsum0_l = vmlal_u8(vmovl_u8(vget_low_u8(s1A)),
vget_low_u8(s1B), three_u8);
uint16x8_t s1colsum0_h = vmlal_u8(vmovl_u8(vget_high_u8(s1A)),
vget_high_u8(s1B), three_u8);
uint16x8_t s1colsum1_l = vmlal_u8(vmovl_u8(vget_low_u8(s1C)),
vget_low_u8(s1B), three_u8);
uint16x8_t s1colsum1_h = vmlal_u8(vmovl_u8(vget_high_u8(s1C)),
vget_high_u8(s1B), three_u8);
/* Step 2: Blend the already-blended columns. */
uint16x8_t output0_p1_l = vmlaq_u16(s1colsum0_l, s0colsum0_l, three_u16);
uint16x8_t output0_p1_h = vmlaq_u16(s1colsum0_h, s0colsum0_h, three_u16);
uint16x8_t output0_p2_l = vmlaq_u16(s0colsum0_l, s1colsum0_l, three_u16);
uint16x8_t output0_p2_h = vmlaq_u16(s0colsum0_h, s1colsum0_h, three_u16);
uint16x8_t output1_p1_l = vmlaq_u16(s1colsum1_l, s0colsum1_l, three_u16);
uint16x8_t output1_p1_h = vmlaq_u16(s1colsum1_h, s0colsum1_h, three_u16);
uint16x8_t output1_p2_l = vmlaq_u16(s0colsum1_l, s1colsum1_l, three_u16);
uint16x8_t output1_p2_h = vmlaq_u16(s0colsum1_h, s1colsum1_h, three_u16);
/* Add ordered dithering bias to odd pixel values. */
output0_p1_l = vaddq_u16(output0_p1_l, seven_u16);
output0_p1_h = vaddq_u16(output0_p1_h, seven_u16);
output1_p1_l = vaddq_u16(output1_p1_l, seven_u16);
output1_p1_h = vaddq_u16(output1_p1_h, seven_u16);
/* Right-shift by 4 (divide by 16), narrow to 8-bit, and combine. */
uint8x16x2_t output_pixels0 = { {
vcombine_u8(vshrn_n_u16(output0_p1_l, 4), vshrn_n_u16(output0_p1_h, 4)),
vcombine_u8(vrshrn_n_u16(output0_p2_l, 4), vrshrn_n_u16(output0_p2_h, 4))
} };
uint8x16x2_t output_pixels1 = { {
vcombine_u8(vshrn_n_u16(output1_p1_l, 4), vshrn_n_u16(output1_p1_h, 4)),
vcombine_u8(vrshrn_n_u16(output1_p2_l, 4), vrshrn_n_u16(output1_p2_h, 4))
} };
/* Store pixel component values to memory.
* The minimum size of the output buffer for each row is 64 bytes => no
* need to worry about buffer overflow here. See "Creation of 2-D sample
* arrays" in jmemmgr.c for more details.
*/
vst2q_u8(outptr0 + 1, output_pixels0);
vst2q_u8(outptr1 + 1, output_pixels1);
/* The first pixel of the image shifted our loads and stores by one byte.
* We have to re-align on a 32-byte boundary at some point before the end
* of the row (we do it now on the 32/33 pixel boundary) to stay within the
* bounds of the sample buffers without having to resort to a slow scalar
* tail case for the last (downsampled_width % 16) samples. See "Creation
* of 2-D sample arrays" in jmemmgr.c for more details.
*/
for (colctr = 16; colctr < downsampled_width; colctr += 16) {
/* Step 1: Blend samples vertically in columns s0 and s1. */
/* Load and compute s0colsum0 and s0colsum1. */
s0A = vld1q_u8(inptr0 + colctr - 1);
s0B = vld1q_u8(inptr1 + colctr - 1);
s0C = vld1q_u8(inptr2 + colctr - 1);
s0colsum0_l = vmlal_u8(vmovl_u8(vget_low_u8(s0A)), vget_low_u8(s0B),
three_u8);
s0colsum0_h = vmlal_u8(vmovl_u8(vget_high_u8(s0A)), vget_high_u8(s0B),
three_u8);
s0colsum1_l = vmlal_u8(vmovl_u8(vget_low_u8(s0C)), vget_low_u8(s0B),
three_u8);
s0colsum1_h = vmlal_u8(vmovl_u8(vget_high_u8(s0C)), vget_high_u8(s0B),
three_u8);
/* Load and compute s1colsum0 and s1colsum1. */
s1A = vld1q_u8(inptr0 + colctr);
s1B = vld1q_u8(inptr1 + colctr);
s1C = vld1q_u8(inptr2 + colctr);
s1colsum0_l = vmlal_u8(vmovl_u8(vget_low_u8(s1A)), vget_low_u8(s1B),
three_u8);
s1colsum0_h = vmlal_u8(vmovl_u8(vget_high_u8(s1A)), vget_high_u8(s1B),
three_u8);
s1colsum1_l = vmlal_u8(vmovl_u8(vget_low_u8(s1C)), vget_low_u8(s1B),
three_u8);
s1colsum1_h = vmlal_u8(vmovl_u8(vget_high_u8(s1C)), vget_high_u8(s1B),
three_u8);
/* Step 2: Blend the already-blended columns. */
output0_p1_l = vmlaq_u16(s1colsum0_l, s0colsum0_l, three_u16);
output0_p1_h = vmlaq_u16(s1colsum0_h, s0colsum0_h, three_u16);
output0_p2_l = vmlaq_u16(s0colsum0_l, s1colsum0_l, three_u16);
output0_p2_h = vmlaq_u16(s0colsum0_h, s1colsum0_h, three_u16);
output1_p1_l = vmlaq_u16(s1colsum1_l, s0colsum1_l, three_u16);
output1_p1_h = vmlaq_u16(s1colsum1_h, s0colsum1_h, three_u16);
output1_p2_l = vmlaq_u16(s0colsum1_l, s1colsum1_l, three_u16);
output1_p2_h = vmlaq_u16(s0colsum1_h, s1colsum1_h, three_u16);
/* Add ordered dithering bias to odd pixel values. */
output0_p1_l = vaddq_u16(output0_p1_l, seven_u16);
output0_p1_h = vaddq_u16(output0_p1_h, seven_u16);
output1_p1_l = vaddq_u16(output1_p1_l, seven_u16);
output1_p1_h = vaddq_u16(output1_p1_h, seven_u16);
/* Right-shift by 4 (divide by 16), narrow to 8-bit, and combine. */
output_pixels0.val[0] = vcombine_u8(vshrn_n_u16(output0_p1_l, 4),
vshrn_n_u16(output0_p1_h, 4));
output_pixels0.val[1] = vcombine_u8(vrshrn_n_u16(output0_p2_l, 4),
vrshrn_n_u16(output0_p2_h, 4));
output_pixels1.val[0] = vcombine_u8(vshrn_n_u16(output1_p1_l, 4),
vshrn_n_u16(output1_p1_h, 4));
output_pixels1.val[1] = vcombine_u8(vrshrn_n_u16(output1_p2_l, 4),
vrshrn_n_u16(output1_p2_h, 4));
/* Store pixel component values to memory. */
vst2q_u8(outptr0 + 2 * colctr - 1, output_pixels0);
vst2q_u8(outptr1 + 2 * colctr - 1, output_pixels1);
}
/* Last pixel component value in this row of the original image */
int s1colsum0 = GETJSAMPLE(inptr1[downsampled_width - 1]) * 3 +
GETJSAMPLE(inptr0[downsampled_width - 1]);
outptr0[2 * downsampled_width - 1] = (JSAMPLE)((s1colsum0 * 4 + 7) >> 4);
int s1colsum1 = GETJSAMPLE(inptr1[downsampled_width - 1]) * 3 +
GETJSAMPLE(inptr2[downsampled_width - 1]);
outptr1[2 * downsampled_width - 1] = (JSAMPLE)((s1colsum1 * 4 + 7) >> 4);
inrow++;
}
}
/* The diagram below shows a column of samples produced by h1v2 downsampling
* (or by losslessly rotating or transposing an h2v1-downsampled image.)
*
* +---------+
* | p0 |
* sA | |
* | p1 |
* +---------+
* | p2 |
* sB | |
* | p3 |
* +---------+
* | p4 |
* sC | |
* | p5 |
* +---------+
*
* Samples sA-sC were created by averaging the original pixel component values
* centered at positions p0-p5 above. To approximate those original pixel
* component values, we proportionally blend the adjacent samples in each
* column.
*
* An upsampled pixel component value is computed by blending the sample
* containing the pixel center with the nearest neighboring sample, in the
* ratio 3:1. For example:
* p1(upsampled) = 3/4 * sA + 1/4 * sB
* p2(upsampled) = 3/4 * sB + 1/4 * sA
* When computing the first and last pixel component values in the column,
* there is no adjacent sample to blend, so:
* p0(upsampled) = sA
* p5(upsampled) = sC
*/
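/* Illustrative, non-normative worked example (hypothetical sample values
 * sA = 100, sB = 120):
 *   p1(upsampled) = (3 * sA + sB + 2) >> 2 = 105   (= 3/4 * 100 + 1/4 * 120)
 *   p2(upsampled) = (3 * sB + sA + 1) >> 2 = 115   (= 3/4 * 120 + 1/4 * 100)
 * The +2 corresponds to the rounding narrowing shift (vrshrn) and the +1 to
 * the dithering bias followed by a truncating shift (vshrn) in the loop
 * below.
 */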
void jsimd_h1v2_fancy_upsample_neon(int max_v_samp_factor,
JDIMENSION downsampled_width,
JSAMPARRAY input_data,
JSAMPARRAY *output_data_ptr)
{
JSAMPARRAY output_data = *output_data_ptr;
JSAMPROW inptr0, inptr1, inptr2, outptr0, outptr1;
int inrow, outrow;
unsigned colctr;
/* Set up constants. */
const uint16x8_t one_u16 = vdupq_n_u16(1);
const uint8x8_t three_u8 = vdup_n_u8(3);
inrow = outrow = 0;
while (outrow < max_v_samp_factor) {
inptr0 = input_data[inrow - 1];
inptr1 = input_data[inrow];
inptr2 = input_data[inrow + 1];
/* Suffixes 0 and 1 denote the upper and lower rows of output pixels,
* respectively.
*/
outptr0 = output_data[outrow++];
outptr1 = output_data[outrow++];
inrow++;
/* The size of the input and output buffers is always a multiple of 32
* bytes => no need to worry about buffer overflow when reading/writing
* memory. See "Creation of 2-D sample arrays" in jmemmgr.c for more
* details.
*/
for (colctr = 0; colctr < downsampled_width; colctr += 16) {
/* Load samples. */
uint8x16_t sA = vld1q_u8(inptr0 + colctr);
uint8x16_t sB = vld1q_u8(inptr1 + colctr);
uint8x16_t sC = vld1q_u8(inptr2 + colctr);
/* Blend samples vertically. */
uint16x8_t colsum0_l = vmlal_u8(vmovl_u8(vget_low_u8(sA)),
vget_low_u8(sB), three_u8);
uint16x8_t colsum0_h = vmlal_u8(vmovl_u8(vget_high_u8(sA)),
vget_high_u8(sB), three_u8);
uint16x8_t colsum1_l = vmlal_u8(vmovl_u8(vget_low_u8(sC)),
vget_low_u8(sB), three_u8);
uint16x8_t colsum1_h = vmlal_u8(vmovl_u8(vget_high_u8(sC)),
vget_high_u8(sB), three_u8);
/* Add ordered dithering bias to pixel values in even output rows. */
colsum0_l = vaddq_u16(colsum0_l, one_u16);
colsum0_h = vaddq_u16(colsum0_h, one_u16);
/* Right-shift by 2 (divide by 4), narrow to 8-bit, and combine. */
uint8x16_t output_pixels0 = vcombine_u8(vshrn_n_u16(colsum0_l, 2),
vshrn_n_u16(colsum0_h, 2));
uint8x16_t output_pixels1 = vcombine_u8(vrshrn_n_u16(colsum1_l, 2),
vrshrn_n_u16(colsum1_h, 2));
/* Store pixel component values to memory. */
vst1q_u8(outptr0 + colctr, output_pixels0);
vst1q_u8(outptr1 + colctr, output_pixels1);
}
}
}
/* The diagram below shows a row of samples produced by h2v1 downsampling.
*
* s0 s1
* +---------+---------+
* | | |
* | p0 p1 | p2 p3 |
* | | |
* +---------+---------+
*
* Samples s0 and s1 were created by averaging the original pixel component
* values centered at positions p0-p3 above. To approximate those original
* pixel component values, we duplicate the samples horizontally:
* p0(upsampled) = p1(upsampled) = s0
* p2(upsampled) = p3(upsampled) = s1
*/
void jsimd_h2v1_upsample_neon(int max_v_samp_factor, JDIMENSION output_width,
JSAMPARRAY input_data,
JSAMPARRAY *output_data_ptr)
{
JSAMPARRAY output_data = *output_data_ptr;
JSAMPROW inptr, outptr;
int inrow;
unsigned colctr;
for (inrow = 0; inrow < max_v_samp_factor; inrow++) {
inptr = input_data[inrow];
outptr = output_data[inrow];
for (colctr = 0; 2 * colctr < output_width; colctr += 16) {
uint8x16_t samples = vld1q_u8(inptr + colctr);
/* Duplicate the samples. The store operation below interleaves them so
* that adjacent pixel component values take on the same sample value,
* per above.
*/
uint8x16x2_t output_pixels = { { samples, samples } };
/* Store pixel component values to memory.
* Due to the way sample buffers are allocated, we don't need to worry
* about tail cases when output_width is not a multiple of 32. See
* "Creation of 2-D sample arrays" in jmemmgr.c for details.
*/
vst2q_u8(outptr + 2 * colctr, output_pixels);
}
}
}
/* The diagram below shows an array of samples produced by h2v2 downsampling.
*
* s0 s1
* +---------+---------+
* | p0 p1 | p2 p3 |
* sA | | |
* | p4 p5 | p6 p7 |
* +---------+---------+
* | p8 p9 | p10 p11|
* sB | | |
* | p12 p13| p14 p15|
* +---------+---------+
*
* Samples s0A-s1B were created by averaging the original pixel component
* values centered at positions p0-p15 above. To approximate those original
* pixel component values, we duplicate the samples both horizontally and
* vertically:
* p0(upsampled) = p1(upsampled) = p4(upsampled) = p5(upsampled) = s0A
* p2(upsampled) = p3(upsampled) = p6(upsampled) = p7(upsampled) = s1A
* p8(upsampled) = p9(upsampled) = p12(upsampled) = p13(upsampled) = s0B
* p10(upsampled) = p11(upsampled) = p14(upsampled) = p15(upsampled) = s1B
*/
void jsimd_h2v2_upsample_neon(int max_v_samp_factor, JDIMENSION output_width,
JSAMPARRAY input_data,
JSAMPARRAY *output_data_ptr)
{
JSAMPARRAY output_data = *output_data_ptr;
JSAMPROW inptr, outptr0, outptr1;
int inrow, outrow;
unsigned colctr;
for (inrow = 0, outrow = 0; outrow < max_v_samp_factor; inrow++) {
inptr = input_data[inrow];
outptr0 = output_data[outrow++];
outptr1 = output_data[outrow++];
for (colctr = 0; 2 * colctr < output_width; colctr += 16) {
uint8x16_t samples = vld1q_u8(inptr + colctr);
/* Duplicate the samples. The store operation below interleaves them so
* that adjacent pixel component values take on the same sample value,
* per above.
*/
uint8x16x2_t output_pixels = { { samples, samples } };
/* Store pixel component values for both output rows to memory.
* Due to the way sample buffers are allocated, we don't need to worry
* about tail cases when output_width is not a multiple of 32. See
* "Creation of 2-D sample arrays" in jmemmgr.c for details.
*/
vst2q_u8(outptr0 + 2 * colctr, output_pixels);
vst2q_u8(outptr1 + 2 * colctr, output_pixels);
}
}
}

214
simd/arm/jfdctfst-neon.c Normal file
View File

@@ -0,0 +1,214 @@
/*
* jfdctfst-neon.c - fast integer FDCT (Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute it
* freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must not
* claim that you wrote the original software. If you use this software
* in a product, an acknowledgment in the product documentation would be
* appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must not be
* misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source distribution.
*/
#define JPEG_INTERNALS
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../jsimd.h"
#include "align.h"
#include <arm_neon.h>
/* jsimd_fdct_ifast_neon() performs a fast, not so accurate forward DCT
* (Discrete Cosine Transform) on one block of samples. It uses the same
* calculations and produces exactly the same output as IJG's original
* jpeg_fdct_ifast() function, which can be found in jfdctfst.c.
*
* Scaled integer constants are used to avoid floating-point arithmetic:
* 0.382683433 = 12544 * 2^-15
 *    0.541196100 = 17792 * 2^-15
* 0.707106781 = 23168 * 2^-15
* 0.306562965 = 9984 * 2^-15
*
* See jfdctfst.c for further details of the DCT algorithm. Where possible,
* the variable names and comments here in jsimd_fdct_ifast_neon() match up
* with those in jpeg_fdct_ifast().
*/
#define F_0_382 12544
#define F_0_541 17792
#define F_0_707 23168
#define F_0_306 9984
ALIGN(16) static const int16_t jsimd_fdct_ifast_neon_consts[] = {
F_0_382, F_0_541, F_0_707, F_0_306
};
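/* Explanatory aside (added; not part of the algorithm description above):
 * vqdmulhq_lane_s16(x, consts, n) returns the high half of the doubled
 * product, i.e. approximately (x * consts[n]) >> 15.  Because each constant
 * above is the desired factor scaled by 2^15, the intrinsic computes
 * x * factor while staying in 16-bit arithmetic; for example,
 * vqdmulhq_lane_s16(x, consts, 2) ~= x * 0.707106781.
 */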
void jsimd_fdct_ifast_neon(DCTELEM *data)
{
/* Load an 8x8 block of samples into Neon registers. De-interleaving loads
* are used, followed by vuzp to transpose the block such that we have a
* column of samples per vector - allowing all rows to be processed at once.
*/
int16x8x4_t data1 = vld4q_s16(data);
int16x8x4_t data2 = vld4q_s16(data + 4 * DCTSIZE);
int16x8x2_t cols_04 = vuzpq_s16(data1.val[0], data2.val[0]);
int16x8x2_t cols_15 = vuzpq_s16(data1.val[1], data2.val[1]);
int16x8x2_t cols_26 = vuzpq_s16(data1.val[2], data2.val[2]);
int16x8x2_t cols_37 = vuzpq_s16(data1.val[3], data2.val[3]);
int16x8_t col0 = cols_04.val[0];
int16x8_t col1 = cols_15.val[0];
int16x8_t col2 = cols_26.val[0];
int16x8_t col3 = cols_37.val[0];
int16x8_t col4 = cols_04.val[1];
int16x8_t col5 = cols_15.val[1];
int16x8_t col6 = cols_26.val[1];
int16x8_t col7 = cols_37.val[1];
/* Pass 1: process rows. */
/* Load DCT conversion constants. */
const int16x4_t consts = vld1_s16(jsimd_fdct_ifast_neon_consts);
int16x8_t tmp0 = vaddq_s16(col0, col7);
int16x8_t tmp7 = vsubq_s16(col0, col7);
int16x8_t tmp1 = vaddq_s16(col1, col6);
int16x8_t tmp6 = vsubq_s16(col1, col6);
int16x8_t tmp2 = vaddq_s16(col2, col5);
int16x8_t tmp5 = vsubq_s16(col2, col5);
int16x8_t tmp3 = vaddq_s16(col3, col4);
int16x8_t tmp4 = vsubq_s16(col3, col4);
/* Even part */
int16x8_t tmp10 = vaddq_s16(tmp0, tmp3); /* phase 2 */
int16x8_t tmp13 = vsubq_s16(tmp0, tmp3);
int16x8_t tmp11 = vaddq_s16(tmp1, tmp2);
int16x8_t tmp12 = vsubq_s16(tmp1, tmp2);
col0 = vaddq_s16(tmp10, tmp11); /* phase 3 */
col4 = vsubq_s16(tmp10, tmp11);
int16x8_t z1 = vqdmulhq_lane_s16(vaddq_s16(tmp12, tmp13), consts, 2);
col2 = vaddq_s16(tmp13, z1); /* phase 5 */
col6 = vsubq_s16(tmp13, z1);
/* Odd part */
tmp10 = vaddq_s16(tmp4, tmp5); /* phase 2 */
tmp11 = vaddq_s16(tmp5, tmp6);
tmp12 = vaddq_s16(tmp6, tmp7);
int16x8_t z5 = vqdmulhq_lane_s16(vsubq_s16(tmp10, tmp12), consts, 0);
int16x8_t z2 = vqdmulhq_lane_s16(tmp10, consts, 1);
z2 = vaddq_s16(z2, z5);
int16x8_t z4 = vqdmulhq_lane_s16(tmp12, consts, 3);
z5 = vaddq_s16(tmp12, z5);
z4 = vaddq_s16(z4, z5);
int16x8_t z3 = vqdmulhq_lane_s16(tmp11, consts, 2);
int16x8_t z11 = vaddq_s16(tmp7, z3); /* phase 5 */
int16x8_t z13 = vsubq_s16(tmp7, z3);
col5 = vaddq_s16(z13, z2); /* phase 6 */
col3 = vsubq_s16(z13, z2);
col1 = vaddq_s16(z11, z4);
col7 = vsubq_s16(z11, z4);
/* Transpose to work on columns in pass 2. */
int16x8x2_t cols_01 = vtrnq_s16(col0, col1);
int16x8x2_t cols_23 = vtrnq_s16(col2, col3);
int16x8x2_t cols_45 = vtrnq_s16(col4, col5);
int16x8x2_t cols_67 = vtrnq_s16(col6, col7);
int32x4x2_t cols_0145_l = vtrnq_s32(vreinterpretq_s32_s16(cols_01.val[0]),
vreinterpretq_s32_s16(cols_45.val[0]));
int32x4x2_t cols_0145_h = vtrnq_s32(vreinterpretq_s32_s16(cols_01.val[1]),
vreinterpretq_s32_s16(cols_45.val[1]));
int32x4x2_t cols_2367_l = vtrnq_s32(vreinterpretq_s32_s16(cols_23.val[0]),
vreinterpretq_s32_s16(cols_67.val[0]));
int32x4x2_t cols_2367_h = vtrnq_s32(vreinterpretq_s32_s16(cols_23.val[1]),
vreinterpretq_s32_s16(cols_67.val[1]));
int32x4x2_t rows_04 = vzipq_s32(cols_0145_l.val[0], cols_2367_l.val[0]);
int32x4x2_t rows_15 = vzipq_s32(cols_0145_h.val[0], cols_2367_h.val[0]);
int32x4x2_t rows_26 = vzipq_s32(cols_0145_l.val[1], cols_2367_l.val[1]);
int32x4x2_t rows_37 = vzipq_s32(cols_0145_h.val[1], cols_2367_h.val[1]);
int16x8_t row0 = vreinterpretq_s16_s32(rows_04.val[0]);
int16x8_t row1 = vreinterpretq_s16_s32(rows_15.val[0]);
int16x8_t row2 = vreinterpretq_s16_s32(rows_26.val[0]);
int16x8_t row3 = vreinterpretq_s16_s32(rows_37.val[0]);
int16x8_t row4 = vreinterpretq_s16_s32(rows_04.val[1]);
int16x8_t row5 = vreinterpretq_s16_s32(rows_15.val[1]);
int16x8_t row6 = vreinterpretq_s16_s32(rows_26.val[1]);
int16x8_t row7 = vreinterpretq_s16_s32(rows_37.val[1]);
/* Pass 2: process columns. */
tmp0 = vaddq_s16(row0, row7);
tmp7 = vsubq_s16(row0, row7);
tmp1 = vaddq_s16(row1, row6);
tmp6 = vsubq_s16(row1, row6);
tmp2 = vaddq_s16(row2, row5);
tmp5 = vsubq_s16(row2, row5);
tmp3 = vaddq_s16(row3, row4);
tmp4 = vsubq_s16(row3, row4);
/* Even part */
tmp10 = vaddq_s16(tmp0, tmp3); /* phase 2 */
tmp13 = vsubq_s16(tmp0, tmp3);
tmp11 = vaddq_s16(tmp1, tmp2);
tmp12 = vsubq_s16(tmp1, tmp2);
row0 = vaddq_s16(tmp10, tmp11); /* phase 3 */
row4 = vsubq_s16(tmp10, tmp11);
z1 = vqdmulhq_lane_s16(vaddq_s16(tmp12, tmp13), consts, 2);
row2 = vaddq_s16(tmp13, z1); /* phase 5 */
row6 = vsubq_s16(tmp13, z1);
/* Odd part */
tmp10 = vaddq_s16(tmp4, tmp5); /* phase 2 */
tmp11 = vaddq_s16(tmp5, tmp6);
tmp12 = vaddq_s16(tmp6, tmp7);
z5 = vqdmulhq_lane_s16(vsubq_s16(tmp10, tmp12), consts, 0);
z2 = vqdmulhq_lane_s16(tmp10, consts, 1);
z2 = vaddq_s16(z2, z5);
z4 = vqdmulhq_lane_s16(tmp12, consts, 3);
z5 = vaddq_s16(tmp12, z5);
z4 = vaddq_s16(z4, z5);
z3 = vqdmulhq_lane_s16(tmp11, consts, 2);
z11 = vaddq_s16(tmp7, z3); /* phase 5 */
z13 = vsubq_s16(tmp7, z3);
row5 = vaddq_s16(z13, z2); /* phase 6 */
row3 = vsubq_s16(z13, z2);
row1 = vaddq_s16(z11, z4);
row7 = vsubq_s16(z11, z4);
vst1q_s16(data + 0 * DCTSIZE, row0);
vst1q_s16(data + 1 * DCTSIZE, row1);
vst1q_s16(data + 2 * DCTSIZE, row2);
vst1q_s16(data + 3 * DCTSIZE, row3);
vst1q_s16(data + 4 * DCTSIZE, row4);
vst1q_s16(data + 5 * DCTSIZE, row5);
vst1q_s16(data + 6 * DCTSIZE, row6);
vst1q_s16(data + 7 * DCTSIZE, row7);
}

376
simd/arm/jfdctint-neon.c Normal file
View File

@@ -0,0 +1,376 @@
/*
* jfdctint-neon.c - accurate integer FDCT (Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
* Copyright (C) 2020, D. R. Commander. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute it
* freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must not
* claim that you wrote the original software. If you use this software
* in a product, an acknowledgment in the product documentation would be
* appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must not be
* misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source distribution.
*/
#define JPEG_INTERNALS
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../jsimd.h"
#include "align.h"
#include "neon-compat.h"
#include <arm_neon.h>
/* jsimd_fdct_islow_neon() performs a slower but more accurate forward DCT
* (Discrete Cosine Transform) on one block of samples. It uses the same
* calculations and produces exactly the same output as IJG's original
* jpeg_fdct_islow() function, which can be found in jfdctint.c.
*
* Scaled integer constants are used to avoid floating-point arithmetic:
* 0.298631336 = 2446 * 2^-13
* 0.390180644 = 3196 * 2^-13
* 0.541196100 = 4433 * 2^-13
* 0.765366865 = 6270 * 2^-13
* 0.899976223 = 7373 * 2^-13
* 1.175875602 = 9633 * 2^-13
* 1.501321110 = 12299 * 2^-13
* 1.847759065 = 15137 * 2^-13
* 1.961570560 = 16069 * 2^-13
* 2.053119869 = 16819 * 2^-13
* 2.562915447 = 20995 * 2^-13
* 3.072711026 = 25172 * 2^-13
*
* See jfdctint.c for further details of the DCT algorithm. Where possible,
* the variable names and comments here in jsimd_fdct_islow_neon() match up
* with those in jpeg_fdct_islow().
*/
#define CONST_BITS 13
#define PASS1_BITS 2
#define DESCALE_P1 (CONST_BITS - PASS1_BITS)
#define DESCALE_P2 (CONST_BITS + PASS1_BITS)
#define F_0_298 2446
#define F_0_390 3196
#define F_0_541 4433
#define F_0_765 6270
#define F_0_899 7373
#define F_1_175 9633
#define F_1_501 12299
#define F_1_847 15137
#define F_1_961 16069
#define F_2_053 16819
#define F_2_562 20995
#define F_3_072 25172
ALIGN(16) static const int16_t jsimd_fdct_islow_neon_consts[] = {
F_0_298, -F_0_390, F_0_541, F_0_765,
-F_0_899, F_1_175, F_1_501, -F_1_847,
-F_1_961, F_2_053, -F_2_562, F_3_072
};
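/* Explanatory aside (added for clarity): the three 4-lane vectors loaded
 * from this table are indexed below with vmull_lane_s16()/vmlal_lane_s16(),
 * which widen to 32-bit products.  The subsequent vrshrn_n_s32(x, DESCALE_P1)
 * and vrshrn_n_s32(x, DESCALE_P2) calls perform the rounding right-shifts
 * that correspond to DESCALE(x, CONST_BITS - PASS1_BITS) and
 * DESCALE(x, CONST_BITS + PASS1_BITS) in jfdctint.c.
 */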
void jsimd_fdct_islow_neon(DCTELEM *data)
{
/* Load DCT constants. */
#ifdef HAVE_VLD1_S16_X3
const int16x4x3_t consts = vld1_s16_x3(jsimd_fdct_islow_neon_consts);
#else
/* GCC does not currently support the intrinsic vld1_<type>_x3(). */
const int16x4_t consts1 = vld1_s16(jsimd_fdct_islow_neon_consts);
const int16x4_t consts2 = vld1_s16(jsimd_fdct_islow_neon_consts + 4);
const int16x4_t consts3 = vld1_s16(jsimd_fdct_islow_neon_consts + 8);
const int16x4x3_t consts = { { consts1, consts2, consts3 } };
#endif
/* Load an 8x8 block of samples into Neon registers. De-interleaving loads
* are used, followed by vuzp to transpose the block such that we have a
* column of samples per vector - allowing all rows to be processed at once.
*/
int16x8x4_t s_rows_0123 = vld4q_s16(data);
int16x8x4_t s_rows_4567 = vld4q_s16(data + 4 * DCTSIZE);
int16x8x2_t cols_04 = vuzpq_s16(s_rows_0123.val[0], s_rows_4567.val[0]);
int16x8x2_t cols_15 = vuzpq_s16(s_rows_0123.val[1], s_rows_4567.val[1]);
int16x8x2_t cols_26 = vuzpq_s16(s_rows_0123.val[2], s_rows_4567.val[2]);
int16x8x2_t cols_37 = vuzpq_s16(s_rows_0123.val[3], s_rows_4567.val[3]);
int16x8_t col0 = cols_04.val[0];
int16x8_t col1 = cols_15.val[0];
int16x8_t col2 = cols_26.val[0];
int16x8_t col3 = cols_37.val[0];
int16x8_t col4 = cols_04.val[1];
int16x8_t col5 = cols_15.val[1];
int16x8_t col6 = cols_26.val[1];
int16x8_t col7 = cols_37.val[1];
/* Pass 1: process rows. */
int16x8_t tmp0 = vaddq_s16(col0, col7);
int16x8_t tmp7 = vsubq_s16(col0, col7);
int16x8_t tmp1 = vaddq_s16(col1, col6);
int16x8_t tmp6 = vsubq_s16(col1, col6);
int16x8_t tmp2 = vaddq_s16(col2, col5);
int16x8_t tmp5 = vsubq_s16(col2, col5);
int16x8_t tmp3 = vaddq_s16(col3, col4);
int16x8_t tmp4 = vsubq_s16(col3, col4);
/* Even part */
int16x8_t tmp10 = vaddq_s16(tmp0, tmp3);
int16x8_t tmp13 = vsubq_s16(tmp0, tmp3);
int16x8_t tmp11 = vaddq_s16(tmp1, tmp2);
int16x8_t tmp12 = vsubq_s16(tmp1, tmp2);
col0 = vshlq_n_s16(vaddq_s16(tmp10, tmp11), PASS1_BITS);
col4 = vshlq_n_s16(vsubq_s16(tmp10, tmp11), PASS1_BITS);
int16x8_t tmp12_add_tmp13 = vaddq_s16(tmp12, tmp13);
int32x4_t z1_l =
vmull_lane_s16(vget_low_s16(tmp12_add_tmp13), consts.val[0], 2);
int32x4_t z1_h =
vmull_lane_s16(vget_high_s16(tmp12_add_tmp13), consts.val[0], 2);
int32x4_t col2_scaled_l =
vmlal_lane_s16(z1_l, vget_low_s16(tmp13), consts.val[0], 3);
int32x4_t col2_scaled_h =
vmlal_lane_s16(z1_h, vget_high_s16(tmp13), consts.val[0], 3);
col2 = vcombine_s16(vrshrn_n_s32(col2_scaled_l, DESCALE_P1),
vrshrn_n_s32(col2_scaled_h, DESCALE_P1));
int32x4_t col6_scaled_l =
vmlal_lane_s16(z1_l, vget_low_s16(tmp12), consts.val[1], 3);
int32x4_t col6_scaled_h =
vmlal_lane_s16(z1_h, vget_high_s16(tmp12), consts.val[1], 3);
col6 = vcombine_s16(vrshrn_n_s32(col6_scaled_l, DESCALE_P1),
vrshrn_n_s32(col6_scaled_h, DESCALE_P1));
/* Odd part */
int16x8_t z1 = vaddq_s16(tmp4, tmp7);
int16x8_t z2 = vaddq_s16(tmp5, tmp6);
int16x8_t z3 = vaddq_s16(tmp4, tmp6);
int16x8_t z4 = vaddq_s16(tmp5, tmp7);
/* sqrt(2) * c3 */
int32x4_t z5_l = vmull_lane_s16(vget_low_s16(z3), consts.val[1], 1);
int32x4_t z5_h = vmull_lane_s16(vget_high_s16(z3), consts.val[1], 1);
z5_l = vmlal_lane_s16(z5_l, vget_low_s16(z4), consts.val[1], 1);
z5_h = vmlal_lane_s16(z5_h, vget_high_s16(z4), consts.val[1], 1);
/* sqrt(2) * (-c1+c3+c5-c7) */
int32x4_t tmp4_l = vmull_lane_s16(vget_low_s16(tmp4), consts.val[0], 0);
int32x4_t tmp4_h = vmull_lane_s16(vget_high_s16(tmp4), consts.val[0], 0);
/* sqrt(2) * ( c1+c3-c5+c7) */
int32x4_t tmp5_l = vmull_lane_s16(vget_low_s16(tmp5), consts.val[2], 1);
int32x4_t tmp5_h = vmull_lane_s16(vget_high_s16(tmp5), consts.val[2], 1);
/* sqrt(2) * ( c1+c3+c5-c7) */
int32x4_t tmp6_l = vmull_lane_s16(vget_low_s16(tmp6), consts.val[2], 3);
int32x4_t tmp6_h = vmull_lane_s16(vget_high_s16(tmp6), consts.val[2], 3);
/* sqrt(2) * ( c1+c3-c5-c7) */
int32x4_t tmp7_l = vmull_lane_s16(vget_low_s16(tmp7), consts.val[1], 2);
int32x4_t tmp7_h = vmull_lane_s16(vget_high_s16(tmp7), consts.val[1], 2);
/* sqrt(2) * (c7-c3) */
z1_l = vmull_lane_s16(vget_low_s16(z1), consts.val[1], 0);
z1_h = vmull_lane_s16(vget_high_s16(z1), consts.val[1], 0);
/* sqrt(2) * (-c1-c3) */
int32x4_t z2_l = vmull_lane_s16(vget_low_s16(z2), consts.val[2], 2);
int32x4_t z2_h = vmull_lane_s16(vget_high_s16(z2), consts.val[2], 2);
/* sqrt(2) * (-c3-c5) */
int32x4_t z3_l = vmull_lane_s16(vget_low_s16(z3), consts.val[2], 0);
int32x4_t z3_h = vmull_lane_s16(vget_high_s16(z3), consts.val[2], 0);
/* sqrt(2) * (c5-c3) */
int32x4_t z4_l = vmull_lane_s16(vget_low_s16(z4), consts.val[0], 1);
int32x4_t z4_h = vmull_lane_s16(vget_high_s16(z4), consts.val[0], 1);
z3_l = vaddq_s32(z3_l, z5_l);
z3_h = vaddq_s32(z3_h, z5_h);
z4_l = vaddq_s32(z4_l, z5_l);
z4_h = vaddq_s32(z4_h, z5_h);
tmp4_l = vaddq_s32(tmp4_l, z1_l);
tmp4_h = vaddq_s32(tmp4_h, z1_h);
tmp4_l = vaddq_s32(tmp4_l, z3_l);
tmp4_h = vaddq_s32(tmp4_h, z3_h);
col7 = vcombine_s16(vrshrn_n_s32(tmp4_l, DESCALE_P1),
vrshrn_n_s32(tmp4_h, DESCALE_P1));
tmp5_l = vaddq_s32(tmp5_l, z2_l);
tmp5_h = vaddq_s32(tmp5_h, z2_h);
tmp5_l = vaddq_s32(tmp5_l, z4_l);
tmp5_h = vaddq_s32(tmp5_h, z4_h);
col5 = vcombine_s16(vrshrn_n_s32(tmp5_l, DESCALE_P1),
vrshrn_n_s32(tmp5_h, DESCALE_P1));
tmp6_l = vaddq_s32(tmp6_l, z2_l);
tmp6_h = vaddq_s32(tmp6_h, z2_h);
tmp6_l = vaddq_s32(tmp6_l, z3_l);
tmp6_h = vaddq_s32(tmp6_h, z3_h);
col3 = vcombine_s16(vrshrn_n_s32(tmp6_l, DESCALE_P1),
vrshrn_n_s32(tmp6_h, DESCALE_P1));
tmp7_l = vaddq_s32(tmp7_l, z1_l);
tmp7_h = vaddq_s32(tmp7_h, z1_h);
tmp7_l = vaddq_s32(tmp7_l, z4_l);
tmp7_h = vaddq_s32(tmp7_h, z4_h);
col1 = vcombine_s16(vrshrn_n_s32(tmp7_l, DESCALE_P1),
vrshrn_n_s32(tmp7_h, DESCALE_P1));
/* Transpose to work on columns in pass 2. */
int16x8x2_t cols_01 = vtrnq_s16(col0, col1);
int16x8x2_t cols_23 = vtrnq_s16(col2, col3);
int16x8x2_t cols_45 = vtrnq_s16(col4, col5);
int16x8x2_t cols_67 = vtrnq_s16(col6, col7);
int32x4x2_t cols_0145_l = vtrnq_s32(vreinterpretq_s32_s16(cols_01.val[0]),
vreinterpretq_s32_s16(cols_45.val[0]));
int32x4x2_t cols_0145_h = vtrnq_s32(vreinterpretq_s32_s16(cols_01.val[1]),
vreinterpretq_s32_s16(cols_45.val[1]));
int32x4x2_t cols_2367_l = vtrnq_s32(vreinterpretq_s32_s16(cols_23.val[0]),
vreinterpretq_s32_s16(cols_67.val[0]));
int32x4x2_t cols_2367_h = vtrnq_s32(vreinterpretq_s32_s16(cols_23.val[1]),
vreinterpretq_s32_s16(cols_67.val[1]));
int32x4x2_t rows_04 = vzipq_s32(cols_0145_l.val[0], cols_2367_l.val[0]);
int32x4x2_t rows_15 = vzipq_s32(cols_0145_h.val[0], cols_2367_h.val[0]);
int32x4x2_t rows_26 = vzipq_s32(cols_0145_l.val[1], cols_2367_l.val[1]);
int32x4x2_t rows_37 = vzipq_s32(cols_0145_h.val[1], cols_2367_h.val[1]);
int16x8_t row0 = vreinterpretq_s16_s32(rows_04.val[0]);
int16x8_t row1 = vreinterpretq_s16_s32(rows_15.val[0]);
int16x8_t row2 = vreinterpretq_s16_s32(rows_26.val[0]);
int16x8_t row3 = vreinterpretq_s16_s32(rows_37.val[0]);
int16x8_t row4 = vreinterpretq_s16_s32(rows_04.val[1]);
int16x8_t row5 = vreinterpretq_s16_s32(rows_15.val[1]);
int16x8_t row6 = vreinterpretq_s16_s32(rows_26.val[1]);
int16x8_t row7 = vreinterpretq_s16_s32(rows_37.val[1]);
/* Pass 2: process columns. */
tmp0 = vaddq_s16(row0, row7);
tmp7 = vsubq_s16(row0, row7);
tmp1 = vaddq_s16(row1, row6);
tmp6 = vsubq_s16(row1, row6);
tmp2 = vaddq_s16(row2, row5);
tmp5 = vsubq_s16(row2, row5);
tmp3 = vaddq_s16(row3, row4);
tmp4 = vsubq_s16(row3, row4);
/* Even part */
tmp10 = vaddq_s16(tmp0, tmp3);
tmp13 = vsubq_s16(tmp0, tmp3);
tmp11 = vaddq_s16(tmp1, tmp2);
tmp12 = vsubq_s16(tmp1, tmp2);
row0 = vrshrq_n_s16(vaddq_s16(tmp10, tmp11), PASS1_BITS);
row4 = vrshrq_n_s16(vsubq_s16(tmp10, tmp11), PASS1_BITS);
tmp12_add_tmp13 = vaddq_s16(tmp12, tmp13);
z1_l = vmull_lane_s16(vget_low_s16(tmp12_add_tmp13), consts.val[0], 2);
z1_h = vmull_lane_s16(vget_high_s16(tmp12_add_tmp13), consts.val[0], 2);
int32x4_t row2_scaled_l =
vmlal_lane_s16(z1_l, vget_low_s16(tmp13), consts.val[0], 3);
int32x4_t row2_scaled_h =
vmlal_lane_s16(z1_h, vget_high_s16(tmp13), consts.val[0], 3);
row2 = vcombine_s16(vrshrn_n_s32(row2_scaled_l, DESCALE_P2),
vrshrn_n_s32(row2_scaled_h, DESCALE_P2));
int32x4_t row6_scaled_l =
vmlal_lane_s16(z1_l, vget_low_s16(tmp12), consts.val[1], 3);
int32x4_t row6_scaled_h =
vmlal_lane_s16(z1_h, vget_high_s16(tmp12), consts.val[1], 3);
row6 = vcombine_s16(vrshrn_n_s32(row6_scaled_l, DESCALE_P2),
vrshrn_n_s32(row6_scaled_h, DESCALE_P2));
/* Odd part */
z1 = vaddq_s16(tmp4, tmp7);
z2 = vaddq_s16(tmp5, tmp6);
z3 = vaddq_s16(tmp4, tmp6);
z4 = vaddq_s16(tmp5, tmp7);
/* sqrt(2) * c3 */
z5_l = vmull_lane_s16(vget_low_s16(z3), consts.val[1], 1);
z5_h = vmull_lane_s16(vget_high_s16(z3), consts.val[1], 1);
z5_l = vmlal_lane_s16(z5_l, vget_low_s16(z4), consts.val[1], 1);
z5_h = vmlal_lane_s16(z5_h, vget_high_s16(z4), consts.val[1], 1);
/* sqrt(2) * (-c1+c3+c5-c7) */
tmp4_l = vmull_lane_s16(vget_low_s16(tmp4), consts.val[0], 0);
tmp4_h = vmull_lane_s16(vget_high_s16(tmp4), consts.val[0], 0);
/* sqrt(2) * ( c1+c3-c5+c7) */
tmp5_l = vmull_lane_s16(vget_low_s16(tmp5), consts.val[2], 1);
tmp5_h = vmull_lane_s16(vget_high_s16(tmp5), consts.val[2], 1);
/* sqrt(2) * ( c1+c3+c5-c7) */
tmp6_l = vmull_lane_s16(vget_low_s16(tmp6), consts.val[2], 3);
tmp6_h = vmull_lane_s16(vget_high_s16(tmp6), consts.val[2], 3);
/* sqrt(2) * ( c1+c3-c5-c7) */
tmp7_l = vmull_lane_s16(vget_low_s16(tmp7), consts.val[1], 2);
tmp7_h = vmull_lane_s16(vget_high_s16(tmp7), consts.val[1], 2);
/* sqrt(2) * (c7-c3) */
z1_l = vmull_lane_s16(vget_low_s16(z1), consts.val[1], 0);
z1_h = vmull_lane_s16(vget_high_s16(z1), consts.val[1], 0);
/* sqrt(2) * (-c1-c3) */
z2_l = vmull_lane_s16(vget_low_s16(z2), consts.val[2], 2);
z2_h = vmull_lane_s16(vget_high_s16(z2), consts.val[2], 2);
/* sqrt(2) * (-c3-c5) */
z3_l = vmull_lane_s16(vget_low_s16(z3), consts.val[2], 0);
z3_h = vmull_lane_s16(vget_high_s16(z3), consts.val[2], 0);
/* sqrt(2) * (c5-c3) */
z4_l = vmull_lane_s16(vget_low_s16(z4), consts.val[0], 1);
z4_h = vmull_lane_s16(vget_high_s16(z4), consts.val[0], 1);
z3_l = vaddq_s32(z3_l, z5_l);
z3_h = vaddq_s32(z3_h, z5_h);
z4_l = vaddq_s32(z4_l, z5_l);
z4_h = vaddq_s32(z4_h, z5_h);
tmp4_l = vaddq_s32(tmp4_l, z1_l);
tmp4_h = vaddq_s32(tmp4_h, z1_h);
tmp4_l = vaddq_s32(tmp4_l, z3_l);
tmp4_h = vaddq_s32(tmp4_h, z3_h);
row7 = vcombine_s16(vrshrn_n_s32(tmp4_l, DESCALE_P2),
vrshrn_n_s32(tmp4_h, DESCALE_P2));
tmp5_l = vaddq_s32(tmp5_l, z2_l);
tmp5_h = vaddq_s32(tmp5_h, z2_h);
tmp5_l = vaddq_s32(tmp5_l, z4_l);
tmp5_h = vaddq_s32(tmp5_h, z4_h);
row5 = vcombine_s16(vrshrn_n_s32(tmp5_l, DESCALE_P2),
vrshrn_n_s32(tmp5_h, DESCALE_P2));
tmp6_l = vaddq_s32(tmp6_l, z2_l);
tmp6_h = vaddq_s32(tmp6_h, z2_h);
tmp6_l = vaddq_s32(tmp6_l, z3_l);
tmp6_h = vaddq_s32(tmp6_h, z3_h);
row3 = vcombine_s16(vrshrn_n_s32(tmp6_l, DESCALE_P2),
vrshrn_n_s32(tmp6_h, DESCALE_P2));
tmp7_l = vaddq_s32(tmp7_l, z1_l);
tmp7_h = vaddq_s32(tmp7_h, z1_h);
tmp7_l = vaddq_s32(tmp7_l, z4_l);
tmp7_h = vaddq_s32(tmp7_h, z4_h);
row1 = vcombine_s16(vrshrn_n_s32(tmp7_l, DESCALE_P2),
vrshrn_n_s32(tmp7_h, DESCALE_P2));
vst1q_s16(data + 0 * DCTSIZE, row0);
vst1q_s16(data + 1 * DCTSIZE, row1);
vst1q_s16(data + 2 * DCTSIZE, row2);
vst1q_s16(data + 3 * DCTSIZE, row3);
vst1q_s16(data + 4 * DCTSIZE, row4);
vst1q_s16(data + 5 * DCTSIZE, row5);
vst1q_s16(data + 6 * DCTSIZE, row6);
vst1q_s16(data + 7 * DCTSIZE, row7);
}

472
simd/arm/jidctfst-neon.c Normal file
View File

@@ -0,0 +1,472 @@
/*
* jidctfst-neon.c - fast integer IDCT (Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute it
* freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must not
* claim that you wrote the original software. If you use this software
* in a product, an acknowledgment in the product documentation would be
* appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must not be
* misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source distribution.
*/
#define JPEG_INTERNALS
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../jsimd.h"
#include "align.h"
#include <arm_neon.h>
/* jsimd_idct_ifast_neon() performs dequantization and a fast, not so accurate
* inverse DCT (Discrete Cosine Transform) on one block of coefficients. It
* uses the same calculations and produces exactly the same output as IJG's
* original jpeg_idct_ifast() function, which can be found in jidctfst.c.
*
* Scaled integer constants are used to avoid floating-point arithmetic:
* 0.082392200 = 2688 * 2^-15
* 0.414213562 = 13568 * 2^-15
* 0.847759065 = 27776 * 2^-15
* 0.613125930 = 20096 * 2^-15
*
* See jidctfst.c for further details of the IDCT algorithm. Where possible,
* the variable names and comments here in jsimd_idct_ifast_neon() match up
* with those in jpeg_idct_ifast().
*/
#define PASS1_BITS 2
#define F_0_082 2688
#define F_0_414 13568
#define F_0_847 27776
#define F_0_613 20096
ALIGN(16) static const int16_t jsimd_idct_ifast_neon_consts[] = {
F_0_082, F_0_414, F_0_847, F_0_613
};
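/* Explanatory aside (added for clarity): constants >= 1.0 are handled by
 * multiplying by the fractional part with vqdmulh[q]_lane_s16(), which
 * computes approximately (x * const) >> 15, and then adding x back in:
 *   x * 1.414213562 ~= x + vqdmulh_lane_s16(x, consts, 1)
 *   x * 2.613125930 ~= 2 * x + vqdmulh_lane_s16(x, consts, 3)
 * This is how the MULTIPLY() calls from jpeg_idct_ifast() appear below.
 */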
void jsimd_idct_ifast_neon(void *dct_table, JCOEFPTR coef_block,
JSAMPARRAY output_buf, JDIMENSION output_col)
{
IFAST_MULT_TYPE *quantptr = dct_table;
/* Load DCT coefficients. */
int16x8_t row0 = vld1q_s16(coef_block + 0 * DCTSIZE);
int16x8_t row1 = vld1q_s16(coef_block + 1 * DCTSIZE);
int16x8_t row2 = vld1q_s16(coef_block + 2 * DCTSIZE);
int16x8_t row3 = vld1q_s16(coef_block + 3 * DCTSIZE);
int16x8_t row4 = vld1q_s16(coef_block + 4 * DCTSIZE);
int16x8_t row5 = vld1q_s16(coef_block + 5 * DCTSIZE);
int16x8_t row6 = vld1q_s16(coef_block + 6 * DCTSIZE);
int16x8_t row7 = vld1q_s16(coef_block + 7 * DCTSIZE);
/* Load quantization table values for DC coefficients. */
int16x8_t quant_row0 = vld1q_s16(quantptr + 0 * DCTSIZE);
/* Dequantize DC coefficients. */
row0 = vmulq_s16(row0, quant_row0);
/* Construct bitmap to test if all AC coefficients are 0. */
int16x8_t bitmap = vorrq_s16(row1, row2);
bitmap = vorrq_s16(bitmap, row3);
bitmap = vorrq_s16(bitmap, row4);
bitmap = vorrq_s16(bitmap, row5);
bitmap = vorrq_s16(bitmap, row6);
bitmap = vorrq_s16(bitmap, row7);
int64_t left_ac_bitmap = vgetq_lane_s64(vreinterpretq_s64_s16(bitmap), 0);
int64_t right_ac_bitmap = vgetq_lane_s64(vreinterpretq_s64_s16(bitmap), 1);
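  /* Note added for clarity: reinterpreting the 16-bit bitmap as two 64-bit
   * lanes means that left_ac_bitmap covers columns 0-3 and right_ac_bitmap
   * covers columns 4-7, so each half of the block can take the DC-only
   * shortcut below independently.
   */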
/* Load IDCT conversion constants. */
const int16x4_t consts = vld1_s16(jsimd_idct_ifast_neon_consts);
if (left_ac_bitmap == 0 && right_ac_bitmap == 0) {
/* All AC coefficients are zero.
* Compute DC values and duplicate into vectors.
*/
int16x8_t dcval = row0;
row1 = dcval;
row2 = dcval;
row3 = dcval;
row4 = dcval;
row5 = dcval;
row6 = dcval;
row7 = dcval;
} else if (left_ac_bitmap == 0) {
/* AC coefficients are zero for columns 0, 1, 2, and 3.
* Use DC values for these columns.
*/
int16x4_t dcval = vget_low_s16(row0);
/* Commence regular fast IDCT computation for columns 4, 5, 6, and 7. */
/* Load quantization table. */
int16x4_t quant_row1 = vld1_s16(quantptr + 1 * DCTSIZE + 4);
int16x4_t quant_row2 = vld1_s16(quantptr + 2 * DCTSIZE + 4);
int16x4_t quant_row3 = vld1_s16(quantptr + 3 * DCTSIZE + 4);
int16x4_t quant_row4 = vld1_s16(quantptr + 4 * DCTSIZE + 4);
int16x4_t quant_row5 = vld1_s16(quantptr + 5 * DCTSIZE + 4);
int16x4_t quant_row6 = vld1_s16(quantptr + 6 * DCTSIZE + 4);
int16x4_t quant_row7 = vld1_s16(quantptr + 7 * DCTSIZE + 4);
/* Even part: dequantize DCT coefficients. */
int16x4_t tmp0 = vget_high_s16(row0);
int16x4_t tmp1 = vmul_s16(vget_high_s16(row2), quant_row2);
int16x4_t tmp2 = vmul_s16(vget_high_s16(row4), quant_row4);
int16x4_t tmp3 = vmul_s16(vget_high_s16(row6), quant_row6);
int16x4_t tmp10 = vadd_s16(tmp0, tmp2); /* phase 3 */
int16x4_t tmp11 = vsub_s16(tmp0, tmp2);
int16x4_t tmp13 = vadd_s16(tmp1, tmp3); /* phases 5-3 */
int16x4_t tmp1_sub_tmp3 = vsub_s16(tmp1, tmp3);
int16x4_t tmp12 = vqdmulh_lane_s16(tmp1_sub_tmp3, consts, 1);
tmp12 = vadd_s16(tmp12, tmp1_sub_tmp3);
tmp12 = vsub_s16(tmp12, tmp13);
tmp0 = vadd_s16(tmp10, tmp13); /* phase 2 */
tmp3 = vsub_s16(tmp10, tmp13);
tmp1 = vadd_s16(tmp11, tmp12);
tmp2 = vsub_s16(tmp11, tmp12);
/* Odd part: dequantize DCT coefficients. */
int16x4_t tmp4 = vmul_s16(vget_high_s16(row1), quant_row1);
int16x4_t tmp5 = vmul_s16(vget_high_s16(row3), quant_row3);
int16x4_t tmp6 = vmul_s16(vget_high_s16(row5), quant_row5);
int16x4_t tmp7 = vmul_s16(vget_high_s16(row7), quant_row7);
int16x4_t z13 = vadd_s16(tmp6, tmp5); /* phase 6 */
int16x4_t neg_z10 = vsub_s16(tmp5, tmp6);
int16x4_t z11 = vadd_s16(tmp4, tmp7);
int16x4_t z12 = vsub_s16(tmp4, tmp7);
tmp7 = vadd_s16(z11, z13); /* phase 5 */
int16x4_t z11_sub_z13 = vsub_s16(z11, z13);
tmp11 = vqdmulh_lane_s16(z11_sub_z13, consts, 1);
tmp11 = vadd_s16(tmp11, z11_sub_z13);
int16x4_t z10_add_z12 = vsub_s16(z12, neg_z10);
int16x4_t z5 = vqdmulh_lane_s16(z10_add_z12, consts, 2);
z5 = vadd_s16(z5, z10_add_z12);
tmp10 = vqdmulh_lane_s16(z12, consts, 0);
tmp10 = vadd_s16(tmp10, z12);
tmp10 = vsub_s16(tmp10, z5);
tmp12 = vqdmulh_lane_s16(neg_z10, consts, 3);
tmp12 = vadd_s16(tmp12, vadd_s16(neg_z10, neg_z10));
tmp12 = vadd_s16(tmp12, z5);
tmp6 = vsub_s16(tmp12, tmp7); /* phase 2 */
tmp5 = vsub_s16(tmp11, tmp6);
tmp4 = vadd_s16(tmp10, tmp5);
row0 = vcombine_s16(dcval, vadd_s16(tmp0, tmp7));
row7 = vcombine_s16(dcval, vsub_s16(tmp0, tmp7));
row1 = vcombine_s16(dcval, vadd_s16(tmp1, tmp6));
row6 = vcombine_s16(dcval, vsub_s16(tmp1, tmp6));
row2 = vcombine_s16(dcval, vadd_s16(tmp2, tmp5));
row5 = vcombine_s16(dcval, vsub_s16(tmp2, tmp5));
row4 = vcombine_s16(dcval, vadd_s16(tmp3, tmp4));
row3 = vcombine_s16(dcval, vsub_s16(tmp3, tmp4));
} else if (right_ac_bitmap == 0) {
/* AC coefficients are zero for columns 4, 5, 6, and 7.
* Use DC values for these columns.
*/
int16x4_t dcval = vget_high_s16(row0);
/* Commence regular fast IDCT computation for columns 0, 1, 2, and 3. */
/* Load quantization table. */
int16x4_t quant_row1 = vld1_s16(quantptr + 1 * DCTSIZE);
int16x4_t quant_row2 = vld1_s16(quantptr + 2 * DCTSIZE);
int16x4_t quant_row3 = vld1_s16(quantptr + 3 * DCTSIZE);
int16x4_t quant_row4 = vld1_s16(quantptr + 4 * DCTSIZE);
int16x4_t quant_row5 = vld1_s16(quantptr + 5 * DCTSIZE);
int16x4_t quant_row6 = vld1_s16(quantptr + 6 * DCTSIZE);
int16x4_t quant_row7 = vld1_s16(quantptr + 7 * DCTSIZE);
/* Even part: dequantize DCT coefficients. */
int16x4_t tmp0 = vget_low_s16(row0);
int16x4_t tmp1 = vmul_s16(vget_low_s16(row2), quant_row2);
int16x4_t tmp2 = vmul_s16(vget_low_s16(row4), quant_row4);
int16x4_t tmp3 = vmul_s16(vget_low_s16(row6), quant_row6);
int16x4_t tmp10 = vadd_s16(tmp0, tmp2); /* phase 3 */
int16x4_t tmp11 = vsub_s16(tmp0, tmp2);
int16x4_t tmp13 = vadd_s16(tmp1, tmp3); /* phases 5-3 */
int16x4_t tmp1_sub_tmp3 = vsub_s16(tmp1, tmp3);
int16x4_t tmp12 = vqdmulh_lane_s16(tmp1_sub_tmp3, consts, 1);
tmp12 = vadd_s16(tmp12, tmp1_sub_tmp3);
tmp12 = vsub_s16(tmp12, tmp13);
tmp0 = vadd_s16(tmp10, tmp13); /* phase 2 */
tmp3 = vsub_s16(tmp10, tmp13);
tmp1 = vadd_s16(tmp11, tmp12);
tmp2 = vsub_s16(tmp11, tmp12);
/* Odd part: dequantize DCT coefficients. */
int16x4_t tmp4 = vmul_s16(vget_low_s16(row1), quant_row1);
int16x4_t tmp5 = vmul_s16(vget_low_s16(row3), quant_row3);
int16x4_t tmp6 = vmul_s16(vget_low_s16(row5), quant_row5);
int16x4_t tmp7 = vmul_s16(vget_low_s16(row7), quant_row7);
int16x4_t z13 = vadd_s16(tmp6, tmp5); /* phase 6 */
int16x4_t neg_z10 = vsub_s16(tmp5, tmp6);
int16x4_t z11 = vadd_s16(tmp4, tmp7);
int16x4_t z12 = vsub_s16(tmp4, tmp7);
tmp7 = vadd_s16(z11, z13); /* phase 5 */
int16x4_t z11_sub_z13 = vsub_s16(z11, z13);
tmp11 = vqdmulh_lane_s16(z11_sub_z13, consts, 1);
tmp11 = vadd_s16(tmp11, z11_sub_z13);
int16x4_t z10_add_z12 = vsub_s16(z12, neg_z10);
int16x4_t z5 = vqdmulh_lane_s16(z10_add_z12, consts, 2);
z5 = vadd_s16(z5, z10_add_z12);
tmp10 = vqdmulh_lane_s16(z12, consts, 0);
tmp10 = vadd_s16(tmp10, z12);
tmp10 = vsub_s16(tmp10, z5);
tmp12 = vqdmulh_lane_s16(neg_z10, consts, 3);
tmp12 = vadd_s16(tmp12, vadd_s16(neg_z10, neg_z10));
tmp12 = vadd_s16(tmp12, z5);
tmp6 = vsub_s16(tmp12, tmp7); /* phase 2 */
tmp5 = vsub_s16(tmp11, tmp6);
tmp4 = vadd_s16(tmp10, tmp5);
row0 = vcombine_s16(vadd_s16(tmp0, tmp7), dcval);
row7 = vcombine_s16(vsub_s16(tmp0, tmp7), dcval);
row1 = vcombine_s16(vadd_s16(tmp1, tmp6), dcval);
row6 = vcombine_s16(vsub_s16(tmp1, tmp6), dcval);
row2 = vcombine_s16(vadd_s16(tmp2, tmp5), dcval);
row5 = vcombine_s16(vsub_s16(tmp2, tmp5), dcval);
row4 = vcombine_s16(vadd_s16(tmp3, tmp4), dcval);
row3 = vcombine_s16(vsub_s16(tmp3, tmp4), dcval);
} else {
/* Some AC coefficients are non-zero; full IDCT calculation required. */
/* Load quantization table. */
int16x8_t quant_row1 = vld1q_s16(quantptr + 1 * DCTSIZE);
int16x8_t quant_row2 = vld1q_s16(quantptr + 2 * DCTSIZE);
int16x8_t quant_row3 = vld1q_s16(quantptr + 3 * DCTSIZE);
int16x8_t quant_row4 = vld1q_s16(quantptr + 4 * DCTSIZE);
int16x8_t quant_row5 = vld1q_s16(quantptr + 5 * DCTSIZE);
int16x8_t quant_row6 = vld1q_s16(quantptr + 6 * DCTSIZE);
int16x8_t quant_row7 = vld1q_s16(quantptr + 7 * DCTSIZE);
/* Even part: dequantize DCT coefficients. */
int16x8_t tmp0 = row0;
int16x8_t tmp1 = vmulq_s16(row2, quant_row2);
int16x8_t tmp2 = vmulq_s16(row4, quant_row4);
int16x8_t tmp3 = vmulq_s16(row6, quant_row6);
int16x8_t tmp10 = vaddq_s16(tmp0, tmp2); /* phase 3 */
int16x8_t tmp11 = vsubq_s16(tmp0, tmp2);
int16x8_t tmp13 = vaddq_s16(tmp1, tmp3); /* phases 5-3 */
int16x8_t tmp1_sub_tmp3 = vsubq_s16(tmp1, tmp3);
int16x8_t tmp12 = vqdmulhq_lane_s16(tmp1_sub_tmp3, consts, 1);
tmp12 = vaddq_s16(tmp12, tmp1_sub_tmp3);
tmp12 = vsubq_s16(tmp12, tmp13);
tmp0 = vaddq_s16(tmp10, tmp13); /* phase 2 */
tmp3 = vsubq_s16(tmp10, tmp13);
tmp1 = vaddq_s16(tmp11, tmp12);
tmp2 = vsubq_s16(tmp11, tmp12);
/* Odd part: dequantize DCT coefficients. */
int16x8_t tmp4 = vmulq_s16(row1, quant_row1);
int16x8_t tmp5 = vmulq_s16(row3, quant_row3);
int16x8_t tmp6 = vmulq_s16(row5, quant_row5);
int16x8_t tmp7 = vmulq_s16(row7, quant_row7);
int16x8_t z13 = vaddq_s16(tmp6, tmp5); /* phase 6 */
int16x8_t neg_z10 = vsubq_s16(tmp5, tmp6);
int16x8_t z11 = vaddq_s16(tmp4, tmp7);
int16x8_t z12 = vsubq_s16(tmp4, tmp7);
tmp7 = vaddq_s16(z11, z13); /* phase 5 */
int16x8_t z11_sub_z13 = vsubq_s16(z11, z13);
tmp11 = vqdmulhq_lane_s16(z11_sub_z13, consts, 1);
tmp11 = vaddq_s16(tmp11, z11_sub_z13);
int16x8_t z10_add_z12 = vsubq_s16(z12, neg_z10);
int16x8_t z5 = vqdmulhq_lane_s16(z10_add_z12, consts, 2);
z5 = vaddq_s16(z5, z10_add_z12);
tmp10 = vqdmulhq_lane_s16(z12, consts, 0);
tmp10 = vaddq_s16(tmp10, z12);
tmp10 = vsubq_s16(tmp10, z5);
tmp12 = vqdmulhq_lane_s16(neg_z10, consts, 3);
tmp12 = vaddq_s16(tmp12, vaddq_s16(neg_z10, neg_z10));
tmp12 = vaddq_s16(tmp12, z5);
tmp6 = vsubq_s16(tmp12, tmp7); /* phase 2 */
tmp5 = vsubq_s16(tmp11, tmp6);
tmp4 = vaddq_s16(tmp10, tmp5);
row0 = vaddq_s16(tmp0, tmp7);
row7 = vsubq_s16(tmp0, tmp7);
row1 = vaddq_s16(tmp1, tmp6);
row6 = vsubq_s16(tmp1, tmp6);
row2 = vaddq_s16(tmp2, tmp5);
row5 = vsubq_s16(tmp2, tmp5);
row4 = vaddq_s16(tmp3, tmp4);
row3 = vsubq_s16(tmp3, tmp4);
}
/* Transpose rows to work on columns in pass 2. */
int16x8x2_t rows_01 = vtrnq_s16(row0, row1);
int16x8x2_t rows_23 = vtrnq_s16(row2, row3);
int16x8x2_t rows_45 = vtrnq_s16(row4, row5);
int16x8x2_t rows_67 = vtrnq_s16(row6, row7);
int32x4x2_t rows_0145_l = vtrnq_s32(vreinterpretq_s32_s16(rows_01.val[0]),
vreinterpretq_s32_s16(rows_45.val[0]));
int32x4x2_t rows_0145_h = vtrnq_s32(vreinterpretq_s32_s16(rows_01.val[1]),
vreinterpretq_s32_s16(rows_45.val[1]));
int32x4x2_t rows_2367_l = vtrnq_s32(vreinterpretq_s32_s16(rows_23.val[0]),
vreinterpretq_s32_s16(rows_67.val[0]));
int32x4x2_t rows_2367_h = vtrnq_s32(vreinterpretq_s32_s16(rows_23.val[1]),
vreinterpretq_s32_s16(rows_67.val[1]));
int32x4x2_t cols_04 = vzipq_s32(rows_0145_l.val[0], rows_2367_l.val[0]);
int32x4x2_t cols_15 = vzipq_s32(rows_0145_h.val[0], rows_2367_h.val[0]);
int32x4x2_t cols_26 = vzipq_s32(rows_0145_l.val[1], rows_2367_l.val[1]);
int32x4x2_t cols_37 = vzipq_s32(rows_0145_h.val[1], rows_2367_h.val[1]);
int16x8_t col0 = vreinterpretq_s16_s32(cols_04.val[0]);
int16x8_t col1 = vreinterpretq_s16_s32(cols_15.val[0]);
int16x8_t col2 = vreinterpretq_s16_s32(cols_26.val[0]);
int16x8_t col3 = vreinterpretq_s16_s32(cols_37.val[0]);
int16x8_t col4 = vreinterpretq_s16_s32(cols_04.val[1]);
int16x8_t col5 = vreinterpretq_s16_s32(cols_15.val[1]);
int16x8_t col6 = vreinterpretq_s16_s32(cols_26.val[1]);
int16x8_t col7 = vreinterpretq_s16_s32(cols_37.val[1]);
/* 1-D IDCT, pass 2 */
/* Even part */
int16x8_t tmp10 = vaddq_s16(col0, col4);
int16x8_t tmp11 = vsubq_s16(col0, col4);
int16x8_t tmp13 = vaddq_s16(col2, col6);
int16x8_t col2_sub_col6 = vsubq_s16(col2, col6);
int16x8_t tmp12 = vqdmulhq_lane_s16(col2_sub_col6, consts, 1);
tmp12 = vaddq_s16(tmp12, col2_sub_col6);
tmp12 = vsubq_s16(tmp12, tmp13);
int16x8_t tmp0 = vaddq_s16(tmp10, tmp13);
int16x8_t tmp3 = vsubq_s16(tmp10, tmp13);
int16x8_t tmp1 = vaddq_s16(tmp11, tmp12);
int16x8_t tmp2 = vsubq_s16(tmp11, tmp12);
/* Odd part */
int16x8_t z13 = vaddq_s16(col5, col3);
int16x8_t neg_z10 = vsubq_s16(col3, col5);
int16x8_t z11 = vaddq_s16(col1, col7);
int16x8_t z12 = vsubq_s16(col1, col7);
int16x8_t tmp7 = vaddq_s16(z11, z13); /* phase 5 */
int16x8_t z11_sub_z13 = vsubq_s16(z11, z13);
tmp11 = vqdmulhq_lane_s16(z11_sub_z13, consts, 1);
tmp11 = vaddq_s16(tmp11, z11_sub_z13);
int16x8_t z10_add_z12 = vsubq_s16(z12, neg_z10);
int16x8_t z5 = vqdmulhq_lane_s16(z10_add_z12, consts, 2);
z5 = vaddq_s16(z5, z10_add_z12);
tmp10 = vqdmulhq_lane_s16(z12, consts, 0);
tmp10 = vaddq_s16(tmp10, z12);
tmp10 = vsubq_s16(tmp10, z5);
tmp12 = vqdmulhq_lane_s16(neg_z10, consts, 3);
tmp12 = vaddq_s16(tmp12, vaddq_s16(neg_z10, neg_z10));
tmp12 = vaddq_s16(tmp12, z5);
int16x8_t tmp6 = vsubq_s16(tmp12, tmp7); /* phase 2 */
int16x8_t tmp5 = vsubq_s16(tmp11, tmp6);
int16x8_t tmp4 = vaddq_s16(tmp10, tmp5);
col0 = vaddq_s16(tmp0, tmp7);
col7 = vsubq_s16(tmp0, tmp7);
col1 = vaddq_s16(tmp1, tmp6);
col6 = vsubq_s16(tmp1, tmp6);
col2 = vaddq_s16(tmp2, tmp5);
col5 = vsubq_s16(tmp2, tmp5);
col4 = vaddq_s16(tmp3, tmp4);
col3 = vsubq_s16(tmp3, tmp4);
/* Scale down by a factor of 8, narrowing to 8-bit. */
int8x16_t cols_01_s8 = vcombine_s8(vqshrn_n_s16(col0, PASS1_BITS + 3),
vqshrn_n_s16(col1, PASS1_BITS + 3));
int8x16_t cols_45_s8 = vcombine_s8(vqshrn_n_s16(col4, PASS1_BITS + 3),
vqshrn_n_s16(col5, PASS1_BITS + 3));
int8x16_t cols_23_s8 = vcombine_s8(vqshrn_n_s16(col2, PASS1_BITS + 3),
vqshrn_n_s16(col3, PASS1_BITS + 3));
int8x16_t cols_67_s8 = vcombine_s8(vqshrn_n_s16(col6, PASS1_BITS + 3),
vqshrn_n_s16(col7, PASS1_BITS + 3));
/* Clamp to range [0-255]. */
uint8x16_t cols_01 =
vreinterpretq_u8_s8
(vaddq_s8(cols_01_s8, vreinterpretq_s8_u8(vdupq_n_u8(CENTERJSAMPLE))));
uint8x16_t cols_45 =
vreinterpretq_u8_s8
(vaddq_s8(cols_45_s8, vreinterpretq_s8_u8(vdupq_n_u8(CENTERJSAMPLE))));
uint8x16_t cols_23 =
vreinterpretq_u8_s8
(vaddq_s8(cols_23_s8, vreinterpretq_s8_u8(vdupq_n_u8(CENTERJSAMPLE))));
uint8x16_t cols_67 =
vreinterpretq_u8_s8
(vaddq_s8(cols_67_s8, vreinterpretq_s8_u8(vdupq_n_u8(CENTERJSAMPLE))));
/* Transpose block to prepare for store. */
uint32x4x2_t cols_0415 = vzipq_u32(vreinterpretq_u32_u8(cols_01),
vreinterpretq_u32_u8(cols_45));
uint32x4x2_t cols_2637 = vzipq_u32(vreinterpretq_u32_u8(cols_23),
vreinterpretq_u32_u8(cols_67));
uint8x16x2_t cols_0145 = vtrnq_u8(vreinterpretq_u8_u32(cols_0415.val[0]),
vreinterpretq_u8_u32(cols_0415.val[1]));
uint8x16x2_t cols_2367 = vtrnq_u8(vreinterpretq_u8_u32(cols_2637.val[0]),
vreinterpretq_u8_u32(cols_2637.val[1]));
uint16x8x2_t rows_0426 = vtrnq_u16(vreinterpretq_u16_u8(cols_0145.val[0]),
vreinterpretq_u16_u8(cols_2367.val[0]));
uint16x8x2_t rows_1537 = vtrnq_u16(vreinterpretq_u16_u8(cols_0145.val[1]),
vreinterpretq_u16_u8(cols_2367.val[1]));
uint8x16_t rows_04 = vreinterpretq_u8_u16(rows_0426.val[0]);
uint8x16_t rows_15 = vreinterpretq_u8_u16(rows_1537.val[0]);
uint8x16_t rows_26 = vreinterpretq_u8_u16(rows_0426.val[1]);
uint8x16_t rows_37 = vreinterpretq_u8_u16(rows_1537.val[1]);
JSAMPROW outptr0 = output_buf[0] + output_col;
JSAMPROW outptr1 = output_buf[1] + output_col;
JSAMPROW outptr2 = output_buf[2] + output_col;
JSAMPROW outptr3 = output_buf[3] + output_col;
JSAMPROW outptr4 = output_buf[4] + output_col;
JSAMPROW outptr5 = output_buf[5] + output_col;
JSAMPROW outptr6 = output_buf[6] + output_col;
JSAMPROW outptr7 = output_buf[7] + output_col;
/* Store DCT block to memory. */
vst1q_lane_u64((uint64_t *)outptr0, vreinterpretq_u64_u8(rows_04), 0);
vst1q_lane_u64((uint64_t *)outptr1, vreinterpretq_u64_u8(rows_15), 0);
vst1q_lane_u64((uint64_t *)outptr2, vreinterpretq_u64_u8(rows_26), 0);
vst1q_lane_u64((uint64_t *)outptr3, vreinterpretq_u64_u8(rows_37), 0);
vst1q_lane_u64((uint64_t *)outptr4, vreinterpretq_u64_u8(rows_04), 1);
vst1q_lane_u64((uint64_t *)outptr5, vreinterpretq_u64_u8(rows_15), 1);
vst1q_lane_u64((uint64_t *)outptr6, vreinterpretq_u64_u8(rows_26), 1);
vst1q_lane_u64((uint64_t *)outptr7, vreinterpretq_u64_u8(rows_37), 1);
}
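/* Scalar sketch of the saturate-narrow-and-recenter step above.  This is
 * illustrative only: descale_and_recenter_sketch() is a hypothetical helper,
 * not part of the library.  It assumes the same descale amount
 * (PASS1_BITS + 3) as the VQSHRN narrowing and applies the CENTERJSAMPLE
 * offset that the vector code adds via unsigned wraparound.
 */
static JSAMPLE descale_and_recenter_sketch(int col)
{
  col >>= PASS1_BITS + 3;                          /* scale down */
  if (col < -CENTERJSAMPLE) col = -CENTERJSAMPLE;  /* saturate like VQSHRN */
  if (col > CENTERJSAMPLE - 1) col = CENTERJSAMPLE - 1;
  return (JSAMPLE)(col + CENTERJSAMPLE);           /* re-center into [0, 255] */
}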

802
simd/arm/jidctint-neon.c Normal file
View File

@@ -0,0 +1,802 @@
/*
* jidctint-neon.c - accurate integer IDCT (Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
* Copyright (C) 2020, D. R. Commander. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute it
* freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must not
* claim that you wrote the original software. If you use this software
* in a product, an acknowledgment in the product documentation would be
* appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must not be
* misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source distribution.
*/
#define JPEG_INTERNALS
#include "jconfigint.h"
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../jsimd.h"
#include "align.h"
#include "neon-compat.h"
#include <arm_neon.h>
#define CONST_BITS 13
#define PASS1_BITS 2
#define DESCALE_P1 (CONST_BITS - PASS1_BITS)
#define DESCALE_P2 (CONST_BITS + PASS1_BITS + 3)
/* The computation of the inverse DCT requires the use of constants known at
* compile time. Scaled integer constants are used to avoid floating-point
* arithmetic:
* 0.298631336 = 2446 * 2^-13
* 0.390180644 = 3196 * 2^-13
* 0.541196100 = 4433 * 2^-13
* 0.765366865 = 6270 * 2^-13
* 0.899976223 = 7373 * 2^-13
* 1.175875602 = 9633 * 2^-13
* 1.501321110 = 12299 * 2^-13
* 1.847759065 = 15137 * 2^-13
* 1.961570560 = 16069 * 2^-13
* 2.053119869 = 16819 * 2^-13
* 2.562915447 = 20995 * 2^-13
* 3.072711026 = 25172 * 2^-13
*/
#define F_0_298 2446
#define F_0_390 3196
#define F_0_541 4433
#define F_0_765 6270
#define F_0_899 7373
#define F_1_175 9633
#define F_1_501 12299
#define F_1_847 15137
#define F_1_961 16069
#define F_2_053 16819
#define F_2_562 20995
#define F_3_072 25172
#define F_1_175_MINUS_1_961 (F_1_175 - F_1_961)
#define F_1_175_MINUS_0_390 (F_1_175 - F_0_390)
#define F_0_541_MINUS_1_847 (F_0_541 - F_1_847)
#define F_3_072_MINUS_2_562 (F_3_072 - F_2_562)
#define F_0_298_MINUS_0_899 (F_0_298 - F_0_899)
#define F_1_501_MINUS_0_899 (F_1_501 - F_0_899)
#define F_2_053_MINUS_2_562 (F_2_053 - F_2_562)
#define F_0_541_PLUS_0_765 (F_0_541 + F_0_765)
ALIGN(16) static const int16_t jsimd_idct_islow_neon_consts[] = {
F_0_899, F_0_541,
F_2_562, F_0_298_MINUS_0_899,
F_1_501_MINUS_0_899, F_2_053_MINUS_2_562,
F_0_541_PLUS_0_765, F_1_175,
F_1_175_MINUS_0_390, F_0_541_MINUS_1_847,
F_3_072_MINUS_2_562, F_1_175_MINUS_1_961,
0, 0, 0, 0
};
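/* For reference, the scaled constants above can be reproduced (assuming
 * round-to-nearest) with a helper like the one below.  FIX_CONST_SKETCH() is
 * purely illustrative and is not used elsewhere in this file; e.g.
 * FIX_CONST_SKETCH(0.541196100) == 4433 and
 * FIX_CONST_SKETCH(1.175875602) == 9633 with CONST_BITS == 13.
 */
#define FIX_CONST_SKETCH(x)  ((int16_t)((x) * (1 << CONST_BITS) + 0.5))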
/* Forward declaration of regular and sparse IDCT helper functions */
static INLINE void jsimd_idct_islow_pass1_regular(int16x4_t row0,
int16x4_t row1,
int16x4_t row2,
int16x4_t row3,
int16x4_t row4,
int16x4_t row5,
int16x4_t row6,
int16x4_t row7,
int16x4_t quant_row0,
int16x4_t quant_row1,
int16x4_t quant_row2,
int16x4_t quant_row3,
int16x4_t quant_row4,
int16x4_t quant_row5,
int16x4_t quant_row6,
int16x4_t quant_row7,
int16_t *workspace_1,
int16_t *workspace_2);
static INLINE void jsimd_idct_islow_pass1_sparse(int16x4_t row0,
int16x4_t row1,
int16x4_t row2,
int16x4_t row3,
int16x4_t quant_row0,
int16x4_t quant_row1,
int16x4_t quant_row2,
int16x4_t quant_row3,
int16_t *workspace_1,
int16_t *workspace_2);
static INLINE void jsimd_idct_islow_pass2_regular(int16_t *workspace,
JSAMPARRAY output_buf,
JDIMENSION output_col,
unsigned buf_offset);
static INLINE void jsimd_idct_islow_pass2_sparse(int16_t *workspace,
JSAMPARRAY output_buf,
JDIMENSION output_col,
unsigned buf_offset);
/* Perform dequantization and inverse DCT on one block of coefficients. For
* reference, the C implementation (jpeg_idct_slow()) can be found in
* jidctint.c.
*
* Optimization techniques used for fast data access:
*
* In each pass, the inverse DCT is computed for the left and right 4x8 halves
* of the DCT block. This avoids spilling due to register pressure, and the
* increased granularity allows for an optimized calculation depending on the
* values of the DCT coefficients. Between passes, intermediate data is stored
* in 4x8 workspace buffers.
*
* Transposing the 8x8 DCT block after each pass can be achieved by transposing
* each of the four 4x4 quadrants and swapping quadrants 1 and 2 (refer to the
* diagram below.) Swapping quadrants is cheap, since the second pass can just
* swap the workspace buffer pointers.
*
* +-------+-------+ +-------+-------+
* | | | | | |
* | 0 | 1 | | 0 | 2 |
* | | | transpose | | |
* +-------+-------+ ------> +-------+-------+
* | | | | | |
* | 2 | 3 | | 1 | 3 |
* | | | | | |
* +-------+-------+ +-------+-------+
*
* Optimization techniques used to accelerate the inverse DCT calculation:
*
* In a DCT coefficient block, the coefficients are increasingly likely to be 0
* as you move diagonally from top left to bottom right. If whole rows of
* coefficients are 0, then the inverse DCT calculation can be simplified. On
* the first pass of the inverse DCT, we test for three special cases before
* defaulting to a full "regular" inverse DCT:
*
* 1) Coefficients in rows 4-7 are all zero. In this case, we perform a
* "sparse" simplified inverse DCT on rows 0-3.
* 2) AC coefficients (rows 1-7) are all zero. In this case, the inverse DCT
* result is equal to the dequantized DC coefficients.
* 3) AC and DC coefficients are all zero. In this case, the inverse DCT
* result is all zero. For the left 4x8 half, this is handled identically
* to Case 2 above. For the right 4x8 half, we do no work and signal that
* the "sparse" algorithm is required for the second pass.
*
* In the second pass, only a single special case is tested: whether the AC and
* DC coefficients were all zero in the right 4x8 block during the first pass
* (refer to Case 3 above.) If this is the case, then a "sparse" variant of
* the second pass is performed for both the left and right halves of the DCT
* block. (The transposition after the first pass means that the right 4x8
* block during the first pass becomes rows 4-7 during the second pass.)
*/
void jsimd_idct_islow_neon(void *dct_table, JCOEFPTR coef_block,
JSAMPARRAY output_buf, JDIMENSION output_col)
{
ISLOW_MULT_TYPE *quantptr = dct_table;
int16_t workspace_l[8 * DCTSIZE / 2];
int16_t workspace_r[8 * DCTSIZE / 2];
/* Compute IDCT first pass on left 4x8 coefficient block. */
/* Load DCT coefficients in left 4x8 block. */
int16x4_t row0 = vld1_s16(coef_block + 0 * DCTSIZE);
int16x4_t row1 = vld1_s16(coef_block + 1 * DCTSIZE);
int16x4_t row2 = vld1_s16(coef_block + 2 * DCTSIZE);
int16x4_t row3 = vld1_s16(coef_block + 3 * DCTSIZE);
int16x4_t row4 = vld1_s16(coef_block + 4 * DCTSIZE);
int16x4_t row5 = vld1_s16(coef_block + 5 * DCTSIZE);
int16x4_t row6 = vld1_s16(coef_block + 6 * DCTSIZE);
int16x4_t row7 = vld1_s16(coef_block + 7 * DCTSIZE);
/* Load quantization table for left 4x8 block. */
int16x4_t quant_row0 = vld1_s16(quantptr + 0 * DCTSIZE);
int16x4_t quant_row1 = vld1_s16(quantptr + 1 * DCTSIZE);
int16x4_t quant_row2 = vld1_s16(quantptr + 2 * DCTSIZE);
int16x4_t quant_row3 = vld1_s16(quantptr + 3 * DCTSIZE);
int16x4_t quant_row4 = vld1_s16(quantptr + 4 * DCTSIZE);
int16x4_t quant_row5 = vld1_s16(quantptr + 5 * DCTSIZE);
int16x4_t quant_row6 = vld1_s16(quantptr + 6 * DCTSIZE);
int16x4_t quant_row7 = vld1_s16(quantptr + 7 * DCTSIZE);
/* Construct bitmap to test if DCT coefficients in left 4x8 block are 0. */
int16x4_t bitmap = vorr_s16(row7, row6);
bitmap = vorr_s16(bitmap, row5);
bitmap = vorr_s16(bitmap, row4);
int64_t bitmap_rows_4567 = vget_lane_s64(vreinterpret_s64_s16(bitmap), 0);
if (bitmap_rows_4567 == 0) {
bitmap = vorr_s16(bitmap, row3);
bitmap = vorr_s16(bitmap, row2);
bitmap = vorr_s16(bitmap, row1);
int64_t left_ac_bitmap = vget_lane_s64(vreinterpret_s64_s16(bitmap), 0);
if (left_ac_bitmap == 0) {
int16x4_t dcval = vshl_n_s16(vmul_s16(row0, quant_row0), PASS1_BITS);
int16x4x4_t quadrant = { { dcval, dcval, dcval, dcval } };
/* Store 4x4 blocks to workspace, transposing in the process. */
vst4_s16(workspace_l, quadrant);
vst4_s16(workspace_r, quadrant);
} else {
jsimd_idct_islow_pass1_sparse(row0, row1, row2, row3, quant_row0,
quant_row1, quant_row2, quant_row3,
workspace_l, workspace_r);
}
} else {
jsimd_idct_islow_pass1_regular(row0, row1, row2, row3, row4, row5,
row6, row7, quant_row0, quant_row1,
quant_row2, quant_row3, quant_row4,
quant_row5, quant_row6, quant_row7,
workspace_l, workspace_r);
}
/* Compute IDCT first pass on right 4x8 coefficient block. */
/* Load DCT coefficients in right 4x8 block. */
row0 = vld1_s16(coef_block + 0 * DCTSIZE + 4);
row1 = vld1_s16(coef_block + 1 * DCTSIZE + 4);
row2 = vld1_s16(coef_block + 2 * DCTSIZE + 4);
row3 = vld1_s16(coef_block + 3 * DCTSIZE + 4);
row4 = vld1_s16(coef_block + 4 * DCTSIZE + 4);
row5 = vld1_s16(coef_block + 5 * DCTSIZE + 4);
row6 = vld1_s16(coef_block + 6 * DCTSIZE + 4);
row7 = vld1_s16(coef_block + 7 * DCTSIZE + 4);
/* Load quantization table for right 4x8 block. */
quant_row0 = vld1_s16(quantptr + 0 * DCTSIZE + 4);
quant_row1 = vld1_s16(quantptr + 1 * DCTSIZE + 4);
quant_row2 = vld1_s16(quantptr + 2 * DCTSIZE + 4);
quant_row3 = vld1_s16(quantptr + 3 * DCTSIZE + 4);
quant_row4 = vld1_s16(quantptr + 4 * DCTSIZE + 4);
quant_row5 = vld1_s16(quantptr + 5 * DCTSIZE + 4);
quant_row6 = vld1_s16(quantptr + 6 * DCTSIZE + 4);
quant_row7 = vld1_s16(quantptr + 7 * DCTSIZE + 4);
/* Construct bitmap to test if DCT coefficients in right 4x8 block are 0. */
bitmap = vorr_s16(row7, row6);
bitmap = vorr_s16(bitmap, row5);
bitmap = vorr_s16(bitmap, row4);
bitmap_rows_4567 = vget_lane_s64(vreinterpret_s64_s16(bitmap), 0);
bitmap = vorr_s16(bitmap, row3);
bitmap = vorr_s16(bitmap, row2);
bitmap = vorr_s16(bitmap, row1);
int64_t right_ac_bitmap = vget_lane_s64(vreinterpret_s64_s16(bitmap), 0);
/* If this remains non-zero, a "regular" second pass will be performed. */
int64_t right_ac_dc_bitmap = 1;
if (right_ac_bitmap == 0) {
bitmap = vorr_s16(bitmap, row0);
right_ac_dc_bitmap = vget_lane_s64(vreinterpret_s64_s16(bitmap), 0);
if (right_ac_dc_bitmap != 0) {
int16x4_t dcval = vshl_n_s16(vmul_s16(row0, quant_row0), PASS1_BITS);
int16x4x4_t quadrant = { { dcval, dcval, dcval, dcval } };
/* Store 4x4 blocks to workspace, transposing in the process. */
vst4_s16(workspace_l + 4 * DCTSIZE / 2, quadrant);
vst4_s16(workspace_r + 4 * DCTSIZE / 2, quadrant);
}
} else {
if (bitmap_rows_4567 == 0) {
jsimd_idct_islow_pass1_sparse(row0, row1, row2, row3, quant_row0,
quant_row1, quant_row2, quant_row3,
workspace_l + 4 * DCTSIZE / 2,
workspace_r + 4 * DCTSIZE / 2);
} else {
jsimd_idct_islow_pass1_regular(row0, row1, row2, row3, row4, row5,
row6, row7, quant_row0, quant_row1,
quant_row2, quant_row3, quant_row4,
quant_row5, quant_row6, quant_row7,
workspace_l + 4 * DCTSIZE / 2,
workspace_r + 4 * DCTSIZE / 2);
}
}
/* Second pass: compute IDCT on rows in workspace. */
/* If all coefficients in right 4x8 block are 0, use "sparse" second pass. */
if (right_ac_dc_bitmap == 0) {
jsimd_idct_islow_pass2_sparse(workspace_l, output_buf, output_col, 0);
jsimd_idct_islow_pass2_sparse(workspace_r, output_buf, output_col, 4);
} else {
jsimd_idct_islow_pass2_regular(workspace_l, output_buf, output_col, 0);
jsimd_idct_islow_pass2_regular(workspace_r, output_buf, output_col, 4);
}
}
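/* Scalar sketch of the transposition strategy described above: transposing
 * each 4x4 quadrant in place and then swapping quadrants 1 and 2 yields a
 * full 8x8 transpose.  transpose8x8_quadrants_sketch() is a hypothetical
 * illustration only; the vector code achieves the same effect with VST4
 * stores and by swapping the workspace pointers between passes.
 */
static void transpose8x8_quadrants_sketch(int16_t m[8][8])
{
  int qr, qc, i, j;
  /* Transpose each 4x4 quadrant about its own diagonal. */
  for (qr = 0; qr < 8; qr += 4)
    for (qc = 0; qc < 8; qc += 4)
      for (i = 0; i < 4; i++)
        for (j = i + 1; j < 4; j++) {
          int16_t t = m[qr + i][qc + j];
          m[qr + i][qc + j] = m[qr + j][qc + i];
          m[qr + j][qc + i] = t;
        }
  /* Swap quadrant 1 (top right) with quadrant 2 (bottom left). */
  for (i = 0; i < 4; i++)
    for (j = 0; j < 4; j++) {
      int16_t t = m[i][4 + j];
      m[i][4 + j] = m[4 + i][j];
      m[4 + i][j] = t;
    }
}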
/* Perform dequantization and the first pass of the accurate inverse DCT on a
* 4x8 block of coefficients. (To process the full 8x8 DCT block, this
* function -- or some other optimized variant -- needs to be called for both the
* left and right 4x8 blocks.)
*
* This "regular" version assumes that no optimization can be made to the IDCT
* calculation, since no useful set of AC coefficients is all 0.
*
* The original C implementation of the accurate IDCT (jpeg_idct_slow()) can be
* found in jidctint.c. Algorithmic changes made here are documented inline.
*/
static INLINE void jsimd_idct_islow_pass1_regular(int16x4_t row0,
int16x4_t row1,
int16x4_t row2,
int16x4_t row3,
int16x4_t row4,
int16x4_t row5,
int16x4_t row6,
int16x4_t row7,
int16x4_t quant_row0,
int16x4_t quant_row1,
int16x4_t quant_row2,
int16x4_t quant_row3,
int16x4_t quant_row4,
int16x4_t quant_row5,
int16x4_t quant_row6,
int16x4_t quant_row7,
int16_t *workspace_1,
int16_t *workspace_2)
{
/* Load constants for IDCT computation. */
#ifdef HAVE_VLD1_S16_X3
const int16x4x3_t consts = vld1_s16_x3(jsimd_idct_islow_neon_consts);
#else
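/* GCC does not currently support the intrinsic vld1_<type>_x3(). */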
const int16x4_t consts1 = vld1_s16(jsimd_idct_islow_neon_consts);
const int16x4_t consts2 = vld1_s16(jsimd_idct_islow_neon_consts + 4);
const int16x4_t consts3 = vld1_s16(jsimd_idct_islow_neon_consts + 8);
const int16x4x3_t consts = { { consts1, consts2, consts3 } };
#endif
/* Even part */
int16x4_t z2_s16 = vmul_s16(row2, quant_row2);
int16x4_t z3_s16 = vmul_s16(row6, quant_row6);
int32x4_t tmp2 = vmull_lane_s16(z2_s16, consts.val[0], 1);
int32x4_t tmp3 = vmull_lane_s16(z2_s16, consts.val[1], 2);
tmp2 = vmlal_lane_s16(tmp2, z3_s16, consts.val[2], 1);
tmp3 = vmlal_lane_s16(tmp3, z3_s16, consts.val[0], 1);
z2_s16 = vmul_s16(row0, quant_row0);
z3_s16 = vmul_s16(row4, quant_row4);
int32x4_t tmp0 = vshll_n_s16(vadd_s16(z2_s16, z3_s16), CONST_BITS);
int32x4_t tmp1 = vshll_n_s16(vsub_s16(z2_s16, z3_s16), CONST_BITS);
int32x4_t tmp10 = vaddq_s32(tmp0, tmp3);
int32x4_t tmp13 = vsubq_s32(tmp0, tmp3);
int32x4_t tmp11 = vaddq_s32(tmp1, tmp2);
int32x4_t tmp12 = vsubq_s32(tmp1, tmp2);
/* Odd part */
int16x4_t tmp0_s16 = vmul_s16(row7, quant_row7);
int16x4_t tmp1_s16 = vmul_s16(row5, quant_row5);
int16x4_t tmp2_s16 = vmul_s16(row3, quant_row3);
int16x4_t tmp3_s16 = vmul_s16(row1, quant_row1);
z3_s16 = vadd_s16(tmp0_s16, tmp2_s16);
int16x4_t z4_s16 = vadd_s16(tmp1_s16, tmp3_s16);
/* Implementation as per jpeg_idct_islow() in jidctint.c:
* z5 = (z3 + z4) * 1.175875602;
* z3 = z3 * -1.961570560; z4 = z4 * -0.390180644;
* z3 += z5; z4 += z5;
*
* This implementation:
* z3 = z3 * (1.175875602 - 1.961570560) + z4 * 1.175875602;
* z4 = z3 * 1.175875602 + z4 * (1.175875602 - 0.390180644);
*/
int32x4_t z3 = vmull_lane_s16(z3_s16, consts.val[2], 3);
int32x4_t z4 = vmull_lane_s16(z3_s16, consts.val[1], 3);
z3 = vmlal_lane_s16(z3, z4_s16, consts.val[1], 3);
z4 = vmlal_lane_s16(z4, z4_s16, consts.val[2], 0);
/* Implementation as per jpeg_idct_islow() in jidctint.c:
* z1 = tmp0 + tmp3; z2 = tmp1 + tmp2;
* tmp0 = tmp0 * 0.298631336; tmp1 = tmp1 * 2.053119869;
* tmp2 = tmp2 * 3.072711026; tmp3 = tmp3 * 1.501321110;
* z1 = z1 * -0.899976223; z2 = z2 * -2.562915447;
* tmp0 += z1 + z3; tmp1 += z2 + z4;
* tmp2 += z2 + z3; tmp3 += z1 + z4;
*
* This implementation:
* tmp0 = tmp0 * (0.298631336 - 0.899976223) + tmp3 * -0.899976223;
* tmp1 = tmp1 * (2.053119869 - 2.562915447) + tmp2 * -2.562915447;
* tmp2 = tmp1 * -2.562915447 + tmp2 * (3.072711026 - 2.562915447);
* tmp3 = tmp0 * -0.899976223 + tmp3 * (1.501321110 - 0.899976223);
* tmp0 += z3; tmp1 += z4;
* tmp2 += z3; tmp3 += z4;
*/
tmp0 = vmull_lane_s16(tmp0_s16, consts.val[0], 3);
tmp1 = vmull_lane_s16(tmp1_s16, consts.val[1], 1);
tmp2 = vmull_lane_s16(tmp2_s16, consts.val[2], 2);
tmp3 = vmull_lane_s16(tmp3_s16, consts.val[1], 0);
tmp0 = vmlsl_lane_s16(tmp0, tmp3_s16, consts.val[0], 0);
tmp1 = vmlsl_lane_s16(tmp1, tmp2_s16, consts.val[0], 2);
tmp2 = vmlsl_lane_s16(tmp2, tmp1_s16, consts.val[0], 2);
tmp3 = vmlsl_lane_s16(tmp3, tmp0_s16, consts.val[0], 0);
tmp0 = vaddq_s32(tmp0, z3);
tmp1 = vaddq_s32(tmp1, z4);
tmp2 = vaddq_s32(tmp2, z3);
tmp3 = vaddq_s32(tmp3, z4);
/* Final output stage: descale and narrow to 16-bit. */
int16x4x4_t rows_0123 = { {
vrshrn_n_s32(vaddq_s32(tmp10, tmp3), DESCALE_P1),
vrshrn_n_s32(vaddq_s32(tmp11, tmp2), DESCALE_P1),
vrshrn_n_s32(vaddq_s32(tmp12, tmp1), DESCALE_P1),
vrshrn_n_s32(vaddq_s32(tmp13, tmp0), DESCALE_P1)
} };
int16x4x4_t rows_4567 = { {
vrshrn_n_s32(vsubq_s32(tmp13, tmp0), DESCALE_P1),
vrshrn_n_s32(vsubq_s32(tmp12, tmp1), DESCALE_P1),
vrshrn_n_s32(vsubq_s32(tmp11, tmp2), DESCALE_P1),
vrshrn_n_s32(vsubq_s32(tmp10, tmp3), DESCALE_P1)
} };
/* Store 4x4 blocks to the intermediate workspace, ready for the second pass.
* (VST4 transposes the blocks. We need to operate on rows in the next
* pass.)
*/
vst4_s16(workspace_1, rows_0123);
vst4_s16(workspace_2, rows_4567);
}
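/* Scalar sketch of the refactored odd-part rotation used above.  The original
 * jpeg_idct_islow() forms z5 = (z3 + z4) * 1.175875602 and adds it to
 * z3 * -1.961570560 and z4 * -0.390180644; distributing z5 gives the
 * two-multiply-accumulate form that maps onto VMULL/VMLAL by lane.
 * odd_rotation_sketch() is illustrative only and reuses the fixed-point
 * constants defined above.
 */
static void odd_rotation_sketch(int32_t z3_in, int32_t z4_in,
                                int32_t *z3_out, int32_t *z4_out)
{
  *z3_out = z3_in * F_1_175_MINUS_1_961 + z4_in * F_1_175;
  *z4_out = z3_in * F_1_175 + z4_in * F_1_175_MINUS_0_390;
}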
/* Perform dequantization and the first pass of the accurate inverse DCT on a
* 4x8 block of coefficients.
*
* This "sparse" version assumes that the AC coefficients in rows 4-7 are all
* 0. This simplifies the IDCT calculation, accelerating overall performance.
*/
static INLINE void jsimd_idct_islow_pass1_sparse(int16x4_t row0,
int16x4_t row1,
int16x4_t row2,
int16x4_t row3,
int16x4_t quant_row0,
int16x4_t quant_row1,
int16x4_t quant_row2,
int16x4_t quant_row3,
int16_t *workspace_1,
int16_t *workspace_2)
{
/* Load constants for IDCT computation. */
#ifdef HAVE_VLD1_S16_X3
const int16x4x3_t consts = vld1_s16_x3(jsimd_idct_islow_neon_consts);
#else
const int16x4_t consts1 = vld1_s16(jsimd_idct_islow_neon_consts);
const int16x4_t consts2 = vld1_s16(jsimd_idct_islow_neon_consts + 4);
const int16x4_t consts3 = vld1_s16(jsimd_idct_islow_neon_consts + 8);
const int16x4x3_t consts = { { consts1, consts2, consts3 } };
#endif
/* Even part (z3 is all 0) */
int16x4_t z2_s16 = vmul_s16(row2, quant_row2);
int32x4_t tmp2 = vmull_lane_s16(z2_s16, consts.val[0], 1);
int32x4_t tmp3 = vmull_lane_s16(z2_s16, consts.val[1], 2);
z2_s16 = vmul_s16(row0, quant_row0);
int32x4_t tmp0 = vshll_n_s16(z2_s16, CONST_BITS);
int32x4_t tmp1 = vshll_n_s16(z2_s16, CONST_BITS);
int32x4_t tmp10 = vaddq_s32(tmp0, tmp3);
int32x4_t tmp13 = vsubq_s32(tmp0, tmp3);
int32x4_t tmp11 = vaddq_s32(tmp1, tmp2);
int32x4_t tmp12 = vsubq_s32(tmp1, tmp2);
/* Odd part (tmp0 and tmp1 are both all 0) */
int16x4_t tmp2_s16 = vmul_s16(row3, quant_row3);
int16x4_t tmp3_s16 = vmul_s16(row1, quant_row1);
int16x4_t z3_s16 = tmp2_s16;
int16x4_t z4_s16 = tmp3_s16;
int32x4_t z3 = vmull_lane_s16(z3_s16, consts.val[2], 3);
int32x4_t z4 = vmull_lane_s16(z3_s16, consts.val[1], 3);
z3 = vmlal_lane_s16(z3, z4_s16, consts.val[1], 3);
z4 = vmlal_lane_s16(z4, z4_s16, consts.val[2], 0);
tmp0 = vmlsl_lane_s16(z3, tmp3_s16, consts.val[0], 0);
tmp1 = vmlsl_lane_s16(z4, tmp2_s16, consts.val[0], 2);
tmp2 = vmlal_lane_s16(z3, tmp2_s16, consts.val[2], 2);
tmp3 = vmlal_lane_s16(z4, tmp3_s16, consts.val[1], 0);
/* Final output stage: descale and narrow to 16-bit. */
int16x4x4_t rows_0123 = { {
vrshrn_n_s32(vaddq_s32(tmp10, tmp3), DESCALE_P1),
vrshrn_n_s32(vaddq_s32(tmp11, tmp2), DESCALE_P1),
vrshrn_n_s32(vaddq_s32(tmp12, tmp1), DESCALE_P1),
vrshrn_n_s32(vaddq_s32(tmp13, tmp0), DESCALE_P1)
} };
int16x4x4_t rows_4567 = { {
vrshrn_n_s32(vsubq_s32(tmp13, tmp0), DESCALE_P1),
vrshrn_n_s32(vsubq_s32(tmp12, tmp1), DESCALE_P1),
vrshrn_n_s32(vsubq_s32(tmp11, tmp2), DESCALE_P1),
vrshrn_n_s32(vsubq_s32(tmp10, tmp3), DESCALE_P1)
} };
/* Store 4x4 blocks to the intermediate workspace, ready for the second pass.
* (VST4 transposes the blocks. We need to operate on rows in the next
* pass.)
*/
vst4_s16(workspace_1, rows_0123);
vst4_s16(workspace_2, rows_4567);
}
/* Perform the second pass of the accurate inverse DCT on a 4x8 block of
* coefficients. (To process the full 8x8 DCT block, this function -- or some
* other optimized variant -- needs to be called for both the right and left 4x8
* blocks.)
*
* This "regular" version assumes that no optimization can be made to the IDCT
* calculation, since no useful set of coefficient values is all 0 after the
* first pass.
*
* Again, the original C implementation of the accurate IDCT (jpeg_idct_slow())
* can be found in jidctint.c. Algorithmic changes made here are documented
* inline.
*/
static INLINE void jsimd_idct_islow_pass2_regular(int16_t *workspace,
JSAMPARRAY output_buf,
JDIMENSION output_col,
unsigned buf_offset)
{
/* Load constants for IDCT computation. */
#ifdef HAVE_VLD1_S16_X3
const int16x4x3_t consts = vld1_s16_x3(jsimd_idct_islow_neon_consts);
#else
const int16x4_t consts1 = vld1_s16(jsimd_idct_islow_neon_consts);
const int16x4_t consts2 = vld1_s16(jsimd_idct_islow_neon_consts + 4);
const int16x4_t consts3 = vld1_s16(jsimd_idct_islow_neon_consts + 8);
const int16x4x3_t consts = { { consts1, consts2, consts3 } };
#endif
/* Even part */
int16x4_t z2_s16 = vld1_s16(workspace + 2 * DCTSIZE / 2);
int16x4_t z3_s16 = vld1_s16(workspace + 6 * DCTSIZE / 2);
int32x4_t tmp2 = vmull_lane_s16(z2_s16, consts.val[0], 1);
int32x4_t tmp3 = vmull_lane_s16(z2_s16, consts.val[1], 2);
tmp2 = vmlal_lane_s16(tmp2, z3_s16, consts.val[2], 1);
tmp3 = vmlal_lane_s16(tmp3, z3_s16, consts.val[0], 1);
z2_s16 = vld1_s16(workspace + 0 * DCTSIZE / 2);
z3_s16 = vld1_s16(workspace + 4 * DCTSIZE / 2);
int32x4_t tmp0 = vshll_n_s16(vadd_s16(z2_s16, z3_s16), CONST_BITS);
int32x4_t tmp1 = vshll_n_s16(vsub_s16(z2_s16, z3_s16), CONST_BITS);
int32x4_t tmp10 = vaddq_s32(tmp0, tmp3);
int32x4_t tmp13 = vsubq_s32(tmp0, tmp3);
int32x4_t tmp11 = vaddq_s32(tmp1, tmp2);
int32x4_t tmp12 = vsubq_s32(tmp1, tmp2);
/* Odd part */
int16x4_t tmp0_s16 = vld1_s16(workspace + 7 * DCTSIZE / 2);
int16x4_t tmp1_s16 = vld1_s16(workspace + 5 * DCTSIZE / 2);
int16x4_t tmp2_s16 = vld1_s16(workspace + 3 * DCTSIZE / 2);
int16x4_t tmp3_s16 = vld1_s16(workspace + 1 * DCTSIZE / 2);
z3_s16 = vadd_s16(tmp0_s16, tmp2_s16);
int16x4_t z4_s16 = vadd_s16(tmp1_s16, tmp3_s16);
/* Implementation as per jpeg_idct_islow() in jidctint.c:
* z5 = (z3 + z4) * 1.175875602;
* z3 = z3 * -1.961570560; z4 = z4 * -0.390180644;
* z3 += z5; z4 += z5;
*
* This implementation:
* z3 = z3 * (1.175875602 - 1.961570560) + z4 * 1.175875602;
* z4 = z3 * 1.175875602 + z4 * (1.175875602 - 0.390180644);
*/
int32x4_t z3 = vmull_lane_s16(z3_s16, consts.val[2], 3);
int32x4_t z4 = vmull_lane_s16(z3_s16, consts.val[1], 3);
z3 = vmlal_lane_s16(z3, z4_s16, consts.val[1], 3);
z4 = vmlal_lane_s16(z4, z4_s16, consts.val[2], 0);
/* Implementation as per jpeg_idct_islow() in jidctint.c:
* z1 = tmp0 + tmp3; z2 = tmp1 + tmp2;
* tmp0 = tmp0 * 0.298631336; tmp1 = tmp1 * 2.053119869;
* tmp2 = tmp2 * 3.072711026; tmp3 = tmp3 * 1.501321110;
* z1 = z1 * -0.899976223; z2 = z2 * -2.562915447;
* tmp0 += z1 + z3; tmp1 += z2 + z4;
* tmp2 += z2 + z3; tmp3 += z1 + z4;
*
* This implementation:
* tmp0 = tmp0 * (0.298631336 - 0.899976223) + tmp3 * -0.899976223;
* tmp1 = tmp1 * (2.053119869 - 2.562915447) + tmp2 * -2.562915447;
* tmp2 = tmp1 * -2.562915447 + tmp2 * (3.072711026 - 2.562915447);
* tmp3 = tmp0 * -0.899976223 + tmp3 * (1.501321110 - 0.899976223);
* tmp0 += z3; tmp1 += z4;
* tmp2 += z3; tmp3 += z4;
*/
tmp0 = vmull_lane_s16(tmp0_s16, consts.val[0], 3);
tmp1 = vmull_lane_s16(tmp1_s16, consts.val[1], 1);
tmp2 = vmull_lane_s16(tmp2_s16, consts.val[2], 2);
tmp3 = vmull_lane_s16(tmp3_s16, consts.val[1], 0);
tmp0 = vmlsl_lane_s16(tmp0, tmp3_s16, consts.val[0], 0);
tmp1 = vmlsl_lane_s16(tmp1, tmp2_s16, consts.val[0], 2);
tmp2 = vmlsl_lane_s16(tmp2, tmp1_s16, consts.val[0], 2);
tmp3 = vmlsl_lane_s16(tmp3, tmp0_s16, consts.val[0], 0);
tmp0 = vaddq_s32(tmp0, z3);
tmp1 = vaddq_s32(tmp1, z4);
tmp2 = vaddq_s32(tmp2, z3);
tmp3 = vaddq_s32(tmp3, z4);
/* Final output stage: descale and narrow to 16-bit. */
int16x8_t cols_02_s16 = vcombine_s16(vaddhn_s32(tmp10, tmp3),
vaddhn_s32(tmp12, tmp1));
int16x8_t cols_13_s16 = vcombine_s16(vaddhn_s32(tmp11, tmp2),
vaddhn_s32(tmp13, tmp0));
int16x8_t cols_46_s16 = vcombine_s16(vsubhn_s32(tmp13, tmp0),
vsubhn_s32(tmp11, tmp2));
int16x8_t cols_57_s16 = vcombine_s16(vsubhn_s32(tmp12, tmp1),
vsubhn_s32(tmp10, tmp3));
/* Descale and narrow to 8-bit. */
int8x8_t cols_02_s8 = vqrshrn_n_s16(cols_02_s16, DESCALE_P2 - 16);
int8x8_t cols_13_s8 = vqrshrn_n_s16(cols_13_s16, DESCALE_P2 - 16);
int8x8_t cols_46_s8 = vqrshrn_n_s16(cols_46_s16, DESCALE_P2 - 16);
int8x8_t cols_57_s8 = vqrshrn_n_s16(cols_57_s16, DESCALE_P2 - 16);
/* Clamp to range [0-255]. */
uint8x8_t cols_02_u8 = vadd_u8(vreinterpret_u8_s8(cols_02_s8),
vdup_n_u8(CENTERJSAMPLE));
uint8x8_t cols_13_u8 = vadd_u8(vreinterpret_u8_s8(cols_13_s8),
vdup_n_u8(CENTERJSAMPLE));
uint8x8_t cols_46_u8 = vadd_u8(vreinterpret_u8_s8(cols_46_s8),
vdup_n_u8(CENTERJSAMPLE));
uint8x8_t cols_57_u8 = vadd_u8(vreinterpret_u8_s8(cols_57_s8),
vdup_n_u8(CENTERJSAMPLE));
/* Transpose 4x8 block and store to memory. (Zipping adjacent columns
* together allows us to store 16-bit elements.)
*/
uint8x8x2_t cols_01_23 = vzip_u8(cols_02_u8, cols_13_u8);
uint8x8x2_t cols_45_67 = vzip_u8(cols_46_u8, cols_57_u8);
uint16x4x4_t cols_01_23_45_67 = { {
vreinterpret_u16_u8(cols_01_23.val[0]),
vreinterpret_u16_u8(cols_01_23.val[1]),
vreinterpret_u16_u8(cols_45_67.val[0]),
vreinterpret_u16_u8(cols_45_67.val[1])
} };
JSAMPROW outptr0 = output_buf[buf_offset + 0] + output_col;
JSAMPROW outptr1 = output_buf[buf_offset + 1] + output_col;
JSAMPROW outptr2 = output_buf[buf_offset + 2] + output_col;
JSAMPROW outptr3 = output_buf[buf_offset + 3] + output_col;
/* VST4 of 16-bit elements completes the transpose. */
vst4_lane_u16((uint16_t *)outptr0, cols_01_23_45_67, 0);
vst4_lane_u16((uint16_t *)outptr1, cols_01_23_45_67, 1);
vst4_lane_u16((uint16_t *)outptr2, cols_01_23_45_67, 2);
vst4_lane_u16((uint16_t *)outptr3, cols_01_23_45_67, 3);
}
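/* Scalar sketch of the two-step descale used above.  VADDHN/VSUBHN drop the
 * low 16 bits of each 32-bit sum, and VQRSHRN then shifts by the remaining
 * DESCALE_P2 - 16 = (13 + 2 + 3) - 16 = 2 bits with rounding and saturation.
 * descale_p2_sketch() is a hypothetical scalar equivalent of the combined
 * operation (ignoring the truncating rounding of the first step).
 */
static int8_t descale_p2_sketch(int32_t sum)
{
  int32_t x = sum >> 16;                    /* VADDHN/VSUBHN: keep high half */
  x = (x + (1 << (DESCALE_P2 - 16 - 1))) >> (DESCALE_P2 - 16);  /* VQRSHRN */
  if (x < -128) x = -128;                   /* saturate to 8 bits */
  if (x > 127) x = 127;
  return (int8_t)x;
}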
/* Perform the second pass of the accurate inverse DCT on a 4x8 block
* of coefficients.
*
* This "sparse" version assumes that the coefficient values (after the first
* pass) in rows 4-7 are all 0. This simplifies the IDCT calculation,
* accelerating overall performance.
*/
static INLINE void jsimd_idct_islow_pass2_sparse(int16_t *workspace,
JSAMPARRAY output_buf,
JDIMENSION output_col,
unsigned buf_offset)
{
/* Load constants for IDCT computation. */
#ifdef HAVE_VLD1_S16_X3
const int16x4x3_t consts = vld1_s16_x3(jsimd_idct_islow_neon_consts);
#else
const int16x4_t consts1 = vld1_s16(jsimd_idct_islow_neon_consts);
const int16x4_t consts2 = vld1_s16(jsimd_idct_islow_neon_consts + 4);
const int16x4_t consts3 = vld1_s16(jsimd_idct_islow_neon_consts + 8);
const int16x4x3_t consts = { { consts1, consts2, consts3 } };
#endif
/* Even part (z3 is all 0) */
int16x4_t z2_s16 = vld1_s16(workspace + 2 * DCTSIZE / 2);
int32x4_t tmp2 = vmull_lane_s16(z2_s16, consts.val[0], 1);
int32x4_t tmp3 = vmull_lane_s16(z2_s16, consts.val[1], 2);
z2_s16 = vld1_s16(workspace + 0 * DCTSIZE / 2);
int32x4_t tmp0 = vshll_n_s16(z2_s16, CONST_BITS);
int32x4_t tmp1 = vshll_n_s16(z2_s16, CONST_BITS);
int32x4_t tmp10 = vaddq_s32(tmp0, tmp3);
int32x4_t tmp13 = vsubq_s32(tmp0, tmp3);
int32x4_t tmp11 = vaddq_s32(tmp1, tmp2);
int32x4_t tmp12 = vsubq_s32(tmp1, tmp2);
/* Odd part (tmp0 and tmp1 are both all 0) */
int16x4_t tmp2_s16 = vld1_s16(workspace + 3 * DCTSIZE / 2);
int16x4_t tmp3_s16 = vld1_s16(workspace + 1 * DCTSIZE / 2);
int16x4_t z3_s16 = tmp2_s16;
int16x4_t z4_s16 = tmp3_s16;
int32x4_t z3 = vmull_lane_s16(z3_s16, consts.val[2], 3);
z3 = vmlal_lane_s16(z3, z4_s16, consts.val[1], 3);
int32x4_t z4 = vmull_lane_s16(z3_s16, consts.val[1], 3);
z4 = vmlal_lane_s16(z4, z4_s16, consts.val[2], 0);
tmp0 = vmlsl_lane_s16(z3, tmp3_s16, consts.val[0], 0);
tmp1 = vmlsl_lane_s16(z4, tmp2_s16, consts.val[0], 2);
tmp2 = vmlal_lane_s16(z3, tmp2_s16, consts.val[2], 2);
tmp3 = vmlal_lane_s16(z4, tmp3_s16, consts.val[1], 0);
/* Final output stage: descale and narrow to 16-bit. */
int16x8_t cols_02_s16 = vcombine_s16(vaddhn_s32(tmp10, tmp3),
vaddhn_s32(tmp12, tmp1));
int16x8_t cols_13_s16 = vcombine_s16(vaddhn_s32(tmp11, tmp2),
vaddhn_s32(tmp13, tmp0));
int16x8_t cols_46_s16 = vcombine_s16(vsubhn_s32(tmp13, tmp0),
vsubhn_s32(tmp11, tmp2));
int16x8_t cols_57_s16 = vcombine_s16(vsubhn_s32(tmp12, tmp1),
vsubhn_s32(tmp10, tmp3));
/* Descale and narrow to 8-bit. */
int8x8_t cols_02_s8 = vqrshrn_n_s16(cols_02_s16, DESCALE_P2 - 16);
int8x8_t cols_13_s8 = vqrshrn_n_s16(cols_13_s16, DESCALE_P2 - 16);
int8x8_t cols_46_s8 = vqrshrn_n_s16(cols_46_s16, DESCALE_P2 - 16);
int8x8_t cols_57_s8 = vqrshrn_n_s16(cols_57_s16, DESCALE_P2 - 16);
/* Clamp to range [0-255]. */
uint8x8_t cols_02_u8 = vadd_u8(vreinterpret_u8_s8(cols_02_s8),
vdup_n_u8(CENTERJSAMPLE));
uint8x8_t cols_13_u8 = vadd_u8(vreinterpret_u8_s8(cols_13_s8),
vdup_n_u8(CENTERJSAMPLE));
uint8x8_t cols_46_u8 = vadd_u8(vreinterpret_u8_s8(cols_46_s8),
vdup_n_u8(CENTERJSAMPLE));
uint8x8_t cols_57_u8 = vadd_u8(vreinterpret_u8_s8(cols_57_s8),
vdup_n_u8(CENTERJSAMPLE));
/* Transpose 4x8 block and store to memory. (Zipping adjacent columns
* together allows us to store 16-bit elements.)
*/
uint8x8x2_t cols_01_23 = vzip_u8(cols_02_u8, cols_13_u8);
uint8x8x2_t cols_45_67 = vzip_u8(cols_46_u8, cols_57_u8);
uint16x4x4_t cols_01_23_45_67 = { {
vreinterpret_u16_u8(cols_01_23.val[0]),
vreinterpret_u16_u8(cols_01_23.val[1]),
vreinterpret_u16_u8(cols_45_67.val[0]),
vreinterpret_u16_u8(cols_45_67.val[1])
} };
JSAMPROW outptr0 = output_buf[buf_offset + 0] + output_col;
JSAMPROW outptr1 = output_buf[buf_offset + 1] + output_col;
JSAMPROW outptr2 = output_buf[buf_offset + 2] + output_col;
JSAMPROW outptr3 = output_buf[buf_offset + 3] + output_col;
/* VST4 of 16-bit elements completes the transpose. */
vst4_lane_u16((uint16_t *)outptr0, cols_01_23_45_67, 0);
vst4_lane_u16((uint16_t *)outptr1, cols_01_23_45_67, 1);
vst4_lane_u16((uint16_t *)outptr2, cols_01_23_45_67, 2);
vst4_lane_u16((uint16_t *)outptr3, cols_01_23_45_67, 3);
}

486
simd/arm/jidctred-neon.c Normal file
View File

@@ -0,0 +1,486 @@
/*
* jidctred-neon.c - reduced-size IDCT (Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
* Copyright (C) 2020, D. R. Commander. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute it
* freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must not
* claim that you wrote the original software. If you use this software
* in a product, an acknowledgment in the product documentation would be
* appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must not be
* misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source distribution.
*/
#define JPEG_INTERNALS
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../jsimd.h"
#include "align.h"
#include "neon-compat.h"
#include <arm_neon.h>
#define CONST_BITS 13
#define PASS1_BITS 2
#define F_0_211 1730
#define F_0_509 4176
#define F_0_601 4926
#define F_0_720 5906
#define F_0_765 6270
#define F_0_850 6967
#define F_0_899 7373
#define F_1_061 8697
#define F_1_272 10426
#define F_1_451 11893
#define F_1_847 15137
#define F_2_172 17799
#define F_2_562 20995
#define F_3_624 29692
/* jsimd_idct_2x2_neon() is an inverse DCT function that produces reduced-size
* 2x2 output from an 8x8 DCT block. It uses the same calculations and
* produces exactly the same output as IJG's original jpeg_idct_2x2() function
* from jpeg-6b, which can be found in jidctred.c.
*
* Scaled integer constants are used to avoid floating-point arithmetic:
* 0.720959822 = 5906 * 2^-13
* 0.850430095 = 6967 * 2^-13
* 1.272758580 = 10426 * 2^-13
* 3.624509785 = 29692 * 2^-13
*
* See jidctred.c for further details of the 2x2 IDCT algorithm. Where
* possible, the variable names and comments here in jsimd_idct_2x2_neon()
* match up with those in jpeg_idct_2x2().
*/
ALIGN(16) static const int16_t jsimd_idct_2x2_neon_consts[] = {
-F_0_720, F_0_850, -F_1_272, F_3_624
};
void jsimd_idct_2x2_neon(void *dct_table, JCOEFPTR coef_block,
JSAMPARRAY output_buf, JDIMENSION output_col)
{
ISLOW_MULT_TYPE *quantptr = dct_table;
/* Load DCT coefficients. */
int16x8_t row0 = vld1q_s16(coef_block + 0 * DCTSIZE);
int16x8_t row1 = vld1q_s16(coef_block + 1 * DCTSIZE);
int16x8_t row3 = vld1q_s16(coef_block + 3 * DCTSIZE);
int16x8_t row5 = vld1q_s16(coef_block + 5 * DCTSIZE);
int16x8_t row7 = vld1q_s16(coef_block + 7 * DCTSIZE);
/* Load quantization table values. */
int16x8_t quant_row0 = vld1q_s16(quantptr + 0 * DCTSIZE);
int16x8_t quant_row1 = vld1q_s16(quantptr + 1 * DCTSIZE);
int16x8_t quant_row3 = vld1q_s16(quantptr + 3 * DCTSIZE);
int16x8_t quant_row5 = vld1q_s16(quantptr + 5 * DCTSIZE);
int16x8_t quant_row7 = vld1q_s16(quantptr + 7 * DCTSIZE);
/* Dequantize DCT coefficients. */
row0 = vmulq_s16(row0, quant_row0);
row1 = vmulq_s16(row1, quant_row1);
row3 = vmulq_s16(row3, quant_row3);
row5 = vmulq_s16(row5, quant_row5);
row7 = vmulq_s16(row7, quant_row7);
/* Load IDCT conversion constants. */
const int16x4_t consts = vld1_s16(jsimd_idct_2x2_neon_consts);
/* Pass 1: process columns from input, put results in vectors row0 and
* row1.
*/
/* Even part */
int32x4_t tmp10_l = vshll_n_s16(vget_low_s16(row0), CONST_BITS + 2);
int32x4_t tmp10_h = vshll_n_s16(vget_high_s16(row0), CONST_BITS + 2);
/* Odd part */
int32x4_t tmp0_l = vmull_lane_s16(vget_low_s16(row1), consts, 3);
tmp0_l = vmlal_lane_s16(tmp0_l, vget_low_s16(row3), consts, 2);
tmp0_l = vmlal_lane_s16(tmp0_l, vget_low_s16(row5), consts, 1);
tmp0_l = vmlal_lane_s16(tmp0_l, vget_low_s16(row7), consts, 0);
int32x4_t tmp0_h = vmull_lane_s16(vget_high_s16(row1), consts, 3);
tmp0_h = vmlal_lane_s16(tmp0_h, vget_high_s16(row3), consts, 2);
tmp0_h = vmlal_lane_s16(tmp0_h, vget_high_s16(row5), consts, 1);
tmp0_h = vmlal_lane_s16(tmp0_h, vget_high_s16(row7), consts, 0);
/* Final output stage: descale and narrow to 16-bit. */
row0 = vcombine_s16(vrshrn_n_s32(vaddq_s32(tmp10_l, tmp0_l), CONST_BITS),
vrshrn_n_s32(vaddq_s32(tmp10_h, tmp0_h), CONST_BITS));
row1 = vcombine_s16(vrshrn_n_s32(vsubq_s32(tmp10_l, tmp0_l), CONST_BITS),
vrshrn_n_s32(vsubq_s32(tmp10_h, tmp0_h), CONST_BITS));
/* Transpose two rows, ready for second pass. */
int16x8x2_t cols_0246_1357 = vtrnq_s16(row0, row1);
int16x8_t cols_0246 = cols_0246_1357.val[0];
int16x8_t cols_1357 = cols_0246_1357.val[1];
/* Duplicate columns such that each is accessible in its own vector. */
int32x4x2_t cols_1155_3377 = vtrnq_s32(vreinterpretq_s32_s16(cols_1357),
vreinterpretq_s32_s16(cols_1357));
int16x8_t cols_1155 = vreinterpretq_s16_s32(cols_1155_3377.val[0]);
int16x8_t cols_3377 = vreinterpretq_s16_s32(cols_1155_3377.val[1]);
/* Pass 2: process two rows, store to output array. */
/* Even part: we're only interested in col0; the top half of tmp10 is "don't
* care."
*/
int32x4_t tmp10 = vshll_n_s16(vget_low_s16(cols_0246), CONST_BITS + 2);
/* Odd part: we're only interested in the bottom half of tmp0. */
int32x4_t tmp0 = vmull_lane_s16(vget_low_s16(cols_1155), consts, 3);
tmp0 = vmlal_lane_s16(tmp0, vget_low_s16(cols_3377), consts, 2);
tmp0 = vmlal_lane_s16(tmp0, vget_high_s16(cols_1155), consts, 1);
tmp0 = vmlal_lane_s16(tmp0, vget_high_s16(cols_3377), consts, 0);
/* Final output stage: descale and clamp to range [0-255]. */
int16x8_t output_s16 = vcombine_s16(vaddhn_s32(tmp10, tmp0),
vsubhn_s32(tmp10, tmp0));
output_s16 = vrsraq_n_s16(vdupq_n_s16(CENTERJSAMPLE), output_s16,
CONST_BITS + PASS1_BITS + 3 + 2 - 16);
/* Narrow to 8-bit and convert to unsigned. */
uint8x8_t output_u8 = vqmovun_s16(output_s16);
/* Store 2x2 block to memory. */
vst1_lane_u8(output_buf[0] + output_col, output_u8, 0);
vst1_lane_u8(output_buf[1] + output_col, output_u8, 1);
vst1_lane_u8(output_buf[0] + output_col + 1, output_u8, 4);
vst1_lane_u8(output_buf[1] + output_col + 1, output_u8, 5);
}
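/* Scalar sketch of the per-column 2x2 butterfly implemented above, following
 * jpeg_idct_2x2() in jidctred.c.  idct_2x2_column_sketch() is illustrative
 * only; it assumes already-dequantized coefficients c1, c3, c5, and c7 plus
 * the DC value, and returns the two column outputs before the final descale.
 */
static void idct_2x2_column_sketch(int32_t dc, int32_t c1, int32_t c3,
                                   int32_t c5, int32_t c7,
                                   int32_t *out0, int32_t *out1)
{
  int32_t tmp10 = dc << (CONST_BITS + 2);               /* even part */
  int32_t tmp0 = c1 * F_3_624 - c3 * F_1_272 +           /* odd part */
                 c5 * F_0_850 - c7 * F_0_720;
  *out0 = tmp10 + tmp0;
  *out1 = tmp10 - tmp0;
}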
/* jsimd_idct_4x4_neon() is an inverse DCT function that produces reduced-size
* 4x4 output from an 8x8 DCT block. It uses the same calculations and
* produces exactly the same output as IJG's original jpeg_idct_4x4() function
* from jpeg-6b, which can be found in jidctred.c.
*
* Scaled integer constants are used to avoid floating-point arithmetic:
* 0.211164243 = 1730 * 2^-13
* 0.509795579 = 4176 * 2^-13
* 0.601344887 = 4926 * 2^-13
* 0.765366865 = 6270 * 2^-13
* 0.899976223 = 7373 * 2^-13
* 1.061594337 = 8697 * 2^-13
* 1.451774981 = 11893 * 2^-13
* 1.847759065 = 15137 * 2^-13
* 2.172734803 = 17799 * 2^-13
* 2.562915447 = 20995 * 2^-13
*
* See jidctred.c for further details of the 4x4 IDCT algorithm. Where
* possible, the variable names and comments here in jsimd_idct_4x4_neon()
* match up with those in jpeg_idct_4x4().
*/
ALIGN(16) static const int16_t jsimd_idct_4x4_neon_consts[] = {
F_1_847, -F_0_765, -F_0_211, F_1_451,
-F_2_172, F_1_061, -F_0_509, -F_0_601,
F_0_899, F_2_562, 0, 0
};
void jsimd_idct_4x4_neon(void *dct_table, JCOEFPTR coef_block,
JSAMPARRAY output_buf, JDIMENSION output_col)
{
ISLOW_MULT_TYPE *quantptr = dct_table;
/* Load DCT coefficients. */
int16x8_t row0 = vld1q_s16(coef_block + 0 * DCTSIZE);
int16x8_t row1 = vld1q_s16(coef_block + 1 * DCTSIZE);
int16x8_t row2 = vld1q_s16(coef_block + 2 * DCTSIZE);
int16x8_t row3 = vld1q_s16(coef_block + 3 * DCTSIZE);
int16x8_t row5 = vld1q_s16(coef_block + 5 * DCTSIZE);
int16x8_t row6 = vld1q_s16(coef_block + 6 * DCTSIZE);
int16x8_t row7 = vld1q_s16(coef_block + 7 * DCTSIZE);
/* Load quantization table values for DC coefficients. */
int16x8_t quant_row0 = vld1q_s16(quantptr + 0 * DCTSIZE);
/* Dequantize DC coefficients. */
row0 = vmulq_s16(row0, quant_row0);
/* Construct bitmap to test if all AC coefficients are 0. */
int16x8_t bitmap = vorrq_s16(row1, row2);
bitmap = vorrq_s16(bitmap, row3);
bitmap = vorrq_s16(bitmap, row5);
bitmap = vorrq_s16(bitmap, row6);
bitmap = vorrq_s16(bitmap, row7);
int64_t left_ac_bitmap = vgetq_lane_s64(vreinterpretq_s64_s16(bitmap), 0);
int64_t right_ac_bitmap = vgetq_lane_s64(vreinterpretq_s64_s16(bitmap), 1);
/* Load constants for IDCT computation. */
#ifdef HAVE_VLD1_S16_X3
const int16x4x3_t consts = vld1_s16_x3(jsimd_idct_4x4_neon_consts);
#else
/* GCC does not currently support the intrinsic vld1_<type>_x3(). */
const int16x4_t consts1 = vld1_s16(jsimd_idct_4x4_neon_consts);
const int16x4_t consts2 = vld1_s16(jsimd_idct_4x4_neon_consts + 4);
const int16x4_t consts3 = vld1_s16(jsimd_idct_4x4_neon_consts + 8);
const int16x4x3_t consts = { { consts1, consts2, consts3 } };
#endif
if (left_ac_bitmap == 0 && right_ac_bitmap == 0) {
/* All AC coefficients are zero.
* Compute DC values and duplicate into row vectors 0, 1, 2, and 3.
*/
int16x8_t dcval = vshlq_n_s16(row0, PASS1_BITS);
row0 = dcval;
row1 = dcval;
row2 = dcval;
row3 = dcval;
} else if (left_ac_bitmap == 0) {
/* AC coefficients are zero for columns 0, 1, 2, and 3.
* Compute DC values for these columns.
*/
int16x4_t dcval = vshl_n_s16(vget_low_s16(row0), PASS1_BITS);
/* Commence regular IDCT computation for columns 4, 5, 6, and 7. */
/* Load quantization table. */
int16x4_t quant_row1 = vld1_s16(quantptr + 1 * DCTSIZE + 4);
int16x4_t quant_row2 = vld1_s16(quantptr + 2 * DCTSIZE + 4);
int16x4_t quant_row3 = vld1_s16(quantptr + 3 * DCTSIZE + 4);
int16x4_t quant_row5 = vld1_s16(quantptr + 5 * DCTSIZE + 4);
int16x4_t quant_row6 = vld1_s16(quantptr + 6 * DCTSIZE + 4);
int16x4_t quant_row7 = vld1_s16(quantptr + 7 * DCTSIZE + 4);
/* Even part */
int32x4_t tmp0 = vshll_n_s16(vget_high_s16(row0), CONST_BITS + 1);
int16x4_t z2 = vmul_s16(vget_high_s16(row2), quant_row2);
int16x4_t z3 = vmul_s16(vget_high_s16(row6), quant_row6);
int32x4_t tmp2 = vmull_lane_s16(z2, consts.val[0], 0);
tmp2 = vmlal_lane_s16(tmp2, z3, consts.val[0], 1);
int32x4_t tmp10 = vaddq_s32(tmp0, tmp2);
int32x4_t tmp12 = vsubq_s32(tmp0, tmp2);
/* Odd part */
int16x4_t z1 = vmul_s16(vget_high_s16(row7), quant_row7);
z2 = vmul_s16(vget_high_s16(row5), quant_row5);
z3 = vmul_s16(vget_high_s16(row3), quant_row3);
int16x4_t z4 = vmul_s16(vget_high_s16(row1), quant_row1);
tmp0 = vmull_lane_s16(z1, consts.val[0], 2);
tmp0 = vmlal_lane_s16(tmp0, z2, consts.val[0], 3);
tmp0 = vmlal_lane_s16(tmp0, z3, consts.val[1], 0);
tmp0 = vmlal_lane_s16(tmp0, z4, consts.val[1], 1);
tmp2 = vmull_lane_s16(z1, consts.val[1], 2);
tmp2 = vmlal_lane_s16(tmp2, z2, consts.val[1], 3);
tmp2 = vmlal_lane_s16(tmp2, z3, consts.val[2], 0);
tmp2 = vmlal_lane_s16(tmp2, z4, consts.val[2], 1);
/* Final output stage: descale and narrow to 16-bit. */
row0 = vcombine_s16(dcval, vrshrn_n_s32(vaddq_s32(tmp10, tmp2),
CONST_BITS - PASS1_BITS + 1));
row3 = vcombine_s16(dcval, vrshrn_n_s32(vsubq_s32(tmp10, tmp2),
CONST_BITS - PASS1_BITS + 1));
row1 = vcombine_s16(dcval, vrshrn_n_s32(vaddq_s32(tmp12, tmp0),
CONST_BITS - PASS1_BITS + 1));
row2 = vcombine_s16(dcval, vrshrn_n_s32(vsubq_s32(tmp12, tmp0),
CONST_BITS - PASS1_BITS + 1));
} else if (right_ac_bitmap == 0) {
/* AC coefficients are zero for columns 4, 5, 6, and 7.
* Compute DC values for these columns.
*/
int16x4_t dcval = vshl_n_s16(vget_high_s16(row0), PASS1_BITS);
/* Commence regular IDCT computation for columns 0, 1, 2, and 3. */
/* Load quantization table. */
int16x4_t quant_row1 = vld1_s16(quantptr + 1 * DCTSIZE);
int16x4_t quant_row2 = vld1_s16(quantptr + 2 * DCTSIZE);
int16x4_t quant_row3 = vld1_s16(quantptr + 3 * DCTSIZE);
int16x4_t quant_row5 = vld1_s16(quantptr + 5 * DCTSIZE);
int16x4_t quant_row6 = vld1_s16(quantptr + 6 * DCTSIZE);
int16x4_t quant_row7 = vld1_s16(quantptr + 7 * DCTSIZE);
/* Even part */
int32x4_t tmp0 = vshll_n_s16(vget_low_s16(row0), CONST_BITS + 1);
int16x4_t z2 = vmul_s16(vget_low_s16(row2), quant_row2);
int16x4_t z3 = vmul_s16(vget_low_s16(row6), quant_row6);
int32x4_t tmp2 = vmull_lane_s16(z2, consts.val[0], 0);
tmp2 = vmlal_lane_s16(tmp2, z3, consts.val[0], 1);
int32x4_t tmp10 = vaddq_s32(tmp0, tmp2);
int32x4_t tmp12 = vsubq_s32(tmp0, tmp2);
/* Odd part */
int16x4_t z1 = vmul_s16(vget_low_s16(row7), quant_row7);
z2 = vmul_s16(vget_low_s16(row5), quant_row5);
z3 = vmul_s16(vget_low_s16(row3), quant_row3);
int16x4_t z4 = vmul_s16(vget_low_s16(row1), quant_row1);
tmp0 = vmull_lane_s16(z1, consts.val[0], 2);
tmp0 = vmlal_lane_s16(tmp0, z2, consts.val[0], 3);
tmp0 = vmlal_lane_s16(tmp0, z3, consts.val[1], 0);
tmp0 = vmlal_lane_s16(tmp0, z4, consts.val[1], 1);
tmp2 = vmull_lane_s16(z1, consts.val[1], 2);
tmp2 = vmlal_lane_s16(tmp2, z2, consts.val[1], 3);
tmp2 = vmlal_lane_s16(tmp2, z3, consts.val[2], 0);
tmp2 = vmlal_lane_s16(tmp2, z4, consts.val[2], 1);
/* Final output stage: descale and narrow to 16-bit. */
row0 = vcombine_s16(vrshrn_n_s32(vaddq_s32(tmp10, tmp2),
CONST_BITS - PASS1_BITS + 1), dcval);
row3 = vcombine_s16(vrshrn_n_s32(vsubq_s32(tmp10, tmp2),
CONST_BITS - PASS1_BITS + 1), dcval);
row1 = vcombine_s16(vrshrn_n_s32(vaddq_s32(tmp12, tmp0),
CONST_BITS - PASS1_BITS + 1), dcval);
row2 = vcombine_s16(vrshrn_n_s32(vsubq_s32(tmp12, tmp0),
CONST_BITS - PASS1_BITS + 1), dcval);
} else {
/* All AC coefficients are non-zero; full IDCT calculation required. */
int16x8_t quant_row1 = vld1q_s16(quantptr + 1 * DCTSIZE);
int16x8_t quant_row2 = vld1q_s16(quantptr + 2 * DCTSIZE);
int16x8_t quant_row3 = vld1q_s16(quantptr + 3 * DCTSIZE);
int16x8_t quant_row5 = vld1q_s16(quantptr + 5 * DCTSIZE);
int16x8_t quant_row6 = vld1q_s16(quantptr + 6 * DCTSIZE);
int16x8_t quant_row7 = vld1q_s16(quantptr + 7 * DCTSIZE);
/* Even part */
int32x4_t tmp0_l = vshll_n_s16(vget_low_s16(row0), CONST_BITS + 1);
int32x4_t tmp0_h = vshll_n_s16(vget_high_s16(row0), CONST_BITS + 1);
int16x8_t z2 = vmulq_s16(row2, quant_row2);
int16x8_t z3 = vmulq_s16(row6, quant_row6);
int32x4_t tmp2_l = vmull_lane_s16(vget_low_s16(z2), consts.val[0], 0);
int32x4_t tmp2_h = vmull_lane_s16(vget_high_s16(z2), consts.val[0], 0);
tmp2_l = vmlal_lane_s16(tmp2_l, vget_low_s16(z3), consts.val[0], 1);
tmp2_h = vmlal_lane_s16(tmp2_h, vget_high_s16(z3), consts.val[0], 1);
int32x4_t tmp10_l = vaddq_s32(tmp0_l, tmp2_l);
int32x4_t tmp10_h = vaddq_s32(tmp0_h, tmp2_h);
int32x4_t tmp12_l = vsubq_s32(tmp0_l, tmp2_l);
int32x4_t tmp12_h = vsubq_s32(tmp0_h, tmp2_h);
/* Odd part */
int16x8_t z1 = vmulq_s16(row7, quant_row7);
z2 = vmulq_s16(row5, quant_row5);
z3 = vmulq_s16(row3, quant_row3);
int16x8_t z4 = vmulq_s16(row1, quant_row1);
tmp0_l = vmull_lane_s16(vget_low_s16(z1), consts.val[0], 2);
tmp0_l = vmlal_lane_s16(tmp0_l, vget_low_s16(z2), consts.val[0], 3);
tmp0_l = vmlal_lane_s16(tmp0_l, vget_low_s16(z3), consts.val[1], 0);
tmp0_l = vmlal_lane_s16(tmp0_l, vget_low_s16(z4), consts.val[1], 1);
tmp0_h = vmull_lane_s16(vget_high_s16(z1), consts.val[0], 2);
tmp0_h = vmlal_lane_s16(tmp0_h, vget_high_s16(z2), consts.val[0], 3);
tmp0_h = vmlal_lane_s16(tmp0_h, vget_high_s16(z3), consts.val[1], 0);
tmp0_h = vmlal_lane_s16(tmp0_h, vget_high_s16(z4), consts.val[1], 1);
tmp2_l = vmull_lane_s16(vget_low_s16(z1), consts.val[1], 2);
tmp2_l = vmlal_lane_s16(tmp2_l, vget_low_s16(z2), consts.val[1], 3);
tmp2_l = vmlal_lane_s16(tmp2_l, vget_low_s16(z3), consts.val[2], 0);
tmp2_l = vmlal_lane_s16(tmp2_l, vget_low_s16(z4), consts.val[2], 1);
tmp2_h = vmull_lane_s16(vget_high_s16(z1), consts.val[1], 2);
tmp2_h = vmlal_lane_s16(tmp2_h, vget_high_s16(z2), consts.val[1], 3);
tmp2_h = vmlal_lane_s16(tmp2_h, vget_high_s16(z3), consts.val[2], 0);
tmp2_h = vmlal_lane_s16(tmp2_h, vget_high_s16(z4), consts.val[2], 1);
/* Final output stage: descale and narrow to 16-bit. */
row0 = vcombine_s16(vrshrn_n_s32(vaddq_s32(tmp10_l, tmp2_l),
CONST_BITS - PASS1_BITS + 1),
vrshrn_n_s32(vaddq_s32(tmp10_h, tmp2_h),
CONST_BITS - PASS1_BITS + 1));
row3 = vcombine_s16(vrshrn_n_s32(vsubq_s32(tmp10_l, tmp2_l),
CONST_BITS - PASS1_BITS + 1),
vrshrn_n_s32(vsubq_s32(tmp10_h, tmp2_h),
CONST_BITS - PASS1_BITS + 1));
row1 = vcombine_s16(vrshrn_n_s32(vaddq_s32(tmp12_l, tmp0_l),
CONST_BITS - PASS1_BITS + 1),
vrshrn_n_s32(vaddq_s32(tmp12_h, tmp0_h),
CONST_BITS - PASS1_BITS + 1));
row2 = vcombine_s16(vrshrn_n_s32(vsubq_s32(tmp12_l, tmp0_l),
CONST_BITS - PASS1_BITS + 1),
vrshrn_n_s32(vsubq_s32(tmp12_h, tmp0_h),
CONST_BITS - PASS1_BITS + 1));
}
/* Transpose 8x4 block to perform IDCT on rows in second pass. */
int16x8x2_t row_01 = vtrnq_s16(row0, row1);
int16x8x2_t row_23 = vtrnq_s16(row2, row3);
int32x4x2_t cols_0426 = vtrnq_s32(vreinterpretq_s32_s16(row_01.val[0]),
vreinterpretq_s32_s16(row_23.val[0]));
int32x4x2_t cols_1537 = vtrnq_s32(vreinterpretq_s32_s16(row_01.val[1]),
vreinterpretq_s32_s16(row_23.val[1]));
int16x4_t col0 = vreinterpret_s16_s32(vget_low_s32(cols_0426.val[0]));
int16x4_t col1 = vreinterpret_s16_s32(vget_low_s32(cols_1537.val[0]));
int16x4_t col2 = vreinterpret_s16_s32(vget_low_s32(cols_0426.val[1]));
int16x4_t col3 = vreinterpret_s16_s32(vget_low_s32(cols_1537.val[1]));
int16x4_t col5 = vreinterpret_s16_s32(vget_high_s32(cols_1537.val[0]));
int16x4_t col6 = vreinterpret_s16_s32(vget_high_s32(cols_0426.val[1]));
int16x4_t col7 = vreinterpret_s16_s32(vget_high_s32(cols_1537.val[1]));
/* Commence second pass of IDCT. */
/* Even part */
int32x4_t tmp0 = vshll_n_s16(col0, CONST_BITS + 1);
int32x4_t tmp2 = vmull_lane_s16(col2, consts.val[0], 0);
tmp2 = vmlal_lane_s16(tmp2, col6, consts.val[0], 1);
int32x4_t tmp10 = vaddq_s32(tmp0, tmp2);
int32x4_t tmp12 = vsubq_s32(tmp0, tmp2);
/* Odd part */
tmp0 = vmull_lane_s16(col7, consts.val[0], 2);
tmp0 = vmlal_lane_s16(tmp0, col5, consts.val[0], 3);
tmp0 = vmlal_lane_s16(tmp0, col3, consts.val[1], 0);
tmp0 = vmlal_lane_s16(tmp0, col1, consts.val[1], 1);
tmp2 = vmull_lane_s16(col7, consts.val[1], 2);
tmp2 = vmlal_lane_s16(tmp2, col5, consts.val[1], 3);
tmp2 = vmlal_lane_s16(tmp2, col3, consts.val[2], 0);
tmp2 = vmlal_lane_s16(tmp2, col1, consts.val[2], 1);
/* Final output stage: descale and clamp to range [0-255]. */
int16x8_t output_cols_02 = vcombine_s16(vaddhn_s32(tmp10, tmp2),
vsubhn_s32(tmp12, tmp0));
int16x8_t output_cols_13 = vcombine_s16(vaddhn_s32(tmp12, tmp0),
vsubhn_s32(tmp10, tmp2));
output_cols_02 = vrsraq_n_s16(vdupq_n_s16(CENTERJSAMPLE), output_cols_02,
CONST_BITS + PASS1_BITS + 3 + 1 - 16);
output_cols_13 = vrsraq_n_s16(vdupq_n_s16(CENTERJSAMPLE), output_cols_13,
CONST_BITS + PASS1_BITS + 3 + 1 - 16);
/* Narrow to 8-bit and convert to unsigned while zipping 8-bit elements.
* An interleaving store completes the transpose.
*/
uint8x8x2_t output_0123 = vzip_u8(vqmovun_s16(output_cols_02),
vqmovun_s16(output_cols_13));
uint16x4x2_t output_01_23 = { {
vreinterpret_u16_u8(output_0123.val[0]),
vreinterpret_u16_u8(output_0123.val[1])
} };
/* Store 4x4 block to memory. */
JSAMPROW outptr0 = output_buf[0] + output_col;
JSAMPROW outptr1 = output_buf[1] + output_col;
JSAMPROW outptr2 = output_buf[2] + output_col;
JSAMPROW outptr3 = output_buf[3] + output_col;
vst2_lane_u16((uint16_t *)outptr0, output_01_23, 0);
vst2_lane_u16((uint16_t *)outptr1, output_01_23, 1);
vst2_lane_u16((uint16_t *)outptr2, output_01_23, 2);
vst2_lane_u16((uint16_t *)outptr3, output_01_23, 3);
}
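/* Scalar sketch of the per-column 4x4 butterfly implemented above, following
 * jpeg_idct_4x4() in jidctred.c.  Coefficient row 4 is never loaded because it
 * does not contribute to the 4x4 output.  idct_4x4_column_sketch() is
 * illustrative only; it assumes already-dequantized inputs and omits the
 * final descale.
 */
static void idct_4x4_column_sketch(const int32_t c[8], int32_t out[4])
{
  /* Even part */
  int32_t tmp0 = c[0] << (CONST_BITS + 1);
  int32_t tmp2 = c[2] * F_1_847 - c[6] * F_0_765;
  int32_t tmp10 = tmp0 + tmp2;
  int32_t tmp12 = tmp0 - tmp2;
  /* Odd part */
  tmp0 = c[1] * F_1_061 - c[3] * F_2_172 + c[5] * F_1_451 - c[7] * F_0_211;
  tmp2 = c[1] * F_2_562 + c[3] * F_0_899 - c[5] * F_0_601 - c[7] * F_0_509;
  out[0] = tmp10 + tmp2;
  out[1] = tmp12 + tmp0;
  out[2] = tmp12 - tmp0;
  out[3] = tmp10 - tmp2;
}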

190
simd/arm/jquanti-neon.c Normal file
View File

@@ -0,0 +1,190 @@
/*
* jquanti-neon.c - sample data conversion and quantization (Arm Neon)
*
* Copyright (C) 2020, Arm Limited. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute it
* freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must not
* claim that you wrote the original software. If you use this software
* in a product, an acknowledgment in the product documentation would be
* appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must not be
* misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source distribution.
*/
#define JPEG_INTERNALS
#include "../../jinclude.h"
#include "../../jpeglib.h"
#include "../../jsimd.h"
#include "../../jdct.h"
#include "../../jsimddct.h"
#include "../jsimd.h"
#include <arm_neon.h>
/* After downsampling, the resulting sample values are in the range [0, 255],
* but the Discrete Cosine Transform (DCT) operates on values centered around
* 0.
*
* To prepare sample values for the DCT, load samples into a DCT workspace,
* subtracting CENTERJSAMPLE (128). The samples, now in the range [-128, 127],
* are also widened from 8- to 16-bit.
*
* The equivalent scalar C function convsamp() can be found in jcdctmgr.c.
*/
void jsimd_convsamp_neon(JSAMPARRAY sample_data, JDIMENSION start_col,
DCTELEM *workspace)
{
uint8x8_t samp_row0 = vld1_u8(sample_data[0] + start_col);
uint8x8_t samp_row1 = vld1_u8(sample_data[1] + start_col);
uint8x8_t samp_row2 = vld1_u8(sample_data[2] + start_col);
uint8x8_t samp_row3 = vld1_u8(sample_data[3] + start_col);
uint8x8_t samp_row4 = vld1_u8(sample_data[4] + start_col);
uint8x8_t samp_row5 = vld1_u8(sample_data[5] + start_col);
uint8x8_t samp_row6 = vld1_u8(sample_data[6] + start_col);
uint8x8_t samp_row7 = vld1_u8(sample_data[7] + start_col);
int16x8_t row0 =
vreinterpretq_s16_u16(vsubl_u8(samp_row0, vdup_n_u8(CENTERJSAMPLE)));
int16x8_t row1 =
vreinterpretq_s16_u16(vsubl_u8(samp_row1, vdup_n_u8(CENTERJSAMPLE)));
int16x8_t row2 =
vreinterpretq_s16_u16(vsubl_u8(samp_row2, vdup_n_u8(CENTERJSAMPLE)));
int16x8_t row3 =
vreinterpretq_s16_u16(vsubl_u8(samp_row3, vdup_n_u8(CENTERJSAMPLE)));
int16x8_t row4 =
vreinterpretq_s16_u16(vsubl_u8(samp_row4, vdup_n_u8(CENTERJSAMPLE)));
int16x8_t row5 =
vreinterpretq_s16_u16(vsubl_u8(samp_row5, vdup_n_u8(CENTERJSAMPLE)));
int16x8_t row6 =
vreinterpretq_s16_u16(vsubl_u8(samp_row6, vdup_n_u8(CENTERJSAMPLE)));
int16x8_t row7 =
vreinterpretq_s16_u16(vsubl_u8(samp_row7, vdup_n_u8(CENTERJSAMPLE)));
vst1q_s16(workspace + 0 * DCTSIZE, row0);
vst1q_s16(workspace + 1 * DCTSIZE, row1);
vst1q_s16(workspace + 2 * DCTSIZE, row2);
vst1q_s16(workspace + 3 * DCTSIZE, row3);
vst1q_s16(workspace + 4 * DCTSIZE, row4);
vst1q_s16(workspace + 5 * DCTSIZE, row5);
vst1q_s16(workspace + 6 * DCTSIZE, row6);
vst1q_s16(workspace + 7 * DCTSIZE, row7);
}
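/* Minimal scalar sketch of the conversion performed above (the library's own
 * scalar version is convsamp() in jcdctmgr.c): widen each sample to DCTELEM
 * and subtract CENTERJSAMPLE.  convsamp_sketch() is illustrative only.
 */
static void convsamp_sketch(JSAMPARRAY sample_data, JDIMENSION start_col,
                            DCTELEM *workspace)
{
  int row, col;
  for (row = 0; row < DCTSIZE; row++)
    for (col = 0; col < DCTSIZE; col++)
      workspace[row * DCTSIZE + col] =
        (DCTELEM)sample_data[row][start_col + col] - CENTERJSAMPLE;
}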
/* After the DCT, the resulting array of coefficient values needs to be divided
* by an array of quantization values.
*
* To avoid a slow division operation, the DCT coefficients are multiplied by
* the (scaled) reciprocals of the quantization values and then right-shifted.
*
* The equivalent scalar C function quantize() can be found in jcdctmgr.c.
*/
void jsimd_quantize_neon(JCOEFPTR coef_block, DCTELEM *divisors,
DCTELEM *workspace)
{
JCOEFPTR out_ptr = coef_block;
UDCTELEM *recip_ptr = (UDCTELEM *)divisors;
UDCTELEM *corr_ptr = (UDCTELEM *)divisors + DCTSIZE2;
DCTELEM *shift_ptr = divisors + 3 * DCTSIZE2;
int i;
for (i = 0; i < DCTSIZE; i += DCTSIZE / 2) {
/* Load DCT coefficients. */
int16x8_t row0 = vld1q_s16(workspace + (i + 0) * DCTSIZE);
int16x8_t row1 = vld1q_s16(workspace + (i + 1) * DCTSIZE);
int16x8_t row2 = vld1q_s16(workspace + (i + 2) * DCTSIZE);
int16x8_t row3 = vld1q_s16(workspace + (i + 3) * DCTSIZE);
/* Load reciprocals of quantization values. */
uint16x8_t recip0 = vld1q_u16(recip_ptr + (i + 0) * DCTSIZE);
uint16x8_t recip1 = vld1q_u16(recip_ptr + (i + 1) * DCTSIZE);
uint16x8_t recip2 = vld1q_u16(recip_ptr + (i + 2) * DCTSIZE);
uint16x8_t recip3 = vld1q_u16(recip_ptr + (i + 3) * DCTSIZE);
uint16x8_t corr0 = vld1q_u16(corr_ptr + (i + 0) * DCTSIZE);
uint16x8_t corr1 = vld1q_u16(corr_ptr + (i + 1) * DCTSIZE);
uint16x8_t corr2 = vld1q_u16(corr_ptr + (i + 2) * DCTSIZE);
uint16x8_t corr3 = vld1q_u16(corr_ptr + (i + 3) * DCTSIZE);
int16x8_t shift0 = vld1q_s16(shift_ptr + (i + 0) * DCTSIZE);
int16x8_t shift1 = vld1q_s16(shift_ptr + (i + 1) * DCTSIZE);
int16x8_t shift2 = vld1q_s16(shift_ptr + (i + 2) * DCTSIZE);
int16x8_t shift3 = vld1q_s16(shift_ptr + (i + 3) * DCTSIZE);
/* Extract sign from coefficients. */
int16x8_t sign_row0 = vshrq_n_s16(row0, 15);
int16x8_t sign_row1 = vshrq_n_s16(row1, 15);
int16x8_t sign_row2 = vshrq_n_s16(row2, 15);
int16x8_t sign_row3 = vshrq_n_s16(row3, 15);
/* Get absolute value of DCT coefficients. */
uint16x8_t abs_row0 = vreinterpretq_u16_s16(vabsq_s16(row0));
uint16x8_t abs_row1 = vreinterpretq_u16_s16(vabsq_s16(row1));
uint16x8_t abs_row2 = vreinterpretq_u16_s16(vabsq_s16(row2));
uint16x8_t abs_row3 = vreinterpretq_u16_s16(vabsq_s16(row3));
/* Add correction. */
abs_row0 = vaddq_u16(abs_row0, corr0);
abs_row1 = vaddq_u16(abs_row1, corr1);
abs_row2 = vaddq_u16(abs_row2, corr2);
abs_row3 = vaddq_u16(abs_row3, corr3);
/* Multiply DCT coefficients by quantization reciprocals. */
int32x4_t row0_l = vreinterpretq_s32_u32(vmull_u16(vget_low_u16(abs_row0),
vget_low_u16(recip0)));
int32x4_t row0_h = vreinterpretq_s32_u32(vmull_u16(vget_high_u16(abs_row0),
vget_high_u16(recip0)));
int32x4_t row1_l = vreinterpretq_s32_u32(vmull_u16(vget_low_u16(abs_row1),
vget_low_u16(recip1)));
int32x4_t row1_h = vreinterpretq_s32_u32(vmull_u16(vget_high_u16(abs_row1),
vget_high_u16(recip1)));
int32x4_t row2_l = vreinterpretq_s32_u32(vmull_u16(vget_low_u16(abs_row2),
vget_low_u16(recip2)));
int32x4_t row2_h = vreinterpretq_s32_u32(vmull_u16(vget_high_u16(abs_row2),
vget_high_u16(recip2)));
int32x4_t row3_l = vreinterpretq_s32_u32(vmull_u16(vget_low_u16(abs_row3),
vget_low_u16(recip3)));
int32x4_t row3_h = vreinterpretq_s32_u32(vmull_u16(vget_high_u16(abs_row3),
vget_high_u16(recip3)));
/* Narrow back to 16-bit. */
row0 = vcombine_s16(vshrn_n_s32(row0_l, 16), vshrn_n_s32(row0_h, 16));
row1 = vcombine_s16(vshrn_n_s32(row1_l, 16), vshrn_n_s32(row1_h, 16));
row2 = vcombine_s16(vshrn_n_s32(row2_l, 16), vshrn_n_s32(row2_h, 16));
row3 = vcombine_s16(vshrn_n_s32(row3_l, 16), vshrn_n_s32(row3_h, 16));
/* Since VSHR only supports an immediate as its second argument, negate the
* shift value and shift left.
*/
row0 = vreinterpretq_s16_u16(vshlq_u16(vreinterpretq_u16_s16(row0),
vnegq_s16(shift0)));
row1 = vreinterpretq_s16_u16(vshlq_u16(vreinterpretq_u16_s16(row1),
vnegq_s16(shift1)));
row2 = vreinterpretq_s16_u16(vshlq_u16(vreinterpretq_u16_s16(row2),
vnegq_s16(shift2)));
row3 = vreinterpretq_s16_u16(vshlq_u16(vreinterpretq_u16_s16(row3),
vnegq_s16(shift3)));
/* Restore sign to original product. */
row0 = veorq_s16(row0, sign_row0);
row0 = vsubq_s16(row0, sign_row0);
row1 = veorq_s16(row1, sign_row1);
row1 = vsubq_s16(row1, sign_row1);
row2 = veorq_s16(row2, sign_row2);
row2 = vsubq_s16(row2, sign_row2);
row3 = veorq_s16(row3, sign_row3);
row3 = vsubq_s16(row3, sign_row3);
/* Store quantized coefficients to memory. */
vst1q_s16(out_ptr + (i + 0) * DCTSIZE, row0);
vst1q_s16(out_ptr + (i + 1) * DCTSIZE, row1);
vst1q_s16(out_ptr + (i + 2) * DCTSIZE, row2);
vst1q_s16(out_ptr + (i + 3) * DCTSIZE, row3);
}
}
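/* Scalar sketch of the reciprocal-based quantization above, mirroring the
 * divisors[] layout used by the vector code: unsigned reciprocals at offset 0,
 * pre-rounding corrections at DCTSIZE2, and per-coefficient shift amounts at
 * 3 * DCTSIZE2.  quantize_sketch() is illustrative only; the library's own
 * scalar version is quantize() in jcdctmgr.c.
 */
static void quantize_sketch(JCOEFPTR coef_block, DCTELEM *divisors,
                            DCTELEM *workspace)
{
  UDCTELEM *recip = (UDCTELEM *)divisors;
  UDCTELEM *corr = (UDCTELEM *)divisors + DCTSIZE2;
  DCTELEM *shift = divisors + 3 * DCTSIZE2;
  int i;

  for (i = 0; i < DCTSIZE2; i++) {
    DCTELEM x = workspace[i];
    DCTELEM sign = (x < 0) ? -1 : 0;
    UDCTELEM mag = (UDCTELEM)(((x < 0) ? -x : x) + corr[i]);
    uint32_t prod = (uint32_t)mag * recip[i];
    DCTELEM q = (DCTELEM)((prod >> 16) >> shift[i]);
    coef_block[i] = (JCOEF)((q ^ sign) - sign);   /* reapply the sign */
  }
}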

File diff suppressed because it is too large

35
simd/arm/neon-compat.h.in Normal file
View File

@@ -0,0 +1,35 @@
/*
* Copyright (C) 2020, D. R. Commander. All Rights Reserved.
* Copyright (C) 2020-2021, Arm Limited. All Rights Reserved.
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute it
* freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must not
* claim that you wrote the original software. If you use this software
* in a product, an acknowledgment in the product documentation would be
* appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must not be
* misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source distribution.
*/
#cmakedefine HAVE_VLD1_S16_X3
#cmakedefine HAVE_VLD1_U16_X2
#cmakedefine HAVE_VLD1Q_U8_X4
/* Define compiler-independent count-leading-zeros macros */
#if defined(_MSC_VER) && !defined(__clang__)
#define BUILTIN_CLZ(x) _CountLeadingZeros(x)
#define BUILTIN_CLZLL(x) _CountLeadingZeros64(x)
#elif defined(__clang__) || defined(__GNUC__)
#define BUILTIN_CLZ(x) __builtin_clz(x)
#define BUILTIN_CLZLL(x) __builtin_clzll(x)
#else
#error "Unknown compiler"
#endif
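/* Illustrative only (not used by the library): the bit length of a nonzero
 * 32-bit value, as needed when computing Huffman magnitude categories, can be
 * derived from the macros above.
 */
#define BIT_LENGTH_SKETCH(x)  (32 - BUILTIN_CLZ(x))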

Some files were not shown because too many files have changed in this diff