Compare commits

...

14 Commits
v3.2 ... 1.5.1

Author SHA1 Message Date
mayeut
cb88e5da80 ARM64 NEON: Fix another ABI conformance issue
Based on
98a5a9dc89
with wordsmithing by DRC.

In the AArch64 ABI, as in many others, it's forbidden to read/store data
below the stack pointer.  Some SIMD functions were doing just that
(stack pointer misuse) when trying to preserve callee-saved registers,
and this resulted in those registers being restored with incorrect
contents under certain circumstances.

This patch fixes that behavior, and callee-saved registers are now
stored above the stack pointer throughout the function call.  The patch
also removes register saving in places where it is unnecessary for this
ABI, or it makes use of unused scratch regiters instead of callee-saved
registers.

Fixes #97.  Closes #101.

Refer also to https://bugzilla.redhat.com/show_bug.cgi?id=1368569
2016-09-20 17:38:39 -05:00
DRC
e9d9c31fd2 Build: Remove ARMv6 support from 'make iosdmg'
The last iDevice to require ARMv6 was the iPhone 3G, which required iOS
4.2.1 or older.  Our binaries have always required iOS 4.3 or newer,
so I'm not sure if the ARMv6 fork of our binaries was ever useful to
begin with.  In any case, if it ever was useful, it no longer is.  Fat
binaries can still be generated with ARMv6 support by invoking
{build_directory}/pkgscripts/makemacpkg manually.
2016-09-20 11:29:22 -05:00
DRC
077e5bb4e0 Fix out-of-bounds write in partial decomp. feature
Reported by Clang UBSan (refer to
https://bugzilla.mozilla.org/show_bug.cgi?id=1301252 for test image.)
This appears to be a legitimate bug introduced by
3ab68cf563.  Any component array, such
as first_MCU_col and last_MCU_col, should always be able to accommodate
MAX_COMPONENTS values.  The aforementioned test image had 8 components,
which was not enough to make the out-of-bounds write bust out of the
jpeg_decomp_master struct (and fortunately the memory after last_MCU_col
is an integer used as a boolean, so stomping on it will do nothing other
than change the decoder state.)  I crafted another special image that
has 10 components (the maximum allowable), but that was apparently not
enough to bust out of the allocated memory, either.  Thus, it is
posited that the security threat posed by this bug is either extremely
minimal or non-existent.
2016-09-08 22:01:09 -05:00
DRC
a1dd35680d Silence additional UBSan warnings
NOTE: The jdhuff.c/jdphuff.c warnings should have already been silenced
by 8e9cef2e6f, but apparently I need to
be REALLY clear that I'm trying to do pointer arithmetic rather than
dereference an array.  Grrr...

Refer to:
https://bugzilla.mozilla.org/show_bug.cgi?id=1301250
https://bugzilla.mozilla.org/show_bug.cgi?id=1301256
2016-09-08 21:29:58 -05:00
DRC
a09ba29a55 Fix unsigned int overflow in libjpeg memory mgr.
When attempting to decode a malformed JPEG image (refer to
https://bugzilla.mozilla.org/show_bug.cgi?id=1295044) with dimensions
61472 x 32800, the maximum_space variable within the
realize_virt_arrays() function will exceed the maximum value of a 32-bit
integer and will wrap around.  The memory manager subsequently fails
with an "Insufficient memory" error (case 4, in alloc_large()), so this
commit simply causes that error to be triggered earlier, before UBSan
has a chance to complain.

Note that this issue did not ever represent an exploitable security
threat, because the POSIX-based memory manager that we use doesn't ever
do anything meaningful with the value of maximum_space.
jpeg_mem_available() simply sets avail_mem = maximum_space, so the
subsequent behavior of the memory manager is the same regardless of
whether maximum_space is correct or not.  This commit simply removes a
UBSan warning in order to make it easier to detect actual security
issues.
2016-09-08 16:17:05 -05:00
DRC
8ce2c9119a TurboJPEG: Decomp. 4:2:2/4:4:0 JPEGs w/unusual SFs
Normally, 4:2:2 JPEGs have horizontal x vertical luminance,chrominance
sampling factors of 2x1,1x1, and 4:4:0 JPEGs have horizontal x vertical
luminance,chrominance sampling factors of 1x2,1x1.  However, it is
technically legal to create 4:2:2 JPEGs with sampling factors of
2x2,1x2 and 4:4:0 JPEGs with sampling factors of 2x2,2x1, since the
sums of the products of those sampling factors (2x2 + 1x2 + 1x2 and
2x2 + 2x1 + 2x1) are still <= 10.  The libjpeg API correctly decodes
such images, so the TurboJPEG API should as well.

Fixes #92
2016-08-01 11:59:31 -05:00
DRC
db04435165 Silence pedantic GCC6 code formatting warnings
Apparently it's "misleading" to put two self-contained if statements
on a single line.  Who knew?
2016-07-14 13:36:47 -05:00
DRC
7723d7f7d0 Use plain upsampling if merged isn't accelerated
Currently, this only affects ARM, since it is the only platform that
accelerates YCbCr-to-RGB conversion but not merged upsampling.  Even if
"plain" upsampling isn't accelerated, the combination of accelerated
color conversion + unaccelerated plain upsampling is still faster than
the unaccelerated merged upsampling algorithms.

Closes #81
2016-07-13 22:06:11 -05:00
Kornel Lesiński
628c168c86 Implement h1v2 fancy upsampling
This allows fancy upsampling to be used when decompressing 4:2:2 images
that have been losslessly rotated or transposed.

(docs and comments added by DRC)

Based on f63aca945d

Closes #89
2016-07-13 17:28:19 -05:00
DRC
1120ff29a1 Fix AArch64 ABI conformance issue in SIMD code
In the AArch64 ABI, the high (unused) DWORD of a 32-bit argument's
register is undefined, so it was incorrect to use 64-bit
instructions to transfer a JDIMENSION argument in the 64-bit NEON SIMD
functions.  The code worked thus far only because the existing compiler
optimizers weren't smart enough to do anything else with the register in
question, so the upper 32 bits happened to be all zeroes.

The latest builds of Clang/LLVM have a smarter optimizer, and under
certain circumstances, it will attempt to load-combine adjacent 32-bit
integers from one of the libjpeg structures into a single 64-bit integer
and pass that 64-bit integer as a 32-bit argument to one of the SIMD
functions (which is allowed by the ABI, since the upper 32 bits of the
32-bit argument's register are undefined.)  This caused the
libjpeg-turbo regression tests to crash.

This patch tries to use the Wn registers whenever possible.  Otherwise,
it uses a zero-extend instruction to avoid using the upper 32 bits of
the 64-bit registers, which are not guaranteed to be valid for 32-bit
arguments.

Based on 1fbae13021

Closes #91.  Refer also to android-ndk/ndk#110 and
https://llvm.org/bugs/show_bug.cgi?id=28393
2016-07-13 14:36:19 -05:00
DRC
1945ad961b Don't install libturbojpeg.pc if TJPEG disabled 2016-07-12 22:21:20 -05:00
DRC
6e9d43e085 Linux/PPC: Only enable AltiVec if CPU supports it
This eliminates "illegal instruction" errors when running libjpeg-turbo
under Linux on PowerPC chips that lack AltiVec support (e.g. the old
7XX/G3 models but also the newer e5500 series.)
2016-07-07 19:02:44 +00:00
DRC
9055fb408d ARM/MIPS: Change the behavior of JSIMD_FORCE*
The JSIMD_FORCE* environment variables previously meant "force the use
of this instruction set if it is available but others are available as
well", but that did nothing on ARM platforms, since there is only ever
one instruction set available.  Since the ARM and MIPS CPU feature
detection code is less than bulletproof, and since there is only one
SIMD instruction set (currently) supported on those platforms, it makes
sense for the JSIMD_FORCE* environment variables on those platforms to
actually force the use of the SIMD instruction set, thus bypassing the
CPU feature detection code.

This addresses a concern raised in #88 whereby parsing /proc/cpuinfo
didn't work within a QEMU environment.  This at least provides a
workaround, allowing users to force-enable or force-disable SIMD
instructions for ARM and MIPS builds of libjpeg-turbo.
2016-07-07 13:38:48 -05:00
DRC
9e6c6a14f8 Bump version to 1.5.1 to prepare for new commits 2016-07-06 16:22:27 +00:00
21 changed files with 418 additions and 154 deletions

View File

@@ -323,11 +323,6 @@ Set the following shell variables for simplicity:
IOS_SYSROOT=$IOS_PLATFORMDIR/Developer/SDKs/iPhoneOS*.sdk
IOS_GCC=$IOS_PLATFORMDIR/Developer/usr/bin/arm-apple-darwin10-llvm-gcc-4.2
*ARMv6 (code will run on all iOS devices, not SIMD-accelerated)*
[NOTE: Requires Xcode 4.4.x or earlier]
IOS_CFLAGS="-march=armv6 -mcpu=arm1176jzf-s -mfpu=vfp"
*ARMv7 (code will run on iPhone 3GS-4S/iPad 1st-3rd Generation and newer)*
IOS_CFLAGS="-march=armv7 -mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon"
@@ -399,8 +394,8 @@ NOTE: You can also add `-miphoneos-version-min={version}` to `$IOS_CFLAGS`
above in order to support older versions of iOS than the default version
supported by the SDK.
Once built, lipo can be used to combine the ARMv6, v7, v7s, and/or v8 variants
into a universal library.
Once built, lipo can be used to combine the ARMv7, v7s, and/or v8 variants into
a universal library.
### Building libjpeg-turbo for Android
@@ -782,7 +777,6 @@ default, but you can override this by setting the `BUILDDIR32` variable on the
make command line as shown above.
make iosdmg [BUILDDIR32={32-bit build directory}] \
[BUILDDIRARMV6={ARMv6 build directory}] \
[BUILDDIRARMV7={ARMv7 build directory}] \
[BUILDDIRARMV7S={ARMv7s build directory}] \
[BUILDDIRARMV8={ARMv8 build directory}]
@@ -791,19 +785,17 @@ On OS X systems, this creates a Macintosh package and disk image in which the
libjpeg-turbo static libraries contain ARM architectures necessary to build
iOS applications. If building on an x86-64 system, the binaries will also
contain the i386 architecture, as with `make udmg` above. You should first
configure ARMv6, ARMv7, ARMv7s, and/or ARMv8 out-of-tree builds of
libjpeg-turbo (see "Building libjpeg-turbo for iOS" above.) If you are
building an x86-64 version of libjpeg-turbo, you should configure a 32-bit
out-of-tree build as well. Next, build libjpeg-turbo as you would normally,
using an out-of-tree build. When it is built, run `make iosdmg` from the
build directory. The build system will look for the ARMv6 build under
*{source_directory}*/iosarmv6 by default, the ARMv7 build under
*{source_directory}*/iosarmv7 by default, the ARMv7s build under
*{source_directory}*/iosarmv7s by default, the ARMv8 build under
*{source_directory}*/iosarmv8 by default, and (if applicable) the 32-bit build
under *{source_directory}*/osxx86 by default, but you can override this by
setting the `BUILDDIR32`, `BUILDDIRARMV6`, `BUILDDIRARMV7`, `BUILDDIRARMV7S`,
and/or `BUILDDIRARMV8` variables on the `make` command line as shown above.
configure ARMv7, ARMv7s, and/or ARMv8 out-of-tree builds of libjpeg-turbo (see
"Building libjpeg-turbo for iOS" above.) If you are building an x86-64 version
of libjpeg-turbo, you should configure a 32-bit out-of-tree build as well.
Next, build libjpeg-turbo as you would normally, using an out-of-tree build.
When it is built, run `make iosdmg` from the build directory. The build system
will look for the ARMv7 build under *{source_directory}*/iosarmv7 by default,
the ARMv7s build under *{source_directory}*/iosarmv7s by default, the ARMv8
build under *{source_directory}*/iosarmv8 by default, and (if applicable) the
32-bit build under *{source_directory}*/osxx86 by default, but you can override
this by setting the `BUILDDIR32`, `BUILDDIRARMV7`, `BUILDDIRARMV7S`, and/or
`BUILDDIRARMV8` variables on the `make` command line as shown above.
NOTE: If including an ARMv8 build in the package, then you may need to use
Xcode's version of lipo instead of the operating system's. To do this, pass

View File

@@ -9,7 +9,7 @@ if(POLICY CMP0022)
endif()
project(libjpeg-turbo C)
set(VERSION 1.5.0)
set(VERSION 1.5.1)
string(REPLACE "." ";" VERSION_TRIPLET ${VERSION})
list(GET VERSION_TRIPLET 0 VERSION_MAJOR)
list(GET VERSION_TRIPLET 1 VERSION_MINOR)

View File

@@ -1,3 +1,94 @@
1.5.1
=====
### Significant changes relative to 1.5.0:
1. Previously, the undocumented `JSIMD_FORCE*` environment variables could be
used to force-enable a particular SIMD instruction set if multiple instruction
sets were available on a particular platform. On x86 platforms, where CPU
feature detection is bulletproof and multiple SIMD instruction sets are
available, it makes sense for those environment variables to allow forcing the
use of an instruction set only if that instruction set is available. However,
since the ARM implementations of libjpeg-turbo can only use one SIMD
instruction set, and since their feature detection code is less bulletproof
(parsing /proc/cpuinfo), it makes sense for the `JSIMD_FORCENEON` environment
variable to bypass the feature detection code and really force the use of NEON
instructions. A new environment variable (`JSIMD_FORCEDSPR2`) was introduced
in the MIPS implementation for the same reasons, and the existing
`JSIMD_FORCENONE` environment variable was extended to that implementation.
These environment variables provide a workaround for those attempting to test
ARM and MIPS builds of libjpeg-turbo in QEMU, which passes through
/proc/cpuinfo from the host system.
2. libjpeg-turbo previously assumed that AltiVec instructions were always
available on PowerPC platforms, which led to "illegal instruction" errors when
running on PowerPC chips that lack AltiVec support (such as the older 7xx/G3
and newer e5500 series.) libjpeg-turbo now examines /proc/cpuinfo on
Linux/Android systems and enables AltiVec instructions only if the CPU supports
them. It also now provides two environment variables, `JSIMD_FORCEALTIVEC` and
`JSIMD_FORCENONE`, to force-enable and force-disable AltiVec instructions in
environments where /proc/cpuinfo is an unreliable means of CPU feature
detection (such as when running in QEMU.) On OS X, libjpeg-turbo continues to
assume that AltiVec support is always available, which means that libjpeg-turbo
cannot be used with G3 Macs unless you set the environment variable
`JSIMD_FORCENONE` to `1`.
3. Fixed an issue whereby 64-bit ARM (AArch64) builds of libjpeg-turbo would
crash when built with recent releases of the Clang/LLVM compiler. This was
caused by an ABI conformance issue in some of libjpeg-turbo's 64-bit NEON SIMD
routines. Those routines were incorrectly using 64-bit instructions to
transfer a 32-bit JDIMENSION argument, whereas the ABI allows the upper
(unused) 32 bits of a 32-bit argument's register to be undefined. The new
Clang/LLVM optimizer uses load combining to transfer multiple adjacent 32-bit
structure members into a single 64-bit register, and this exposed the ABI
conformance issue.
4. Fancy upsampling is now supported when decompressing JPEG images that use
4:4:0 (h1v2) chroma subsampling. These images are generated when losslessly
rotating or transposing JPEG images that use 4:2:2 (h2v1) chroma subsampling.
The h1v2 fancy upsampling algorithm is not currently SIMD-accelerated.
5. If merged upsampling isn't SIMD-accelerated but YCbCr-to-RGB conversion is,
then libjpeg-turbo will now disable merged upsampling when decompressing YCbCr
JPEG images into RGB or extended RGB output images. This significantly speeds
up the decompression of 4:2:0 and 4:2:2 JPEGs on ARM platforms if fancy
upsampling is not used (for example, if the `-nosmooth` option to djpeg is
specified.)
6. The TurboJPEG API will now decompress 4:2:2 and 4:4:0 JPEG images with
2x2 luminance sampling factors and 2x1 or 1x2 chrominance sampling factors.
This is a non-standard way of specifying 2x subsampling (normally 4:2:2 JPEGs
have 2x1 luminance and 1x1 chrominance sampling factors, and 4:4:0 JPEGs have
1x2 luminance and 1x1 chrominance sampling factors), but the JPEG specification
and the libjpeg API both allow it.
7. Fixed an unsigned integer overflow in the libjpeg memory manager, detected
by the Clang undefined behavior sanitizer, that could be triggered by
attempting to decompress a specially-crafted malformed JPEG image. This issue
affected only 32-bit code and did not pose a security threat, but removing the
warning makes it easier to detect actual security issues, should they arise in
the future.
8. Fixed additional negative left shifts and other issues reported by the GCC
and Clang undefined behavior sanitizers when attempting to decompress
specially-crafted malformed JPEG images. None of these issues posed a security
threat, but removing the warnings makes it easier to detect actual security
issues, should they arise in the future.
9. Fixed an out-of-bounds array reference, introduced by 1.4.90[2] (partial
image decompression) and detected by the Clang undefined behavior sanitizer,
that could be triggered by a specially-crafted malformed JPEG image with more
than four components. Because the out-of-bounds reference was still within the
same structure, it was not known to pose a security threat, but removing the
warning makes it easier to detect actual security issues, should they arise in
the future.
10. Fixed another ABI conformance issue in the 64-bit ARM (AArch64) NEON SIMD
code. Some of the routines were incorrectly reading and storing data below the
stack pointer, which caused segfaults in certain applications under specific
circumstances.
1.5.0
=====

View File

@@ -11,7 +11,10 @@ endif
nodist_include_HEADERS = jconfig.h
pkgconfigdir = $(libdir)/pkgconfig
pkgconfig_DATA = pkgscripts/libjpeg.pc pkgscripts/libturbojpeg.pc
pkgconfig_DATA = pkgscripts/libjpeg.pc
if WITH_TURBOJPEG
pkgconfig_DATA += pkgscripts/libturbojpeg.pc
endif
HDRS = jchuff.h jdct.h jdhuff.h jerror.h jinclude.h jmemsys.h jmorecfg.h \
jpegint.h jpeglib.h jversion.h jsimd.h jsimddct.h jpegcomp.h \
@@ -757,12 +760,12 @@ udmg: all pkgscripts/makemacpkg pkgscripts/uninstall
sh pkgscripts/makemacpkg -build32 ${BUILDDIR32}
iosdmg: all pkgscripts/makemacpkg pkgscripts/uninstall
sh pkgscripts/makemacpkg -build32 ${BUILDDIR32} -buildarmv6 ${BUILDDIRARMV6} -buildarmv7 ${BUILDDIRARMV7} -buildarmv7s ${BUILDDIRARMV7S} -buildarmv8 ${BUILDDIRARMV8} -lipo "${LIPO}"
sh pkgscripts/makemacpkg -build32 ${BUILDDIR32} -buildarmv7 ${BUILDDIRARMV7} -buildarmv7s ${BUILDDIRARMV7S} -buildarmv8 ${BUILDDIRARMV8} -lipo "${LIPO}"
else
iosdmg: all pkgscripts/makemacpkg pkgscripts/uninstall
sh pkgscripts/makemacpkg -buildarmv6 ${BUILDDIRARMV6} -buildarmv7 ${BUILDDIRARMV7} -buildarmv7s ${BUILDDIRARMV7S} -buildarmv8 ${BUILDDIRARMV8} -lipo "${LIPO}"
sh pkgscripts/makemacpkg -buildarmv7 ${BUILDDIRARMV7} -buildarmv7s ${BUILDDIRARMV7S} -buildarmv8 ${BUILDDIRARMV8} -lipo "${LIPO}"
endif

21
bmp.c
View File

@@ -108,10 +108,14 @@ static void pixelconvert(unsigned char *srcbuf, int srcpf, int srcbottomup,
m=(m-k)/(1.0-k);
y=(y-k)/(1.0-k);
}
if(c>1.0) c=1.0; if(c<0.) c=0.;
if(m>1.0) m=1.0; if(m<0.) m=0.;
if(y>1.0) y=1.0; if(y<0.) y=0.;
if(k>1.0) k=1.0; if(k<0.) k=0.;
if(c>1.0) c=1.0;
if(c<0.) c=0.;
if(m>1.0) m=1.0;
if(m<0.) m=0.;
if(y>1.0) y=1.0;
if(y<0.) y=0.;
if(k>1.0) k=1.0;
if(k<0.) k=0.;
*dstcolptr++=(unsigned char)(255.0-c*255.0+0.5);
*dstcolptr++=(unsigned char)(255.0-m*255.0+0.5);
*dstcolptr++=(unsigned char)(255.0-y*255.0+0.5);
@@ -133,9 +137,12 @@ static void pixelconvert(unsigned char *srcbuf, int srcpf, int srcbottomup,
double r=c*k/255.;
double g=m*k/255.;
double b=y*k/255.;
if(r>255.0) r=255.0; if(r<0.) r=0.;
if(g>255.0) g=255.0; if(g<0.) g=0.;
if(b>255.0) b=255.0; if(b<0.) b=0.;
if(r>255.0) r=255.0;
if(r<0.) r=0.;
if(g>255.0) g=255.0;
if(g<0.) g=0.;
if(b>255.0) b=255.0;
if(b<0.) b=0.;
dstcolptr[tjRedOffset[dstpf]]=(unsigned char)(r+0.5);
dstcolptr[tjGreenOffset[dstpf]]=(unsigned char)(g+0.5);
dstcolptr[tjBlueOffset[dstpf]]=(unsigned char)(b+0.5);

View File

@@ -2,7 +2,7 @@
# Process this file with autoconf to produce a configure script.
AC_PREREQ([2.56])
AC_INIT([libjpeg-turbo], [1.5.0])
AC_INIT([libjpeg-turbo], [1.5.1])
AM_INIT_AUTOMAKE([-Wall foreign dist-bzip2])
AC_PREFIX_DEFAULT(/opt/libjpeg-turbo)

View File

@@ -4,7 +4,7 @@
* This file was part of the Independent JPEG Group's software:
* Developed 1997-2015 by Guido Vollbeding.
* libjpeg-turbo Modifications:
* Copyright (C) 2015, D. R. Commander.
* Copyright (C) 2015-2016, D. R. Commander.
* For conditions of distribution and use, see the accompanying README.ijg
* file.
*
@@ -382,7 +382,7 @@ decode_mcu_AC_first (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
if (arith_decode(cinfo, st)) v |= m;
v += 1; if (sign) v = -v;
/* Scale and output coefficient in natural (dezigzagged) order */
(*block)[jpeg_natural_order[k]] = (JCOEF) (v << cinfo->Al);
(*block)[jpeg_natural_order[k]] = (JCOEF) ((unsigned)v << cinfo->Al);
}
return TRUE;

View File

@@ -109,9 +109,9 @@ start_pass_huff_decoder (j_decompress_ptr cinfo)
actbl = compptr->ac_tbl_no;
/* Compute derived values for Huffman tables */
/* We may do this more than once for a table, but it's not expensive */
pdtbl = entropy->dc_derived_tbls + dctbl;
pdtbl = (d_derived_tbl **)(entropy->dc_derived_tbls) + dctbl;
jpeg_make_d_derived_tbl(cinfo, TRUE, dctbl, pdtbl);
pdtbl = entropy->ac_derived_tbls + actbl;
pdtbl = (d_derived_tbl **)(entropy->ac_derived_tbls) + actbl;
jpeg_make_d_derived_tbl(cinfo, FALSE, actbl, pdtbl);
/* Initialize DC predictions to 0 */
entropy->saved.last_dc_val[ci] = 0;

View File

@@ -22,6 +22,7 @@
#include "jpeglib.h"
#include "jpegcomp.h"
#include "jdmaster.h"
#include "jsimd.h"
/*
@@ -69,6 +70,17 @@ use_merged_upsample (j_decompress_ptr cinfo)
cinfo->comp_info[1]._DCT_scaled_size != cinfo->_min_DCT_scaled_size ||
cinfo->comp_info[2]._DCT_scaled_size != cinfo->_min_DCT_scaled_size)
return FALSE;
#ifdef WITH_SIMD
/* If YCbCr-to-RGB color conversion is SIMD-accelerated but merged upsampling
isn't, then disabling merged upsampling is likely to be faster when
decompressing YCbCr JPEG images. */
if (!jsimd_can_h2v2_merged_upsample() && !jsimd_can_h2v1_merged_upsample() &&
jsimd_can_ycc_rgb() && cinfo->jpeg_color_space == JCS_YCbCr &&
(cinfo->out_color_space == JCS_RGB ||
(cinfo->out_color_space >= JCS_EXT_RGB &&
cinfo->out_color_space <= JCS_EXT_ARGB)))
return FALSE;
#endif
/* ??? also need to test for upsample-time rescaling, when & if supported */
return TRUE; /* by golly, it'll work... */
#else

View File

@@ -4,7 +4,7 @@
* This file was part of the Independent JPEG Group's software:
* Copyright (C) 1995-1997, Thomas G. Lane.
* libjpeg-turbo Modifications:
* Copyright (C) 2015, D. R. Commander.
* Copyright (C) 2015-2016, D. R. Commander.
* For conditions of distribution and use, see the accompanying README.ijg
* file.
*
@@ -170,12 +170,12 @@ start_pass_phuff_decoder (j_decompress_ptr cinfo)
if (is_DC_band) {
if (cinfo->Ah == 0) { /* DC refinement needs no table */
tbl = compptr->dc_tbl_no;
pdtbl = entropy->derived_tbls + tbl;
pdtbl = (d_derived_tbl **)(entropy->derived_tbls) + tbl;
jpeg_make_d_derived_tbl(cinfo, TRUE, tbl, pdtbl);
}
} else {
tbl = compptr->ac_tbl_no;
pdtbl = entropy->derived_tbls + tbl;
pdtbl = (d_derived_tbl **)(entropy->derived_tbls) + tbl;
jpeg_make_d_derived_tbl(cinfo, FALSE, tbl, pdtbl);
/* remember the single active table */
entropy->ac_derived_tbl = entropy->derived_tbls[tbl];

View File

@@ -303,6 +303,48 @@ h2v1_fancy_upsample (j_decompress_ptr cinfo, jpeg_component_info *compptr,
}
/*
* Fancy processing for 1:1 horizontal and 2:1 vertical (4:4:0 subsampling).
*
* This is a less common case, but it can be encountered when losslessly
* rotating/transposing a JPEG file that uses 4:2:2 chroma subsampling.
*/
METHODDEF(void)
h1v2_fancy_upsample (j_decompress_ptr cinfo, jpeg_component_info *compptr,
JSAMPARRAY input_data, JSAMPARRAY *output_data_ptr)
{
JSAMPARRAY output_data = *output_data_ptr;
JSAMPROW inptr0, inptr1, outptr;
#if BITS_IN_JSAMPLE == 8
int thiscolsum;
#else
JLONG thiscolsum;
#endif
JDIMENSION colctr;
int inrow, outrow, v;
inrow = outrow = 0;
while (outrow < cinfo->max_v_samp_factor) {
for (v = 0; v < 2; v++) {
/* inptr0 points to nearest input row, inptr1 points to next nearest */
inptr0 = input_data[inrow];
if (v == 0) /* next nearest is row above */
inptr1 = input_data[inrow-1];
else /* next nearest is row below */
inptr1 = input_data[inrow+1];
outptr = output_data[outrow++];
for(colctr = 0; colctr < compptr->downsampled_width; colctr++) {
thiscolsum = GETJSAMPLE(*inptr0++) * 3 + GETJSAMPLE(*inptr1++);
*outptr++ = (JSAMPLE) ((thiscolsum + 1) >> 2);
}
}
inrow++;
}
}
/*
* Fancy processing for the common case of 2:1 horizontal and 2:1 vertical.
* Again a triangle filter; see comments for h2v1 case, above.
@@ -431,6 +473,11 @@ jinit_upsampler (j_decompress_ptr cinfo)
else
upsample->methods[ci] = h2v1_upsample;
}
} else if (h_in_group == h_out_group &&
v_in_group * 2 == v_out_group && do_fancy) {
/* Non-fancy upsampling is handled by the generic method */
upsample->methods[ci] = h1v2_fancy_upsample;
upsample->pub.need_context_rows = TRUE;
} else if (h_in_group * 2 == h_out_group &&
v_in_group * 2 == v_out_group) {
/* Special cases for 2h2v upsampling */

View File

@@ -32,6 +32,7 @@
#include "jinclude.h"
#include "jpeglib.h"
#include "jmemsys.h" /* import the system-dependent declarations */
#include <stdint.h>
#ifndef NO_GETENV
#ifndef HAVE_STDLIB_H /* <stdlib.h> should declare getenv() */
@@ -650,18 +651,26 @@ realize_virt_arrays (j_common_ptr cinfo)
maximum_space = 0;
for (sptr = mem->virt_sarray_list; sptr != NULL; sptr = sptr->next) {
if (sptr->mem_buffer == NULL) { /* if not realized yet */
size_t new_space = (long) sptr->rows_in_array *
(long) sptr->samplesperrow * sizeof(JSAMPLE);
space_per_minheight += (long) sptr->maxaccess *
(long) sptr->samplesperrow * sizeof(JSAMPLE);
maximum_space += (long) sptr->rows_in_array *
(long) sptr->samplesperrow * sizeof(JSAMPLE);
if (SIZE_MAX - maximum_space < new_space)
out_of_memory(cinfo, 10);
maximum_space += new_space;
}
}
for (bptr = mem->virt_barray_list; bptr != NULL; bptr = bptr->next) {
if (bptr->mem_buffer == NULL) { /* if not realized yet */
size_t new_space = (long) bptr->rows_in_array *
(long) bptr->blocksperrow * sizeof(JBLOCK);
space_per_minheight += (long) bptr->maxaccess *
(long) bptr->blocksperrow * sizeof(JBLOCK);
maximum_space += (long) bptr->rows_in_array *
(long) bptr->blocksperrow * sizeof(JBLOCK);
if (SIZE_MAX - maximum_space < new_space)
out_of_memory(cinfo, 11);
maximum_space += new_space;
}
}

View File

@@ -155,8 +155,8 @@ struct jpeg_decomp_master {
/* Partial decompression variables */
JDIMENSION first_iMCU_col;
JDIMENSION last_iMCU_col;
JDIMENSION first_MCU_col[MAX_COMPS_IN_SCAN];
JDIMENSION last_MCU_col[MAX_COMPS_IN_SCAN];
JDIMENSION first_MCU_col[MAX_COMPONENTS];
JDIMENSION last_MCU_col[MAX_COMPONENTS];
boolean jinit_upsampler_no_alloc;
};

View File

@@ -73,19 +73,24 @@ endif
if SIMD_POWERPC
libsimd_la_SOURCES = jsimd_powerpc.c jsimd_altivec.h jcsample.h \
noinst_LTLIBRARIES += libsimd_altivec.la
libsimd_altivec_la_SOURCES = \
jccolor-altivec.c jcgray-altivec.c jcsample-altivec.c \
jdcolor-altivec.c jdmerge-altivec.c jdsample-altivec.c \
jfdctfst-altivec.c jfdctint-altivec.c \
jidctfst-altivec.c jidctint-altivec.c \
jquanti-altivec.c
libsimd_la_CFLAGS = -maltivec
libsimd_altivec_la_CFLAGS = -maltivec
jccolor-altivec.lo: jccolext-altivec.c
jcgray-altivec.lo: jcgryext-altivec.c
jdcolor-altivec.lo: jdcolext-altivec.c
jdmerge-altivec.lo: jdmrgext-altivec.c
libsimd_la_SOURCES = jsimd_powerpc.c jsimd_altivec.h jcsample.h
libsimd_la_LIBADD = libsimd_altivec.la
endif
AM_CPPFLAGS = -I$(top_srcdir)

View File

@@ -125,7 +125,7 @@ init_simd (void)
/* Force different settings through environment variables */
env = getenv("JSIMD_FORCENEON");
if ((env != NULL) && (strcmp(env, "1") == 0))
simd_support &= JSIMD_ARM_NEON;
simd_support = JSIMD_ARM_NEON;
env = getenv("JSIMD_FORCENONE");
if ((env != NULL) && (strcmp(env, "1") == 0))
simd_support = 0;

View File

@@ -142,7 +142,7 @@ init_simd (void)
/* Force different settings through environment variables */
env = getenv("JSIMD_FORCENEON");
if ((env != NULL) && (strcmp(env, "1") == 0))
simd_support &= JSIMD_ARM_NEON;
simd_support = JSIMD_ARM_NEON;
env = getenv("JSIMD_FORCENONE");
if ((env != NULL) && (strcmp(env, "1") == 0))
simd_support = 0;

View File

@@ -210,10 +210,16 @@ asm_function jsimd_idct_islow_neon
TMP7 .req x13
TMP8 .req x14
/* OUTPUT_COL is a JDIMENSION (unsigned int) argument, so the ABI doesn't
guarantee that the upper (unused) 32 bits of x3 are valid. This
instruction ensures that those bits are set to zero. */
uxtw x3, w3
sub sp, sp, #64
adr x15, Ljsimd_idct_islow_neon_consts
st1 {v8.8b, v9.8b, v10.8b, v11.8b}, [sp], #32
st1 {v12.8b, v13.8b, v14.8b, v15.8b}, [sp], #32
mov x10, sp
st1 {v8.8b, v9.8b, v10.8b, v11.8b}, [x10], #32
st1 {v12.8b, v13.8b, v14.8b, v15.8b}, [x10], #32
ld1 {v0.8h, v1.8h}, [x15]
ld1 {v2.8h, v3.8h, v4.8h, v5.8h}, [COEF_BLOCK], #64
ld1 {v18.8h, v19.8h, v20.8h, v21.8h}, [DCT_TABLE], #64
@@ -238,7 +244,6 @@ asm_function jsimd_idct_islow_neon
shl v10.8h, v2.8h, #(PASS1_BITS)
sqxtn v16.8b, v15.8h
mov TMP1, v16.d[0]
sub sp, sp, #64
mvn TMP2, TMP1
cbnz TMP2, 2f
@@ -807,6 +812,11 @@ asm_function jsimd_idct_ifast_neon
TMP7 .req x13
TMP8 .req x14
/* OUTPUT_COL is a JDIMENSION (unsigned int) argument, so the ABI doesn't
guarantee that the upper (unused) 32 bits of x3 are valid. This
instruction ensures that those bits are set to zero. */
uxtw x3, w3
/* Load and dequantize coefficients into NEON registers
* with the following allocation:
* 0 1 2 3 | 4 5 6 7
@@ -1101,19 +1111,18 @@ asm_function jsimd_idct_4x4_neon
TMP3 .req x2
TMP4 .req x15
/* OUTPUT_COL is a JDIMENSION (unsigned int) argument, so the ABI doesn't
guarantee that the upper (unused) 32 bits of x3 are valid. This
instruction ensures that those bits are set to zero. */
uxtw x3, w3
/* Save all used NEON registers */
sub sp, sp, 272
str x15, [sp], 16
sub sp, sp, 64
mov x9, sp
/* Load constants (v3.4h is just used for padding) */
adr TMP4, Ljsimd_idct_4x4_neon_consts
st1 {v0.8b, v1.8b, v2.8b, v3.8b}, [sp], 32
st1 {v4.8b, v5.8b, v6.8b, v7.8b}, [sp], 32
st1 {v8.8b, v9.8b, v10.8b, v11.8b}, [sp], 32
st1 {v12.8b, v13.8b, v14.8b, v15.8b}, [sp], 32
st1 {v16.8b, v17.8b, v18.8b, v19.8b}, [sp], 32
st1 {v20.8b, v21.8b, v22.8b, v23.8b}, [sp], 32
st1 {v24.8b, v25.8b, v26.8b, v27.8b}, [sp], 32
st1 {v28.8b, v29.8b, v30.8b, v31.8b}, [sp], 32
st1 {v8.8b, v9.8b, v10.8b, v11.8b}, [x9], 32
st1 {v12.8b, v13.8b, v14.8b, v15.8b}, [x9], 32
ld1 {v0.4h, v1.4h, v2.4h, v3.4h}, [TMP4]
/* Load all COEF_BLOCK into NEON registers with the following allocation:
@@ -1222,16 +1231,8 @@ asm_function jsimd_idct_4x4_neon
#endif
/* vpop {v8.4h - v15.4h} ;not available */
sub sp, sp, #272
ldr x15, [sp], 16
ld1 {v0.8b, v1.8b, v2.8b, v3.8b}, [sp], 32
ld1 {v4.8b, v5.8b, v6.8b, v7.8b}, [sp], 32
ld1 {v8.8b, v9.8b, v10.8b, v11.8b}, [sp], 32
ld1 {v12.8b, v13.8b, v14.8b, v15.8b}, [sp], 32
ld1 {v16.8b, v17.8b, v18.8b, v19.8b}, [sp], 32
ld1 {v20.8b, v21.8b, v22.8b, v23.8b}, [sp], 32
ld1 {v24.8b, v25.8b, v26.8b, v27.8b}, [sp], 32
ld1 {v28.8b, v29.8b, v30.8b, v31.8b}, [sp], 32
blr x30
.unreq DCT_TABLE
@@ -1299,19 +1300,19 @@ asm_function jsimd_idct_2x2_neon
TMP1 .req x0
TMP2 .req x15
/* OUTPUT_COL is a JDIMENSION (unsigned int) argument, so the ABI doesn't
guarantee that the upper (unused) 32 bits of x3 are valid. This
instruction ensures that those bits are set to zero. */
uxtw x3, w3
/* vpush {v8.4h - v15.4h} ; not available */
sub sp, sp, 208
str x15, [sp], 16
sub sp, sp, 64
mov x9, sp
/* Load constants */
adr TMP2, Ljsimd_idct_2x2_neon_consts
st1 {v4.8b, v5.8b, v6.8b, v7.8b}, [sp], 32
st1 {v8.8b, v9.8b, v10.8b, v11.8b}, [sp], 32
st1 {v12.8b, v13.8b, v14.8b, v15.8b}, [sp], 32
st1 {v16.8b, v17.8b, v18.8b, v19.8b}, [sp], 32
st1 {v21.8b, v22.8b}, [sp], 16
st1 {v24.8b, v25.8b, v26.8b, v27.8b}, [sp], 32
st1 {v30.8b, v31.8b}, [sp], 16
st1 {v8.8b, v9.8b, v10.8b, v11.8b}, [x9], 32
st1 {v12.8b, v13.8b, v14.8b, v15.8b}, [x9], 32
ld1 {v14.4h}, [TMP2]
/* Load all COEF_BLOCK into NEON registers with the following allocation:
@@ -1411,15 +1412,8 @@ asm_function jsimd_idct_2x2_neon
st1 {v26.b}[1], [TMP2], 1
st1 {v27.b}[5], [TMP2], 1
sub sp, sp, #208
ldr x15, [sp], 16
ld1 {v4.8b, v5.8b, v6.8b, v7.8b}, [sp], 32
ld1 {v8.8b, v9.8b, v10.8b, v11.8b}, [sp], 32
ld1 {v12.8b, v13.8b, v14.8b, v15.8b}, [sp], 32
ld1 {v16.8b, v17.8b, v18.8b, v19.8b}, [sp], 32
ld1 {v21.8b, v22.8b}, [sp], 16
ld1 {v24.8b, v25.8b, v26.8b, v27.8b}, [sp], 32
ld1 {v30.8b, v31.8b}, [sp], 16
blr x30
.unreq DCT_TABLE
@@ -1688,24 +1682,24 @@ asm_function jsimd_ycc_\colorid\()_convert_neon
.else
asm_function jsimd_ycc_\colorid\()_convert_neon_slowst3
.endif
OUTPUT_WIDTH .req x0
OUTPUT_WIDTH .req w0
INPUT_BUF .req x1
INPUT_ROW .req x2
INPUT_ROW .req w2
OUTPUT_BUF .req x3
NUM_ROWS .req x4
NUM_ROWS .req w4
INPUT_BUF0 .req x5
INPUT_BUF1 .req x6
INPUT_BUF2 .req x1
RGB .req x7
Y .req x8
U .req x9
V .req x10
N .req x15
Y .req x9
U .req x10
V .req x11
N .req w15
sub sp, sp, 336
str x15, [sp], 16
sub sp, sp, 64
mov x9, sp
/* Load constants to d1, d2, d3 (v0.4h is just used for padding) */
.if \fast_st3 == 1
@@ -1715,23 +1709,11 @@ asm_function jsimd_ycc_\colorid\()_convert_neon_slowst3
.endif
/* Save NEON registers */
st1 {v0.8b, v1.8b, v2.8b, v3.8b}, [sp], 32
st1 {v4.8b, v5.8b, v6.8b, v7.8b}, [sp], 32
st1 {v8.8b, v9.8b, v10.8b, v11.8b}, [sp], 32
st1 {v12.8b, v13.8b, v14.8b, v15.8b}, [sp], 32
st1 {v16.8b, v17.8b, v18.8b, v19.8b}, [sp], 32
st1 {v20.8b, v21.8b, v22.8b, v23.8b}, [sp], 32
st1 {v24.8b, v25.8b, v26.8b, v27.8b}, [sp], 32
st1 {v28.8b, v29.8b, v30.8b, v31.8b}, [sp], 32
st1 {v8.8b, v9.8b, v10.8b, v11.8b}, [x9], 32
st1 {v12.8b, v13.8b, v14.8b, v15.8b}, [x9], 32
ld1 {v0.4h, v1.4h}, [x15], 16
ld1 {v2.8h}, [x15]
/* Save ARM registers and handle input arguments */
/* push {x4, x5, x6, x7, x8, x9, x10, x30} */
stp x4, x5, [sp], 16
stp x6, x7, [sp], 16
stp x8, x9, [sp], 16
stp x10, x30, [sp], 16
ldr INPUT_BUF0, [INPUT_BUF]
ldr INPUT_BUF1, [INPUT_BUF, #8]
ldr INPUT_BUF2, [INPUT_BUF, #16]
@@ -1745,11 +1727,10 @@ asm_function jsimd_ycc_\colorid\()_convert_neon_slowst3
cmp NUM_ROWS, #1
b.lt 9f
0:
lsl x16, INPUT_ROW, #3
ldr Y, [INPUT_BUF0, x16]
ldr U, [INPUT_BUF1, x16]
ldr Y, [INPUT_BUF0, INPUT_ROW, uxtw #3]
ldr U, [INPUT_BUF1, INPUT_ROW, uxtw #3]
mov N, OUTPUT_WIDTH
ldr V, [INPUT_BUF2, x16]
ldr V, [INPUT_BUF2, INPUT_ROW, uxtw #3]
add INPUT_ROW, INPUT_ROW, #1
ldr RGB, [OUTPUT_BUF], #8
@@ -1799,21 +1780,8 @@ asm_function jsimd_ycc_\colorid\()_convert_neon_slowst3
b.gt 0b
9:
/* Restore all registers and return */
sub sp, sp, #336
ldr x15, [sp], 16
ld1 {v0.8b, v1.8b, v2.8b, v3.8b}, [sp], 32
ld1 {v4.8b, v5.8b, v6.8b, v7.8b}, [sp], 32
ld1 {v8.8b, v9.8b, v10.8b, v11.8b}, [sp], 32
ld1 {v12.8b, v13.8b, v14.8b, v15.8b}, [sp], 32
ld1 {v16.8b, v17.8b, v18.8b, v19.8b}, [sp], 32
ld1 {v20.8b, v21.8b, v22.8b, v23.8b}, [sp], 32
ld1 {v24.8b, v25.8b, v26.8b, v27.8b}, [sp], 32
ld1 {v28.8b, v29.8b, v30.8b, v31.8b}, [sp], 32
/* pop {r4, r5, r6, r7, r8, r9, r10, pc} */
ldp x4, x5, [sp], 16
ldp x6, x7, [sp], 16
ldp x8, x9, [sp], 16
ldp x10, x30, [sp], 16
br x30
.unreq OUTPUT_WIDTH
.unreq INPUT_ROW
@@ -2054,8 +2022,8 @@ asm_function jsimd_\colorid\()_ycc_convert_neon_slowld3
OUTPUT_WIDTH .req w0
INPUT_BUF .req x1
OUTPUT_BUF .req x2
OUTPUT_ROW .req x3
NUM_ROWS .req x4
OUTPUT_ROW .req w3
NUM_ROWS .req w4
OUTPUT_BUF0 .req x5
OUTPUT_BUF1 .req x6
@@ -2082,17 +2050,18 @@ asm_function jsimd_\colorid\()_ycc_convert_neon_slowld3
/* Save NEON registers */
sub sp, sp, #64
st1 {v8.8b, v9.8b, v10.8b, v11.8b}, [sp], 32
st1 {v12.8b, v13.8b, v14.8b, v15.8b}, [sp], 32
mov x9, sp
st1 {v8.8b, v9.8b, v10.8b, v11.8b}, [x9], 32
st1 {v12.8b, v13.8b, v14.8b, v15.8b}, [x9], 32
/* Outer loop over scanlines */
cmp NUM_ROWS, #1
b.lt 9f
0:
ldr Y, [OUTPUT_BUF0, OUTPUT_ROW, lsl #3]
ldr U, [OUTPUT_BUF1, OUTPUT_ROW, lsl #3]
ldr Y, [OUTPUT_BUF0, OUTPUT_ROW, uxtw #3]
ldr U, [OUTPUT_BUF1, OUTPUT_ROW, uxtw #3]
mov N, OUTPUT_WIDTH
ldr V, [OUTPUT_BUF2, OUTPUT_ROW, lsl #3]
ldr V, [OUTPUT_BUF2, OUTPUT_ROW, uxtw #3]
add OUTPUT_ROW, OUTPUT_ROW, #1
ldr RGB, [INPUT_BUF], #8
@@ -2136,7 +2105,6 @@ asm_function jsimd_\colorid\()_ycc_convert_neon_slowld3
b.gt 0b
9:
/* Restore all registers and return */
sub sp, sp, #64
ld1 {v8.8b, v9.8b, v10.8b, v11.8b}, [sp], 32
ld1 {v12.8b, v13.8b, v14.8b, v15.8b}, [sp], 32
br x30
@@ -2199,6 +2167,11 @@ asm_function jsimd_convsamp_neon
TMP8 .req x4
TMPDUP .req w3
/* START_COL is a JDIMENSION (unsigned int) argument, so the ABI doesn't
guarantee that the upper (unused) 32 bits of x1 are valid. This
instruction ensures that those bits are set to zero. */
uxtw x1, w1
mov TMPDUP, #128
ldp TMP1, TMP2, [SAMPLE_DATA], 16
ldp TMP3, TMP4, [SAMPLE_DATA], 16
@@ -2335,8 +2308,9 @@ asm_function jsimd_fdct_islow_neon
/* Save NEON registers */
sub sp, sp, #64
st1 {v8.8b, v9.8b, v10.8b, v11.8b}, [sp], 32
st1 {v12.8b, v13.8b, v14.8b, v15.8b}, [sp], 32
mov x10, sp
st1 {v8.8b, v9.8b, v10.8b, v11.8b}, [x10], 32
st1 {v12.8b, v13.8b, v14.8b, v15.8b}, [x10], 32
/* Load all DATA into NEON registers with the following allocation:
* 0 1 2 3 | 4 5 6 7
@@ -2566,7 +2540,6 @@ asm_function jsimd_fdct_islow_neon
st1 {v20.8h, v21.8h, v22.8h, v23.8h}, [DATA]
/* Restore NEON registers */
sub sp, sp, #64
ld1 {v8.8b, v9.8b, v10.8b, v11.8b}, [sp], 32
ld1 {v12.8b, v13.8b, v14.8b, v15.8b}, [sp], 32
@@ -3080,7 +3053,7 @@ asm_function jsimd_huff_encode_one_block_neon_slowtbl
sub sp, sp, 272
sub BUFFER, BUFFER, #0x1 /* BUFFER=buffer-- */
/* Save ARM registers */
stp x19, x20, [sp], 16
stp x19, x20, [sp]
.if \fast_tbl == 1
adr x15, Ljsimd_huff_encode_one_block_neon_consts
.else
@@ -3294,7 +3267,7 @@ asm_function jsimd_huff_encode_one_block_neon_slowtbl
and v18.16b, v18.16b, v23.16b
add x3, x4, #0x400 /* r1 = dctbl->ehufsi */
and v20.16b, v20.16b, v23.16b
add x15, sp, #0x80 /* x15 = t2 */
add x15, sp, #0x90 /* x15 = t2 */
and v22.16b, v22.16b, v23.16b
ldr w10, [x4, x12, lsl #2]
addp v16.16b, v16.16b, v18.16b
@@ -3317,7 +3290,7 @@ asm_function jsimd_huff_encode_one_block_neon_slowtbl
rbit x9, x9 /* x9 = index0 */
ldrb w14, [x4, #0xf0] /* x14 = actbl->ehufsi[0xf0] */
cmp w12, #(64-8)
mov x11, sp
add x11, sp, #16
b.lt 4f
cbz x9, 6f
st1 {v0.8h, v1.8h, v2.8h, v3.8h}, [x11], #64
@@ -3421,7 +3394,7 @@ asm_function jsimd_huff_encode_one_block_neon_slowtbl
put_bits x3, x11
cbnz x9, 1b
6:
add x13, sp, #0xfe
add x13, sp, #0x10e
cmp x15, x13
b.hs 1f
ldr w12, [x5]
@@ -3429,7 +3402,6 @@ asm_function jsimd_huff_encode_one_block_neon_slowtbl
checkbuf47
put_bits x12, x14
1:
sub sp, sp, 16
str PUT_BUFFER, [x0, #0x10]
str PUT_BITSw, [x0, #0x18]
ldp x19, x20, [sp], 16

View File

@@ -2,7 +2,7 @@
* jsimd_mips.c
*
* Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
* Copyright (C) 2009-2011, 2014, D. R. Commander.
* Copyright (C) 2009-2011, 2014, 2016, D. R. Commander.
* Copyright (C) 2013-2014, MIPS Technologies, Inc., California.
* Copyright (C) 2015, Matthieu Darbois.
*
@@ -77,6 +77,14 @@ init_simd (void)
if (!parse_proc_cpuinfo("MIPS 74K"))
return;
#endif
/* Force different settings through environment variables */
env = getenv("JSIMD_FORCEDSPR2");
if ((env != NULL) && (strcmp(env, "1") == 0))
simd_support = JSIMD_MIPS_DSPR2;
env = getenv("JSIMD_FORCENONE");
if ((env != NULL) && (strcmp(env, "1") == 0))
simd_support = 0;
}
static const int mips_idct_ifast_coefs[4] = {

View File

@@ -2,7 +2,7 @@
* jsimd_powerpc.c
*
* Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
* Copyright (C) 2009-2011, 2014-2015, D. R. Commander.
* Copyright (C) 2009-2011, 2014-2016, D. R. Commander.
* Copyright (C) 2015, Matthieu Darbois.
*
* Based on the x86 SIMD extension for IJG JPEG library,
@@ -22,19 +22,106 @@
#include "../jsimddct.h"
#include "jsimd.h"
#include <stdio.h>
#include <string.h>
#include <ctype.h>
static unsigned int simd_support = ~0;
#if defined(__linux__) || defined(ANDROID) || defined(__ANDROID__)
#define SOMEWHAT_SANE_PROC_CPUINFO_SIZE_LIMIT (1024 * 1024)
LOCAL(int)
check_feature (char *buffer, char *feature)
{
char *p;
if (*feature == 0)
return 0;
if (strncmp(buffer, "cpu", 3) != 0)
return 0;
buffer += 3;
while (isspace(*buffer))
buffer++;
/* Check if 'feature' is present in the buffer as a separate word */
while ((p = strstr(buffer, feature))) {
if (p > buffer && !isspace(*(p - 1))) {
buffer++;
continue;
}
p += strlen(feature);
if (*p != 0 && !isspace(*p)) {
buffer++;
continue;
}
return 1;
}
return 0;
}
LOCAL(int)
parse_proc_cpuinfo (int bufsize)
{
char *buffer = (char *)malloc(bufsize);
FILE *fd;
simd_support = 0;
if (!buffer)
return 0;
fd = fopen("/proc/cpuinfo", "r");
if (fd) {
while (fgets(buffer, bufsize, fd)) {
if (!strchr(buffer, '\n') && !feof(fd)) {
/* "impossible" happened - insufficient size of the buffer! */
fclose(fd);
free(buffer);
return 0;
}
if (check_feature(buffer, "altivec"))
simd_support |= JSIMD_ALTIVEC;
}
fclose(fd);
}
free(buffer);
return 1;
}
#endif
/*
* Check what SIMD accelerations are supported.
*
* FIXME: This code is racy under a multi-threaded environment.
*/
LOCAL(void)
init_simd (void)
{
char *env = NULL;
#if !defined(__ALTIVEC__) && (defined(__linux__) || defined(ANDROID) || defined(__ANDROID__))
int bufsize = 1024; /* an initial guess for the line buffer size limit */
#endif
if (simd_support != ~0U)
return;
simd_support = JSIMD_ALTIVEC;
simd_support = 0;
#if defined(__ALTIVEC__) || defined(__APPLE__)
simd_support |= JSIMD_ALTIVEC;
#elif defined(__linux__) || defined(ANDROID) || defined(__ANDROID__)
while (!parse_proc_cpuinfo(bufsize)) {
bufsize *= 2;
if (bufsize > SOMEWHAT_SANE_PROC_CPUINFO_SIZE_LIMIT)
break;
}
#endif
/* Force different settings through environment variables */
env = getenv("JSIMD_FORCEALTIVEC");
if ((env != NULL) && (strcmp(env, "1") == 0))
simd_support = JSIMD_ALTIVEC;
env = getenv("JSIMD_FORCENONE");
if ((env != NULL) && (strcmp(env, "1") == 0))
simd_support = 0;

View File

@@ -248,7 +248,8 @@ int decomp(unsigned char *srcbuf, unsigned char **jpegbuf,
int y=(int)((double)srcbuf[rindex]*0.299
+ (double)srcbuf[gindex]*0.587
+ (double)srcbuf[bindex]*0.114 + 0.5);
if(y>255) y=255; if(y<0) y=0;
if(y>255) y=255;
if(y<0) y=0;
dstbuf[rindex]=abs(dstbuf[rindex]-y);
dstbuf[gindex]=abs(dstbuf[gindex]-y);
dstbuf[bindex]=abs(dstbuf[bindex]-y);
@@ -300,7 +301,8 @@ int fullTest(unsigned char *srcbuf, int w, int h, int subsamp, int jpegqual,
for(tilew=dotile? 8:w, tileh=dotile? 8:h; ; tilew*=2, tileh*=2)
{
if(tilew>w) tilew=w; if(tileh>h) tileh=h;
if(tilew>w) tilew=w;
if(tileh>h) tileh=h;
ntilesw=(w+tilew-1)/tilew; ntilesh=(h+tileh-1)/tileh;
if((jpegbuf=(unsigned char **)malloc(sizeof(unsigned char *)
@@ -447,7 +449,8 @@ int fullTest(unsigned char *srcbuf, int w, int h, int subsamp, int jpegqual,
for(i=0; i<ntilesw*ntilesh; i++)
{
if(jpegbuf[i]) tjFree(jpegbuf[i]); jpegbuf[i]=NULL;
if(jpegbuf[i]) tjFree(jpegbuf[i]);
jpegbuf[i]=NULL;
}
free(jpegbuf); jpegbuf=NULL;
free(jpegsize); jpegsize=NULL;
@@ -465,7 +468,8 @@ int fullTest(unsigned char *srcbuf, int w, int h, int subsamp, int jpegqual,
{
for(i=0; i<ntilesw*ntilesh; i++)
{
if(jpegbuf[i]) tjFree(jpegbuf[i]); jpegbuf[i]=NULL;
if(jpegbuf[i]) tjFree(jpegbuf[i]);
jpegbuf[i]=NULL;
}
free(jpegbuf); jpegbuf=NULL;
}
@@ -532,7 +536,8 @@ int decompTest(char *filename)
for(tilew=dotile? 16:w, tileh=dotile? 16:h; ; tilew*=2, tileh*=2)
{
if(tilew>w) tilew=w; if(tileh>h) tileh=h;
if(tilew>w) tilew=w;
if(tileh>h) tileh=h;
ntilesw=(w+tilew-1)/tilew; ntilesh=(h+tileh-1)/tileh;
if((jpegbuf=(unsigned char **)malloc(sizeof(unsigned char *)
@@ -692,7 +697,8 @@ int decompTest(char *filename)
{
for(i=0; i<ntilesw*ntilesh; i++)
{
if(jpegbuf[i]) tjFree(jpegbuf[i]); jpegbuf[i]=NULL;
if(jpegbuf[i]) tjFree(jpegbuf[i]);
jpegbuf[i]=NULL;
}
free(jpegbuf); jpegbuf=NULL;
}

View File

@@ -368,6 +368,29 @@ static int getSubsamp(j_decompress_ptr dinfo)
retval=i; break;
}
}
/* Handle 4:2:2 and 4:4:0 images whose sampling factors are specified
in non-standard ways. */
if(dinfo->comp_info[0].h_samp_factor==2 &&
dinfo->comp_info[0].v_samp_factor==2 &&
(i==TJSAMP_422 || i==TJSAMP_440))
{
int match=0;
for(k=1; k<dinfo->num_components; k++)
{
int href=tjMCUHeight[i]/8, vref=tjMCUWidth[i]/8;
if(dinfo->jpeg_color_space==JCS_YCCK && k==3)
{
href=vref=2;
}
if(dinfo->comp_info[k].h_samp_factor==href
&& dinfo->comp_info[k].v_samp_factor==vref)
match++;
}
if(match==dinfo->num_components-1)
{
retval=i; break;
}
}
}
}
return retval;
@@ -570,7 +593,8 @@ static tjhandle _tjInitCompress(tjinstance *this)
if(setjmp(this->jerr.setjmp_buffer))
{
/* If we get here, the JPEG code has signaled an error. */
if(this) free(this); return NULL;
if(this) free(this);
return NULL;
}
jpeg_create_compress(&this->cinfo);
@@ -1231,7 +1255,8 @@ static tjhandle _tjInitDecompress(tjinstance *this)
if(setjmp(this->jerr.setjmp_buffer))
{
/* If we get here, the JPEG code has signaled an error. */
if(this) free(this); return NULL;
if(this) free(this);
return NULL;
}
jpeg_create_decompress(&this->dinfo);