Merge branch 'master' into dev
26
ChangeLog.md
@@ -388,7 +388,7 @@ detect actual security issues, should they arise in the future.
 
 1. Added AVX2 SIMD implementations of the colorspace conversion, chroma
 downsampling and upsampling, integer quantization and sample conversion, and
-slow integer DCT/IDCT algorithms. When using the slow integer DCT/IDCT
+accurate integer DCT/IDCT algorithms. When using the accurate integer DCT/IDCT
 algorithms on AVX2-equipped CPUs, the compression of RGB images is
 approximately 13-36% (avg. 22%) faster (relative to libjpeg-turbo 1.5.x) with
 64-bit code and 11-21% (avg. 17%) faster with 32-bit code, and the
@@ -498,10 +498,10 @@ libraries that link statically with libjpeg-turbo.
 
 13. Added Loongson MMI SIMD implementations of the RGB-to-YCbCr and
 YCbCr-to-RGB colorspace conversion, 4:2:0 chroma downsampling, 4:2:0 fancy
-chroma upsampling, integer quantization, and slow integer DCT/IDCT algorithms.
-When using the slow integer DCT/IDCT, this speeds up the compression of RGB
-images by approximately 70-100% and the decompression of RGB images by
-approximately 2-3.5x.
+chroma upsampling, integer quantization, and accurate integer DCT/IDCT
+algorithms. When using the accurate integer DCT/IDCT, this speeds up the
+compression of RGB images by approximately 70-100% and the decompression of RGB
+images by approximately 2-3.5x.
 
 14. Fixed a build error when building with older MinGW releases (regression
 caused by 1.5.1[7].)
@@ -833,8 +833,8 @@ benchmarking or regression testing, SIMD-accelerated Huffman encoding can be
 disabled by setting the `JSIMD_NOHUFFENC` environment variable to `1`.
 
 13. Added ARM 64-bit (ARMv8) NEON SIMD implementations of the commonly-used
-compression algorithms (including the slow integer forward DCT and h2v2 & h2v1
-downsampling algorithms, which are not accelerated in the 32-bit NEON
+compression algorithms (including the accurate integer forward DCT and h2v2 &
+h2v1 downsampling algorithms, which are not accelerated in the 32-bit NEON
 implementation.) This speeds up the compression of full-color JPEGs by about
 75% on average on a Cavium ThunderX processor and by about 2-2.5x on average on
 Cortex-A53 and Cortex-A57 cores.
@@ -965,8 +965,8 @@ platforms other than Windows or Linux. Oops.
 
 7. Fixed an extremely rare bug in the Huffman encoder that caused 64-bit
 builds of libjpeg-turbo to incorrectly encode a few specific test images when
-quality=98, an optimized Huffman table, and the slow integer forward DCT were
-used.
+quality=98, an optimized Huffman table, and the accurate integer forward DCT
+were used.
 
 8. The Windows (CMake) build system now supports building only static or only
 shared libraries. This is accomplished by adding either `-DENABLE_STATIC=0` or
@@ -1125,8 +1125,8 @@ floating point inverse DCT (using code borrowed from libjpeg v8a and later.)
 The accuracy of this implementation now matches the accuracy of the SSE/SSE2
 implementation. Note, however, that the floating point DCT/IDCT algorithms are
 mainly a legacy feature. They generally do not produce significantly better
-accuracy than the slow integer DCT/IDCT algorithms, and they are quite a bit
-slower.
+accuracy than the accurate integer DCT/IDCT algorithms, and they are quite a
+bit slower.
 
 8. Added a new output colorspace (`JCS_RGB565`) to the libjpeg API that allows
 for decompressing JPEG images into RGB565 (16-bit) pixels. If dithering is not
@@ -1536,8 +1536,8 @@ cases.
 
 2. Despite the above, the fast integer forward DCT still degrades somewhat for
 JPEG qualities greater than 95, so the TurboJPEG wrapper will now automatically
-use the slow integer forward DCT when generating JPEG images of quality 96 or
-greater. This reduces compression performance by as much as 15% for these
+use the accurate integer forward DCT when generating JPEG images of quality 96
+or greater. This reduces compression performance by as much as 15% for these
 high-quality images but is necessary to ensure that the images are perceptually
 lossless. It also ensures that the library can avoid the performance pitfall
 created by [1].
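The last ChangeLog entry above describes a quality threshold at which the TurboJPEG wrapper switches to the accurate integer forward DCT. A minimal sketch of that rule in C follows. The enum and the function `pick_forward_dct()` are hypothetical names written for illustration and are not the wrapper's actual code; only the threshold (quality 96 or greater) comes from the entry.

```c
/* Hypothetical stand-in for the library's DCT method choices. */
typedef enum { DCT_ACCURATE_INT, DCT_FAST_INT } dct_choice;

/*
 * Pick a forward DCT for a given JPEG quality (1-100), assuming the caller
 * prefers the fast method whenever the entry above does not rule it out.
 * Quality 96 or greater always gets the accurate integer method.
 */
static dct_choice pick_forward_dct(int quality)
{
  return (quality >= 96) ? DCT_ACCURATE_INT : DCT_FAST_INT;
}
```

Keeping the rule in one small function makes the threshold easy to audit if the cutoff is ever revisited.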
15
README.md
@@ -287,12 +287,13 @@ following reasons:
 (and slightly faster) floating point IDCT algorithm introduced in libjpeg
 v8a as opposed to the algorithm used in libjpeg v6b. It should be noted,
 however, that this algorithm basically brings the accuracy of the floating
-point IDCT in line with the accuracy of the slow integer IDCT. The floating
-point DCT/IDCT algorithms are mainly a legacy feature, and they do not
-produce significantly more accuracy than the slow integer algorithms (to put
-numbers on this, the typical difference in PNSR between the two algorithms
-is less than 0.10 dB, whereas changing the quality level by 1 in the upper
-range of the quality scale is typically more like a 1.0 dB difference.)
+point IDCT in line with the accuracy of the accurate integer IDCT. The
+floating point DCT/IDCT algorithms are mainly a legacy feature, and they do
+not produce significantly more accuracy than the accurate integer algorithms
+(to put numbers on this, the typical difference in PNSR between the two
+algorithms is less than 0.10 dB, whereas changing the quality level by 1 in
+the upper range of the quality scale is typically more like a 1.0 dB
+difference.)
 
 - If the floating point algorithms in libjpeg-turbo are not implemented using
 SIMD instructions on a particular platform, then the accuracy of the
@@ -340,7 +341,7 @@ The algorithm used by the SIMD-accelerated quantization function cannot produce
 correct results whenever the fast integer forward DCT is used along with a JPEG
 quality of 98-100. Thus, libjpeg-turbo must use the non-SIMD quantization
 function in those cases. This causes performance to drop by as much as 40%.
-It is therefore strongly advised that you use the slow integer forward DCT
+It is therefore strongly advised that you use the accurate integer forward DCT
 whenever encoding images with a JPEG quality of 98 or higher.
 
 
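The README text in this file's first hunk compares the algorithms by their difference in PSNR (spelled "PNSR" there), quoted in dB. As a reading aid, the standard definition for 8-bit samples and the mean-squared-error ratios implied by the two quoted figures are worked out below. The symbols are introduced here for illustration and do not appear in the README.

```latex
\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{255^2}{\mathrm{MSE}}\right)\ \text{dB}
\qquad\Longrightarrow\qquad
\frac{\mathrm{MSE}_1}{\mathrm{MSE}_2} = 10^{\Delta/10},
\quad \Delta = \mathrm{PSNR}_2 - \mathrm{PSNR}_1

\Delta = 0.10\ \text{dB}: \quad 10^{0.01} \approx 1.023
\qquad\qquad
\Delta = 1.0\ \text{dB}: \quad 10^{0.1} \approx 1.259
```

In other words, a 0.10 dB gap corresponds to roughly a 2% difference in mean squared error, while a 1.0 dB gap corresponds to roughly 26%.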
51
cjpeg.1
@@ -1,4 +1,4 @@
-.TH CJPEG 1 "18 December 2019"
+.TH CJPEG 1 "4 November 2020"
 .SH NAME
 cjpeg \- compress an image file to a JPEG file
 .SH SYNOPSIS
@@ -160,31 +160,40 @@ arithmetic coded JPEG is not yet widely implemented, so many decoders will be
 unable to view an arithmetic coded JPEG file at all.
 .TP
 .B \-dct int
-Use integer DCT method (default).
+Use accurate integer DCT method (default).
 .TP
 .B \-dct fast
-Use fast integer DCT (less accurate).
-In libjpeg-turbo, the fast method is generally about 5-15% faster than the int
-method when using the x86/x86-64 SIMD extensions (results may vary with other
-SIMD implementations, or when using libjpeg-turbo without SIMD extensions.)
+Use less accurate integer DCT method [legacy feature].
+When the Independent JPEG Group's software was first released in 1991, the
+compression time for a 1-megapixel JPEG image on a mainstream PC was measured
+in minutes. Thus, the \fBfast\fR integer DCT algorithm provided noticeable
+performance benefits. On modern CPUs running libjpeg-turbo, however, the
+compression time for a 1-megapixel JPEG image is measured in milliseconds, and
+thus the performance benefits of the \fBfast\fR algorithm are much less
+noticeable. On modern x86/x86-64 CPUs that support AVX2 instructions, the
+\fBfast\fR and \fBint\fR methods have similar performance. On other types of
+CPUs, the \fBfast\fR method is generally about 5-15% faster than the \fBint\fR
+method.
+
 For quality levels of 90 and below, there should be little or no perceptible
-difference between the two algorithms. For quality levels above 90, however,
-the difference between the fast and the int methods becomes more pronounced.
-With quality=97, for instance, the fast method incurs generally about a 1-3 dB
-loss (in PSNR) relative to the int method, but this can be larger for some
-images. Do not use the fast method with quality levels above 97. The
-algorithm often degenerates at quality=98 and above and can actually produce a
-more lossy image than if lower quality levels had been used. Also, in
-libjpeg-turbo, the fast method is not fully accelerated for quality levels
-above 97, so it will be slower than the int method.
+quality difference between the two algorithms. For quality levels above 90,
+however, the difference between the \fBfast\fR and \fBint\fR methods becomes
+more pronounced. With quality=97, for instance, the \fBfast\fR method incurs
+generally about a 1-3 dB loss in PSNR relative to the \fBint\fR method, but
+this can be larger for some images. Do not use the \fBfast\fR method with
+quality levels above 97. The algorithm often degenerates at quality=98 and
+above and can actually produce a more lossy image than if lower quality levels
+had been used. Also, in libjpeg-turbo, the \fBfast\fR method is not fully
+accelerated for quality levels above 97, so it will be slower than the
+\fBint\fR method.
 .TP
 .B \-dct float
-Use floating-point DCT method.
-The float method is mainly a legacy feature. It does not produce significantly
-more accurate results than the int method, and it is much slower. The float
-method may also give different results on different machines due to varying
-roundoff behavior, whereas the integer methods should give the same results on
-all machines.
+Use floating-point DCT method [legacy feature].
+The \fBfloat\fR method does not produce significantly more accurate results
+than the \fBint\fR method, and it is much slower. The \fBfloat\fR method may
+also give different results on different machines due to varying roundoff
+behavior, whereas the integer methods should give the same results on all
+machines.
 .TP
 .BI \-icc " file"
 Embed ICC color management profile contained in the specified file.
8
cjpeg.c
@@ -5,7 +5,7 @@
 * Copyright (C) 1991-1998, Thomas G. Lane.
 * Modified 2003-2011 by Guido Vollbeding.
 * libjpeg-turbo Modifications:
-* Copyright (C) 2010, 2013-2014, 2017, 2019, D. R. Commander.
+* Copyright (C) 2010, 2013-2014, 2017, 2019-2020, D. R. Commander.
 * For conditions of distribution and use, see the accompanying README.ijg
 * file.
 *
@@ -176,15 +176,15 @@ usage(void)
 fprintf(stderr, " -arithmetic Use arithmetic coding\n");
 #endif
 #ifdef DCT_ISLOW_SUPPORTED
-fprintf(stderr, " -dct int Use integer DCT method%s\n",
+fprintf(stderr, " -dct int Use accurate integer DCT method%s\n",
 (JDCT_DEFAULT == JDCT_ISLOW ? " (default)" : ""));
 #endif
 #ifdef DCT_IFAST_SUPPORTED
-fprintf(stderr, " -dct fast Use fast integer DCT (less accurate)%s\n",
+fprintf(stderr, " -dct fast Use less accurate integer DCT method [legacy feature]%s\n",
 (JDCT_DEFAULT == JDCT_IFAST ? " (default)" : ""));
 #endif
 #ifdef DCT_FLOAT_SUPPORTED
-fprintf(stderr, " -dct float Use floating-point DCT method%s\n",
+fprintf(stderr, " -dct float Use floating-point DCT method [legacy feature]%s\n",
 (JDCT_DEFAULT == JDCT_FLOAT ? " (default)" : ""));
 #endif
 fprintf(stderr, " -icc FILE Embed ICC profile contained in FILE\n");
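The usage text above lists three arguments to `-dct`: `int`, `fast`, and `float`. The sketch below shows one simple way such an argument could be mapped to a method. It is an illustration only: the enum `dct_method_arg` and the function `parse_dct_arg()` are hypothetical names, the real `J_DCT_METHOD` enum lives in jpeglib.h, and the exact string matching used here does not reproduce cjpeg's own option parsing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for J_DCT_METHOD, plus an "unknown" value. */
typedef enum {
  METHOD_ISLOW,    /* "-dct int": accurate integer method */
  METHOD_IFAST,    /* "-dct fast": less accurate integer method */
  METHOD_FLOAT,    /* "-dct float": floating-point method */
  METHOD_UNKNOWN   /* anything else */
} dct_method_arg;

/* Map the argument that follows "-dct" to a method. */
static dct_method_arg parse_dct_arg(const char *arg)
{
  if (arg == NULL)
    return METHOD_UNKNOWN;
  if (strcmp(arg, "int") == 0)
    return METHOD_ISLOW;
  if (strcmp(arg, "fast") == 0)
    return METHOD_IFAST;
  if (strcmp(arg, "float") == 0)
    return METHOD_FLOAT;
  return METHOD_UNKNOWN;
}
```

Returning an explicit "unknown" value lets the caller decide whether to print the usage text or fall back to the default method.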
60
djpeg.1
@@ -1,4 +1,4 @@
-.TH DJPEG 1 "18 December 2019"
+.TH DJPEG 1 "4 November 2020"
 .SH NAME
 djpeg \- decompress a JPEG file to an image file
 .SH SYNOPSIS
@@ -121,32 +121,40 @@ is specified; otherwise, 24-bit full-color format is emitted.
 Switches for advanced users:
 .TP
 .B \-dct int
-Use integer DCT method (default).
+Use accurate integer DCT method (default).
 .TP
 .B \-dct fast
-Use fast integer DCT (less accurate).
-In libjpeg-turbo, the fast method is generally about 5-15% faster than the int
-method when using the x86/x86-64 SIMD extensions (results may vary with other
-SIMD implementations, or when using libjpeg-turbo without SIMD extensions.) If
-the JPEG image was compressed using a quality level of 85 or below, then there
-should be little or no perceptible difference between the two algorithms. When
-decompressing images that were compressed using quality levels above 85,
-however, the difference between the fast and int methods becomes more
-pronounced. With images compressed using quality=97, for instance, the fast
-method incurs generally about a 4-6 dB loss (in PSNR) relative to the int
-method, but this can be larger for some images. If you can avoid it, do not
-use the fast method when decompressing images that were compressed using
-quality levels above 97. The algorithm often degenerates for such images and
-can actually produce a more lossy output image than if the JPEG image had been
-compressed using lower quality levels.
+Use less accurate integer DCT method [legacy feature].
+When the Independent JPEG Group's software was first released in 1991, the
+decompression time for a 1-megapixel JPEG image on a mainstream PC was measured
+in minutes. Thus, the \fBfast\fR integer DCT algorithm provided noticeable
+performance benefits. On modern CPUs running libjpeg-turbo, however, the
+decompression time for a 1-megapixel JPEG image is measured in milliseconds,
+and thus the performance benefits of the \fBfast\fR algorithm are much less
+noticeable. On modern x86/x86-64 CPUs that support AVX2 instructions, the
+\fBfast\fR and \fBint\fR methods have similar performance. On other types of
+CPUs, the \fBfast\fR method is generally about 5-15% faster than the \fBint\fR
+method.
+
+If the JPEG image was compressed using a quality level of 85 or below, then
+there should be little or no perceptible quality difference between the two
+algorithms. When decompressing images that were compressed using quality
+levels above 85, however, the difference between the \fBfast\fR and \fBint\fR
+methods becomes more pronounced. With images compressed using quality=97, for
+instance, the \fBfast\fR method incurs generally about a 4-6 dB loss in PSNR
+relative to the \fBint\fR method, but this can be larger for some images. If
+you can avoid it, do not use the \fBfast\fR method when decompressing images
+that were compressed using quality levels above 97. The algorithm often
+degenerates for such images and can actually produce a more lossy output image
+than if the JPEG image had been compressed using lower quality levels.
 .TP
 .B \-dct float
-Use floating-point DCT method.
-The float method is mainly a legacy feature. It does not produce significantly
-more accurate results than the int method, and it is much slower. The float
-method may also give different results on different machines due to varying
-roundoff behavior, whereas the integer methods should give the same results on
-all machines.
+Use floating-point DCT method [legacy feature].
+The \fBfloat\fR method does not produce significantly more accurate results
+than the \fBint\fR method, and it is much slower. The \fBfloat\fR method may
+also give different results on different machines due to varying roundoff
+behavior, whereas the integer methods should give the same results on all
+machines.
 .TP
 .B \-dither fs
 Use Floyd-Steinberg dithering in color quantization.
@@ -282,12 +290,6 @@ is fast but much lower quality than the default behavior.
 .B \-dither none
 may give acceptable results in two-pass mode, but is seldom tolerable in
 one-pass mode.
-.PP
-If you are fortunate enough to have very fast floating point hardware,
-\fB\-dct float\fR may be even faster than \fB\-dct fast\fR. But on most
-machines \fB\-dct float\fR is slower than \fB\-dct int\fR; in this case it is
-not worth using, because its theoretical accuracy advantage is too small to be
-significant in practice.
 .SH ENVIRONMENT
 .TP
 .B JPEGMEM
6
djpeg.c
@@ -149,15 +149,15 @@ usage(void)
 #endif
 fprintf(stderr, "Switches for advanced users:\n");
 #ifdef DCT_ISLOW_SUPPORTED
-fprintf(stderr, " -dct int Use integer DCT method%s\n",
+fprintf(stderr, " -dct int Use accurate integer DCT method%s\n",
 (JDCT_DEFAULT == JDCT_ISLOW ? " (default)" : ""));
 #endif
 #ifdef DCT_IFAST_SUPPORTED
-fprintf(stderr, " -dct fast Use fast integer DCT (less accurate)%s\n",
+fprintf(stderr, " -dct fast Use less accurate integer DCT method [legacy feature]%s\n",
 (JDCT_DEFAULT == JDCT_IFAST ? " (default)" : ""));
 #endif
 #ifdef DCT_FLOAT_SUPPORTED
-fprintf(stderr, " -dct float Use floating-point DCT method%s\n",
+fprintf(stderr, " -dct float Use floating-point DCT method [legacy feature]%s\n",
 (JDCT_DEFAULT == JDCT_FLOAT ? " (default)" : ""));
 #endif
 fprintf(stderr, " -dither fs Use F-S dithering (default)\n");
@@ -4,11 +4,11 @@
 * This file was part of the Independent JPEG Group's software:
 * Copyright (C) 1991-1996, Thomas G. Lane.
 * libjpeg-turbo Modifications:
-* Copyright (C) 2015, D. R. Commander.
+* Copyright (C) 2015, 2020, D. R. Commander.
 * For conditions of distribution and use, see the accompanying README.ijg
 * file.
 *
-* This file contains a slow-but-accurate integer implementation of the
+* This file contains a slower but more accurate integer implementation of the
 * forward DCT (Discrete Cosine Transform).
 *
 * A 2-D DCT can be done by 1-D DCT on each row followed by 1-D DCT
@@ -5,11 +5,11 @@
 * Copyright (C) 1991-1998, Thomas G. Lane.
 * Modification developed 2002-2018 by Guido Vollbeding.
 * libjpeg-turbo Modifications:
-* Copyright (C) 2015, D. R. Commander.
+* Copyright (C) 2015, 2020, D. R. Commander.
 * For conditions of distribution and use, see the accompanying README.ijg
 * file.
 *
-* This file contains a slow-but-accurate integer implementation of the
+* This file contains a slower but more accurate integer implementation of the
 * inverse DCT (Discrete Cosine Transform). In the IJG code, this routine
 * must also perform dequantization of the input coefficients.
 *
@@ -5,7 +5,7 @@
 * Copyright (C) 1991-1997, Thomas G. Lane.
 * Modified 1997-2009 by Guido Vollbeding.
 * libjpeg-turbo Modifications:
-* Copyright (C) 2009, 2011, 2014-2015, 2018, D. R. Commander.
+* Copyright (C) 2009, 2011, 2014-2015, 2018, 2020, D. R. Commander.
 * For conditions of distribution and use, see the accompanying README.ijg
 * file.
 *
@@ -238,9 +238,9 @@ typedef int boolean;
 
 /* Capability options common to encoder and decoder: */
 
-#define DCT_ISLOW_SUPPORTED /* slow but accurate integer algorithm */
-#define DCT_IFAST_SUPPORTED /* faster, less accurate integer method */
-#define DCT_FLOAT_SUPPORTED /* floating-point: accurate, fast on fast HW */
+#define DCT_ISLOW_SUPPORTED /* accurate integer method */
+#define DCT_IFAST_SUPPORTED /* less accurate int method [legacy feature] */
+#define DCT_FLOAT_SUPPORTED /* floating-point method [legacy feature] */
 
 /* Encoder capability options: */
 
@@ -5,7 +5,7 @@
 * Copyright (C) 1991-1998, Thomas G. Lane.
 * Modified 2002-2009 by Guido Vollbeding.
 * libjpeg-turbo Modifications:
-* Copyright (C) 2009-2011, 2013-2014, 2016-2017, D. R. Commander.
+* Copyright (C) 2009-2011, 2013-2014, 2016-2017, 2020, D. R. Commander.
 * Copyright (C) 2015, Google, Inc.
 * For conditions of distribution and use, see the accompanying README.ijg
 * file.
@@ -244,9 +244,9 @@ typedef enum {
 /* DCT/IDCT algorithm options. */
 
 typedef enum {
-JDCT_ISLOW, /* slow but accurate integer algorithm */
-JDCT_IFAST, /* faster, less accurate integer method */
-JDCT_FLOAT /* floating-point: accurate, fast on fast HW */
+JDCT_ISLOW, /* accurate integer method */
+JDCT_IFAST, /* less accurate integer method [legacy feature] */
+JDCT_FLOAT /* floating-point method [legacy feature] */
 } J_DCT_METHOD;
 
 #ifndef JDCT_DEFAULT /* may be overridden in jconfig.h */
82
libjpeg.txt
@@ -969,30 +969,38 @@ boolean arith_code
 
 J_DCT_METHOD dct_method
 Selects the algorithm used for the DCT step. Choices are:
-JDCT_ISLOW: slow but accurate integer algorithm
-JDCT_IFAST: faster, less accurate integer method
-JDCT_FLOAT: floating-point method
+JDCT_ISLOW: accurate integer method
+JDCT_IFAST: less accurate integer method [legacy feature]
+JDCT_FLOAT: floating-point method [legacy feature]
 JDCT_DEFAULT: default method (normally JDCT_ISLOW)
 JDCT_FASTEST: fastest method (normally JDCT_IFAST)
-In libjpeg-turbo, JDCT_IFAST is generally about 5-15% faster than
-JDCT_ISLOW when using the x86/x86-64 SIMD extensions (results may vary
-with other SIMD implementations, or when using libjpeg-turbo without
-SIMD extensions.) For quality levels of 90 and below, there should be
-little or no perceptible difference between the two algorithms. For
-quality levels above 90, however, the difference between JDCT_IFAST and
+When the Independent JPEG Group's software was first released in 1991,
+the compression time for a 1-megapixel JPEG image on a mainstream PC
+was measured in minutes. Thus, JDCT_IFAST provided noticeable
+performance benefits. On modern CPUs running libjpeg-turbo, however,
+the compression time for a 1-megapixel JPEG image is measured in
+milliseconds, and thus the performance benefits of JDCT_IFAST are much
+less noticeable. On modern x86/x86-64 CPUs that support AVX2
+instructions, JDCT_IFAST and JDCT_ISLOW have similar performance. On
+other types of CPUs, JDCT_IFAST is generally about 5-15% faster than
+JDCT_ISLOW.
+
+For quality levels of 90 and below, there should be little or no
+perceptible quality difference between the two algorithms. For quality
+levels above 90, however, the difference between JDCT_IFAST and
 JDCT_ISLOW becomes more pronounced. With quality=97, for instance,
-JDCT_IFAST incurs generally about a 1-3 dB loss (in PSNR) relative to
+JDCT_IFAST incurs generally about a 1-3 dB loss in PSNR relative to
 JDCT_ISLOW, but this can be larger for some images. Do not use
 JDCT_IFAST with quality levels above 97. The algorithm often
 degenerates at quality=98 and above and can actually produce a more
 lossy image than if lower quality levels had been used. Also, in
 libjpeg-turbo, JDCT_IFAST is not fully accelerated for quality levels
-above 97, so it will be slower than JDCT_ISLOW. JDCT_FLOAT is mainly a
-legacy feature. It does not produce significantly more accurate
-results than the ISLOW method, and it is much slower. The FLOAT method
-may also give different results on different machines due to varying
-roundoff behavior, whereas the integer methods should give the same
-results on all machines.
+above 97, so it will be slower than JDCT_ISLOW.
+
+JDCT_FLOAT does not produce significantly more accurate results than
+JDCT_ISLOW, and it is much slower. JDCT_FLOAT may also give different
+results on different machines due to varying roundoff behavior, whereas
+the integer methods should give the same results on all machines.
 
 J_COLOR_SPACE jpeg_color_space
 int num_components
@@ -1270,31 +1278,39 @@ Additional decompression parameters that the application may set include:
 
 J_DCT_METHOD dct_method
 Selects the algorithm used for the DCT step. Choices are:
-JDCT_ISLOW: slow but accurate integer algorithm
-JDCT_IFAST: faster, less accurate integer method
-JDCT_FLOAT: floating-point method
+JDCT_ISLOW: accurate integer method
+JDCT_IFAST: less accurate integer method [legacy feature]
+JDCT_FLOAT: floating-point method [legacy feature]
 JDCT_DEFAULT: default method (normally JDCT_ISLOW)
 JDCT_FASTEST: fastest method (normally JDCT_IFAST)
-In libjpeg-turbo, JDCT_IFAST is generally about 5-15% faster than
-JDCT_ISLOW when using the x86/x86-64 SIMD extensions (results may vary
-with other SIMD implementations, or when using libjpeg-turbo without
-SIMD extensions.) If the JPEG image was compressed using a quality
-level of 85 or below, then there should be little or no perceptible
-difference between the two algorithms. When decompressing images that
-were compressed using quality levels above 85, however, the difference
+When the Independent JPEG Group's software was first released in 1991,
+the decompression time for a 1-megapixel JPEG image on a mainstream PC
+was measured in minutes. Thus, JDCT_IFAST provided noticeable
+performance benefits. On modern CPUs running libjpeg-turbo, however,
+the decompression time for a 1-megapixel JPEG image is measured in
+milliseconds, and thus the performance benefits of JDCT_IFAST are much
+less noticeable. On modern x86/x86-64 CPUs that support AVX2
+instructions, JDCT_IFAST and JDCT_ISLOW have similar performance. On
+other types of CPUs, JDCT_IFAST is generally about 5-15% faster than
+JDCT_ISLOW.
+
+If the JPEG image was compressed using a quality level of 85 or below,
+then there should be little or no perceptible quality difference
+between the two algorithms. When decompressing images that were
+compressed using quality levels above 85, however, the difference
 between JDCT_IFAST and JDCT_ISLOW becomes more pronounced. With images
 compressed using quality=97, for instance, JDCT_IFAST incurs generally
-about a 4-6 dB loss (in PSNR) relative to JDCT_ISLOW, but this can be
+about a 4-6 dB loss in PSNR relative to JDCT_ISLOW, but this can be
 larger for some images. If you can avoid it, do not use JDCT_IFAST
 when decompressing images that were compressed using quality levels
 above 97. The algorithm often degenerates for such images and can
 actually produce a more lossy output image than if the JPEG image had
-been compressed using lower quality levels. JDCT_FLOAT is mainly a
-legacy feature. It does not produce significantly more accurate
-results than the ISLOW method, and it is much slower. The FLOAT method
-may also give different results on different machines due to varying
-roundoff behavior, whereas the integer methods should give the same
-results on all machines.
+been compressed using lower quality levels.
+
+JDCT_FLOAT does not produce significantly more accurate results than
+JDCT_ISLOW, and it is much slower. JDCT_FLOAT may also give different
+results on different machines due to varying roundoff behavior, whereas
+the integer methods should give the same results on all machines.
 
 boolean do_fancy_upsampling
 If TRUE, do careful upsampling of chroma components. If FALSE,
@@ -6,7 +6,7 @@
 * Author: Siarhei Siamashka <siarhei.siamashka@nokia.com>
 * Copyright (C) 2013-2014, Linaro Limited. All Rights Reserved.
 * Author: Ragesh Radhakrishnan <ragesh.r@linaro.org>
-* Copyright (C) 2014-2016, D. R. Commander. All Rights Reserved.
+* Copyright (C) 2014-2016, 2020, D. R. Commander. All Rights Reserved.
 * Copyright (C) 2015-2016, 2018, Matthieu Darbois. All Rights Reserved.
 * Copyright (C) 2016, Siarhei Siamashka. All Rights Reserved.
 *
@@ -2373,7 +2373,7 @@ asm_function jsimd_convsamp_neon
 /*
 * jsimd_fdct_islow_neon
 *
-* This file contains a slow-but-accurate integer implementation of the
+* This file contains a slower but more accurate integer implementation of the
 * forward DCT (Discrete Cosine Transform). The following code is based
 * directly on the IJG''s original jfdctint.c; see the jfdctint.c for
 * more details.
@@ -2,7 +2,7 @@
 ; jfdctint.asm - accurate integer FDCT (AVX2)
 ;
 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
-; Copyright (C) 2009, 2016, 2018, D. R. Commander.
+; Copyright (C) 2009, 2016, 2018, 2020, D. R. Commander.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -14,7 +14,7 @@
 ; NASM is available from http://nasm.sourceforge.net/ or
 ; http://sourceforge.net/project/showfiles.php?group_id=6208
 ;
-; This file contains a slow-but-accurate integer implementation of the
+; This file contains a slower but more accurate integer implementation of the
 ; forward DCT (Discrete Cosine Transform). The following code is based
 ; directly on the IJG's original jfdctint.c; see the jfdctint.c for
 ; more details.
@@ -103,7 +103,7 @@ F_3_072 equ DESCALE(3299298341, 30 - CONST_BITS) ; FIX(3.072711026)
 %endmacro
 
 ; --------------------------------------------------------------------------
-; In-place 8x8x16-bit slow integer forward DCT using AVX2 instructions
+; In-place 8x8x16-bit accurate integer forward DCT using AVX2 instructions
 ; %1-%4: Input/output registers
 ; %5-%8: Temp registers
 ; %9: Pass (1 or 2)
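The hunk header above quotes the line `F_3_072 equ DESCALE(3299298341, 30 - CONST_BITS) ; FIX(3.072711026)`, which stores a fixed-point constant at 30 fractional bits and then reduces it to `CONST_BITS` fractional bits. The C sketch below works through that arithmetic. Two things are assumptions, because neither definition appears in this diff: that `DESCALE(x, n)` rounds to nearest by adding 2^(n-1) before shifting right by n, and that `CONST_BITS` is 13. The helper names `descale()` and `f_3_072()` are hypothetical.

```c
#include <stdint.h>

#define CONST_BITS 13  /* assumed value; not shown in this diff */

/* Round-to-nearest right shift: add half of 2^n, then shift right by n. */
static int64_t descale(int64_t x, int n)
{
  return (x + ((int64_t)1 << (n - 1))) >> n;
}

/*
 * 3299298341 is the constant from the hunk header, i.e. roughly
 * 3.072711026 scaled by 2^30.  It needs more than 32 signed bits, hence
 * int64_t.  Under the assumptions above the result is 25172, which is also
 * 3.072711026 * 2^13 rounded to the nearest integer.
 */
static int64_t f_3_072(void)
{
  return descale(INT64_C(3299298341), 30 - CONST_BITS);
}
```

Storing the constant at higher precision and descaling it at assembly time keeps one source value usable for different `CONST_BITS` settings.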
@@ -2,7 +2,7 @@
 ; jfdctint.asm - accurate integer FDCT (MMX)
 ;
 ; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
-; Copyright (C) 2016, D. R. Commander.
+; Copyright (C) 2016, 2020, D. R. Commander.
 ;
 ; Based on the x86 SIMD extension for IJG JPEG library
 ; Copyright (C) 1999-2006, MIYASAKA Masaru.
@@ -14,7 +14,7 @@
 ; NASM is available from http://nasm.sourceforge.net/ or
 ; http://sourceforge.net/project/showfiles.php?group_id=6208
 ;
-; This file contains a slow-but-accurate integer implementation of the
+; This file contains a slower but more accurate integer implementation of the
 ; forward DCT (Discrete Cosine Transform). The following code is based
 ; directly on the IJG's original jfdctint.c; see the jfdctint.c for
 ; more details.
@@ -2,7 +2,7 @@
|
||||
; jfdctint.asm - accurate integer FDCT (SSE2)
|
||||
;
|
||||
; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
|
||||
; Copyright (C) 2016, D. R. Commander.
|
||||
; Copyright (C) 2016, 2020, D. R. Commander.
|
||||
;
|
||||
; Based on the x86 SIMD extension for IJG JPEG library
|
||||
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||
@@ -14,7 +14,7 @@
|
||||
; NASM is available from http://nasm.sourceforge.net/ or
|
||||
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||
;
|
||||
; This file contains a slow-but-accurate integer implementation of the
|
||||
; This file contains a slower but more accurate integer implementation of the
|
||||
; forward DCT (Discrete Cosine Transform). The following code is based
|
||||
; directly on the IJG's original jfdctint.c; see the jfdctint.c for
|
||||
; more details.
|
||||
|
||||
@@ -2,7 +2,7 @@
|
||||
; jidctint.asm - accurate integer IDCT (AVX2)
|
||||
;
|
||||
; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
|
||||
; Copyright (C) 2009, 2016, 2018, D. R. Commander.
|
||||
; Copyright (C) 2009, 2016, 2018, 2020, D. R. Commander.
|
||||
;
|
||||
; Based on the x86 SIMD extension for IJG JPEG library
|
||||
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||
@@ -14,7 +14,7 @@
|
||||
; NASM is available from http://nasm.sourceforge.net/ or
|
||||
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||
;
|
||||
; This file contains a slow-but-accurate integer implementation of the
|
||||
; This file contains a slower but more accurate integer implementation of the
|
||||
; inverse DCT (Discrete Cosine Transform). The following code is based
|
||||
; directly on the IJG's original jidctint.c; see the jidctint.c for
|
||||
; more details.
|
||||
@@ -113,7 +113,7 @@ F_3_072 equ DESCALE(3299298341, 30 - CONST_BITS) ; FIX(3.072711026)
|
||||
%endmacro
|
||||
|
||||
; --------------------------------------------------------------------------
|
||||
; In-place 8x8x16-bit slow integer inverse DCT using AVX2 instructions
|
||||
; In-place 8x8x16-bit accurate integer inverse DCT using AVX2 instructions
|
||||
; %1-%4: Input/output registers
|
||||
; %5-%12: Temp registers
|
||||
; %9: Pass (1 or 2)
|
||||
|
||||
@@ -2,7 +2,7 @@
|
||||
; jidctint.asm - accurate integer IDCT (MMX)
|
||||
;
|
||||
; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
|
||||
; Copyright (C) 2016, D. R. Commander.
|
||||
; Copyright (C) 2016, 2020, D. R. Commander.
|
||||
;
|
||||
; Based on the x86 SIMD extension for IJG JPEG library
|
||||
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||
@@ -14,7 +14,7 @@
|
||||
; NASM is available from http://nasm.sourceforge.net/ or
|
||||
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||
;
|
||||
; This file contains a slow-but-accurate integer implementation of the
|
||||
; This file contains a slower but more accurate integer implementation of the
|
||||
; inverse DCT (Discrete Cosine Transform). The following code is based
|
||||
; directly on the IJG's original jidctint.c; see the jidctint.c for
|
||||
; more details.
|
||||
|
||||
@@ -2,7 +2,7 @@
|
||||
; jidctint.asm - accurate integer IDCT (SSE2)
|
||||
;
|
||||
; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
|
||||
; Copyright (C) 2016, D. R. Commander.
|
||||
; Copyright (C) 2016, 2020, D. R. Commander.
|
||||
;
|
||||
; Based on the x86 SIMD extension for IJG JPEG library
|
||||
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||
@@ -14,7 +14,7 @@
|
||||
; NASM is available from http://nasm.sourceforge.net/ or
|
||||
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||
;
|
||||
; This file contains a slow-but-accurate integer implementation of the
|
||||
; This file contains a slower but more accurate integer implementation of the
|
||||
; inverse DCT (Discrete Cosine Transform). The following code is based
|
||||
; directly on the IJG's original jidctint.c; see the jidctint.c for
|
||||
; more details.
|
||||
|
||||
@@ -2,7 +2,7 @@
|
||||
* simd/jsimd.h
|
||||
*
|
||||
* Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
|
||||
* Copyright (C) 2011, 2014-2016, 2018, D. R. Commander.
|
||||
* Copyright (C) 2011, 2014-2016, 2018, 2020, D. R. Commander.
|
||||
* Copyright (C) 2013-2014, MIPS Technologies, Inc., California.
|
||||
* Copyright (C) 2014, Linaro Limited.
|
||||
* Copyright (C) 2015-2016, 2018, Matthieu Darbois.
|
||||
@@ -951,7 +951,7 @@ EXTERN(void) jsimd_convsamp_float_sse2
|
||||
EXTERN(void) jsimd_convsamp_float_dspr2
|
||||
(JSAMPARRAY sample_data, JDIMENSION start_col, FAST_FLOAT *workspace);
|
||||
|
||||
/* Slow Integer Forward DCT */
|
||||
/* Accurate Integer Forward DCT */
|
||||
EXTERN(void) jsimd_fdct_islow_mmx(DCTELEM *data);
|
||||
|
||||
extern const int jconst_fdct_islow_sse2[];
|
||||
@@ -1060,7 +1060,7 @@ EXTERN(void) jsimd_idct_12x12_pass1_dspr2
|
||||
EXTERN(void) jsimd_idct_12x12_pass2_dspr2
|
||||
(int *workspace, int *output);
|
||||
|
||||
/* Slow Integer Inverse DCT */
|
||||
/* Accurate Integer Inverse DCT */
|
||||
EXTERN(void) jsimd_idct_islow_mmx
|
||||
(void *dct_table, JCOEFPTR coef_block, JSAMPARRAY output_buf,
|
||||
JDIMENSION output_col);
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
/*
|
||||
* Loongson MMI optimizations for libjpeg-turbo
|
||||
*
|
||||
* Copyright (C) 2014, 2018, D. R. Commander. All Rights Reserved.
|
||||
* Copyright (C) 2014, 2018, 2020, D. R. Commander. All Rights Reserved.
|
||||
* Copyright (C) 2016-2017, Loongson Technology Corporation Limited, BeiJing.
|
||||
* All Rights Reserved.
|
||||
* Authors: ZhuChen <zhuchen@loongson.cn>
|
||||
@@ -28,7 +28,7 @@
|
||||
* 3. This notice may not be removed or altered from any source distribution.
|
||||
*/
|
||||
|
||||
/* SLOW INTEGER FORWARD DCT */
|
||||
/* ACCURATE INTEGER FORWARD DCT */
|
||||
|
||||
#include "jsimd_mmi.h"
|
||||
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
/*
|
||||
* Loongson MMI optimizations for libjpeg-turbo
|
||||
*
|
||||
* Copyright (C) 2014-2015, 2018, D. R. Commander. All Rights Reserved.
|
||||
* Copyright (C) 2014-2015, 2018, 2020, D. R. Commander. All Rights Reserved.
|
||||
* Copyright (C) 2016-2017, Loongson Technology Corporation Limited, BeiJing.
|
||||
* All Rights Reserved.
|
||||
* Authors: ZhuChen <zhuchen@loongson.cn>
|
||||
@@ -28,7 +28,7 @@
|
||||
* 3. This notice may not be removed or altered from any source distribution.
|
||||
*/
|
||||
|
||||
/* SLOW INTEGER INVERSE DCT */
|
||||
/* ACCURATE INTEGER INVERSE DCT */
|
||||
|
||||
#include "jsimd_mmi.h"
|
||||
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
/*
|
||||
* AltiVec optimizations for libjpeg-turbo
|
||||
*
|
||||
* Copyright (C) 2014, D. R. Commander. All Rights Reserved.
|
||||
* Copyright (C) 2014, 2020, D. R. Commander. All Rights Reserved.
|
||||
*
|
||||
* This software is provided 'as-is', without any express or implied
|
||||
* warranty. In no event will the authors be held liable for any damages
|
||||
@@ -20,7 +20,7 @@
|
||||
* 3. This notice may not be removed or altered from any source distribution.
|
||||
*/
|
||||
|
||||
/* SLOW INTEGER FORWARD DCT */
|
||||
/* ACCURATE INTEGER FORWARD DCT */
|
||||
|
||||
#include "jsimd_altivec.h"
|
||||
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
/*
|
||||
* AltiVec optimizations for libjpeg-turbo
|
||||
*
|
||||
* Copyright (C) 2014-2015, D. R. Commander. All Rights Reserved.
|
||||
* Copyright (C) 2014-2015, 2020, D. R. Commander. All Rights Reserved.
|
||||
*
|
||||
* This software is provided 'as-is', without any express or implied
|
||||
* warranty. In no event will the authors be held liable for any damages
|
||||
@@ -20,7 +20,7 @@
|
||||
* 3. This notice may not be removed or altered from any source distribution.
|
||||
*/
|
||||
|
||||
/* SLOW INTEGER INVERSE DCT */
|
||||
/* ACCURATE INTEGER INVERSE DCT */
|
||||
|
||||
#include "jsimd_altivec.h"
|
||||
|
||||
|
||||
@@ -2,7 +2,7 @@
|
||||
; jfdctint.asm - accurate integer FDCT (64-bit AVX2)
|
||||
;
|
||||
; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
|
||||
; Copyright (C) 2009, 2016, 2018, D. R. Commander.
|
||||
; Copyright (C) 2009, 2016, 2018, 2020, D. R. Commander.
|
||||
;
|
||||
; Based on the x86 SIMD extension for IJG JPEG library
|
||||
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||
@@ -14,7 +14,7 @@
|
||||
; NASM is available from http://nasm.sourceforge.net/ or
|
||||
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||
;
|
||||
; This file contains a slow-but-accurate integer implementation of the
|
||||
; This file contains a slower but more accurate integer implementation of the
|
||||
; forward DCT (Discrete Cosine Transform). The following code is based
|
||||
; directly on the IJG's original jfdctint.c; see the jfdctint.c for
|
||||
; more details.
|
||||
@@ -103,7 +103,7 @@ F_3_072 equ DESCALE(3299298341, 30 - CONST_BITS) ; FIX(3.072711026)
|
||||
%endmacro
|
||||
|
||||
; --------------------------------------------------------------------------
|
||||
; In-place 8x8x16-bit slow integer forward DCT using AVX2 instructions
|
||||
; In-place 8x8x16-bit accurate integer forward DCT using AVX2 instructions
|
||||
; %1-%4: Input/output registers
|
||||
; %5-%8: Temp registers
|
||||
; %9: Pass (1 or 2)
|
||||
|
||||
@@ -2,7 +2,7 @@
|
||||
; jfdctint.asm - accurate integer FDCT (64-bit SSE2)
|
||||
;
|
||||
; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
|
||||
; Copyright (C) 2009, 2016, D. R. Commander.
|
||||
; Copyright (C) 2009, 2016, 2020, D. R. Commander.
|
||||
;
|
||||
; Based on the x86 SIMD extension for IJG JPEG library
|
||||
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||
@@ -14,7 +14,7 @@
|
||||
; NASM is available from http://nasm.sourceforge.net/ or
|
||||
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||
;
|
||||
; This file contains a slow-but-accurate integer implementation of the
|
||||
; This file contains a slower but more accurate integer implementation of the
|
||||
; forward DCT (Discrete Cosine Transform). The following code is based
|
||||
; directly on the IJG's original jfdctint.c; see the jfdctint.c for
|
||||
; more details.
|
||||
|
||||
@@ -2,7 +2,7 @@
|
||||
; jidctint.asm - accurate integer IDCT (64-bit AVX2)
|
||||
;
|
||||
; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
|
||||
; Copyright (C) 2009, 2016, 2018, D. R. Commander.
|
||||
; Copyright (C) 2009, 2016, 2018, 2020, D. R. Commander.
|
||||
; Copyright (C) 2018, Matthias Räncker.
|
||||
;
|
||||
; Based on the x86 SIMD extension for IJG JPEG library
|
||||
@@ -15,7 +15,7 @@
|
||||
; NASM is available from http://nasm.sourceforge.net/ or
|
||||
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||
;
|
||||
; This file contains a slow-but-accurate integer implementation of the
|
||||
; This file contains a slower but more accurate integer implementation of the
|
||||
; inverse DCT (Discrete Cosine Transform). The following code is based
|
||||
; directly on the IJG's original jidctint.c; see the jidctint.c for
|
||||
; more details.
|
||||
@@ -114,7 +114,7 @@ F_3_072 equ DESCALE(3299298341, 30 - CONST_BITS) ; FIX(3.072711026)
|
||||
%endmacro
|
||||
|
||||
; --------------------------------------------------------------------------
|
||||
; In-place 8x8x16-bit slow integer inverse DCT using AVX2 instructions
|
||||
; In-place 8x8x16-bit accurate integer inverse DCT using AVX2 instructions
|
||||
; %1-%4: Input/output registers
|
||||
; %5-%12: Temp registers
|
||||
; %9: Pass (1 or 2)
|
||||
|
||||
@@ -2,7 +2,7 @@
|
||||
; jidctint.asm - accurate integer IDCT (64-bit SSE2)
|
||||
;
|
||||
; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
|
||||
; Copyright (C) 2009, 2016, D. R. Commander.
|
||||
; Copyright (C) 2009, 2016, 2020, D. R. Commander.
|
||||
; Copyright (C) 2018, Matthias Räncker.
|
||||
;
|
||||
; Based on the x86 SIMD extension for IJG JPEG library
|
||||
@@ -15,7 +15,7 @@
|
||||
; NASM is available from http://nasm.sourceforge.net/ or
|
||||
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||
;
|
||||
; This file contains a slow-but-accurate integer implementation of the
|
||||
; This file contains a slower but more accurate integer implementation of the
|
||||
; inverse DCT (Discrete Cosine Transform). The following code is based
|
||||
; directly on the IJG's original jidctint.c; see the jidctint.c for
|
||||
; more details.
|
||||
|
||||
135
usage.txt
@@ -168,35 +168,43 @@ Switches for advanced users:
|
||||
be unable to view an arithmetic coded JPEG file at
|
||||
all.
|
||||
|
||||
-dct int Use integer DCT method (default).
|
||||
-dct fast Use fast integer DCT (less accurate).
|
||||
In libjpeg-turbo, the fast method is generally about
|
||||
5-15% faster than the int method when using the
|
||||
x86/x86-64 SIMD extensions (results may vary with other
|
||||
SIMD implementations, or when using libjpeg-turbo
|
||||
without SIMD extensions.) For quality levels of 90 and
|
||||
below, there should be little or no perceptible
|
||||
difference between the two algorithms. For quality
|
||||
levels above 90, however, the difference between
|
||||
the fast and the int methods becomes more pronounced.
|
||||
With quality=97, for instance, the fast method incurs
|
||||
generally about a 1-3 dB loss (in PSNR) relative to
|
||||
the int method, but this can be larger for some images.
|
||||
Do not use the fast method with quality levels above
|
||||
97. The algorithm often degenerates at quality=98 and
|
||||
above and can actually produce a more lossy image than
|
||||
if lower quality levels had been used. Also, in
|
||||
libjpeg-turbo, the fast method is not fully accerated
|
||||
for quality levels above 97, so it will be slower than
|
||||
the int method.
|
||||
-dct float Use floating-point DCT method.
|
||||
The float method is mainly a legacy feature. It does
|
||||
not produce significantly more accurate results than
|
||||
the int method, and it is much slower. The float
|
||||
method may also give different results on different
|
||||
machines due to varying roundoff behavior, whereas the
|
||||
integer methods should give the same results on all
|
||||
machines.
|
||||
-dct int Use accurate integer DCT method (default).
|
||||
-dct fast Use less accurate integer DCT method [legacy feature].
|
||||
When the Independent JPEG Group's software was first
|
||||
released in 1991, the compression time for a
|
||||
1-megapixel JPEG image on a mainstream PC was measured
|
||||
in minutes. Thus, the fast integer DCT algorithm
|
||||
provided noticeable performance benefits. On modern
|
||||
CPUs running libjpeg-turbo, however, the compression
|
||||
time for a 1-megapixel JPEG image is measured in
|
||||
milliseconds, and thus the performance benefits of the
|
||||
fast algorithm are much less noticeable. On modern
|
||||
x86/x86-64 CPUs that support AVX2 instructions, the
|
||||
fast and int methods have similar performance. On
|
||||
other types of CPUs, the fast method is generally about
|
||||
5-15% faster than the int method.
|
||||
|
||||
For quality levels of 90 and below, there should be
|
||||
little or no perceptible quality difference between the
|
||||
two algorithms. For quality levels above 90, however,
|
||||
the difference between the fast and int methods becomes
|
||||
more pronounced. With quality=97, for instance, the
|
||||
fast method incurs generally about a 1-3 dB loss in
|
||||
PSNR relative to the int method, but this can be larger
|
||||
for some images. Do not use the fast method with
|
||||
quality levels above 97. The algorithm often
|
||||
degenerates at quality=98 and above and can actually
|
||||
produce a more lossy image than if lower quality levels
|
||||
had been used. Also, in libjpeg-turbo, the fast method
|
||||
is not fully accelerated for quality levels above 97,
|
||||
so it will be slower than the int method.
|
||||
-dct float Use floating-point DCT method [legacy feature].
|
||||
The float method does not produce significantly more
|
||||
accurate results than the int method, and it is much
|
||||
slower. The float method may also give different
|
||||
results on different machines due to varying roundoff
|
||||
behavior, whereas the integer methods should give the
|
||||
same results on all machines.
|
||||
|
||||
-restart N Emit a JPEG restart marker every N MCU rows, or every
|
||||
N MCU blocks if "B" is attached to the number.
|
||||
@@ -318,36 +326,45 @@ The basic command line switches for djpeg are:
|
||||
|
||||
Switches for advanced users:
|
||||
|
||||
-dct int Use integer DCT method (default).
|
||||
-dct fast Use fast integer DCT (less accurate).
|
||||
In libjpeg-turbo, the fast method is generally about
|
||||
5-15% faster than the int method when using the
|
||||
x86/x86-64 SIMD extensions (results may vary with other
|
||||
SIMD implementations, or when using libjpeg-turbo
|
||||
without SIMD extensions.) If the JPEG image was
|
||||
compressed using a quality level of 85 or below, then
|
||||
there should be little or no perceptible difference
|
||||
between the two algorithms. When decompressing images
|
||||
that were compressed using quality levels above 85,
|
||||
however, the difference between the fast and int
|
||||
methods becomes more pronounced. With images
|
||||
compressed using quality=97, for instance, the fast
|
||||
method incurs generally about a 4-6 dB loss (in PSNR)
|
||||
relative to the int method, but this can be larger for
|
||||
some images. If you can avoid it, do not use the fast
|
||||
method when decompressing images that were compressed
|
||||
using quality levels above 97. The algorithm often
|
||||
degenerates for such images and can actually produce
|
||||
a more lossy output image than if the JPEG image had
|
||||
been compressed using lower quality levels.
|
||||
-dct float Use floating-point DCT method.
|
||||
The float method is mainly a legacy feature. It does
|
||||
not produce significantly more accurate results than
|
||||
the int method, and it is much slower. The float
|
||||
method may also give different results on different
|
||||
machines due to varying roundoff behavior, whereas the
|
||||
integer methods should give the same results on all
|
||||
machines.
|
||||
-dct int Use accurate integer DCT method (default).
|
||||
-dct fast Use less accurate integer DCT method [legacy feature].
|
||||
When the Independent JPEG Group's software was first
|
||||
released in 1991, the decompression time for a
|
||||
1-megapixel JPEG image on a mainstream PC was measured
|
||||
in minutes. Thus, the fast integer DCT algorithm
|
||||
provided noticeable performance benefits. On modern
|
||||
CPUs running libjpeg-turbo, however, the decompression
|
||||
time for a 1-megapixel JPEG image is measured in
|
||||
milliseconds, and thus the performance benefits of the
|
||||
fast algorithm are much less noticeable. On modern
|
||||
x86/x86-64 CPUs that support AVX2 instructions, the
|
||||
fast and int methods have similar performance. On
|
||||
other types of CPUs, the fast method is generally about
|
||||
5-15% faster than the int method.
|
||||
|
||||
If the JPEG image was compressed using a quality level
|
||||
of 85 or below, then there should be little or no
|
||||
perceptible quality difference between the two
|
||||
algorithms. When decompressing images that were
|
||||
compressed using quality levels above 85, however, the
|
||||
difference between the fast and int methods becomes
|
||||
more pronounced. With images compressed using
|
||||
quality=97, for instance, the fast method incurs
|
||||
generally about a 4-6 dB loss in PSNR relative to the
|
||||
int method, but this can be larger for some images. If
|
||||
you can avoid it, do not use the fast method when
|
||||
decompressing images that were compressed using quality
|
||||
levels above 97. The algorithm often degenerates for
|
||||
such images and can actually produce a more lossy
|
||||
output image than if the JPEG image had been compressed
|
||||
using lower quality levels.
|
||||
-dct float Use floating-point DCT method [legacy feature].
|
||||
The float method does not produce significantly more
|
||||
accurate results than the int method, and it is much
|
||||
slower. The float method may also give different
|
||||
results on different machines due to varying roundoff
|
||||
behavior, whereas the integer methods should give the
|
||||
same results on all machines.
|
||||
|
||||
-dither fs Use Floyd-Steinberg dithering in color quantization.
|
||||
-dither ordered Use ordered dithering in color quantization.
|
||||
|
||||
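The quality thresholds in the revised usage.txt text above can be read as a simple selection rule. The following is a minimal sketch of that rule, not code from libjpeg-turbo: the names `dct_choice`, `choose_cjpeg_dct`, `choose_djpeg_dct`, and the `prefer_speed` flag are hypothetical and exist only for illustration. It assumes the caller wants the fast method only where usage.txt says there should be little or no perceptible quality difference (quality 90 and below when compressing, source quality 85 and below when decompressing) and falls back to the accurate integer method otherwise.

```c
#include <assert.h>

/* Hypothetical type: mirrors the "-dct int" and "-dct fast" switches. */
typedef enum { DCT_INT, DCT_FAST } dct_choice;

/*
 * Hypothetical helper for cjpeg-style compression.
 * usage.txt: little or no perceptible difference at quality <= 90,
 * and the fast method should not be used above 97.  This sketch takes
 * the conservative cut-off of 90.
 */
static dct_choice choose_cjpeg_dct(int quality, int prefer_speed)
{
  assert(quality >= 0 && quality <= 100);
  if (prefer_speed && quality <= 90)
    return DCT_FAST;
  return DCT_INT;  /* accurate integer DCT, the default */
}

/*
 * Hypothetical helper for djpeg-style decompression.
 * usage.txt: little or no perceptible difference when the image was
 * compressed at quality <= 85.  source_quality is the quality level
 * the JPEG image was compressed with, which the caller must know or
 * estimate.
 */
static dct_choice choose_djpeg_dct(int source_quality, int prefer_speed)
{
  assert(source_quality >= 0 && source_quality <= 100);
  if (prefer_speed && source_quality <= 85)
    return DCT_FAST;
  return DCT_INT;  /* accurate integer DCT, the default */
}

/* Map the choice back to the command-line switch text. */
static const char *dct_switch(dct_choice c)
{
  return c == DCT_FAST ? "-dct fast" : "-dct int";
}
```

For example, `dct_switch(choose_cjpeg_dct(95, 1))` yields "-dct int", because 95 is above the range where the two methods are described as visually equivalent. On AVX2-capable x86/x86-64 CPUs, where usage.txt reports similar performance for both methods, always passing `prefer_speed = 0` loses little.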