Provide a more thorough description of the trade-offs between the various DCT/IDCT algorithms, based on new research

git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.3.x@1286 632fc199-4ca6-4c93-a231-07263d6284db
2014-05-11 09:46:28 +00:00
parent 311f16e848
commit 6b48fbd229
5 changed files with 144 additions and 50 deletions
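The guidance this commit adds to the documentation can be condensed into a small selection rule. The sketch below is an illustration only: `dct_choice`, `choose_compress_dct()` and `choose_decompress_dct()` are hypothetical names, not part of libjpeg-turbo, and the thresholds (90 for compression, 85 for decompression) are taken from the documentation text in the diff. The policy is deliberately conservative; the documentation itself only says to avoid the fast method outright above quality 97. In a real program the result would presumably be mapped onto `cinfo.dct_method` (`JDCT_ISLOW` or `JDCT_IFAST`) before starting compression or decompression.

```c
#include <assert.h>

/* Hypothetical stand-in for the library's J_DCT_METHOD values; a real
   program would use JDCT_ISLOW / JDCT_IFAST from jpeglib.h instead. */
typedef enum { DCT_CHOICE_INT, DCT_CHOICE_FAST } dct_choice;

/* Pick a DCT method for compression at the given quality (1-100).
   prefer_speed != 0 means the caller accepts a small accuracy loss in
   exchange for the roughly 5-15% speedup quoted for x86 SIMD builds. */
dct_choice choose_compress_dct(int quality, int prefer_speed)
{
  assert(quality >= 1 && quality <= 100);
  /* At quality 90 and below, the documentation expects little or no
     perceptible difference between the two integer methods. */
  if (prefer_speed && quality <= 90)
    return DCT_CHOICE_FAST;
  /* Above 90 the fast method loses accuracy (about 1-3 dB PSNR at 97),
     so this sketch falls back to the int method. */
  return DCT_CHOICE_INT;
}

/* Same idea for decompression, where the documented threshold is 85 and
   source_quality is the quality the image was compressed with.  The
   caller must know or estimate it; this sketch does not read the file. */
dct_choice choose_decompress_dct(int source_quality, int prefer_speed)
{
  assert(source_quality >= 1 && source_quality <= 100);
  if (prefer_speed && source_quality <= 85)
    return DCT_CHOICE_FAST;
  return DCT_CHOICE_INT;
}
```

The float method is intentionally absent from the sketch, since the updated text describes it as mostly a legacy feature that is slower and not significantly more accurate than the int method.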
--- a/README-turbo.txt
+++ b/README-turbo.txt
@@ -419,10 +419,16 @@ details.
 For the most part, libjpeg-turbo should produce identical output to libjpeg
 v6b.  The one exception to this is when using the floating point DCT/IDCT, in
-which case the outputs of libjpeg v6b and libjpeg-turbo are not guaranteed to
-be identical (the accuracy of the floating point DCT/IDCT is constant when
-using libjpeg-turbo's SIMD extensions, but otherwise, it can depend heavily on
-the compiler and compiler settings.)
+which case the outputs of libjpeg v6b and libjpeg-turbo can differ for the
+following reasons:
+
+-- The SSE/SSE2 floating point DCT implementation in libjpeg-turbo is ever so
+   slightly more accurate than the implementation in libjpeg v6b, but not by
+   any amount perceptible to human vision (generally in the range of 0.01 to
+   0.08 dB gain in PSNR.)
+-- When not using the SIMD extensions, then the accuracy of the floating point
+   DCT/IDCT can depend on the compiler and compiler settings.
 While libjpeg-turbo does emulate the libjpeg v8 API/ABI, under the hood, it is
 still using the same algorithms as libjpeg v6b, so there are several specific
@@ -430,12 +436,14 @@ cases in which libjpeg-turbo cannot be expected to produce the same output as
 libjpeg v8:
 -- When decompressing using scaling factors of 1/2 and 1/4, because libjpeg v8
-   implements those scaling algorithms a bit differently than libjpeg v6b does,
-   and libjpeg-turbo's SIMD extensions are based on the libjpeg v6b behavior.
+   implements those scaling algorithms differently than libjpeg v6b does, and
+   libjpeg-turbo's SIMD extensions are based on the libjpeg v6b behavior.
 -- When using chrominance subsampling, because libjpeg v8 implements this
   with its DCT/IDCT scaling algorithms rather than with a separate
-   downsampling/upsampling algorithm.
+   downsampling/upsampling algorithm.  In our testing, the subsampled/upsampled
+   output of libjpeg v8 is less accurate than that of libjpeg v6b for this
+   reason.
 -- When using the floating point IDCT, for the reasons stated above and also
   because the floating point IDCT algorithm was modified in libjpeg v8a to
--- a/cjpeg.1
+++ b/cjpeg.1
@@ -1,4 +1,4 @@
-.TH CJPEG 1 "18 January 2013"
+.TH CJPEG 1 "11 May 2014"
 .SH NAME
 cjpeg \- compress an image file to a JPEG file
 .SH SYNOPSIS
@@ -166,14 +166,25 @@ Use integer DCT method (default).
 .TP
 .B \-dct fast
 Use fast integer DCT (less accurate).
+In libjpeg-turbo, the fast method is generally about 5-15% faster than the int
+method when using the x86/x86-64 SIMD extensions (results may vary with other
+SIMD implementations, or when using libjpeg-turbo without SIMD extensions.)
+For quality levels of 90 and below, there should be little or no perceptible
+difference between the two algorithms.  For quality levels above 90, however,
+the difference between the fast and the int methods becomes more pronounced.
+With quality=97, for instance, the fast method incurs generally about a 1-3 dB
+loss (in PSNR) relative to the int method, but this can be larger for some
+images.  Do not use the fast method with quality levels above 97.  The
+algorithm often degenerates at quality=98 and above and can actually produce a
+more lossy image than if lower quality levels had been used.
 .TP
 .B \-dct float
 Use floating-point DCT method.
-The float method is very slightly more accurate than the int method, but is
-much slower unless your machine has very fast floating-point hardware.  Also
-note that results of the floating-point method may vary slightly across
-machines, while the integer methods should give the same results everywhere.
-The fast integer method is much less accurate than the other two.
+The float method is mostly a legacy feature.  It does not produce significantly
+more accurate results than the int method, and it is much slower.  The float
+method may also give different results on different machines due to varying
+roundoff behavior, whereas the integer methods should give the same results on
+all machines.
 .TP
 .BI \-restart " N"
 Emit a JPEG restart marker every N MCU rows, or every N MCU blocks if "B" is
--- a/djpeg.1
+++ b/djpeg.1
@@ -1,4 +1,4 @@
-.TH DJPEG 1 "18 January 2013"
+.TH DJPEG 1 "11 May 2014"
 .SH NAME
 djpeg \- decompress a JPEG file to an image file
 .SH SYNOPSIS
@@ -115,14 +115,28 @@ Use integer DCT method (default).
 .TP
 .B \-dct fast
 Use fast integer DCT (less accurate).
+In libjpeg-turbo, the fast method is generally about 5-15% faster than the int
+method when using the x86/x86-64 SIMD extensions (results may vary with other
+SIMD implementations, or when using libjpeg-turbo without SIMD extensions.)  If
+the JPEG image was compressed using a quality level of 85 or below, then there
+should be little or no perceptible difference between the two algorithms.  When
+decompressing images that were compressed using quality levels above 85,
+however, the difference between the fast and int methods becomes more
+pronounced.  With images compressed using quality=97, for instance, the fast
+method incurs generally about a 4-6 dB loss (in PSNR) relative to the int
+method, but this can be larger for some images.  If you can avoid it, do not
+use the fast method when decompressing images that were compressed using
+quality levels above 97.  The algorithm often degenerates for such images and
+can actually produce a more lossy output image than if the JPEG image had been
+compressed using lower quality levels.
 .TP
 .B \-dct float
 Use floating-point DCT method.
-The float method is very slightly more accurate than the int method, but is
-much slower unless your machine has very fast floating-point hardware.  Also
-note that results of the floating-point method may vary slightly across
-machines, while the integer methods should give the same results everywhere.
-The fast integer method is much less accurate than the other two.
+The float method is mostly a legacy feature.  It does not produce significantly
+more accurate results than the int method, and it is much slower.  The float
+method may also give different results on different machines due to varying
+roundoff behavior, whereas the integer methods should give the same results on
+all machines.
 .TP
 .B \-dither fs
 Use Floyd-Steinberg dithering in color quantization.
--- a/libjpeg.txt
+++ b/libjpeg.txt
@@ -3,7 +3,7 @@ USING THE IJG JPEG LIBRARY
 This file was part of the Independent JPEG Group's software:
 Copyright (C) 1994-2011, Thomas G. Lane, Guido Vollbeding.
 Modifications:
-Copyright (C) 2010, D. R. Commander.
+Copyright (C) 2010, 2014, D. R. Commander.
 For conditions of distribution and use, see the accompanying README file.
@@ -886,14 +886,23 @@ J_DCT_METHOD dct_method
                JDCT_FLOAT: floating-point method
                JDCT_DEFAULT: default method (normally JDCT_ISLOW)
                JDCT_FASTEST: fastest method (normally JDCT_IFAST)
-        The FLOAT method is very slightly more accurate than the ISLOW method,
-        but may give different results on different machines due to varying
-        roundoff behavior.  The integer methods should give the same results
-        on all machines.  On machines with sufficiently fast FP hardware, the
-        floating-point method may also be the fastest.  The IFAST method is
-        considerably less accurate than the other two; its use is not
-        recommended if high quality is a concern.  JDCT_DEFAULT and
-        JDCT_FASTEST are macros configurable by each installation.
+        In libjpeg-turbo, JDCT_IFAST is generally about 5-15% faster than
+        JDCT_ISLOW when using the x86/x86-64 SIMD extensions (results may vary
+        with other SIMD implementations, or when using libjpeg-turbo without
+        SIMD extensions.)  For quality levels of 90 and below, there should be
+        little or no perceptible difference between the two algorithms.  For
+        quality levels above 90, however, the difference between JDCT_IFAST and
+        JDCT_ISLOW becomes more pronounced.  With quality=97, for instance,
+        JDCT_IFAST incurs generally about a 1-3 dB loss (in PSNR) relative to
+        JDCT_ISLOW, but this can be larger for some images.  Do not use
+        JDCT_IFAST with quality levels above 97.  The algorithm often
+        degenerates at quality=98 and above and can actually produce a more
+        lossy image than if lower quality levels had been used.  JDCT_FLOAT is
+        mostly a legacy feature.  It does not produce significantly more
+        accurate results than the ISLOW method, and it is much slower.  The
+        FLOAT method may also give different results on different machines due
+        to varying roundoff behavior, whereas the integer methods should give
+        the same results on all machines.
 J_COLOR_SPACE jpeg_color_space
 int num_components
@@ -1170,8 +1179,32 @@ int actual_number_of_colors
 Additional decompression parameters that the application may set include:
 J_DCT_METHOD dct_method
-        Selects the algorithm used for the DCT step.  Choices are the same
-        as described above for compression.
+        Selects the algorithm used for the DCT step.  Choices are:
+                JDCT_ISLOW: slow but accurate integer algorithm
+                JDCT_IFAST: faster, less accurate integer method
+                JDCT_FLOAT: floating-point method
+                JDCT_DEFAULT: default method (normally JDCT_ISLOW)
+                JDCT_FASTEST: fastest method (normally JDCT_IFAST)
+        In libjpeg-turbo, JDCT_IFAST is generally about 5-15% faster than
+        JDCT_ISLOW when using the x86/x86-64 SIMD extensions (results may vary
+        with other SIMD implementations, or when using libjpeg-turbo without
+        SIMD extensions.)  If the JPEG image was compressed using a quality
+        level of 85 or below, then there should be little or no perceptible
+        difference between the two algorithms.  When decompressing images that
+        were compressed using quality levels above 85, however, the difference
+        between JDCT_IFAST and JDCT_ISLOW becomes more pronounced.  With images
+        compressed using quality=97, for instance, JDCT_IFAST incurs generally
+        about a 4-6 dB loss (in PSNR) relative to JDCT_ISLOW, but this can be
+        larger for some images.  If you can avoid it, do not use JDCT_IFAST
+        when decompressing images that were compressed using quality levels
+        above 97.  The algorithm often degenerates for such images and can
+        actually produce a more lossy output image than if the JPEG image had
+        been compressed using lower quality levels.  JDCT_FLOAT is mostly a
+        legacy feature.  It does not produce significantly more accurate
+        results than the ISLOW method, and it is much slower.  The FLOAT method
+        may also give different results on different machines due to varying
+        roundoff behavior, whereas the integer methods should give the same
+        results on all machines.
 boolean do_fancy_upsampling
        If TRUE, do careful upsampling of chroma components.  If FALSE,
--- a/usage.txt
+++ b/usage.txt
@@ -172,13 +172,28 @@ Switches for advanced users:
        -dct int        Use integer DCT method (default).
        -dct fast       Use fast integer DCT (less accurate).
        -dct float      Use floating-point DCT method.
-                        The float method is very slightly more accurate than
-                        the int method, but is much slower unless your machine
-                        has very fast floating-point hardware.  Also note that
-                        results of the floating-point method may vary slightly
-                        across machines, while the integer methods should give
-                        the same results everywhere.  The fast integer method
-                        is much less accurate than the other two.
+                        In libjpeg-turbo, the fast method is generally about
+                        5-15% faster than the int method when using the
+                        x86/x86-64 SIMD extensions (results may vary with other
+                        SIMD implementations, or when using libjpeg-turbo
+                        without SIMD extensions.)  For quality levels of 90 and
+                        below, there should be little or no perceptible
+                        difference between the two algorithms.  For quality
+                        levels above 90, however, the difference between
+                        the fast and the int methods becomes more pronounced.
+                        With quality=97, for instance, the fast method incurs
+                        generally about a 1-3 dB loss (in PSNR) relative to
+                        the int method, but this can be larger for some images.
+                        Do not use the fast method with quality levels above
+                        97.  The algorithm often degenerates at quality=98 and
+                        above and can actually produce a more lossy image than
+                        if lower quality levels had been used.  The float
+                        method is mostly a legacy feature.  It does not produce
+                        significantly more accurate results than the int
+                        method, and it is much slower.  The float method may
+                        also give different results on different machines due
+                        to varying roundoff behavior, whereas the integer
+                        methods should give the same results on all machines.
        -restart N      Emit a JPEG restart marker every N MCU rows, or every
                        N MCU blocks if "B" is attached to the number.
@@ -296,13 +311,32 @@ Switches for advanced users:
        -dct int        Use integer DCT method (default).
        -dct fast       Use fast integer DCT (less accurate).
        -dct float      Use floating-point DCT method.
-                        The float method is very slightly more accurate than
-                        the int method, but is much slower unless your machine
-                        has very fast floating-point hardware.  Also note that
-                        results of the floating-point method may vary slightly
-                        across machines, while the integer methods should give
-                        the same results everywhere.  The fast integer method
-                        is much less accurate than the other two.
+                        In libjpeg-turbo, the fast method is generally about
+                        5-15% faster than the int method when using the
+                        x86/x86-64 SIMD extensions (results may vary with other
+                        SIMD implementations, or when using libjpeg-turbo
+                        without SIMD extensions.)  If the JPEG image was
+                        compressed using a quality level of 85 or below, then
+                        there should be little or no perceptible difference
+                        between the two algorithms.  When decompressing images
+                        that were compressed using quality levels above 85,
+                        however, the difference between the fast and int
+                        methods becomes more pronounced.  With images
+                        compressed using quality=97, for instance, the fast
+                        method incurs generally about a 4-6 dB loss (in PSNR)
+                        relative to the int method, but this can be larger for
+                        some images.  If you can avoid it, do not use the fast
+                        method when decompressing images that were compressed
+                        using quality levels above 97.  The algorithm often
+                        degenerates for such images and can actually produce
+                        a more lossy output image than if the JPEG image had
+                        been compressed using lower quality levels.  The float
+                        method is mostly a legacy feature.  It does not produce
+                        significantly more accurate results than the int
+                        method, and it is much slower.  The float method may
+                        also give different results on different machines due
+                        to varying roundoff behavior, whereas the integer
+                        methods should give the same results on all machines.
        -dither fs      Use Floyd-Steinberg dithering in color quantization.
        -dither ordered Use ordered dithering in color quantization.
@@ -381,12 +415,6 @@ When producing a color-quantized image, "-onepass -dither ordered" is fast but
 much lower quality than the default behavior.  "-dither none" may give
 acceptable results in two-pass mode, but is seldom tolerable in one-pass mode.
-If you are fortunate enough to have very fast floating point hardware,
-"-dct float" may be even faster than "-dct fast".  But on most machines
-"-dct float" is slower than "-dct int"; in this case it is not worth using,
-because its theoretical accuracy advantage is too small to be significant
-in practice.
 Two-pass color quantization requires a good deal of memory; on MS-DOS machines
 it may run out of memory even with -maxmemory 0.  In that case you can still
 decompress, with some loss of image quality, by specifying -onepass for