Demote "fast" [I]DCT algorithms to legacy status

- Refer to the "slow" [I]DCT algorithms as "accurate" instead, since they
  are not slow under libjpeg-turbo.
- Adjust documentation claims to reflect the fact that the "slow" and
  "fast" algorithms produce about the same performance on AVX2-equipped
  CPUs (because of the dual-lane nature of AVX2, it was not possible to
  accelerate the "fast" algorithm beyond what was achievable with SSE2.)
  Also adjust the claims to reflect the fact that the "fast" algorithm
  tends to be ~5-15% faster than the "slow" algorithm on
  non-AVX2-equipped CPUs, regardless of the use of the libjpeg-turbo SIMD
  extensions.
- Indicate the legacy status of the "fast" and float algorithms in the
  documentation and cjpeg/djpeg usage info.
- Remove obsolete paragraph in the djpeg man page that suggested that the
  float algorithm could be faster than the "fast" algorithm on some CPUs.
2020-11-04 10:13:06 -06:00
parent c3bfbde21d
commit 6e632af9f6
28 changed files with 263 additions and 218 deletions
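
As a rough illustration of the guidance in the documentation text this commit revises, the sketch below chooses between the accurate and the legacy "fast" integer method for compression. It is not part of the commit or of libjpeg-turbo. The names sketch_dct_choice and sketch_choose_dct are hypothetical, and the enum is a local stand-in for the library's J_DCT_METHOD so that the example compiles on its own. The only rule it encodes is the documented one: prefer the accurate method by default, and do not use the "fast" method at quality levels above 97.

```c
/* Hypothetical stand-in for J_DCT_METHOD; not a libjpeg-turbo type. */
typedef enum {
  SKETCH_DCT_ACCURATE,  /* corresponds to JDCT_ISLOW */
  SKETCH_DCT_FAST       /* corresponds to JDCT_IFAST (legacy) */
} sketch_dct_choice;

/*
 * Pick a DCT method for compression at the given quality (1-100).
 * want_legacy_fast is nonzero only if the caller explicitly asks for the
 * legacy "fast" method.  Above quality 97 the request is ignored, because
 * the documentation warns that the "fast" algorithm often degenerates
 * there and is not fully accelerated.
 */
sketch_dct_choice sketch_choose_dct(int quality, int want_legacy_fast)
{
  if (!want_legacy_fast)
    return SKETCH_DCT_ACCURATE;   /* default: accurate integer method */
  if (quality > 97)
    return SKETCH_DCT_ACCURATE;   /* documented limit for the fast method */
  return SKETCH_DCT_FAST;
}
```

In an application that uses the real library, the result would be mapped onto the dct_method field of the compression object, for example cinfo.dct_method = JDCT_ISLOW, after calling jpeg_set_defaults() and before calling jpeg_start_compress(). Whether the "fast" method is worth requesting at all depends on the CPU, since the commit message notes that the two integer methods perform about the same on AVX2-equipped CPUs.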
--- a/libjpeg.txt
+++ b/libjpeg.txt
@@ -969,30 +969,38 @@ boolean arith_code

 J_DCT_METHOD dct_method
        Selects the algorithm used for the DCT step.  Choices are:
-                JDCT_ISLOW: slow but accurate integer algorithm
-                JDCT_IFAST: faster, less accurate integer method
-                JDCT_FLOAT: floating-point method
+                JDCT_ISLOW: accurate integer method
+                JDCT_IFAST: less accurate integer method [legacy feature]
+                JDCT_FLOAT: floating-point method [legacy feature]
                JDCT_DEFAULT: default method (normally JDCT_ISLOW)
                JDCT_FASTEST: fastest method (normally JDCT_IFAST)
-        In libjpeg-turbo, JDCT_IFAST is generally about 5-15% faster than
-        JDCT_ISLOW when using the x86/x86-64 SIMD extensions (results may vary
-        with other SIMD implementations, or when using libjpeg-turbo without
-        SIMD extensions.)  For quality levels of 90 and below, there should be
-        little or no perceptible difference between the two algorithms.  For
-        quality levels above 90, however, the difference between JDCT_IFAST and
+        When the Independent JPEG Group's software was first released in 1991,
+        the compression time for a 1-megapixel JPEG image on a mainstream PC
+        was measured in minutes.  Thus, JDCT_IFAST provided noticeable
+        performance benefits.  On modern CPUs running libjpeg-turbo, however,
+        the compression time for a 1-megapixel JPEG image is measured in
+        milliseconds, and thus the performance benefits of JDCT_IFAST are much
+        less noticeable.  On modern x86/x86-64 CPUs that support AVX2
+        instructions, JDCT_IFAST and JDCT_ISLOW have similar performance.  On
+        other types of CPUs, JDCT_IFAST is generally about 5-15% faster than
+        JDCT_ISLOW.
+
+        For quality levels of 90 and below, there should be little or no
+        perceptible quality difference between the two algorithms.  For quality
+        levels above 90, however, the difference between JDCT_IFAST and
        JDCT_ISLOW becomes more pronounced.  With quality=97, for instance,
-        JDCT_IFAST incurs generally about a 1-3 dB loss (in PSNR) relative to
+        JDCT_IFAST incurs generally about a 1-3 dB loss in PSNR relative to
        JDCT_ISLOW, but this can be larger for some images.  Do not use
        JDCT_IFAST with quality levels above 97.  The algorithm often
        degenerates at quality=98 and above and can actually produce a more
        lossy image than if lower quality levels had been used.  Also, in
        libjpeg-turbo, JDCT_IFAST is not fully accelerated for quality levels
-        above 97, so it will be slower than JDCT_ISLOW.  JDCT_FLOAT is mainly a
-        legacy feature.  It does not produce significantly more accurate
-        results than the ISLOW method, and it is much slower.  The FLOAT method
-        may also give different results on different machines due to varying
-        roundoff behavior, whereas the integer methods should give the same
-        results on all machines.
+        above 97, so it will be slower than JDCT_ISLOW.
+
+        JDCT_FLOAT does not produce significantly more accurate results than
+        JDCT_ISLOW, and it is much slower.  JDCT_FLOAT may also give different
+        results on different machines due to varying roundoff behavior, whereas
+        the integer methods should give the same results on all machines.

 J_COLOR_SPACE jpeg_color_space
 int num_components
@@ -1270,31 +1278,39 @@ Additional decompression parameters that the application may set include:

 J_DCT_METHOD dct_method
        Selects the algorithm used for the DCT step.  Choices are:
-                JDCT_ISLOW: slow but accurate integer algorithm
-                JDCT_IFAST: faster, less accurate integer method
-                JDCT_FLOAT: floating-point method
+                JDCT_ISLOW: accurate integer method
+                JDCT_IFAST: less accurate integer method [legacy feature]
+                JDCT_FLOAT: floating-point method [legacy feature]
                JDCT_DEFAULT: default method (normally JDCT_ISLOW)
                JDCT_FASTEST: fastest method (normally JDCT_IFAST)
-        In libjpeg-turbo, JDCT_IFAST is generally about 5-15% faster than
-        JDCT_ISLOW when using the x86/x86-64 SIMD extensions (results may vary
-        with other SIMD implementations, or when using libjpeg-turbo without
-        SIMD extensions.)  If the JPEG image was compressed using a quality
-        level of 85 or below, then there should be little or no perceptible
-        difference between the two algorithms.  When decompressing images that
-        were compressed using quality levels above 85, however, the difference
+        When the Independent JPEG Group's software was first released in 1991,
+        the decompression time for a 1-megapixel JPEG image on a mainstream PC
+        was measured in minutes.  Thus, JDCT_IFAST provided noticeable
+        performance benefits.  On modern CPUs running libjpeg-turbo, however,
+        the decompression time for a 1-megapixel JPEG image is measured in
+        milliseconds, and thus the performance benefits of JDCT_IFAST are much
+        less noticeable.  On modern x86/x86-64 CPUs that support AVX2
+        instructions, JDCT_IFAST and JDCT_ISLOW have similar performance.  On
+        other types of CPUs, JDCT_IFAST is generally about 5-15% faster than
+        JDCT_ISLOW.
+
+        If the JPEG image was compressed using a quality level of 85 or below,
+        then there should be little or no perceptible quality difference
+        between the two algorithms.  When decompressing images that were
+        compressed using quality levels above 85, however, the difference
        between JDCT_IFAST and JDCT_ISLOW becomes more pronounced.  With images
        compressed using quality=97, for instance, JDCT_IFAST incurs generally
-        about a 4-6 dB loss (in PSNR) relative to JDCT_ISLOW, but this can be
+        about a 4-6 dB loss in PSNR relative to JDCT_ISLOW, but this can be
        larger for some images.  If you can avoid it, do not use JDCT_IFAST
        when decompressing images that were compressed using quality levels
        above 97.  The algorithm often degenerates for such images and can
        actually produce a more lossy output image than if the JPEG image had
-        been compressed using lower quality levels.  JDCT_FLOAT is mainly a
-        legacy feature.  It does not produce significantly more accurate
-        results than the ISLOW method, and it is much slower.  The FLOAT method
-        may also give different results on different machines due to varying
-        roundoff behavior, whereas the integer methods should give the same
-        results on all machines.
+        been compressed using lower quality levels.
+
+        JDCT_FLOAT does not produce significantly more accurate results than
+        JDCT_ISLOW, and it is much slower.  JDCT_FLOAT may also give different
+        results on different machines due to varying roundoff behavior, whereas
+        the integer methods should give the same results on all machines.

 boolean do_fancy_upsampling
        If TRUE, do careful upsampling of chroma components.  If FALSE,