Bump version to 2.1.

Merge pull request #79 from arjun024/devel
more logical flow of control if trellis_quant enabled
2014-07-29 09:12:48 -05:00 · 2014-07-25 09:02:45 -04:00 · 2014-07-25 17:33:35 +05:30 · 2014-07-24 16:39:26 -05:00 · 2014-07-24 17:09:27 -04:00 · 2014-07-24 10:50:59 -04:00
16 changed files with 4352 additions and 151 deletions
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -5,7 +5,7 @@
 cmake_minimum_required(VERSION 2.6)
 project(libmozjpeg C)
-set(VERSION 2.0pre)
+set(VERSION 2.1)
 if(MINGW OR CYGWIN)
  execute_process(COMMAND "date" "+%Y%m%d" OUTPUT_VARIABLE BUILD)
@@ -264,7 +264,7 @@ set_property(TARGET jpegtran-static PROPERTY COMPILE_FLAGS "-DUSE_SETMODE")
 add_executable(rdjpgcom rdjpgcom.c)
-add_executable(wrjpgcom rdjpgcom.c)
+add_executable(wrjpgcom wrjpgcom.c)
 #
--- a/ChangeLog.txt
+++ b/ChangeLog.txt
@@ -23,6 +23,14 @@ and is out of scope for a codec library.
 [5] The TurboJPEG API can now be used to compress JPEG images from YUV planar
 source images.
+[7] Improved the accuracy and performance of the non-SIMD implementation of the
+floating point inverse DCT (using code borrowed from libjpeg v8a and later.)
+The accuracy of this implementation now matches the accuracy of the SSE/SSE2
+implementation.  Note, however, that the floating point DCT/IDCT algorithms are
+mainly a legacy feature.  They generally do not produce significantly better
+accuracy than the slow integer DCT/IDCT algorithms, and they are quite a bit
+slower.
 1.3.1
 =====
--- a/README-turbo.txt
+++ b/README-turbo.txt
@@ -419,10 +419,25 @@ details.
 For the most part, libjpeg-turbo should produce identical output to libjpeg
 v6b.  The one exception to this is when using the floating point DCT/IDCT, in
-which case the outputs of libjpeg v6b and libjpeg-turbo are not guaranteed to
-be identical (the accuracy of the floating point DCT/IDCT is constant when
-using libjpeg-turbo's SIMD extensions, but otherwise, it can depend heavily on
-the compiler and compiler settings.)
+which case the outputs of libjpeg v6b and libjpeg-turbo can differ for the
+following reasons:
+
+-- The SSE/SSE2 floating point DCT implementation in libjpeg-turbo is ever so
+   slightly more accurate than the implementation in libjpeg v6b, but not by
+   any amount perceptible to human vision (generally in the range of 0.01 to
+   0.08 dB gain in PSNR.)
+-- When not using the SIMD extensions, libjpeg-turbo uses the more accurate
+   (and slightly faster) floating point IDCT algorithm introduced in libjpeg
+   v8a as opposed to the algorithm used in libjpeg v6b.  It should be noted,
+   however, that this algorithm basically brings the accuracy of the floating
+   point IDCT in line with the accuracy of the slow integer IDCT.  The floating
+   point DCT/IDCT algorithms are mainly a legacy feature, and they do not
+   produce significantly more accuracy than the slow integer algorithms (to put
+   numbers on this, the typical difference in PSNR between the two algorithms
+   is less than 0.10 dB, whereas changing the quality level by 1 in the upper
+   range of the quality scale is typically more like a 1.0 dB difference.)
+-- When not using the SIMD extensions, then the accuracy of the floating point
+   DCT/IDCT can depend on the compiler and compiler settings.
 While libjpeg-turbo does emulate the libjpeg v8 API/ABI, under the hood, it is
 still using the same algorithms as libjpeg v6b, so there are several specific
@@ -430,16 +445,14 @@ cases in which libjpeg-turbo cannot be expected to produce the same output as
 libjpeg v8:
 -- When decompressing using scaling factors of 1/2 and 1/4, because libjpeg v8
-   implements those scaling algorithms a bit differently than libjpeg v6b does,
-   and libjpeg-turbo's SIMD extensions are based on the libjpeg v6b behavior.
+   implements those scaling algorithms differently than libjpeg v6b does, and
+   libjpeg-turbo's SIMD extensions are based on the libjpeg v6b behavior.
 -- When using chrominance subsampling, because libjpeg v8 implements this
   with its DCT/IDCT scaling algorithms rather than with a separate
-   downsampling/upsampling algorithm.
-
--- When using the floating point IDCT, for the reasons stated above and also
-   because the floating point IDCT algorithm was modified in libjpeg v8a to
-   improve accuracy.
+   downsampling/upsampling algorithm.  In our testing, the subsampled/upsampled
+   output of libjpeg v8 is less accurate than that of libjpeg v6b for this
+   reason.
 -- When decompressing using a scaling factor > 1 and merged (AKA "non-fancy" or
   "non-smooth") chrominance upsampling, because libjpeg v8 does not support
--- a/README.md
+++ b/README.md
@@ -7,6 +7,8 @@ The idea is to reduce transfer times for JPEGs on the Web, thus reducing page lo
 'mozjpeg' is not intended to be a general JPEG library replacement. It makes tradeoffs that are intended to benefit Web use cases and focuses solely on improving encoding. It is best used as part of a Web encoding workflow. For a general JPEG library (e.g. your system libjpeg), especially if you care about decoding, we recommend libjpeg-turbo.
-For more information, see the project announcement:
-https://blog.mozilla.org/research/2014/03/05/introducing-the-mozjpeg-project/
+More information:
+* [Version 1.0 Announcement](https://blog.mozilla.org/research/2014/03/05/introducing-the-mozjpeg-project/)
+* [Version 2.0 Announcement](https://blog.mozilla.org/research/2014/07/15/mozilla-advances-jpeg-encoding-with-mozjpeg-2-0/)
+* [Mailing List](https://lists.mozilla.org/listinfo/dev-mozjpeg)
--- a/cjpeg.1
+++ b/cjpeg.1
@@ -1,4 +1,4 @@
-.TH CJPEG 1 "18 January 2013"
+.TH CJPEG 1 "11 May 2014"
 .SH NAME
 cjpeg \- compress an image file to a JPEG file
 .SH SYNOPSIS
@@ -166,14 +166,25 @@ Use integer DCT method (default).
 .TP
 .B \-dct fast
 Use fast integer DCT (less accurate).
+In libjpeg-turbo, the fast method is generally about 5-15% faster than the int
+method when using the x86/x86-64 SIMD extensions (results may vary with other
+SIMD implementations, or when using libjpeg-turbo without SIMD extensions.)
+For quality levels of 90 and below, there should be little or no perceptible
+difference between the two algorithms.  For quality levels above 90, however,
+the difference between the fast and the int methods becomes more pronounced.
+With quality=97, for instance, the fast method incurs generally about a 1-3 dB
+loss (in PSNR) relative to the int method, but this can be larger for some
+images.  Do not use the fast method with quality levels above 97.  The
+algorithm often degenerates at quality=98 and above and can actually produce a
+more lossy image than if lower quality levels had been used.
 .TP
 .B \-dct float
 Use floating-point DCT method.
-The float method is very slightly more accurate than the int method, but is
-much slower unless your machine has very fast floating-point hardware.  Also
-note that results of the floating-point method may vary slightly across
-machines, while the integer methods should give the same results everywhere.
-The fast integer method is much less accurate than the other two.
+The float method is mostly a legacy feature.  It does not produce significantly
+more accurate results than the int method, and it is much slower.  The float
+method may also give different results on different machines due to varying
+roundoff behavior, whereas the integer methods should give the same results on
+all machines.
 .TP
 .BI \-restart " N"
 Emit a JPEG restart marker every N MCU rows, or every N MCU blocks if "B" is
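The cjpeg guidance above amounts to a simple selection rule. A minimal sketch of that rule as a helper, assuming a caller that knows the target quality and whether speed matters; `choose_compress_dct()` and the `CHOOSE_*` constants are hypothetical stand-ins for libjpeg's JDCT_* values, not part of any library:

```c
/* Hypothetical stand-ins for libjpeg's JDCT_ISLOW / JDCT_IFAST. */
enum dct_choice { CHOOSE_ISLOW, CHOOSE_IFAST };

/* Encodes the man-page advice for compression: use the fast integer DCT
   only when speed is wanted and quality <= 90, where little or no
   perceptible difference is expected.  This also keeps it well away from
   quality 98 and above, where the fast method may degenerate.  The float
   method is never chosen, since it is described as a slower legacy
   feature. */
enum dct_choice choose_compress_dct(int quality, int prefer_speed)
{
  if (prefer_speed && quality <= 90)
    return CHOOSE_IFAST;
  return CHOOSE_ISLOW;
}
```

The cutoff of 90 is taken directly from the text; a stricter application might lower it, but nothing in the man page suggests a specific alternative.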
--- a/cjpeg.c
+++ b/cjpeg.c
@@ -168,6 +168,7 @@ usage (void)
 #ifdef C_PROGRESSIVE_SUPPORTED
  fprintf(stderr, "  -progressive   Create progressive JPEG file (enabled by default)\n");
 #endif
+ fprintf(stderr, "  -baseline      Create baseline JPEG file (disable progressive coding)\n");
 #ifdef TARGA_SUPPORTED
  fprintf(stderr, "  -targa         Input file is Targa format (usually not needed)\n");
 #endif
@@ -206,7 +207,6 @@ usage (void)
 #endif
  fprintf(stderr, "  -verbose  or  -debug   Emit debug output\n");
  fprintf(stderr, "Switches for wizards:\n");
- fprintf(stderr, "  -baseline      Force baseline quantization tables\n");
  fprintf(stderr, "  -qtables file  Use quantization tables given in file\n");
  fprintf(stderr, "  -qslots N[,...]    Set component quantization tables\n");
  fprintf(stderr, "  -sample HxV[,...]  Set component sampling factors\n");
@@ -279,11 +279,17 @@ parse_switches (j_compress_ptr cinfo, int argc, char **argv,
    } else if (keymatch(arg, "baseline", 1)) {
      /* Force baseline-compatible output (8-bit quantizer values). */
      force_baseline = TRUE;
+      /* Disable multiple scans */
+      simple_progressive = FALSE;
+      cinfo->num_scans = 0;
+      cinfo->scan_info = NULL;
    } else if (keymatch(arg, "dct", 2)) {
      /* Select DCT algorithm. */
-      if (++argn >= argc)	/* advance to next argument */
+      if (++argn >= argc) { /* advance to next argument */
+        fprintf(stderr, "%s: missing argument for dct\n", progname);
 	usage();
+      }
      if (keymatch(argv[argn], "int", 1)) {
 	cinfo->dct_method = JDCT_ISLOW;
      } else if (keymatch(argv[argn], "fast", 2)) {
@@ -291,6 +297,7 @@ parse_switches (j_compress_ptr cinfo, int argc, char **argv,
      } else if (keymatch(argv[argn], "float", 2)) {
 	cinfo->dct_method = JDCT_FLOAT;
-      } else
+      } else {
+        fprintf(stderr, "%s: invalid argument for dct\n", progname);
 	usage();
+      }
    } else if (keymatch(arg, "debug", 1) || keymatch(arg, "verbose", 1)) {
@@ -314,7 +321,7 @@ parse_switches (j_compress_ptr cinfo, int argc, char **argv,
    } else if (keymatch(arg, "flat", 4)) {
      cinfo->use_flat_quant_tbl = TRUE;
      jpeg_set_quality(cinfo, 75, TRUE);
-      
+
    } else if (keymatch(arg, "grayscale", 2) || keymatch(arg, "greyscale",2)) {
      /* Force a monochrome JPEG file to be generated. */
      jpeg_set_colorspace(cinfo, JCS_GRAYSCALE);
@@ -327,12 +334,12 @@ parse_switches (j_compress_ptr cinfo, int argc, char **argv,
      if (++argn >= argc)	/* advance to next argument */
 	usage();
      cinfo->lambda_log_scale1 = atof(argv[argn]);
-      
+
    } else if (keymatch(arg, "lambda2", 7)) {
      if (++argn >= argc)	/* advance to next argument */
 	usage();
      cinfo->lambda_log_scale2 = atof(argv[argn]);
-      
+
    } else if (keymatch(arg, "maxmemory", 3)) {
      /* Maximum memory in Kb (or Mb with 'm'). */
      long lval;
@@ -348,7 +355,7 @@ parse_switches (j_compress_ptr cinfo, int argc, char **argv,
    } else if (keymatch(arg, "multidcscan", 3)) {
      cinfo->one_dc_scan = FALSE;
-      
+
    } else if (keymatch(arg, "optimize", 1) || keymatch(arg, "optimise", 1)) {
      /* Enable entropy parm optimization. */
 #ifdef ENTROPY_OPT_SUPPORTED
@@ -361,8 +368,10 @@ parse_switches (j_compress_ptr cinfo, int argc, char **argv,
    } else if (keymatch(arg, "outfile", 4)) {
      /* Set output file name. */
-      if (++argn >= argc)	/* advance to next argument */
+      if (++argn >= argc)	{ /* advance to next argument */
+        fprintf(stderr, "%s: missing argument for outfile\n", progname);
 	usage();
+      }
      outfilename = argv[argn];	/* save it away for later use */
    } else if (keymatch(arg, "progressive", 1)) {
@@ -388,8 +397,10 @@ parse_switches (j_compress_ptr cinfo, int argc, char **argv,
    } else if (keymatch(arg, "quality", 1)) {
      /* Quality ratings (quantization table scaling factors). */
-      if (++argn >= argc)	/* advance to next argument */
+      if (++argn >= argc)	{ /* advance to next argument */
+        fprintf(stderr, "%s: missing argument for quality\n", progname);
 	usage();
+      }
      qualityarg = argv[argn];
    } else if (keymatch(arg, "qslots", 2)) {
@@ -505,6 +516,7 @@ parse_switches (j_compress_ptr cinfo, int argc, char **argv,
      jpeg_set_quality(cinfo, 75, TRUE);
    } else {
+      fprintf(stderr, "%s: unknown option '%s'\n", progname, arg);
      usage();			/* bogus switch */
    }
  }
@@ -516,20 +528,26 @@ parse_switches (j_compress_ptr cinfo, int argc, char **argv,
    /* Set quantization tables for selected quality. */
    /* Some or all may be overridden if -qtables is present. */
    if (qualityarg != NULL)	/* process -quality if it was present */
-      if (! set_quality_ratings(cinfo, qualityarg, force_baseline))
+      if (! set_quality_ratings(cinfo, qualityarg, force_baseline)) {
+        fprintf(stderr, "%s: can't set quality ratings\n", progname);
 	usage();
+      }
    if (qtablefile != NULL)	/* process -qtables if it was present */
-      if (! read_quant_tables(cinfo, qtablefile, force_baseline))
+      if (! read_quant_tables(cinfo, qtablefile, force_baseline)) {
+        fprintf(stderr, "%s: can't read qtable file\n", progname);
 	usage();
+      }
    if (qslotsarg != NULL)	/* process -qslots if it was present */
      if (! set_quant_slots(cinfo, qslotsarg))
 	usage();
    if (samplearg != NULL)	/* process -sample if it was present */
-      if (! set_sample_factors(cinfo, samplearg))
+      if (! set_sample_factors(cinfo, samplearg)) {
+        fprintf(stderr, "%s: can't set sample factors\n", progname);
 	usage();
+      }
 #ifdef C_PROGRESSIVE_SUPPORTED
    if (simple_progressive)	/* process -progressive; -scans can override */
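Most of the cjpeg.c changes add an error message before `usage()` and wrap the pair in braces. The braces matter: in `if (cond) fprintf(...); usage();` only the `fprintf()` is conditional, so `usage()` (which exits) runs on every path. The sketch below shows the braced pattern in isolation. `key_match()` is a hypothetical re-implementation that mirrors the case-insensitive, abbreviation-aware matching cjpeg's `keymatch()` performs, and `parse_dct()` returns an error code instead of exiting so that it can be tested.

```c
#include <ctype.h>
#include <stdio.h>

enum dct_choice { DCT_ISLOW, DCT_IFAST, DCT_FLOAT, DCT_INVALID };

/* Hypothetical stand-in for cjpeg's keymatch(): arg must be a
   case-insensitive prefix of keyword (given in lower case) that is at
   least minchars characters long. */
static int key_match(const char *arg, const char *keyword, int minchars)
{
  int nmatched = 0;

  for (; *arg != '\0'; arg++, keyword++) {
    if (*keyword == '\0')
      return 0;                       /* arg is longer than keyword */
    if (tolower((unsigned char) *arg) != *keyword)
      return 0;                       /* character mismatch */
    nmatched++;
  }
  return nmatched >= minchars;        /* reject too-short abbreviations */
}

/* Map a -dct argument to a method; the error path is braced so both
   statements run only when no keyword matched. */
static enum dct_choice parse_dct(const char *arg)
{
  enum dct_choice method;

  if (key_match(arg, "int", 1)) {
    method = DCT_ISLOW;
  } else if (key_match(arg, "fast", 2)) {
    method = DCT_IFAST;
  } else if (key_match(arg, "float", 2)) {
    method = DCT_FLOAT;
  } else {
    fprintf(stderr, "invalid argument for dct: %s\n", arg);
    method = DCT_INVALID;
  }
  return method;
}
```

With the minimum abbreviation lengths used in the diff, `fl` selects the float method, while `f` alone is rejected because it is ambiguous between `fast` and `float`.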
--- a/configure.ac
+++ b/configure.ac
@@ -2,7 +2,7 @@
 # Process this file with autoconf to produce a configure script.
 AC_PREREQ([2.56])
-AC_INIT([libmozjpeg], [2.0pre])
+AC_INIT([libmozjpeg], [2.1])
 BUILD=`date +%Y%m%d`
 AM_INIT_AUTOMAKE([-Wall foreign dist-bzip2])
@@ -20,7 +20,7 @@ AC_PROG_CC
 AM_PROG_CC_C_O
 AM_PROG_AS
 AC_PROG_INSTALL
-AM_PROG_AR
+m4_ifdef([AM_PROG_AR], [AM_PROG_AR])
 AC_PROG_LIBTOOL
 AC_PROG_LN_S
--- a/djpeg.1
+++ b/djpeg.1
@@ -1,4 +1,4 @@
-.TH DJPEG 1 "18 January 2013"
+.TH DJPEG 1 "11 May 2014"
 .SH NAME
 djpeg \- decompress a JPEG file to an image file
 .SH SYNOPSIS
@@ -115,14 +115,28 @@ Use integer DCT method (default).
 .TP
 .B \-dct fast
 Use fast integer DCT (less accurate).
+In libjpeg-turbo, the fast method is generally about 5-15% faster than the int
+method when using the x86/x86-64 SIMD extensions (results may vary with other
+SIMD implementations, or when using libjpeg-turbo without SIMD extensions.)  If
+the JPEG image was compressed using a quality level of 85 or below, then there
+should be little or no perceptible difference between the two algorithms.  When
+decompressing images that were compressed using quality levels above 85,
+however, the difference between the fast and int methods becomes more
+pronounced.  With images compressed using quality=97, for instance, the fast
+method incurs generally about a 4-6 dB loss (in PSNR) relative to the int
+method, but this can be larger for some images.  If you can avoid it, do not
+use the fast method when decompressing images that were compressed using
+quality levels above 97.  The algorithm often degenerates for such images and
+can actually produce a more lossy output image than if the JPEG image had been
+compressed using lower quality levels.
 .TP
 .B \-dct float
 Use floating-point DCT method.
-The float method is very slightly more accurate than the int method, but is
-much slower unless your machine has very fast floating-point hardware.  Also
-note that results of the floating-point method may vary slightly across
-machines, while the integer methods should give the same results everywhere.
-The fast integer method is much less accurate than the other two.
+The float method is mostly a legacy feature.  It does not produce significantly
+more accurate results than the int method, and it is much slower.  The float
+method may also give different results on different machines due to varying
+roundoff behavior, whereas the integer methods should give the same results on
+all machines.
 .TP
 .B \-dither fs
 Use Floyd-Steinberg dithering in color quantization.
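The float method descriptions note that results may differ between machines because of varying roundoff. The underlying effect is easy to reproduce outside libjpeg: single-precision addition is not associative, so a compiler or SIMD kernel that evaluates the same sum in a different order can produce a different result. The two functions below are illustrative only and are not libjpeg code.

```c
/* Illustration only: IEEE single-precision addition is not associative.
   The volatile temporaries force each intermediate sum to be rounded to
   float, so the result does not depend on whether the CPU keeps extra
   precision in registers. */
float sum_left_first(float a, float b, float c)
{
  volatile float t = a + b;   /* (a + b), rounded to float */
  return t + c;
}

float sum_right_first(float a, float b, float c)
{
  volatile float t = b + c;   /* (b + c), rounded to float */
  return a + t;
}
```

With `a = 1e8f`, `b = -1e8f`, `c = 1.0f`, the first order gives 1.0f and the second gives 0.0f: adjacent floats near 1e8 are 8 apart, so `-1e8f + 1.0f` rounds back to `-1e8f` before `a` is added.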
--- a/jcapistd.c
+++ b/jcapistd.c
@@ -46,7 +46,7 @@ jpeg_start_compress (j_compress_ptr cinfo, boolean write_all_tables)
    jpeg_suppress_tables(cinfo, FALSE);	/* mark all tables to be written */
  /* setting up scan optimisation pattern failed, disable scan optimisation */
-  if (cinfo->num_scans_luma == 0)
+  if (cinfo->num_scans_luma == 0 || cinfo->scan_info == NULL || cinfo->num_scans == 0)
    cinfo->optimize_scans = FALSE;
  /* (Re)initialize error mgr and destination modules */
--- a/jcdctmgr.c
+++ b/jcdctmgr.c
@@ -543,6 +543,8 @@ forward_DCT_float (j_compress_ptr cinfo, jpeg_component_info * compptr,
  FAST_FLOAT * divisors = fdct->float_divisors[compptr->quant_tbl_no];
  FAST_FLOAT * workspace;
  JDIMENSION bi;
+  float v;
+  int x;
  /* Make sure the compiler doesn't look up these every pass */
@@ -572,10 +574,10 @@ forward_DCT_float (j_compress_ptr cinfo, jpeg_component_info * compptr,
      };
      for (i = 0; i < DCTSIZE2; i++) {
-        float v = workspace[i];
+        v = workspace[i];
        v /= aanscalefactor[i%8];
        v /= aanscalefactor[i/8];
-        int x = (v >= 0.0) ? (int)(v + 0.5) : (int)(v - 0.5);
+        x = (v >= 0.0) ? (int)(v + 0.5) : (int)(v - 0.5);
        dst[bi][i] = x;
      }
    }
@@ -588,9 +590,7 @@ forward_DCT_float (j_compress_ptr cinfo, jpeg_component_info * compptr,
 #endif /* DCT_FLOAT_SUPPORTED */
 #include "jchuff.h"
-
+#include "jpeg_nbits_table.h"
-static unsigned char jpeg_nbits_table[65536];
-static int jpeg_nbits_table_init = 0;
 static const float jpeg_lambda_weights_flat[64] = {
  1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f,
@@ -637,6 +637,10 @@ quantize_trellis(j_compress_ptr cinfo, c_derived_tbl *actbl, JBLOCKROW coef_bloc
  int has_eob;
  float cost_all_zeros;
  float best_cost_skip;
+  float cost;
+  int zero_run;
+  int run_bits;
+  int rate;
  Ss = cinfo->Ss;
  Se = cinfo->Se;
@@ -654,15 +658,6 @@ quantize_trellis(j_compress_ptr cinfo, c_derived_tbl *actbl, JBLOCKROW coef_bloc
    requires_eob[0] = 0;
  }
-  if(!jpeg_nbits_table_init) {
-    for(i = 0; i < 65536; i++) {
-      int nbits = 0, temp = i;
-      while (temp) {temp >>= 1;  nbits++;}
-      jpeg_nbits_table[i] = nbits;
-    }
-    jpeg_nbits_table_init = 1;
-  }
  norm = 0.0;
  for (i = 1; i < DCTSIZE2; i++) {
    norm += qtbl->quantval[i] * qtbl->quantval[i];
@@ -725,11 +720,11 @@ quantize_trellis(j_compress_ptr cinfo, c_derived_tbl *actbl, JBLOCKROW coef_bloc
        if (j != Ss-1 && coef_blocks[bi][zz] == 0)
          continue;
-        int zero_run = i - 1 - j;
+        zero_run = i - 1 - j;
        if ((zero_run >> 4) && actbl->ehufsi[0xf0] == 0)
          continue;
-        int run_bits = (zero_run >> 4) * actbl->ehufsi[0xf0];
+        run_bits = (zero_run >> 4) * actbl->ehufsi[0xf0];
        zero_run &= 15;
        for (k = 0; k < num_candidates; k++) {
@@ -737,8 +732,8 @@ quantize_trellis(j_compress_ptr cinfo, c_derived_tbl *actbl, JBLOCKROW coef_bloc
          if (coef_bits == 0)
            continue;
-          int rate = coef_bits + candidate_bits[k] + run_bits;
+          rate = coef_bits + candidate_bits[k] + run_bits;
-          float cost = rate + candidate_dist[k];
+          cost = rate + candidate_dist[k];
          cost += accumulated_zero_dist[i-1] - accumulated_zero_dist[j] + accumulated_cost[j];
          if (cost < accumulated_cost[i]) {
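In the trellis loop, a run of more than 15 zeros cannot be packed into a single AC symbol, so each full group of 16 zeros costs one ZRL symbol (`ehufsi[0xf0]` bits), and only the remaining run (`zero_run &= 15`) is combined with the coefficient. The sketch below isolates that rate arithmetic under the usual baseline JPEG AC coding rules. `ac_coef_rate()` is a hypothetical helper; it assumes that the elided `coef_bits` value is the Huffman code length of symbol `(run << 4) | size` and that `candidate_bits[k]` is the candidate's magnitude bit count.

```c
/* Hypothetical helper: bit cost of one nonzero AC coefficient whose
   magnitude category is `size` (1..10), preceded by `zero_run` zeros.
   ehufsi[] holds the Huffman code length of each symbol; 0 means the
   symbol has no code.  Returns -1 when a required symbol cannot be coded. */
int ac_coef_rate(const unsigned char ehufsi[256], int zero_run, int size)
{
  int zrl_count = zero_run >> 4;  /* full runs of 16 zeros -> ZRL (0xF0) */
  int symbol;

  if (zrl_count && ehufsi[0xF0] == 0)
    return -1;                    /* ZRL needed but has no code */
  zero_run &= 15;                 /* run left over after the ZRLs */
  symbol = (zero_run << 4) | size;
  if (ehufsi[symbol] == 0)
    return -1;                    /* (run, size) symbol has no code */
  /* ZRL codes + Huffman code for (run, size) + `size` magnitude bits */
  return zrl_count * ehufsi[0xF0] + ehufsi[symbol] + size;
}
```

For example, with an 11-bit ZRL code and a 7-bit code for symbol 0x23, a coefficient of size 3 after 18 zeros costs 11 + 7 + 3 = 21 bits.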
--- a/jchuff.c
+++ b/jchuff.c
@@ -22,8 +22,7 @@
 #include "jchuff.h"		/* Declarations shared with jcphuff.c */
 #include <limits.h>
-static unsigned char jpeg_nbits_table[65536];
+#include "jpeg_nbits_table.h"
-static int jpeg_nbits_table_init = 0;
 #ifndef min
 #define min(a,b) ((a)<(b)?(a):(b))
@@ -271,15 +270,6 @@ jpeg_make_c_derived_tbl (j_compress_ptr cinfo, boolean isDC, int tblno,
    dtbl->ehufco[i] = huffcode[p];
    dtbl->ehufsi[i] = huffsize[p];
  }
-  if(!jpeg_nbits_table_init) {
-    for(i = 0; i < 65536; i++) {
-      int nbits = 0, temp = i;
-      while (temp) {temp >>= 1;  nbits++;}
-      jpeg_nbits_table[i] = nbits;
-    }
-    jpeg_nbits_table_init = 1;
-  }
 }
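jchuff.c and the trellis code in jcdctmgr.c previously filled a 64 KB `jpeg_nbits_table` at run time on first use. The diff removes that initialization and includes `jpeg_nbits_table.h` instead, presumably a precomputed table (its contents are not shown here). Each entry is the number of significant bits in its index, which is what the removed loop computed. The sketch below reproduces that computation and prints a table in one plausible layout. The 16-values-per-line format and the `static const` declaration are assumptions, not a copy of the real header.

```c
#include <stdio.h>

/* Number of significant bits in v (0 for v == 0), computed the same way
   as the initialization loop removed from jchuff.c and jcdctmgr.c. */
unsigned char nbits_of(unsigned int v)
{
  unsigned char n = 0;

  while (v) {
    v >>= 1;
    n++;
  }
  return n;
}

/* Emit a 65536-entry C table, 16 values per line (layout assumed). */
void write_nbits_table(FILE *out)
{
  unsigned int i;

  fprintf(out, "static const unsigned char jpeg_nbits_table[65536] = {\n");
  for (i = 0; i < 65536; i++) {
    const char *sep = (i == 65535) ? "\n" : (i % 16 == 15) ? ",\n" : ", ";
    fprintf(out, "%2u%s", (unsigned int) nbits_of(i), sep);
  }
  fprintf(out, "};\n");
}
```

Calling `write_nbits_table(stdout)` from a small generator program would produce a header of roughly 4100 lines, which is in line with the size of the added file in this diff.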
--- a/jcmaster.c
+++ b/jcmaster.c
@@ -933,11 +933,10 @@ jinit_c_master_control (j_compress_ptr cinfo, boolean transcode_only)
  else
    master->total_passes = cinfo->num_scans;
+  master->pass_number_scan_opt_base = 0;
  if (cinfo->trellis_quant) {
-    if (cinfo->progressive_mode)
-      master->total_passes += ((cinfo->use_scans_in_trellis) ? 4 : 2) * cinfo->num_components * cinfo->trellis_num_loops;
-    else
-      master->total_passes += 1;
+    master->pass_number_scan_opt_base = ((cinfo->use_scans_in_trellis) ? 4 : 2) * cinfo->num_components * cinfo->trellis_num_loops;
+    master->total_passes += master->pass_number_scan_opt_base;
  }
  if (cinfo->optimize_scans) {
@@ -947,9 +946,4 @@ jinit_c_master_control (j_compress_ptr cinfo, boolean transcode_only)
    for (i = 0; i < cinfo->num_scans; i++)
      master->scan_buffer[i] = NULL;
  }
-  if (cinfo->trellis_quant)
-    master->pass_number_scan_opt_base = ((cinfo->use_scans_in_trellis) ? 4 : 2) * cinfo->num_components * cinfo->trellis_num_loops;
-  else
-    master->pass_number_scan_opt_base = 0;
 }
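The jcmaster.c change computes the number of trellis passes once, stores it in `master->pass_number_scan_opt_base`, and adds it to `master->total_passes`. The arithmetic can be checked in isolation with the sketch below. `struct trellis_settings` and `trellis_pass_count()` are hypothetical stand-ins for the few `cinfo` fields the expression reads.

```c
/* Hypothetical stand-in for the cinfo fields used by the expression. */
struct trellis_settings {
  int trellis_quant;          /* nonzero: trellis quantization enabled */
  int use_scans_in_trellis;   /* nonzero: 4 passes per component/loop */
  int num_components;
  int trellis_num_loops;
};

/* Mirrors ((use_scans_in_trellis) ? 4 : 2) * num_components *
   trellis_num_loops from the diff; returns 0 when trellis quantization
   is off, matching the pass_number_scan_opt_base = 0 default. */
int trellis_pass_count(const struct trellis_settings *s)
{
  if (!s->trellis_quant)
    return 0;
  return (s->use_scans_in_trellis ? 4 : 2) * s->num_components *
         s->trellis_num_loops;
}
```

For a three-component image with one trellis loop, this gives 4 × 3 × 1 = 12 passes when `use_scans_in_trellis` is set and 6 when it is not.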
--- a/jidctflt.c
+++ b/jidctflt.c
@@ -1,9 +1,12 @@
 /*
 * jidctflt.c
 *
+ * This file was part of the Independent JPEG Group's software:
 * Copyright (C) 1994-1998, Thomas G. Lane.
- * This file is part of the Independent JPEG Group's software.
- * For conditions of distribution and use, see the accompanying README file.
+ * Modified 2010 by Guido Vollbeding.
+ * libjpeg-turbo Modifications:
+ * Copyright (C) 2014, D. R. Commander.
+ * For conditions of distribution and use, see the accompanying README file.
 *
 * This file contains a floating-point implementation of the
 * inverse DCT (Discrete Cosine Transform).  In the IJG code, this routine
@@ -76,10 +79,10 @@ jpeg_idct_float (j_decompress_ptr cinfo, jpeg_component_info * compptr,
  FLOAT_MULT_TYPE * quantptr;
  FAST_FLOAT * wsptr;
  JSAMPROW outptr;
-  JSAMPLE *range_limit = IDCT_range_limit(cinfo);
+  JSAMPLE *range_limit = cinfo->sample_range_limit;
  int ctr;
  FAST_FLOAT workspace[DCTSIZE2]; /* buffers data between passes */
-  SHIFT_TEMPS
+  #define _0_125 ((FLOAT_MULT_TYPE)0.125)
  /* Pass 1: process columns from input, store into work array. */
@@ -101,7 +104,8 @@ jpeg_idct_float (j_decompress_ptr cinfo, jpeg_component_info * compptr,
 	inptr[DCTSIZE*5] == 0 && inptr[DCTSIZE*6] == 0 &&
 	inptr[DCTSIZE*7] == 0) {
      /* AC terms all zero */
-      FAST_FLOAT dcval = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
+      FAST_FLOAT dcval = DEQUANTIZE(inptr[DCTSIZE*0],
+                                    quantptr[DCTSIZE*0] * _0_125);
      wsptr[DCTSIZE*0] = dcval;
      wsptr[DCTSIZE*1] = dcval;
@@ -120,10 +124,10 @@ jpeg_idct_float (j_decompress_ptr cinfo, jpeg_component_info * compptr,
    /* Even part */
-    tmp0 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0]);
+    tmp0 = DEQUANTIZE(inptr[DCTSIZE*0], quantptr[DCTSIZE*0] * _0_125);
-    tmp1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2]);
+    tmp1 = DEQUANTIZE(inptr[DCTSIZE*2], quantptr[DCTSIZE*2] * _0_125);
-    tmp2 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4]);
+    tmp2 = DEQUANTIZE(inptr[DCTSIZE*4], quantptr[DCTSIZE*4] * _0_125);
-    tmp3 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6]);
+    tmp3 = DEQUANTIZE(inptr[DCTSIZE*6], quantptr[DCTSIZE*6] * _0_125);
    tmp10 = tmp0 + tmp2;	/* phase 3 */
    tmp11 = tmp0 - tmp2;
@@ -138,10 +142,10 @@ jpeg_idct_float (j_decompress_ptr cinfo, jpeg_component_info * compptr,
    /* Odd part */
-    tmp4 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1]);
+    tmp4 = DEQUANTIZE(inptr[DCTSIZE*1], quantptr[DCTSIZE*1] * _0_125);
-    tmp5 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3]);
+    tmp5 = DEQUANTIZE(inptr[DCTSIZE*3], quantptr[DCTSIZE*3] * _0_125);
-    tmp6 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5]);
+    tmp6 = DEQUANTIZE(inptr[DCTSIZE*5], quantptr[DCTSIZE*5] * _0_125);
-    tmp7 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7]);
+    tmp7 = DEQUANTIZE(inptr[DCTSIZE*7], quantptr[DCTSIZE*7] * _0_125);
    z13 = tmp6 + tmp5;		/* phase 6 */
    z10 = tmp6 - tmp5;
@@ -152,12 +156,12 @@ jpeg_idct_float (j_decompress_ptr cinfo, jpeg_component_info * compptr,
    tmp11 = (z11 - z13) * ((FAST_FLOAT) 1.414213562); /* 2*c4 */
    z5 = (z10 + z12) * ((FAST_FLOAT) 1.847759065); /* 2*c2 */
-    tmp10 = ((FAST_FLOAT) 1.082392200) * z12 - z5; /* 2*(c2-c6) */
+    tmp10 = z5 - z12 * ((FAST_FLOAT) 1.082392200); /* 2*(c2-c6) */
-    tmp12 = ((FAST_FLOAT) -2.613125930) * z10 + z5; /* -2*(c2+c6) */
+    tmp12 = z5 - z10 * ((FAST_FLOAT) 2.613125930); /* 2*(c2+c6) */
    tmp6 = tmp12 - tmp7;	/* phase 2 */
    tmp5 = tmp11 - tmp6;
-    tmp4 = tmp10 + tmp5;
+    tmp4 = tmp10 - tmp5;
    wsptr[DCTSIZE*0] = tmp0 + tmp7;
    wsptr[DCTSIZE*7] = tmp0 - tmp7;
@@ -165,8 +169,8 @@ jpeg_idct_float (j_decompress_ptr cinfo, jpeg_component_info * compptr,
    wsptr[DCTSIZE*6] = tmp1 - tmp6;
    wsptr[DCTSIZE*2] = tmp2 + tmp5;
    wsptr[DCTSIZE*5] = tmp2 - tmp5;
-    wsptr[DCTSIZE*4] = tmp3 + tmp4;
+    wsptr[DCTSIZE*3] = tmp3 + tmp4;
-    wsptr[DCTSIZE*3] = tmp3 - tmp4;
+    wsptr[DCTSIZE*4] = tmp3 - tmp4;
    inptr++;			/* advance pointers to next column */
    quantptr++;
@@ -174,7 +178,6 @@ jpeg_idct_float (j_decompress_ptr cinfo, jpeg_component_info * compptr,
  }
  /* Pass 2: process rows from work array, store into output array. */
-  /* Note that we must descale the results by a factor of 8 == 2**3. */
  wsptr = workspace;
  for (ctr = 0; ctr < DCTSIZE; ctr++) {
@@ -187,8 +190,10 @@ jpeg_idct_float (j_decompress_ptr cinfo, jpeg_component_info * compptr,
    /* Even part */
-    tmp10 = wsptr[0] + wsptr[4];
-    tmp11 = wsptr[0] - wsptr[4];
+    /* Apply signed->unsigned and prepare float->int conversion */
+    z5 = wsptr[0] + ((FAST_FLOAT) CENTERJSAMPLE + (FAST_FLOAT) 0.5);
+    tmp10 = z5 + wsptr[4];
+    tmp11 = z5 - wsptr[4];
    tmp13 = wsptr[2] + wsptr[6];
    tmp12 = (wsptr[2] - wsptr[6]) * ((FAST_FLOAT) 1.414213562) - tmp13;
@@ -209,31 +214,23 @@ jpeg_idct_float (j_decompress_ptr cinfo, jpeg_component_info * compptr,
    tmp11 = (z11 - z13) * ((FAST_FLOAT) 1.414213562);
    z5 = (z10 + z12) * ((FAST_FLOAT) 1.847759065); /* 2*c2 */
-    tmp10 = ((FAST_FLOAT) 1.082392200) * z12 - z5; /* 2*(c2-c6) */
+    tmp10 = z5 - z12 * ((FAST_FLOAT) 1.082392200); /* 2*(c2-c6) */
-    tmp12 = ((FAST_FLOAT) -2.613125930) * z10 + z5; /* -2*(c2+c6) */
+    tmp12 = z5 - z10 * ((FAST_FLOAT) 2.613125930); /* 2*(c2+c6) */
    tmp6 = tmp12 - tmp7;
    tmp5 = tmp11 - tmp6;
-    tmp4 = tmp10 + tmp5;
+    tmp4 = tmp10 - tmp5;
-    /* Final output stage: scale down by a factor of 8 and range-limit */
-    outptr[0] = range_limit[(int) DESCALE((INT32) (tmp0 + tmp7), 3)
-			    & RANGE_MASK];
-    outptr[7] = range_limit[(int) DESCALE((INT32) (tmp0 - tmp7), 3)
-			    & RANGE_MASK];
-    outptr[1] = range_limit[(int) DESCALE((INT32) (tmp1 + tmp6), 3)
-			    & RANGE_MASK];
-    outptr[6] = range_limit[(int) DESCALE((INT32) (tmp1 - tmp6), 3)
-			    & RANGE_MASK];
-    outptr[2] = range_limit[(int) DESCALE((INT32) (tmp2 + tmp5), 3)
-			    & RANGE_MASK];
-    outptr[5] = range_limit[(int) DESCALE((INT32) (tmp2 - tmp5), 3)
-			    & RANGE_MASK];
-    outptr[4] = range_limit[(int) DESCALE((INT32) (tmp3 + tmp4), 3)
-			    & RANGE_MASK];
-    outptr[3] = range_limit[(int) DESCALE((INT32) (tmp3 - tmp4), 3)
-			    & RANGE_MASK];
+    /* Final output stage: float->int conversion and range-limit */
+    outptr[0] = range_limit[((int) (tmp0 + tmp7)) & RANGE_MASK];
+    outptr[7] = range_limit[((int) (tmp0 - tmp7)) & RANGE_MASK];
+    outptr[1] = range_limit[((int) (tmp1 + tmp6)) & RANGE_MASK];
+    outptr[6] = range_limit[((int) (tmp1 - tmp6)) & RANGE_MASK];
+    outptr[2] = range_limit[((int) (tmp2 + tmp5)) & RANGE_MASK];
+    outptr[5] = range_limit[((int) (tmp2 - tmp5)) & RANGE_MASK];
+    outptr[3] = range_limit[((int) (tmp3 + tmp4)) & RANGE_MASK];
+    outptr[4] = range_limit[((int) (tmp3 - tmp4)) & RANGE_MASK];
    wsptr += DCTSIZE;		/* advance pointer to next row */
  }
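Two things change in the float IDCT. First, the odd part is rewritten in the libjpeg v8a form: `tmp10`, and therefore `tmp4`, flips sign, which is why the stores to `wsptr[DCTSIZE*3]` and `wsptr[DCTSIZE*4]` swap. Second, the 1/8 descale moves into dequantization (`quantptr[...] * _0_125`), and the `CENTERJSAMPLE + 0.5` bias added in pass 2 lets a plain `(int)` cast round to nearest. The sketch below checks both ideas outside libjpeg. `odd_part_old()` and `odd_part_new()` take the intermediate values named in the diff as inputs. `float_to_sample()` assumes 8-bit samples (CENTERJSAMPLE = 128) and uses a clamp in place of the `range_limit[] & RANGE_MASK` lookup. All three function names are hypothetical.

```c
/* Odd part as removed by the diff (libjpeg v6b form).  Writes the values
   stored to wsptr[DCTSIZE*3] and wsptr[DCTSIZE*4] into out[0], out[1]. */
void odd_part_old(float z10, float z12, float tmp7, float tmp11,
                  float tmp3, float out[2])
{
  float z5 = (z10 + z12) * 1.847759065f;      /* 2*c2 */
  float tmp10 = 1.082392200f * z12 - z5;      /* 2*(c2-c6) */
  float tmp12 = -2.613125930f * z10 + z5;     /* -2*(c2+c6) */
  float tmp6 = tmp12 - tmp7;
  float tmp5 = tmp11 - tmp6;
  float tmp4 = tmp10 + tmp5;

  out[0] = tmp3 - tmp4;                       /* old wsptr[DCTSIZE*3] */
  out[1] = tmp3 + tmp4;                       /* old wsptr[DCTSIZE*4] */
}

/* Odd part as added by the diff (libjpeg v8a form): tmp10 and tmp4 have
   the opposite sign, so the two stores trade places. */
void odd_part_new(float z10, float z12, float tmp7, float tmp11,
                  float tmp3, float out[2])
{
  float z5 = (z10 + z12) * 1.847759065f;      /* 2*c2 */
  float tmp10 = z5 - z12 * 1.082392200f;      /* 2*(c2-c6) */
  float tmp12 = z5 - z10 * 2.613125930f;      /* 2*(c2+c6) */
  float tmp6 = tmp12 - tmp7;
  float tmp5 = tmp11 - tmp6;
  float tmp4 = tmp10 - tmp5;

  out[0] = tmp3 + tmp4;                       /* new wsptr[DCTSIZE*3] */
  out[1] = tmp3 - tmp4;                       /* new wsptr[DCTSIZE*4] */
}

/* New output stage for an already-descaled value v: add CENTERJSAMPLE
   (assumed 128) plus 0.5, truncate, then clamp.  The truncating cast
   rounds to nearest whenever v + 128.5 >= 0; below that, the clamp
   yields 0, which is the correctly rounded and clamped result anyway. */
int float_to_sample(float v)
{
  int x = (int) (v + (128.0f + 0.5f));

  if (x < 0)
    x = 0;
  else if (x > 255)
    x = 255;
  return x;
}
```

Because the IDCT is linear, scaling the dequantized inputs by 1/8 gives the same result as scaling the outputs by 1/8. Folding the factor into pass 1 therefore removes the integer `DESCALE` step without changing what is being computed.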
--- a/jpeg_nbits_table.h
+++ b/jpeg_nbits_table.h
--- a/libjpeg.txt
+++ b/libjpeg.txt
@@ -3,7 +3,7 @@ USING THE IJG JPEG LIBRARY
 This file was part of the Independent JPEG Group's software:
 Copyright (C) 1994-2011, Thomas G. Lane, Guido Vollbeding.
 Modifications:
-Copyright (C) 2010, D. R. Commander.
+Copyright (C) 2010, 2014, D. R. Commander.
 For conditions of distribution and use, see the accompanying README file.
@@ -886,14 +886,23 @@ J_DCT_METHOD dct_method
 		JDCT_FLOAT: floating-point method
 		JDCT_DEFAULT: default method (normally JDCT_ISLOW)
 		JDCT_FASTEST: fastest method (normally JDCT_IFAST)
-	The FLOAT method is very slightly more accurate than the ISLOW method,
-	but may give different results on different machines due to varying
-	roundoff behavior.  The integer methods should give the same results
-	on all machines.  On machines with sufficiently fast FP hardware, the
-	floating-point method may also be the fastest.  The IFAST method is
-	considerably less accurate than the other two; its use is not
-	recommended if high quality is a concern.  JDCT_DEFAULT and
-	JDCT_FASTEST are macros configurable by each installation.
+        In libjpeg-turbo, JDCT_IFAST is generally about 5-15% faster than
+        JDCT_ISLOW when using the x86/x86-64 SIMD extensions (results may vary
+        with other SIMD implementations, or when using libjpeg-turbo without
+        SIMD extensions.)  For quality levels of 90 and below, there should be
+        little or no perceptible difference between the two algorithms.  For
+        quality levels above 90, however, the difference between JDCT_IFAST and
+        JDCT_ISLOW becomes more pronounced.  With quality=97, for instance,
+        JDCT_IFAST incurs generally about a 1-3 dB loss (in PSNR) relative to
+        JDCT_ISLOW, but this can be larger for some images.  Do not use
+        JDCT_IFAST with quality levels above 97.  The algorithm often
+        degenerates at quality=98 and above and can actually produce a more
+        lossy image than if lower quality levels had been used.  JDCT_FLOAT is
+        mostly a legacy feature.  It does not produce significantly more
+        accurate results than the ISLOW method, and it is much slower.  The
+        FLOAT method may also give different results on different machines due
+        to varying roundoff behavior, whereas the integer methods should give
+        the same results on all machines.
 J_COLOR_SPACE jpeg_color_space
 int num_components
@@ -1170,8 +1179,32 @@ int actual_number_of_colors
 Additional decompression parameters that the application may set include:
 J_DCT_METHOD dct_method
-	Selects the algorithm used for the DCT step.  Choices are the same
-	as described above for compression.
+        Selects the algorithm used for the DCT step.  Choices are:
+                JDCT_ISLOW: slow but accurate integer algorithm
+                JDCT_IFAST: faster, less accurate integer method
+                JDCT_FLOAT: floating-point method
+                JDCT_DEFAULT: default method (normally JDCT_ISLOW)
+                JDCT_FASTEST: fastest method (normally JDCT_IFAST)
+        In libjpeg-turbo, JDCT_IFAST is generally about 5-15% faster than
+        JDCT_ISLOW when using the x86/x86-64 SIMD extensions (results may vary
+        with other SIMD implementations, or when using libjpeg-turbo without
+        SIMD extensions.)  If the JPEG image was compressed using a quality
+        level of 85 or below, then there should be little or no perceptible
+        difference between the two algorithms.  When decompressing images that
+        were compressed using quality levels above 85, however, the difference
+        between JDCT_IFAST and JDCT_ISLOW becomes more pronounced.  With images
+        compressed using quality=97, for instance, JDCT_IFAST incurs generally
+        about a 4-6 dB loss (in PSNR) relative to JDCT_ISLOW, but this can be
+        larger for some images.  If you can avoid it, do not use JDCT_IFAST
+        when decompressing images that were compressed using quality levels
+        above 97.  The algorithm often degenerates for such images and can
+        actually produce a more lossy output image than if the JPEG image had
+        been compressed using lower quality levels.  JDCT_FLOAT is mostly a
+        legacy feature.  It does not produce significantly more accurate
+        results than the ISLOW method, and it is much slower.  The FLOAT method
+        may also give different results on different machines due to varying
+        roundoff behavior, whereas the integer methods should give the same
+        results on all machines.
 boolean do_fancy_upsampling
 	If TRUE, do careful upsampling of chroma components.  If FALSE,
--- a/usage.txt
+++ b/usage.txt
@@ -172,13 +172,28 @@ Switches for advanced users:
 	-dct int	Use integer DCT method (default).
 	-dct fast	Use fast integer DCT (less accurate).
 	-dct float	Use floating-point DCT method.
-			The float method is very slightly more accurate than
-			the int method, but is much slower unless your machine
-			has very fast floating-point hardware.  Also note that
-			results of the floating-point method may vary slightly
-			across machines, while the integer methods should give
-			the same results everywhere.  The fast integer method
-			is much less accurate than the other two.
+                        In libjpeg-turbo, the fast method is generally about
+                        5-15% faster than the int method when using the
+                        x86/x86-64 SIMD extensions (results may vary with other
+                        SIMD implementations, or when using libjpeg-turbo
+                        without SIMD extensions.)  For quality levels of 90 and
+                        below, there should be little or no perceptible
+                        difference between the two algorithms.  For quality
+                        levels above 90, however, the difference between
+                        the fast and the int methods becomes more pronounced.
+                        With quality=97, for instance, the fast method incurs
+                        generally about a 1-3 dB loss (in PSNR) relative to
+                        the int method, but this can be larger for some images.
+                        Do not use the fast method with quality levels above
+                        97.  The algorithm often degenerates at quality=98 and
+                        above and can actually produce a more lossy image than
+                        if lower quality levels had been used.  The float
+                        method is mostly a legacy feature.  It does not produce
+                        significantly more accurate results than the int
+                        method, and it is much slower.  The float method may
+                        also give different results on different machines due
+                        to varying roundoff behavior, whereas the integer
+                        methods should give the same results on all machines.
 	-restart N	Emit a JPEG restart marker every N MCU rows, or every
 			N MCU blocks if "B" is attached to the number.
@@ -296,13 +311,32 @@ Switches for advanced users:
 	-dct int	Use integer DCT method (default).
 	-dct fast	Use fast integer DCT (less accurate).
 	-dct float	Use floating-point DCT method.
-			The float method is very slightly more accurate than
-			the int method, but is much slower unless your machine
-			has very fast floating-point hardware.  Also note that
-			results of the floating-point method may vary slightly
-			across machines, while the integer methods should give
-			the same results everywhere.  The fast integer method
-			is much less accurate than the other two.
+                        In libjpeg-turbo, the fast method is generally about
+                        5-15% faster than the int method when using the
+                        x86/x86-64 SIMD extensions (results may vary with other
+                        SIMD implementations, or when using libjpeg-turbo
+                        without SIMD extensions.)  If the JPEG image was
+                        compressed using a quality level of 85 or below, then
+                        there should be little or no perceptible difference
+                        between the two algorithms.  When decompressing images
+                        that were compressed using quality levels above 85,
+                        however, the difference between the fast and int
+                        methods becomes more pronounced.  With images
+                        compressed using quality=97, for instance, the fast
+                        method incurs generally about a 4-6 dB loss (in PSNR)
+                        relative to the int method, but this can be larger for
+                        some images.  If you can avoid it, do not use the fast
+                        method when decompressing images that were compressed
+                        using quality levels above 97.  The algorithm often
+                        degenerates for such images and can actually produce
+                        a more lossy output image than if the JPEG image had
+                        been compressed using lower quality levels.  The float
+                        method is mostly a legacy feature.  It does not produce
+                        significantly more accurate results than the int
+                        method, and it is much slower.  The float method may
+                        also give different results on different machines due
+                        to varying roundoff behavior, whereas the integer
+                        methods should give the same results on all machines.
 	-dither fs	Use Floyd-Steinberg dithering in color quantization.
 	-dither ordered	Use ordered dithering in color quantization.
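The quality guidance in the two -dct entries above can be summarized as a small decision helper. This is a hypothetical sketch, not part of cjpeg, djpeg, or libjpeg-turbo: the enum and function names are made up. For the djpeg case it assumes the caller already knows or has estimated the quality level the image was compressed with, since a JPEG file stores quantization tables rather than the quality setting itself.

```c
#include <stdbool.h>

/* Hypothetical names standing for the "-dct int" and "-dct fast" switches. */
enum dct_switch { DCT_SWITCH_INT, DCT_SWITCH_FAST };

/* Suggest a -dct switch following the usage.txt guidance:
 *  - cjpeg: little or no perceptible difference at quality 90 and below.
 *  - djpeg: little or no perceptible difference if the image was
 *    compressed at quality 85 or below.
 * Above those levels (and especially above 97, where the fast method
 * often degenerates) the accurate integer method is suggested. */
enum dct_switch suggest_dct_switch(int quality, bool decompressing)
{
    int threshold = decompressing ? 85 : 90;
    return quality <= threshold ? DCT_SWITCH_FAST : DCT_SWITCH_INT;
}
```

The only reason to prefer -dct fast is its speed (about 5-15% with the x86/x86-64 SIMD extensions, possibly different elsewhere), so a caller that does not need that gain can simply keep the default, -dct int.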
@@ -381,12 +415,6 @@ When producing a color-quantized image, "-onepass -dither ordered" is fast but
 much lower quality than the default behavior.  "-dither none" may give
 acceptable results in two-pass mode, but is seldom tolerable in one-pass mode.
-If you are fortunate enough to have very fast floating point hardware,
-"-dct float" may be even faster than "-dct fast".  But on most machines
-"-dct float" is slower than "-dct int"; in this case it is not worth using,
-because its theoretical accuracy advantage is too small to be significant
-in practice.
 Two-pass color quantization requires a good deal of memory; on MS-DOS machines
 it may run out of memory even with -maxmemory 0.  In that case you can still
 decompress, with some loss of image quality, by specifying -onepass for
Author	SHA1	Message	Date
Josh Aas	594b7258cc	Bump version to 2.1.	2014-07-29 09:12:48 -05:00
fbossen	0533b31891	Merge pull request #79 from arjun024/devel more logical flow of control if trellis_quant enabled	2014-07-25 09:02:45 -04:00
Arjun Sreedharan	cca53c920d	more logical flow of control if trellis_quant enabled if trellis_quant is enabled, increment total number of passes by optimization beginning pass number. Signed-off-by: Arjun Sreedharan <arjun024@gmail.com>	2014-07-25 17:33:35 +05:30
Josh Aas	5901802871	Update README.md with link to 2.0 announcement and mailing list.	2014-07-24 16:39:26 -05:00
Frank Bossen	fbef31f76d	Add option to disable progressive coding in cjpeg Redefine baseline option in cjpeg to actually create a baseline JPEG file by disabling progressive coding	2014-07-24 17:09:27 -04:00
Frank Bossen	1aa50b71d9	Use precomputed table From jpeglib-turbo r1221: Integrate a slightly modified version of Mozilla's patch for precomputing the bit-counting LUT. This is useful if the table needs to be shared among multiple processes, although the primary reason for doing that is reduced footprint on mobile devices, which are probably already covered by the clz intrinsic code.	2014-07-24 10:50:59 -04:00
Frank Bossen	3adc64a4cb	Improve floating point DCT From libjpeg-turbo r1288 Port the more accurate (and slightly faster) floating point IDCT implementation from jpeg-8a and later. New research revealed that the SSE/SSE2 floating point IDCT implementation was actually more accurate than the jpeg-6b implementation, not less, which is why its mathematical results have always differed from those of the jpeg-6b implementation. This patch brings the accuracy of the C code in line with that of the SSE/SSE2 code.	2014-07-23 10:26:46 -04:00
Frank Bossen	ccb1d12f53	Update doc re: various DCT implementations From libjpeg-turbo r1287	2014-07-23 10:23:24 -04:00
Frank Bossen	c73a82c6aa	Fix build for wrjpgcom See r1325 in libjpeg-turbo	2014-07-23 09:22:55 -04:00
Frank Bossen	b9f25333f6	Disable scan optimization if no scan given Addresses segmentation fault issue in #69	2014-07-21 20:11:46 +02:00
Josh Aas	0f6b96c68e	Merge pull request #73 from pornel/master Added few error messages in cjpeg	2014-07-21 11:23:20 -05:00
Josh Aas	eeea9ed397	Merge pull request #72 from jlongman/master Bugfix: AM_PROG_AR is not recognized by older automake, so only use it w...	2014-07-21 11:21:30 -05:00
Josh Aas	b95528727f	Merge pull request #75 from pmed/master Fixed mozjpeg build with Visual C++ 2010	2014-07-21 11:18:33 -05:00
Pavel Medvedev	d9605b0560	Fixed mozjpeg build with Visual C++ 2010 Moved several variable declarations out of inner scopes to the function scope to compile C code with Visual C++ 2010	2014-07-21 15:32:54 +04:00
Kornel Lesiński	40e6e8b2a2	Added few error messages in cjpeg	2014-07-20 16:11:59 +01:00
Frank Bossen	514307e9e6	Fix trellis for nonprogressive mode (#69 ) Correct number of trellis passes when in nonprogressive mode	2014-07-18 18:10:56 +02:00
jlongman	7388a54647	Bugfix: AM_PROG_AR is not recognized by older automake, so only use it when defined	2014-07-17 14:16:24 -04:00
Josh Aas	e7a135b930	Bump version number for 2.0, make this version 2.0.1.	2014-07-15 13:55:55 -05:00