From libjpeg-turbo r1288
Port the more accurate (and slightly faster) floating point IDCT
implementation from jpeg-8a and later. New research revealed that the
SSE/SSE2 floating point IDCT implementation was actually more accurate
than the jpeg-6b implementation, not less, which is why its
mathematical results have always differed from those of the jpeg-6b
implementation. This patch brings the accuracy of the C code in line
with that of the SSE/SSE2 code.