mozjpeg

Author	SHA1	Message	Date
DRC	e8aa5fa934	Add JSIMD_NOHUFFENC environment variable for ARM Useful in regression/performance testing	2016-01-15 13:15:54 -06:00
DRC	ec6941f7bc	Complete the ARM64 NEON SIMD implementation This adds 64-bit NEON coverage for all of the algorithms that are covered by the 32-bit NEON implementation, except for h2v1 (4:2:2) fancy upsampling (used when decompressing 4:2:2 JPEG images.) It also adds 64-bit NEON SIMD coverage for: * slow integer forward DCT (compressor) * h2v2 (4:2:0) downsampling (compressor) * h2v1 (4:2:2) downsampling (compressor) which are not covered in the 32-bit implementation. Compression speedups relative to libjpeg-turbo 1.4.2: Apple A7 (iPhone 5S), iOS, 64-bit: 113-150% (reported) 48-core ThunderX (RunAbove ARM Cloud), Linux, 64-bit: 2.1-33% (avg. 15%) Refer to #44 and #49 for discussion This commit also removes the unnecessary if (simd_support & JSIMD_ARM_NEON) statements from the jsimd* algorithm functions. Since the jsimd_can*() functions check for the existence of NEON, the corresponding algorithm functions will never be called if NEON isn't available. Based on: `dcd9d84f10` `b0d87b811f` `70cd5c8a49` `3e58d9a064` `837b19542f` `73dc43ccc8` `a82b71a261` `c1b1188c21` `305c89284e` `7f443f9995` `4c2b53b77d` Unified version with fixes: `1004a3cd05`	2016-01-15 11:21:48 -06:00
DRC	499c470b63	ARM32 NEON SIMD implementation of Huffman encoding Full-color compression speedups relative to libjpeg-turbo 1.4.2: 800 MHz ARM Cortex-A9, iOS, 32-bit: 26-44% (avg. 32%) Refer to #42 and #47 for discussion. This commit also removes the unnecessary if (simd_support & JSIMD_ARM_NEON) statements from the jsimd* algorithm functions. Since the jsimd_can*() functions check for the existence of NEON, the corresponding algorithm functions will never be called if NEON isn't available. Removing those if statements improved performance across the board by a couple of percent. Based on: `fc023c880c`	2016-01-13 23:38:35 -06:00
DRC	f3a8684cd1	SSE2 SIMD implementation of Huffman encoding Full-color compression speedups relative to libjpeg-turbo 1.4.2: 2.8 GHz Intel Xeon W3530, Linux, 64-bit: 2.2-18% (avg. 9.5%) 2.8 GHz Intel Xeon W3530, Linux, 32-bit: 10-25% (avg. 17%) 2.3 GHz AMD A10-4600M APU, Linux, 64-bit: 4.9-17% (avg. 11%) 2.3 GHz AMD A10-4600M APU, Linux, 32-bit: 8.8-19% (avg. 15%) 3.0 GHz Intel Core i7, OS X, 64-bit: 3.5-16% (avg. 10%) 3.0 GHz Intel Core i7, OS X, 32-bit: 4.8-14% (avg. 11%) 2.6 GHz AMD Athlon 64 X2 5050e: Performance-neutral (give or take a few percent) Full-color compression speedups relative to IPP: 2.8 GHz Intel Xeon W3530, Linux, 64-bit: 4.8-34% (avg. 19%) 2.8 GHz Intel Xeon W3530, Linux, 32-bit: -19%-7.0% (avg. -7.0%) Refer to #42 for discussion. Numerous other approaches were attempted, but this one proved to be the most performant across all platforms. This commit also fixes #3 (works around, really-- the clang-compiled version of jchuff.c still performs 20% worse than its GCC-compiled counterpart, but that code is now bypassed by the new SSE2 Huffman algorithm.) Based on: `2cb4d41330` `36c94e050d`	2016-01-12 03:03:49 -06:00
DRC	368cd52d38	Allow JSIMD_FORCENONE=1 env to disable x86-64 SIMD Traditionally, the x86-64 code did not call init_simd() because it had no need to (only SSE2 was supported.) However, having the ability to disable SIMD at run time is a useful testing tool, and all of the other SIMD implementations have this ability.	2016-01-12 00:29:34 -06:00
DRC	fbe5007fbc	Merge branch '1.4.x'	2015-12-19 14:29:46 -06:00
DRC	71e971fb35	Build: Use FILEPATH type for NASM CMake variable This causes cmake-gui to to display the proper file chooser dialog (as opposed to the directory chooser.) Fixes #40	2015-12-19 14:18:21 -06:00
DRC	d70a5c12fc	Remove unnecessary .arch directive in ARM64 code This directive was preventing the code from assembling using the integrated assembler in clang. Fixes #33	2015-12-14 17:02:43 -06:00
DRC	1e32fe3113	Replace INT32 with a new internal datatype (JLONG) These days, INT32 is a commonly-defined datatype in system headers. We cannot eliminate the definition of that datatype from jmorecfg.h, since the INT32 typedef has technically been part of the libjpeg API since version 5 (1994.) However, using INT32 internally is risky, because the inclusion of a particular header (Xmd.h, for instance) could change the definition of INT32 from long to int on 64-bit platforms and thus change the internal behavior of libjpeg-turbo in unexpected ways (for instance, failing to correctly set __INT32_IS_ACTUALLY_LONG to match the INT32 typedef-- perhaps as a result of including the wrong version of jpeglib.h-- could cause libjpeg-turbo to produce incorrect results.) The library has always been built in environments in which INT32 is effectively long (on Windows, long is always 32-bit, so effectively it's the same as int), so it makes sense to turn INT32 into an explicitly long datatype. This ensures that libjpeg-turbo will always behave consistently, regardless of the headers included at compile time. Addresses a concern expressed in #26.	2015-10-14 20:34:32 -05:00
DRC	7e3acc0e0a	Rename README, LICENSE, BUILDING text files The IJG README file has been renamed to README.ijg, in order to avoid confusion (many people were assuming that that was our project's README file and weren't reading README-turbo.txt) and to lay the groundwork for markdown versions of the libjpeg-turbo README and build instructions.	2015-10-10 10:31:33 -05:00
DRC	b961f0bfce	Merge branch '1.4.x'	2015-09-16 23:16:38 -05:00
James Cowgill	54792ba340	Fix MIPS DSPr2 4:2:0 upsample bug w/ small images The DSPr2 code was errantly comparing the residual (t9, width & 0xF) with the end pointer (t4, out + width) instead of the width directly (a1). This would give the wrong results with any image whose output width was less than 16. The other small changes (ulw to lw and removal of the nop) are just some easy optimizations around this code. This issue caused a buffer overrun and subsequent segfault on images whose scaled output height was 1 pixel and whose scaled output width was < 16 pixels. Note that the "plain" (non-fancy and non-merged) upsample routine, which was affected by this bug, is normally not used except when decompressing a non-YCbCr JPEG image, but it is also used when decompressing a single-row image (because the other upsampling algorithms require at least two rows.) Closes #16.	2015-09-16 23:15:22 -05:00
Chandler Carruth	498d9bc92f	Fix x86-64 ABI conformance issue in SIMD code (descriptions cribbed by DRC from discussion in #20) In the x86-64 ABI, the high (unused) DWORD of a 32-bit argument's register is undefined, so it was incorrect to use a 64-bit mov instruction to transfer a JDIMENSION argument in the 64-bit SSE2 SIMD functions. The code worked thus far only because the existing compiler optimizers weren't smart enough to do anything else with the register in question, so the upper 32 bits happened to be all zeroes-- for the past 6 years, on every x86-64 compiler previously known to mankind. The bleeding-edge Clang/LLVM compiler has a smarter optimizer, and under certain circumstances, it will attempt to load-combine adjacent 32-bit integers from one of the libjpeg structures into a single 64-bit integer and pass that 64-bit integer as a 32-bit argument to one of the SIMD functions (which is allowed by the ABI, since the upper 32 bits of the 32-bit argument's register are undefined.) This caused the libjpeg-turbo regression tests to crash. Also enhance the documentation of JDIMENSION to explain that its size is significant to the implementation of the SIMD code. Closes #20. Refer also to http://crbug.com/532214.	2015-09-16 22:35:31 -05:00
James Cowgill	f62dbccf5f	Fix build error when compiling MIPS SIMD w/ -mfpxx When compiled with -mfpxx (which is now the default on Debian), there are some restrictions on the use of odd-numbered FP registers. More details about FPXX can be found here: https://dmz-portal.mips.com/wiki/MIPS_O32_ABI_-_FR0_and_FR1_Interlinking This commit simply changes all uses of FP registers to an even-numbered equivalent like this: f0 -> f0 f1 -> f2 f2 -> f4 ... f8 -> f16 This commit should have no observable effect except that the MIPS assembly will now compile with -mfpxx. Closes #11	2015-08-26 21:30:18 -05:00
DRC	691cd93383	Fix 'make dist' git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1574 632fc199-4ca6-4c93-a231-07263d6284db	2015-06-20 16:36:32 +00:00
DRC	89b5e06df5	Studies show that GCC v5.1.0 performs as well as or better than v4.2, but v4.7.x-v4.9.x do not perform as well as v4.2. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1571 632fc199-4ca6-4c93-a231-07263d6284db	2015-06-20 16:20:53 +00:00
DRC	eea6424596	Typo git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.4.x@1570 632fc199-4ca6-4c93-a231-07263d6284db	2015-06-20 15:51:34 +00:00
DRC	f15ef33768	Fix a segfault that occured in the MIPS DSPr2 fancy upsampling routine when downsampled_width==3. Because the DSPr2 code unrolls the loop for the middle columns (refer to jdsample.c), it has the effect of performing two column iterations, and that only works properly if the number of columns (minus the first and last) is >= 2. For the specific case of downsampled_width==3, this patch skips to the second iteration of the unrolled column loop. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.4.x@1562 632fc199-4ca6-4c93-a231-07263d6284db	2015-06-08 17:41:34 +00:00
DRC	3b7015d5d8	Enable silent build rules for the NASM objects, if the source is configured with automake 1.11 or later. NOTE: the build still spits out "error: ignoring unknown tag NASM" for each object, but unfortunately, if we remove "--tag NASM" from the command line, the build breaks under older versions of automake (it aborts with "unable to infer tagged configuration.") git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.4.x@1534 632fc199-4ca6-4c93-a231-07263d6284db	2015-02-23 19:03:29 +00:00
DRC	771ab19437	Extend the AltiVec VMX SIMD routines to support little endian PowerPC platforms. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1529 632fc199-4ca6-4c93-a231-07263d6284db	2015-02-20 19:57:21 +00:00
DRC	3f7608348d	Come on, Cohaagen, you got what you want. Give these people air! git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1527 632fc199-4ca6-4c93-a231-07263d6284db	2015-01-28 00:25:43 +00:00
DRC	8b5a0093ee	Oops. The MIPS SIMD implementations of h2v1 and h2v2 upsampling were not checking for DSPr2 support, so running 'djpeg -nosmooth' on a non-DSPr2-enabled platform caused an "illegal instruction" error. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/branches/1.4.x@1523 632fc199-4ca6-4c93-a231-07263d6284db	2015-01-21 17:42:28 +00:00
DRC	246b01bb0a	Revert r1506 (we actually are generating columns with the IDCT, so the naming makes sense in retrospect); further de-confusification in the forward DCT git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1507 632fc199-4ca6-4c93-a231-07263d6284db	2015-01-16 03:13:16 +00:00
DRC	c4e3c361d7	De-confusify the variable names a bit -- "out" represents the output of the IDCT kernel, so use "final" to represent the packed data that will be stored to memory. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1506 632fc199-4ca6-4c93-a231-07263d6284db	2015-01-15 08:51:31 +00:00
DRC	c641cddb80	AltiVec SIMD implementation of H2V1 and H2V2 plain upsampling (used only when decompressing YCCK images with fast upsampling enabled.) git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1504 632fc199-4ca6-4c93-a231-07263d6284db	2015-01-14 15:41:11 +00:00
DRC	86af36aebc	AltiVec SIMD implementation of H2V1 and H2V2 merged upsampling git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1503 632fc199-4ca6-4c93-a231-07263d6284db	2015-01-14 13:27:32 +00:00
DRC	11c4010c3c	Fix an overread detected by valgrind git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1502 632fc199-4ca6-4c93-a231-07263d6284db	2015-01-14 13:07:06 +00:00
DRC	2517ef72ed	Fix bugs in the AltiVec fancy upsampling routines uncovered during additional testing with small image sizes. Since the input width is half the output width, the upsampler should only write a second 16-byte chuck if there are more than 8 input columns left. Additionally, if the width is < 16, then we need to insert a dummy sample (the SSE2 code does this as well, but I neglected to port that portion of the code for some reason.) git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1501 632fc199-4ca6-4c93-a231-07263d6284db	2015-01-14 10:45:31 +00:00
DRC	cbcb53617b	Fix a bug in the AltiVec downsampling routines uncovered during additional testing with small image sizes. Since the output width is half the input width, the downsampler should only read a second 16-byte chunk if there are more than 8 output columns left. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1499 632fc199-4ca6-4c93-a231-07263d6284db	2015-01-14 08:31:54 +00:00
DRC	ada430bb3d	Make the formatting and naming of variables and constants more consistent git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1498 632fc199-4ca6-4c93-a231-07263d6284db	2015-01-13 11:49:15 +00:00
DRC	406bb01ca2	Make the formatting and naming of variables and constants more consistent git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1497 632fc199-4ca6-4c93-a231-07263d6284db	2015-01-13 11:47:20 +00:00
DRC	a6a24c270e	Make the formatting and naming of variables and constants more consistent git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1496 632fc199-4ca6-4c93-a231-07263d6284db	2015-01-13 10:00:12 +00:00
DRC	52a4ec6c8a	AltiVec SIMD implementation of H2V1 and H2V2 fancy upsampling git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1495 632fc199-4ca6-4c93-a231-07263d6284db	2015-01-13 09:02:29 +00:00
DRC	51eba06011	Minor code readability tweak git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1492 632fc199-4ca6-4c93-a231-07263d6284db	2015-01-11 07:38:35 +00:00
DRC	d71a6e0c25	Use intrinsics for loading aligned data in the IDCT functions. This has no effect on performance, but it makes it more obvious what that code is doing. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1491 632fc199-4ca6-4c93-a231-07263d6284db	2015-01-11 06:34:47 +00:00
DRC	ac4daa7716	AltiVec SIMD implementation of YCC-to-RGB color conversion git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1489 632fc199-4ca6-4c93-a231-07263d6284db	2015-01-10 22:56:26 +00:00
DRC	af69295d3f	Fix minor issue in code comments git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1488 632fc199-4ca6-4c93-a231-07263d6284db	2015-01-10 12:25:42 +00:00
DRC	6aed11293d	Fix minor issue in code comments git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1487 632fc199-4ca6-4c93-a231-07263d6284db	2015-01-10 12:10:08 +00:00
DRC	a5005751a3	Simplify the code somewhat. It actually wasn't necessary to have a "fast path" and a "medium path"-- they perform the same. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1486 632fc199-4ca6-4c93-a231-07263d6284db	2015-01-10 12:09:11 +00:00
DRC	25347882ed	Overhaul the AltiVec vector loading code in the compression-side colorspace conversion routines. The existing code was sometimes overreading the source buffer (at least according to valgrind), and it was necessary to increase the complexity of the code in order to prevent this without significantly compromising performance. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1485 632fc199-4ca6-4c93-a231-07263d6284db	2015-01-10 11:32:36 +00:00
DRC	8de75d0f57	Fix minor issue in code comments git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1484 632fc199-4ca6-4c93-a231-07263d6284db	2015-01-08 10:42:54 +00:00
DRC	22048207ea	AltiVec SIMD implementation of 2x1 and 2x2 downsampling git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1483 632fc199-4ca6-4c93-a231-07263d6284db	2015-01-08 06:18:33 +00:00
DRC	577ecd93f1	AltiVec SIMD implementation of sample conversion and integer quantization git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1474 632fc199-4ca6-4c93-a231-07263d6284db	2014-12-23 04:14:54 +00:00
DRC	ff30c63934	Document the fact that the AltiVec implementation uses the same modified algorithms as the SSE2 implementation git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1473 632fc199-4ca6-4c93-a231-07263d6284db	2014-12-23 02:42:59 +00:00
DRC	45453085e7	Use intrinsics for loading/storing data in the DCT/IDCT functions. This has no effect on the performance of the aligned loads/stores, but it makes it more obvious what that code is doing. Using intrinsics for the unaligned stores in the inverse DCT functions increases overall decompression performance by 1-2%. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1472 632fc199-4ca6-4c93-a231-07263d6284db	2014-12-22 16:04:17 +00:00
DRC	b1fec4ff98	AltiVec SIMD implementation of RGB-to-Grayscale color conversion git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1471 632fc199-4ca6-4c93-a231-07263d6284db	2014-12-22 14:10:33 +00:00
DRC	5976e42550	Remove unneeded code; Make sure jccolor-altivec.o will be rebuilt if jccolext-altivec.c changes. git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1470 632fc199-4ca6-4c93-a231-07263d6284db	2014-12-22 13:57:30 +00:00
DRC	62bae204f4	AltiVec SIMD implementation of RGB-to-YCC color conversion git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1469 632fc199-4ca6-4c93-a231-07263d6284db	2014-12-22 13:42:26 +00:00
DRC	13af139624	Make comments more consistent git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1466 632fc199-4ca6-4c93-a231-07263d6284db	2014-12-22 01:38:01 +00:00
DRC	bf8a5feb79	Cosmetic tweaks to the PowerPC SIMD stubs git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1464 632fc199-4ca6-4c93-a231-07263d6284db	2014-12-22 01:10:11 +00:00

... 3 4 5 6 7 ...

410 Commits