Full-color compression speedups relative to previous commits:
Cortex-A53 (Nexus 5X), Android, 64-bit: 1.1-13% (avg. 6.0%)
Cortex-A57 (Nexus 5X), Android, 64-bit: 0.0-22% (avg. 6.3%)
Refer to #47 and #50 for discussion
Closes#50
Note that this commit introduces a similar /proc/cpuinfo parser to that
of the ARM32 implementation. It is used to specifically check whether
the code is running on Cavium ThunderX and, if so, disable the ARM64
SIMD Huffman routines (which slow performance by an average of 8% on
that CPU.)
Based on:
a8c282e5e5
This adds 64-bit NEON coverage for all of the algorithms that are
covered by the 32-bit NEON implementation, except for h2v1 (4:2:2) fancy
upsampling (used when decompressing 4:2:2 JPEG images.) It also adds
64-bit NEON SIMD coverage for:
* slow integer forward DCT (compressor)
* h2v2 (4:2:0) downsampling (compressor)
* h2v1 (4:2:2) downsampling (compressor)
which are not covered in the 32-bit implementation.
Compression speedups relative to libjpeg-turbo 1.4.2:
Apple A7 (iPhone 5S), iOS, 64-bit: 113-150% (reported)
48-core ThunderX (RunAbove ARM Cloud), Linux, 64-bit: 2.1-33% (avg. 15%)
Refer to #44 and #49 for discussion
This commit also removes the unnecessary
if (simd_support & JSIMD_ARM_NEON)
statements from the jsimd* algorithm functions. Since the jsimd_can*()
functions check for the existence of NEON, the corresponding algorithm
functions will never be called if NEON isn't available.
Based on:
dcd9d84f10b0d87b811f70cd5c8a493e58d9a064837b19542f73dc43ccc8a82b71a261c1b1188c21305c89284e7f443f99954c2b53b77d
Unified version with fixes:
1004a3cd05
Full-color compression speedups relative to libjpeg-turbo 1.4.2:
2.8 GHz Intel Xeon W3530, Linux, 64-bit: 2.2-18% (avg. 9.5%)
2.8 GHz Intel Xeon W3530, Linux, 32-bit: 10-25% (avg. 17%)
2.3 GHz AMD A10-4600M APU, Linux, 64-bit: 4.9-17% (avg. 11%)
2.3 GHz AMD A10-4600M APU, Linux, 32-bit: 8.8-19% (avg. 15%)
3.0 GHz Intel Core i7, OS X, 64-bit: 3.5-16% (avg. 10%)
3.0 GHz Intel Core i7, OS X, 32-bit: 4.8-14% (avg. 11%)
2.6 GHz AMD Athlon 64 X2 5050e:
Performance-neutral (give or take a few percent)
Full-color compression speedups relative to IPP:
2.8 GHz Intel Xeon W3530, Linux, 64-bit: 4.8-34% (avg. 19%)
2.8 GHz Intel Xeon W3530, Linux, 32-bit: -19%-7.0% (avg. -7.0%)
Refer to #42 for discussion. Numerous other approaches were attempted,
but this one proved to be the most performant across all platforms.
This commit also fixes#3 (works around, really-- the clang-compiled version
of jchuff.c still performs 20% worse than its GCC-compiled counterpart, but
that code is now bypassed by the new SSE2 Huffman algorithm.)
Based on:
2cb4d4133036c94e050d
-----
aee36252be.patch
From aee36252be20054afce371a92406fc66ba6627b5 Mon Sep 17 00:00:00 2001
From: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Date: Wed, 13 Aug 2014 03:50:22 +0300
Subject: [PATCH] ARM: Faster NEON yuv->rgb conversion for Krait and Cortex-A15
The older code was developed and tested only on ARM Cortex-A8 and ARM Cortex-A9.
Tuning it for newer ARM processors can introduce some speed-up (up to 20%).
The performance of the inner loop (conversion of 8 pixels) improves from
~27 cycles down to ~22 cycles on Qualcomm Krait 300, and from ~20 cycles
down to ~18 cycles on ARM Cortex-A15.
The performance remains exactly the same on ARM Cortex-A7 (~58 cycles),
ARM Cortex-A8 (~25 cycles) and ARM Cortex-A9 (~30 cycles) processors.
Also use larger indentation in the source code for separating two independent
instruction streams.
-----
a5efdbf22c.patch
From a5efdbf22ce9c1acd4b14a353cec863c2c57557e Mon Sep 17 00:00:00 2001
From: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Date: Wed, 13 Aug 2014 07:23:09 +0300
Subject: [PATCH] ARM: NEON optimized yuv->rgb565 conversion
The performance of the inner loop (conversion of 8 pixels):
* ARM Cortex-A7: ~55 cycles
* ARM Cortex-A8: ~28 cycles
* ARM Cortex-A9: ~32 cycles
* ARM Cortex-A15: ~20 cycles
* Qualcomm Krait: ~24 cycles
Based on the Linaro rgb565 patch from
https://sourceforge.net/p/libjpeg-turbo/patches/24/
but implements better instructions scheduling.
git-svn-id: svn+ssh://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1385 632fc199-4ca6-4c93-a231-07263d6284db