17 Commits

Author SHA1 Message Date
DRC
33a4b3d400 Reformat code per Siarhei's original patch (to clearly indicate that the offset instructions are completely independent) and add Siarhei as an individual author (he no longer works for Nokia).
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1388 632fc199-4ca6-4c93-a231-07263d6284db
2014-08-25 15:26:09 +00:00
DRC
b052d67eb1 ARM NEON SIMD support for YCC-to-RGB565 conversion, and optimizations to the existing YCC-to-RGB conversion code:
-----

aee36252be.patch

From aee36252be20054afce371a92406fc66ba6627b5 Mon Sep 17 00:00:00 2001
From: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Date: Wed, 13 Aug 2014 03:50:22 +0300
Subject: [PATCH] ARM: Faster NEON yuv->rgb conversion for Krait and Cortex-A15

The older code was developed and tested only on ARM Cortex-A8 and ARM Cortex-A9.
Tuning it for newer ARM processors can yield a speed-up of up to 20%.

The performance of the inner loop (conversion of 8 pixels) improves from
~27 cycles down to ~22 cycles on Qualcomm Krait 300, and from ~20 cycles
down to ~18 cycles on ARM Cortex-A15.

The performance remains exactly the same on ARM Cortex-A7 (~58 cycles),
ARM Cortex-A8 (~25 cycles) and ARM Cortex-A9 (~30 cycles) processors.

Also use larger indentation in the source code to separate the two independent
instruction streams.

-----

a5efdbf22c.patch

From a5efdbf22ce9c1acd4b14a353cec863c2c57557e Mon Sep 17 00:00:00 2001
From: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Date: Wed, 13 Aug 2014 07:23:09 +0300
Subject: [PATCH] ARM: NEON optimized yuv->rgb565 conversion

The performance of the inner loop (conversion of 8 pixels):
* ARM Cortex-A7:  ~55 cycles
* ARM Cortex-A8:  ~28 cycles
* ARM Cortex-A9:  ~32 cycles
* ARM Cortex-A15: ~20 cycles
* Qualcomm Krait: ~24 cycles

Based on the Linaro rgb565 patch from
    https://sourceforge.net/p/libjpeg-turbo/patches/24/
but implements better instruction scheduling.


git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1385 632fc199-4ca6-4c93-a231-07263d6284db
2014-08-23 15:47:51 +00:00
DRC
83052612d0 .func/.endfunc are only necessary when generating STABS debug info, which basically went out of style with parachute pants and Rick Astley. At any rate, none of the platforms for which we're building the ARM code use it (DWARF is the common format these days), and the .func/.endfunc directives cause the clang integrated assembler to fail (http://llvm.org/bugs/show_bug.cgi?id=20424).
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1375 632fc199-4ca6-4c93-a231-07263d6284db
2014-08-22 11:31:46 +00:00
DRC
abb6a513fa Formatting tweaks 2014-02-05 07:39:38 +00:00
DRC
dbfa2648d8 Accelerated 4:2:2 upsampling routine for ARM (improves performance ~20-30% when decompressing 4:2:2 JPEGs using fancy upsampling) 2012-02-02 22:32:45 +00:00
DRC
e808882c95 Update Nokia contact info 2011-09-06 18:58:22 +00:00
DRC
a02a9af565 Improve performance of IFAST iDCT by changing the order of transpose and descale steps 2011-09-06 18:57:53 +00:00
DRC
061f96dc7d Make ARM ISLOW iDCT faster on typical cases, and eliminate the possibility of 16-bit overflows when handling arbitrary coefficients. 2011-09-06 18:55:45 +00:00
DRC
00e258dedd Improve the performance of YCbCr to RGB conversion on ARM 2011-08-24 23:27:44 +00:00
DRC
7672bd3ac5 NEON-accelerated slow integer inverse DCT 2011-08-22 13:48:01 +00:00
DRC
00a69f142a NEON-accelerated quantization 2011-08-17 21:00:59 +00:00
DRC
dbb92f2eee Improve performance of ARM NEON IFAST iDCT 2011-08-15 08:36:51 +00:00
DRC
22b4359e42 ARM NEON-accelerated RGB-to-YCbCr conversion 2011-08-12 19:27:20 +00:00
DRC
ce02d1d62a Support for accelerated forward DCT using ARM NEON instructions 2011-08-10 23:31:13 +00:00
DRC
e3f7e75525 NEON-optimized 2x2 and 4x4 scaled iDCTs 2011-06-17 21:12:58 +00:00
DRC
d02c734a19 iOS ARM support 2011-06-14 22:16:50 +00:00
DRC
99799a6c29 ARM NEON support 2011-05-03 08:47:43 +00:00