15 Commits

Author SHA1 Message Date
DRC
b052d67eb1 ARM NEON SIMD support for YCC-to-RGB565 conversion, and optimizations to the existing YCC-to-RGB conversion code:
-----

aee36252be.patch

From aee36252be20054afce371a92406fc66ba6627b5 Mon Sep 17 00:00:00 2001
From: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Date: Wed, 13 Aug 2014 03:50:22 +0300
Subject: [PATCH] ARM: Faster NEON yuv->rgb conversion for Krait and Cortex-A15

The older code was developed and tested only on ARM Cortex-A8 and ARM Cortex-A9.
Tuning it for newer ARM processors can introduce some speed-up (up to 20%).

The performance of the inner loop (conversion of 8 pixels) improves from
~27 cycles down to ~22 cycles on Qualcomm Krait 300, and from ~20 cycles
down to ~18 cycles on ARM Cortex-A15.

The performance remains exactly the same on ARM Cortex-A7 (~58 cycles),
ARM Cortex-A8 (~25 cycles) and ARM Cortex-A9 (~30 cycles) processors.

Also use larger indentation in the source code for separating two independent
instruction streams.

-----

a5efdbf22c.patch

From a5efdbf22ce9c1acd4b14a353cec863c2c57557e Mon Sep 17 00:00:00 2001
From: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Date: Wed, 13 Aug 2014 07:23:09 +0300
Subject: [PATCH] ARM: NEON optimized yuv->rgb565 conversion

The performance of the inner loop (conversion of 8 pixels):
* ARM Cortex-A7:  ~55 cycles
* ARM Cortex-A8:  ~28 cycles
* ARM Cortex-A9:  ~32 cycles
* ARM Cortex-A15: ~20 cycles
* Qualcomm Krait: ~24 cycles

Based on the Linaro rgb565 patch from
    https://sourceforge.net/p/libjpeg-turbo/patches/24/
but implements better instruction scheduling.


git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1385 632fc199-4ca6-4c93-a231-07263d6284db
2014-08-23 15:47:51 +00:00
DRC
5d5b9a497b Clean up code formatting in the SIMD interface functions 2014-05-15 19:45:11 +00:00
DRC
0d25e86574 Remove trailing spaces (+ one additional tab in TJUnitTest.java that was missed in the previous commit) 2014-05-09 18:06:58 +00:00
DRC
bf417e56e0 Remove trailing space 2014-02-06 19:13:24 +00:00
DRC
bd029eb0f7 Make environment variable syntax consistent between ARM and x86 code, and add an option to disable SIMD on x86 (this option will be added to the x86-64 code as well, but it makes more sense to add it when we add AVX support.) 2013-10-31 07:40:24 +00:00
DRC
dbfa2648d8 Accelerated 4:2:2 upsampling routine for ARM (improves performance ~20-30% when decompressing 4:2:2 JPEGs using fancy upsampling) 2012-02-02 22:32:45 +00:00
DRC
a112f12efd Compiler warnings 2012-01-31 05:27:41 +00:00
DRC
c1e4151607 Added new alpha channel colorspace constants/pixel formats, so applications can specify that they need the unused byte in a 4-component RGB output buffer set to 0xFF when decompressing. 2011-12-19 02:21:03 +00:00
DRC
7672bd3ac5 NEON-accelerated slow integer inverse DCT 2011-08-22 13:48:01 +00:00
DRC
00a69f142a NEON-accelerated quantization 2011-08-17 21:00:59 +00:00
DRC
22b4359e42 ARM NEON-accelerated RGB-to-YCbCr conversion 2011-08-12 19:27:20 +00:00
DRC
ce02d1d62a Support for accelerated forward DCT using ARM NEON instructions 2011-08-10 23:31:13 +00:00
DRC
e3f7e75525 NEON-optimized 2x2 and 4x4 scaled iDCTs 2011-06-17 21:12:58 +00:00
DRC
d02c734a19 iOS ARM support 2011-06-14 22:16:50 +00:00
DRC
99799a6c29 ARM NEON support 2011-05-03 08:47:43 +00:00
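
For readers unfamiliar with the operation that the YCC-to-RGB565 commits above accelerate, here is a minimal scalar sketch of the per-pixel arithmetic. It is not the NEON code from those patches, which handle 8 pixels per inner-loop iteration in assembly. The function names `descale_clamp` and `ycc_to_rgb565` are hypothetical, and the sketch assumes the standard full-range JFIF YCbCr coefficients in 16.16 fixed point; the rounding used by the actual SIMD code may differ.

```c
#include <stdint.h>

/* Hypothetical helper: round a 16.16 fixed-point value to an integer
 * and clamp it to the 0..255 range of an 8-bit sample. Negative inputs
 * are clamped before shifting, so no negative value is right-shifted. */
static uint8_t descale_clamp(int32_t v)
{
  if (v < 0)
    return 0;
  v = (v + 32768) >> 16;            /* add 0.5, then drop the fraction */
  return (uint8_t)(v > 255 ? 255 : v);
}

/* Hypothetical helper: convert one full-range JFIF YCbCr pixel to a
 * packed RGB565 value (5 bits red, 6 bits green, 5 bits blue).
 * Assumed coefficients, scaled by 65536:
 *   R = Y + 1.402    * (Cr - 128)
 *   G = Y - 0.344136 * (Cb - 128) - 0.714136 * (Cr - 128)
 *   B = Y + 1.772    * (Cb - 128)
 */
static uint16_t ycc_to_rgb565(uint8_t y, uint8_t cb, uint8_t cr)
{
  int32_t ys = (int32_t)y * 65536;  /* luma in 16.16 fixed point */
  int32_t cbs = (int32_t)cb - 128;  /* center chroma around zero */
  int32_t crs = (int32_t)cr - 128;

  uint8_t r = descale_clamp(ys + 91881 * crs);
  uint8_t g = descale_clamp(ys - 22554 * cbs - 46802 * crs);
  uint8_t b = descale_clamp(ys + 116130 * cbs);

  /* Keep the top 5/6/5 bits of each channel and pack them. */
  return (uint16_t)(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
}
```

Calling `ycc_to_rgb565(255, 128, 128)` (white, neutral chroma) yields `0xFFFF`, and `ycc_to_rgb565(76, 85, 255)` (approximately pure red) yields `0xF800`. A SIMD version performs the same multiply, add, clamp, and pack steps on several pixels at once, which is where instruction scheduling affects the cycle counts quoted in the commit messages.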