Commit Graph

21 Commits

Author SHA1 Message Date
DRC
b052d67eb1 ARM NEON SIMD support for YCC-to-RGB565 conversion, and optimizations to the existing YCC-to-RGB conversion code:
-----

aee36252be.patch

From aee36252be20054afce371a92406fc66ba6627b5 Mon Sep 17 00:00:00 2001
From: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Date: Wed, 13 Aug 2014 03:50:22 +0300
Subject: [PATCH] ARM: Faster NEON yuv->rgb conversion for Krait and Cortex-A15

The older code was developed and tested only on ARM Cortex-A8 and ARM Cortex-A9.
Tuning it for newer ARM processors can introduce some speed-up (up to 20%).

The performance of the inner loop (conversion of 8 pixels) improves from
~27 cycles down to ~22 cycles on Qualcomm Krait 300, and from ~20 cycles
down to ~18 cycles on ARM Cortex-A15.

The performance remains exactly the same on ARM Cortex-A7 (~58 cycles),
ARM Cortex-A8 (~25 cycles) and ARM Cortex-A9 (~30 cycles) processors.

Also use larger indentation in the source code for separating two independent
instruction streams.

-----

a5efdbf22c.patch

From a5efdbf22ce9c1acd4b14a353cec863c2c57557e Mon Sep 17 00:00:00 2001
From: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Date: Wed, 13 Aug 2014 07:23:09 +0300
Subject: [PATCH] ARM: NEON optimized yuv->rgb565 conversion

The performance of the inner loop (conversion of 8 pixels):
* ARM Cortex-A7:  ~55 cycles
* ARM Cortex-A8:  ~28 cycles
* ARM Cortex-A9:  ~32 cycles
* ARM Cortex-A15: ~20 cycles
* Qualcomm Krait: ~24 cycles

Based on the Linaro rgb565 patch from
    https://sourceforge.net/p/libjpeg-turbo/patches/24/
but implements better instructions scheduling.


git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1385 632fc199-4ca6-4c93-a231-07263d6284db
2014-08-23 15:47:51 +00:00
DRC
76ef3c5dda Allow for building the MIPS DSPr2 extensions if the host is mips-* as well as mipsel-*. The DSPr2 extensions are little endian, so we still have to check that the compiler defines __MIPSEL__ before enabling them. This paves the way for supporting big-endian MIPS, and in the near term, it allows the SIMD extensions to be built with Sourcery CodeBench. 2014-05-19 19:13:22 +00:00
DRC
6263c1fc1b SIMD-accelerated int upsample routine for MIPS DSPr2 2014-05-18 20:04:47 +00:00
DRC
ecfbabdbf3 Fix MIPS build 2014-05-18 19:36:05 +00:00
DRC
5d5b9a497b Clean up code formatting in the SIMD interface functions 2014-05-15 19:45:11 +00:00
DRC
99de998e2c SIMD-accelerated NULL convert routine for MIPS DSPr2 2014-05-15 18:26:01 +00:00
DRC
c4c3ac6305 SIMD-accelerated h2v2 smooth downsampling routine for MIPS DSPr2 2014-05-14 15:00:10 +00:00
DRC
38bfd451d5 SIMD-accelerated merged upsampling routines for MIPS DSPr2 2014-05-13 18:40:14 +00:00
DRC
7824f70008 SIMD-accelerated slow integer IDCT routine for MIPS DSPr2 2014-05-06 09:53:21 +00:00
DRC
c6c8c7911f SIMD-accelerated integer convsamp routine for MIPS DSPr2 2013-10-12 21:39:20 +00:00
DRC
3c6b1ba545 SIMD-accelerated floating point quantize and convsamp routines for MIPS DSPr2 2013-10-09 18:39:44 +00:00
DRC
10138c9d35 SIMD-accelerated fast integer inverse DCT routine for MIPS DSPr2 2013-10-08 02:18:59 +00:00
DRC
6addfed58b SIMD-accelerated fast integer forward DCT routine for MIPS DSPr2 2013-10-08 02:11:21 +00:00
DRC
01f46504ee SIMD-accelerated slow integer forward DCT and quantize routines for MIPS DSPr2 2013-09-30 18:13:27 +00:00
DRC
198cc7c161 SIMD-accelerated 3/4 and 3/2 decompression scaling for MIPS DSPr2 2013-09-27 17:51:08 +00:00
DRC
f934fc621e SIMD-accelerated 1/2 and 1/4 decompression scaling for MIPS DSPr2 2013-09-27 17:43:23 +00:00
DRC
154c2dc749 SIMD-optimized RGB-to-grayscale conversion for MIPS DSPr2 2013-09-27 17:39:57 +00:00
DRC
aa5a1808fe SIMD support for performing upsampling using MIPS DSPr2 instructions 2013-07-27 21:50:02 +00:00
DRC
3f2e3b11f0 SIMD support for performing downsampling using MIPS DSPr2 instructions 2013-07-27 21:48:18 +00:00
DRC
41e3657631 SIMD support for performing fancy upsampling using MIPS DSPr2 instructions 2013-07-27 21:44:14 +00:00
DRC
64da9d6ba8 SIMD support for performing color conversion using MIPS DSPr2 instructions 2013-07-24 21:50:20 +00:00