DRC
ddd54ff8a8
Optimizations to the AltiVec DCT algorithms (pre-compute constants and combine multiply/add operations)
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1462 632fc199-4ca6-4c93-a231-07263d6284db
2014-12-20 03:32:59 +00:00
DRC
0d435698f4
AltiVec SIMD implementation of slow integer inverse DCT
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1461 632fc199-4ca6-4c93-a231-07263d6284db
2014-12-20 01:17:39 +00:00
DRC
63c1674ebc
Use macros to allocate constants statically, rather than reading them from a table using vec_splat*(). This improves code readability and probably improves performance a bit as well.
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1460 632fc199-4ca6-4c93-a231-07263d6284db
2014-12-20 01:16:26 +00:00
DRC
864600d707
Swap the order of the IFAST and ISLOW FDCT functions so that it matches the order of the prototypes in jsimd.h and the stubs in jsimd_powerpc.c.
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1459 632fc199-4ca6-4c93-a231-07263d6284db
2014-12-20 01:14:38 +00:00
DRC
aa805bc89f
Modify the ARM64 assembly file so that it uses only syntax that the clang assembler in Xcode 5.x can understand. These changes should all be cosmetic in nature; they do not change the meaning or readability of the code, nor do they affect the ability to build it for Linux. In fact, the code now conforms more closely to the ARM64 programming manual. In addition, there were a couple of instructions that clang simply doesn't support, so gas-preprocessor.pl was modified to convert those into equivalent instructions that clang can handle.
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1456 632fc199-4ca6-4c93-a231-07263d6284db
2014-12-19 18:23:52 +00:00
DRC
c7dadd2d0b
AltiVec SIMD implementation of fast integer inverse DCT
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1445 632fc199-4ca6-4c93-a231-07263d6284db
2014-12-18 10:12:29 +00:00
DRC
7475e59637
Further cleanup of the AltiVec forward DCT code:
-- Use macros to represent the fast FDCT constants, to facilitate comparing the AltiVec implementation of the algorithm with the SSE2 implementation.
-- Rename slow FDCT constants for consistency.
-- Use vec_sra() in all cases in the slow FDCT code. The SSE2 implementation uses psraw, which is an arithmetic shift, so we need to do likewise with AltiVec. vec_sr() is a logical shift, which fills the vacated high bits with zeros rather than with copies of the sign bit. Using it hasn't caused any problems yet, but it would produce different results if any of the shifted values were negative.
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1444 632fc199-4ca6-4c93-a231-07263d6284db
2014-12-18 09:49:39 +00:00
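To illustrate the vec_sra()/vec_sr() distinction in the commit above, here is a scalar C model of a single 16-bit lane. vec_sra() and SSE2's psraw replicate the sign bit (arithmetic shift), while vec_sr() shifts in zeros (logical shift). The helper names sra16() and srl16() are hypothetical; they only model per-lane behavior and are not part of any AltiVec or libjpeg-turbo API.

```c
#include <stdint.h>

/* Model of one lane of vec_sra()/psraw: arithmetic right shift, which
   copies the sign bit into the vacated high bits.  Right-shifting a
   negative int is implementation-defined in C, so for negative inputs
   we shift the (non-negative) bitwise complement and complement back. */
static int16_t sra16(int16_t x, unsigned n)
{
  int v = x;
  return (int16_t)(v < 0 ? ~(~v >> n) : v >> n);
}

/* Model of one lane of vec_sr(): logical right shift, which always
   shifts zeros into the high bits.  Valid for 1 <= n <= 15, where the
   result is at most 0x7FFF and therefore fits in int16_t. */
static int16_t srl16(int16_t x, unsigned n)
{
  return (int16_t)((uint16_t)x >> n);
}
```

For non-negative inputs the two agree (100 shifted right by 2 is 25 either way). For -8 shifted right by 1, the arithmetic shift gives -4, whereas the logical shift gives 32764 (0x7FFC), which is exactly the kind of divergence the commit message guards against.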
DRC
25e40dc42c
AltiVec SIMD implementation of slow integer forward DCT; Clean up fast integer forward DCT code so that it is easier to see how it derives from the SSE2 code and to make it play more nicely with the slow FDCT code.
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1443 632fc199-4ca6-4c93-a231-07263d6284db
2014-12-17 08:04:39 +00:00
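The integer DCTs work with fixed-point constants, which is why naming them as macros (as the cleanup commits here do) makes the AltiVec, SSE2, and C versions easier to compare side by side. The sketch below shows how such constants are derived and applied. It assumes libjpeg's conventions of 13 fractional bits for the slow (ISLOW) DCT and 8 for the fast (IFAST) DCT, and a round-to-nearest descale; the two-argument FIX_BITS() macro, the ISLOW_/IFAST_ constant names, and descale() are hypothetical stand-ins, not the library's own definitions.

```c
#include <stdint.h>

/* Hypothetical two-argument form of libjpeg's FIX() macro: round the real
   constant x to a fixed-point integer with `bits` fractional bits. */
#define FIX_BITS(x, bits) ((int32_t)((x) * (1L << (bits)) + 0.5))

/* Assumed precision: 13 fractional bits (ISLOW) and 8 (IFAST). */
#define ISLOW_CONST_BITS 13
#define IFAST_CONST_BITS 8

#define ISLOW_0_541196100 FIX_BITS(0.541196100, ISLOW_CONST_BITS)
#define ISLOW_1_847759065 FIX_BITS(1.847759065, ISLOW_CONST_BITS)
#define IFAST_0_707106781 FIX_BITS(0.707106781, IFAST_CONST_BITS)

/* Scale a fixed-point product back down by 2^n with round-to-nearest:
   add one half, then shift right arithmetically.  Negative values use
   the complement trick to avoid implementation-defined right shifts. */
static int32_t descale(int32_t x, int n)
{
  int32_t v = x + ((int32_t)1 << (n - 1));
  return v < 0 ? ~(~v >> n) : v >> n;
}
```

With 13 fractional bits, 0.541196100 becomes 4433 and 1.847759065 becomes 15137; with 8 bits, 0.707106781 becomes 181. descale(100 * 4433, 13) yields 54, matching 100 × 0.5412 rounded to the nearest integer.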
DRC
296c8bad7e
Fix cosmetic issues in AltiVec comments
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1442 632fc199-4ca6-4c93-a231-07263d6284db
2014-12-17 08:00:29 +00:00
DRC
78c2093bd4
The AltiVec code actually works on 32-bit PowerPC platforms as well, so change the "powerpc64" token to "powerpc". Also clean up the shift code, which wasn't building properly on OS X.
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1406 632fc199-4ca6-4c93-a231-07263d6284db
2014-09-05 07:23:12 +00:00
DRC
a2cc95b827
AltiVec SIMD implementation of fast forward DCT
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1405 632fc199-4ca6-4c93-a231-07263d6284db
2014-09-05 06:33:42 +00:00
DRC
2ef1bec37f
Rename the ARM64 assembly file to match the C file
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1390 632fc199-4ca6-4c93-a231-07263d6284db
2014-08-29 01:53:17 +00:00
DRC
a1cea09935
Fix several mathematical issues discovered in the ARM64 NEON code while running the extended regression tests introduced in r1267. Specific comments can be found in the original patches:
https://sourceforge.net/p/libjpeg-turbo/patches/64/
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1389 632fc199-4ca6-4c93-a231-07263d6284db
2014-08-29 01:49:59 +00:00
DRC
33a4b3d400
Reformat code per Siarhei's original patch (to clearly indicate that the offset instructions are completely independent) and add Siarhei as an individual author (he no longer works for Nokia).
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1388 632fc199-4ca6-4c93-a231-07263d6284db
2014-08-25 15:26:09 +00:00
DRC
a92d31df00
ARM64 NEON SIMD support for YCC-to-RGB565 conversion
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1386 632fc199-4ca6-4c93-a231-07263d6284db
2014-08-23 15:57:38 +00:00
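RGB565 stores a pixel in 16 bits: 5 bits of red, 6 bits of green, and 5 bits of blue. The final packing step of a YCC-to-RGB565 conversion can be sketched in scalar C as follows. This assumes plain truncation of the low-order bits; the NEON routines may differ in details such as rounding or dithering, and pack_rgb565() is a hypothetical name.

```c
#include <stdint.h>

/* Pack 8-bit R, G, B into a 16-bit RGB565 value by keeping the top
   5 bits of red, the top 6 bits of green, and the top 5 bits of blue.
   Bit layout, most significant first: RRRRRGGG GGGBBBBB. */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
  return (uint16_t)(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
}
```

Pure red, green, and blue map to 0xF800, 0x07E0, and 0x001F respectively, and white maps to 0xFFFF. Green gets the extra bit because the eye is most sensitive to it.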
DRC
b052d67eb1
ARM NEON SIMD support for YCC-to-RGB565 conversion, and optimizations to the existing YCC-to-RGB conversion code:
-----
aee36252be.patch
From aee36252be20054afce371a92406fc66ba6627b5 Mon Sep 17 00:00:00 2001
From: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Date: Wed, 13 Aug 2014 03:50:22 +0300
Subject: [PATCH] ARM: Faster NEON yuv->rgb conversion for Krait and Cortex-A15
The older code was developed and tested only on ARM Cortex-A8 and ARM Cortex-A9.
Tuning it for newer ARM processors can introduce some speed-up (up to 20%).
The performance of the inner loop (conversion of 8 pixels) improves from
~27 cycles down to ~22 cycles on Qualcomm Krait 300, and from ~20 cycles
down to ~18 cycles on ARM Cortex-A15.
The performance remains exactly the same on ARM Cortex-A7 (~58 cycles),
ARM Cortex-A8 (~25 cycles) and ARM Cortex-A9 (~30 cycles) processors.
Also use larger indentation in the source code for separating two independent
instruction streams.
-----
a5efdbf22c.patch
From a5efdbf22ce9c1acd4b14a353cec863c2c57557e Mon Sep 17 00:00:00 2001
From: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Date: Wed, 13 Aug 2014 07:23:09 +0300
Subject: [PATCH] ARM: NEON optimized yuv->rgb565 conversion
The performance of the inner loop (conversion of 8 pixels):
* ARM Cortex-A7: ~55 cycles
* ARM Cortex-A8: ~28 cycles
* ARM Cortex-A9: ~32 cycles
* ARM Cortex-A15: ~20 cycles
* Qualcomm Krait: ~24 cycles
Based on the Linaro rgb565 patch from
https://sourceforge.net/p/libjpeg-turbo/patches/24/
but implements better instruction scheduling.
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1385 632fc199-4ca6-4c93-a231-07263d6284db
2014-08-23 15:47:51 +00:00
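For reference, the per-pixel arithmetic that the YCC-to-RGB routines vectorize is the JFIF YCbCr-to-RGB transform: R = Y + 1.40200·(Cr − 128), G = Y − 0.34414·(Cb − 128) − 0.71414·(Cr − 128), B = Y + 1.77200·(Cb − 128). Below is a scalar fixed-point sketch with 16 fractional bits, rounding, and clamping to 0..255, in the style of libjpeg's C code. The function and helper names are hypothetical, and the NEON implementations may use different intermediate precision, so their results are not guaranteed to match this sketch bit for bit.

```c
#include <stdint.h>

#define SCALEBITS 16
#define ONE_HALF ((int32_t)1 << (SCALEBITS - 1))
#define FIX(x) ((int32_t)((x) * (1L << SCALEBITS) + 0.5))

/* Portable arithmetic right shift by SCALEBITS (rounds toward -inf). */
static int32_t shr_scale(int32_t v)
{
  return v < 0 ? ~(~v >> SCALEBITS) : v >> SCALEBITS;
}

/* Clamp an intermediate result to the 8-bit sample range. */
static uint8_t clamp255(int32_t v)
{
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Convert one JFIF YCbCr pixel to RGB using fixed-point arithmetic. */
static void ycc_to_rgb(uint8_t y, uint8_t cb, uint8_t cr,
                       uint8_t *r, uint8_t *g, uint8_t *b)
{
  int32_t cbv = (int32_t)cb - 128;  /* center chroma around zero */
  int32_t crv = (int32_t)cr - 128;

  *r = clamp255(y + shr_scale(FIX(1.40200) * crv + ONE_HALF));
  *g = clamp255(y + shr_scale(-FIX(0.34414) * cbv -
                              FIX(0.71414) * crv + ONE_HALF));
  *b = clamp255(y + shr_scale(FIX(1.77200) * cbv + ONE_HALF));
}
```

Neutral chroma (Cb = Cr = 128) leaves Y unchanged in all three channels, and a saturated red such as (Y, Cb, Cr) = (76, 85, 255) comes out as (254, 0, 0).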
DRC
d0d81e9c3a
Revert r1335 and r1336. It was a valiant effort, but on Windows, xmm8-xmm15 are non-volatile, and the overhead of pushing them onto the stack at the beginning of each function and popping them at the end was causing worse performance (in the neighborhood of 3-5%) than just using the work areas and limiting the register usage to xmm0-xmm7. Best to leave the SSE2 code alone. We can optimize the register usage for AVX2, once that port takes place.
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1382 632fc199-4ca6-4c93-a231-07263d6284db
2014-08-22 18:30:44 +00:00
DRC
83052612d0
.func/.endfunc are only necessary when generating STABS debug info, which basically went out of style with parachute pants and Rick Astley. At any rate, none of the platforms for which we're building the ARM code use it (DWARF is the common format these days), and the .func/.endfunc directives cause the clang integrated assembler to fail (http://llvm.org/bugs/show_bug.cgi?id=20424).
git-svn-id: svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk@1375 632fc199-4ca6-4c93-a231-07263d6284db
2014-08-22 11:31:46 +00:00
DRC
3c50582e77
Oops. The Windows version of collect_args/uncollect_args uses rsp, so we still need the rsp prologue/epilogue, despite the fact that we aren't using the stack as a work area. This fixes a segfault on Windows caused by r1335.
2014-08-09 22:58:18 +00:00
DRC
b7efc273a0
Attempt to improve performance by refactoring the compression-side color conversion and DCT algorithms so that they take full advantage of the additional registers available with 64-bit SSE2. This produces a somewhat yawn-worthy speedup of 2-3%, but at least the code is a lot more readable now.
2014-08-09 14:30:28 +00:00
DRC
82b8751482
Fix performance and other issues uncovered in testing with actual ARM64 hardware; formatting tweaks; remove NEON platform check (NEON is always available with ARMv8)
2014-07-23 14:14:14 +00:00
DRC
e2f5c7cab3
Add proper support for Borland compilers (Borland needs section names to be prefixed with an underscore, and it needs OMF object files.)
2014-06-22 21:14:39 +00:00
DRC
76ef3c5dda
Allow for building the MIPS DSPr2 extensions if the host is mips-* as well as mipsel-*. The DSPr2 extensions are little endian, so we still have to check that the compiler defines __MIPSEL__ before enabling them. This paves the way for supporting big-endian MIPS, and in the near term, it allows the SIMD extensions to be built with Sourcery CodeBench.
2014-05-19 19:13:22 +00:00
DRC
6263c1fc1b
SIMD-accelerated int upsample routine for MIPS DSPr2
2014-05-18 20:04:47 +00:00
DRC
ecfbabdbf3
Fix MIPS build
2014-05-18 19:36:05 +00:00
DRC
144e7b79e4
Remove MS-DOS code and information, and adjust copyright headers to reflect the removal of features in r1307 and r1308. libjpeg-turbo has never supported MS-DOS, nor is it even possible for us to do so.
2014-05-18 18:33:44 +00:00
DRC
f8301c92dd
Get rid of the HAVE_PROTOTYPES configuration option, as well as the related JMETHOD and JPP macros. libjpeg-turbo has never supported compilers that don't handle prototypes. Doing so requires ansi2knr, which isn't even supported in the IJG code anymore.
2014-05-16 10:43:44 +00:00
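For readers unfamiliar with the removed macros: in the IJG headers, JPP() wrapped a function prototype's argument list and JMETHOD() declared a function-pointer member, so that both could collapse to K&R-style empty parentheses when HAVE_PROTOTYPES was undefined. The sketch below reproduces the prototype-enabled definitions alongside the plain ANSI C that replaces them. The example structs and the add_old()/add_new() functions are hypothetical and exist only to show the before/after forms.

```c
/* Prototype-enabled forms of the old macros; without HAVE_PROTOTYPES,
   the IJG headers expanded the argument lists to () instead. */
#define JPP(arglist) arglist
#define JMETHOD(type, methodname, arglist) type (*methodname) arglist

/* Old style: prototype and method pointer wrapped in macros. */
int add_old JPP((int a, int b));

struct example_mgr_old {
  JMETHOD(int, add, (int a, int b));
};

/* New style: plain ANSI C prototypes, no macros needed. */
int add_new(int a, int b);

struct example_mgr_new {
  int (*add)(int a, int b);
};

int add_old(int a, int b) { return a + b; }
int add_new(int a, int b) { return a + b; }
```

Because every compiler libjpeg-turbo supports understands prototypes, the macro layer adds indirection without changing the generated code, which is the rationale given in the commit.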
DRC
2c0b793539
Remove all of the NEED_SHORT_EXTERNAL_NAMES stuff. There is scant information available as to which linkers ever had a 15-character global symbol name limit. AFAICT, it might have been a VMS and/or a.out BSD thing, but none of those platforms have ever been supported by libjpeg-turbo (nor are such systems supported by other open source libraries of this nature.)
2014-05-15 20:30:16 +00:00
DRC
5d5b9a497b
Clean up code formatting in the SIMD interface functions
2014-05-15 19:45:11 +00:00
DRC
99de998e2c
SIMD-accelerated NULL convert routine for MIPS DSPr2
2014-05-15 18:26:01 +00:00
DRC
a37736dd43
Fix error in MIPS DSPr2 accelerated smooth downsample routine
2014-05-15 17:10:39 +00:00
DRC
c4c3ac6305
SIMD-accelerated h2v2 smooth downsampling routine for MIPS DSPr2
2014-05-14 15:00:10 +00:00
DRC
38bfd451d5
SIMD-accelerated merged upsampling routines for MIPS DSPr2
2014-05-13 18:40:14 +00:00
DRC
84f9fbfe3e
Modify Windows build system to take into account new assembly file names
2014-05-10 10:10:03 +00:00
DRC
1bd801a872
Using subdirectories unfortunately opened up a can of worms. In order to prevent object name conflicts, it is necessary to use the subdir-objects automake directive, but it simply doesn't work right on some of the versions of automake we still have to support. Another option would be to add a separate Makefile.am file to each subdirectory, but that requires maintaining a completely different set of build rules for each one. Fortunately, however, we're in the 21st century now, so we can use filenames longer than 8.3.
2014-05-10 09:53:34 +00:00
DRC
6af3f00efa
Re-organize the x86/x86-64 SIMD routines into separate folders by instruction set so we can name each routine similarly to its corresponding C file. This also makes it easier to add support for new instruction sets.
2014-05-09 20:14:26 +00:00
DRC
0d25e86574
Remove trailing spaces (+ one additional tab in TJUnitTest.java that was missed in the previous commit)
2014-05-09 18:06:58 +00:00
DRC
e45363d7c2
Convert tabs to spaces in the libjpeg code and the SIMD code (TurboJPEG retains the use of tabs for historical reasons. They were annoying in the libjpeg code primarily because they were not consistently used and because they were used to format as well as indent the code. In the case of TurboJPEG, tabs are used just to indent the code, so even if the editor assumes a different tab width, the code will still be readable.)
2014-05-09 18:00:32 +00:00
DRC
ef1c66701a
Fix an error in the MIPS DSPr2 fancy upsampling routine
2014-05-09 14:45:55 +00:00
DRC
7824f70008
SIMD-accelerated slow integer IDCT routine for MIPS DSPr2
2014-05-06 09:53:21 +00:00
DRC
bf417e56e0
Remove trailing space
2014-02-06 19:13:24 +00:00
DRC
890f35098a
Create a separate stub file for 64-bit ARM, since it currently implements only the decompression-related functions.
2014-02-05 19:03:41 +00:00
DRC
1bb1e69186
First pass at ARMv8 64-bit NEON SIMD support
2014-02-05 08:15:44 +00:00
DRC
abb6a513fa
Formatting tweaks
2014-02-05 07:39:38 +00:00
DRC
bd029eb0f7
Make environment variable syntax consistent between ARM and x86 code, and add an option to disable SIMD on x86 (this option will be added to the x86-64 code as well, but it makes more sense to add it when we add AVX support.)
2013-10-31 07:40:24 +00:00
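A minimal sketch of an environment-variable override that disables SIMD at initialization time. It assumes the convention that the variable must be set to exactly "1" to take effect. The variable name JSIMD_FORCENONE matches the name used in later libjpeg-turbo documentation for disabling SIMD, but treat it as an assumption here; env_flag_enabled() and apply_simd_env_override() are hypothetical helpers, not functions from the library.

```c
#include <stdlib.h>
#include <string.h>

/* Returns nonzero only if the variable's value is exactly "1".  Kept
   separate from getenv() so that the parsing rule is easy to test. */
static int env_flag_enabled(const char *value)
{
  return value != NULL && strcmp(value, "1") == 0;
}

/* Hypothetical SIMD initialization step: start from the detected CPU
   feature bits and clear them all if the user asked to disable SIMD.
   JSIMD_FORCENONE is an assumed variable name. */
static unsigned apply_simd_env_override(unsigned detected_features)
{
  if (env_flag_enabled(getenv("JSIMD_FORCENONE")))
    return 0;
  return detected_features;
}
```

Requiring an exact "1" rather than any non-empty value keeps the syntax the same across platforms, so a stray setting such as "0" or "yes" does not silently disable the SIMD code.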
DRC
c6c8c7911f
SIMD-accelerated integer convsamp routine for MIPS DSPr2
2013-10-12 21:39:20 +00:00
DRC
3c6b1ba545
SIMD-accelerated floating point quantize and convsamp routines for MIPS DSPr2
2013-10-09 18:39:44 +00:00
DRC
10138c9d35
SIMD-accelerated fast integer inverse DCT routine for MIPS DSPr2
2013-10-08 02:18:59 +00:00
DRC
6addfed58b
SIMD-accelerated fast integer forward DCT routine for MIPS DSPr2
2013-10-08 02:11:21 +00:00
DRC
01f46504ee
SIMD-accelerated slow integer forward DCT and quantize routines for MIPS DSPr2
2013-09-30 18:13:27 +00:00