Compare commits
2 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | a2e6a9dd47 | | |
| | 5ead57a34a | | |
134
README
@@ -1,8 +1,8 @@
 The Independent JPEG Group's JPEG software
 ==========================================
 
-README for release 6a of 7-Feb-96
-=================================
+README for release 6b of 27-Mar-1998
+====================================
 
 This distribution contains the sixth public release of the Independent JPEG
 Group's free JPEG software. You are welcome to redistribute this software and
@@ -13,9 +13,10 @@ larger programs) should contact IJG at jpeg-info@uunet.uu.net to be added to
 our electronic mailing list. Mailing list members are notified of updates
 and have a chance to participate in technical discussions, etc.
 
-This software is the work of Tom Lane, Philip Gladstone, Luis Ortiz, Jim
-Boucher, Lee Crocker, Julian Minguillon, George Phillips, Davide Rossi,
-Ge' Weijers, and other members of the Independent JPEG Group.
+This software is the work of Tom Lane, Philip Gladstone, Jim Boucher,
+Lee Crocker, Julian Minguillon, Luis Ortiz, George Phillips, Davide Rossi,
+Guido Vollbeding, Ge' Weijers, and other members of the Independent JPEG
+Group.
 
 IJG is not affiliated with the official ISO JPEG standards committee.
 
@@ -126,7 +127,7 @@ with respect to this software, its quality, accuracy, merchantability, or
 fitness for a particular purpose. This software is provided "AS IS", and you,
 its user, assume the entire risk as to its quality and accuracy.
 
-This software is copyright (C) 1991-1996, Thomas G. Lane.
+This software is copyright (C) 1991-1998, Thomas G. Lane.
 All Rights Reserved except as specified below.
 
 Permission is hereby granted to use, copy, modify, and distribute this
@@ -166,8 +167,11 @@ ansi2knr.c for full details.) However, since ansi2knr.c is not needed as part
 of any program generated from the IJG code, this does not limit you more than
 the foregoing paragraphs do.
 
-The configuration script "configure" was produced with GNU Autoconf. It
-is copyright by the Free Software Foundation but is freely distributable.
+The Unix configuration script "configure" was produced with GNU Autoconf.
+It is copyright by the Free Software Foundation but is freely distributable.
+The same holds for its supporting scripts (config.guess, config.sub,
+ltconfig, ltmain.sh). Another support script, install-sh, is copyright
+by M.I.T. but is also freely distributable.
 
 It appears that the arithmetic coding option of the JPEG spec is covered by
 patents owned by IBM, AT&T, and Mitsubishi. Hence arithmetic coding cannot
@@ -178,13 +182,12 @@ Huffman mode, it is unlikely that very many implementations will support it.)
 So far as we are aware, there are no patent restrictions on the remaining
 code.
 
-WARNING: Unisys has begun to enforce their patent on LZW compression against
-GIF encoders and decoders. You will need a license from Unisys to use the
-included rdgif.c or wrgif.c files in a commercial or shareware application.
-At this time, Unisys is not enforcing their patent against freeware, so
-distribution of this package remains legal. However, we intend to remove
-GIF support from the IJG package as soon as a suitable replacement format
-becomes reasonably popular.
+The IJG distribution formerly included code to read and write GIF files.
+To avoid entanglement with the Unisys LZW patent, GIF reading support has
+been removed altogether, and the GIF writer has been simplified to produce
+"uncompressed GIFs". This technique does not use the LZW algorithm; the
+resulting GIF files are larger than usual, but are readable by all standard
+GIF decoders.
 
 We are required to state that
 "The Graphics Interchange Format(c) is the Copyright property of
@@ -203,21 +206,21 @@ The best short technical introduction to the JPEG compression algorithm is
 Communications of the ACM, April 1991 (vol. 34 no. 4), pp. 30-44.
 (Adjacent articles in that issue discuss MPEG motion picture compression,
 applications of JPEG, and related topics.) If you don't have the CACM issue
-handy, a PostScript file containing a revised version of Wallace's article
-is available at ftp.uu.net, graphics/jpeg/wallace.ps.gz. The file (actually
+handy, a PostScript file containing a revised version of Wallace's article is
+available at ftp://ftp.uu.net/graphics/jpeg/wallace.ps.gz. The file (actually
 a preprint for an article that appeared in IEEE Trans. Consumer Electronics)
 omits the sample images that appeared in CACM, but it includes corrections
-and some added material. Note: the Wallace article is copyright ACM and
-IEEE, and it may not be used for commercial purposes.
+and some added material. Note: the Wallace article is copyright ACM and IEEE,
+and it may not be used for commercial purposes.
 
 A somewhat less technical, more leisurely introduction to JPEG can be found in
-"The Data Compression Book" by Mark Nelson, published by M&T Books (Redwood
-City, CA), 1991, ISBN 1-55851-216-0. This book provides good explanations and
-example C code for a multitude of compression methods including JPEG. It is
-an excellent source if you are comfortable reading C code but don't know much
-about data compression in general. The book's JPEG sample code is far from
-industrial-strength, but when you are ready to look at a full implementation,
-you've got one here...
+"The Data Compression Book" by Mark Nelson and Jean-loup Gailly, published by
+M&T Books (New York), 2nd ed. 1996, ISBN 1-55851-434-1. This book provides
+good explanations and example C code for a multitude of compression methods
+including JPEG. It is an excellent source if you are comfortable reading C
+code but don't know much about data compression in general. The book's JPEG
+sample code is far from industrial-strength, but when you are ready to look
+at a full implementation, you've got one here...
 
 The best full description of JPEG is the textbook "JPEG Still Image Data
 Compression Standard" by William B. Pennebaker and Joan L. Mitchell, published
@@ -242,10 +245,9 @@ Part 1: Requirements and guidelines" and has document numbers ISO/IEC IS
 Continuous-tone Still Images, Part 2: Compliance testing" and has document
 numbers ISO/IEC IS 10918-2, ITU-T T.83.
 
-Extensions to the original JPEG standard are defined in JPEG Part 3, a new ISO
-document. Part 3 is undergoing ISO balloting and is expected to be approved
-by the end of 1995; it will have document numbers ISO/IEC IS 10918-3, ITU-T
-T.84. IJG currently does not support any Part 3 extensions.
+Some extensions to the original JPEG standard are defined in JPEG Part 3,
+a newer ISO standard numbered ISO/IEC IS 10918-3 and ITU-T T.84. IJG
+currently does not support any Part 3 extensions.
 
 The JPEG standard does not specify all details of an interchangeable file
 format. For the omitted details we follow the "JFIF" conventions, revision
@@ -255,24 +257,22 @@ format. For the omitted details we follow the "JFIF" conventions, revision
 1778 McCarthy Blvd.
 Milpitas, CA 95035
 phone (408) 944-6300, fax (408) 944-6314
-A PostScript version of this document is available at ftp.uu.net, file
-graphics/jpeg/jfif.ps.gz. It can also be obtained by e-mail from the C-Cube
-mail server, netlib@c3.pla.ca.us. Send the message "send jfif_ps from jpeg"
-to the server to obtain the JFIF document; send the message "help" if you have
-trouble.
+A PostScript version of this document is available by FTP at
+ftp://ftp.uu.net/graphics/jpeg/jfif.ps.gz. There is also a plain text
+version at ftp://ftp.uu.net/graphics/jpeg/jfif.txt.gz, but it is missing
+the figures.
 
-The TIFF 6.0 file format specification can be obtained by FTP from sgi.com
-(192.48.153.1), file graphics/tiff/TIFF6.ps.Z; or you can order a printed
-copy from Aldus Corp. at (206) 628-6593. The JPEG incorporation scheme
+The TIFF 6.0 file format specification can be obtained by FTP from
+ftp://ftp.sgi.com/graphics/tiff/TIFF6.ps.gz. The JPEG incorporation scheme
 found in the TIFF 6.0 spec of 3-June-92 has a number of serious problems.
 IJG does not recommend use of the TIFF 6.0 design (TIFF Compression tag 6).
 Instead, we recommend the JPEG design proposed by TIFF Technical Note #2
-(Compression tag 7). Copies of this Note can be obtained from sgi.com or
-from ftp.uu.net:/graphics/jpeg/. It is expected that the next revision of
-the TIFF spec will replace the 6.0 JPEG design with the Note's design.
+(Compression tag 7). Copies of this Note can be obtained from ftp.sgi.com or
+from ftp://ftp.uu.net/graphics/jpeg/. It is expected that the next revision
+of the TIFF spec will replace the 6.0 JPEG design with the Note's design.
 Although IJG's own code does not support TIFF/JPEG, the free libtiff library
 uses our library to implement TIFF/JPEG per the Note. libtiff is available
-from sgi.com:/graphics/tiff/.
+from ftp://ftp.sgi.com/graphics/tiff/.
 
 
 ARCHIVE LOCATIONS
@@ -281,26 +281,27 @@ ARCHIVE LOCATIONS
 The "official" archive site for this software is ftp.uu.net (Internet
 address 192.48.96.9). The most recent released version can always be found
 there in directory graphics/jpeg. This particular version will be archived
-as graphics/jpeg/jpegsrc.v6a.tar.gz. If you are on the Internet, you
-can retrieve files from ftp.uu.net by standard anonymous FTP. If you don't
-have FTP access, UUNET's archives are also available via UUCP; contact
+as ftp://ftp.uu.net/graphics/jpeg/jpegsrc.v6b.tar.gz. If you don't have
+direct Internet access, UUNET's archives are also available via UUCP; contact
 help@uunet.uu.net for information on retrieving files that way.
 
 Numerous Internet sites maintain copies of the UUNET files. However, only
 ftp.uu.net is guaranteed to have the latest official version.
 
 You can also obtain this software in DOS-compatible "zip" archive format from
-the SimTel archives (ftp.coast.net:/SimTel/msdos/graphics/), or on CompuServe
-in the Graphics Support forum (GO CIS:GRAPHSUP), library 12 "JPEG Tools".
-Again, these versions may sometimes lag behind the ftp.uu.net release.
+the SimTel archives (ftp://ftp.simtel.net/pub/simtelnet/msdos/graphics/), or
+on CompuServe in the Graphics Support forum (GO CIS:GRAPHSUP), library 12
+"JPEG Tools". Again, these versions may sometimes lag behind the ftp.uu.net
+release.
 
 The JPEG FAQ (Frequently Asked Questions) article is a useful source of
 general information about JPEG. It is updated constantly and therefore is
 not included in this distribution. The FAQ is posted every two weeks to
 Usenet newsgroups comp.graphics.misc, news.answers, and other groups.
-You can always obtain the latest version from the news.answers archive at
-rtfm.mit.edu. By FTP, fetch /pub/usenet/news.answers/jpeg-faq/part1 and
-.../part2. If you don't have FTP, send e-mail to mail-server@rtfm.mit.edu
+It is available on the World Wide Web at http://www.faqs.org/faqs/jpeg-faq/
+and other news.answers archive sites, including the official news.answers
+archive at rtfm.mit.edu: ftp://rtfm.mit.edu/pub/usenet/news.answers/jpeg-faq/.
+If you don't have Web or FTP access, send e-mail to mail-server@rtfm.mit.edu
 with body
 send usenet/news.answers/jpeg-faq/part1
 send usenet/news.answers/jpeg-faq/part2
@@ -315,21 +316,20 @@ some of the more popular free and shareware viewers, and tells where to
 obtain them on Internet.
 
 If you are on a Unix machine, we highly recommend Jef Poskanzer's free
-PBMPLUS image software, which provides many useful operations on PPM-format
-image files. In particular, it can convert PPM images to and from a wide
-range of other formats. You can obtain this package by FTP from ftp.x.org
-(contrib/pbmplus*.tar.Z) or ftp.ee.lbl.gov (pbmplus*.tar.Z). There is also
-a newer update of this package called NETPBM, available from
-wuarchive.wustl.edu under directory /graphics/graphics/packages/NetPBM/.
-Unfortunately PBMPLUS/NETPBM is not nearly as portable as the IJG software
-is; you are likely to have difficulty making it work on any non-Unix machine.
+PBMPLUS software, which provides many useful operations on PPM-format image
+files. In particular, it can convert PPM images to and from a wide range of
+other formats, thus making cjpeg/djpeg considerably more useful. The latest
+version is distributed by the NetPBM group, and is available from numerous
+sites, notably ftp://wuarchive.wustl.edu/graphics/graphics/packages/NetPBM/.
+Unfortunately PBMPLUS/NETPBM is not nearly as portable as the IJG software is;
+you are likely to have difficulty making it work on any non-Unix machine.
 
 A different free JPEG implementation, written by the PVRG group at Stanford,
-is available from havefun.stanford.edu in directory pub/jpeg. This program
+is available from ftp://havefun.stanford.edu/pub/jpeg/. This program
 is designed for research and experimentation rather than production use;
 it is slower, harder to use, and less portable than the IJG code, but it
 is easier to read and modify. Also, the PVRG code supports lossless JPEG,
-which we do not.
+which we do not. (On the other hand, it doesn't do progressive JPEG.)
 
 
 FILE FORMAT WARS
@@ -370,14 +370,16 @@ use a proprietary file format!
 TO DO
 =====
 
+The major thrust for v7 will probably be improvement of visual quality.
+The current method for scaling the quantization tables is known not to be
+very good at low Q values. We also intend to investigate block boundary
+smoothing, "poor man's variable quantization", and other means of improving
+quality-vs-file-size performance without sacrificing compatibility.
+
 In future versions, we are considering supporting some of the upcoming JPEG
 Part 3 extensions --- principally, variable quantization and the SPIFF file
 format.
 
-Tuning the software for better behavior at low quality/high compression
-settings is also of interest. The current method for scaling the
-quantization tables is known not to be very good at low Q values.
-
-As always, speeding things up is high on our priority list.
+As always, speeding things up is of great interest.
 
 Please send bug reports, offers of help, etc. to jpeg-info@uunet.uu.net.
3655
aclocal.m4
vendored
Normal file
File diff suppressed because it is too large
71
altui/README.alt
Normal file
@@ -0,0 +1,71 @@
+Here is an alternate command-line user interface for the IJG JPEG software.
+It is designed for use under MS-DOS, and may also be useful on other non-Unix
+operating systems. (For that matter, this code works fine on Unix, but the
+standard command-line syntax is better on Unix because it is pipe-friendly.)
+
+With this user interface, cjpeg and djpeg accept multiple input file names
+on the command line; output file names are generated by substituting
+appropriate extensions. The user is prompted before any already-existing
+file will be overwritten. See usage.alt for details.
+
+Expansion of wild-card file specifications is useful but is not directly
+provided by this code. Most DOS C compilers have the ability to do wild-card
+expansion "behind the scenes", and we rely on that feature. On other systems,
+the shell may do it for you, as is done on Unix.
+
+Also, a DOS-specific routine is provided to determine available memory;
+this makes the -maxmemory switch unnecessary except in unusual cases.
+If you know how to determine available memory on a different system,
+you can easily add the necessary code. (And please send it along to
+jpeg-info@uunet.uu.net so we can include it in future releases!)
+
+
+INSTALLATION
+============
+
+You need to have the main IJG JPEG distribution, release 6 or later.
+Replace the standard cjpeg.c and djpeg.c files with the ones provided here.
+Then build the software as described in the main distribution's install.doc
+file, with these exceptions:
+
+* Define PROGRESS_REPORT in jconfig.h if you want the percent-done display.
+* Define NO_OVERWRITE_CHECK if you *don't* want overwrite confirmation.
+* You may ignore the USE_SETMODE and TWO_FILE_COMMANDLINE symbols discussed
+  in install.doc; these files do not use them.
+* As given, djpeg.c defaults to GIF output (not PPM output as in the standard
+  djpeg.c). If you want something different, modify DEFAULT_FMT.
+
+You may also need to do something special to enable filename wild-card
+expansion, assuming your compiler has that capability at all.
+
+Modify the standard usage.doc file as described in usage.alt. (If you want
+to use the Unix-style manual pages cjpeg.1 and djpeg.1, better fix them too.)
+
+
+Here are some specific notes for popular MS-DOS compilers:
+
+Borland C:
+Add "-DMSDOS" to CFLAGS to enable use of the DOS memory determination code.
+Link with the standard library file WILDARGS.OBJ to get wild-card expansion.
+
+Microsoft C:
+Add "-DMSDOS" to CFLAGS to enable use of the DOS memory determination code.
+Link with the standard library file SETARGV.OBJ to get wild-card expansion.
+In the versions I've used, you must also add /NOE to the linker switches to
+avoid a duplicate-symbol error from including SETARGV.
+
+DJGPP (we recommend version 2.0 or later):
+Add "-DFREE_MEM_ESTIMATE=0" to CFLAGS. Wild-card expansion is automatic.
+
+
+LEGAL ISSUES
+============
+
+This software is copyright (C) 1991-1998, Thomas G. Lane.
+Terms of distribution and use are the same as for the free IJG JPEG software;
+see its README file for details.
+
+The authors make NO WARRANTY or representation, either express or implied,
+with respect to this software, its quality, accuracy, merchantability, or
+fitness for a particular purpose. This software is provided "AS IS", and you,
+its user, assume the entire risk as to its quality and accuracy.
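README.alt states that output file names are generated by substituting appropriate extensions. The following is a minimal sketch of that kind of substitution, not the code from altui/cjpeg.c or djpeg.c (that logic is not visible in this part of the diff and may differ). The helper name `replace_extension` and its return convention are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the IJG sources): copies `in` to `out`
 * with its extension replaced by `newext` (for example ".jpg").
 * Returns 0 on success, -1 if the result would not fit in `outsize` bytes. */
static int replace_extension(const char *in, const char *newext,
                             char *out, size_t outsize)
{
  const char *dot = strrchr(in, '.');
  const char *sep = strrchr(in, '/');
  const char *bsep = strrchr(in, '\\');
  size_t stem;

  /* Treat whichever of '/' and '\' comes last as the directory separator. */
  if (bsep != NULL && (sep == NULL || bsep > sep))
    sep = bsep;
  /* A dot inside a directory name is not a file extension. */
  if (dot != NULL && sep != NULL && dot < sep)
    dot = NULL;

  stem = (dot != NULL) ? (size_t) (dot - in) : strlen(in);
  if (stem + strlen(newext) + 1 > outsize)
    return -1;
  memcpy(out, in, stem);
  strcpy(out + stem, newext);
  return 0;
}
```

With this sketch, "photo.ppm" becomes "photo.jpg", and a name without an extension simply gets the new extension appended.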
813
altui/cjpeg.c
Normal file
@@ -0,0 +1,813 @@
+/*
+ * alternate cjpeg.c
+ *
+ * Copyright (C) 1991-1998, Thomas G. Lane.
+ * This file is part of the Independent JPEG Group's software.
+ * For conditions of distribution and use, see the accompanying README file.
+ *
+ * ---------------------------------------------------------------------
+ * x86 SIMD extension for IJG JPEG library
+ * Copyright (C) 1999-2006, MIYASAKA Masaru.
+ * This file has been modified for SIMD extension.
+ * Last Modified : January 6, 2006
+ * ---------------------------------------------------------------------
+ *
+ * This file contains an alternate user interface for the JPEG compressor.
+ * One or more input files are named on the command line, and output file
+ * names are created by substituting ".jpg" for the input file's extension.
+ */
+
+#include "cdjpeg.h" /* Common decls for cjpeg/djpeg applications */
+#include "jversion.h" /* for version message */
+
+#ifdef USE_CCOMMAND /* command-line reader for Macintosh */
+#ifdef __MWERKS__
+#include <SIOUX.h> /* Metrowerks needs this */
+#include <console.h> /* ... and this */
+#endif
+#ifdef THINK_C
+#include <console.h> /* Think declares it here */
+#endif
+#endif
+
+#ifndef PATH_MAX /* ANSI maximum-pathname-length constant */
+#define PATH_MAX 256
+#endif
+
+
+/* Create the add-on message string table. */
+
+#define JMESSAGE(code,string) string ,
+
+static const char * const cdjpeg_message_table[] = {
+#include "cderror.h"
+  NULL
+};
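The message table above is built by defining `JMESSAGE` so that each entry expands to its string only, then including a header full of `JMESSAGE(...)` entries. The sketch below shows the same list-macro technique in self-contained form. The `DEMO_*` names and messages are hypothetical stand-ins for the contents of cderror.h, and the enum half is an assumption about how such a list is usually paired with codes; only the string-table half is shown in the diff.

```c
#include <stddef.h>

/* Hypothetical message list standing in for the entries in cderror.h. */
#define DEMO_MESSAGES \
  JMESSAGE(DEMO_OK, "no error") \
  JMESSAGE(DEMO_EMPTY, "Empty input file") \
  JMESSAGE(DEMO_UNKNOWN, "Unrecognized input file format")

/* First expansion: keep only the code, producing an enum of message IDs. */
#define JMESSAGE(code,string) code ,
typedef enum { DEMO_MESSAGES DEMO_LASTMSG } demo_message_code;
#undef JMESSAGE

/* Second expansion: keep only the string, producing a NULL-terminated
 * table indexed by the enum above. */
#define JMESSAGE(code,string) string ,
static const char * const demo_message_table[] = { DEMO_MESSAGES NULL };
#undef JMESSAGE
```

Because both expansions come from one list, a message code and its text cannot drift out of step when entries are added or reordered.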
+
+
+/*
+ * SIMD Ext: compiler-specific hacks to enable filename wild-card expansion
+ */
+
+#ifdef _MSC_VER /* Microsoft Visual C++ */
+/* from setargv.c (setargv.obj) */
+/* Tested under Visual C++ V6.0, Toolkit 2003, and 2005 Express Edition */
+int __cdecl _setargv(void) { int __cdecl __setargv(void); return __setargv(); }
+#endif
+#ifdef __BORLANDC__ /* Borland C++ */
+/* from wildargs.c (wildargs.obj) */
+/* Tested under Borland C++ Compiler 5.5 (win32) */
+#include <wildargs.h>
+typedef void _RTLENTRY (* _RTLENTRY _argv_expand_fnc)(char *, _PFN_ADDARG);
+_argv_expand_fnc _argv_expand_ptr = _expand_wild;
+#endif
+
+
+/*
+ * Automatic determination of available memory.
+ */
+
+static long default_maxmem; /* saves value determined at startup, or 0 */
+
+#ifndef FREE_MEM_ESTIMATE /* may be defined from command line */
+
+#ifdef MSDOS /* For MS-DOS (unless flat-memory model) */
+
+#include <dos.h> /* for access to intdos() call */
+
+LOCAL(long)
+unused_dos_memory (void)
+/* Obtain total amount of unallocated DOS memory */
+{
+  union REGS regs;
+  long nparas;
+
+  regs.h.ah = 0x48; /* DOS function Allocate Memory Block */
+  regs.x.bx = 0xFFFF; /* Ask for more memory than DOS can have */
+  (void) intdos(&regs, &regs);
+  /* DOS will fail and return # of paragraphs actually available in BX. */
+  nparas = (unsigned int) regs.x.bx;
+  /* Times 16 to convert to bytes. */
+  return nparas << 4;
+}
+
+/* The default memory setting is 95% of the available space. */
+#define FREE_MEM_ESTIMATE ((unused_dos_memory() * 95L) / 100L)
+
+#endif /* MSDOS */
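The arithmetic in the MS-DOS branch can be checked without a DOS machine. This is a minimal sketch of the two steps, converting 16-byte paragraphs to bytes and then taking 95% of the result. The helper names `paragraphs_to_bytes` and `usable_estimate` are hypothetical and do not appear in the IJG sources.

```c
/* Hypothetical helper: converts a count of 16-byte DOS paragraphs to bytes,
 * mirroring the "nparas << 4" step in unused_dos_memory(). */
static long paragraphs_to_bytes(unsigned int nparas)
{
  return (long) nparas << 4;
}

/* Hypothetical helper: applies the 95% margin used by the MS-DOS
 * definition of FREE_MEM_ESTIMATE (integer arithmetic, rounds down). */
static long usable_estimate(long free_bytes)
{
  return (free_bytes * 95L) / 100L;
}
```

For example, 0x9000 free paragraphs correspond to 589824 bytes, of which this estimate would hand 560332 bytes to the JPEG memory manager.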
+
+#ifdef ATARI /* For Atari ST/STE/TT, Pure C or Turbo C */
+
+#include <ext.h>
+
+/* The default memory setting is 90% of the available space. */
+#define FREE_MEM_ESTIMATE (((long) coreleft() * 90L) / 100L)
+
+#endif /* ATARI */
+
+/* Add memory-estimation procedures for other operating systems here,
+ * with appropriate #ifdef's around them.
+ */
+
+#endif /* !FREE_MEM_ESTIMATE */
+
+
+/*
+ * This routine determines what format the input file is,
+ * and selects the appropriate input-reading module.
+ *
+ * To determine which family of input formats the file belongs to,
+ * we may look only at the first byte of the file, since C does not
+ * guarantee that more than one character can be pushed back with ungetc.
+ * Looking at additional bytes would require one of these approaches:
+ *     1) assume we can fseek() the input file (fails for piped input);
+ *     2) assume we can push back more than one character (works in
+ *        some C implementations, but unportable);
+ *     3) provide our own buffering (breaks input readers that want to use
+ *        stdio directly, such as the RLE library);
+ * or  4) don't put back the data, and modify the input_init methods to assume
+ *        they start reading after the start of file (also breaks RLE library).
+ * #1 is attractive for MS-DOS but is untenable on Unix.
+ *
+ * The most portable solution for file types that can't be identified by their
+ * first byte is to make the user tell us what they are. This is also the
+ * only approach for "raw" file types that contain only arbitrary values.
+ * We presently apply this method for Targa files. Most of the time Targa
+ * files start with 0x00, so we recognize that case. Potentially, however,
+ * a Targa file could start with any byte value (byte 0 is the length of the
+ * seldom-used ID field), so we provide a switch to force Targa input mode.
+ */
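The comment above relies on the one character of pushback that standard C guarantees for `ungetc`. A minimal sketch of that peek-and-restore step follows. The helper `peek_first_byte` is hypothetical; the real routine reports failures through `ERREXIT` instead of returning `EOF`.

```c
#include <stdio.h>

/* Hypothetical helper: returns the first unread byte of the stream without
 * consuming it, or EOF if the stream is empty or the pushback fails. */
static int peek_first_byte(FILE *fp)
{
  int c = getc(fp);

  if (c == EOF)
    return EOF;                 /* nothing to inspect */
  if (ungetc(c, fp) == EOF)
    return EOF;                 /* pushback refused */
  return c;                     /* the byte is still the next one to be read */
}
```

A caller can switch on the returned byte ('B', 'G', 'P', 'R' or 0x00 in the code that follows) and then hand the untouched stream to the selected reader.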
+
+static boolean is_targa; /* records user -targa switch */
+
+
+LOCAL(cjpeg_source_ptr)
+select_file_type (j_compress_ptr cinfo, FILE * infile)
+{
+  int c;
+
+  if (is_targa) {
+#ifdef TARGA_SUPPORTED
+    return jinit_read_targa(cinfo);
+#else
+    ERREXIT(cinfo, JERR_TGA_NOTCOMP);
+#endif
+  }
+
+  if ((c = getc(infile)) == EOF)
+    ERREXIT(cinfo, JERR_INPUT_EMPTY);
+  if (ungetc(c, infile) == EOF)
+    ERREXIT(cinfo, JERR_UNGETC_FAILED);
+
+  switch (c) {
+#ifdef BMP_SUPPORTED
+  case 'B':
+    return jinit_read_bmp(cinfo);
+#endif
+#ifdef GIF_SUPPORTED
+  case 'G':
+    return jinit_read_gif(cinfo);
+#endif
+#ifdef PPM_SUPPORTED
+  case 'P':
+    return jinit_read_ppm(cinfo);
+#endif
+#ifdef RLE_SUPPORTED
+  case 'R':
+    return jinit_read_rle(cinfo);
+#endif
+#ifdef TARGA_SUPPORTED
+  case 0x00:
+    return jinit_read_targa(cinfo);
+#endif
+  default:
+    ERREXIT(cinfo, JERR_UNKNOWN_FORMAT);
+    break;
+  }
+
+  return NULL; /* suppress compiler warnings */
+}
+
+
+/*
+ * Argument-parsing code.
+ * The switch parser is designed to be useful with DOS-style command line
+ * syntax, ie, intermixed switches and file names, where only the switches
+ * to the left of a given file name affect processing of that file.
+ */
|
||||||
|
|
||||||
|
|
||||||
|
static const char * progname; /* program name for error messages */
|
||||||
|
static char * outfilename; /* for -outfile switch */
|
||||||
|
|
||||||
|
|
||||||
|
LOCAL(void)
|
||||||
|
usage (void)
|
||||||
|
/* complain about bad command line */
|
||||||
|
{
|
||||||
|
fprintf(stderr, "usage: %s [switches] inputfile(s)\n", progname);
|
||||||
|
fprintf(stderr, "List of input files may use wildcards (* and ?)\n");
|
||||||
|
fprintf(stderr, "Output filename is same as input filename, but extension .jpg\n");
|
||||||
|
|
||||||
|
fprintf(stderr, "Switches (names may be abbreviated):\n");
|
||||||
|
fprintf(stderr, " -quality N Compression quality (0..100; 5-95 is useful range)\n");
|
||||||
|
fprintf(stderr, " -grayscale Create monochrome JPEG file\n");
|
||||||
|
#ifdef ENTROPY_OPT_SUPPORTED
|
||||||
|
fprintf(stderr, " -optimize Optimize Huffman table (smaller file, but slow compression)\n");
|
||||||
|
#endif
|
||||||
|
#ifdef C_PROGRESSIVE_SUPPORTED
|
||||||
|
fprintf(stderr, " -progressive Create progressive JPEG file\n");
|
||||||
|
#endif
|
||||||
|
#ifdef TARGA_SUPPORTED
|
||||||
|
fprintf(stderr, " -targa Input file is Targa format (usually not needed)\n");
|
||||||
|
#endif
|
||||||
|
fprintf(stderr, "Switches for advanced users:\n");
|
||||||
|
#ifdef DCT_ISLOW_SUPPORTED
|
||||||
|
fprintf(stderr, " -dct int Use integer DCT method%s\n",
|
||||||
|
(JDCT_DEFAULT == JDCT_ISLOW ? " (default)" : ""));
|
||||||
|
#endif
|
||||||
|
#ifdef DCT_IFAST_SUPPORTED
|
||||||
|
fprintf(stderr, " -dct fast Use fast integer DCT (less accurate)%s\n",
|
||||||
|
(JDCT_DEFAULT == JDCT_IFAST ? " (default)" : ""));
|
||||||
|
#endif
|
||||||
|
#ifdef DCT_FLOAT_SUPPORTED
|
||||||
|
fprintf(stderr, " -dct float Use floating-point DCT method%s\n",
|
||||||
|
(JDCT_DEFAULT == JDCT_FLOAT ? " (default)" : ""));
|
||||||
|
#endif
|
||||||
|
fprintf(stderr, " -restart N Set restart interval in rows, or in blocks with B\n");
|
||||||
|
#ifdef INPUT_SMOOTHING_SUPPORTED
|
||||||
|
fprintf(stderr, " -smooth N Smooth dithered input (N=1..100 is strength)\n");
|
||||||
|
#endif
|
||||||
|
#ifndef FREE_MEM_ESTIMATE
|
||||||
|
fprintf(stderr, " -maxmemory N Maximum memory to use (in kbytes)\n");
|
||||||
|
#endif
|
||||||
|
fprintf(stderr, " -outfile name Specify name for output file\n");
|
||||||
|
fprintf(stderr, " -verbose or -debug Emit debug output\n");
|
||||||
|
fprintf(stderr, "Switches for wizards:\n");
|
||||||
|
#ifdef C_ARITH_CODING_SUPPORTED
|
||||||
|
fprintf(stderr, " -arithmetic Use arithmetic coding\n");
|
||||||
|
#endif
|
||||||
|
fprintf(stderr, " -baseline Force baseline quantization tables\n");
|
||||||
|
fprintf(stderr, " -qtables file Use quantization tables given in file\n");
|
||||||
|
fprintf(stderr, " -qslots N[,...] Set component quantization tables\n");
|
||||||
|
fprintf(stderr, " -sample HxV[,...] Set component sampling factors\n");
|
||||||
|
#ifdef C_MULTISCAN_FILES_SUPPORTED
|
||||||
|
fprintf(stderr, " -scans file Create multi-scan JPEG per script file\n");
|
||||||
|
#endif
|
||||||
|
exit(EXIT_FAILURE);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
#ifndef JSIMD_MODEINFO_NOT_SUPPORTED
|
||||||
|
|
||||||
|
LOCAL(void)
|
||||||
|
print_simd_info (FILE * file, char * labelstr, unsigned int simd)
|
||||||
|
{
|
||||||
|
fprintf(file, "%s%s%s%s%s%s\n", labelstr,
|
||||||
|
simd & JSIMD_MMX ? " MMX" : "",
|
||||||
|
simd & JSIMD_3DNOW ? " 3DNow!" : "",
|
||||||
|
simd & JSIMD_SSE ? " SSE" : "",
|
||||||
|
simd & JSIMD_SSE2 ? " SSE2" : "",
|
||||||
|
simd == JSIMD_NONE ? " NONE" : "");
|
||||||
|
}
|
||||||
|
|
||||||
|
#endif /* !JSIMD_MODEINFO_NOT_SUPPORTED */
|
||||||
|
|
||||||
|
|
||||||
|
LOCAL(int)
|
||||||
|
parse_switches (j_compress_ptr cinfo, int argc, char **argv,
|
||||||
|
int last_file_arg_seen, boolean for_real)
|
||||||
|
/* Parse optional switches.
|
||||||
|
* Returns argv[] index of first file-name argument (== argc if none).
|
||||||
|
* Any file names with indexes <= last_file_arg_seen are ignored;
|
||||||
|
* they have presumably been processed in a previous iteration.
|
||||||
|
* (Pass 0 for last_file_arg_seen on the first or only iteration.)
|
||||||
|
* for_real is FALSE on the first (dummy) pass; we may skip any expensive
|
||||||
|
* processing.
|
||||||
|
*/
|
||||||
|
{
|
||||||
|
int argn;
|
||||||
|
char * arg;
|
||||||
|
int quality; /* -quality parameter */
|
||||||
|
int q_scale_factor; /* scaling percentage for -qtables */
|
||||||
|
boolean force_baseline;
|
||||||
|
boolean simple_progressive;
|
||||||
|
char * qtablefile = NULL; /* saves -qtables filename if any */
|
||||||
|
char * qslotsarg = NULL; /* saves -qslots parm if any */
|
||||||
|
char * samplearg = NULL; /* saves -sample parm if any */
|
||||||
|
char * scansarg = NULL; /* saves -scans parm if any */
|
||||||
|
|
||||||
|
/* Set up default JPEG parameters. */
|
||||||
|
/* Note that default -quality level need not, and does not,
|
||||||
|
* match the default scaling for an explicit -qtables argument.
|
||||||
|
*/
|
||||||
|
quality = 75; /* default -quality value */
|
||||||
|
q_scale_factor = 100; /* default to no scaling for -qtables */
|
||||||
|
force_baseline = FALSE; /* by default, allow 16-bit quantizers */
|
||||||
|
simple_progressive = FALSE;
|
||||||
|
is_targa = FALSE;
|
||||||
|
outfilename = NULL;
|
||||||
|
cinfo->err->trace_level = 0;
|
||||||
|
if (default_maxmem > 0) /* override library's default value */
|
||||||
|
cinfo->mem->max_memory_to_use = default_maxmem;
|
||||||
|
|
||||||
|
/* Scan command line options, adjust parameters */
|
||||||
|
|
||||||
|
for (argn = 1; argn < argc; argn++) {
|
||||||
|
arg = argv[argn];
|
||||||
|
if (*arg != '-') {
|
||||||
|
/* Not a switch, must be a file name argument */
|
||||||
|
if (argn <= last_file_arg_seen) {
|
||||||
|
outfilename = NULL; /* -outfile applies to just one input file */
|
||||||
|
continue; /* ignore this name if previously processed */
|
||||||
|
}
|
||||||
|
break; /* else done parsing switches */
|
||||||
|
}
|
||||||
|
arg++; /* advance past switch marker character */
|
||||||
|
|
||||||
|
if (keymatch(arg, "arithmetic", 1)) {
|
||||||
|
/* Use arithmetic coding. */
|
||||||
|
#ifdef C_ARITH_CODING_SUPPORTED
|
||||||
|
cinfo->arith_code = TRUE;
|
||||||
|
#else
|
||||||
|
fprintf(stderr, "%s: sorry, arithmetic coding not supported\n",
|
||||||
|
progname);
|
||||||
|
exit(EXIT_FAILURE);
|
||||||
|
#endif
|
||||||
|
|
||||||
|
} else if (keymatch(arg, "baseline", 1)) {
|
||||||
|
/* Force baseline-compatible output (8-bit quantizer values). */
|
||||||
|
force_baseline = TRUE;
|
||||||
|
|
||||||
|
#ifndef JSIMD_MASKFUNC_NOT_SUPPORTED
|
||||||
|
} else if (keymatch(arg, "nosimd" , 4)) {
|
||||||
|
jpeg_simd_mask((j_common_ptr) cinfo, JSIMD_NONE, JSIMD_ALL);
|
||||||
|
} else if (keymatch(arg, "nommx" , 3)) {
|
||||||
|
jpeg_simd_mask((j_common_ptr) cinfo, JSIMD_NONE, JSIMD_MMX);
|
||||||
|
} else if (keymatch(arg, "no3dnow", 3)) {
|
||||||
|
jpeg_simd_mask((j_common_ptr) cinfo, JSIMD_NONE, JSIMD_3DNOW);
|
||||||
|
} else if (keymatch(arg, "nosse" , 4)) {
|
||||||
|
jpeg_simd_mask((j_common_ptr) cinfo, JSIMD_NONE, JSIMD_SSE);
|
||||||
|
} else if (keymatch(arg, "nosse2" , 6)) {
|
||||||
|
jpeg_simd_mask((j_common_ptr) cinfo, JSIMD_NONE, JSIMD_SSE2);
|
||||||
|
#endif /* !JSIMD_MASKFUNC_NOT_SUPPORTED */
|
||||||
|
|
||||||
|
} else if (keymatch(arg, "dct", 2)) {
|
||||||
|
/* Select DCT algorithm. */
|
||||||
|
if (++argn >= argc) /* advance to next argument */
|
||||||
|
usage();
|
||||||
|
if (keymatch(argv[argn], "int", 1)) {
|
||||||
|
cinfo->dct_method = JDCT_ISLOW;
|
||||||
|
} else if (keymatch(argv[argn], "fast", 2)) {
|
||||||
|
cinfo->dct_method = JDCT_IFAST;
|
||||||
|
} else if (keymatch(argv[argn], "float", 2)) {
|
||||||
|
cinfo->dct_method = JDCT_FLOAT;
|
||||||
|
} else
|
||||||
|
usage();
|
||||||
|
|
||||||
|
} else if (keymatch(arg, "debug", 1) || keymatch(arg, "verbose", 1)) {
|
||||||
|
/* Enable debug printouts. */
|
||||||
|
/* On first -d, print version identification */
|
||||||
|
static boolean printed_version = FALSE;
|
||||||
|
|
||||||
|
if (! printed_version) {
|
||||||
|
fprintf(stderr, "Independent JPEG Group's CJPEG, version %s\n%s\n",
|
||||||
|
JVERSION, JCOPYRIGHT);
|
||||||
|
fprintf(stderr,
|
||||||
|
"\nx86 SIMD extension for IJG JPEG library, version %s\n\n",
|
||||||
|
JPEG_SIMDEXT_VER_STR);
|
||||||
|
#ifndef JSIMD_MODEINFO_NOT_SUPPORTED
|
||||||
|
print_simd_info(stderr, "SIMD instructions supported by the system :",
|
||||||
|
jpeg_simd_support(NULL));
|
||||||
|
|
||||||
|
fprintf(stderr, "\n === SIMD Operation Modes ===\n");
|
||||||
|
#ifdef DCT_ISLOW_SUPPORTED
|
||||||
|
print_simd_info(stderr, "Accurate integer DCT (-dct int) :",
|
||||||
|
jpeg_simd_forward_dct(cinfo, JDCT_ISLOW));
|
||||||
|
#endif
|
||||||
|
#ifdef DCT_IFAST_SUPPORTED
|
||||||
|
print_simd_info(stderr, "Fast integer DCT (-dct fast) :",
|
||||||
|
jpeg_simd_forward_dct(cinfo, JDCT_IFAST));
|
||||||
|
#endif
|
||||||
|
#ifdef DCT_FLOAT_SUPPORTED
|
||||||
|
print_simd_info(stderr, "Floating-point DCT (-dct float) :",
|
||||||
|
jpeg_simd_forward_dct(cinfo, JDCT_FLOAT));
|
||||||
|
#endif
|
||||||
|
print_simd_info(stderr, "Downsampling (-sample 2x2 or 2x1) :",
|
||||||
|
jpeg_simd_downsampler(cinfo));
|
||||||
|
print_simd_info(stderr, "Colorspace conversion (RGB->YCbCr) :",
|
||||||
|
jpeg_simd_color_converter(cinfo));
|
||||||
|
fprintf(stderr, "\n");
|
||||||
|
#endif /* !JSIMD_MODEINFO_NOT_SUPPORTED */
|
||||||
|
printed_version = TRUE;
|
||||||
|
}
|
||||||
|
cinfo->err->trace_level++;
|
||||||
|
|
||||||
|
} else if (keymatch(arg, "grayscale", 2) || keymatch(arg, "greyscale",2)) {
|
||||||
|
/* Force a monochrome JPEG file to be generated. */
|
||||||
|
jpeg_set_colorspace(cinfo, JCS_GRAYSCALE);
|
||||||
|
|
||||||
|
} else if (keymatch(arg, "maxmemory", 3)) {
|
||||||
|
/* Maximum memory in Kb (or Mb with 'm'). */
|
||||||
|
long lval;
|
||||||
|
char ch = 'x';
|
||||||
|
|
||||||
|
if (++argn >= argc) /* advance to next argument */
|
||||||
|
usage();
|
||||||
|
if (sscanf(argv[argn], "%ld%c", &lval, &ch) < 1)
|
||||||
|
usage();
|
||||||
|
if (ch == 'm' || ch == 'M')
|
||||||
|
lval *= 1000L;
|
||||||
|
cinfo->mem->max_memory_to_use = lval * 1000L;
|
||||||
|
|
||||||
|
} else if (keymatch(arg, "optimize", 1) || keymatch(arg, "optimise", 1)) {
|
||||||
|
/* Enable entropy parm optimization. */
|
||||||
|
#ifdef ENTROPY_OPT_SUPPORTED
|
||||||
|
cinfo->optimize_coding = TRUE;
|
||||||
|
#else
|
||||||
|
fprintf(stderr, "%s: sorry, entropy optimization was not compiled\n",
|
||||||
|
progname);
|
||||||
|
exit(EXIT_FAILURE);
|
||||||
|
#endif
|
||||||
|
|
||||||
|
} else if (keymatch(arg, "outfile", 4)) {
|
||||||
|
/* Set output file name. */
|
||||||
|
if (++argn >= argc) /* advance to next argument */
|
||||||
|
usage();
|
||||||
|
outfilename = argv[argn]; /* save it away for later use */
|
||||||
|
|
||||||
|
} else if (keymatch(arg, "progressive", 1)) {
|
||||||
|
/* Select simple progressive mode. */
|
||||||
|
#ifdef C_PROGRESSIVE_SUPPORTED
|
||||||
|
simple_progressive = TRUE;
|
||||||
|
/* We must postpone execution until num_components is known. */
|
||||||
|
#else
|
||||||
|
fprintf(stderr, "%s: sorry, progressive output was not compiled\n",
|
||||||
|
progname);
|
||||||
|
exit(EXIT_FAILURE);
|
||||||
|
#endif
|
||||||
|
|
||||||
|
} else if (keymatch(arg, "quality", 1)) {
|
||||||
|
/* Quality factor (quantization table scaling factor). */
|
||||||
|
if (++argn >= argc) /* advance to next argument */
|
||||||
|
usage();
|
||||||
|
if (sscanf(argv[argn], "%d", &quality) != 1)
|
||||||
|
usage();
|
||||||
|
/* Change scale factor in case -qtables is present. */
|
||||||
|
q_scale_factor = jpeg_quality_scaling(quality);
|
||||||
|
|
||||||
|
} else if (keymatch(arg, "qslots", 2)) {
|
||||||
|
/* Quantization table slot numbers. */
|
||||||
|
if (++argn >= argc) /* advance to next argument */
|
||||||
|
usage();
|
||||||
|
qslotsarg = argv[argn];
|
||||||
|
/* Must delay setting qslots until after we have processed any
|
||||||
|
* colorspace-determining switches, since jpeg_set_colorspace sets
|
||||||
|
* default quant table numbers.
|
||||||
|
*/
|
||||||
|
|
||||||
|
} else if (keymatch(arg, "qtables", 2)) {
|
||||||
|
/* Quantization tables fetched from file. */
|
||||||
|
if (++argn >= argc) /* advance to next argument */
|
||||||
|
usage();
|
||||||
|
qtablefile = argv[argn];
|
||||||
|
/* We postpone actually reading the file in case -quality comes later. */
|
||||||
|
|
||||||
|
} else if (keymatch(arg, "restart", 1)) {
|
||||||
|
/* Restart interval in MCU rows (or in MCUs with 'b'). */
|
||||||
|
long lval;
|
||||||
|
char ch = 'x';
|
||||||
|
|
||||||
|
if (++argn >= argc) /* advance to next argument */
|
||||||
|
usage();
|
||||||
|
if (sscanf(argv[argn], "%ld%c", &lval, &ch) < 1)
|
||||||
|
usage();
|
||||||
|
if (lval < 0 || lval > 65535L)
|
||||||
|
usage();
|
||||||
|
if (ch == 'b' || ch == 'B') {
|
||||||
|
cinfo->restart_interval = (unsigned int) lval;
|
||||||
|
cinfo->restart_in_rows = 0; /* else prior '-restart n' overrides me */
|
||||||
|
} else {
|
||||||
|
cinfo->restart_in_rows = (int) lval;
|
||||||
|
/* restart_interval will be computed during startup */
|
||||||
|
}
|
||||||
|
|
||||||
|
} else if (keymatch(arg, "sample", 2)) {
|
||||||
|
/* Set sampling factors. */
|
||||||
|
if (++argn >= argc) /* advance to next argument */
|
||||||
|
usage();
|
||||||
|
samplearg = argv[argn];
|
||||||
|
/* Must delay setting sample factors until after we have processed any
|
||||||
|
* colorspace-determining switches, since jpeg_set_colorspace sets
|
||||||
|
* default sampling factors.
|
||||||
|
*/
|
||||||
|
|
||||||
|
} else if (keymatch(arg, "scans", 2)) {
|
||||||
|
/* Set scan script. */
|
||||||
|
#ifdef C_MULTISCAN_FILES_SUPPORTED
|
||||||
|
if (++argn >= argc) /* advance to next argument */
|
||||||
|
usage();
|
||||||
|
scansarg = argv[argn];
|
||||||
|
/* We must postpone reading the file in case -progressive appears. */
|
||||||
|
#else
|
||||||
|
fprintf(stderr, "%s: sorry, multi-scan output was not compiled\n",
|
||||||
|
progname);
|
||||||
|
exit(EXIT_FAILURE);
|
||||||
|
#endif
|
||||||
|
|
||||||
|
} else if (keymatch(arg, "smooth", 2)) {
|
||||||
|
/* Set input smoothing factor. */
|
||||||
|
int val;
|
||||||
|
|
||||||
|
if (++argn >= argc) /* advance to next argument */
|
||||||
|
usage();
|
||||||
|
if (sscanf(argv[argn], "%d", &val) != 1)
|
||||||
|
usage();
|
||||||
|
if (val < 0 || val > 100)
|
||||||
|
usage();
|
||||||
|
cinfo->smoothing_factor = val;
|
||||||
|
|
||||||
|
} else if (keymatch(arg, "targa", 1)) {
|
||||||
|
/* Input file is Targa format. */
|
||||||
|
is_targa = TRUE;
|
||||||
|
|
||||||
|
} else {
|
||||||
|
usage(); /* bogus switch */
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Post-switch-scanning cleanup */
|
||||||
|
|
||||||
|
if (for_real) {
|
||||||
|
|
||||||
|
/* Set quantization tables for selected quality. */
|
||||||
|
/* Some or all may be overridden if -qtables is present. */
|
||||||
|
jpeg_set_quality(cinfo, quality, force_baseline);
|
||||||
|
|
||||||
|
if (qtablefile != NULL) /* process -qtables if it was present */
|
||||||
|
if (! read_quant_tables(cinfo, qtablefile,
|
||||||
|
q_scale_factor, force_baseline))
|
||||||
|
usage();
|
||||||
|
|
||||||
|
if (qslotsarg != NULL) /* process -qslots if it was present */
|
||||||
|
if (! set_quant_slots(cinfo, qslotsarg))
|
||||||
|
usage();
|
||||||
|
|
||||||
|
if (samplearg != NULL) /* process -sample if it was present */
|
||||||
|
if (! set_sample_factors(cinfo, samplearg))
|
||||||
|
usage();
|
||||||
|
|
||||||
|
#ifdef C_PROGRESSIVE_SUPPORTED
|
||||||
|
if (simple_progressive) /* process -progressive; -scans can override */
|
||||||
|
jpeg_simple_progression(cinfo);
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#ifdef C_MULTISCAN_FILES_SUPPORTED
|
||||||
|
if (scansarg != NULL) /* process -scans if it was present */
|
||||||
|
if (! read_scan_script(cinfo, scansarg))
|
||||||
|
usage();
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
return argn; /* return index of next arg (file name) */
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Check for overwrite of an existing file; clear it with user
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef NO_OVERWRITE_CHECK
|
||||||
|
|
||||||
|
LOCAL(boolean)
|
||||||
|
is_write_ok (char * outfname)
|
||||||
|
{
|
||||||
|
FILE * ofile;
|
||||||
|
int ch;
|
||||||
|
|
||||||
|
ofile = fopen(outfname, READ_BINARY);
|
||||||
|
if (ofile == NULL)
|
||||||
|
return TRUE; /* not present */
|
||||||
|
fclose(ofile); /* oops, it is present */
|
||||||
|
|
||||||
|
for (;;) {
|
||||||
|
fprintf(stderr, "%s already exists, overwrite it? [y/n] ",
|
||||||
|
outfname);
|
||||||
|
fflush(stderr);
|
||||||
|
ch = getc(stdin);
|
||||||
|
if (ch != '\n') /* flush rest of line */
|
||||||
|
while (getc(stdin) != '\n')
|
||||||
|
/* nothing */;
|
||||||
|
|
||||||
|
switch (ch) {
|
||||||
|
case 'Y':
|
||||||
|
case 'y':
|
||||||
|
return TRUE;
|
||||||
|
case 'N':
|
||||||
|
case 'n':
|
||||||
|
return FALSE;
|
||||||
|
/* otherwise, ask again */
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#endif
|
||||||
|
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Process a single input file name, and return its index in argv[].
|
||||||
|
* File names at or to left of old_file_index have been processed already.
|
||||||
|
*/
|
||||||
|
|
||||||
|
LOCAL(int)
|
||||||
|
process_one_file (int argc, char **argv, int old_file_index)
|
||||||
|
{
|
||||||
|
struct jpeg_compress_struct cinfo;
|
||||||
|
struct jpeg_error_mgr jerr;
|
||||||
|
char *infilename;
|
||||||
|
char workfilename[PATH_MAX];
|
||||||
|
#ifdef PROGRESS_REPORT
|
||||||
|
struct cdjpeg_progress_mgr progress;
|
||||||
|
#endif
|
||||||
|
int file_index;
|
||||||
|
cjpeg_source_ptr src_mgr;
|
||||||
|
FILE * input_file = NULL;
|
||||||
|
FILE * output_file = NULL;
|
||||||
|
JDIMENSION num_scanlines;
|
||||||
|
|
||||||
|
/* Initialize the JPEG compression object with default error handling. */
|
||||||
|
cinfo.err = jpeg_std_error(&jerr);
|
||||||
|
jpeg_create_compress(&cinfo);
|
||||||
|
/* Add some application-specific error messages (from cderror.h) */
|
||||||
|
jerr.addon_message_table = cdjpeg_message_table;
|
||||||
|
jerr.first_addon_message = JMSG_FIRSTADDONCODE;
|
||||||
|
jerr.last_addon_message = JMSG_LASTADDONCODE;
|
||||||
|
|
||||||
|
/* Now safe to enable signal catcher. */
|
||||||
|
#ifdef NEED_SIGNAL_CATCHER
|
||||||
|
enable_signal_catcher((j_common_ptr) &cinfo);
|
||||||
|
#endif
|
||||||
|
|
||||||
|
/* Initialize JPEG parameters.
|
||||||
|
* Much of this may be overridden later.
|
||||||
|
* In particular, we don't yet know the input file's color space,
|
||||||
|
* but we need to provide some value for jpeg_set_defaults() to work.
|
||||||
|
*/
|
||||||
|
|
||||||
|
cinfo.in_color_space = JCS_RGB; /* arbitrary guess */
|
||||||
|
jpeg_set_defaults(&cinfo);
|
||||||
|
|
||||||
|
/* Scan command line to find next file name.
|
||||||
|
* It is convenient to use just one switch-parsing routine, but the switch
|
||||||
|
* values read here are ignored; we will rescan the switches after opening
|
||||||
|
* the input file.
|
||||||
|
*/
|
||||||
|
|
||||||
|
file_index = parse_switches(&cinfo, argc, argv, old_file_index, FALSE);
|
||||||
|
if (file_index >= argc) {
|
||||||
|
fprintf(stderr, "%s: missing input file name\n", progname);
|
||||||
|
usage();
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Open the input file. */
|
||||||
|
infilename = argv[file_index];
|
||||||
|
if ((input_file = fopen(infilename, READ_BINARY)) == NULL) {
|
||||||
|
fprintf(stderr, "%s: can't open %s\n", progname, infilename);
|
||||||
|
goto fail;
|
||||||
|
}
|
||||||
|
|
||||||
|
#ifdef PROGRESS_REPORT
|
||||||
|
start_progress_monitor((j_common_ptr) &cinfo, &progress);
|
||||||
|
#endif
|
||||||
|
|
||||||
|
/* Figure out the input file format, and set up to read it. */
|
||||||
|
src_mgr = select_file_type(&cinfo, input_file);
|
||||||
|
src_mgr->input_file = input_file;
|
||||||
|
|
||||||
|
/* Read the input file header to obtain file size & colorspace. */
|
||||||
|
(*src_mgr->start_input) (&cinfo, src_mgr);
|
||||||
|
|
||||||
|
/* Now that we know input colorspace, fix colorspace-dependent defaults */
|
||||||
|
jpeg_default_colorspace(&cinfo);
|
||||||
|
|
||||||
|
/* Adjust default compression parameters by re-parsing the options */
|
||||||
|
file_index = parse_switches(&cinfo, argc, argv, old_file_index, TRUE);
|
||||||
|
|
||||||
|
/* If user didn't supply -outfile switch, select output file name. */
|
||||||
|
if (outfilename == NULL) {
|
||||||
|
int i;
|
||||||
|
|
||||||
|
outfilename = workfilename;
|
||||||
|
/* Make outfilename be infilename with .jpg substituted for extension */
|
||||||
|
strcpy(outfilename, infilename);
|
||||||
|
for (i = strlen(outfilename)-1; i >= 0; i--) {
|
||||||
|
switch (outfilename[i]) {
|
||||||
|
case ':':
|
||||||
|
case '/':
|
||||||
|
case '\\':
|
||||||
|
i = 0; /* stop scanning */
|
||||||
|
break;
|
||||||
|
case '.':
|
||||||
|
outfilename[i] = '\0'; /* lop off existing extension */
|
||||||
|
i = 0; /* stop scanning */
|
||||||
|
break;
|
||||||
|
default:
|
||||||
|
break; /* keep scanning */
|
||||||
|
}
|
||||||
|
}
|
||||||
|
strcat(outfilename, ".jpg");
|
||||||
|
}
|
||||||
|
|
||||||
|
fprintf(stderr, "Compressing %s => %s\n", infilename, outfilename);
|
||||||
|
#ifndef NO_OVERWRITE_CHECK
|
||||||
|
if (! is_write_ok(outfilename))
|
||||||
|
goto fail;
|
||||||
|
#endif
|
||||||
|
|
||||||
|
/* Open the output file. */
|
||||||
|
if ((output_file = fopen(outfilename, WRITE_BINARY)) == NULL) {
|
||||||
|
fprintf(stderr, "%s: can't create %s\n", progname, outfilename);
|
||||||
|
goto fail;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Specify data destination for compression */
|
||||||
|
jpeg_stdio_dest(&cinfo, output_file);
|
||||||
|
|
||||||
|
/* Start compressor */
|
||||||
|
jpeg_start_compress(&cinfo, TRUE);
|
||||||
|
|
||||||
|
/* Process data */
|
||||||
|
while (cinfo.next_scanline < cinfo.image_height) {
|
||||||
|
num_scanlines = (*src_mgr->get_pixel_rows) (&cinfo, src_mgr);
|
||||||
|
(void) jpeg_write_scanlines(&cinfo, src_mgr->buffer, num_scanlines);
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Finish compression and release memory */
|
||||||
|
(*src_mgr->finish_input) (&cinfo, src_mgr);
|
||||||
|
jpeg_finish_compress(&cinfo);
|
||||||
|
|
||||||
|
/* Clean up and exit */
|
||||||
|
fail:
|
||||||
|
jpeg_destroy_compress(&cinfo);
|
||||||
|
|
||||||
|
if (input_file != NULL) fclose(input_file);
|
||||||
|
if (output_file != NULL) fclose(output_file);
|
||||||
|
|
||||||
|
#ifdef PROGRESS_REPORT
|
||||||
|
end_progress_monitor((j_common_ptr) &cinfo);
|
||||||
|
#endif
|
||||||
|
|
||||||
|
/* Disable signal catcher. */
|
||||||
|
#ifdef NEED_SIGNAL_CATCHER
|
||||||
|
enable_signal_catcher((j_common_ptr) NULL);
|
||||||
|
#endif
|
||||||
|
|
||||||
|
return file_index;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/*
|
||||||
|
* The main program.
|
||||||
|
*/
|
||||||
|
|
||||||
|
int
|
||||||
|
main (int argc, char **argv)
|
||||||
|
{
|
||||||
|
int file_index;
|
||||||
|
|
||||||
|
/* On Mac, fetch a command line. */
|
||||||
|
#ifdef USE_CCOMMAND
|
||||||
|
argc = ccommand(&argv);
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#ifdef MSDOS
|
||||||
|
progname = "cjpeg"; /* DOS tends to be too verbose about argv[0] */
|
||||||
|
#else
|
||||||
|
progname = argv[0];
|
||||||
|
if (progname == NULL || progname[0] == 0)
|
||||||
|
progname = "cjpeg"; /* in case C library doesn't provide it */
|
||||||
|
#endif
|
||||||
|
|
||||||
|
/* The default maxmem must be computed only once at program startup,
|
||||||
|
* since releasing memory with free() won't give it back to the OS.
|
||||||
|
*/
|
||||||
|
#ifdef FREE_MEM_ESTIMATE
|
||||||
|
default_maxmem = FREE_MEM_ESTIMATE;
|
||||||
|
#else
|
||||||
|
default_maxmem = 0;
|
||||||
|
#endif
|
||||||
|
|
||||||
|
/* Scan command line, parse switches and locate input file names */
|
||||||
|
|
||||||
|
if (argc < 2)
|
||||||
|
usage(); /* nothing on the command line?? */
|
||||||
|
|
||||||
|
file_index = 0;
|
||||||
|
|
||||||
|
while (file_index < argc-1)
|
||||||
|
file_index = process_one_file(argc, argv, file_index);
|
||||||
|
|
||||||
|
/* All done. */
|
||||||
|
exit(EXIT_SUCCESS);
|
||||||
|
return 0; /* suppress no-return-value warnings */
|
||||||
|
}
|
||||||
836
altui/djpeg.c
Normal file
@@ -0,0 +1,836 @@
|
|||||||
|
/*
|
||||||
|
* alternate djpeg.c
|
||||||
|
*
|
||||||
|
* Copyright (C) 1991-1997, Thomas G. Lane.
|
||||||
|
* This file is part of the Independent JPEG Group's software.
|
||||||
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
|
*
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
* x86 SIMD extension for IJG JPEG library
|
||||||
|
* Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
* This file has been modified for SIMD extension.
|
||||||
|
* Last Modified : January 6, 2006
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
*
|
||||||
|
* This file contains an alternate user interface for the JPEG decompressor.
|
||||||
|
* One or more input files are named on the command line, and output file
|
||||||
|
* names are created by substituting an appropriate extension.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include "cdjpeg.h" /* Common decls for cjpeg/djpeg applications */
|
||||||
|
#include "jversion.h" /* for version message */
|
||||||
|
|
||||||
|
#include <ctype.h> /* to declare isprint() */
|
||||||
|
|
||||||
|
#ifdef USE_CCOMMAND /* command-line reader for Macintosh */
|
||||||
|
#ifdef __MWERKS__
|
||||||
|
#include <SIOUX.h> /* Metrowerks needs this */
|
||||||
|
#include <console.h> /* ... and this */
|
||||||
|
#endif
|
||||||
|
#ifdef THINK_C
|
||||||
|
#include <console.h> /* Think declares it here */
|
||||||
|
#endif
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#ifndef PATH_MAX /* ANSI maximum-pathname-length constant */
|
||||||
|
#define PATH_MAX 256
|
||||||
|
#endif
|
||||||
|
|
||||||
|
|
||||||
|
/* Create the add-on message string table. */
|
||||||
|
|
||||||
|
#define JMESSAGE(code,string) string ,
|
||||||
|
|
||||||
|
static const char * const cdjpeg_message_table[] = {
|
||||||
|
#include "cderror.h"
|
||||||
|
NULL
|
||||||
|
};
|
||||||
|
|
||||||
|
|
||||||
|
/*
|
||||||
|
* SIMD Ext: compiler-specific hacks to enable filename wild-card expansion
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifdef _MSC_VER /* Microsoft Visual C++ */
|
||||||
|
/* from setargv.c (setargv.obj) */
|
||||||
|
/* Tested under Visual C++ V6.0, Toolkit 2003, and 2005 Express Edition */
|
||||||
|
int __cdecl _setargv(void) { int __cdecl __setargv(void); return __setargv(); }
|
||||||
|
#endif
|
||||||
|
#ifdef __BORLANDC__ /* Borland C++ */
|
||||||
|
/* from wildargs.c (wildargs.obj) */
|
||||||
|
/* Tested under Borland C++ Compiler 5.5 (win32) */
|
||||||
|
#include <wildargs.h>
|
||||||
|
typedef void _RTLENTRY (* _RTLENTRY _argv_expand_fnc)(char *, _PFN_ADDARG);
|
||||||
|
_argv_expand_fnc _argv_expand_ptr = _expand_wild;
|
||||||
|
#endif
|
||||||
|
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Automatic determination of available memory.
|
||||||
|
*/
|
||||||
|
|
||||||
|
static long default_maxmem; /* saves value determined at startup, or 0 */
|
||||||
|
|
||||||
|
#ifndef FREE_MEM_ESTIMATE /* may be defined from command line */
|
||||||
|
|
||||||
|
#ifdef MSDOS /* For MS-DOS (unless flat-memory model) */
|
||||||
|
|
||||||
|
#include <dos.h> /* for access to intdos() call */
|
||||||
|
|
||||||
|
LOCAL(long)
|
||||||
|
unused_dos_memory (void)
|
||||||
|
/* Obtain total amount of unallocated DOS memory */
|
||||||
|
{
|
||||||
|
union REGS regs;
|
||||||
|
long nparas;
|
||||||
|
|
||||||
|
regs.h.ah = 0x48; /* DOS function Allocate Memory Block */
|
||||||
|
regs.x.bx = 0xFFFF; /* Ask for more memory than DOS can have */
|
||||||
|
(void) intdos(&regs, &regs);
|
||||||
|
/* DOS will fail and return # of paragraphs actually available in BX. */
|
||||||
|
nparas = (unsigned int) regs.x.bx;
|
||||||
|
/* Times 16 to convert to bytes. */
|
||||||
|
return nparas << 4;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* The default memory setting is 95% of the available space. */
|
||||||
|
#define FREE_MEM_ESTIMATE ((unused_dos_memory() * 95L) / 100L)
|
||||||
|
|
||||||
|
#endif /* MSDOS */
|
||||||
|
|
||||||
|
#ifdef ATARI /* For Atari ST/STE/TT, Pure C or Turbo C */
|
||||||
|
|
||||||
|
#include <ext.h>
|
||||||
|
|
||||||
|
/* The default memory setting is 90% of the available space. */
|
||||||
|
#define FREE_MEM_ESTIMATE (((long) coreleft() * 90L) / 100L)
|
||||||
|
|
||||||
|
#endif /* ATARI */
|
||||||
|
|
||||||
|
/* Add memory-estimation procedures for other operating systems here,
|
||||||
|
* with appropriate #ifdef's around them.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#endif /* !FREE_MEM_ESTIMATE */
|
||||||
|
|
||||||
|
|
||||||
|
/*
|
||||||
|
* This list defines the known output image formats
|
||||||
|
* (not all of which need be supported by a given version).
|
||||||
|
* You can change the default output format by defining DEFAULT_FMT;
|
||||||
|
* indeed, you had better do so if you undefine PPM_SUPPORTED.
|
||||||
|
*/
|
||||||
|
|
||||||
|
typedef enum {
|
||||||
|
FMT_BMP, /* BMP format (Windows flavor) */
|
||||||
|
FMT_GIF, /* GIF format */
|
||||||
|
FMT_OS2, /* BMP format (OS/2 flavor) */
|
||||||
|
FMT_PPM, /* PPM/PGM (PBMPLUS formats) */
|
||||||
|
FMT_RLE, /* RLE format */
|
||||||
|
FMT_TARGA, /* Targa format */
|
||||||
|
FMT_TIFF /* TIFF format */
|
||||||
|
} IMAGE_FORMATS;
|
||||||
|
|
||||||
|
#ifndef DEFAULT_FMT /* so can override from CFLAGS in Makefile */
|
||||||
|
#define DEFAULT_FMT FMT_GIF
|
||||||
|
#endif
|
||||||
|
|
||||||
|
static IMAGE_FORMATS requested_fmt;
|
||||||
|
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Argument-parsing code.
|
||||||
|
* The switch parser is designed to be useful with DOS-style command line
|
||||||
|
* syntax, ie, intermixed switches and file names, where only the switches
|
||||||
|
* to the left of a given file name affect processing of that file.
|
||||||
|
*/
|
||||||
|
|
||||||
|
|
||||||
|
static const char * progname; /* program name for error messages */
|
||||||
|
static char * outfilename; /* for -outfile switch */
|
||||||
|
|
||||||
|
|
||||||
|
LOCAL(void)
usage (void)
/* complain about bad command line */
{
  fprintf(stderr, "usage: %s [switches] inputfile(s)\n", progname);
  fprintf(stderr, "List of input files may use wildcards (* and ?)\n");
  fprintf(stderr, "Output filename is same as input filename except for extension\n");

  fprintf(stderr, "Switches (names may be abbreviated):\n");
  fprintf(stderr, " -colors N Reduce image to no more than N colors\n");
  fprintf(stderr, " -fast Fast, low-quality processing\n");
  fprintf(stderr, " -grayscale Force grayscale output\n");
#ifdef IDCT_SCALING_SUPPORTED
  fprintf(stderr, " -scale M/N Scale output image by fraction M/N, eg, 1/8\n");
#endif
#ifdef BMP_SUPPORTED
  fprintf(stderr, " -bmp Select BMP output format (Windows style)%s\n",
          (DEFAULT_FMT == FMT_BMP ? " (default)" : ""));
#endif
#ifdef GIF_SUPPORTED
  fprintf(stderr, " -gif Select GIF output format%s\n",
          (DEFAULT_FMT == FMT_GIF ? " (default)" : ""));
#endif
#ifdef BMP_SUPPORTED
  fprintf(stderr, " -os2 Select BMP output format (OS/2 style)%s\n",
          (DEFAULT_FMT == FMT_OS2 ? " (default)" : ""));
#endif
#ifdef PPM_SUPPORTED
  fprintf(stderr, " -pnm Select PBMPLUS (PPM/PGM) output format%s\n",
          (DEFAULT_FMT == FMT_PPM ? " (default)" : ""));
#endif
#ifdef RLE_SUPPORTED
  fprintf(stderr, " -rle Select Utah RLE output format%s\n",
          (DEFAULT_FMT == FMT_RLE ? " (default)" : ""));
#endif
#ifdef TARGA_SUPPORTED
  fprintf(stderr, " -targa Select Targa output format%s\n",
          (DEFAULT_FMT == FMT_TARGA ? " (default)" : ""));
#endif
  fprintf(stderr, "Switches for advanced users:\n");
#ifdef DCT_ISLOW_SUPPORTED
  fprintf(stderr, " -dct int Use integer DCT method%s\n",
          (JDCT_DEFAULT == JDCT_ISLOW ? " (default)" : ""));
#endif
#ifdef DCT_IFAST_SUPPORTED
  fprintf(stderr, " -dct fast Use fast integer DCT (less accurate)%s\n",
          (JDCT_DEFAULT == JDCT_IFAST ? " (default)" : ""));
#endif
#ifdef DCT_FLOAT_SUPPORTED
  fprintf(stderr, " -dct float Use floating-point DCT method%s\n",
          (JDCT_DEFAULT == JDCT_FLOAT ? " (default)" : ""));
#endif
  fprintf(stderr, " -dither fs Use F-S dithering (default)\n");
  fprintf(stderr, " -dither none Don't use dithering in quantization\n");
  fprintf(stderr, " -dither ordered Use ordered dither (medium speed, quality)\n");
#ifdef QUANT_2PASS_SUPPORTED
  fprintf(stderr, " -map FILE Map to colors used in named image file\n");
#endif
  fprintf(stderr, " -nosmooth Don't use high-quality upsampling\n");
#ifdef QUANT_1PASS_SUPPORTED
  fprintf(stderr, " -onepass Use 1-pass quantization (fast, low quality)\n");
#endif
#ifndef FREE_MEM_ESTIMATE
  fprintf(stderr, " -maxmemory N Maximum memory to use (in kbytes)\n");
#endif
  fprintf(stderr, " -outfile name Specify name for output file\n");
  fprintf(stderr, " -verbose or -debug Emit debug output\n");
  exit(EXIT_FAILURE);
}


#ifndef JSIMD_MODEINFO_NOT_SUPPORTED

LOCAL(void)
print_simd_info (FILE * file, char * labelstr, unsigned int simd)
{
  fprintf(file, "%s%s%s%s%s%s\n", labelstr,
          simd & JSIMD_MMX   ? " MMX"    : "",
          simd & JSIMD_3DNOW ? " 3DNow!" : "",
          simd & JSIMD_SSE   ? " SSE"    : "",
          simd & JSIMD_SSE2  ? " SSE2"   : "",
          simd == JSIMD_NONE ? " NONE"   : "");
}

#endif /* !JSIMD_MODEINFO_NOT_SUPPORTED */


LOCAL(int)
parse_switches (j_decompress_ptr cinfo, int argc, char **argv,
                int last_file_arg_seen, boolean for_real)
/* Parse optional switches.
 * Returns argv[] index of first file-name argument (== argc if none).
 * Any file names with indexes <= last_file_arg_seen are ignored;
 * they have presumably been processed in a previous iteration.
 * (Pass 0 for last_file_arg_seen on the first or only iteration.)
 * for_real is FALSE on the first (dummy) pass; we may skip any expensive
 * processing.
 */
{
  int argn;
  char * arg;

  /* Set up default JPEG parameters. */
  requested_fmt = DEFAULT_FMT;  /* set default output file format */
  outfilename = NULL;
  cinfo->err->trace_level = 0;
  if (default_maxmem > 0)       /* override library's default value */
    cinfo->mem->max_memory_to_use = default_maxmem;

  /* Scan command line options, adjust parameters */

  for (argn = 1; argn < argc; argn++) {
    arg = argv[argn];
    if (*arg != '-') {
      /* Not a switch, must be a file name argument */
      if (argn <= last_file_arg_seen) {
        outfilename = NULL;     /* -outfile applies to just one input file */
        continue;               /* ignore this name if previously processed */
      }
      break;                    /* else done parsing switches */
    }
    arg++;                      /* advance past switch marker character */

    if (keymatch(arg, "bmp", 1)) {
      /* BMP output format. */
      requested_fmt = FMT_BMP;

    } else if (keymatch(arg, "colors", 1) || keymatch(arg, "colours", 1) ||
               keymatch(arg, "quantize", 1) || keymatch(arg, "quantise", 1)) {
      /* Do color quantization. */
      int val;

      if (++argn >= argc)       /* advance to next argument */
        usage();
      if (sscanf(argv[argn], "%d", &val) != 1)
        usage();
      cinfo->desired_number_of_colors = val;
      cinfo->quantize_colors = TRUE;

#ifndef JSIMD_MASKFUNC_NOT_SUPPORTED
    } else if (keymatch(arg, "nosimd" , 4)) {
      jpeg_simd_mask((j_common_ptr) cinfo, JSIMD_NONE, JSIMD_ALL);
    } else if (keymatch(arg, "nommx" , 3)) {
      jpeg_simd_mask((j_common_ptr) cinfo, JSIMD_NONE, JSIMD_MMX);
    } else if (keymatch(arg, "no3dnow", 3)) {
      jpeg_simd_mask((j_common_ptr) cinfo, JSIMD_NONE, JSIMD_3DNOW);
    } else if (keymatch(arg, "nosse" , 4)) {
      jpeg_simd_mask((j_common_ptr) cinfo, JSIMD_NONE, JSIMD_SSE);
    } else if (keymatch(arg, "nosse2" , 6)) {
      jpeg_simd_mask((j_common_ptr) cinfo, JSIMD_NONE, JSIMD_SSE2);
#endif /* !JSIMD_MASKFUNC_NOT_SUPPORTED */

    } else if (keymatch(arg, "dct", 2)) {
      /* Select IDCT algorithm. */
      if (++argn >= argc)       /* advance to next argument */
        usage();
      if (keymatch(argv[argn], "int", 1)) {
        cinfo->dct_method = JDCT_ISLOW;
      } else if (keymatch(argv[argn], "fast", 2)) {
        cinfo->dct_method = JDCT_IFAST;
      } else if (keymatch(argv[argn], "float", 2)) {
        cinfo->dct_method = JDCT_FLOAT;
      } else
        usage();

    } else if (keymatch(arg, "dither", 2)) {
      /* Select dithering algorithm. */
      if (++argn >= argc)       /* advance to next argument */
        usage();
      if (keymatch(argv[argn], "fs", 2)) {
        cinfo->dither_mode = JDITHER_FS;
      } else if (keymatch(argv[argn], "none", 2)) {
        cinfo->dither_mode = JDITHER_NONE;
      } else if (keymatch(argv[argn], "ordered", 2)) {
        cinfo->dither_mode = JDITHER_ORDERED;
      } else
        usage();

    } else if (keymatch(arg, "debug", 1) || keymatch(arg, "verbose", 1)) {
      /* Enable debug printouts. */
      /* On first -d, print version identification */
      static boolean printed_version = FALSE;

      if (! printed_version) {
        fprintf(stderr, "Independent JPEG Group's DJPEG, version %s\n%s\n",
                JVERSION, JCOPYRIGHT);
        fprintf(stderr,
                "\nx86 SIMD extension for IJG JPEG library, version %s\n\n",
                JPEG_SIMDEXT_VER_STR);
#ifndef JSIMD_MODEINFO_NOT_SUPPORTED
        print_simd_info(stderr, "SIMD instructions supported by the system :",
                        jpeg_simd_support(NULL));

        fprintf(stderr, "\n === SIMD Operation Modes ===\n");
#ifdef DCT_ISLOW_SUPPORTED
        print_simd_info(stderr, "Accurate integer DCT (-dct int) :",
                        jpeg_simd_inverse_dct(cinfo, JDCT_ISLOW));
#endif
#ifdef DCT_IFAST_SUPPORTED
        print_simd_info(stderr, "Fast integer DCT (-dct fast) :",
                        jpeg_simd_inverse_dct(cinfo, JDCT_IFAST));
#endif
#ifdef DCT_FLOAT_SUPPORTED
        print_simd_info(stderr, "Floating-point DCT (-dct float) :",
                        jpeg_simd_inverse_dct(cinfo, JDCT_FLOAT));
#endif
#ifdef IDCT_SCALING_SUPPORTED
        print_simd_info(stderr, "Reduced-size DCT (-scale M/N) :",
                        jpeg_simd_inverse_dct(cinfo, JDCT_FLOAT+1));
#endif
        print_simd_info(stderr, "High-quality upsampling (default) :",
                        jpeg_simd_upsampler(cinfo, TRUE));
        print_simd_info(stderr, "Low-quality upsampling (-nosmooth) :",
                        jpeg_simd_upsampler(cinfo, FALSE));
        print_simd_info(stderr, "Colorspace conversion (YCbCr->RGB) :",
                        jpeg_simd_color_deconverter(cinfo));
        fprintf(stderr, "\n");
#endif /* !JSIMD_MODEINFO_NOT_SUPPORTED */
        printed_version = TRUE;
      }
      cinfo->err->trace_level++;

    } else if (keymatch(arg, "fast", 1)) {
      /* Select recommended processing options for quick-and-dirty output. */
      cinfo->two_pass_quantize = FALSE;
      cinfo->dither_mode = JDITHER_ORDERED;
      if (! cinfo->quantize_colors) /* don't override an earlier -colors */
        cinfo->desired_number_of_colors = 216;
      cinfo->dct_method = JDCT_FASTEST;
      cinfo->do_fancy_upsampling = FALSE;

    } else if (keymatch(arg, "gif", 1)) {
      /* GIF output format. */
      requested_fmt = FMT_GIF;

    } else if (keymatch(arg, "grayscale", 2) || keymatch(arg, "greyscale",2)) {
      /* Force monochrome output. */
      cinfo->out_color_space = JCS_GRAYSCALE;

    } else if (keymatch(arg, "map", 3)) {
      /* Quantize to a color map taken from an input file. */
      if (++argn >= argc)       /* advance to next argument */
        usage();
      if (for_real) {           /* too expensive to do twice! */
#ifdef QUANT_2PASS_SUPPORTED    /* otherwise can't quantize to supplied map */
        FILE * mapfile;

        if ((mapfile = fopen(argv[argn], READ_BINARY)) == NULL) {
          fprintf(stderr, "%s: can't open %s\n", progname, argv[argn]);
          exit(EXIT_FAILURE);
        }
        read_color_map(cinfo, mapfile);
        fclose(mapfile);
        cinfo->quantize_colors = TRUE;
#else
        ERREXIT(cinfo, JERR_NOT_COMPILED);
#endif
      }

    } else if (keymatch(arg, "maxmemory", 3)) {
      /* Maximum memory in Kb (or Mb with 'm'). */
      long lval;
      char ch = 'x';

      if (++argn >= argc)       /* advance to next argument */
        usage();
      if (sscanf(argv[argn], "%ld%c", &lval, &ch) < 1)
        usage();
      if (ch == 'm' || ch == 'M')
        lval *= 1000L;
      cinfo->mem->max_memory_to_use = lval * 1000L;

    } else if (keymatch(arg, "nosmooth", 3)) {
      /* Suppress fancy upsampling */
      cinfo->do_fancy_upsampling = FALSE;

    } else if (keymatch(arg, "onepass", 3)) {
      /* Use fast one-pass quantization. */
      cinfo->two_pass_quantize = FALSE;

    } else if (keymatch(arg, "os2", 3)) {
      /* BMP output format (OS/2 flavor). */
      requested_fmt = FMT_OS2;

    } else if (keymatch(arg, "outfile", 4)) {
      /* Set output file name. */
      if (++argn >= argc)       /* advance to next argument */
        usage();
      outfilename = argv[argn]; /* save it away for later use */

    } else if (keymatch(arg, "pnm", 1) || keymatch(arg, "ppm", 1)) {
      /* PPM/PGM output format. */
      requested_fmt = FMT_PPM;

    } else if (keymatch(arg, "rle", 1)) {
      /* RLE output format. */
      requested_fmt = FMT_RLE;

    } else if (keymatch(arg, "scale", 1)) {
      /* Scale the output image by a fraction M/N. */
      if (++argn >= argc)       /* advance to next argument */
        usage();
      if (sscanf(argv[argn], "%d/%d",
                 &cinfo->scale_num, &cinfo->scale_denom) != 2)
        usage();

    } else if (keymatch(arg, "targa", 1)) {
      /* Targa output format. */
      requested_fmt = FMT_TARGA;

    } else {
      usage();                  /* bogus switch */
    }
  }

  return argn;                  /* return index of next arg (file name) */
}


/*
 * Marker processor for COM and interesting APPn markers.
 * This replaces the library's built-in processor, which just skips the marker.
 * We want to print out the marker as text, to the extent possible.
 * Note this code relies on a non-suspending data source.
 */

LOCAL(unsigned int)
jpeg_getc (j_decompress_ptr cinfo)
/* Read next byte */
{
  struct jpeg_source_mgr * datasrc = cinfo->src;

  if (datasrc->bytes_in_buffer == 0) {
    if (! (*datasrc->fill_input_buffer) (cinfo))
      ERREXIT(cinfo, JERR_CANT_SUSPEND);
  }
  datasrc->bytes_in_buffer--;
  return GETJOCTET(*datasrc->next_input_byte++);
}


METHODDEF(boolean)
print_text_marker (j_decompress_ptr cinfo)
{
  boolean traceit = (cinfo->err->trace_level >= 1);
  INT32 length;
  unsigned int ch;
  unsigned int lastch = 0;

  length = jpeg_getc(cinfo) << 8;
  length += jpeg_getc(cinfo);
  length -= 2;                  /* discount the length word itself */

  if (traceit) {
    if (cinfo->unread_marker == JPEG_COM)
      fprintf(stderr, "Comment, length %ld:\n", (long) length);
    else                        /* assume it is an APPn otherwise */
      fprintf(stderr, "APP%d, length %ld:\n",
              cinfo->unread_marker - JPEG_APP0, (long) length);
  }

  while (--length >= 0) {
    ch = jpeg_getc(cinfo);
    if (traceit) {
      /* Emit the character in a readable form.
       * Nonprintables are converted to \nnn form,
       * while \ is converted to \\.
       * Newlines in CR, CR/LF, or LF form will be printed as one newline.
       */
      if (ch == '\r') {
        fprintf(stderr, "\n");
      } else if (ch == '\n') {
        if (lastch != '\r')
          fprintf(stderr, "\n");
      } else if (ch == '\\') {
        fprintf(stderr, "\\\\");
      } else if (isprint(ch)) {
        putc(ch, stderr);
      } else {
        fprintf(stderr, "\\%03o", ch);
      }
      lastch = ch;
    }
  }

  if (traceit)
    fprintf(stderr, "\n");

  return TRUE;
}


/*
 * Check for overwrite of an existing file; clear it with user
 */

#ifndef NO_OVERWRITE_CHECK

LOCAL(boolean)
is_write_ok (char * outfname)
{
  FILE * ofile;
  int ch;

  ofile = fopen(outfname, READ_BINARY);
  if (ofile == NULL)
    return TRUE;                /* not present */
  fclose(ofile);                /* oops, it is present */

  for (;;) {
    fprintf(stderr, "%s already exists, overwrite it? [y/n] ",
            outfname);
    fflush(stderr);
    ch = getc(stdin);
    if (ch != '\n')             /* flush rest of line */
      while (getc(stdin) != '\n')
        /* nothing */;

    switch (ch) {
    case 'Y':
    case 'y':
      return TRUE;
    case 'N':
    case 'n':
      return FALSE;
    /* otherwise, ask again */
    }
  }
}

#endif


/*
 * Process a single input file name, and return its index in argv[].
 * File names at or to left of old_file_index have been processed already.
 */

LOCAL(int)
process_one_file (int argc, char **argv, int old_file_index)
{
  struct jpeg_decompress_struct cinfo;
  struct jpeg_error_mgr jerr;
  char *infilename;
  char workfilename[PATH_MAX];
  const char *default_extension = NULL;
#ifdef PROGRESS_REPORT
  struct cdjpeg_progress_mgr progress;
#endif
  int file_index;
  djpeg_dest_ptr dest_mgr = NULL;
  FILE * input_file = NULL;
  FILE * output_file = NULL;
  JDIMENSION num_scanlines;

  /* Initialize the JPEG decompression object with default error handling. */
  cinfo.err = jpeg_std_error(&jerr);
  jpeg_create_decompress(&cinfo);
  /* Add some application-specific error messages (from cderror.h) */
  jerr.addon_message_table = cdjpeg_message_table;
  jerr.first_addon_message = JMSG_FIRSTADDONCODE;
  jerr.last_addon_message = JMSG_LASTADDONCODE;

  /* Insert custom marker processor for COM and APP12.
   * APP12 is used by some digital camera makers for textual info,
   * so we provide the ability to display it as text.
   * If you like, additional APPn marker types can be selected for display,
   * but don't try to override APP0 or APP14 this way (see libjpeg.doc).
   */
  jpeg_set_marker_processor(&cinfo, JPEG_COM, print_text_marker);
  jpeg_set_marker_processor(&cinfo, JPEG_APP0+12, print_text_marker);

  /* Now safe to enable signal catcher. */
#ifdef NEED_SIGNAL_CATCHER
  enable_signal_catcher((j_common_ptr) &cinfo);
#endif

  /* Scan command line to find next file name.
   * It is convenient to use just one switch-parsing routine, but the switch
   * values read here are ignored; we will rescan the switches after opening
   * the input file.
   * (Exception: tracing level set here controls verbosity for COM markers
   * found during jpeg_read_header...)
   */

  file_index = parse_switches(&cinfo, argc, argv, old_file_index, FALSE);
  if (file_index >= argc) {
    fprintf(stderr, "%s: missing input file name\n", progname);
    usage();
  }

  /* Open the input file. */
  infilename = argv[file_index];
  if ((input_file = fopen(infilename, READ_BINARY)) == NULL) {
    fprintf(stderr, "%s: can't open %s\n", progname, infilename);
    goto fail;
  }

#ifdef PROGRESS_REPORT
  start_progress_monitor((j_common_ptr) &cinfo, &progress);
#endif

  /* Specify data source for decompression */
  jpeg_stdio_src(&cinfo, input_file);

  /* Read file header, set default decompression parameters */
  (void) jpeg_read_header(&cinfo, TRUE);

  /* Adjust default decompression parameters by re-parsing the options */
  file_index = parse_switches(&cinfo, argc, argv, old_file_index, TRUE);

  /* Initialize the output module now to let it override any crucial
   * option settings (for instance, GIF wants to force color quantization).
   */
  switch (requested_fmt) {
#ifdef BMP_SUPPORTED
  case FMT_BMP:
    dest_mgr = jinit_write_bmp(&cinfo, FALSE);
    default_extension = ".bmp";
    break;
  case FMT_OS2:
    dest_mgr = jinit_write_bmp(&cinfo, TRUE);
    default_extension = ".bmp";
    break;
#endif
#ifdef GIF_SUPPORTED
  case FMT_GIF:
    dest_mgr = jinit_write_gif(&cinfo);
    default_extension = ".gif";
    break;
#endif
#ifdef PPM_SUPPORTED
  case FMT_PPM:
    dest_mgr = jinit_write_ppm(&cinfo);
    default_extension = ".ppm";
    break;
#endif
#ifdef RLE_SUPPORTED
  case FMT_RLE:
    dest_mgr = jinit_write_rle(&cinfo);
    default_extension = ".rle";
    break;
#endif
#ifdef TARGA_SUPPORTED
  case FMT_TARGA:
    dest_mgr = jinit_write_targa(&cinfo);
    default_extension = ".tga";
    break;
#endif
  default:
    ERREXIT(&cinfo, JERR_UNSUPPORTED_FORMAT);
    break;
  }

  /* If user didn't supply -outfile switch, select output file name. */
  if (outfilename == NULL) {
    int i;

    outfilename = workfilename;
    /* Make outfilename be infilename with appropriate extension */
    strcpy(outfilename, infilename);
    for (i = strlen(outfilename)-1; i >= 0; i--) {
      switch (outfilename[i]) {
      case ':':
      case '/':
      case '\\':
        i = 0;                  /* stop scanning */
        break;
      case '.':
        outfilename[i] = '\0';  /* lop off existing extension */
        i = 0;                  /* stop scanning */
        break;
      default:
        break;                  /* keep scanning */
      }
    }
    strcat(outfilename, default_extension);
  }

  fprintf(stderr, "Decompressing %s => %s\n", infilename, outfilename);
#ifndef NO_OVERWRITE_CHECK
  if (! is_write_ok(outfilename))
    goto fail;
#endif

  /* Open the output file. */
  if ((output_file = fopen(outfilename, WRITE_BINARY)) == NULL) {
    fprintf(stderr, "%s: can't create %s\n", progname, outfilename);
    goto fail;
  }
  dest_mgr->output_file = output_file;

  /* Start decompressor */
  (void) jpeg_start_decompress(&cinfo);

  /* Write output file header */
  (*dest_mgr->start_output) (&cinfo, dest_mgr);

  /* Process data */
  while (cinfo.output_scanline < cinfo.output_height) {
    num_scanlines = jpeg_read_scanlines(&cinfo, dest_mgr->buffer,
                                        dest_mgr->buffer_height);
    (*dest_mgr->put_pixel_rows) (&cinfo, dest_mgr, num_scanlines);
  }

#ifdef PROGRESS_REPORT
  /* Hack: count final pass as done in case finish_output does an extra pass.
   * The library won't have updated completed_passes.
   */
  progress.pub.completed_passes = progress.pub.total_passes;
#endif

  /* Finish decompression and release memory.
   * I must do it in this order because output module has allocated memory
   * of lifespan JPOOL_IMAGE; it needs to finish before releasing memory.
   */
  (*dest_mgr->finish_output) (&cinfo, dest_mgr);
  (void) jpeg_finish_decompress(&cinfo);

  /* Clean up and exit */
fail:
  jpeg_destroy_decompress(&cinfo);

  if (input_file != NULL) fclose(input_file);
  if (output_file != NULL) fclose(output_file);

#ifdef PROGRESS_REPORT
  end_progress_monitor((j_common_ptr) &cinfo);
#endif

  /* Disable signal catcher. */
#ifdef NEED_SIGNAL_CATCHER
  enable_signal_catcher((j_common_ptr) NULL);
#endif

  return file_index;
}


/*
 * The main program.
 */

int
main (int argc, char **argv)
{
  int file_index;

  /* On Mac, fetch a command line. */
#ifdef USE_CCOMMAND
  argc = ccommand(&argv);
#endif

#ifdef MSDOS
  progname = "djpeg";           /* DOS tends to be too verbose about argv[0] */
#else
  progname = argv[0];
  if (progname == NULL || progname[0] == 0)
    progname = "djpeg";         /* in case C library doesn't provide it */
#endif

  /* The default maxmem must be computed only once at program startup,
   * since releasing memory with free() won't give it back to the OS.
   */
#ifdef FREE_MEM_ESTIMATE
  default_maxmem = FREE_MEM_ESTIMATE;
#else
  default_maxmem = 0;
#endif

  /* Scan command line, parse switches and locate input file names */

  if (argc < 2)
    usage();                    /* nothing on the command line?? */

  file_index = 0;

  while (file_index < argc-1)
    file_index = process_one_file(argc, argv, file_index);

  /* All done. */
  exit(EXIT_SUCCESS);
  return 0;                     /* suppress no-return-value warnings */
}
62
altui/usage.alt
Normal file
@@ -0,0 +1,62 @@
(Most of the standard usage.doc file also applies to this alternate version,
but replace its "GENERAL USAGE" section with the text below. Edit the text
as necessary if you don't support wildcards or overwrite checking. Be sure
to fix the djpeg switch descriptions if you are not defaulting to PPM output.
Also, if you've provided an accurate memory-estimation procedure, you can
probably eliminate the HINTS related to the -maxmemory switch.)


GENERAL USAGE

We provide two programs, cjpeg to compress an image file into JPEG format,
and djpeg to decompress a JPEG file back into a conventional image format.

The basic command line is:
        cjpeg [switches] list of image files
or
        djpeg [switches] list of jpeg files

Each file named is compressed or decompressed. The input file(s) are not
modified; the output data is written to files which have the same names
except for extension. cjpeg always uses ".jpg" for the output file name's
extension; djpeg uses one of ".bmp", ".gif", ".ppm", ".rle", or ".tga",
depending on what output format is selected by the switches.
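
As an illustration of the naming rule above, here is a minimal C sketch. The function name make_output_name and its buffer-size argument are assumptions made for this example and are not part of the distributed sources; the scan mirrors the described behaviour of dropping the last extension while leaving directory and drive prefixes alone.

```c
#include <string.h>

/* Hypothetical helper (not part of the IJG sources): writes into `out` the
 * name `in` with its extension replaced by `ext` (for example ".gif").
 * Returns 0 on success, -1 if the result would not fit in `outsize` bytes. */
int make_output_name(const char *in, const char *ext,
                     char *out, size_t outsize)
{
  size_t stem = strlen(in);     /* number of leading characters to keep */
  size_t i;

  /* Scan backwards for the last '.', but stop at a directory or drive
   * separator so that a name like "dir.v2/photo" keeps its directory. */
  for (i = strlen(in); i > 0; i--) {
    char c = in[i - 1];
    if (c == ':' || c == '/' || c == '\\')
      break;
    if (c == '.') {
      stem = i - 1;
      break;
    }
  }

  if (stem + strlen(ext) + 1 > outsize)
    return -1;                  /* refuse to overflow the caller's buffer */
  memcpy(out, in, stem);
  strcpy(out + stem, ext);
  return 0;
}
```

Calling make_output_name("xxx.bmp", ".jpg", buf, sizeof buf) leaves "xxx.jpg" in buf. The explicit size check is a choice made for this sketch so that an overlong input name is rejected rather than written past the end of the buffer.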

For example, to convert xxx.bmp to xxx.jpg and yyy.ppm to yyy.jpg, say:
        cjpeg xxx.bmp yyy.ppm

On most systems you can use standard wildcards to specify the list of input
files; for example, on DOS "djpeg *.jpg" decompresses all the JPEG files in
the current directory.

If an intended output file already exists, you'll be asked whether or not to
overwrite it. If you say no, the program skips that input file and goes on
to the next one.
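
The overwrite question can be sketched in C as follows. Both function names are hypothetical and written only for this example. Treating end of input as "no" is an assumption of the sketch, chosen so that the prompt cannot loop forever when standard input is closed or redirected from an empty file.

```c
#include <stdio.h>

/* Hypothetical helper: 1 for yes, 0 for no, -1 for "ask again". */
int classify_answer(int ch)
{
  switch (ch) {
  case 'Y': case 'y': return 1;
  case 'N': case 'n': return 0;
  default:            return -1;
  }
}

/* Hypothetical prompt loop reading answers from `in`.
 * Returns 1 to overwrite, 0 to skip; end of input counts as "no". */
int confirm_overwrite(FILE *in, const char *name)
{
  for (;;) {
    int ch, rest, answer;

    fprintf(stderr, "%s already exists, overwrite it? [y/n] ", name);
    fflush(stderr);
    ch = getc(in);
    if (ch == EOF)
      return 0;
    /* Discard the rest of the line, stopping at end of input as well. */
    for (rest = ch; rest != '\n' && rest != EOF; rest = getc(in))
      ;
    answer = classify_answer(ch);
    if (answer >= 0)
      return answer;
  }
}
```

A caller would skip the current input file whenever confirm_overwrite(stdin, name) returns 0.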

You can intermix switches and file names; for example
        djpeg -gif file1.jpg -targa file2.jpg
decompresses file1.jpg into GIF format (file1.gif) and file2.jpg into Targa
format (file2.tga). Only switches to the left of a given file name affect
processing of that file; when there are conflicting switches, the rightmost
one takes precedence.
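
The left-to-right rule can be pictured with a deliberately reduced C sketch. The function format_for_file is hypothetical, and it treats every argument that starts with '-' as a format switch; the real programs recognise many more switches, some of which consume a following value.

```c
#include <string.h>

/* Hypothetical miniature of the "switches to the left apply" rule.
 * For the file name at argv[file_index], returns the text of the last
 * switch found to its left, or `fallback` if there is none. */
const char *format_for_file(int argc, char **argv, int file_index,
                            const char *fallback)
{
  const char *fmt = fallback;
  int i;

  for (i = 1; i < argc && i < file_index; i++) {
    if (argv[i][0] == '-')
      fmt = argv[i] + 1;        /* rightmost switch seen so far wins */
  }
  return fmt;
}
```

With the command line from the example above, the file at index 2 (file1.jpg) gets "gif" and the file at index 4 (file2.jpg) gets "targa", because -targa lies to the left of file2.jpg and to the right of -gif.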

You can override the program's choice of output file name by using the
-outfile switch, as in
        cjpeg -outfile output.jpg input.ppm
-outfile only affects the first input file name to its right.

The currently supported image file formats are: PPM (PBMPLUS color format),
PGM (PBMPLUS gray-scale format), BMP, GIF, Targa, and RLE (Utah Raster
Toolkit format). (RLE is supported only if the URT library is available,
which it isn't on most non-Unix systems.) cjpeg recognizes the input image
format automatically, with the exception of some Targa-format files. You
have to tell djpeg which format to generate.

JPEG files are in the de facto standard JFIF file format. There are other,
less widely used JPEG-based file formats, but we don't support them.

All switch names may be abbreviated; for example, -grayscale may be written
-gray or -gr. Most of the "basic" switches can be abbreviated to as little as
one letter. Upper and lower case are equivalent (-BMP is the same as -bmp).
British spellings are also accepted (e.g., -greyscale), though for brevity
these are not mentioned below.
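
One way to picture the abbreviation rule is the C sketch below. The function switch_matches is a stand-in written for this example; the programs use their own matching routine, which is defined elsewhere in the distribution and is not reproduced here. The minimum length is what keeps short abbreviations unambiguous: "-g" alone could mean either -gif or -grayscale, so -grayscale needs at least two characters.

```c
#include <ctype.h>

/* Illustrative stand-in: returns 1 if `arg` is a case-insensitive prefix of
 * the lower-case `keyword` and is at least `minchars` characters long,
 * otherwise 0. */
int switch_matches(const char *arg, const char *keyword, int minchars)
{
  int nmatched = 0;

  while (*arg != '\0') {
    if (*keyword == '\0')
      return 0;                 /* arg is longer than the keyword */
    if (tolower((unsigned char) *arg) != *keyword)
      return 0;                 /* characters differ */
    arg++;
    keyword++;
    nmatched++;
  }
  return nmatched >= minchars;  /* reject abbreviations that are too short */
}
```

Under these assumptions switch_matches("gr", "grayscale", 2) and switch_matches("BMP", "bmp", 1) both succeed, while switch_matches("g", "grayscale", 2) fails. A British spelling would be handled by testing a second keyword, not by the matcher itself.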
@@ -1,7 +1,7 @@
 /*
  * cderror.h
  *
- * Copyright (C) 1994, Thomas G. Lane.
+ * Copyright (C) 1994-1997, Thomas G. Lane.
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
@@ -72,7 +72,7 @@ JMESSAGE(JWRN_GIF_NOMOREDATA, "Ran out of GIF bits")
 #ifdef PPM_SUPPORTED
 JMESSAGE(JERR_PPM_COLORSPACE, "PPM output must be grayscale or RGB")
 JMESSAGE(JERR_PPM_NONNUMERIC, "Nonnumeric data in PPM file")
-JMESSAGE(JERR_PPM_NOT, "Not a PPM file")
+JMESSAGE(JERR_PPM_NOT, "Not a PPM/PGM file")
 JMESSAGE(JTRC_PGM, "%ux%u PGM image")
 JMESSAGE(JTRC_PGM_TEXT, "%ux%u text PGM image")
 JMESSAGE(JTRC_PPM, "%ux%u PPM image")
4
cdjpeg.c
@@ -1,7 +1,7 @@
 /*
  * cdjpeg.c
  *
- * Copyright (C) 1991-1996, Thomas G. Lane.
+ * Copyright (C) 1991-1997, Thomas G. Lane.
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
@@ -47,7 +47,9 @@ GLOBAL(void)
 enable_signal_catcher (j_common_ptr cinfo)
 {
   sig_cinfo = cinfo;
+#ifdef SIGINT                  /* not all systems have SIGINT */
   signal(SIGINT, signal_catcher);
+#endif
 #ifdef SIGTERM                 /* not all systems have SIGTERM */
   signal(SIGTERM, signal_catcher);
 #endif
7
cdjpeg.h
@@ -1,7 +1,7 @@
 /*
  * cdjpeg.h
  *
- * Copyright (C) 1994-1996, Thomas G. Lane.
+ * Copyright (C) 1994-1997, Thomas G. Lane.
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
@@ -156,9 +156,14 @@ EXTERN(FILE *) write_stdout JPP((void));
 #define READ_BINARY "r"
 #define WRITE_BINARY "w"
 #else
+#ifdef VMS                     /* VMS is very nonstandard */
+#define READ_BINARY "rb", "ctx=stm"
+#define WRITE_BINARY "wb", "ctx=stm"
+#else                          /* standard ANSI-compliant case */
 #define READ_BINARY "rb"
 #define WRITE_BINARY "wb"
 #endif
+#endif

 #ifndef EXIT_FAILURE           /* define exit() codes if not provided */
 #define EXIT_FAILURE 1
65
change.log
@@ -1,6 +1,71 @@
|
|||||||
CHANGE LOG for Independent JPEG Group's JPEG software
|
CHANGE LOG for Independent JPEG Group's JPEG software
|
||||||
|
|
||||||
|
|
||||||
|
Version 6b 27-Mar-1998
|
||||||
|
-----------------------
|
||||||
|
|
||||||
|
jpegtran has new features for lossless image transformations (rotation
|
||||||
|
and flipping) as well as "lossless" reduction to grayscale.
|
||||||
|
|
||||||
|
jpegtran now copies comments by default; it has a -copy switch to enable
|
||||||
|
copying all APPn blocks as well, or to suppress comments. (Formerly it
|
||||||
|
always suppressed comments and APPn blocks.) jpegtran now also preserves
|
||||||
|
JFIF version and resolution information.
|
||||||
|
|
||||||
|
New decompressor library feature: COM and APPn markers found in the input
|
||||||
|
file can be saved in memory for later use by the application. (Before,
|
||||||
|
you had to code this up yourself with a custom marker processor.)
|
||||||
|
|
||||||
|
There is an unused field "void * client_data" now in compress and decompress
|
||||||
|
parameter structs; this may be useful in some applications.
|
||||||
|
|
||||||
|
JFIF version number information is now saved by the decoder and accepted by
|
||||||
|
the encoder. jpegtran uses this to copy the source file's version number,
|
||||||
|
to ensure "jpegtran -copy all" won't create bogus files that contain JFXX
|
||||||
|
extensions but claim to be version 1.01. Applications that generate their
|
||||||
|
own JFXX extension markers also (finally) have a supported way to cause the
|
||||||
|
encoder to emit JFIF version number 1.02.
|
||||||
|
|
||||||
|
djpeg's trace mode reports JFIF 1.02 thumbnail images as such, rather
|
||||||
|
than as unknown APP0 markers.
|
||||||
|
|
||||||
|
In -verbose mode, djpeg and rdjpgcom will try to print the contents of
|
||||||
|
APP12 markers as text. Some digital cameras store useful text information
|
||||||
|
in APP12 markers.
|
||||||
|
|
||||||
|
Handling of truncated data streams is more robust: blocks beyond the one in
|
||||||
|
which the error occurs will be output as uniform gray, or left unchanged
|
||||||
|
if decoding a progressive JPEG. The appearance no longer depends on the
|
||||||
|
Huffman tables being used.
|
||||||
|
|
||||||
|
Huffman tables are checked for validity much more carefully than before.
|
||||||
|
|
||||||
|
To avoid the Unisys LZW patent, djpeg's GIF output capability has been
|
||||||
|
changed to produce "uncompressed GIFs", and cjpeg's GIF input capability
|
||||||
|
has been removed altogether. We're not happy about it either, but there
|
||||||
|
seems to be no good alternative.
|
||||||
|
|
||||||
|
The configure script now supports building libjpeg as a shared library
|
||||||
|
on many flavors of Unix (all the ones that GNU libtool knows how to
|
||||||
|
build shared libraries for). Use "./configure --enable-shared" to
|
||||||
|
try this out.
|
||||||
|
|
||||||
|
New jconfig file and makefiles for Microsoft Visual C++ and Developer Studio.
|
||||||
|
Also, a jconfig file and a build script for Metrowerks CodeWarrior
|
||||||
|
on Apple Macintosh. makefile.dj has been updated for DJGPP v2, and there
|
||||||
|
are miscellaneous other minor improvements in the makefiles.
|
||||||
|
|
||||||
|
jmemmac.c now knows how to create temporary files following Mac System 7
|
||||||
|
conventions.
|
||||||
|
|
||||||
|
djpeg's -map switch is now able to read raw-format PPM files reliably.
|
||||||
|
|
||||||
|
cjpeg -progressive -restart no longer generates any unnecessary DRI markers.
|
||||||
|
|
||||||
|
Multiple calls to jpeg_simple_progression for a single JPEG object
|
||||||
|
no longer leak memory.
|
||||||
|
|
||||||
|
|
||||||
Version 6a 7-Feb-96
|
Version 6a 7-Feb-96
|
||||||
--------------------
|
--------------------
|
||||||
|
|
||||||
|
|||||||
34
cjpeg.1
@@ -1,4 +1,4 @@
|
|||||||
.TH CJPEG 1 "15 June 1995"
|
.TH CJPEG 1 "20 March 1998"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
cjpeg \- compress an image file to a JPEG file
|
cjpeg \- compress an image file to a JPEG file
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
@@ -16,7 +16,7 @@ cjpeg \- compress an image file to a JPEG file
|
|||||||
compresses the named image file, or the standard input if no file is
|
compresses the named image file, or the standard input if no file is
|
||||||
named, and produces a JPEG/JFIF file on the standard output.
|
named, and produces a JPEG/JFIF file on the standard output.
|
||||||
The currently supported input file formats are: PPM (PBMPLUS color
|
The currently supported input file formats are: PPM (PBMPLUS color
|
||||||
format), PGM (PBMPLUS gray-scale format), BMP, GIF, Targa, and RLE (Utah Raster
|
format), PGM (PBMPLUS gray-scale format), BMP, Targa, and RLE (Utah Raster
|
||||||
Toolkit format). (RLE is supported only if the URT library is available.)
|
Toolkit format). (RLE is supported only if the URT library is available.)
|
||||||
.SH OPTIONS
|
.SH OPTIONS
|
||||||
All switch names may be abbreviated; for example,
|
All switch names may be abbreviated; for example,
|
||||||
@@ -27,9 +27,9 @@ or
|
|||||||
.BR \-gr .
|
.BR \-gr .
|
||||||
Most of the "basic" switches can be abbreviated to as little as one letter.
|
Most of the "basic" switches can be abbreviated to as little as one letter.
|
||||||
Upper and lower case are equivalent (thus
|
Upper and lower case are equivalent (thus
|
||||||
.B \-GIF
|
.B \-BMP
|
||||||
is the same as
|
is the same as
|
||||||
.BR \-gif ).
|
.BR \-bmp ).
|
||||||
British spellings are also accepted (e.g.,
|
British spellings are also accepted (e.g.,
|
||||||
.BR \-greyscale ),
|
.BR \-greyscale ),
|
||||||
though for brevity these are not mentioned below.
|
though for brevity these are not mentioned below.
|
||||||
@@ -42,9 +42,9 @@ Scale quantization tables to adjust image quality. Quality is 0 (worst) to
|
|||||||
.TP
|
.TP
|
||||||
.B \-grayscale
|
.B \-grayscale
|
||||||
Create monochrome JPEG file from color input. Be sure to use this switch when
|
Create monochrome JPEG file from color input. Be sure to use this switch when
|
||||||
compressing a grayscale GIF file, because
|
compressing a grayscale BMP file, because
|
||||||
.B cjpeg
|
.B cjpeg
|
||||||
isn't bright enough to notice whether a GIF file uses only shades of gray.
|
isn't bright enough to notice whether a BMP file uses only shades of gray.
|
||||||
By saying
|
By saying
|
||||||
.BR \-grayscale ,
|
.BR \-grayscale ,
|
||||||
you'll get a smaller JPEG file that takes less time to process.
|
you'll get a smaller JPEG file that takes less time to process.
|
||||||
@@ -180,16 +180,22 @@ for images that will be transmitted across unreliable networks such as Usenet.
|
|||||||
The
|
The
|
||||||
.B \-smooth
|
.B \-smooth
|
||||||
option filters the input to eliminate fine-scale noise. This is often useful
|
option filters the input to eliminate fine-scale noise. This is often useful
|
||||||
when converting GIF files to JPEG: a moderate smoothing factor of 10 to 50
|
when converting dithered images to JPEG: a moderate smoothing factor of 10 to
|
||||||
gets rid of dithering patterns in the input file, resulting in a smaller JPEG
|
50 gets rid of dithering patterns in the input file, resulting in a smaller
|
||||||
file and a better-looking image. Too large a smoothing factor will visibly
|
JPEG file and a better-looking image. Too large a smoothing factor will
|
||||||
blur the image, however.
|
visibly blur the image, however.
|
||||||
.PP
|
.PP
|
||||||
Switches for wizards:
|
Switches for wizards:
|
||||||
.TP
|
.TP
|
||||||
.B \-baseline
|
.B \-baseline
|
||||||
Force a baseline JPEG file to be generated. This clamps quantization values
|
Force baseline-compatible quantization tables to be generated. This clamps
|
||||||
to 8 bits even at low quality settings.
|
quantization values to 8 bits even at low quality settings. (This switch is
|
||||||
|
poorly named, since it does not ensure that the output is actually baseline
|
||||||
|
JPEG. For example, you can use
|
||||||
|
.B \-baseline
|
||||||
|
and
|
||||||
|
.B \-progressive
|
||||||
|
together.)
|
||||||
.TP
|
.TP
|
||||||
.BI \-qtables " file"
|
.BI \-qtables " file"
|
||||||
Use the quantization tables given in the specified text file.
|
Use the quantization tables given in the specified text file.
|
||||||
@@ -272,6 +278,10 @@ Independent JPEG Group
|
|||||||
.SH BUGS
|
.SH BUGS
|
||||||
Arithmetic coding is not supported for legal reasons.
|
Arithmetic coding is not supported for legal reasons.
|
||||||
.PP
|
.PP
|
||||||
|
GIF input files are no longer supported, to avoid the Unisys LZW patent.
|
||||||
|
Use a Unisys-licensed program if you need to read a GIF file. (Conversion
|
||||||
|
of GIF files to JPEG is usually a bad idea anyway.)
|
||||||
|
.PP
|
||||||
Not all variants of BMP and Targa file formats are supported.
|
Not all variants of BMP and Targa file formats are supported.
|
||||||
.PP
|
.PP
|
||||||
The
|
The
|
||||||
|
|||||||
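The abbreviation rule described in the cjpeg.1 text above (switch names may be shortened, and upper and lower case are equivalent) can be sketched in C as follows. This is a minimal illustration, not the code from cdjpeg.c: the name switch_matches is hypothetical, and it assumes the leading '-' has already been stripped from the argument, as the keymatch(arg, "baseline", 1) calls in cjpeg.c suggest.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper modeled on the keymatch() calls seen in cjpeg.c.
 * Returns 1 if arg is a case-insensitive prefix of keyword that is at
 * least minchars characters long, and 0 otherwise. */
static int
switch_matches (const char *arg, const char *keyword, int minchars)
{
  int nmatched = 0;

  assert(arg != NULL && keyword != NULL);

  while (*arg != '\0') {
    if (*keyword == '\0')
      return 0;                 /* arg is longer than the keyword */
    if (tolower((unsigned char) *arg) != tolower((unsigned char) *keyword))
      return 0;                 /* characters differ */
    arg++;
    keyword++;
    nmatched++;
  }
  /* Every character of arg matched; require the minimum length too. */
  return nmatched >= minchars;
}
```

Under this sketch, "gr" and "GRAYSCALE" both select a "grayscale" keyword with a minimum of 2, while "nos" is rejected for a "nosimd" keyword with a minimum of 4 because it is too short to be unambiguous.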
68
cjpeg.c
@@ -1,10 +1,17 @@
|
|||||||
/*
|
/*
|
||||||
* cjpeg.c
|
* cjpeg.c
|
||||||
*
|
*
|
||||||
* Copyright (C) 1991-1996, Thomas G. Lane.
|
* Copyright (C) 1991-1998, Thomas G. Lane.
|
||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
* x86 SIMD extension for IJG JPEG library
|
||||||
|
* Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
* This file has been modified for SIMD extension.
|
||||||
|
* Last Modified : August 23, 2005
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
*
|
||||||
* This file contains a command-line user interface for the JPEG compressor.
|
* This file contains a command-line user interface for the JPEG compressor.
|
||||||
* It should work on any system with Unix- or MS-DOS-style command lines.
|
* It should work on any system with Unix- or MS-DOS-style command lines.
|
||||||
*
|
*
|
||||||
@@ -184,7 +191,7 @@ usage (void)
|
|||||||
#ifdef C_ARITH_CODING_SUPPORTED
|
#ifdef C_ARITH_CODING_SUPPORTED
|
||||||
fprintf(stderr, " -arithmetic Use arithmetic coding\n");
|
fprintf(stderr, " -arithmetic Use arithmetic coding\n");
|
||||||
#endif
|
#endif
|
||||||
fprintf(stderr, " -baseline Force baseline output\n");
|
fprintf(stderr, " -baseline Force baseline quantization tables\n");
|
||||||
fprintf(stderr, " -qtables file Use quantization tables given in file\n");
|
fprintf(stderr, " -qtables file Use quantization tables given in file\n");
|
||||||
fprintf(stderr, " -qslots N[,...] Set component quantization tables\n");
|
fprintf(stderr, " -qslots N[,...] Set component quantization tables\n");
|
||||||
fprintf(stderr, " -sample HxV[,...] Set component sampling factors\n");
|
fprintf(stderr, " -sample HxV[,...] Set component sampling factors\n");
|
||||||
@@ -195,6 +202,22 @@ usage (void)
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
#ifndef JSIMD_MODEINFO_NOT_SUPPORTED
|
||||||
|
|
||||||
|
LOCAL(void)
|
||||||
|
print_simd_info (FILE * file, char * labelstr, unsigned int simd)
|
||||||
|
{
|
||||||
|
fprintf(file, "%s%s%s%s%s%s\n", labelstr,
|
||||||
|
simd & JSIMD_MMX ? " MMX" : "",
|
||||||
|
simd & JSIMD_3DNOW ? " 3DNow!" : "",
|
||||||
|
simd & JSIMD_SSE ? " SSE" : "",
|
||||||
|
simd & JSIMD_SSE2 ? " SSE2" : "",
|
||||||
|
simd == JSIMD_NONE ? " NONE" : "");
|
||||||
|
}
|
||||||
|
|
||||||
|
#endif /* !JSIMD_MODEINFO_NOT_SUPPORTED */
|
||||||
|
|
||||||
|
|
||||||
LOCAL(int)
|
LOCAL(int)
|
||||||
parse_switches (j_compress_ptr cinfo, int argc, char **argv,
|
parse_switches (j_compress_ptr cinfo, int argc, char **argv,
|
||||||
int last_file_arg_seen, boolean for_real)
|
int last_file_arg_seen, boolean for_real)
|
||||||
@@ -255,9 +278,22 @@ parse_switches (j_compress_ptr cinfo, int argc, char **argv,
|
|||||||
#endif
|
#endif
|
||||||
|
|
||||||
} else if (keymatch(arg, "baseline", 1)) {
|
} else if (keymatch(arg, "baseline", 1)) {
|
||||||
/* Force baseline output (8-bit quantizer values). */
|
/* Force baseline-compatible output (8-bit quantizer values). */
|
||||||
force_baseline = TRUE;
|
force_baseline = TRUE;
|
||||||
|
|
||||||
|
#ifndef JSIMD_MASKFUNC_NOT_SUPPORTED
|
||||||
|
} else if (keymatch(arg, "nosimd" , 4)) {
|
||||||
|
jpeg_simd_mask((j_common_ptr) cinfo, JSIMD_NONE, JSIMD_ALL);
|
||||||
|
} else if (keymatch(arg, "nommx" , 3)) {
|
||||||
|
jpeg_simd_mask((j_common_ptr) cinfo, JSIMD_NONE, JSIMD_MMX);
|
||||||
|
} else if (keymatch(arg, "no3dnow", 3)) {
|
||||||
|
jpeg_simd_mask((j_common_ptr) cinfo, JSIMD_NONE, JSIMD_3DNOW);
|
||||||
|
} else if (keymatch(arg, "nosse" , 4)) {
|
||||||
|
jpeg_simd_mask((j_common_ptr) cinfo, JSIMD_NONE, JSIMD_SSE);
|
||||||
|
} else if (keymatch(arg, "nosse2" , 6)) {
|
||||||
|
jpeg_simd_mask((j_common_ptr) cinfo, JSIMD_NONE, JSIMD_SSE2);
|
||||||
|
#endif /* !JSIMD_MASKFUNC_NOT_SUPPORTED */
|
||||||
|
|
||||||
} else if (keymatch(arg, "dct", 2)) {
|
} else if (keymatch(arg, "dct", 2)) {
|
||||||
/* Select DCT algorithm. */
|
/* Select DCT algorithm. */
|
||||||
if (++argn >= argc) /* advance to next argument */
|
if (++argn >= argc) /* advance to next argument */
|
||||||
@@ -279,6 +315,32 @@ parse_switches (j_compress_ptr cinfo, int argc, char **argv,
|
|||||||
if (! printed_version) {
|
if (! printed_version) {
|
||||||
fprintf(stderr, "Independent JPEG Group's CJPEG, version %s\n%s\n",
|
fprintf(stderr, "Independent JPEG Group's CJPEG, version %s\n%s\n",
|
||||||
JVERSION, JCOPYRIGHT);
|
JVERSION, JCOPYRIGHT);
|
||||||
|
fprintf(stderr,
|
||||||
|
"\nx86 SIMD extension for IJG JPEG library, version %s\n\n",
|
||||||
|
JPEG_SIMDEXT_VER_STR);
|
||||||
|
#ifndef JSIMD_MODEINFO_NOT_SUPPORTED
|
||||||
|
print_simd_info(stderr, "SIMD instructions supported by the system :",
|
||||||
|
jpeg_simd_support(NULL));
|
||||||
|
|
||||||
|
fprintf(stderr, "\n === SIMD Operation Modes ===\n");
|
||||||
|
#ifdef DCT_ISLOW_SUPPORTED
|
||||||
|
print_simd_info(stderr, "Accurate integer DCT (-dct int) :",
|
||||||
|
jpeg_simd_forward_dct(cinfo, JDCT_ISLOW));
|
||||||
|
#endif
|
||||||
|
#ifdef DCT_IFAST_SUPPORTED
|
||||||
|
print_simd_info(stderr, "Fast integer DCT (-dct fast) :",
|
||||||
|
jpeg_simd_forward_dct(cinfo, JDCT_IFAST));
|
||||||
|
#endif
|
||||||
|
#ifdef DCT_FLOAT_SUPPORTED
|
||||||
|
print_simd_info(stderr, "Floating-point DCT (-dct float) :",
|
||||||
|
jpeg_simd_forward_dct(cinfo, JDCT_FLOAT));
|
||||||
|
#endif
|
||||||
|
print_simd_info(stderr, "Downsampling (-sample 2x2 or 2x1) :",
|
||||||
|
jpeg_simd_downsampler(cinfo));
|
||||||
|
print_simd_info(stderr, "Colorspace conversion (RGB->YCbCr) :",
|
||||||
|
jpeg_simd_color_converter(cinfo));
|
||||||
|
fprintf(stderr, "\n");
|
||||||
|
#endif /* !JSIMD_MODEINFO_NOT_SUPPORTED */
|
||||||
printed_version = TRUE;
|
printed_version = TRUE;
|
||||||
}
|
}
|
||||||
cinfo->err->trace_level++;
|
cinfo->err->trace_level++;
|
||||||
|
|||||||
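The print_simd_info helper added in this diff turns a bitmask of SIMD capabilities into a human-readable list. A self-contained sketch of the same idea is shown below. The DEMO_SIMD_* values are assumptions made for illustration (the real JSIMD_* constants are defined by the SIMD extension's headers and may differ), and format_simd_info is a hypothetical name; it writes into a buffer instead of a FILE so the result can be inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical flag values, chosen only so the sketch compiles on its own. */
#define DEMO_SIMD_NONE   0x00u
#define DEMO_SIMD_MMX    0x01u
#define DEMO_SIMD_3DNOW  0x02u
#define DEMO_SIMD_SSE    0x04u
#define DEMO_SIMD_SSE2   0x08u

/* Format "<label> <flag names...>" into buf, or "<label> NONE" when no
 * flag is set.  Returns the value of snprintf(): the number of characters
 * the full string needs, excluding the terminating NUL. */
static int
format_simd_info (char *buf, size_t buflen, const char *label,
                  unsigned int simd)
{
  assert(buf != NULL && label != NULL);

  return snprintf(buf, buflen, "%s%s%s%s%s%s", label,
                  (simd & DEMO_SIMD_MMX)   ? " MMX"    : "",
                  (simd & DEMO_SIMD_3DNOW) ? " 3DNow!" : "",
                  (simd & DEMO_SIMD_SSE)   ? " SSE"    : "",
                  (simd & DEMO_SIMD_SSE2)  ? " SSE2"   : "",
                  (simd == DEMO_SIMD_NONE) ? " NONE"   : "");
}
```

Each capability is tested independently with a bitwise AND, so any combination of flags prints in a fixed order; only the all-zero mask prints NONE.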
22
ckconfig.c
@@ -4,6 +4,13 @@
|
|||||||
* Copyright (C) 1991-1994, Thomas G. Lane.
|
* Copyright (C) 1991-1994, Thomas G. Lane.
|
||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
|
*
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
* x86 SIMD extension for IJG JPEG library
|
||||||
|
* Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
* This file has been modified for SIMD extension.
|
||||||
|
* Last Modified : March 28, 2005
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
*/
|
*/
|
||||||
|
|
||||||
/*
|
/*
|
||||||
@@ -361,6 +368,10 @@ int main (argc, argv)
|
|||||||
fprintf(outfile, "#define INCOMPLETE_TYPES_BROKEN\n");
|
fprintf(outfile, "#define INCOMPLETE_TYPES_BROKEN\n");
|
||||||
#else
|
#else
|
||||||
fprintf(outfile, "#undef INCOMPLETE_TYPES_BROKEN\n");
|
fprintf(outfile, "#undef INCOMPLETE_TYPES_BROKEN\n");
|
||||||
|
#endif
|
||||||
|
#ifdef _WIN32
|
||||||
|
fprintf(outfile, "\n/* Define \"boolean\" as unsigned char, not int, per Windows custom */\n");
|
||||||
|
fprintf(outfile, "#define TYPEDEF_UCHAR_BOOLEAN\n");
|
||||||
#endif
|
#endif
|
||||||
fprintf(outfile, "\n#ifdef JPEG_INTERNALS\n\n");
|
fprintf(outfile, "\n#ifdef JPEG_INTERNALS\n\n");
|
||||||
if (is_shifting_signed(-0x7F7E80B1L))
|
if (is_shifting_signed(-0x7F7E80B1L))
|
||||||
@@ -368,6 +379,14 @@ int main (argc, argv)
|
|||||||
else
|
else
|
||||||
fprintf(outfile, "#define RIGHT_SHIFT_IS_UNSIGNED\n");
|
fprintf(outfile, "#define RIGHT_SHIFT_IS_UNSIGNED\n");
|
||||||
fprintf(outfile, "\n#endif /* JPEG_INTERNALS */\n");
|
fprintf(outfile, "\n#endif /* JPEG_INTERNALS */\n");
|
||||||
|
|
||||||
|
fprintf(outfile, "\n#if defined(JPEG_INTERNALS) || defined(JPEG_INTERNAL_OPTIONS)\n");
|
||||||
|
fprintf(outfile, "#undef JSIMD_MMX_NOT_SUPPORTED\n");
|
||||||
|
fprintf(outfile, "#undef JSIMD_3DNOW_NOT_SUPPORTED\n");
|
||||||
|
fprintf(outfile, "#undef JSIMD_SSE_NOT_SUPPORTED\n");
|
||||||
|
fprintf(outfile, "#undef JSIMD_SSE2_NOT_SUPPORTED\n");
|
||||||
|
fprintf(outfile, "#endif\n");
|
||||||
|
|
||||||
fprintf(outfile, "\n#ifdef JPEG_CJPEG_DJPEG\n\n");
|
fprintf(outfile, "\n#ifdef JPEG_CJPEG_DJPEG\n\n");
|
||||||
fprintf(outfile, "#define BMP_SUPPORTED /* BMP image file format */\n");
|
fprintf(outfile, "#define BMP_SUPPORTED /* BMP image file format */\n");
|
||||||
fprintf(outfile, "#define GIF_SUPPORTED /* GIF image file format */\n");
|
fprintf(outfile, "#define GIF_SUPPORTED /* GIF image file format */\n");
|
||||||
@@ -375,6 +394,9 @@ int main (argc, argv)
|
|||||||
fprintf(outfile, "#undef RLE_SUPPORTED /* Utah RLE image file format */\n");
|
fprintf(outfile, "#undef RLE_SUPPORTED /* Utah RLE image file format */\n");
|
||||||
fprintf(outfile, "#define TARGA_SUPPORTED /* Targa image file format */\n\n");
|
fprintf(outfile, "#define TARGA_SUPPORTED /* Targa image file format */\n\n");
|
||||||
fprintf(outfile, "#undef TWO_FILE_COMMANDLINE /* You may need this on non-Unix systems */\n");
|
fprintf(outfile, "#undef TWO_FILE_COMMANDLINE /* You may need this on non-Unix systems */\n");
|
||||||
|
#ifdef _WIN32
|
||||||
|
fprintf(outfile, "#define USE_SETMODE /* Needed to make one-file style work */\n");
|
||||||
|
#endif
|
||||||
fprintf(outfile, "#undef NEED_SIGNAL_CATCHER /* Define this if you use jmemname.c */\n");
|
fprintf(outfile, "#undef NEED_SIGNAL_CATCHER /* Define this if you use jmemname.c */\n");
|
||||||
fprintf(outfile, "#undef DONT_USE_B_MODE\n");
|
fprintf(outfile, "#undef DONT_USE_B_MODE\n");
|
||||||
fprintf(outfile, "/* #define PROGRESS_REPORT */ /* optional */\n");
|
fprintf(outfile, "/* #define PROGRESS_REPORT */ /* optional */\n");
|
||||||
|
|||||||
1491
config.guess
vendored
Normal file
File diff suppressed because it is too large
1606
config.sub
vendored
Normal file
File diff suppressed because it is too large
44
config.ver
Normal file
@@ -0,0 +1,44 @@
|
|||||||
|
|
||||||
|
JPEG_VER_MAJOR=62
|
||||||
|
JPEG_VER_MINOR=1
|
||||||
|
JPEG_REVISION=0
|
||||||
|
|
||||||
|
case $host_os in
|
||||||
|
cygwin*)
|
||||||
|
# The shared library built from this source code is *not* binary
|
||||||
|
# compatible with the cygwin's official binary release (cygjpeg-62.dll).
|
||||||
|
# This is because the official binary has been built with
|
||||||
|
# the lossless jpeg patch which is available as ljpeg-6b.tar.gz .
|
||||||
|
# Therefore we decided to give the shared library the version number
|
||||||
|
# other than 62.
|
||||||
|
#
|
||||||
|
JPEG_VER_MAJOR=162
|
||||||
|
JPEG_VER_MINOR=0
|
||||||
|
;;
|
||||||
|
freebsd*)
|
||||||
|
# This follows the official binary release in the ports collection.
|
||||||
|
JPEG_VER_MAJOR=9
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
|
||||||
|
# convert absolute version numbers to libtool ages
|
||||||
|
case $version_type in
|
||||||
|
freebsd-aout|freebsd-elf|sunos)
|
||||||
|
JPEG_LT_CURRENT=$JPEG_VER_MAJOR
|
||||||
|
JPEG_LT_REVISION=$JPEG_VER_MINOR
|
||||||
|
JPEG_LT_AGE=0
|
||||||
|
;;
|
||||||
|
irix|nonstopux)
|
||||||
|
JPEG_LT_CURRENT=`expr $JPEG_VER_MAJOR + $JPEG_VER_MINOR - 1`
|
||||||
|
JPEG_LT_AGE=$JPEG_VER_MINOR
|
||||||
|
JPEG_LT_REVISION=$JPEG_VER_MINOR
|
||||||
|
;;
|
||||||
|
*)
|
||||||
|
JPEG_LT_CURRENT=`expr $JPEG_VER_MAJOR + $JPEG_VER_MINOR`
|
||||||
|
JPEG_LT_AGE=$JPEG_VER_MINOR
|
||||||
|
JPEG_LT_REVISION=$JPEG_REVISION
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
|
||||||
|
JPEG_LIB_VERSION=$JPEG_LT_CURRENT:$JPEG_LT_REVISION:$JPEG_LT_AGE
|
||||||
|
|
||||||
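The two case statements in config.ver map an absolute major/minor/revision triple onto libtool's current:revision:age scheme, with a different formula per version_type. The arithmetic can be restated in C as below. This is only a sketch for checking the numbers: the type and function names are hypothetical, and the three enum values stand in for the shell patterns (freebsd-aout|freebsd-elf|sunos, irix|nonstopux, and the default branch).

```c
#include <assert.h>

/* libtool version triple, as written CURRENT:REVISION:AGE. */
struct lt_version {
  int current;
  int revision;
  int age;
};

/* Hypothetical stand-ins for the $version_type patterns in config.ver. */
enum lt_scheme {
  LT_SCHEME_DEFAULT,
  LT_SCHEME_FREEBSD_SUNOS,
  LT_SCHEME_IRIX_NONSTOPUX
};

/* Convert absolute version numbers to libtool ages, following the
 * assignments in config.ver branch by branch. */
static struct lt_version
lt_version_from_abs (enum lt_scheme scheme, int major, int minor,
                     int revision)
{
  struct lt_version v;

  assert(major >= 0 && minor >= 0 && revision >= 0);

  switch (scheme) {
  case LT_SCHEME_FREEBSD_SUNOS:
    v.current = major;
    v.revision = minor;
    v.age = 0;
    break;
  case LT_SCHEME_IRIX_NONSTOPUX:
    v.current = major + minor - 1;
    v.age = minor;
    v.revision = minor;
    break;
  default:
    v.current = major + minor;
    v.age = minor;
    v.revision = revision;
    break;
  }
  return v;
}
```

With the defaults at the top of config.ver (62, 1, 0), the default branch gives 63:0:1, and the cygwin override (162, 0, 0) gives 162:0:0.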
634
configure.in
Normal file
@@ -0,0 +1,634 @@
|
|||||||
|
dnl Process this file with autoconf to produce a configure script.
|
||||||
|
AC_INIT([jcmaster.c])
|
||||||
|
AC_CONFIG_HEADER([jconfig.h:jconfig.cfg])
|
||||||
|
dnl --------------------------------------------------------------------
|
||||||
|
AC_PROG_CC
|
||||||
|
AC_PROG_CPP
|
||||||
|
dnl --------------------------------------------------------------------
|
||||||
|
AC_MSG_CHECKING([for function prototypes])
|
||||||
|
AC_CACHE_VAL([ijg_cv_have_prototypes],[AC_TRY_COMPILE([
|
||||||
|
int testfunction (int arg1, int * arg2); /* check prototypes */
|
||||||
|
struct methods_struct { /* check method-pointer declarations */
|
||||||
|
int (*error_exit) (char *msgtext);
|
||||||
|
int (*trace_message) (char *msgtext);
|
||||||
|
int (*another_method) (void);
|
||||||
|
};
|
||||||
|
int testfunction (int arg1, int * arg2) /* check definitions */
|
||||||
|
{ return arg2[arg1]; }
|
||||||
|
int test2function (void) /* check void arg list */
|
||||||
|
{ return 0; }
|
||||||
|
],[ ],[ijg_cv_have_prototypes=yes],[ijg_cv_have_prototypes=no])])
|
||||||
|
AC_MSG_RESULT([$ijg_cv_have_prototypes])
|
||||||
|
if test $ijg_cv_have_prototypes = yes; then
|
||||||
|
AC_DEFINE([HAVE_PROTOTYPES],)
|
||||||
|
else
|
||||||
|
echo [Your compiler does not seem to know about function prototypes.]
|
||||||
|
echo [Perhaps it needs a special switch to enable ANSI C mode.]
|
||||||
|
echo [If so, we recommend running configure like this:]
|
||||||
|
echo [" ./configure CC='cc -switch'"]
|
||||||
|
echo [where -switch is the proper switch.]
|
||||||
|
fi
|
||||||
|
dnl --------------------------------------------------------------------
|
||||||
|
AC_CHECK_HEADER([stddef.h],[AC_DEFINE([HAVE_STDDEF_H],)])
|
||||||
|
AC_CHECK_HEADER([stdlib.h],[AC_DEFINE([HAVE_STDLIB_H],)])
|
||||||
|
AC_CHECK_HEADER([string.h],[:],[AC_DEFINE([NEED_BSD_STRINGS],)])
|
||||||
|
dnl --------------------------------------------------------------------
|
||||||
|
AC_MSG_CHECKING([for size_t])
|
||||||
|
AC_TRY_COMPILE([
|
||||||
|
#ifdef HAVE_STDDEF_H
|
||||||
|
#include <stddef.h>
|
||||||
|
#endif
|
||||||
|
#ifdef HAVE_STDLIB_H
|
||||||
|
#include <stdlib.h>
|
||||||
|
#endif
|
||||||
|
#include <stdio.h>
|
||||||
|
#ifdef NEED_BSD_STRINGS
|
||||||
|
#include <strings.h>
|
||||||
|
#else
|
||||||
|
#include <string.h>
|
||||||
|
#endif
|
||||||
|
typedef size_t my_size_t;
|
||||||
|
],[ my_size_t foovar; ],
|
||||||
|
[ijg_size_t_ok=yes],
|
||||||
|
[ijg_size_t_ok="not ANSI, perhaps it is in sys/types.h"])
|
||||||
|
AC_MSG_RESULT([$ijg_size_t_ok])
|
||||||
|
if test "$ijg_size_t_ok" != yes; then
|
||||||
|
AC_CHECK_HEADER([sys/types.h],[AC_DEFINE([NEED_SYS_TYPES_H],)
|
||||||
|
AC_EGREP_HEADER([size_t],[sys/types.h],
|
||||||
|
[ijg_size_t_ok="size_t is in sys/types.h"],[ijg_size_t_ok=no])],
|
||||||
|
[ijg_size_t_ok=no])
|
||||||
|
AC_MSG_RESULT([$ijg_size_t_ok])
|
||||||
|
if test "$ijg_size_t_ok" = no; then
|
||||||
|
echo [Type size_t is not defined in any of the usual places.]
|
||||||
|
echo [Try putting '"typedef unsigned int size_t;"' in jconfig.h.]
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
dnl --------------------------------------------------------------------
|
||||||
|
AC_MSG_CHECKING([for type unsigned char])
|
||||||
|
AC_TRY_COMPILE(,[ unsigned char un_char; ],[AC_MSG_RESULT(yes)
|
||||||
|
AC_DEFINE([HAVE_UNSIGNED_CHAR],)],[AC_MSG_RESULT(no)])
|
||||||
|
dnl --------------------------------------------------------------------
|
||||||
|
AC_MSG_CHECKING([for type unsigned short])
|
||||||
|
AC_TRY_COMPILE(,[ unsigned short un_short; ],[AC_MSG_RESULT(yes)
|
||||||
|
AC_DEFINE([HAVE_UNSIGNED_SHORT],)],[AC_MSG_RESULT(no)])
|
||||||
|
dnl --------------------------------------------------------------------
|
||||||
|
AC_MSG_CHECKING([for type void])
|
||||||
|
AC_TRY_COMPILE([
|
||||||
|
/* Caution: a C++ compiler will insist on valid prototypes */
|
||||||
|
typedef void * void_ptr; /* check void * */
|
||||||
|
#ifdef HAVE_PROTOTYPES /* check ptr to function returning void */
|
||||||
|
typedef void (*void_func) (int a, int b);
|
||||||
|
#else
|
||||||
|
typedef void (*void_func) ();
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#ifdef HAVE_PROTOTYPES /* check void function result */
|
||||||
|
void test3function (void_ptr arg1, void_func arg2)
|
||||||
|
#else
|
||||||
|
void test3function (arg1, arg2)
|
||||||
|
void_ptr arg1;
|
||||||
|
void_func arg2;
|
||||||
|
#endif
|
||||||
|
{
|
||||||
|
char * locptr = (char *) arg1; /* check casting to and from void * */
|
||||||
|
arg1 = (void *) locptr;
|
||||||
|
(*arg2) (1, 2); /* check call of fcn returning void */
|
||||||
|
}
|
||||||
|
],[ ],[AC_MSG_RESULT(yes)],[AC_MSG_RESULT(no)
|
||||||
|
AC_DEFINE([void],[char])])
|
||||||
|
|
||||||
|
dnl --------------------------------------------------------------------
|
||||||
|
AC_MSG_CHECKING([for working const])
|
||||||
|
AC_CACHE_VAL([ac_cv_c_const],[AC_TRY_COMPILE(,[
|
||||||
|
/* Ultrix mips cc rejects this. */
|
||||||
|
typedef int charset[2]; const charset x;
|
||||||
|
/* SunOS 4.1.1 cc rejects this. */
|
||||||
|
char const *const *ccp;
|
||||||
|
char **p;
|
||||||
|
/* NEC SVR4.0.2 mips cc rejects this. */
|
||||||
|
struct point {int x, y;};
|
||||||
|
static struct point const zero = {0,0};
|
||||||
|
/* AIX XL C 1.02.0.0 rejects this.
|
||||||
|
It does not let you subtract one const X* pointer from another in an arm
|
||||||
|
of an if-expression whose if-part is not a constant expression */
|
||||||
|
const char *g = "string";
|
||||||
|
ccp = &g + (g ? g-g : 0);
|
||||||
|
/* HPUX 7.0 cc rejects these. */
|
||||||
|
++ccp;
|
||||||
|
p = (char**) ccp;
|
||||||
|
ccp = (char const *const *) p;
|
||||||
|
{ /* SCO 3.2v4 cc rejects this. */
|
||||||
|
char *t;
|
||||||
|
char const *s = 0 ? (char *) 0 : (char const *) 0;
|
||||||
|
|
||||||
|
*t++ = 0;
|
||||||
|
}
|
||||||
|
{ /* Someone thinks the Sun supposedly-ANSI compiler will reject this. */
|
||||||
|
int x[] = {25, 17};
|
||||||
|
const int *foo = &x[0];
|
||||||
|
++foo;
|
||||||
|
}
|
||||||
|
{ /* Sun SC1.0 ANSI compiler rejects this -- but not the above. */
|
||||||
|
typedef const int *iptr;
|
||||||
|
iptr p = 0;
|
||||||
|
++p;
|
||||||
|
}
|
||||||
|
{ /* AIX XL C 1.02.0.0 rejects this saying
|
||||||
|
"k.c", line 2.27: 1506-025 (S) Operand must be a modifiable lvalue. */
|
||||||
|
struct s { int j; const int *ap[3]; };
|
||||||
|
struct s *b; b->j = 5;
|
||||||
|
}
|
||||||
|
{ /* ULTRIX-32 V3.1 (Rev 9) vcc rejects this */
|
||||||
|
const int foo = 10;
|
||||||
|
}
|
||||||
|
],[ac_cv_c_const=yes],[ac_cv_c_const=no])])
|
||||||
|
AC_MSG_RESULT([$ac_cv_c_const])
|
||||||
|
if test $ac_cv_c_const = no; then
|
||||||
|
AC_DEFINE([const],)
|
||||||
|
fi
|
||||||
|
|
||||||
|
dnl --------------------------------------------------------------------
|
||||||
|
AC_MSG_CHECKING([for inline])
|
||||||
|
ijg_cv_inline=""
|
||||||
|
AC_TRY_COMPILE(,[} __inline__ int foo() { return 0; }
|
||||||
|
int bar() { return foo();],[ijg_cv_inline="__inline__"],
|
||||||
|
[AC_TRY_COMPILE(,[} __inline int foo() { return 0; }
|
||||||
|
int bar() { return foo();],[ijg_cv_inline="__inline"],
|
||||||
|
[AC_TRY_COMPILE(,[} inline int foo() { return 0; }
|
||||||
|
int bar() { return foo();],[ijg_cv_inline="inline"],)])])
|
||||||
|
AC_MSG_RESULT([$ijg_cv_inline])
|
||||||
|
AC_DEFINE_UNQUOTED([INLINE],[$ijg_cv_inline])
|
||||||
|
dnl --------------------------------------------------------------------
|
||||||
|
AC_MSG_CHECKING([for broken incomplete types])
|
||||||
|
AC_TRY_COMPILE([ typedef struct undefined_structure * undef_struct_ptr; ],
|
||||||
|
,[AC_MSG_RESULT(ok)],[AC_MSG_RESULT(broken)
|
||||||
|
AC_DEFINE([INCOMPLETE_TYPES_BROKEN],)])
|
||||||
|
dnl --------------------------------------------------------------------
|
||||||
|
AC_MSG_CHECKING([for short external names])
|
||||||
|
AC_TRY_LINK([
|
||||||
|
int possibly_duplicate_function () { return 0; }
|
||||||
|
int possibly_dupli_function () { return 1; }
|
||||||
|
],[ ],[AC_MSG_RESULT(ok)],[AC_MSG_RESULT(short)
|
||||||
|
AC_DEFINE([NEED_SHORT_EXTERNAL_NAMES],)])
|
||||||
|
dnl --------------------------------------------------------------------
|
||||||
|
AC_MSG_CHECKING([to see if char is signed])
|
||||||
|
AC_TRY_RUN([
|
||||||
|
#ifdef HAVE_PROTOTYPES
|
||||||
|
int is_char_signed (int arg)
|
||||||
|
#else
|
||||||
|
int is_char_signed (arg)
|
||||||
|
int arg;
|
||||||
|
#endif
|
||||||
|
{
|
||||||
|
if (arg == 189) { /* expected result for unsigned char */
|
||||||
|
return 0; /* type char is unsigned */
|
||||||
|
}
|
||||||
|
else if (arg != -67) { /* expected result for signed char */
|
||||||
|
printf("Hmm, it seems 'char' is not eight bits wide on your machine.\n");
|
||||||
|
printf("I fear the JPEG software will not work at all.\n\n");
|
||||||
|
}
|
||||||
|
return 1; /* assume char is signed otherwise */
|
||||||
|
}
|
||||||
|
char signed_char_check = (char) (-67);
|
||||||
|
main() {
|
||||||
|
exit(is_char_signed((int) signed_char_check));
|
||||||
|
}],[AC_MSG_RESULT(no)
|
||||||
|
AC_DEFINE([CHAR_IS_UNSIGNED],)],[AC_MSG_RESULT(yes)],
|
||||||
|
[echo Assuming that char is signed on target machine.
|
||||||
|
echo If it is unsigned, this will be a little bit inefficient.
|
||||||
|
])
|
||||||
|
dnl --------------------------------------------------------------------
|
||||||
|
AC_MSG_CHECKING([to see if right shift is signed])
|
||||||
|
AC_TRY_RUN([
|
||||||
|
#ifdef HAVE_PROTOTYPES
|
||||||
|
int is_shifting_signed (long arg)
|
||||||
|
#else
|
||||||
|
int is_shifting_signed (arg)
|
||||||
|
long arg;
|
||||||
|
#endif
|
||||||
|
/* See whether right-shift on a long is signed or not. */
|
||||||
|
{
|
||||||
|
long res = arg >> 4;
|
||||||
|
|
||||||
|
if (res == -0x7F7E80CL) { /* expected result for signed shift */
|
||||||
|
return 1; /* right shift is signed */
|
||||||
|
}
|
||||||
|
/* see if unsigned-shift hack will fix it. */
|
||||||
|
/* we can't just test exact value since it depends on width of long... */
|
||||||
|
res |= (~0L) << (32-4);
|
||||||
|
if (res == -0x7F7E80CL) { /* expected result now? */
|
||||||
|
return 0; /* right shift is unsigned */
|
||||||
|
}
|
||||||
|
printf("Right shift isn't acting as I expect it to.\n");
|
||||||
|
printf("I fear the JPEG software will not work at all.\n\n");
|
||||||
|
return 0; /* try it with unsigned anyway */
|
||||||
|
}
|
||||||
|
main() {
|
||||||
|
exit(is_shifting_signed(-0x7F7E80B1L));
|
||||||
|
}],[AC_MSG_RESULT(no)
|
||||||
|
AC_DEFINE([RIGHT_SHIFT_IS_UNSIGNED],)],[AC_MSG_RESULT(yes)],
|
||||||
|
[AC_MSG_RESULT([Assuming that right shift is signed on target machine.])])
|
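Note on the shift probe above: it passes -0x7F7E80B1 and expects -0x7F7E80C after shifting right by 4, which is what an arithmetic (sign-propagating) shift produces. A minimal sketch of the same test as a plain C function, assuming nothing beyond the constants shown in the diff (the name right_shift_is_signed is hypothetical):

```c
#include <assert.h>

/* Hypothetical helper, not part of the IJG sources.
 * Returns 1 if >> on a negative long keeps the sign (arithmetic shift),
 * 0 otherwise. Uses the same constants as the configure probe. */
static int right_shift_is_signed(void)
{
    volatile long value = -0x7F7E80B1L;   /* volatile: force a run-time shift */
    long res;

    assert(value < 0);                    /* the probe only makes sense for a negative operand */
    res = value >> 4;
    return res == -0x7F7E80CL;            /* floor(-0x7F7E80B1 / 16) */
}
```

Right-shifting a negative signed value is implementation-defined in C, so a result of 0 here is legal. In that case the configure script defines RIGHT_SHIFT_IS_UNSIGNED.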
||||||
|
dnl --------------------------------------------------------------------
|
||||||
|
AC_MSG_CHECKING([to see if fopen accepts b spec])
|
||||||
|
AC_TRY_RUN([
|
||||||
|
#include <stdio.h>
|
||||||
|
main() {
|
||||||
|
if (fopen("conftestdata", "wb") != NULL)
|
||||||
|
exit(0);
|
||||||
|
exit(1);
|
||||||
|
}],[AC_MSG_RESULT(yes)],[AC_MSG_RESULT(no)
|
||||||
|
AC_DEFINE([DONT_USE_B_MODE],)],[AC_MSG_RESULT([Assuming that it does.])])
|
||||||
|
dnl --------------------------------------------------------------------
|
||||||
|
AC_PROG_INSTALL
|
||||||
|
AC_PROG_RANLIB
|
||||||
|
dnl --------------------------------------------------------------------
|
||||||
|
|
||||||
|
AC_CANONICAL_HOST
|
||||||
|
AC_EXEEXT
|
||||||
|
|
||||||
|
# Decide whether to use libtool,
|
||||||
|
# and if so whether to build shared, static, or both flavors of library.
|
||||||
|
AC_DISABLE_SHARED
|
||||||
|
AC_DISABLE_STATIC
|
||||||
|
if test "x$enable_shared" != xno -o "x$enable_static" != xno; then
|
||||||
|
USELIBTOOL="yes"
|
||||||
|
# LIBTOOL="./libtool"
|
||||||
|
O="lo"
|
||||||
|
A="la"
|
||||||
|
LN='$(LIBTOOL) --mode=link $(CC)'
|
||||||
|
INSTALL_LIB='$(LIBTOOL) --mode=install ${INSTALL}'
|
||||||
|
INSTALL_PROGRAM="\$(LIBTOOL) --mode=install $INSTALL_PROGRAM"
|
||||||
|
UNINSTALL='$(LIBTOOL) --mode=uninstall $(RM)'
|
||||||
|
else
|
||||||
|
USELIBTOOL="no"
|
||||||
|
LIBTOOL=""
|
||||||
|
O="o"
|
||||||
|
A="a"
|
||||||
|
LN='$(CC)'
|
||||||
|
INSTALL_LIB="$INSTALL_DATA"
|
||||||
|
UNINSTALL='$(RM)'
|
||||||
|
fi
|
||||||
|
AC_SUBST([LIBTOOL])
|
||||||
|
AC_SUBST([O])
|
||||||
|
AC_SUBST([A])
|
||||||
|
AC_SUBST([LN])
|
||||||
|
AC_SUBST([INSTALL_LIB])
|
||||||
|
AC_SUBST([UNINSTALL])
|
||||||
|
|
||||||
|
# Configure libtool if needed.
|
||||||
|
if test $USELIBTOOL = yes; then
|
||||||
|
AC_LIBTOOL_DLOPEN
|
||||||
|
AC_LIBTOOL_WIN32_DLL
|
||||||
|
AC_PROG_LIBTOOL
|
||||||
|
fi
|
||||||
|
# if libtool >= 1.5
|
||||||
|
TAGCC=ifdef([AC_LIBTOOL_GCJ],[--tag=CC])
|
||||||
|
AC_SUBST([TAGCC])
|
||||||
|
|
||||||
|
dnl --------------------------------------------------------------------
|
||||||
|
# Select memory manager depending on user input.
|
||||||
|
# If no "--enable-maxmem", use jmemnobs
|
||||||
|
MEMORYMGR='jmemnobs.$(O)'
|
||||||
|
MAXMEM="no"
|
||||||
|
AC_ARG_ENABLE([maxmem],
|
||||||
|
[ --enable-maxmem[=N] enable use of temp files, set max mem usage to N MB],
|
||||||
|
[MAXMEM="$enableval"])
|
||||||
|
# support --with-maxmem for backwards compatibility with IJG V5.
|
||||||
|
AC_ARG_WITH([maxmem],,[MAXMEM="$withval"])
|
||||||
|
if test "x$MAXMEM" = xyes; then
|
||||||
|
MAXMEM=1
|
||||||
|
fi
|
||||||
|
if test "x$MAXMEM" != xno; then
|
||||||
|
if test -n "`echo $MAXMEM | sed 's/[[0-9]]//g'`"; then
|
||||||
|
AC_MSG_ERROR([non-numeric argument to --enable-maxmem])
|
||||||
|
fi
|
||||||
|
DEFAULTMAXMEM=`expr $MAXMEM \* 1048576`
|
||||||
|
AC_DEFINE_UNQUOTED([DEFAULT_MAX_MEM],[${DEFAULTMAXMEM}])
|
||||||
|
AC_MSG_CHECKING([for 'tmpfile()'])
|
||||||
|
AC_TRY_LINK([#include <stdio.h>],[ FILE * tfile = tmpfile(); ],
|
||||||
|
[AC_MSG_RESULT(yes)
|
||||||
|
MEMORYMGR='jmemansi.$(O)'],
|
||||||
|
[AC_MSG_RESULT(no)
|
||||||
|
MEMORYMGR='jmemname.$(O)'
|
||||||
|
AC_DEFINE([NEED_SIGNAL_CATCHER],)
|
||||||
|
AC_MSG_CHECKING([for 'mktemp()'])
|
||||||
|
AC_TRY_LINK(,[ char fname[80]; mktemp(fname); ],
|
||||||
|
[AC_MSG_RESULT(yes)],[AC_MSG_RESULT(no)
|
||||||
|
AC_DEFINE([NO_MKTEMP],)])])
|
||||||
|
fi
|
||||||
|
AC_SUBST([MEMORYMGR])
|
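Note on the memory manager selection above: jmemnobs is the default, and --enable-maxmem switches to jmemansi when tmpfile() links, or to jmemname otherwise. The option value itself is handled in three steps: a bare "yes" becomes 1, anything containing a non-digit is rejected, and the number is multiplied by 1048576 to give DEFAULT_MAX_MEM. A sketch of that conversion, with the hypothetical helper name maxmem_to_bytes:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not part of the IJG sources.
 * Converts the value given to --enable-maxmem into a byte count, following
 * the configure logic: a bare "yes" means 1 MB, anything else must be all
 * digits, and the result is N * 1048576. Returns -1 for a non-numeric or
 * empty value. The "no" case is left to the caller, and overflow for very
 * large N is not checked in this sketch. */
static long maxmem_to_bytes(const char *value)
{
    long megabytes = 0;
    const char *p;

    assert(value != NULL);
    if (strcmp(value, "yes") == 0)
        return 1048576L;                  /* --enable-maxmem without =N */
    if (*value == '\0')
        return -1;
    for (p = value; *p != '\0'; p++) {
        if (!isdigit((unsigned char) *p))
            return -1;                    /* configure stops with an error here */
        megabytes = megabytes * 10 + (*p - '0');
    }
    return megabytes * 1048576L;
}
```

For example, --enable-maxmem=4 corresponds to maxmem_to_bytes("4"), which gives 4194304.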
||||||
|
|
||||||
|
dnl ====================================================================
|
||||||
|
|
||||||
|
AC_MSG_CHECKING([to see if the host cpu type is i386 or compatible])
|
||||||
|
case "$host_cpu" in
|
||||||
|
i*86 | x86 | ia32)
|
||||||
|
AC_MSG_RESULT(yes)
|
||||||
|
;;
|
||||||
|
x86_64 | amd64 | aa64)
|
||||||
|
AC_MSG_RESULT([no (x86_64)])
|
||||||
|
AC_MSG_ERROR([Currently, this version of JPEG library cannot be compiled as 64-bit code. sorry.])
|
||||||
|
;;
|
||||||
|
*)
|
||||||
|
AC_MSG_RESULT([no ("$host_cpu")])
|
||||||
|
AC_MSG_ERROR([This version of JPEG library is for i386 or compatible processors only.])
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
|
||||||
|
if test -z "$NAFLAGS" ; then
|
||||||
|
AC_MSG_CHECKING([for object file format of host system])
|
||||||
|
case "$host_os" in
|
||||||
|
cygwin* | mingw* | pw32* | interix*)
|
||||||
|
objfmt='Win32-COFF'
|
||||||
|
;;
|
||||||
|
msdosdjgpp* | go32*)
|
||||||
|
objfmt='COFF'
|
||||||
|
;;
|
||||||
|
os2-emx*) # not tested
|
||||||
|
objfmt='MSOMF' # obj
|
||||||
|
;;
|
||||||
|
linux*coff* | linux*oldld*)
|
||||||
|
objfmt='COFF' # ???
|
||||||
|
;;
|
||||||
|
linux*aout*)
|
||||||
|
objfmt='a.out'
|
||||||
|
;;
|
||||||
|
linux*)
|
||||||
|
objfmt='ELF'
|
||||||
|
;;
|
||||||
|
freebsd* | netbsd* | openbsd*)
|
||||||
|
if echo __ELF__ | $CC -E - | grep __ELF__ > /dev/null; then
|
||||||
|
objfmt='BSD-a.out'
|
||||||
|
else
|
||||||
|
objfmt='ELF'
|
||||||
|
fi
|
||||||
|
;;
|
||||||
|
solaris* | sunos* | sysv* | sco*)
|
||||||
|
objfmt='ELF'
|
||||||
|
;;
|
||||||
|
darwin* | rhapsody* | nextstep* | openstep* | macos*)
|
||||||
|
objfmt='Mach-O'
|
||||||
|
;;
|
||||||
|
*)
|
||||||
|
objfmt='ELF ?'
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
AC_MSG_RESULT([$objfmt])
|
||||||
|
if test "$objfmt" = 'ELF ?'; then
|
||||||
|
objfmt='ELF'
|
||||||
|
AC_MSG_WARN([unexpected host system. assumed that the format is $objfmt.])
|
||||||
|
fi
|
||||||
|
else
|
||||||
|
objfmt=''
|
||||||
|
fi
|
||||||
|
AC_MSG_CHECKING([for object file format specifier (NAFLAGS) ])
|
||||||
|
case "$objfmt" in
|
||||||
|
MSOMF) NAFLAGS='-fobj -DOBJ32';;
|
||||||
|
Win32-COFF) NAFLAGS='-fwin32 -DWIN32';;
|
||||||
|
COFF) NAFLAGS='-fcoff -DCOFF';;
|
||||||
|
a.out) NAFLAGS='-faout -DAOUT';;
|
||||||
|
BSD-a.out) NAFLAGS='-faoutb -DAOUT';;
|
||||||
|
ELF) NAFLAGS='-felf -DELF';;
|
||||||
|
RDF) NAFLAGS='-frdf -DRDF';;
|
||||||
|
Mach-O) NAFLAGS='-fmacho -DMACHO';;
|
||||||
|
esac
|
||||||
|
AC_MSG_RESULT([$NAFLAGS])
|
||||||
|
AC_SUBST([NAFLAGS])
|
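Note on the NAFLAGS selection above: the second case statement is a plain lookup from the detected object format to a pair of NASM options (an output format and a matching -D define). If the user presets NAFLAGS, objfmt is left empty, no branch matches, and the preset value is kept. The lookup can be sketched as a table. The struct and function names here are hypothetical, and the strings are copied from the diff:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table mirroring the case statement in configure.ac. */
struct objfmt_flags {
    const char *objfmt;    /* value computed from $host_os */
    const char *naflags;   /* options passed to the assembler */
};

static const struct objfmt_flags objfmt_table[] = {
    { "MSOMF",      "-fobj -DOBJ32"   },
    { "Win32-COFF", "-fwin32 -DWIN32" },
    { "COFF",       "-fcoff -DCOFF"   },
    { "a.out",      "-faout -DAOUT"   },
    { "BSD-a.out",  "-faoutb -DAOUT"  },
    { "ELF",        "-felf -DELF"     },
    { "RDF",        "-frdf -DRDF"     },
    { "Mach-O",     "-fmacho -DMACHO" },
};

/* Returns the assembler flags for objfmt, or NULL when there is no entry
 * (the configure script leaves NAFLAGS untouched in that case). */
static const char *naflags_for(const char *objfmt)
{
    size_t i;

    assert(objfmt != NULL);
    for (i = 0; i < sizeof objfmt_table / sizeof objfmt_table[0]; i++) {
        if (strcmp(objfmt_table[i].objfmt, objfmt) == 0)
            return objfmt_table[i].naflags;
    }
    return NULL;
}
```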
||||||
|
|
||||||
|
dnl --------------------------------------------------------------------
|
||||||
|
|
||||||
|
AC_CHECK_PROGS(NASM, [nasm nasmw])
|
||||||
|
test -z "$NASM" && AC_MSG_ERROR([no nasm (Netwide Assembler) found in \$PATH])
|
||||||
|
if echo "$NASM" | grep yasm > /dev/null; then
|
||||||
|
AC_MSG_WARN([DON'T USE YASM! CURRENT VERSION (R0.4.0) IS BUGGY!])
|
||||||
|
fi
|
||||||
|
|
||||||
|
AC_MSG_CHECKING([whether the assembler ($NASM $NAFLAGS) works])
|
||||||
|
cat > conftest.asm <<EOF
|
||||||
|
[%line __oline__ "configure"
|
||||||
|
section .text
|
||||||
|
bits 32
|
||||||
|
global _main,main
|
||||||
|
_main:
|
||||||
|
main: xor eax,eax
|
||||||
|
ret
|
||||||
|
]EOF
|
||||||
|
try_nasm='$NASM $NAFLAGS -o conftest.o conftest.asm'
|
||||||
|
if AC_TRY_EVAL(try_nasm) && test -s conftest.o; then
|
||||||
|
AC_MSG_RESULT(yes)
|
||||||
|
else
|
||||||
|
echo "configure: failed program was:" >&AC_FD_CC
|
||||||
|
cat conftest.asm >&AC_FD_CC
|
||||||
|
rm -rf conftest*
|
||||||
|
AC_MSG_RESULT(no)
|
||||||
|
AC_MSG_ERROR([installation or configuration problem: assembler cannot create object files.])
|
||||||
|
fi
|
||||||
|
AC_MSG_CHECKING([whether the linker accepts assembler output])
|
||||||
|
try_nasm='${CC-cc} -o conftest${ac_exeext} $LDFLAGS conftest.o $LIBS 1>&AC_FD_CC'
|
||||||
|
if AC_TRY_EVAL(try_nasm) && test -s conftest${ac_exeext}; then
|
||||||
|
rm -rf conftest*
|
||||||
|
AC_MSG_RESULT(yes)
|
||||||
|
else
|
||||||
|
rm -rf conftest*
|
||||||
|
AC_MSG_RESULT(no)
|
||||||
|
AC_MSG_ERROR([configuration problem: maybe object file format mismatch.])
|
||||||
|
fi
|
||||||
|
|
||||||
|
AC_MSG_CHECKING([whether the assembler supports line continuation character])
|
||||||
|
cat > conftest.asm <<\EOF
|
||||||
|
[%line __oline__ "configure"
|
||||||
|
; The line continuation character '\'
|
||||||
|
; was introduced in nasm 0.98.25.
|
||||||
|
section .text
|
||||||
|
bits 32
|
||||||
|
global _zero
|
||||||
|
_zero: xor \
|
||||||
|
eax,eax
|
||||||
|
ret
|
||||||
|
]EOF
|
||||||
|
try_nasm='$NASM $NAFLAGS -o conftest.o conftest.asm'
|
||||||
|
if AC_TRY_EVAL(try_nasm) && test -s conftest.o; then
|
||||||
|
rm -rf conftest*
|
||||||
|
AC_MSG_RESULT(yes)
|
||||||
|
else
|
||||||
|
echo "configure: failed program was:" >&AC_FD_CC
|
||||||
|
cat conftest.asm >&AC_FD_CC
|
||||||
|
rm -rf conftest*
|
||||||
|
AC_MSG_RESULT(no)
|
||||||
|
AC_MSG_ERROR([you have to use a more recent version of the assembler.])
|
||||||
|
fi
|
||||||
|
|
||||||
|
dnl --------------------------------------------------------------------
|
||||||
|
|
||||||
|
AC_MSG_CHECKING([SIMD instruction sets requested to use])
|
||||||
|
simd_to_use=""
|
||||||
|
|
||||||
|
AC_ARG_ENABLE(mmx,
|
||||||
|
[ --disable-mmx do not use MMX instruction set],
|
||||||
|
[if test "x$enableval" = xno; then
|
||||||
|
AC_DEFINE([JSIMD_MMX_NOT_SUPPORTED],)
|
||||||
|
else
|
||||||
|
simd_to_use="$simd_to_use MMX"
|
||||||
|
fi], [simd_to_use="$simd_to_use MMX"])
|
||||||
|
|
||||||
|
AC_ARG_ENABLE(3dnow,
|
||||||
|
[ --disable-3dnow do not use 3DNow! instruction set],
|
||||||
|
[if test "x$enableval" = xno; then
|
||||||
|
AC_DEFINE([JSIMD_3DNOW_NOT_SUPPORTED],)
|
||||||
|
else
|
||||||
|
simd_to_use="$simd_to_use 3DNow!"
|
||||||
|
fi], [simd_to_use="$simd_to_use 3DNow!"])
|
||||||
|
|
||||||
|
AC_ARG_ENABLE(sse,
|
||||||
|
[ --disable-sse do not use SSE instruction set],
|
||||||
|
[if test "x$enableval" = xno; then
|
||||||
|
AC_DEFINE([JSIMD_SSE_NOT_SUPPORTED],)
|
||||||
|
else
|
||||||
|
simd_to_use="$simd_to_use SSE"
|
||||||
|
fi], [simd_to_use="$simd_to_use SSE"])
|
||||||
|
|
||||||
|
AC_ARG_ENABLE(sse2,
|
||||||
|
[ --disable-sse2 do not use SSE2 instruction set],
|
||||||
|
[if test "x$enableval" = xno; then
|
||||||
|
AC_DEFINE([JSIMD_SSE2_NOT_SUPPORTED],)
|
||||||
|
else
|
||||||
|
simd_to_use="$simd_to_use SSE2"
|
||||||
|
fi], [simd_to_use="$simd_to_use SSE2"])
|
||||||
|
|
||||||
|
test -z "$simd_to_use" && simd_to_use="NONE"
|
||||||
|
AC_MSG_RESULT([$simd_to_use])
|
||||||
|
|
||||||
|
for simd_name in $simd_to_use; do
|
||||||
|
case "$simd_name" in
|
||||||
|
MMX) simd_instruction='psubw mm0,mm0';;
|
||||||
|
3DNow!) simd_instruction='pfsub mm0,mm0';;
|
||||||
|
SSE) simd_instruction='subps xmm0,xmm0';;
|
||||||
|
SSE2) simd_instruction='subpd xmm0,xmm0';;
|
||||||
|
*) continue;;
|
||||||
|
esac
|
||||||
|
AC_MSG_CHECKING([whether the assembler supports $simd_name instructions])
|
||||||
|
cat > conftest.asm <<EOF
|
||||||
|
[%line __oline__ "configure"
|
||||||
|
section .text
|
||||||
|
bits 32
|
||||||
|
global _simd
|
||||||
|
_simd: $simd_instruction
|
||||||
|
ret
|
||||||
|
]EOF
|
||||||
|
try_nasm='$NASM $NAFLAGS -o conftest.o conftest.asm'
|
||||||
|
if AC_TRY_EVAL(try_nasm) && test -s conftest.o; then
|
||||||
|
rm -rf conftest*
|
||||||
|
AC_MSG_RESULT(yes)
|
||||||
|
else
|
||||||
|
echo "configure: failed program was:" >&AC_FD_CC
|
||||||
|
cat conftest.asm >&AC_FD_CC
|
||||||
|
rm -rf conftest*
|
||||||
|
AC_MSG_RESULT(no)
|
||||||
|
AC_MSG_ERROR([you have to use a more recent version of the assembler.])
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
|
||||||
|
dnl --------------------------------------------------------------------
|
||||||
|
# Select OS-dependent SIMD instruction support checker.
|
||||||
|
# jsimdw32.$(O) (Win32) / jsimddjg.$(O) (DJGPP V.2) / jsimdgcc.$(O) (Unix/gcc)
|
||||||
|
if test "x$SIMDCHECKER" = x ; then
|
||||||
|
case "$host_os" in
|
||||||
|
cygwin* | mingw* | pw32* | interix*)
|
||||||
|
SIMDCHECKER='jsimdw32.$(O)'
|
||||||
|
;;
|
||||||
|
msdosdjgpp* | go32*)
|
||||||
|
SIMDCHECKER='jsimddjg.$(O)'
|
||||||
|
;;
|
||||||
|
os2-emx*) # not tested
|
||||||
|
SIMDCHECKER='jsimdgcc.$(O)'
|
||||||
|
;;
|
||||||
|
*)
|
||||||
|
SIMDCHECKER='jsimdgcc.$(O)'
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
fi
|
||||||
|
AC_SUBST([SIMDCHECKER])
|
||||||
|
|
||||||
|
case "$host_os" in
|
||||||
|
cygwin* | mingw* | pw32* | os2-emx* | msdosdjgpp* | go32*)
|
||||||
|
AC_DEFINE([USE_SETMODE],)
|
||||||
|
;;
|
||||||
|
# _host_name_*)
|
||||||
|
# AC_DEFINE([USE_FDOPEN],)
|
||||||
|
# ;;
|
||||||
|
esac
|
||||||
|
|
||||||
|
# This is for UNIX-like environments on Windows platform.
|
||||||
|
AC_ARG_ENABLE(uchar-boolean,
|
||||||
|
[ --enable-uchar-boolean define type \"boolean\" as unsigned char (for Windows)],
|
||||||
|
[if test "x$enableval" != xno; then
|
||||||
|
AC_DEFINE([TYPEDEF_UCHAR_BOOLEAN],)
|
||||||
|
fi])
|
||||||
|
|
||||||
|
dnl --------------------------------------------------------------------
|
||||||
|
|
||||||
|
JPEG_LIB_VERSION="63:0:1"
|
||||||
|
confv_dirs="$srcdir $srcdir/.. $srcdir/../.."
|
||||||
|
config_ver=
|
||||||
|
for ac_dir in $confv_dirs; do
|
||||||
|
if test -r $ac_dir/config.ver; then
|
||||||
|
config_ver=$ac_dir/config.ver
|
||||||
|
break
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
if test -z "$config_ver"; then
|
||||||
|
AC_MSG_WARN([cannot find config.ver in $confv_dirs])
|
||||||
|
AC_MSG_WARN([default version number $JPEG_LIB_VERSION is used])
|
||||||
|
AC_MSG_CHECKING([libjpeg version number for libtool])
|
||||||
|
AC_MSG_RESULT([$JPEG_LIB_VERSION])
|
||||||
|
else
|
||||||
|
AC_MSG_CHECKING([libjpeg version number for libtool])
|
||||||
|
. $config_ver
|
||||||
|
AC_MSG_RESULT([$JPEG_LIB_VERSION])
|
||||||
|
echo "configure: if you want to change the version number, modify $config_ver" 1>&2
|
||||||
|
fi
|
||||||
|
AC_SUBST([JPEG_LIB_VERSION])
|
||||||
|
|
||||||
|
dnl --------------------------------------------------------------------
|
||||||
|
# Prepare to massage makefile.cfg correctly.
|
||||||
|
if test $ijg_cv_have_prototypes = yes; then
|
||||||
|
A2K_DEPS=""
|
||||||
|
COM_A2K="# "
|
||||||
|
else
|
||||||
|
A2K_DEPS="ansi2knr"
|
||||||
|
COM_A2K=""
|
||||||
|
fi
|
||||||
|
AC_SUBST([A2K_DEPS])
|
||||||
|
AC_SUBST([COM_A2K])
|
||||||
|
# ansi2knr needs -DBSD if string.h is missing
|
||||||
|
if test $ac_cv_header_string_h = no; then
|
||||||
|
ANSI2KNRFLAGS="-DBSD"
|
||||||
|
else
|
||||||
|
ANSI2KNRFLAGS=""
|
||||||
|
fi
|
||||||
|
AC_SUBST([ANSI2KNRFLAGS])
|
||||||
|
# Substitutions to enable or disable libtool-related stuff
|
||||||
|
if test $USELIBTOOL = yes -a $ijg_cv_have_prototypes = yes; then
|
||||||
|
COM_LT=""
|
||||||
|
else
|
||||||
|
COM_LT="# "
|
||||||
|
fi
|
||||||
|
AC_SUBST([COM_LT])
|
||||||
|
if test "x$enable_shared" != xno; then
|
||||||
|
FORCE_INSTALL_LIB="install-lib"
|
||||||
|
UNINSTALL_LIB="uninstall-lib"
|
||||||
|
else
|
||||||
|
FORCE_INSTALL_LIB=""
|
||||||
|
UNINSTALL_LIB=""
|
||||||
|
fi
|
||||||
|
AC_SUBST([FORCE_INSTALL_LIB])
|
||||||
|
AC_SUBST([UNINSTALL_LIB])
|
||||||
|
# Set up -I directives
|
||||||
|
if test "x$srcdir" = x.; then
|
||||||
|
INCLUDEFLAGS='-I$(srcdir)'
|
||||||
|
else
|
||||||
|
INCLUDEFLAGS='-I. -I$(srcdir)'
|
||||||
|
fi
|
||||||
|
AC_SUBST([INCLUDEFLAGS])
|
||||||
|
dnl --------------------------------------------------------------------
|
||||||
|
AC_OUTPUT([Makefile:makefile.cfg])
|
||||||
19
djpeg.1
@@ -1,4 +1,4 @@
|
|||||||
.TH DJPEG 1 "15 June 1995"
|
.TH DJPEG 1 "22 August 1997"
|
||||||
.SH NAME
|
.SH NAME
|
||||||
djpeg \- decompress a JPEG file to an image file
|
djpeg \- decompress a JPEG file to an image file
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
@@ -26,9 +26,9 @@ or
|
|||||||
.BR \-gr .
|
.BR \-gr .
|
||||||
Most of the "basic" switches can be abbreviated to as little as one letter.
|
Most of the "basic" switches can be abbreviated to as little as one letter.
|
||||||
Upper and lower case are equivalent (thus
|
Upper and lower case are equivalent (thus
|
||||||
.B \-GIF
|
.B \-BMP
|
||||||
is the same as
|
is the same as
|
||||||
.BR \-gif ).
|
.BR \-bmp ).
|
||||||
British spellings are also accepted (e.g.,
|
British spellings are also accepted (e.g.,
|
||||||
.BR \-greyscale ),
|
.BR \-greyscale ),
|
||||||
though for brevity these are not mentioned below.
|
though for brevity these are not mentioned below.
|
||||||
@@ -182,13 +182,13 @@ Same as
|
|||||||
.BR \-verbose .
|
.BR \-verbose .
|
||||||
.SH EXAMPLES
|
.SH EXAMPLES
|
||||||
.LP
|
.LP
|
||||||
This example decompresses the JPEG file foo.jpg, automatically quantizes to
|
This example decompresses the JPEG file foo.jpg, quantizes it to
|
||||||
256 colors, and saves the output in GIF format in foo.gif:
|
256 colors, and saves the output in 8-bit BMP format in foo.bmp:
|
||||||
.IP
|
.IP
|
||||||
.B djpeg \-gif
|
.B djpeg \-colors 256 \-bmp
|
||||||
.I foo.jpg
|
.I foo.jpg
|
||||||
.B >
|
.B >
|
||||||
.I foo.gif
|
.I foo.bmp
|
||||||
.SH HINTS
|
.SH HINTS
|
||||||
To get a quick preview of an image, use the
|
To get a quick preview of an image, use the
|
||||||
.B \-grayscale
|
.B \-grayscale
|
||||||
@@ -245,4 +245,9 @@ Independent JPEG Group
|
|||||||
.SH BUGS
|
.SH BUGS
|
||||||
Arithmetic coding is not supported for legal reasons.
|
Arithmetic coding is not supported for legal reasons.
|
||||||
.PP
|
.PP
|
||||||
|
To avoid the Unisys LZW patent,
|
||||||
|
.B djpeg
|
||||||
|
produces uncompressed GIF files. These are larger than they should be, but
|
||||||
|
are readable by standard GIF decoders.
|
||||||
|
.PP
|
||||||
Still not as fast as we'd like.
|
Still not as fast as we'd like.
|
||||||
|
|||||||
94
djpeg.c
@@ -1,10 +1,17 @@
|
|||||||
/*
|
/*
|
||||||
* djpeg.c
|
* djpeg.c
|
||||||
*
|
*
|
||||||
* Copyright (C) 1991-1996, Thomas G. Lane.
|
* Copyright (C) 1991-1997, Thomas G. Lane.
|
||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
* x86 SIMD extension for IJG JPEG library
|
||||||
|
* Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
* This file has been modified for SIMD extension.
|
||||||
|
* Last Modified : August 23, 2005
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
*
|
||||||
* This file contains a command-line user interface for the JPEG decompressor.
|
* This file contains a command-line user interface for the JPEG decompressor.
|
||||||
* It should work on any system with Unix- or MS-DOS-style command lines.
|
* It should work on any system with Unix- or MS-DOS-style command lines.
|
||||||
*
|
*
|
||||||
@@ -158,6 +165,22 @@ usage (void)
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
#ifndef JSIMD_MODEINFO_NOT_SUPPORTED
|
||||||
|
|
||||||
|
LOCAL(void)
|
||||||
|
print_simd_info (FILE * file, char * labelstr, unsigned int simd)
|
||||||
|
{
|
||||||
|
fprintf(file, "%s%s%s%s%s%s\n", labelstr,
|
||||||
|
simd & JSIMD_MMX ? " MMX" : "",
|
||||||
|
simd & JSIMD_3DNOW ? " 3DNow!" : "",
|
||||||
|
simd & JSIMD_SSE ? " SSE" : "",
|
||||||
|
simd & JSIMD_SSE2 ? " SSE2" : "",
|
||||||
|
simd == JSIMD_NONE ? " NONE" : "");
|
||||||
|
}
|
||||||
|
|
||||||
|
#endif /* !JSIMD_MODEINFO_NOT_SUPPORTED */
|
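Note on print_simd_info above: it treats its simd argument as a bit mask and appends one name per set bit, falling back to " NONE" only when the mask is exactly JSIMD_NONE. The sketch below shows the same pattern writing into a buffer instead of a FILE, and without the trailing newline. The DEMO_* flag values are assumptions made for illustration. The real JSIMD_* constants are defined in the SIMD extension's headers, which are not part of this hunk.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed flag values for illustration only; not taken from the library. */
enum {
    DEMO_NONE  = 0,
    DEMO_MMX   = 1,
    DEMO_3DNOW = 2,
    DEMO_SSE   = 4,
    DEMO_SSE2  = 8
};

/* Hypothetical counterpart of print_simd_info(): formats the label followed
 * by the name of every instruction set whose bit is set in simd. */
static void format_simd_info(char *out, size_t outsize,
                             const char *label, unsigned int simd)
{
    assert(out != NULL && outsize > 0);
    snprintf(out, outsize, "%s%s%s%s%s%s", label,
             (simd & DEMO_MMX)   ? " MMX"    : "",
             (simd & DEMO_3DNOW) ? " 3DNow!" : "",
             (simd & DEMO_SSE)   ? " SSE"    : "",
             (simd & DEMO_SSE2)  ? " SSE2"   : "",
             (simd == DEMO_NONE) ? " NONE"   : "");
}
```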
||||||
|
|
||||||
|
|
||||||
LOCAL(int)
|
LOCAL(int)
|
||||||
parse_switches (j_decompress_ptr cinfo, int argc, char **argv,
|
parse_switches (j_decompress_ptr cinfo, int argc, char **argv,
|
||||||
int last_file_arg_seen, boolean for_real)
|
int last_file_arg_seen, boolean for_real)
|
||||||
@@ -208,6 +231,19 @@ parse_switches (j_decompress_ptr cinfo, int argc, char **argv,
|
|||||||
cinfo->desired_number_of_colors = val;
|
cinfo->desired_number_of_colors = val;
|
||||||
cinfo->quantize_colors = TRUE;
|
cinfo->quantize_colors = TRUE;
|
||||||
|
|
||||||
|
#ifndef JSIMD_MASKFUNC_NOT_SUPPORTED
|
||||||
|
} else if (keymatch(arg, "nosimd" , 4)) {
|
||||||
|
jpeg_simd_mask((j_common_ptr) cinfo, JSIMD_NONE, JSIMD_ALL);
|
||||||
|
} else if (keymatch(arg, "nommx" , 3)) {
|
||||||
|
jpeg_simd_mask((j_common_ptr) cinfo, JSIMD_NONE, JSIMD_MMX);
|
||||||
|
} else if (keymatch(arg, "no3dnow", 3)) {
|
||||||
|
jpeg_simd_mask((j_common_ptr) cinfo, JSIMD_NONE, JSIMD_3DNOW);
|
||||||
|
} else if (keymatch(arg, "nosse" , 4)) {
|
||||||
|
jpeg_simd_mask((j_common_ptr) cinfo, JSIMD_NONE, JSIMD_SSE);
|
||||||
|
} else if (keymatch(arg, "nosse2" , 6)) {
|
||||||
|
jpeg_simd_mask((j_common_ptr) cinfo, JSIMD_NONE, JSIMD_SSE2);
|
||||||
|
#endif /* !JSIMD_MASKFUNC_NOT_SUPPORTED */
|
||||||
|
|
||||||
} else if (keymatch(arg, "dct", 2)) {
|
} else if (keymatch(arg, "dct", 2)) {
|
||||||
/* Select IDCT algorithm. */
|
/* Select IDCT algorithm. */
|
||||||
if (++argn >= argc) /* advance to next argument */
|
if (++argn >= argc) /* advance to next argument */
|
||||||
@@ -242,6 +278,38 @@ parse_switches (j_decompress_ptr cinfo, int argc, char **argv,
|
|||||||
if (! printed_version) {
|
if (! printed_version) {
|
||||||
fprintf(stderr, "Independent JPEG Group's DJPEG, version %s\n%s\n",
|
fprintf(stderr, "Independent JPEG Group's DJPEG, version %s\n%s\n",
|
||||||
JVERSION, JCOPYRIGHT);
|
JVERSION, JCOPYRIGHT);
|
||||||
|
fprintf(stderr,
|
||||||
|
"\nx86 SIMD extension for IJG JPEG library, version %s\n\n",
|
||||||
|
JPEG_SIMDEXT_VER_STR);
|
||||||
|
#ifndef JSIMD_MODEINFO_NOT_SUPPORTED
|
||||||
|
print_simd_info(stderr, "SIMD instructions supported by the system :",
|
||||||
|
jpeg_simd_support(NULL));
|
||||||
|
|
||||||
|
fprintf(stderr, "\n === SIMD Operation Modes ===\n");
|
||||||
|
#ifdef DCT_ISLOW_SUPPORTED
|
||||||
|
print_simd_info(stderr, "Accurate integer DCT (-dct int) :",
|
||||||
|
jpeg_simd_inverse_dct(cinfo, JDCT_ISLOW));
|
||||||
|
#endif
|
||||||
|
#ifdef DCT_IFAST_SUPPORTED
|
||||||
|
print_simd_info(stderr, "Fast integer DCT (-dct fast) :",
|
||||||
|
jpeg_simd_inverse_dct(cinfo, JDCT_IFAST));
|
||||||
|
#endif
|
||||||
|
#ifdef DCT_FLOAT_SUPPORTED
|
||||||
|
print_simd_info(stderr, "Floating-point DCT (-dct float) :",
|
||||||
|
jpeg_simd_inverse_dct(cinfo, JDCT_FLOAT));
|
||||||
|
#endif
|
||||||
|
#ifdef IDCT_SCALING_SUPPORTED
|
||||||
|
print_simd_info(stderr, "Reduced-size DCT (-scale M/N) :",
|
||||||
|
jpeg_simd_inverse_dct(cinfo, JDCT_FLOAT+1));
|
||||||
|
#endif
|
||||||
|
print_simd_info(stderr, "High-quality upsampling (default) :",
|
||||||
|
jpeg_simd_upsampler(cinfo, TRUE));
|
||||||
|
print_simd_info(stderr, "Low-quality upsampling (-nosmooth) :",
|
||||||
|
jpeg_simd_upsampler(cinfo, FALSE));
|
||||||
|
print_simd_info(stderr, "Colorspace conversion (YCbCr->RGB) :",
|
||||||
|
jpeg_simd_color_deconverter(cinfo));
|
||||||
|
fprintf(stderr, "\n");
|
||||||
|
#endif /* !JSIMD_MODEINFO_NOT_SUPPORTED */
|
||||||
printed_version = TRUE;
|
printed_version = TRUE;
|
||||||
}
|
}
|
||||||
cinfo->err->trace_level++;
|
cinfo->err->trace_level++;
|
||||||
@@ -344,9 +412,9 @@ parse_switches (j_decompress_ptr cinfo, int argc, char **argv,
|
|||||||
|
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Marker processor for COM markers.
|
* Marker processor for COM and interesting APPn markers.
|
||||||
* This replaces the library's built-in processor, which just skips the marker.
|
* This replaces the library's built-in processor, which just skips the marker.
|
||||||
* We want to print out the marker as text, if possible.
|
* We want to print out the marker as text, to the extent possible.
|
||||||
* Note this code relies on a non-suspending data source.
|
* Note this code relies on a non-suspending data source.
|
||||||
*/
|
*/
|
||||||
|
|
||||||
@@ -366,7 +434,7 @@ jpeg_getc (j_decompress_ptr cinfo)
|
|||||||
|
|
||||||
|
|
||||||
METHODDEF(boolean)
|
METHODDEF(boolean)
|
||||||
COM_handler (j_decompress_ptr cinfo)
|
print_text_marker (j_decompress_ptr cinfo)
|
||||||
{
|
{
|
||||||
boolean traceit = (cinfo->err->trace_level >= 1);
|
boolean traceit = (cinfo->err->trace_level >= 1);
|
||||||
INT32 length;
|
INT32 length;
|
||||||
@@ -377,8 +445,13 @@ COM_handler (j_decompress_ptr cinfo)
|
|||||||
length += jpeg_getc(cinfo);
|
length += jpeg_getc(cinfo);
|
||||||
length -= 2; /* discount the length word itself */
|
length -= 2; /* discount the length word itself */
|
||||||
|
|
||||||
if (traceit)
|
if (traceit) {
|
||||||
|
if (cinfo->unread_marker == JPEG_COM)
|
||||||
fprintf(stderr, "Comment, length %ld:\n", (long) length);
|
fprintf(stderr, "Comment, length %ld:\n", (long) length);
|
||||||
|
else /* assume it is an APPn otherwise */
|
||||||
|
fprintf(stderr, "APP%d, length %ld:\n",
|
||||||
|
cinfo->unread_marker - JPEG_APP0, (long) length);
|
||||||
|
}
|
||||||
|
|
||||||
while (--length >= 0) {
|
while (--length >= 0) {
|
||||||
ch = jpeg_getc(cinfo);
|
ch = jpeg_getc(cinfo);
|
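Note on print_text_marker above: a JPEG marker segment carries a 16-bit big-endian length that counts its own two bytes, hence the "length -= 2" before the payload loop. The handler then labels the segment either as a comment or as APPn, with n computed as the marker code minus JPEG_APP0. A self-contained sketch of both steps follows. The helper names are hypothetical, and the two DEMO_ constants repeat the values of JPEG_APP0 and JPEG_COM from jpeglib.h:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_JPEG_APP0 0xE0   /* same value as JPEG_APP0 in jpeglib.h */
#define DEMO_JPEG_COM  0xFE   /* same value as JPEG_COM in jpeglib.h */

/* Payload length: the big-endian length word minus its own two bytes. */
static long marker_payload_length(unsigned char hi, unsigned char lo)
{
    long length = ((long) hi << 8) + lo;

    return length - 2;
}

/* Hypothetical helper: builds the trace header printed for a marker. */
static void describe_marker(char *out, size_t outsize, int marker, long length)
{
    assert(out != NULL && outsize > 0);
    if (marker == DEMO_JPEG_COM)
        snprintf(out, outsize, "Comment, length %ld:", length);
    else                                  /* assume it is an APPn otherwise */
        snprintf(out, outsize, "APP%d, length %ld:",
                 marker - DEMO_JPEG_APP0, length);
}
```

The APP12 registration added in main() corresponds to marker code 0xEC here, since 0xEC - 0xE0 = 12.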
||||||
@@ -445,8 +518,15 @@ main (int argc, char **argv)
|
|||||||
jerr.addon_message_table = cdjpeg_message_table;
|
jerr.addon_message_table = cdjpeg_message_table;
|
||||||
jerr.first_addon_message = JMSG_FIRSTADDONCODE;
|
jerr.first_addon_message = JMSG_FIRSTADDONCODE;
|
||||||
jerr.last_addon_message = JMSG_LASTADDONCODE;
|
jerr.last_addon_message = JMSG_LASTADDONCODE;
|
||||||
/* Insert custom COM marker processor. */
|
|
||||||
jpeg_set_marker_processor(&cinfo, JPEG_COM, COM_handler);
|
/* Insert custom marker processor for COM and APP12.
|
||||||
|
* APP12 is used by some digital camera makers for textual info,
|
||||||
|
* so we provide the ability to display it as text.
|
||||||
|
* If you like, additional APPn marker types can be selected for display,
|
||||||
|
* but don't try to override APP0 or APP14 this way (see libjpeg.doc).
|
||||||
|
*/
|
||||||
|
jpeg_set_marker_processor(&cinfo, JPEG_COM, print_text_marker);
|
||||||
|
jpeg_set_marker_processor(&cinfo, JPEG_APP0+12, print_text_marker);
|
||||||
|
|
||||||
/* Now safe to enable signal catcher. */
|
/* Now safe to enable signal catcher. */
|
||||||
#ifdef NEED_SIGNAL_CATCHER
|
#ifdef NEED_SIGNAL_CATCHER
|
||||||
|
|||||||
21
filelist.doc
@@ -1,6 +1,6 @@
|
|||||||
IJG JPEG LIBRARY: FILE LIST
|
IJG JPEG LIBRARY: FILE LIST
|
||||||
|
|
||||||
Copyright (C) 1994-1996, Thomas G. Lane.
|
Copyright (C) 1994-1998, Thomas G. Lane.
|
||||||
This file is part of the Independent JPEG Group's software.
|
This file is part of the Independent JPEG Group's software.
|
||||||
For conditions of distribution and use, see the accompanying README file.
|
For conditions of distribution and use, see the accompanying README file.
|
||||||
|
|
||||||
@@ -113,8 +113,8 @@ module:
|
|||||||
jmemnobs.c "No backing store": assumes adequate virtual memory exists.
|
jmemnobs.c "No backing store": assumes adequate virtual memory exists.
|
||||||
jmemansi.c Makes temporary files with ANSI-standard routine tmpfile().
|
jmemansi.c Makes temporary files with ANSI-standard routine tmpfile().
|
||||||
jmemname.c Makes temporary files with program-generated file names.
|
jmemname.c Makes temporary files with program-generated file names.
|
||||||
jmemdos.c Custom implementation for MS-DOS: knows about extended and
|
jmemdos.c Custom implementation for MS-DOS (16-bit environment only):
|
||||||
expanded memory as well as temporary files.
|
can use extended and expanded memory as well as temp files.
|
||||||
jmemmac.c Custom implementation for Apple Macintosh.
|
jmemmac.c Custom implementation for Apple Macintosh.
|
||||||
|
|
||||||
Exactly one of the system-dependent modules should be configured into an
|
Exactly one of the system-dependent modules should be configured into an
|
||||||
@@ -134,8 +134,9 @@ CJPEG/DJPEG/JPEGTRAN
|
|||||||
|
|
||||||
Include files:
|
Include files:
|
||||||
|
|
||||||
cdjpeg.h Declarations shared by cjpeg/djpeg modules.
|
cdjpeg.h Declarations shared by cjpeg/djpeg/jpegtran modules.
|
||||||
cderror.h Additional error and trace message codes for cjpeg/djpeg.
|
cderror.h Additional error and trace message codes for cjpeg et al.
|
||||||
|
transupp.h Declarations for jpegtran support routines in transupp.c.
|
||||||
|
|
||||||
C source code files:
|
C source code files:
|
||||||
|
|
||||||
@@ -146,11 +147,12 @@ cdjpeg.c Utility routines used by all three programs.
|
|||||||
rdcolmap.c Code to read a colormap file for djpeg's "-map" switch.
|
rdcolmap.c Code to read a colormap file for djpeg's "-map" switch.
|
||||||
rdswitch.c Code to process some of cjpeg's more complex switches.
|
rdswitch.c Code to process some of cjpeg's more complex switches.
|
||||||
Also used by jpegtran.
|
Also used by jpegtran.
|
||||||
|
transupp.c Support code for jpegtran: lossless image manipulations.
|
||||||
|
|
||||||
Image file reader modules for cjpeg:
|
Image file reader modules for cjpeg:
|
||||||
|
|
||||||
rdbmp.c BMP file input.
|
rdbmp.c BMP file input.
|
||||||
rdgif.c GIF file input.
|
rdgif.c GIF file input (now just a stub).
|
||||||
rdppm.c PPM/PGM file input.
|
rdppm.c PPM/PGM file input.
|
||||||
rdrle.c Utah RLE file input.
|
rdrle.c Utah RLE file input.
|
||||||
rdtarga.c Targa file input.
|
rdtarga.c Targa file input.
|
||||||
@@ -158,7 +160,7 @@ rdtarga.c Targa file input.
|
|||||||
Image file writer modules for djpeg:
|
Image file writer modules for djpeg:
|
||||||
|
|
||||||
wrbmp.c BMP file output.
|
wrbmp.c BMP file output.
|
||||||
wrgif.c GIF file output.
|
wrgif.c GIF file output (a mere shadow of its former self).
|
||||||
wrppm.c PPM/PGM file output.
|
wrppm.c PPM/PGM file output.
|
||||||
wrrle.c Utah RLE file output.
|
wrrle.c Utah RLE file output.
|
||||||
wrtarga.c Targa file output.
|
wrtarga.c Targa file output.
|
||||||
@@ -190,6 +192,11 @@ example.c Sample code for calling JPEG library.
|
|||||||
Configuration/installation files and programs (see install.doc for more info):
|
Configuration/installation files and programs (see install.doc for more info):
|
||||||
|
|
||||||
configure Unix shell script to perform automatic configuration.
|
configure Unix shell script to perform automatic configuration.
|
||||||
|
ltconfig Support scripts for configure (from GNU libtool).
|
||||||
|
ltmain.sh
|
||||||
|
config.guess
|
||||||
|
config.sub
|
||||||
|
install-sh Install shell script for those Unix systems lacking one.
|
||||||
ckconfig.c Program to generate jconfig.h on non-Unix systems.
|
ckconfig.c Program to generate jconfig.h on non-Unix systems.
|
||||||
jconfig.doc Template for making jconfig.h by hand.
|
jconfig.doc Template for making jconfig.h by hand.
|
||||||
makefile.* Sample makefiles for particular systems.
|
makefile.* Sample makefiles for particular systems.
|
||||||
|
|||||||
323
install-sh
Executable file
@@ -0,0 +1,323 @@
|
|||||||
|
#!/bin/sh
|
||||||
|
# install - install a program, script, or datafile
|
||||||
|
|
||||||
|
scriptversion=2005-05-14.22
|
||||||
|
|
||||||
|
# This originates from X11R5 (mit/util/scripts/install.sh), which was
|
||||||
|
# later released in X11R6 (xc/config/util/install.sh) with the
|
||||||
|
# following copyright and license.
|
||||||
|
#
|
||||||
|
# Copyright (C) 1994 X Consortium
|
||||||
|
#
|
||||||
|
# Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||||
|
# of this software and associated documentation files (the "Software"), to
|
||||||
|
# deal in the Software without restriction, including without limitation the
|
||||||
|
# rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
|
||||||
|
# sell copies of the Software, and to permit persons to whom the Software is
|
||||||
|
# furnished to do so, subject to the following conditions:
|
||||||
|
#
|
||||||
|
# The above copyright notice and this permission notice shall be included in
|
||||||
|
# all copies or substantial portions of the Software.
|
||||||
|
#
|
||||||
|
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||||
|
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||||
|
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||||
|
# X CONSORTIUM BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
|
||||||
|
# AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNEC-
|
||||||
|
# TION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
||||||
|
#
|
||||||
|
# Except as contained in this notice, the name of the X Consortium shall not
|
||||||
|
# be used in advertising or otherwise to promote the sale, use or other deal-
|
||||||
|
# ings in this Software without prior written authorization from the X Consor-
|
||||||
|
# tium.
|
||||||
|
#
|
||||||
|
#
|
||||||
|
# FSF changes to this file are in the public domain.
|
||||||
|
#
|
||||||
|
# Calling this script install-sh is preferred over install.sh, to prevent
|
||||||
|
# `make' implicit rules from creating a file called install from it
|
||||||
|
# when there is no Makefile.
|
||||||
|
#
|
||||||
|
# This script is compatible with the BSD install script, but was written
|
||||||
|
# from scratch. It can only install one file at a time, a restriction
|
||||||
|
# shared with many OS's install programs.
|
||||||
|
|
||||||
|
# set DOITPROG to echo to test this script
|
||||||
|
|
||||||
|
# Don't use :- since 4.3BSD and earlier shells don't like it.
|
||||||
|
doit="${DOITPROG-}"
|
||||||
|
|
||||||
|
# put in absolute paths if you don't have them in your path; or use env. vars.
|
||||||
|
|
||||||
|
mvprog="${MVPROG-mv}"
|
||||||
|
cpprog="${CPPROG-cp}"
|
||||||
|
chmodprog="${CHMODPROG-chmod}"
|
||||||
|
chownprog="${CHOWNPROG-chown}"
|
||||||
|
chgrpprog="${CHGRPPROG-chgrp}"
|
||||||
|
stripprog="${STRIPPROG-strip}"
|
||||||
|
rmprog="${RMPROG-rm}"
|
||||||
|
mkdirprog="${MKDIRPROG-mkdir}"
|
||||||
|
|
||||||
|
chmodcmd="$chmodprog 0755"
|
||||||
|
chowncmd=
|
||||||
|
chgrpcmd=
|
||||||
|
stripcmd=
|
||||||
|
rmcmd="$rmprog -f"
|
||||||
|
mvcmd="$mvprog"
|
||||||
|
src=
|
||||||
|
dst=
|
||||||
|
dir_arg=
|
||||||
|
dstarg=
|
||||||
|
no_target_directory=
|
||||||
|
|
||||||
|
usage="Usage: $0 [OPTION]... [-T] SRCFILE DSTFILE
|
||||||
|
or: $0 [OPTION]... SRCFILES... DIRECTORY
|
||||||
|
or: $0 [OPTION]... -t DIRECTORY SRCFILES...
|
||||||
|
or: $0 [OPTION]... -d DIRECTORIES...
|
||||||
|
|
||||||
|
In the 1st form, copy SRCFILE to DSTFILE.
|
||||||
|
In the 2nd and 3rd, copy all SRCFILES to DIRECTORY.
|
||||||
|
In the 4th, create DIRECTORIES.
|
||||||
|
|
||||||
|
Options:
|
||||||
|
-c (ignored)
|
||||||
|
-d create directories instead of installing files.
|
||||||
|
-g GROUP $chgrpprog installed files to GROUP.
|
||||||
|
-m MODE $chmodprog installed files to MODE.
|
||||||
|
-o USER $chownprog installed files to USER.
|
||||||
|
-s $stripprog installed files.
|
||||||
|
-t DIRECTORY install into DIRECTORY.
|
||||||
|
-T report an error if DSTFILE is a directory.
|
||||||
|
--help display this help and exit.
|
||||||
|
--version display version info and exit.
|
||||||
|
|
||||||
|
Environment variables override the default commands:
|
||||||
|
CHGRPPROG CHMODPROG CHOWNPROG CPPROG MKDIRPROG MVPROG RMPROG STRIPPROG
|
||||||
|
"
|
||||||
|
|
||||||
|
while test -n "$1"; do
|
||||||
|
case $1 in
|
||||||
|
-c) shift
|
||||||
|
continue;;
|
||||||
|
|
||||||
|
-d) dir_arg=true
|
||||||
|
shift
|
||||||
|
continue;;
|
||||||
|
|
||||||
|
-g) chgrpcmd="$chgrpprog $2"
|
||||||
|
shift
|
||||||
|
shift
|
||||||
|
continue;;
|
||||||
|
|
||||||
|
--help) echo "$usage"; exit $?;;
|
||||||
|
|
||||||
|
-m) chmodcmd="$chmodprog $2"
|
||||||
|
shift
|
||||||
|
shift
|
||||||
|
continue;;
|
||||||
|
|
||||||
|
-o) chowncmd="$chownprog $2"
|
||||||
|
shift
|
||||||
|
shift
|
||||||
|
continue;;
|
||||||
|
|
||||||
|
-s) stripcmd=$stripprog
|
||||||
|
shift
|
||||||
|
continue;;
|
||||||
|
|
||||||
|
-t) dstarg=$2
|
||||||
|
shift
|
||||||
|
shift
|
||||||
|
continue;;
|
||||||
|
|
||||||
|
-T) no_target_directory=true
|
||||||
|
shift
|
||||||
|
continue;;
|
||||||
|
|
||||||
|
--version) echo "$0 $scriptversion"; exit $?;;
|
||||||
|
|
||||||
|
*) # When -d is used, all remaining arguments are directories to create.
|
||||||
|
# When -t is used, the destination is already specified.
|
||||||
|
test -n "$dir_arg$dstarg" && break
|
||||||
|
# Otherwise, the last argument is the destination. Remove it from $@.
|
||||||
|
for arg
|
||||||
|
do
|
||||||
|
if test -n "$dstarg"; then
|
||||||
|
# $@ is not empty: it contains at least $arg.
|
||||||
|
set fnord "$@" "$dstarg"
|
||||||
|
shift # fnord
|
||||||
|
fi
|
||||||
|
shift # arg
|
||||||
|
dstarg=$arg
|
||||||
|
done
|
||||||
|
break;;
|
||||||
|
esac
|
||||||
|
done
|
||||||
|
|
||||||
|
if test -z "$1"; then
|
||||||
|
if test -z "$dir_arg"; then
|
||||||
|
echo "$0: no input file specified." >&2
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
# It's OK to call `install-sh -d' without argument.
|
||||||
|
# This can happen when creating conditional directories.
|
||||||
|
exit 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
for src
|
||||||
|
do
|
||||||
|
# Protect names starting with `-'.
|
||||||
|
case $src in
|
||||||
|
-*) src=./$src ;;
|
||||||
|
esac
|
||||||
|
|
||||||
|
if test -n "$dir_arg"; then
|
||||||
|
dst=$src
|
||||||
|
src=
|
||||||
|
|
||||||
|
if test -d "$dst"; then
|
||||||
|
mkdircmd=:
|
||||||
|
chmodcmd=
|
||||||
|
else
|
||||||
|
mkdircmd=$mkdirprog
|
||||||
|
fi
|
||||||
|
else
|
||||||
|
# Waiting for this to be detected by the "$cpprog $src $dsttmp" command
|
||||||
|
# might cause directories to be created, which would be especially bad
|
||||||
|
# if $src (and thus $dsttmp) contains '*'.
|
||||||
|
if test ! -f "$src" && test ! -d "$src"; then
|
||||||
|
echo "$0: $src does not exist." >&2
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
if test -z "$dstarg"; then
|
||||||
|
echo "$0: no destination specified." >&2
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
dst=$dstarg
|
||||||
|
# Protect names starting with `-'.
|
||||||
|
case $dst in
|
||||||
|
-*) dst=./$dst ;;
|
||||||
|
esac
|
||||||
|
|
||||||
|
# If destination is a directory, append the input filename; won't work
|
||||||
|
# if double slashes aren't ignored.
|
||||||
|
if test -d "$dst"; then
|
||||||
|
if test -n "$no_target_directory"; then
|
||||||
|
echo "$0: $dstarg: Is a directory" >&2
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
dst=$dst/`basename "$src"`
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
# This sed command emulates the dirname command.
|
||||||
|
dstdir=`echo "$dst" | sed -e 's,/*$,,;s,[^/]*$,,;s,/*$,,;s,^$,.,'`
|
||||||
|
|
||||||
|
# Make sure that the destination directory exists.
|
||||||
|
|
||||||
|
# Skip lots of stat calls in the usual case.
|
||||||
|
if test ! -d "$dstdir"; then
|
||||||
|
defaultIFS='
|
||||||
|
'
|
||||||
|
IFS="${IFS-$defaultIFS}"
|
||||||
|
|
||||||
|
oIFS=$IFS
|
||||||
|
# Some sh's can't handle IFS=/ for some reason.
|
||||||
|
IFS='%'
|
||||||
|
set x `echo "$dstdir" | sed -e 's@/@%@g' -e 's@^%@/@'`
|
||||||
|
shift
|
||||||
|
IFS=$oIFS
|
||||||
|
|
||||||
|
pathcomp=
|
||||||
|
|
||||||
|
while test $# -ne 0 ; do
|
||||||
|
pathcomp=$pathcomp$1
|
||||||
|
shift
|
||||||
|
if test ! -d "$pathcomp"; then
|
||||||
|
$mkdirprog "$pathcomp"
|
||||||
|
# mkdir can fail with a `File exists' error in case several
|
||||||
|
# install-sh are creating the directory concurrently. This
|
||||||
|
# is OK.
|
||||||
|
test -d "$pathcomp" || exit
|
||||||
|
fi
|
||||||
|
pathcomp=$pathcomp/
|
||||||
|
done
|
||||||
|
fi
|
||||||
|
|
||||||
|
if test -n "$dir_arg"; then
|
||||||
|
$doit $mkdircmd "$dst" \
|
||||||
|
&& { test -z "$chowncmd" || $doit $chowncmd "$dst"; } \
|
||||||
|
&& { test -z "$chgrpcmd" || $doit $chgrpcmd "$dst"; } \
|
||||||
|
&& { test -z "$stripcmd" || $doit $stripcmd "$dst"; } \
|
||||||
|
&& { test -z "$chmodcmd" || $doit $chmodcmd "$dst"; }
|
||||||
|
|
||||||
|
else
|
||||||
|
dstfile=`basename "$dst"`
|
||||||
|
|
||||||
|
# Make a couple of temp file names in the proper directory.
|
||||||
|
dsttmp=$dstdir/_inst.$$_
|
||||||
|
rmtmp=$dstdir/_rm.$$_
|
||||||
|
|
||||||
|
# Trap to clean up those temp files at exit.
|
||||||
|
trap 'ret=$?; rm -f "$dsttmp" "$rmtmp" && exit $ret' 0
|
||||||
|
trap '(exit $?); exit' 1 2 13 15
|
||||||
|
|
||||||
|
# Copy the file name to the temp name.
|
||||||
|
$doit $cpprog "$src" "$dsttmp" &&
|
||||||
|
|
||||||
|
# and set any options; do chmod last to preserve setuid bits.
|
||||||
|
#
|
||||||
|
# If any of these fail, we abort the whole thing. If we want to
|
||||||
|
# ignore errors from any of these, just make sure not to ignore
|
||||||
|
# errors from the above "$doit $cpprog $src $dsttmp" command.
|
||||||
|
#
|
||||||
|
{ test -z "$chowncmd" || $doit $chowncmd "$dsttmp"; } \
|
||||||
|
&& { test -z "$chgrpcmd" || $doit $chgrpcmd "$dsttmp"; } \
|
||||||
|
&& { test -z "$stripcmd" || $doit $stripcmd "$dsttmp"; } \
|
||||||
|
&& { test -z "$chmodcmd" || $doit $chmodcmd "$dsttmp"; } &&
|
||||||
|
|
||||||
|
# Now rename the file to the real destination.
|
||||||
|
{ $doit $mvcmd -f "$dsttmp" "$dstdir/$dstfile" 2>/dev/null \
|
||||||
|
|| {
|
||||||
|
# The rename failed, perhaps because mv can't rename something else
|
||||||
|
# to itself, or perhaps because mv is so ancient that it does not
|
||||||
|
# support -f.
|
||||||
|
|
||||||
|
# Now remove or move aside any old file at destination location.
|
||||||
|
# We try this two ways since rm can't unlink itself on some
|
||||||
|
# systems and the destination file might be busy for other
|
||||||
|
# reasons. In this case, the final cleanup might fail but the new
|
||||||
|
# file should still install successfully.
|
||||||
|
{
|
||||||
|
if test -f "$dstdir/$dstfile"; then
|
||||||
|
$doit $rmcmd -f "$dstdir/$dstfile" 2>/dev/null \
|
||||||
|
|| $doit $mvcmd -f "$dstdir/$dstfile" "$rmtmp" 2>/dev/null \
|
||||||
|
|| {
|
||||||
|
echo "$0: cannot unlink or rename $dstdir/$dstfile" >&2
|
||||||
|
(exit 1); exit 1
|
||||||
|
}
|
||||||
|
else
|
||||||
|
:
|
||||||
|
fi
|
||||||
|
} &&
|
||||||
|
|
||||||
|
# Now rename the file to the real destination.
|
||||||
|
$doit $mvcmd "$dsttmp" "$dstdir/$dstfile"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
fi || { (exit 1); exit 1; }
|
||||||
|
done
|
||||||
|
|
||||||
|
# The final little trick to "correctly" pass the exit status to the exit trap.
|
||||||
|
{
|
||||||
|
(exit 0); exit 0
|
||||||
|
}
|
||||||
|
|
||||||
|
# Local variables:
|
||||||
|
# eval: (add-hook 'write-file-hooks 'time-stamp)
|
||||||
|
# time-stamp-start: "scriptversion="
|
||||||
|
# time-stamp-format: "%:y-%02m-%02d.%02H"
|
||||||
|
# time-stamp-end: "$"
|
||||||
|
# End:
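For readers who want to see what the dirname emulation in install-sh above produces, the sed expression can be exercised on its own. This is a minimal sketch; the function name emulate_dirname is made up for illustration and is not part of install-sh.

```shell
#!/bin/sh
# Hypothetical wrapper around the sed expression install-sh uses in place of dirname.
emulate_dirname() {
  # 1. drop trailing slashes, 2. drop the last path component,
  # 3. drop the slashes left behind, 4. map an empty result to "."
  echo "$1" | sed -e 's,/*$,,;s,[^/]*$,,;s,/*$,,;s,^$,.,'
}

emulate_dirname /usr/local/bin/cjpeg   # prints /usr/local/bin
emulate_dirname cjpeg                  # prints .
emulate_dirname lib/jpeg/              # prints lib
```

Note that a file directly under the root directory, such as /cjpeg, comes out as . rather than /, which differs from what a real dirname would report.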
|
||||||
296
install.doc
@@ -1,6 +1,6 @@
|
|||||||
INSTALLATION INSTRUCTIONS for the Independent JPEG Group's JPEG software
|
INSTALLATION INSTRUCTIONS for the Independent JPEG Group's JPEG software
|
||||||
|
|
||||||
Copyright (C) 1991-1996, Thomas G. Lane.
|
Copyright (C) 1991-1998, Thomas G. Lane.
|
||||||
This file is part of the Independent JPEG Group's software.
|
This file is part of the Independent JPEG Group's software.
|
||||||
For conditions of distribution and use, see the accompanying README file.
|
For conditions of distribution and use, see the accompanying README file.
|
||||||
|
|
||||||
@@ -94,6 +94,19 @@ Configure was created with GNU Autoconf and it follows the usual conventions
|
|||||||
for GNU configure scripts. It makes a few assumptions that you may want to
|
for GNU configure scripts. It makes a few assumptions that you may want to
|
||||||
override. You can do this by providing optional switches to configure:
|
override. You can do this by providing optional switches to configure:
|
||||||
|
|
||||||
|
* If you want to build libjpeg as a shared library, say
|
||||||
|
./configure --enable-shared
|
||||||
|
To get both shared and static libraries, say
|
||||||
|
./configure --enable-shared --enable-static
|
||||||
|
Note that these switches invoke GNU libtool to take care of system-dependent
|
||||||
|
shared library building methods. If things don't work this way, please try
|
||||||
|
running configure without either switch; that should build a static library
|
||||||
|
without using libtool. If that works, your problem is probably with libtool,
|
||||||
|
not with the IJG code. libtool is fairly new and doesn't support all flavors
|
||||||
|
of Unix yet. (You might be able to find a newer version of libtool than the
|
||||||
|
one included with libjpeg; see ftp.gnu.org. Report libtool problems to
|
||||||
|
bug-libtool@gnu.org.)
|
||||||
|
|
||||||
* Configure will use gcc (GNU C compiler) if it's available, otherwise cc.
|
* Configure will use gcc (GNU C compiler) if it's available, otherwise cc.
|
||||||
To force a particular compiler to be selected, use the CC option, for example
|
To force a particular compiler to be selected, use the CC option, for example
|
||||||
./configure CC='cc'
|
./configure CC='cc'
|
||||||
@@ -102,8 +115,10 @@ For example, on HP-UX you probably want to say
|
|||||||
./configure CC='cc -Aa'
|
./configure CC='cc -Aa'
|
||||||
to get HP's compiler to run in ANSI mode.
|
to get HP's compiler to run in ANSI mode.
|
||||||
|
|
||||||
* The default CFLAGS setting is "-O". You can override this by saying,
|
* The default CFLAGS setting is "-O" for non-gcc compilers, "-O2" for gcc.
|
||||||
for example, ./configure CFLAGS='-O2'.
|
You can override this by saying, for example,
|
||||||
|
./configure CFLAGS='-g'
|
||||||
|
if you want to compile with debugging support.
|
||||||
|
|
||||||
* Configure will set up the makefile so that "make install" will install files
|
* Configure will set up the makefile so that "make install" will install files
|
||||||
into /usr/local/bin, /usr/local/man, etc. You can specify an installation
|
into /usr/local/bin, /usr/local/man, etc. You can specify an installation
|
||||||
@@ -131,17 +146,20 @@ Makefile jconfig file System and/or compiler
|
|||||||
|
|
||||||
makefile.manx jconfig.manx Amiga, Manx Aztec C
|
makefile.manx jconfig.manx Amiga, Manx Aztec C
|
||||||
makefile.sas jconfig.sas Amiga, SAS C
|
makefile.sas jconfig.sas Amiga, SAS C
|
||||||
|
makeproj.mac jconfig.mac Apple Macintosh, Metrowerks CodeWarrior
|
||||||
mak*jpeg.st jconfig.st Atari ST/STE/TT, Pure C or Turbo C
|
mak*jpeg.st jconfig.st Atari ST/STE/TT, Pure C or Turbo C
|
||||||
makefile.bcc jconfig.bcc MS-DOS or OS/2, Borland C
|
makefile.bcc jconfig.bcc MS-DOS or OS/2, Borland C
|
||||||
makefile.dj jconfig.dj MS-DOS, DJGPP (Delorie's port of GNU C)
|
makefile.dj jconfig.dj MS-DOS, DJGPP (Delorie's port of GNU C)
|
||||||
makefile.mc6 jconfig.mc6 MS-DOS, Microsoft C version 6.x and up
|
makefile.mc6 jconfig.mc6 MS-DOS, Microsoft C (16-bit only)
|
||||||
makefile.wat jconfig.wat MS-DOS, OS/2, or Windows NT, Watcom C
|
makefile.wat jconfig.wat MS-DOS, OS/2, or Windows NT, Watcom C
|
||||||
|
makefile.vc jconfig.vc Windows NT/95, MS Visual C++
|
||||||
|
make*.ds jconfig.vc Windows NT/95, MS Developer Studio
|
||||||
makefile.mms jconfig.vms Digital VMS, with MMS software
|
makefile.mms jconfig.vms Digital VMS, with MMS software
|
||||||
makefile.vms jconfig.vms Digital VMS, without MMS software
|
makefile.vms jconfig.vms Digital VMS, without MMS software
|
||||||
|
|
||||||
Copy the proper jconfig file to jconfig.h and the makefile to Makefile
|
Copy the proper jconfig file to jconfig.h and the makefile to Makefile (or
|
||||||
(or whatever your system uses as the standard makefile name). For the
|
whatever your system uses as the standard makefile name). For more info see
|
||||||
Atari, we provide four project files; see the Atari hints below.
|
the appropriate system-specific hints section near the end of this file.
|
||||||
|
|
||||||
|
|
||||||
Configuring the software by hand
|
Configuring the software by hand
|
||||||
@@ -303,7 +321,7 @@ As a quick test of functionality we've included a small sample image in
|
|||||||
several forms:
|
several forms:
|
||||||
testorig.jpg Starting point for the djpeg tests.
|
testorig.jpg Starting point for the djpeg tests.
|
||||||
testimg.ppm The output of djpeg testorig.jpg
|
testimg.ppm The output of djpeg testorig.jpg
|
||||||
testimg.gif The output of djpeg -gif testorig.jpg
|
testimg.bmp The output of djpeg -bmp -colors 256 testorig.jpg
|
||||||
testimg.jpg The output of cjpeg testimg.ppm
|
testimg.jpg The output of cjpeg testimg.ppm
|
||||||
testprog.jpg Progressive-mode equivalent of testorig.jpg.
|
testprog.jpg Progressive-mode equivalent of testorig.jpg.
|
||||||
testimgp.jpg The output of cjpeg -progressive -optimize testimg.ppm
|
testimgp.jpg The output of cjpeg -progressive -optimize testimg.ppm
|
||||||
@@ -339,10 +357,10 @@ check fails, try recompiling with USE_SETMODE or USE_FDOPEN defined.
|
|||||||
If it still doesn't work, better use two-file style.
|
If it still doesn't work, better use two-file style.
|
||||||
|
|
||||||
If you chose a memory manager other than jmemnobs.c, you should test that
|
If you chose a memory manager other than jmemnobs.c, you should test that
|
||||||
temporary-file usage works. Try "djpeg -gif -max 0 testorig.jpg" and make
|
temporary-file usage works. Try "djpeg -bmp -colors 256 -max 0 testorig.jpg"
|
||||||
sure its output matches testimg.gif. If you have any really large images
|
and make sure its output matches testimg.bmp. If you have any really large
|
||||||
handy, try compressing them with -optimize and/or decompressing with -gif to
|
images handy, try compressing them with -optimize and/or decompressing with
|
||||||
make sure your DEFAULT_MAX_MEM setting is not too large.
|
-colors 256 to make sure your DEFAULT_MAX_MEM setting is not too large.
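On a Unix-like system the temporary-file check described above might be scripted roughly as follows. This is only a sketch: the function name check_tempfile_usage and the output file name testout.bmp are assumptions, and it expects to be run in the build directory where djpeg and the test images live.

```shell
#!/bin/sh
# Hypothetical helper: run djpeg with -max 0, as the text suggests, to
# exercise temporary-file usage, then compare against the reference image.
check_tempfile_usage() {
  if test ! -x ./djpeg || test ! -f testorig.jpg || test ! -f testimg.bmp; then
    echo "skipped: ./djpeg, testorig.jpg or testimg.bmp not found" >&2
    return 2
  fi
  ./djpeg -bmp -colors 256 -max 0 testorig.jpg > testout.bmp || return 1
  # cmp exits nonzero if the two files differ in any byte.
  cmp testimg.bmp testout.bmp
}
```

Calling check_tempfile_usage prints nothing and returns 0 when the output matches testimg.bmp.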
|
||||||
|
|
||||||
NOTE: this is far from an exhaustive test of the JPEG software; some modules,
|
NOTE: this is far from an exhaustive test of the JPEG software; some modules,
|
||||||
such as 1-pass color quantization, are not exercised at all. It's just a
|
such as 1-pass color quantization, are not exercised at all. It's just a
|
||||||
@@ -357,7 +375,7 @@ Once you're done with the above steps, you can install the software by
|
|||||||
copying the executable files (cjpeg, djpeg, jpegtran, rdjpgcom, and wrjpgcom)
|
copying the executable files (cjpeg, djpeg, jpegtran, rdjpgcom, and wrjpgcom)
|
||||||
to wherever you normally install programs. On Unix systems, you'll also want
|
to wherever you normally install programs. On Unix systems, you'll also want
|
||||||
to put the man pages (cjpeg.1, djpeg.1, jpegtran.1, rdjpgcom.1, wrjpgcom.1)
|
to put the man pages (cjpeg.1, djpeg.1, jpegtran.1, rdjpgcom.1, wrjpgcom.1)
|
||||||
in the man-page directory. The canned makefiles don't support this step
|
in the man-page directory. The pre-fab makefiles don't support this step
|
||||||
since there's such a wide variety of installation procedures on different
|
since there's such a wide variety of installation procedures on different
|
||||||
systems.
|
systems.
|
||||||
|
|
||||||
@@ -370,8 +388,13 @@ to see where configure thought the files should go. You may need to edit
|
|||||||
the Makefile, particularly if your system's conventions for man page
|
the Makefile, particularly if your system's conventions for man page
|
||||||
filenames don't match what configure expects.
|
filenames don't match what configure expects.
|
||||||
|
|
||||||
If you want to install the library file libjpeg.a and the include files j*.h
|
If you want to install the IJG library itself, for use in compiling other
|
||||||
(for use in compiling other programs besides the IJG ones), then say
|
programs besides ours, then you need to put the four include files
|
||||||
|
jpeglib.h jerror.h jconfig.h jmorecfg.h
|
||||||
|
into your include-file directory, and put the library file libjpeg.a
|
||||||
|
(extension may vary depending on system) wherever library files go.
|
||||||
|
If you generated a Makefile with "configure", it will do what it thinks
|
||||||
|
is the right thing if you say
|
||||||
make install-lib
|
make install-lib
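If you are not using a configure-generated Makefile, the manual library installation described above could look something like this on a Unix-like system. The function name install_ijg_lib and the choice of include/ and lib/ subdirectories under a prefix are assumptions for illustration; adjust them to your system's conventions.

```shell
#!/bin/sh
# Hypothetical helper: copy the four public headers and the library file
# under the given prefix (for example /usr/local).
# Run it from the build directory.
install_ijg_lib() {
  prefix=$1
  mkdir -p "$prefix/include" "$prefix/lib" || return 1
  cp jpeglib.h jerror.h jconfig.h jmorecfg.h "$prefix/include/" || return 1
  # The library's extension may differ on your system.
  cp libjpeg.a "$prefix/lib/"
}
```

For example, "install_ijg_lib /usr/local" would place the headers in /usr/local/include and the library in /usr/local/lib.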
|
||||||
|
|
||||||
|
|
||||||
@@ -426,8 +449,8 @@ The PPM reader (rdppm.c) can read 12-bit data from either text-format or
|
|||||||
binary-format PPM and PGM files. Binary-format PPM/PGM files which have a
|
binary-format PPM and PGM files. Binary-format PPM/PGM files which have a
|
||||||
maxval greater than 255 are assumed to use 2 bytes per sample, LSB first
|
maxval greater than 255 are assumed to use 2 bytes per sample, LSB first
|
||||||
(little-endian order). As of early 1995, 2-byte binary format is not
|
(little-endian order). As of early 1995, 2-byte binary format is not
|
||||||
officially supported by the PBMPLUS library, but it is expected that the
|
officially supported by the PBMPLUS library, but it is expected that a
|
||||||
next release of PBMPLUS will support it. Note that the PPM reader will
|
future release of PBMPLUS will support it. Note that the PPM reader will
|
||||||
read files of any maxval regardless of the BITS_IN_JSAMPLE setting; incoming
|
read files of any maxval regardless of the BITS_IN_JSAMPLE setting; incoming
|
||||||
data is automatically rescaled to either maxval=255 or maxval=4095 as
|
data is automatically rescaled to either maxval=255 or maxval=4095 as
|
||||||
appropriate for the cjpeg bit depth.
|
appropriate for the cjpeg bit depth.
|
||||||
@@ -568,19 +591,19 @@ Atari ST/STE/TT:
|
|||||||
Copy the project files makcjpeg.st, makdjpeg.st, maktjpeg.st, and makljpeg.st
|
Copy the project files makcjpeg.st, makdjpeg.st, maktjpeg.st, and makljpeg.st
|
||||||
to cjpeg.prj, djpeg.prj, jpegtran.prj, and libjpeg.prj respectively. The
|
to cjpeg.prj, djpeg.prj, jpegtran.prj, and libjpeg.prj respectively. The
|
||||||
project files should work as-is with Pure C. For Turbo C, change library
|
project files should work as-is with Pure C. For Turbo C, change library
|
||||||
filenames "PC..." to "TC..." in each project file. Note that libjpeg.prj
|
filenames "pc..." to "tc..." in each project file. Note that libjpeg.prj
|
||||||
selects jmemansi.c as the recommended memory manager. You'll probably want to
|
selects jmemansi.c as the recommended memory manager. You'll probably want to
|
||||||
adjust the DEFAULT_MAX_MEM setting --- you want it to be a couple hundred K
|
adjust the DEFAULT_MAX_MEM setting --- you want it to be a couple hundred K
|
||||||
less than your normal free memory. Put "#define DEFAULT_MAX_MEM nnnn" into
|
less than your normal free memory. Put "#define DEFAULT_MAX_MEM nnnn" into
|
||||||
jconfig.h to do this.
|
jconfig.h to do this.
|
||||||
|
|
||||||
To use the 68881/68882 coprocessor for the floating point DCT, add the
|
To use the 68881/68882 coprocessor for the floating point DCT, add the
|
||||||
compiler option "-8" to the project files and replace PCFLTLIB.LIB with
|
compiler option "-8" to the project files and replace pcfltlib.lib with
|
||||||
PC881LIB.LIB in cjpeg.prj and djpeg.prj. Or if you don't have a
|
pc881lib.lib in cjpeg.prj and djpeg.prj. Or if you don't have a
|
||||||
coprocessor, you may prefer to remove the float DCT code by undefining
|
coprocessor, you may prefer to remove the float DCT code by undefining
|
||||||
DCT_FLOAT_SUPPORTED in jmorecfg.h (since without a coprocessor, the float
|
DCT_FLOAT_SUPPORTED in jmorecfg.h (since without a coprocessor, the float
|
||||||
code will be too slow to be useful). In that case, you can delete
|
code will be too slow to be useful). In that case, you can delete
|
||||||
PCFLTLIB.LIB from the project files.
|
pcfltlib.lib from the project files.
|
||||||
|
|
||||||
Note that you must make libjpeg.lib before making cjpeg.ttp, djpeg.ttp,
|
Note that you must make libjpeg.lib before making cjpeg.ttp, djpeg.ttp,
|
||||||
or jpegtran.ttp. You'll have to perform the self-test by hand.
|
or jpegtran.ttp. You'll have to perform the self-test by hand.
|
||||||
@@ -637,49 +660,62 @@ provide a Unix-style command line interface. You can use this interface on
|
|||||||
the Mac by means of the ccommand() library routine provided by Metrowerks
|
the Mac by means of the ccommand() library routine provided by Metrowerks
|
||||||
CodeWarrior or Think C. This is only appropriate for testing the library,
|
CodeWarrior or Think C. This is only appropriate for testing the library,
|
||||||
however; to make a user-friendly equivalent of cjpeg/djpeg you'd really want
|
however; to make a user-friendly equivalent of cjpeg/djpeg you'd really want
|
||||||
to develop a Mac-style user interface. Such an interface exists for pre-v5
|
to develop a Mac-style user interface. There isn't a complete example
|
||||||
IJG libraries (see the Think C entry, below) but at this writing it has not
|
available at the moment, but there are some helpful starting points:
|
||||||
been updated to work with the current release.
|
1. Sam Bushell's free "To JPEG" applet provides drag-and-drop conversion to
|
||||||
|
JPEG under System 7 and later. This only illustrates how to use the
|
||||||
|
compression half of the library, but it does a very nice job of that part.
|
||||||
|
The CodeWarrior source code is available from http://www.pobox.com/~jsam.
|
||||||
|
2. Jim Brunner prepared a Mac-style user interface for both compression and
|
||||||
|
decompression. Unfortunately, it hasn't been updated since IJG v4, and
|
||||||
|
the library's API has changed considerably since then. Still it may be of
|
||||||
|
some help, particularly as a guide to compiling the IJG code under Think C.
|
||||||
|
Jim's code is available from the Info-Mac archives, at sumex-aim.stanford.edu
|
||||||
|
or mirrors thereof; see file /info-mac/dev/src/jpeg-convert-c.hqx.
|
||||||
|
|
||||||
We recommend replacing "malloc" and "free" by "NewPtr" and "DisposePtr" in
|
jmemmac.c is the recommended memory manager back end for Macintosh. It uses
|
||||||
whichever memory manager back end you use, because Mac C libraries often
|
NewPtr/DisposePtr instead of malloc/free, and has a Mac-specific
|
||||||
have inferior implementations of malloc/free. jmemmac.c is recommended;
|
implementation of jpeg_mem_available(). It also creates temporary files that
|
||||||
it is a customized version of jmemansi.c with this change and a Mac-specific
|
follow Mac conventions. (That part of the code relies on System-7-or-later OS
|
||||||
implementation of jpeg_mem_available(). You can also use jmemnobs.c if you
|
functions. See the comments in jmemmac.c if you need to run it on System 6.)
|
||||||
don't care about handling images larger than available memory.
|
NOTE that USE_MAC_MEMMGR must be defined in jconfig.h to use jmemmac.c.
|
||||||
|
|
||||||
|
You can also use jmemnobs.c, if you don't care about handling images larger
|
||||||
Macintosh, MPW:
|
than available memory. If you use any memory manager back end other than
|
||||||
|
jmemmac.c, we recommend replacing "malloc" and "free" by "NewPtr" and
|
||||||
We don't directly support MPW in the current release, but Larry Rosenstein
|
"DisposePtr", because Mac C libraries often have peculiar implementations of
|
||||||
ported an earlier version of the IJG code without very much trouble. There's
|
malloc/free. (For instance, free() may not return the freed space to the
|
||||||
useful notes and conversion scripts in his kit for porting PBMPLUS to MPW.
|
Mac Memory Manager. This is undesirable for the IJG code because jmemmgr.c
|
||||||
You can obtain the kit by FTP to ftp.apple.com, files /pub/lsr/pbmplus-port*.
|
already clumps space requests.)
|
||||||
|
|
||||||
|
|
||||||
Macintosh, Metrowerks CodeWarrior:
|
Macintosh, Metrowerks CodeWarrior:
|
||||||
|
|
||||||
Metrowerks release DR2 has problems with the IJG code; don't use it. Release
|
|
||||||
DR3.5 or later should be OK.
|
|
||||||
|
|
||||||
The Unix-command-line-style interface can be used by defining USE_CCOMMAND.
|
The Unix-command-line-style interface can be used by defining USE_CCOMMAND.
|
||||||
You'll also need to define either TWO_FILE_COMMANDLINE (to avoid stdin/stdout)
|
You'll also need to define TWO_FILE_COMMANDLINE to avoid stdin/stdout.
|
||||||
or USE_FDOPEN (to make stdin/stdout work in binary mode). See the Think C
|
This means that when using the cjpeg/djpeg programs, you'll have to type the
|
||||||
entry for more details.
|
input and output file names in the "Arguments" text-edit box, rather than
|
||||||
|
using the file radio buttons. (Perhaps USE_FDOPEN or USE_SETMODE would
|
||||||
|
eliminate the problem, but I haven't heard from anyone who's tried it.)
|
||||||
|
|
||||||
On 680x0 Macs, Metrowerks defines type "double" as a 10-byte IEEE extended
|
On 680x0 Macs, Metrowerks defines type "double" as a 10-byte IEEE extended
|
||||||
float. jmemmgr.c won't like this: it wants sizeof(ALIGN_TYPE) to be a power
|
float. jmemmgr.c won't like this: it wants sizeof(ALIGN_TYPE) to be a power
|
||||||
of 2. Add "#define ALIGN_TYPE long" to jconfig.h to eliminate the complaint.
|
of 2. Add "#define ALIGN_TYPE long" to jconfig.h to eliminate the complaint.
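To see why a 10-byte double trips this check while long does not, the power-of-2 test can be tried in a few lines of shell. This is an illustration only: the function name is_power_of_2 is made up, the bit trick is one common way to write such a test and is not necessarily how jmemmgr.c does it, and the 4-byte size for long is an assumption about the compiler.

```shell
#!/bin/sh
# Hypothetical check: a positive integer n is a power of 2 exactly when
# n and n-1 share no set bits.
is_power_of_2() {
  test "$1" -gt 0 && test $(( $1 & ($1 - 1) )) -eq 0
}

# 10 = size of the extended double described above; 4 = assumed size of long.
for size in 10 4; do
  if is_power_of_2 "$size"; then
    echo "$size bytes: acceptable"
  else
    echo "$size bytes: not a power of 2"
  fi
done
```

The same test rejects a 12-byte type as well, since 12 is not a power of 2 either.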
|
||||||
|
|
||||||
|
The supplied configuration file jconfig.mac can be used for your jconfig.h;
|
||||||
|
it includes all the recommended symbol definitions. If you have AppleScript
|
||||||
|
installed, you can run the supplied script makeproj.mac to create CodeWarrior
|
||||||
|
project files for the library and the testbed applications, then build the
|
||||||
|
library and applications. (Thanks to Dan Sears and Don Agro for this nifty
|
||||||
|
hack, which saves us from trying to maintain CodeWarrior project files as part
|
||||||
|
of the IJG distribution...)
|
||||||
|
|
||||||
|
|
||||||
Macintosh, Think C:
|
Macintosh, Think C:
|
||||||
|
|
||||||
Jim Brunner has prepared a Mac-style user interface for the IJG library.
|
The documentation in Jim Brunner's "JPEG Convert" source code (see above)
|
||||||
Unfortunately, the released version of it only works with pre-v5 libraries;
|
includes detailed build instructions for Think C; it's probably somewhat
|
||||||
still, it may be a useful starting point. You can obtain Jim's additional
|
out of date for the current release, but may be helpful.
|
||||||
source code from the Info-Mac archives, at sumex-aim.stanford.edu or mirrors
|
|
||||||
thereof; see file /info-mac/dev/src/jpeg-convert-c.hqx. Jim's documentation
|
|
||||||
also includes more detailed build instructions for Think C.
|
|
||||||
|
|
||||||
If you want to build the minimal command line version, proceed as follows.
|
If you want to build the minimal command line version, proceed as follows.
|
||||||
You'll have to prepare project files for the programs; we don't include any
|
You'll have to prepare project files for the programs; we don't include any
|
||||||
@@ -695,6 +731,9 @@ On 680x0 Macs, Think C defines type "double" as a 12-byte IEEE extended float.
|
|||||||
jmemmgr.c won't like this: it wants sizeof(ALIGN_TYPE) to be a power of 2.
|
jmemmgr.c won't like this: it wants sizeof(ALIGN_TYPE) to be a power of 2.
|
||||||
Add "#define ALIGN_TYPE long" to jconfig.h to eliminate the complaint.
|
Add "#define ALIGN_TYPE long" to jconfig.h to eliminate the complaint.
|
||||||
|
|
||||||
|
jconfig.mac should work as a jconfig.h configuration file for Think C,
|
||||||
|
but the makeproj.mac AppleScript script is specific to CodeWarrior. Sorry.
|
||||||
|
|
||||||
|
|
||||||
MIPS R3000:
|
MIPS R3000:
|
||||||
|
|
||||||
@@ -705,7 +744,7 @@ Note that the R3000 chip is found in workstations from DEC and others.
|
|||||||
|
|
||||||
MS-DOS, generic comments for 16-bit compilers:
|
MS-DOS, generic comments for 16-bit compilers:
|
||||||
|
|
||||||
The IJG code is designed to be compiled in 80x86 "small" or "medium" memory
|
The IJG code is designed to work well in 80x86 "small" or "medium" memory
|
||||||
models (i.e., data pointers are 16 bits unless explicitly declared "far";
|
models (i.e., data pointers are 16 bits unless explicitly declared "far";
|
||||||
code pointers can be either size). You may be able to use small model to
|
code pointers can be either size). You may be able to use small model to
|
||||||
compile cjpeg or djpeg by itself, but you will probably have to use medium
|
compile cjpeg or djpeg by itself, but you will probably have to use medium
|
||||||
@@ -721,7 +760,7 @@ The DOS-specific memory manager, jmemdos.c, should be used if possible.
|
|||||||
It needs some assembly-code routines which are in jmemdosa.asm; make sure
|
It needs some assembly-code routines which are in jmemdosa.asm; make sure
|
||||||
your makefile assembles that file and includes it in the library. If you
|
your makefile assembles that file and includes it in the library. If you
|
||||||
don't have a suitable assembler, you can get pre-assembled object files for
|
don't have a suitable assembler, you can get pre-assembled object files for
|
||||||
jmemdosa by FTP from ftp.uu.net: graphics/jpeg/jdosaobj.zip. (DOS-oriented
|
jmemdosa by FTP from ftp.uu.net:/graphics/jpeg/jdosaobj.zip. (DOS-oriented
|
||||||
distributions of the IJG source code often include these object files.)
|
distributions of the IJG source code often include these object files.)
|
||||||
|
|
||||||
When using jmemdos.c, jconfig.h must define USE_MSDOS_MEMMGR and must set
|
When using jmemdos.c, jconfig.h must define USE_MSDOS_MEMMGR and must set
|
||||||
@@ -778,31 +817,22 @@ jconfig.bcc already includes #define USE_SETMODE to make this work.
|
|||||||
(fdopen does not work correctly.)
|
(fdopen does not work correctly.)
|
||||||
|
|
||||||
|
|
||||||
MS-DOS, DJGPP:
|
|
||||||
|
|
||||||
Use a recent version of DJGPP (1.11 or better). If you prefer two-file
|
|
||||||
command line style, change the supplied jconfig.dj to define
|
|
||||||
TWO_FILE_COMMANDLINE. makefile.dj is set up to generate only COFF files
|
|
||||||
(cjpeg, djpeg, etc) when you say make. After testing, say "make exe" to
|
|
||||||
make executables with stub.exe, or "make standalone" if you want executables
|
|
||||||
that include go32. You will probably need to tweak the makefile's pointer to
|
|
||||||
go32.exe to do "make standalone".
|
|
||||||
|
|
||||||
|
|
||||||
MS-DOS, Microsoft C:
|
MS-DOS, Microsoft C:
|
||||||
|
|
||||||
makefile.mc6 works with Microsoft C, Visual C++, etc. Note that this
|
makefile.mc6 works with Microsoft C, DOS Visual C++, etc. It should only
|
||||||
makefile assumes that the working copy of itself is called "makefile".
|
be used if you want to build a 16-bit (small or medium memory model) program.
|
||||||
If you want to call it something else, say "makefile.mak", be sure to adjust
|
|
||||||
the dependency line that reads "$(RFILE) : makefile". Otherwise the make
|
|
||||||
will fail because it doesn't know how to create "makefile". Worse, some
|
|
||||||
releases of Microsoft's make utilities give an incorrect error message in
|
|
||||||
this situation.
|
|
||||||
|
|
||||||
If you want one-file command line style, just undefine TWO_FILE_COMMANDLINE.
|
If you want one-file command line style, just undefine TWO_FILE_COMMANDLINE.
|
||||||
jconfig.mc6 already includes #define USE_SETMODE to make this work.
|
jconfig.mc6 already includes #define USE_SETMODE to make this work.
|
||||||
(fdopen does not work correctly.)
|
(fdopen does not work correctly.)
|
||||||
|
|
||||||
|
Note that this makefile assumes that the working copy of itself is called
|
||||||
|
"makefile". If you want to call it something else, say "makefile.mak",
|
||||||
|
be sure to adjust the dependency line that reads "$(RFILE) : makefile".
|
||||||
|
Otherwise the make will fail because it doesn't know how to create "makefile".
|
||||||
|
Worse, some releases of Microsoft's make utilities give an incorrect error
|
||||||
|
message in this situation.
|
||||||
|
|
||||||
Old versions of MS C fail with an "out of macro expansion space" error
|
Old versions of MS C fail with an "out of macro expansion space" error
|
||||||
because they can't cope with the macro TRACEMS8 (defined in jerror.h).
|
because they can't cope with the macro TRACEMS8 (defined in jerror.h).
|
||||||
If this happens to you, the easiest solution is to change TRACEMS8 to
|
If this happens to you, the easiest solution is to change TRACEMS8 to
|
||||||
@@ -813,11 +843,12 @@ Original MS C 6.0 is very buggy; it compiles incorrect code unless you turn
|
|||||||
off optimization entirely (remove -O from CFLAGS). 6.00A is better, but it
|
off optimization entirely (remove -O from CFLAGS). 6.00A is better, but it
|
||||||
still generates bad code if you enable loop optimizations (-Ol or -Ox).
|
still generates bad code if you enable loop optimizations (-Ol or -Ox).
|
||||||
|
|
||||||
MS C 8.0 reportedly fails to compile jquant1.c if optimization is turned off
|
MS C 8.0 crashes when compiling jquant1.c with optimization switch /Oo ...
|
||||||
(yes, off).
|
which is on by default. To work around this bug, compile that one file
|
||||||
|
with /Oo-.
|
||||||
|
|
||||||
|
|
||||||
Microsoft Windows (all versions):
|
Microsoft Windows (all versions), generic comments:
|
||||||
|
|
||||||
Some Windows system include files define typedef boolean as "unsigned char".
|
Some Windows system include files define typedef boolean as "unsigned char".
|
||||||
The IJG code also defines typedef boolean, but we make it "int" by default.
|
The IJG code also defines typedef boolean, but we make it "int" by default.
|
||||||
@@ -825,45 +856,86 @@ This doesn't affect the IJG programs because we don't import those Windows
|
|||||||
include files. But if you use the JPEG library in your own program, and some
|
include files. But if you use the JPEG library in your own program, and some
|
||||||
of your program's files import one definition of boolean while some import the
|
of your program's files import one definition of boolean while some import the
|
||||||
other, you can get all sorts of mysterious problems. A good preventive step
|
other, you can get all sorts of mysterious problems. A good preventive step
|
||||||
is to change jmorecfg.h to define boolean as unsigned char. We recommend
|
is to make the IJG library use "unsigned char" for boolean. To do that,
|
||||||
making that part of jmorecfg.h read like this:
|
add something like this to your jconfig.h file:
|
||||||
|
/* Define "boolean" as unsigned char, not int, per Windows custom */
|
||||||
#ifndef __RPCNDR_H__ /* don't conflict if rpcndr.h already read */
|
#ifndef __RPCNDR_H__ /* don't conflict if rpcndr.h already read */
|
||||||
typedef unsigned char boolean;
|
typedef unsigned char boolean;
|
||||||
#endif
|
#endif
|
||||||
In v6a and later, using incompatible definitions of boolean will usually lead
|
#define HAVE_BOOLEAN /* prevent jmorecfg.h from redefining it */
|
||||||
to the failure message "JPEG parameter struct mismatch", rather than the
|
(This is already in jconfig.vc, by the way.)
|
||||||
difficult-to-diagnose bugs it caused with earlier versions.
|
|
||||||
|
windef.h contains the declarations
|
||||||
|
#define far
|
||||||
|
#define FAR far
|
||||||
|
Since jmorecfg.h tries to define FAR as empty, you may get a compiler
|
||||||
|
warning if you include both jpeglib.h and windef.h (which windows.h
|
||||||
|
includes). To suppress the warning, you can put "#ifndef FAR"/"#endif"
|
||||||
|
around the line "#define FAR" in jmorecfg.h.
|
||||||
|
|
||||||
When using the library in a Windows application, you will almost certainly
|
When using the library in a Windows application, you will almost certainly
|
||||||
want to modify or replace the error handler module jerror.c, since our
|
want to modify or replace the error handler module jerror.c, since our
|
||||||
default error handler does a couple of inappropriate things:
|
default error handler does a couple of inappropriate things:
|
||||||
1. it tries to write error and warning messages on stderr;
|
1. it tries to write error and warning messages on stderr;
|
||||||
2. in event of a fatal error, it exits by calling exit().
|
2. in event of a fatal error, it exits by calling exit().
|
||||||
|
|
||||||
A simple stopgap solution for problem 1 is to replace the line
|
A simple stopgap solution for problem 1 is to replace the line
|
||||||
fprintf(stderr, "%s\n", buffer);
|
fprintf(stderr, "%s\n", buffer);
|
||||||
(in output_message in jerror.c) with something like
|
(in output_message in jerror.c) with
|
||||||
MessageBox(GetActiveWindow(),buffer,"JPEG Error",MB_OK);
|
MessageBox(GetActiveWindow(),buffer,"JPEG Error",MB_OK|MB_ICONERROR);
|
||||||
It's highly recommended that you at least do that much, since otherwise
|
It's highly recommended that you at least do that much, since otherwise
|
||||||
error messages will disappear into nowhere.
|
error messages will disappear into nowhere. (Beginning with IJG v6b, this
|
||||||
|
code is already present in jerror.c; just define USE_WINDOWS_MESSAGEBOX in
|
||||||
|
jconfig.h to enable it.)
|
||||||
|
|
||||||
The proper solution for problem 2 is to return control to your calling
|
The proper solution for problem 2 is to return control to your calling
|
||||||
application after a library error. This can be done with the setjmp/longjmp
|
application after a library error. This can be done with the setjmp/longjmp
|
||||||
technique discussed in libjpeg.doc and illustrated in example.c.
|
technique discussed in libjpeg.doc and illustrated in example.c. (NOTE:
|
||||||
|
some older Windows C compilers provide versions of setjmp/longjmp that
|
||||||
|
don't actually work under Windows. You may need to use the Windows system
|
||||||
|
functions Catch and Throw instead.)
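The setjmp/longjmp recovery pattern can be sketched without the library at all. Everything below is a simplified assumption for illustration: my_error_mgr, my_error_exit, do_library_work and run_with_recovery are hypothetical names, and a real application would instead extend struct jpeg_error_mgr in the way example.c demonstrates.

```c
#include <setjmp.h>

/* Hypothetical application error manager. */
struct my_error_mgr {
  jmp_buf setjmp_buffer;   /* where to resume after a fatal error */
  int last_error_code;     /* filled in by the error handler */
};

/* Fatal-error handler: record the error and jump back to the
 * application instead of calling exit(). */
static void my_error_exit(struct my_error_mgr *err, int code)
{
  err->last_error_code = code;
  longjmp(err->setjmp_buffer, 1);
}

/* Hypothetical library routine that hits a fatal error on request. */
static void do_library_work(struct my_error_mgr *err, int should_fail)
{
  if (should_fail)
    my_error_exit(err, 42);
}

/* Returns 0 on success, or the recorded error code after a failure.
 * The error manager is supplied by the caller, so nothing local to
 * this function is modified between setjmp and longjmp. */
static int run_with_recovery(struct my_error_mgr *err, int should_fail)
{
  err->last_error_code = 0;
  if (setjmp(err->setjmp_buffer)) {
    /* Control arrives here after longjmp: clean up and report. */
    return err->last_error_code;
  }
  do_library_work(err, should_fail);
  return 0;
}
```

The point of the design is that the handler never returns into the failing routine; the application regains control at the setjmp site and can release its resources there.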
|
||||||
|
|
||||||
|
The recommended memory manager under Windows is jmemnobs.c; in other words,
|
||||||
|
let Windows do any virtual memory management needed. You should NOT use
|
||||||
|
jmemdos.c nor jmemdosa.asm under Windows.
|
||||||
|
|
||||||
|
For Windows 3.1, we recommend compiling in medium or large memory model;
|
||||||
|
for newer Windows versions, use a 32-bit flat memory model. (See the MS-DOS
|
||||||
|
sections above for more info about memory models.) In the 16-bit memory
|
||||||
|
models only, you'll need to put
|
||||||
|
#define MAX_ALLOC_CHUNK 65520L /* Maximum request to malloc() */
|
||||||
|
into jconfig.h to limit allocation chunks to 64Kb. (Without that, you'd
|
||||||
|
have to use huge memory model, which slows things down unnecessarily.)
|
||||||
|
jmemnobs.c works without modification in large or flat memory models, but to
|
||||||
|
use medium model, you need to modify its jpeg_get_large and jpeg_free_large
|
||||||
|
routines to allocate far memory. In any case, you might like to replace
|
||||||
|
its calls to malloc and free with direct calls on Windows memory allocation
|
||||||
|
functions.
|
||||||
|
|
||||||
You may also want to modify jdatasrc.c and jdatadst.c to use Windows file
|
You may also want to modify jdatasrc.c and jdatadst.c to use Windows file
|
||||||
operations rather than fread/fwrite. This is only necessary if your C
|
operations rather than fread/fwrite. This is only necessary if your C
|
||||||
compiler doesn't provide a competent implementation of C stdio functions.
|
compiler doesn't provide a competent implementation of C stdio functions.
|
||||||
|
|
||||||
|
You might want to tweak the RGB_xxx macros in jmorecfg.h so that the library
|
||||||
|
will accept or deliver color pixels in BGR sample order, not RGB; BGR order
|
||||||
|
is usually more convenient under Windows. Note that this change will break
|
||||||
|
the sample applications cjpeg/djpeg, but the library itself works fine.
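As a rough sketch of the BGR tweak, the sample-order macros could be set as shown here (the stock jmorecfg.h values are RGB_RED 0, RGB_GREEN 1, RGB_BLUE 2). The helper put_pixel is hypothetical and only demonstrates how code that indexes through the macros picks up the new order.

```c
/* Sample order edited for BGR; only RGB_RED and RGB_BLUE swap. */
#define RGB_RED        2   /* offset of red within a pixel */
#define RGB_GREEN      1   /* offset of green */
#define RGB_BLUE       0   /* offset of blue */
#define RGB_PIXELSIZE  3   /* samples per pixel */

/* Hypothetical helper: store one pixel through the macros, so the
 * byte layout follows whatever order the macros select. */
static void put_pixel(unsigned char *px,
                      unsigned char r, unsigned char g, unsigned char b)
{
  px[RGB_RED] = r;
  px[RGB_GREEN] = g;
  px[RGB_BLUE] = b;
}
```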
|
||||||
|
|
||||||
|
|
||||||
Many people want to convert the IJG library into a DLL. This is reasonably
|
Many people want to convert the IJG library into a DLL. This is reasonably
|
||||||
straightforward, but watch out for the following:
|
straightforward, but watch out for the following:
|
||||||
|
|
||||||
1. Don't try to compile as a DLL in small or medium memory model; use
|
1. Don't try to compile as a DLL in small or medium memory model; use
|
||||||
large model, or even better, 32-bit flat model. Many places in the IJG code
|
large model, or even better, 32-bit flat model. Many places in the IJG code
|
||||||
assume the address of a local variable is an ordinary (not FAR) pointer;
|
assume the address of a local variable is an ordinary (not FAR) pointer;
|
||||||
that isn't true in a medium-model DLL.
|
that isn't true in a medium-model DLL.
|
||||||
|
|
||||||
2. Microsoft C cannot pass file pointers between applications and DLLs.
|
2. Microsoft C cannot pass file pointers between applications and DLLs.
|
||||||
(See Microsoft Knowledge Base, PSS ID Number Q50336.) So jdatasrc.c and
|
(See Microsoft Knowledge Base, PSS ID Number Q50336.) So jdatasrc.c and
|
||||||
jdatadst.c don't work if you open a file in your application and then pass
|
jdatadst.c don't work if you open a file in your application and then pass
|
||||||
the pointer to the DLL. One workaround is to make jdatasrc.c/jdatadst.c
|
the pointer to the DLL. One workaround is to make jdatasrc.c/jdatadst.c
|
||||||
part of your main application rather than part of the DLL.
|
part of your main application rather than part of the DLL.
|
||||||
|
|
||||||
3. You'll probably need to modify the macros GLOBAL() and EXTERN() to
|
3. You'll probably need to modify the macros GLOBAL() and EXTERN() to
|
||||||
attach suitable linkage keywords to the exported routine names. Similarly,
|
attach suitable linkage keywords to the exported routine names. Similarly,
|
||||||
you'll want to modify METHODDEF() and JMETHOD() to ensure function pointers
|
you'll want to modify METHODDEF() and JMETHOD() to ensure function pointers
|
||||||
@@ -871,10 +943,13 @@ are declared in a way that lets application routines be called back through
|
|||||||
the function pointers. These macros are in jmorecfg.h. Typical definitions
|
the function pointers. These macros are in jmorecfg.h. Typical definitions
|
||||||
for a 16-bit DLL are:
|
for a 16-bit DLL are:
|
||||||
#define GLOBAL(type) type _far _pascal _loadds _export
|
#define GLOBAL(type) type _far _pascal _loadds _export
|
||||||
#define EXTERN(type) extern type _far _pascal
|
#define EXTERN(type) extern type _far _pascal _loadds
|
||||||
#define METHODDEF(type) static type _far _pascal
|
#define METHODDEF(type) static type _far _pascal
|
||||||
#define JMETHOD(type,methodname,arglist) \
|
#define JMETHOD(type,methodname,arglist) \
|
||||||
type (_far _pascal *methodname) arglist
|
type (_far _pascal *methodname) arglist
|
||||||
|
For a 32-bit DLL you may want something like
|
||||||
|
#define GLOBAL(type) __declspec(dllexport) type
|
||||||
|
#define EXTERN(type) extern __declspec(dllexport) type
|
||||||
Although not all the GLOBAL routines are actually intended to be called by
|
Although not all the GLOBAL routines are actually intended to be called by
|
||||||
the application, the performance cost of making them all DLL entry points is
|
the application, the performance cost of making them all DLL entry points is
|
||||||
negligible.
|
negligible.
|
||||||
@@ -888,6 +963,12 @@ but hasn't been very high priority --- any volunteers out there?
|
|||||||
|
|
||||||
Microsoft Windows, Borland C:
|
Microsoft Windows, Borland C:
|
||||||
|
|
||||||
|
The provided jconfig.bcc should work OK in a 32-bit Windows environment,
|
||||||
|
but you'll need to tweak it in a 16-bit environment (you'd need to define
|
||||||
|
NEED_FAR_POINTERS and MAX_ALLOC_CHUNK). Beware that makefile.bcc will need
|
||||||
|
alteration if you want to use it for Windows --- in particular, you should
|
||||||
|
use jmemnobs.c not jmemdos.c under Windows.
|
||||||
|
|
||||||
Borland C++ 4.5 fails with an internal compiler error when trying to compile
|
Borland C++ 4.5 fails with an internal compiler error when trying to compile
|
||||||
jdmerge.c in 32-bit mode. If enough people complain, perhaps Borland will fix
|
jdmerge.c in 32-bit mode. If enough people complain, perhaps Borland will fix
|
||||||
it. In the meantime, the simplest known workaround is to add a redundant
|
it. In the meantime, the simplest known workaround is to add a redundant
|
||||||
@@ -902,6 +983,57 @@ doesn't trigger the bug.
|
|||||||
Recent reports suggest that this bug does not occur with "bcc32a" (the
|
Recent reports suggest that this bug does not occur with "bcc32a" (the
|
||||||
Pentium-optimized version of the compiler).
|
Pentium-optimized version of the compiler).
|
||||||
|
|
||||||
|
Another report from a user of Borland C 4.5 was that incorrect code (leading
|
||||||
|
to a color shift in processed images) was produced if any of the following
|
||||||
|
optimization switch combinations were used:
|
||||||
|
-Ot -Og
|
||||||
|
-Ot -Op
|
||||||
|
-Ot -Om
|
||||||
|
So try backing off on optimization if you see such a problem. (Are there
|
||||||
|
several different releases all numbered "4.5"??)
|
||||||
|
|
||||||
|
|
||||||
|
Microsoft Windows, Microsoft Visual C++:
|
||||||
|
|
||||||
|
jconfig.vc should work OK with any Microsoft compiler for a 32-bit memory
|
||||||
|
model. makefile.vc is intended for command-line use. (If you are using
|
||||||
|
the Developer Studio environment, you may prefer the DevStudio project
|
||||||
|
files; see below.)
|
||||||
|
|
||||||
|
Some users feel that it's easier to call the library from C++ code if you
|
||||||
|
force VC++ to treat the library as C++ code, which you can do by renaming
|
||||||
|
all the *.c files to *.cpp (and adjusting the makefile to match). This
|
||||||
|
avoids the need to put extern "C" { ... } around #include "jpeglib.h" in
|
||||||
|
your C++ application.
|
||||||
|
|
||||||
|
|
||||||
|
Microsoft Windows, Microsoft Developer Studio:
|
||||||
|
|
||||||
|
We include makefiles that should work as project files in DevStudio 4.2 or
|
||||||
|
later. There is a library makefile that builds the IJG library as a static
|
||||||
|
Win32 library, and an application makefile that builds the sample applications
|
||||||
|
as Win32 console applications. (Even if you only want the library, we
|
||||||
|
recommend building the applications so that you can run the self-test.)
|
||||||
|
|
||||||
|
To use:
|
||||||
|
1. Copy jconfig.vc to jconfig.h, makelib.ds to jpeg.mak, and
|
||||||
|
makeapps.ds to apps.mak. (Note that the renaming is critical!)
|
||||||
|
2. Click on the .mak files to construct project workspaces.
|
||||||
|
(If you are using DevStudio more recent than 4.2, you'll probably
|
||||||
|
get a message saying that the makefiles are being updated.)
|
||||||
|
3. Build the library project, then the applications project.
|
||||||
|
4. Move the application .exe files from `app`\Release to an
|
||||||
|
appropriate location on your path.
|
||||||
|
5. To perform the self-test, execute the command line
|
||||||
|
NMAKE /f makefile.vc test
|
||||||
|
|
||||||
|
|
||||||
|
OS/2, Borland C++:
|
||||||
|
|
||||||
|
Watch out for optimization bugs in older Borland compilers; you may need
|
||||||
|
to back off the optimization switch settings. See the comments in
|
||||||
|
makefile.bcc.
|
||||||
|
|
||||||
|
|
||||||
SGI:
|
SGI:
|
||||||
|
|
||||||
|
|||||||
56
jcapimin.c
@@ -1,7 +1,7 @@
|
|||||||
/*
|
/*
|
||||||
* jcapimin.c
|
* jcapimin.c
|
||||||
*
|
*
|
||||||
* Copyright (C) 1994-1996, Thomas G. Lane.
|
* Copyright (C) 1994-1998, Thomas G. Lane.
|
||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
@@ -39,13 +39,18 @@ jpeg_CreateCompress (j_compress_ptr cinfo, int version, size_t structsize)
|
|||||||
ERREXIT2(cinfo, JERR_BAD_STRUCT_SIZE,
|
ERREXIT2(cinfo, JERR_BAD_STRUCT_SIZE,
|
||||||
(int) SIZEOF(struct jpeg_compress_struct), (int) structsize);
|
(int) SIZEOF(struct jpeg_compress_struct), (int) structsize);
|
||||||
|
|
||||||
/* For debugging purposes, zero the whole master structure.
|
/* For debugging purposes, we zero the whole master structure.
|
||||||
* But error manager pointer is already there, so save and restore it.
|
* But the application has already set the err pointer, and may have set
|
||||||
|
* client_data, so we have to save and restore those fields.
|
||||||
|
* Note: if application hasn't set client_data, tools like Purify may
|
||||||
|
* complain here.
|
||||||
*/
|
*/
|
||||||
{
|
{
|
||||||
struct jpeg_error_mgr * err = cinfo->err;
|
struct jpeg_error_mgr * err = cinfo->err;
|
||||||
|
void * client_data = cinfo->client_data; /* ignore Purify complaint here */
|
||||||
MEMZERO(cinfo, SIZEOF(struct jpeg_compress_struct));
|
MEMZERO(cinfo, SIZEOF(struct jpeg_compress_struct));
|
||||||
cinfo->err = err;
|
cinfo->err = err;
|
||||||
|
cinfo->client_data = client_data;
|
||||||
}
|
}
|
||||||
cinfo->is_decompressor = FALSE;
|
cinfo->is_decompressor = FALSE;
|
||||||
|
|
||||||
@@ -66,6 +71,8 @@ jpeg_CreateCompress (j_compress_ptr cinfo, int version, size_t structsize)
|
|||||||
cinfo->ac_huff_tbl_ptrs[i] = NULL;
|
cinfo->ac_huff_tbl_ptrs[i] = NULL;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
cinfo->script_space = NULL;
|
||||||
|
|
||||||
cinfo->input_gamma = 1.0; /* in case application forgets */
|
cinfo->input_gamma = 1.0; /* in case application forgets */
|
||||||
|
|
||||||
/* OK, I'm ready */
|
/* OK, I'm ready */
|
||||||
@@ -185,13 +192,40 @@ GLOBAL(void)
|
|||||||
jpeg_write_marker (j_compress_ptr cinfo, int marker,
|
jpeg_write_marker (j_compress_ptr cinfo, int marker,
|
||||||
const JOCTET *dataptr, unsigned int datalen)
|
const JOCTET *dataptr, unsigned int datalen)
|
||||||
{
|
{
|
||||||
|
JMETHOD(void, write_marker_byte, (j_compress_ptr info, int val));
|
||||||
|
|
||||||
if (cinfo->next_scanline != 0 ||
|
if (cinfo->next_scanline != 0 ||
|
||||||
(cinfo->global_state != CSTATE_SCANNING &&
|
(cinfo->global_state != CSTATE_SCANNING &&
|
||||||
cinfo->global_state != CSTATE_RAW_OK &&
|
cinfo->global_state != CSTATE_RAW_OK &&
|
||||||
cinfo->global_state != CSTATE_WRCOEFS))
|
cinfo->global_state != CSTATE_WRCOEFS))
|
||||||
ERREXIT1(cinfo, JERR_BAD_STATE, cinfo->global_state);
|
ERREXIT1(cinfo, JERR_BAD_STATE, cinfo->global_state);
|
||||||
|
|
||||||
(*cinfo->marker->write_any_marker) (cinfo, marker, dataptr, datalen);
|
(*cinfo->marker->write_marker_header) (cinfo, marker, datalen);
|
||||||
|
write_marker_byte = cinfo->marker->write_marker_byte; /* copy for speed */
|
||||||
|
while (datalen--) {
|
||||||
|
(*write_marker_byte) (cinfo, *dataptr);
|
||||||
|
dataptr++;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Same, but piecemeal. */
|
||||||
|
|
||||||
|
GLOBAL(void)
|
||||||
|
jpeg_write_m_header (j_compress_ptr cinfo, int marker, unsigned int datalen)
|
||||||
|
{
|
||||||
|
if (cinfo->next_scanline != 0 ||
|
||||||
|
(cinfo->global_state != CSTATE_SCANNING &&
|
||||||
|
cinfo->global_state != CSTATE_RAW_OK &&
|
||||||
|
cinfo->global_state != CSTATE_WRCOEFS))
|
||||||
|
ERREXIT1(cinfo, JERR_BAD_STATE, cinfo->global_state);
|
||||||
|
|
||||||
|
(*cinfo->marker->write_marker_header) (cinfo, marker, datalen);
|
||||||
|
}
|
||||||
|
|
||||||
|
GLOBAL(void)
|
||||||
|
jpeg_write_m_byte (j_compress_ptr cinfo, int val)
|
||||||
|
{
|
||||||
|
(*cinfo->marker->write_marker_byte) (cinfo, val);
|
||||||
}
|
}
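The header-then-bytes split shown in this hunk can be mimicked with a small in-memory model. This is a sketch under stated assumptions, not the library's marker writer: byte_sink, sink_put, sketch_write_m_header and sketch_write_m_byte are hypothetical names, and the model only reproduces the standard JPEG marker segment layout (0xFF, marker code, 2-byte big-endian length that includes the length field itself, then the data bytes).

```c
#include <stddef.h>

/* Hypothetical in-memory sink standing in for the JPEG destination. */
struct byte_sink {
  unsigned char data[64];
  size_t len;
};

/* Append one byte, silently dropping it if the sink is full. */
static void sink_put(struct byte_sink *s, int val)
{
  if (s->len < sizeof(s->data))
    s->data[s->len++] = (unsigned char) (val & 0xFF);
}

/* Analogue of the header step: marker prefix, marker code, and a
 * length field that counts its own two bytes plus the data. */
static void sketch_write_m_header(struct byte_sink *s, int marker,
                                  unsigned int datalen)
{
  sink_put(s, 0xFF);
  sink_put(s, marker);
  sink_put(s, (int) ((datalen + 2) >> 8));
  sink_put(s, (int) ((datalen + 2) & 0xFF));
}

/* Analogue of the per-byte step: the caller supplies data one byte
 * at a time, so it never needs to hold the whole payload in memory. */
static void sketch_write_m_byte(struct byte_sink *s, int val)
{
  sink_put(s, val);
}
```

Writing a two-byte comment (marker code 0xFE) this way yields six bytes: FF FE 00 04 followed by the two data bytes.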
|
||||||
|
|
||||||
|
|
||||||
@@ -231,6 +265,16 @@ jpeg_write_tables (j_compress_ptr cinfo)
|
|||||||
(*cinfo->marker->write_tables_only) (cinfo);
|
(*cinfo->marker->write_tables_only) (cinfo);
|
||||||
/* And clean up. */
|
/* And clean up. */
|
||||||
(*cinfo->dest->term_destination) (cinfo);
|
(*cinfo->dest->term_destination) (cinfo);
|
||||||
/* We can use jpeg_abort to release memory. */
|
/*
|
||||||
jpeg_abort((j_common_ptr) cinfo);
|
* In library releases up through v6a, we called jpeg_abort() here to free
|
||||||
|
* any working memory allocated by the destination manager and marker
|
||||||
|
* writer. Some applications had a problem with that: they allocated space
|
||||||
|
* of their own from the library memory manager, and didn't want it to go
|
||||||
|
* away during write_tables. So now we do nothing. This will cause a
|
||||||
|
* memory leak if an app calls write_tables repeatedly without doing a full
|
||||||
|
* compression cycle or otherwise resetting the JPEG object. However, that
|
||||||
|
* seems less bad than unexpectedly freeing memory in the normal case.
|
||||||
|
* An app that prefers the old behavior can call jpeg_abort for itself after
|
||||||
|
* each call to jpeg_write_tables().
|
||||||
|
*/
|
||||||
}
|
}
|
||||||
|
|||||||
jccoefct.c
@@ -1,7 +1,7 @@
|
|||||||
/*
|
/*
|
||||||
* jccoefct.c
|
* jccoefct.c
|
||||||
*
|
*
|
||||||
* Copyright (C) 1994-1996, Thomas G. Lane.
|
* Copyright (C) 1994-1997, Thomas G. Lane.
|
||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
@@ -135,8 +135,8 @@ start_pass_coef (j_compress_ptr cinfo, J_BUF_MODE pass_mode)
|
|||||||
* per call, ie, v_samp_factor block rows for each component in the image.
|
* per call, ie, v_samp_factor block rows for each component in the image.
|
||||||
* Returns TRUE if the iMCU row is completed, FALSE if suspended.
|
* Returns TRUE if the iMCU row is completed, FALSE if suspended.
|
||||||
*
|
*
|
||||||
* NB: input_buf contains a plane for each component in image.
|
* NB: input_buf contains a plane for each component in image,
|
||||||
* For single pass, this is the same as the components in the scan.
|
* which we index according to the component's SOF position.
|
||||||
*/
|
*/
|
||||||
|
|
||||||
METHODDEF(boolean)
|
METHODDEF(boolean)
|
||||||
@@ -175,7 +175,8 @@ compress_data (j_compress_ptr cinfo, JSAMPIMAGE input_buf)
|
|||||||
if (coef->iMCU_row_num < last_iMCU_row ||
|
if (coef->iMCU_row_num < last_iMCU_row ||
|
||||||
yoffset+yindex < compptr->last_row_height) {
|
yoffset+yindex < compptr->last_row_height) {
|
||||||
(*cinfo->fdct->forward_DCT) (cinfo, compptr,
|
(*cinfo->fdct->forward_DCT) (cinfo, compptr,
|
||||||
input_buf[ci], coef->MCU_buffer[blkn],
|
input_buf[compptr->component_index],
|
||||||
|
coef->MCU_buffer[blkn],
|
||||||
ypos, xpos, (JDIMENSION) blockcnt);
|
ypos, xpos, (JDIMENSION) blockcnt);
|
||||||
if (blockcnt < compptr->MCU_width) {
|
if (blockcnt < compptr->MCU_width) {
|
||||||
/* Create some dummy blocks at the right edge of the image. */
|
/* Create some dummy blocks at the right edge of the image. */
|
||||||
|
|||||||
513
jccolmmx.asm
Normal file
@@ -0,0 +1,513 @@
|
|||||||
|
;
|
||||||
|
; jccolmmx.asm - colorspace conversion (MMX)
|
||||||
|
;
|
||||||
|
; x86 SIMD extension for IJG JPEG library
|
||||||
|
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
; For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
;
|
||||||
|
; This file should be assembled with NASM (Netwide Assembler),
|
||||||
|
; can *not* be assembled with Microsoft's MASM or any compatible
|
||||||
|
; assembler (including Borland's Turbo Assembler).
|
||||||
|
; NASM is available from http://nasm.sourceforge.net/ or
|
||||||
|
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||||
|
;
|
||||||
|
; Last Modified : February 4, 2006
|
||||||
|
;
|
||||||
|
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jcolsamp.inc"
|
||||||
|
|
||||||
|
%if RGB_PIXELSIZE == 3 || RGB_PIXELSIZE == 4
|
||||||
|
%ifdef JCCOLOR_RGBYCC_MMX_SUPPORTED
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
|
||||||
|
%define SCALEBITS 16
|
||||||
|
|
||||||
|
F_0_081 equ 5329 ; FIX(0.08131)
|
||||||
|
F_0_114 equ 7471 ; FIX(0.11400)
|
||||||
|
F_0_168 equ 11059 ; FIX(0.16874)
|
||||||
|
F_0_250 equ 16384 ; FIX(0.25000)
|
||||||
|
F_0_299 equ 19595 ; FIX(0.29900)
|
||||||
|
F_0_331 equ 21709 ; FIX(0.33126)
|
||||||
|
F_0_418 equ 27439 ; FIX(0.41869)
|
||||||
|
F_0_587 equ 38470 ; FIX(0.58700)
|
||||||
|
F_0_337 equ (F_0_587 - F_0_250) ; FIX(0.58700) - FIX(0.25000)
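The constants above can be reproduced in C, assuming FIX(x) means x scaled by 2^SCALEBITS and rounded to nearest, as the comments suggest. The split of the 0.587 coefficient into 0.337 + 0.250 is presumably there because 38470 does not fit in a signed 16-bit word while each part does; that reading, and the helper name luma, are assumptions of this sketch rather than statements from the source.

```c
#define SCALEBITS 16

/* Fixed-point conversion: scale by 2^SCALEBITS and round to nearest. */
#define FIX(x) ((long) ((x) * (1L << SCALEBITS) + 0.5))

/* Hypothetical helper: luma of one 8-bit RGB pixel, with the green
 * coefficient split into two parts that each fit in 16 signed bits. */
static int luma(int r, int g, int b)
{
  long sum = FIX(0.29900) * r
           + (FIX(0.58700) - FIX(0.25000)) * g   /* 0.337 part of G */
           + FIX(0.11400) * b
           + FIX(0.25000) * g                    /* 0.250 part of G */
           + (1L << (SCALEBITS - 1));            /* rounding term */
  return (int) (sum >> SCALEBITS);
}
```

The three luma coefficients sum to exactly 65536 (19595 + 38470 + 7471), so a white pixel maps to 255 and a black pixel to 0.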
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_CONST
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
global EXTN(jconst_rgb_ycc_convert_mmx)
|
||||||
|
|
||||||
|
EXTN(jconst_rgb_ycc_convert_mmx):
|
||||||
|
|
||||||
|
PW_F0299_F0337 times 2 dw F_0_299, F_0_337
|
||||||
|
PW_F0114_F0250 times 2 dw F_0_114, F_0_250
|
||||||
|
PW_MF016_MF033 times 2 dw -F_0_168,-F_0_331
|
||||||
|
PW_MF008_MF041 times 2 dw -F_0_081,-F_0_418
|
||||||
|
PD_ONEHALFM1_CJ times 2 dd (1 << (SCALEBITS-1)) - 1 + (CENTERJSAMPLE << SCALEBITS)
|
||||||
|
PD_ONEHALF times 2 dd (1 << (SCALEBITS-1))
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Convert some rows of samples to the output colorspace.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_rgb_ycc_convert_mmx (j_compress_ptr cinfo,
|
||||||
|
; JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
|
||||||
|
; JDIMENSION output_row, int num_rows);
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_compress_ptr cinfo
|
||||||
|
%define input_buf(b) (b)+12 ; JSAMPARRAY input_buf
|
||||||
|
%define output_buf(b) (b)+16 ; JSAMPIMAGE output_buf
|
||||||
|
%define output_row(b) (b)+20 ; JDIMENSION output_row
|
||||||
|
%define num_rows(b) (b)+24 ; int num_rows
|
||||||
|
|
||||||
|
%define original_ebp ebp+0
|
||||||
|
%define wk(i) ebp-(WK_NUM-(i))*SIZEOF_MMWORD ; mmword wk[WK_NUM]
|
||||||
|
%define WK_NUM 8
|
||||||
|
%define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_rgb_ycc_convert_mmx)
|
||||||
|
|
||||||
|
EXTN(jpeg_rgb_ycc_convert_mmx):
|
||||||
|
push ebp
|
||||||
|
mov eax,esp ; eax = original ebp
|
||||||
|
sub esp, byte 4
|
||||||
|
and esp, byte (-SIZEOF_MMWORD) ; align to 64 bits
|
||||||
|
mov [esp],eax
|
||||||
|
mov ebp,esp ; ebp = aligned ebp
|
||||||
|
lea esp, [wk(0)]
|
||||||
|
pushpic eax ; make a room for GOT address
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
movpic POINTER [gotptr], ebx ; save GOT address
|
||||||
|
|
||||||
|
mov ecx, POINTER [cinfo(eax)]
|
||||||
|
mov ecx, JDIMENSION [jcstruct_image_width(ecx)] ; num_cols
|
||||||
|
test ecx,ecx
|
||||||
|
jz near .return
|
||||||
|
|
||||||
|
push ecx
|
||||||
|
|
||||||
|
mov esi, JSAMPIMAGE [output_buf(eax)]
|
||||||
|
mov ecx, JDIMENSION [output_row(eax)]
|
||||||
|
mov edi, JSAMPARRAY [esi+0*SIZEOF_JSAMPARRAY]
|
||||||
|
mov ebx, JSAMPARRAY [esi+1*SIZEOF_JSAMPARRAY]
|
||||||
|
mov edx, JSAMPARRAY [esi+2*SIZEOF_JSAMPARRAY]
|
||||||
|
lea edi, [edi+ecx*SIZEOF_JSAMPROW]
|
||||||
|
lea ebx, [ebx+ecx*SIZEOF_JSAMPROW]
|
||||||
|
lea edx, [edx+ecx*SIZEOF_JSAMPROW]
|
||||||
|
|
||||||
|
pop ecx
|
||||||
|
|
||||||
|
mov esi, JSAMPARRAY [input_buf(eax)]
|
||||||
|
mov eax, INT [num_rows(eax)]
|
||||||
|
test eax,eax
|
||||||
|
jle near .return
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
pushpic eax
|
||||||
|
push edx
|
||||||
|
push ebx
|
||||||
|
push edi
|
||||||
|
push esi
|
||||||
|
push ecx ; col
|
||||||
|
|
||||||
|
mov esi, JSAMPROW [esi] ; inptr
|
||||||
|
mov edi, JSAMPROW [edi] ; outptr0
|
||||||
|
mov ebx, JSAMPROW [ebx] ; outptr1
|
||||||
|
mov edx, JSAMPROW [edx] ; outptr2
|
||||||
|
movpic eax, POINTER [gotptr] ; load GOT address (eax)
|
||||||
|
|
||||||
|
cmp ecx, byte SIZEOF_MMWORD
|
||||||
|
jae short .columnloop
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
%if RGB_PIXELSIZE == 3 ; ---------------
|
||||||
|
|
||||||
|
.column_ld1:
|
||||||
|
push eax
|
||||||
|
push edx
|
||||||
|
lea ecx,[ecx+ecx*2] ; imul ecx,RGB_PIXELSIZE
|
||||||
|
test cl, SIZEOF_BYTE
|
||||||
|
jz short .column_ld2
|
||||||
|
sub ecx, byte SIZEOF_BYTE
|
||||||
|
xor eax,eax
|
||||||
|
mov al, BYTE [esi+ecx]
|
||||||
|
.column_ld2:
|
||||||
|
test cl, SIZEOF_WORD
|
||||||
|
jz short .column_ld4
|
||||||
|
sub ecx, byte SIZEOF_WORD
|
||||||
|
xor edx,edx
|
||||||
|
mov dx, WORD [esi+ecx]
|
||||||
|
shl eax, WORD_BIT
|
||||||
|
or eax,edx
|
||||||
|
.column_ld4:
|
||||||
|
movd mmA,eax
|
||||||
|
pop edx
|
||||||
|
pop eax
|
||||||
|
test cl, SIZEOF_DWORD
|
||||||
|
jz short .column_ld8
|
||||||
|
sub ecx, byte SIZEOF_DWORD
|
||||||
|
movd mmG, DWORD [esi+ecx]
|
||||||
|
psllq mmA, DWORD_BIT
|
||||||
|
por mmA,mmG
|
||||||
|
.column_ld8:
|
||||||
|
test cl, SIZEOF_MMWORD
|
||||||
|
jz short .column_ld16
|
||||||
|
movq mmG,mmA
|
||||||
|
movq mmA, MMWORD [esi+0*SIZEOF_MMWORD]
|
||||||
|
mov ecx, SIZEOF_MMWORD
|
||||||
|
jmp short .rgb_ycc_cnv
|
||||||
|
.column_ld16:
|
||||||
|
test cl, 2*SIZEOF_MMWORD
|
||||||
|
mov ecx, SIZEOF_MMWORD
|
||||||
|
jz short .rgb_ycc_cnv
|
||||||
|
movq mmF,mmA
|
||||||
|
movq mmA, MMWORD [esi+0*SIZEOF_MMWORD]
|
||||||
|
movq mmG, MMWORD [esi+1*SIZEOF_MMWORD]
|
||||||
|
jmp short .rgb_ycc_cnv
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.columnloop:
|
||||||
|
movq mmA, MMWORD [esi+0*SIZEOF_MMWORD]
|
||||||
|
movq mmG, MMWORD [esi+1*SIZEOF_MMWORD]
|
||||||
|
movq mmF, MMWORD [esi+2*SIZEOF_MMWORD]
|
||||||
|
|
||||||
|
.rgb_ycc_cnv:
|
||||||
|
; mmA=(00 10 20 01 11 21 02 12)
|
||||||
|
; mmG=(22 03 13 23 04 14 24 05)
|
||||||
|
; mmF=(15 25 06 16 26 07 17 27)
|
||||||
|
|
||||||
|
movq mmD,mmA
|
||||||
|
psllq mmA,4*BYTE_BIT ; mmA=(-- -- -- -- 00 10 20 01)
|
||||||
|
psrlq mmD,4*BYTE_BIT ; mmD=(11 21 02 12 -- -- -- --)
|
||||||
|
|
||||||
|
punpckhbw mmA,mmG ; mmA=(00 04 10 14 20 24 01 05)
|
||||||
|
psllq mmG,4*BYTE_BIT ; mmG=(-- -- -- -- 22 03 13 23)
|
||||||
|
|
||||||
|
punpcklbw mmD,mmF ; mmD=(11 15 21 25 02 06 12 16)
|
||||||
|
punpckhbw mmG,mmF ; mmG=(22 26 03 07 13 17 23 27)
|
||||||
|
|
||||||
|
movq mmE,mmA
|
||||||
|
psllq mmA,4*BYTE_BIT ; mmA=(-- -- -- -- 00 04 10 14)
|
||||||
|
psrlq mmE,4*BYTE_BIT ; mmE=(20 24 01 05 -- -- -- --)
|
||||||
|
|
||||||
|
punpckhbw mmA,mmD ; mmA=(00 02 04 06 10 12 14 16)
|
||||||
|
psllq mmD,4*BYTE_BIT ; mmD=(-- -- -- -- 11 15 21 25)
|
||||||
|
|
||||||
|
punpcklbw mmE,mmG ; mmE=(20 22 24 26 01 03 05 07)
|
||||||
|
punpckhbw mmD,mmG ; mmD=(11 13 15 17 21 23 25 27)
|
||||||
|
|
||||||
|
pxor mmH,mmH
|
||||||
|
|
||||||
|
movq mmC,mmA
|
||||||
|
punpcklbw mmA,mmH ; mmA=(00 02 04 06)
|
||||||
|
punpckhbw mmC,mmH ; mmC=(10 12 14 16)
|
||||||
|
|
||||||
|
movq mmB,mmE
|
||||||
|
punpcklbw mmE,mmH ; mmE=(20 22 24 26)
|
||||||
|
punpckhbw mmB,mmH ; mmB=(01 03 05 07)
|
||||||
|
|
||||||
|
movq mmF,mmD
|
||||||
|
punpcklbw mmD,mmH ; mmD=(11 13 15 17)
|
||||||
|
punpckhbw mmF,mmH ; mmF=(21 23 25 27)
|
||||||
|
|
||||||
|
%else ; RGB_PIXELSIZE == 4 ; -----------
|
||||||
|
|
||||||
|
.column_ld1:
|
||||||
|
test cl, SIZEOF_MMWORD/8
|
||||||
|
jz short .column_ld2
|
||||||
|
sub ecx, byte SIZEOF_MMWORD/8
|
||||||
|
movd mmA, DWORD [esi+ecx*RGB_PIXELSIZE]
|
||||||
|
.column_ld2:
|
||||||
|
test cl, SIZEOF_MMWORD/4
|
||||||
|
jz short .column_ld4
|
||||||
|
sub ecx, byte SIZEOF_MMWORD/4
|
||||||
|
movq mmF,mmA
|
||||||
|
movq mmA, MMWORD [esi+ecx*RGB_PIXELSIZE]
|
||||||
|
.column_ld4:
|
||||||
|
test cl, SIZEOF_MMWORD/2
|
||||||
|
mov ecx, SIZEOF_MMWORD
|
||||||
|
jz short .rgb_ycc_cnv
|
||||||
|
movq mmD,mmA
|
||||||
|
movq mmC,mmF
|
||||||
|
movq mmA, MMWORD [esi+0*SIZEOF_MMWORD]
|
||||||
|
movq mmF, MMWORD [esi+1*SIZEOF_MMWORD]
|
||||||
|
jmp short .rgb_ycc_cnv
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.columnloop:
|
||||||
|
movq mmA, MMWORD [esi+0*SIZEOF_MMWORD]
|
||||||
|
movq mmF, MMWORD [esi+1*SIZEOF_MMWORD]
|
||||||
|
movq mmD, MMWORD [esi+2*SIZEOF_MMWORD]
|
||||||
|
movq mmC, MMWORD [esi+3*SIZEOF_MMWORD]
|
||||||
|
|
||||||
|
.rgb_ycc_cnv:
|
||||||
|
; mmA=(00 10 20 30 01 11 21 31)
|
||||||
|
; mmF=(02 12 22 32 03 13 23 33)
|
||||||
|
; mmD=(04 14 24 34 05 15 25 35)
|
||||||
|
; mmC=(06 16 26 36 07 17 27 37)
|
||||||
|
|
||||||
|
movq mmB,mmA
|
||||||
|
punpcklbw mmA,mmF ; mmA=(00 02 10 12 20 22 30 32)
|
||||||
|
punpckhbw mmB,mmF ; mmB=(01 03 11 13 21 23 31 33)
|
||||||
|
|
||||||
|
movq mmG,mmD
|
||||||
|
punpcklbw mmD,mmC ; mmD=(04 06 14 16 24 26 34 36)
|
||||||
|
punpckhbw mmG,mmC ; mmG=(05 07 15 17 25 27 35 37)
|
||||||
|
|
||||||
|
movq mmE,mmA
|
||||||
|
punpcklwd mmA,mmD ; mmA=(00 02 04 06 10 12 14 16)
|
||||||
|
punpckhwd mmE,mmD ; mmE=(20 22 24 26 30 32 34 36)
|
||||||
|
|
||||||
|
movq mmH,mmB
|
||||||
|
punpcklwd mmB,mmG ; mmB=(01 03 05 07 11 13 15 17)
|
||||||
|
punpckhwd mmH,mmG ; mmH=(21 23 25 27 31 33 35 37)
|
||||||
|
|
||||||
|
pxor mmF,mmF
|
||||||
|
|
||||||
|
movq mmC,mmA
|
||||||
|
punpcklbw mmA,mmF ; mmA=(00 02 04 06)
|
||||||
|
punpckhbw mmC,mmF ; mmC=(10 12 14 16)
|
||||||
|
|
||||||
|
movq mmD,mmB
|
||||||
|
punpcklbw mmB,mmF ; mmB=(01 03 05 07)
|
||||||
|
punpckhbw mmD,mmF ; mmD=(11 13 15 17)
|
||||||
|
|
||||||
|
movq mmG,mmE
|
||||||
|
punpcklbw mmE,mmF ; mmE=(20 22 24 26)
|
||||||
|
punpckhbw mmG,mmF ; mmG=(30 32 34 36)
|
||||||
|
|
||||||
|
punpcklbw mmF,mmH
|
||||||
|
punpckhbw mmH,mmH
|
||||||
|
psrlw mmF,BYTE_BIT ; mmF=(21 23 25 27)
|
||||||
|
psrlw mmH,BYTE_BIT ; mmH=(31 33 35 37)
|
||||||
|
|
||||||
|
%endif ; RGB_PIXELSIZE ; ---------------
|
||||||
|
|
||||||
|
; mm0=(R0 R2 R4 R6)=RE, mm2=(G0 G2 G4 G6)=GE, mm4=(B0 B2 B4 B6)=BE
|
||||||
|
; mm1=(R1 R3 R5 R7)=RO, mm3=(G1 G3 G5 G7)=GO, mm5=(B1 B3 B5 B7)=BO
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; Y = 0.29900 * R + 0.58700 * G + 0.11400 * B
|
||||||
|
; Cb = -0.16874 * R - 0.33126 * G + 0.50000 * B + CENTERJSAMPLE
|
||||||
|
; Cr = 0.50000 * R - 0.41869 * G - 0.08131 * B + CENTERJSAMPLE
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; Y = 0.29900 * R + 0.33700 * G + 0.11400 * B + 0.25000 * G
|
||||||
|
; Cb = -0.16874 * R - 0.33126 * G + 0.50000 * B + CENTERJSAMPLE
|
||||||
|
; Cr = 0.50000 * R - 0.41869 * G - 0.08131 * B + CENTERJSAMPLE
|
||||||
|
|
||||||
|
movq MMWORD [wk(0)], mm0 ; wk(0)=RE
|
||||||
|
movq MMWORD [wk(1)], mm1 ; wk(1)=RO
|
||||||
|
movq MMWORD [wk(2)], mm4 ; wk(2)=BE
|
||||||
|
movq MMWORD [wk(3)], mm5 ; wk(3)=BO
|
||||||
|
|
||||||
|
movq mm6,mm1
|
||||||
|
punpcklwd mm1,mm3
|
||||||
|
punpckhwd mm6,mm3
|
||||||
|
movq mm7,mm1
|
||||||
|
movq mm4,mm6
|
||||||
|
pmaddwd mm1,[GOTOFF(eax,PW_F0299_F0337)] ; mm1=ROL*FIX(0.299)+GOL*FIX(0.337)
|
||||||
|
pmaddwd mm6,[GOTOFF(eax,PW_F0299_F0337)] ; mm6=ROH*FIX(0.299)+GOH*FIX(0.337)
|
||||||
|
pmaddwd mm7,[GOTOFF(eax,PW_MF016_MF033)] ; mm7=ROL*-FIX(0.168)+GOL*-FIX(0.331)
|
||||||
|
pmaddwd mm4,[GOTOFF(eax,PW_MF016_MF033)] ; mm4=ROH*-FIX(0.168)+GOH*-FIX(0.331)
|
||||||
|
|
||||||
|
movq MMWORD [wk(4)], mm1 ; wk(4)=ROL*FIX(0.299)+GOL*FIX(0.337)
|
||||||
|
movq MMWORD [wk(5)], mm6 ; wk(5)=ROH*FIX(0.299)+GOH*FIX(0.337)
|
||||||
|
|
||||||
|
pxor mm1,mm1
|
||||||
|
pxor mm6,mm6
|
||||||
|
punpcklwd mm1,mm5 ; mm1=BOL
|
||||||
|
punpckhwd mm6,mm5 ; mm6=BOH
|
||||||
|
psrld mm1,1 ; mm1=BOL*FIX(0.500)
|
||||||
|
psrld mm6,1 ; mm6=BOH*FIX(0.500)
|
||||||
|
|
||||||
|
movq mm5,[GOTOFF(eax,PD_ONEHALFM1_CJ)] ; mm5=[PD_ONEHALFM1_CJ]
|
||||||
|
|
||||||
|
paddd mm7,mm1
|
||||||
|
paddd mm4,mm6
|
||||||
|
paddd mm7,mm5
|
||||||
|
paddd mm4,mm5
|
||||||
|
psrld mm7,SCALEBITS ; mm7=CbOL
|
||||||
|
psrld mm4,SCALEBITS ; mm4=CbOH
|
||||||
|
packssdw mm7,mm4 ; mm7=CbO
|
||||||
|
|
||||||
|
movq mm1, MMWORD [wk(2)] ; mm1=BE
|
||||||
|
|
||||||
|
movq mm6,mm0
|
||||||
|
punpcklwd mm0,mm2
|
||||||
|
punpckhwd mm6,mm2
|
||||||
|
movq mm5,mm0
|
||||||
|
movq mm4,mm6
|
||||||
|
pmaddwd mm0,[GOTOFF(eax,PW_F0299_F0337)] ; mm0=REL*FIX(0.299)+GEL*FIX(0.337)
|
||||||
|
pmaddwd mm6,[GOTOFF(eax,PW_F0299_F0337)] ; mm6=REH*FIX(0.299)+GEH*FIX(0.337)
|
||||||
|
pmaddwd mm5,[GOTOFF(eax,PW_MF016_MF033)] ; mm5=REL*-FIX(0.168)+GEL*-FIX(0.331)
|
||||||
|
pmaddwd mm4,[GOTOFF(eax,PW_MF016_MF033)] ; mm4=REH*-FIX(0.168)+GEH*-FIX(0.331)
|
||||||
|
|
||||||
|
movq MMWORD [wk(6)], mm0 ; wk(6)=REL*FIX(0.299)+GEL*FIX(0.337)
|
||||||
|
movq MMWORD [wk(7)], mm6 ; wk(7)=REH*FIX(0.299)+GEH*FIX(0.337)
|
||||||
|
|
||||||
|
pxor mm0,mm0
|
||||||
|
pxor mm6,mm6
|
||||||
|
punpcklwd mm0,mm1 ; mm0=BEL
|
||||||
|
punpckhwd mm6,mm1 ; mm6=BEH
|
||||||
|
psrld mm0,1 ; mm0=BEL*FIX(0.500)
|
||||||
|
psrld mm6,1 ; mm6=BEH*FIX(0.500)
|
||||||
|
|
||||||
|
movq mm1,[GOTOFF(eax,PD_ONEHALFM1_CJ)] ; mm1=[PD_ONEHALFM1_CJ]
|
||||||
|
|
||||||
|
paddd mm5,mm0
|
||||||
|
paddd mm4,mm6
|
||||||
|
paddd mm5,mm1
|
||||||
|
paddd mm4,mm1
|
||||||
|
psrld mm5,SCALEBITS ; mm5=CbEL
|
||||||
|
psrld mm4,SCALEBITS ; mm4=CbEH
|
||||||
|
packssdw mm5,mm4 ; mm5=CbE
|
||||||
|
|
||||||
|
psllw mm7,BYTE_BIT
|
||||||
|
por mm5,mm7 ; mm5=Cb
|
||||||
|
movq MMWORD [ebx], mm5 ; Save Cb
|
||||||
|
|
||||||
|
movq mm0, MMWORD [wk(3)] ; mm0=BO
|
||||||
|
movq mm6, MMWORD [wk(2)] ; mm6=BE
|
||||||
|
movq mm1, MMWORD [wk(1)] ; mm1=RO
|
||||||
|
|
||||||
|
movq mm4,mm0
|
||||||
|
punpcklwd mm0,mm3
|
||||||
|
punpckhwd mm4,mm3
|
||||||
|
movq mm7,mm0
|
||||||
|
movq mm5,mm4
|
||||||
|
pmaddwd mm0,[GOTOFF(eax,PW_F0114_F0250)] ; mm0=BOL*FIX(0.114)+GOL*FIX(0.250)
|
||||||
|
pmaddwd mm4,[GOTOFF(eax,PW_F0114_F0250)] ; mm4=BOH*FIX(0.114)+GOH*FIX(0.250)
|
||||||
|
pmaddwd mm7,[GOTOFF(eax,PW_MF008_MF041)] ; mm7=BOL*-FIX(0.081)+GOL*-FIX(0.418)
|
||||||
|
pmaddwd mm5,[GOTOFF(eax,PW_MF008_MF041)] ; mm5=BOH*-FIX(0.081)+GOH*-FIX(0.418)
|
||||||
|
|
||||||
|
movq mm3,[GOTOFF(eax,PD_ONEHALF)] ; mm3=[PD_ONEHALF]
|
||||||
|
|
||||||
|
paddd mm0, MMWORD [wk(4)]
|
||||||
|
paddd mm4, MMWORD [wk(5)]
|
||||||
|
paddd mm0,mm3
|
||||||
|
paddd mm4,mm3
|
||||||
|
psrld mm0,SCALEBITS ; mm0=YOL
|
||||||
|
psrld mm4,SCALEBITS ; mm4=YOH
|
||||||
|
packssdw mm0,mm4 ; mm0=YO
|
||||||
|
|
||||||
|
pxor mm3,mm3
|
||||||
|
pxor mm4,mm4
|
||||||
|
punpcklwd mm3,mm1 ; mm3=ROL
|
||||||
|
punpckhwd mm4,mm1 ; mm4=ROH
|
||||||
|
psrld mm3,1 ; mm3=ROL*FIX(0.500)
|
||||||
|
psrld mm4,1 ; mm4=ROH*FIX(0.500)
|
||||||
|
|
||||||
|
movq mm1,[GOTOFF(eax,PD_ONEHALFM1_CJ)] ; mm1=[PD_ONEHALFM1_CJ]
|
||||||
|
|
||||||
|
paddd mm7,mm3
|
||||||
|
paddd mm5,mm4
|
||||||
|
paddd mm7,mm1
|
||||||
|
paddd mm5,mm1
|
||||||
|
psrld mm7,SCALEBITS ; mm7=CrOL
|
||||||
|
psrld mm5,SCALEBITS ; mm5=CrOH
|
||||||
|
packssdw mm7,mm5 ; mm7=CrO
|
||||||
|
|
||||||
|
movq mm3, MMWORD [wk(0)] ; mm3=RE
|
||||||
|
|
||||||
|
movq mm4,mm6
|
||||||
|
punpcklwd mm6,mm2
|
||||||
|
punpckhwd mm4,mm2
|
||||||
|
movq mm1,mm6
|
||||||
|
movq mm5,mm4
|
||||||
|
pmaddwd mm6,[GOTOFF(eax,PW_F0114_F0250)] ; mm6=BEL*FIX(0.114)+GEL*FIX(0.250)
|
||||||
|
pmaddwd mm4,[GOTOFF(eax,PW_F0114_F0250)] ; mm4=BEH*FIX(0.114)+GEH*FIX(0.250)
|
||||||
|
pmaddwd mm1,[GOTOFF(eax,PW_MF008_MF041)] ; mm1=BEL*-FIX(0.081)+GEL*-FIX(0.418)
|
||||||
|
pmaddwd mm5,[GOTOFF(eax,PW_MF008_MF041)] ; mm5=BEH*-FIX(0.081)+GEH*-FIX(0.418)
|
||||||
|
|
||||||
|
movq mm2,[GOTOFF(eax,PD_ONEHALF)] ; mm2=[PD_ONEHALF]
|
||||||
|
|
||||||
|
paddd mm6, MMWORD [wk(6)]
|
||||||
|
paddd mm4, MMWORD [wk(7)]
|
||||||
|
paddd mm6,mm2
|
||||||
|
paddd mm4,mm2
|
||||||
|
psrld mm6,SCALEBITS ; mm6=YEL
|
||||||
|
psrld mm4,SCALEBITS ; mm4=YEH
|
||||||
|
packssdw mm6,mm4 ; mm6=YE
|
||||||
|
|
||||||
|
psllw mm0,BYTE_BIT
|
||||||
|
por mm6,mm0 ; mm6=Y
|
||||||
|
movq MMWORD [edi], mm6 ; Save Y
|
||||||
|
|
||||||
|
pxor mm2,mm2
|
||||||
|
pxor mm4,mm4
|
||||||
|
punpcklwd mm2,mm3 ; mm2=REL
|
||||||
|
punpckhwd mm4,mm3 ; mm4=REH
|
||||||
|
psrld mm2,1 ; mm2=REL*FIX(0.500)
|
||||||
|
psrld mm4,1 ; mm4=REH*FIX(0.500)
|
||||||
|
|
||||||
|
movq mm0,[GOTOFF(eax,PD_ONEHALFM1_CJ)] ; mm0=[PD_ONEHALFM1_CJ]
|
||||||
|
|
||||||
|
paddd mm1,mm2
|
||||||
|
paddd mm5,mm4
|
||||||
|
paddd mm1,mm0
|
||||||
|
paddd mm5,mm0
|
||||||
|
psrld mm1,SCALEBITS ; mm1=CrEL
|
||||||
|
psrld mm5,SCALEBITS ; mm5=CrEH
|
||||||
|
packssdw mm1,mm5 ; mm1=CrE
|
||||||
|
|
||||||
|
psllw mm7,BYTE_BIT
|
||||||
|
por mm1,mm7 ; mm1=Cr
|
||||||
|
movq MMWORD [edx], mm1 ; Save Cr
|
||||||
|
|
||||||
|
sub ecx, byte SIZEOF_MMWORD
|
||||||
|
add esi, byte RGB_PIXELSIZE*SIZEOF_MMWORD ; inptr
|
||||||
|
add edi, byte SIZEOF_MMWORD ; outptr0
|
||||||
|
add ebx, byte SIZEOF_MMWORD ; outptr1
|
||||||
|
add edx, byte SIZEOF_MMWORD ; outptr2
|
||||||
|
cmp ecx, byte SIZEOF_MMWORD
|
||||||
|
jae near .columnloop
|
||||||
|
test ecx,ecx
|
||||||
|
jnz near .column_ld1
|
||||||
|
|
||||||
|
pop ecx ; col
|
||||||
|
pop esi
|
||||||
|
pop edi
|
||||||
|
pop ebx
|
||||||
|
pop edx
|
||||||
|
poppic eax
|
||||||
|
|
||||||
|
add esi, byte SIZEOF_JSAMPROW ; input_buf
|
||||||
|
add edi, byte SIZEOF_JSAMPROW
|
||||||
|
add ebx, byte SIZEOF_JSAMPROW
|
||||||
|
add edx, byte SIZEOF_JSAMPROW
|
||||||
|
dec eax ; num_rows
|
||||||
|
jg near .rowloop
|
||||||
|
|
||||||
|
emms ; empty MMX state
|
||||||
|
|
||||||
|
.return:
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
mov esp,ebp ; esp <- aligned ebp
|
||||||
|
pop esp ; esp <- original ebp
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; JCCOLOR_RGBYCC_MMX_SUPPORTED
|
||||||
|
%endif ; RGB_PIXELSIZE == 3 || RGB_PIXELSIZE == 4
|
||||||
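The `.column_ld1` to `.column_ld16` labels in the MMX routine above handle a final group of fewer than eight pixels. As far as can be read from the listing, the remaining byte count is decomposed bit by bit (1, 2, 4, then 8 and 16 bytes), and each piece is loaded from the end of the row backwards, so every load stays inside the remaining bytes. The sketch below models only the sub-8-byte part of that idea in C; `load_tail_le` is a hypothetical helper introduced for illustration and is not part of the library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: gather n (0..7) trailing bytes of a row into one
 * 64-bit value, little-endian and zero-padded, without reading p[n] or
 * anything beyond it. */
static uint64_t load_tail_le(const uint8_t *p, unsigned n)
{
  uint64_t acc = 0;

  assert(n < 8);
  if (n & 1) {                  /* odd count: take the last byte first */
    n -= 1;
    acc = p[n];
  }
  if (n & 2) {                  /* then the 2-byte piece below it */
    n -= 2;
    acc = (acc << 16) | ((uint64_t)p[n] | ((uint64_t)p[n + 1] << 8));
  }
  if (n & 4) {                  /* then the 4-byte piece; n is now 0 */
    n -= 4;
    acc = (acc << 32) | ((uint64_t)p[n] |
                         ((uint64_t)p[n + 1] << 8) |
                         ((uint64_t)p[n + 2] << 16) |
                         ((uint64_t)p[n + 3] << 24));
  }
  return acc;
}
```

The shifts mirror the `shl eax, WORD_BIT` / `or` and `psllq mmA, DWORD_BIT` / `por` pairs in the assembly: each earlier (higher-address) piece is moved up to make room for the next piece below it. In the assembly, the 8-byte and 16-byte cases then load whole quadwords from the start of the row and keep the gathered tail in the following register.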
49
jccolor.c
@@ -5,12 +5,20 @@
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
+ * ---------------------------------------------------------------------
+ * x86 SIMD extension for IJG JPEG library
+ * Copyright (C) 1999-2006, MIYASAKA Masaru.
+ * This file has been modified for SIMD extension.
+ * Last Modified : January 5, 2006
+ * ---------------------------------------------------------------------
+ *
  * This file contains input colorspace conversion routines.
  */
 
 #define JPEG_INTERNALS
 #include "jinclude.h"
 #include "jpeglib.h"
+#include "jcolsamp.h" /* Private declarations */
 
 
 /* Private subobject */
@@ -352,6 +360,7 @@ GLOBAL(void)
 jinit_color_converter (j_compress_ptr cinfo)
 {
   my_cconvert_ptr cconvert;
+  unsigned int simd = jpeg_simd_support((j_common_ptr) cinfo);
 
   cconvert = (my_cconvert_ptr)
       (*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,
@@ -420,8 +429,23 @@ jinit_color_converter (j_compress_ptr cinfo)
     if (cinfo->num_components != 3)
       ERREXIT(cinfo, JERR_BAD_J_COLORSPACE);
     if (cinfo->in_color_space == JCS_RGB) {
+#if RGB_PIXELSIZE == 3 || RGB_PIXELSIZE == 4
+#ifdef JCCOLOR_RGBYCC_SSE2_SUPPORTED
+      if (simd & JSIMD_SSE2 &&
+          IS_CONST_ALIGNED_16(jconst_rgb_ycc_convert_sse2)) {
+        cconvert->pub.color_convert = jpeg_rgb_ycc_convert_sse2;
+      } else
+#endif
+#ifdef JCCOLOR_RGBYCC_MMX_SUPPORTED
+      if (simd & JSIMD_MMX) {
+        cconvert->pub.color_convert = jpeg_rgb_ycc_convert_mmx;
+      } else
+#endif
+#endif /* RGB_PIXELSIZE == 3 || RGB_PIXELSIZE == 4 */
+      {
       cconvert->pub.start_pass = rgb_ycc_start;
       cconvert->pub.color_convert = rgb_ycc_convert;
+      }
     } else if (cinfo->in_color_space == JCS_YCbCr)
       cconvert->pub.color_convert = null_convert;
     else
@@ -457,3 +481,28 @@ jinit_color_converter (j_compress_ptr cinfo)
     break;
   }
 }
+
+
+#ifndef JSIMD_MODEINFO_NOT_SUPPORTED
+
+GLOBAL(unsigned int)
+jpeg_simd_color_converter (j_compress_ptr cinfo)
+{
+  unsigned int simd = jpeg_simd_support((j_common_ptr) cinfo);
+
+#if RGB_PIXELSIZE == 3 || RGB_PIXELSIZE == 4
+#ifdef JCCOLOR_RGBYCC_SSE2_SUPPORTED
+  if (simd & JSIMD_SSE2 &&
+      IS_CONST_ALIGNED_16(jconst_rgb_ycc_convert_sse2))
+    return JSIMD_SSE2;
+#endif
+#ifdef JCCOLOR_RGBYCC_MMX_SUPPORTED
+  if (simd & JSIMD_MMX)
+    return JSIMD_MMX;
+#endif
+#endif /* RGB_PIXELSIZE == 3 || RGB_PIXELSIZE == 4 */
+
+  return JSIMD_NONE;
+}
+
+#endif /* !JSIMD_MODEINFO_NOT_SUPPORTED */
|
|||||||
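The jccolor.c changes above select a converter by testing capability bits in order of preference: SSE2 first (and only if its constant table is 16-byte aligned), then MMX, then the plain C routine. The alignment test presumably guards the aligned `movdqa` loads and memory-operand `pmaddwd` instructions in the SSE2 routine, which fault on misaligned addresses. Below is a minimal, self-contained sketch of that fallback chain. All `demo_*` and `DEMO_*` names and the bit values are hypothetical stand-ins; the real `JSIMD_*` values and `IS_CONST_ALIGNED_16` definition are not shown in this diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bits; the real JSIMD_* values are not shown here. */
enum { DEMO_SIMD_NONE = 0x00, DEMO_SIMD_MMX = 0x01, DEMO_SIMD_SSE2 = 0x08 };

/* Stand-in for an alignment check such as IS_CONST_ALIGNED_16. */
static int demo_is_aligned_16(const void *p)
{
  return ((uintptr_t)p & 15u) == 0;
}

/* 16-byte aligned stand-in for an SSE2 constant table (C11 _Alignas). */
static _Alignas(16) const unsigned char demo_consts[32] = {0};

/* Return the widest usable variant, falling back one step at a time. */
static unsigned demo_pick_converter(unsigned caps, const void *sse2_consts)
{
  assert(sse2_consts != 0);
  if ((caps & DEMO_SIMD_SSE2) && demo_is_aligned_16(sse2_consts))
    return DEMO_SIMD_SSE2;
  if (caps & DEMO_SIMD_MMX)
    return DEMO_SIMD_MMX;
  return DEMO_SIMD_NONE;      /* caller keeps the portable C routine */
}
```

Passing `demo_consts + 1` simulates a misaligned table: the function then skips SSE2 even when the CPU reports it, and falls through to MMX or to no SIMD at all.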
541
jccolss2.asm
Normal file
@@ -0,0 +1,541 @@
|
|||||||
|
;
|
||||||
|
; jccolss2.asm - colorspace conversion (SSE2)
|
||||||
|
;
|
||||||
|
; x86 SIMD extension for IJG JPEG library
|
||||||
|
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
; For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
;
|
||||||
|
; This file should be assembled with NASM (Netwide Assembler); it
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
|
||||||
|
; NASM is available from http://nasm.sourceforge.net/ or
|
||||||
|
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||||
|
;
|
||||||
|
; Last Modified : February 4, 2006
|
||||||
|
;
|
||||||
|
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jcolsamp.inc"
|
||||||
|
|
||||||
|
%if RGB_PIXELSIZE == 3 || RGB_PIXELSIZE == 4
|
||||||
|
%ifdef JCCOLOR_RGBYCC_SSE2_SUPPORTED
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
|
||||||
|
%define SCALEBITS 16
|
||||||
|
|
||||||
|
F_0_081 equ 5329 ; FIX(0.08131)
|
||||||
|
F_0_114 equ 7471 ; FIX(0.11400)
|
||||||
|
F_0_168 equ 11059 ; FIX(0.16874)
|
||||||
|
F_0_250 equ 16384 ; FIX(0.25000)
|
||||||
|
F_0_299 equ 19595 ; FIX(0.29900)
|
||||||
|
F_0_331 equ 21709 ; FIX(0.33126)
|
||||||
|
F_0_418 equ 27439 ; FIX(0.41869)
|
||||||
|
F_0_587 equ 38470 ; FIX(0.58700)
|
||||||
|
F_0_337 equ (F_0_587 - F_0_250) ; FIX(0.58700) - FIX(0.25000)
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_CONST
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
global EXTN(jconst_rgb_ycc_convert_sse2)
|
||||||
|
|
||||||
|
EXTN(jconst_rgb_ycc_convert_sse2):
|
||||||
|
|
||||||
|
PW_F0299_F0337 times 4 dw F_0_299, F_0_337
|
||||||
|
PW_F0114_F0250 times 4 dw F_0_114, F_0_250
|
||||||
|
PW_MF016_MF033 times 4 dw -F_0_168,-F_0_331
|
||||||
|
PW_MF008_MF041 times 4 dw -F_0_081,-F_0_418
|
||||||
|
PD_ONEHALFM1_CJ times 4 dd (1 << (SCALEBITS-1)) - 1 + (CENTERJSAMPLE << SCALEBITS)
|
||||||
|
PD_ONEHALF times 4 dd (1 << (SCALEBITS-1))
|
||||||
|
|
||||||
|
alignz 16
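The constant table above stores the coefficients as signed 16-bit words for `pmaddwd`. With `SCALEBITS` at 16, FIX(0.587) is 38470, which does not fit a signed word, and this is presumably why the green term of Y is split into 0.337 + 0.250. The sketch below is a scalar C model of that arithmetic under stated assumptions: `CENTERJSAMPLE` is taken to be 128 (8-bit samples), and `madd_pair` and `rgb_to_ycc_scalar` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SCALEBITS      16
#define CENTERJSAMPLE  128   /* assumed: 8-bit samples */

/* Fixed-point coefficients copied from the constant definitions above. */
enum {
  F_0_081 = 5329,  F_0_114 = 7471,  F_0_168 = 11059, F_0_250 = 16384,
  F_0_299 = 19595, F_0_331 = 21709, F_0_418 = 27439, F_0_587 = 38470,
  F_0_337 = F_0_587 - F_0_250
};

#define ONE_HALF        (1 << (SCALEBITS - 1))
#define ONE_HALF_M1_CJ  (ONE_HALF - 1 + (CENTERJSAMPLE << SCALEBITS))

/* One 32-bit lane of pmaddwd: two signed 16x16 products, summed. */
static int32_t madd_pair(int16_t a, int16_t ca, int16_t b, int16_t cb)
{
  return (int32_t)a * ca + (int32_t)b * cb;
}

/* Hypothetical scalar helper: convert one RGB pixel to Y, Cb, Cr. */
static void rgb_to_ycc_scalar(uint8_t r, uint8_t g, uint8_t b, uint8_t ycc[3])
{
  /* FIX(0.587) overflows a signed word; the split-off 0.337 does not. */
  assert(F_0_587 > INT16_MAX && F_0_337 <= INT16_MAX);

  int32_t y  = madd_pair(r, F_0_299, g, F_0_337)
             + madd_pair(b, F_0_114, g, F_0_250) + ONE_HALF;
  /* 0.5 * B and 0.5 * R are shifts, matching psrld by 1 in the assembly. */
  int32_t cb = madd_pair(r, -F_0_168, g, -F_0_331)
             + ((int32_t)b << (SCALEBITS - 1)) + ONE_HALF_M1_CJ;
  int32_t cr = madd_pair(b, -F_0_081, g, -F_0_418)
             + ((int32_t)r << (SCALEBITS - 1)) + ONE_HALF_M1_CJ;

  /* The sums are non-negative, so a plain right shift is enough. */
  ycc[0] = (uint8_t)(y  >> SCALEBITS);
  ycc[1] = (uint8_t)(cb >> SCALEBITS);
  ycc[2] = (uint8_t)(cr >> SCALEBITS);
}
```

The four Y coefficients sum to exactly 65536, so white (255, 255, 255) maps to Y = 255, and the Cb and Cr coefficient pairs each sum to 32768 in magnitude, so any grey input lands on 128 in both chroma channels.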
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Convert some rows of samples to the output colorspace.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_rgb_ycc_convert_sse2 (j_compress_ptr cinfo,
|
||||||
|
; JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
|
||||||
|
; JDIMENSION output_row, int num_rows);
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_compress_ptr cinfo
|
||||||
|
%define input_buf(b) (b)+12 ; JSAMPARRAY input_buf
|
||||||
|
%define output_buf(b) (b)+16 ; JSAMPIMAGE output_buf
|
||||||
|
%define output_row(b) (b)+20 ; JDIMENSION output_row
|
||||||
|
%define num_rows(b) (b)+24 ; int num_rows
|
||||||
|
|
||||||
|
%define original_ebp ebp+0
|
||||||
|
%define wk(i) ebp-(WK_NUM-(i))*SIZEOF_XMMWORD ; xmmword wk[WK_NUM]
|
||||||
|
%define WK_NUM 8
|
||||||
|
%define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_rgb_ycc_convert_sse2)
|
||||||
|
|
||||||
|
EXTN(jpeg_rgb_ycc_convert_sse2):
|
||||||
|
push ebp
|
||||||
|
mov eax,esp ; eax = original ebp
|
||||||
|
sub esp, byte 4
|
||||||
|
and esp, byte (-SIZEOF_XMMWORD) ; align to 128 bits
|
||||||
|
mov [esp],eax
|
||||||
|
mov ebp,esp ; ebp = aligned ebp
|
||||||
|
lea esp, [wk(0)]
|
||||||
|
pushpic eax ; make room for the GOT address
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
movpic POINTER [gotptr], ebx ; save GOT address
|
||||||
|
|
||||||
|
mov ecx, POINTER [cinfo(eax)]
|
||||||
|
mov ecx, JDIMENSION [jcstruct_image_width(ecx)] ; num_cols
|
||||||
|
test ecx,ecx
|
||||||
|
jz near .return
|
||||||
|
|
||||||
|
push ecx
|
||||||
|
|
||||||
|
mov esi, JSAMPIMAGE [output_buf(eax)]
|
||||||
|
mov ecx, JDIMENSION [output_row(eax)]
|
||||||
|
mov edi, JSAMPARRAY [esi+0*SIZEOF_JSAMPARRAY]
|
||||||
|
mov ebx, JSAMPARRAY [esi+1*SIZEOF_JSAMPARRAY]
|
||||||
|
mov edx, JSAMPARRAY [esi+2*SIZEOF_JSAMPARRAY]
|
||||||
|
lea edi, [edi+ecx*SIZEOF_JSAMPROW]
|
||||||
|
lea ebx, [ebx+ecx*SIZEOF_JSAMPROW]
|
||||||
|
lea edx, [edx+ecx*SIZEOF_JSAMPROW]
|
||||||
|
|
||||||
|
pop ecx
|
||||||
|
|
||||||
|
mov esi, JSAMPARRAY [input_buf(eax)]
|
||||||
|
mov eax, INT [num_rows(eax)]
|
||||||
|
test eax,eax
|
||||||
|
jle near .return
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
pushpic eax
|
||||||
|
push edx
|
||||||
|
push ebx
|
||||||
|
push edi
|
||||||
|
push esi
|
||||||
|
push ecx ; col
|
||||||
|
|
||||||
|
mov esi, JSAMPROW [esi] ; inptr
|
||||||
|
mov edi, JSAMPROW [edi] ; outptr0
|
||||||
|
mov ebx, JSAMPROW [ebx] ; outptr1
|
||||||
|
mov edx, JSAMPROW [edx] ; outptr2
|
||||||
|
movpic eax, POINTER [gotptr] ; load GOT address (eax)
|
||||||
|
|
||||||
|
cmp ecx, byte SIZEOF_XMMWORD
|
||||||
|
jae near .columnloop
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
%if RGB_PIXELSIZE == 3 ; ---------------
|
||||||
|
|
||||||
|
.column_ld1:
|
||||||
|
push eax
|
||||||
|
push edx
|
||||||
|
lea ecx,[ecx+ecx*2] ; imul ecx,RGB_PIXELSIZE
|
||||||
|
test cl, SIZEOF_BYTE
|
||||||
|
jz short .column_ld2
|
||||||
|
sub ecx, byte SIZEOF_BYTE
|
||||||
|
movzx eax, BYTE [esi+ecx]
|
||||||
|
.column_ld2:
|
||||||
|
test cl, SIZEOF_WORD
|
||||||
|
jz short .column_ld4
|
||||||
|
sub ecx, byte SIZEOF_WORD
|
||||||
|
movzx edx, WORD [esi+ecx]
|
||||||
|
shl eax, WORD_BIT
|
||||||
|
or eax,edx
|
||||||
|
.column_ld4:
|
||||||
|
movd xmmA,eax
|
||||||
|
pop edx
|
||||||
|
pop eax
|
||||||
|
test cl, SIZEOF_DWORD
|
||||||
|
jz short .column_ld8
|
||||||
|
sub ecx, byte SIZEOF_DWORD
|
||||||
|
movd xmmF, _DWORD [esi+ecx]
|
||||||
|
pslldq xmmA, SIZEOF_DWORD
|
||||||
|
por xmmA,xmmF
|
||||||
|
.column_ld8:
|
||||||
|
test cl, SIZEOF_MMWORD
|
||||||
|
jz short .column_ld16
|
||||||
|
sub ecx, byte SIZEOF_MMWORD
|
||||||
|
movq xmmB, _MMWORD [esi+ecx]
|
||||||
|
pslldq xmmA, SIZEOF_MMWORD
|
||||||
|
por xmmA,xmmB
|
||||||
|
.column_ld16:
|
||||||
|
test cl, SIZEOF_XMMWORD
|
||||||
|
jz short .column_ld32
|
||||||
|
movdqa xmmF,xmmA
|
||||||
|
movdqu xmmA, XMMWORD [esi+0*SIZEOF_XMMWORD]
|
||||||
|
mov ecx, SIZEOF_XMMWORD
|
||||||
|
jmp short .rgb_ycc_cnv
|
||||||
|
.column_ld32:
|
||||||
|
test cl, 2*SIZEOF_XMMWORD
|
||||||
|
mov ecx, SIZEOF_XMMWORD
|
||||||
|
jz short .rgb_ycc_cnv
|
||||||
|
movdqa xmmB,xmmA
|
||||||
|
movdqu xmmA, XMMWORD [esi+0*SIZEOF_XMMWORD]
|
||||||
|
movdqu xmmF, XMMWORD [esi+1*SIZEOF_XMMWORD]
|
||||||
|
jmp short .rgb_ycc_cnv
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.columnloop:
|
||||||
|
movdqu xmmA, XMMWORD [esi+0*SIZEOF_XMMWORD]
|
||||||
|
movdqu xmmF, XMMWORD [esi+1*SIZEOF_XMMWORD]
|
||||||
|
movdqu xmmB, XMMWORD [esi+2*SIZEOF_XMMWORD]
|
||||||
|
|
||||||
|
.rgb_ycc_cnv:
|
||||||
|
; xmmA=(00 10 20 01 11 21 02 12 22 03 13 23 04 14 24 05)
|
||||||
|
; xmmF=(15 25 06 16 26 07 17 27 08 18 28 09 19 29 0A 1A)
|
||||||
|
; xmmB=(2A 0B 1B 2B 0C 1C 2C 0D 1D 2D 0E 1E 2E 0F 1F 2F)
|
||||||
|
|
||||||
|
movdqa xmmG,xmmA
|
||||||
|
pslldq xmmA,8 ; xmmA=(-- -- -- -- -- -- -- -- 00 10 20 01 11 21 02 12)
|
||||||
|
psrldq xmmG,8 ; xmmG=(22 03 13 23 04 14 24 05 -- -- -- -- -- -- -- --)
|
||||||
|
|
||||||
|
punpckhbw xmmA,xmmF ; xmmA=(00 08 10 18 20 28 01 09 11 19 21 29 02 0A 12 1A)
|
||||||
|
pslldq xmmF,8 ; xmmF=(-- -- -- -- -- -- -- -- 15 25 06 16 26 07 17 27)
|
||||||
|
|
||||||
|
punpcklbw xmmG,xmmB ; xmmG=(22 2A 03 0B 13 1B 23 2B 04 0C 14 1C 24 2C 05 0D)
|
||||||
|
punpckhbw xmmF,xmmB ; xmmF=(15 1D 25 2D 06 0E 16 1E 26 2E 07 0F 17 1F 27 2F)
|
||||||
|
|
||||||
|
movdqa xmmD,xmmA
|
||||||
|
pslldq xmmA,8 ; xmmA=(-- -- -- -- -- -- -- -- 00 08 10 18 20 28 01 09)
|
||||||
|
psrldq xmmD,8 ; xmmD=(11 19 21 29 02 0A 12 1A -- -- -- -- -- -- -- --)
|
||||||
|
|
||||||
|
punpckhbw xmmA,xmmG ; xmmA=(00 04 08 0C 10 14 18 1C 20 24 28 2C 01 05 09 0D)
|
||||||
|
pslldq xmmG,8 ; xmmG=(-- -- -- -- -- -- -- -- 22 2A 03 0B 13 1B 23 2B)
|
||||||
|
|
||||||
|
punpcklbw xmmD,xmmF ; xmmD=(11 15 19 1D 21 25 29 2D 02 06 0A 0E 12 16 1A 1E)
|
||||||
|
punpckhbw xmmG,xmmF ; xmmG=(22 26 2A 2E 03 07 0B 0F 13 17 1B 1F 23 27 2B 2F)
|
||||||
|
|
||||||
|
movdqa xmmE,xmmA
|
||||||
|
pslldq xmmA,8 ; xmmA=(-- -- -- -- -- -- -- -- 00 04 08 0C 10 14 18 1C)
|
||||||
|
psrldq xmmE,8 ; xmmE=(20 24 28 2C 01 05 09 0D -- -- -- -- -- -- -- --)
|
||||||
|
|
||||||
|
punpckhbw xmmA,xmmD ; xmmA=(00 02 04 06 08 0A 0C 0E 10 12 14 16 18 1A 1C 1E)
|
||||||
|
pslldq xmmD,8 ; xmmD=(-- -- -- -- -- -- -- -- 11 15 19 1D 21 25 29 2D)
|
||||||
|
|
||||||
|
punpcklbw xmmE,xmmG ; xmmE=(20 22 24 26 28 2A 2C 2E 01 03 05 07 09 0B 0D 0F)
|
||||||
|
punpckhbw xmmD,xmmG ; xmmD=(11 13 15 17 19 1B 1D 1F 21 23 25 27 29 2B 2D 2F)
|
||||||
|
|
||||||
|
pxor xmmH,xmmH
|
||||||
|
|
||||||
|
movdqa xmmC,xmmA
|
||||||
|
punpcklbw xmmA,xmmH ; xmmA=(00 02 04 06 08 0A 0C 0E)
|
||||||
|
punpckhbw xmmC,xmmH ; xmmC=(10 12 14 16 18 1A 1C 1E)
|
||||||
|
|
||||||
|
movdqa xmmB,xmmE
|
||||||
|
punpcklbw xmmE,xmmH ; xmmE=(20 22 24 26 28 2A 2C 2E)
|
||||||
|
punpckhbw xmmB,xmmH ; xmmB=(01 03 05 07 09 0B 0D 0F)
|
||||||
|
|
||||||
|
movdqa xmmF,xmmD
|
||||||
|
punpcklbw xmmD,xmmH ; xmmD=(11 13 15 17 19 1B 1D 1F)
|
||||||
|
punpckhbw xmmF,xmmH ; xmmF=(21 23 25 27 29 2B 2D 2F)
|
||||||
|
|
||||||
|
%else ; RGB_PIXELSIZE == 4 ; -----------
|
||||||
|
|
||||||
|
.column_ld1:
|
||||||
|
test cl, SIZEOF_XMMWORD/16
|
||||||
|
jz short .column_ld2
|
||||||
|
sub ecx, byte SIZEOF_XMMWORD/16
|
||||||
|
movd xmmA, _DWORD [esi+ecx*RGB_PIXELSIZE]
|
||||||
|
.column_ld2:
|
||||||
|
test cl, SIZEOF_XMMWORD/8
|
||||||
|
jz short .column_ld4
|
||||||
|
sub ecx, byte SIZEOF_XMMWORD/8
|
||||||
|
movq xmmE, _MMWORD [esi+ecx*RGB_PIXELSIZE]
|
||||||
|
pslldq xmmA, SIZEOF_MMWORD
|
||||||
|
por xmmA,xmmE
|
||||||
|
.column_ld4:
|
||||||
|
test cl, SIZEOF_XMMWORD/4
|
||||||
|
jz short .column_ld8
|
||||||
|
sub ecx, byte SIZEOF_XMMWORD/4
|
||||||
|
movdqa xmmE,xmmA
|
||||||
|
movdqu xmmA, XMMWORD [esi+ecx*RGB_PIXELSIZE]
|
||||||
|
.column_ld8:
|
||||||
|
test cl, SIZEOF_XMMWORD/2
|
||||||
|
mov ecx, SIZEOF_XMMWORD
|
||||||
|
jz short .rgb_ycc_cnv
|
||||||
|
movdqa xmmF,xmmA
|
||||||
|
movdqa xmmH,xmmE
|
||||||
|
movdqu xmmA, XMMWORD [esi+0*SIZEOF_XMMWORD]
|
||||||
|
movdqu xmmE, XMMWORD [esi+1*SIZEOF_XMMWORD]
|
||||||
|
jmp short .rgb_ycc_cnv
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.columnloop:
|
||||||
|
movdqu xmmA, XMMWORD [esi+0*SIZEOF_XMMWORD]
|
||||||
|
movdqu xmmE, XMMWORD [esi+1*SIZEOF_XMMWORD]
|
||||||
|
movdqu xmmF, XMMWORD [esi+2*SIZEOF_XMMWORD]
|
||||||
|
movdqu xmmH, XMMWORD [esi+3*SIZEOF_XMMWORD]
|
||||||
|
|
||||||
|
.rgb_ycc_cnv:
|
||||||
|
; xmmA=(00 10 20 30 01 11 21 31 02 12 22 32 03 13 23 33)
|
||||||
|
; xmmE=(04 14 24 34 05 15 25 35 06 16 26 36 07 17 27 37)
|
||||||
|
; xmmF=(08 18 28 38 09 19 29 39 0A 1A 2A 3A 0B 1B 2B 3B)
|
||||||
|
; xmmH=(0C 1C 2C 3C 0D 1D 2D 3D 0E 1E 2E 3E 0F 1F 2F 3F)
|
||||||
|
|
||||||
|
movdqa xmmD,xmmA
|
||||||
|
punpcklbw xmmA,xmmE ; xmmA=(00 04 10 14 20 24 30 34 01 05 11 15 21 25 31 35)
|
||||||
|
punpckhbw xmmD,xmmE ; xmmD=(02 06 12 16 22 26 32 36 03 07 13 17 23 27 33 37)
|
||||||
|
|
||||||
|
movdqa xmmC,xmmF
|
||||||
|
punpcklbw xmmF,xmmH ; xmmF=(08 0C 18 1C 28 2C 38 3C 09 0D 19 1D 29 2D 39 3D)
|
||||||
|
punpckhbw xmmC,xmmH ; xmmC=(0A 0E 1A 1E 2A 2E 3A 3E 0B 0F 1B 1F 2B 2F 3B 3F)
|
||||||
|
|
||||||
|
movdqa xmmB,xmmA
|
||||||
|
punpcklwd xmmA,xmmF ; xmmA=(00 04 08 0C 10 14 18 1C 20 24 28 2C 30 34 38 3C)
|
||||||
|
punpckhwd xmmB,xmmF ; xmmB=(01 05 09 0D 11 15 19 1D 21 25 29 2D 31 35 39 3D)
|
||||||
|
|
||||||
|
movdqa xmmG,xmmD
|
||||||
|
punpcklwd xmmD,xmmC ; xmmD=(02 06 0A 0E 12 16 1A 1E 22 26 2A 2E 32 36 3A 3E)
|
||||||
|
punpckhwd xmmG,xmmC ; xmmG=(03 07 0B 0F 13 17 1B 1F 23 27 2B 2F 33 37 3B 3F)
|
||||||
|
|
||||||
|
movdqa xmmE,xmmA
|
||||||
|
punpcklbw xmmA,xmmD ; xmmA=(00 02 04 06 08 0A 0C 0E 10 12 14 16 18 1A 1C 1E)
|
||||||
|
punpckhbw xmmE,xmmD ; xmmE=(20 22 24 26 28 2A 2C 2E 30 32 34 36 38 3A 3C 3E)
|
||||||
|
|
||||||
|
movdqa xmmH,xmmB
|
||||||
|
punpcklbw xmmB,xmmG ; xmmB=(01 03 05 07 09 0B 0D 0F 11 13 15 17 19 1B 1D 1F)
|
||||||
|
punpckhbw xmmH,xmmG ; xmmH=(21 23 25 27 29 2B 2D 2F 31 33 35 37 39 3B 3D 3F)
|
||||||
|
|
||||||
|
pxor xmmF,xmmF
|
||||||
|
|
||||||
|
movdqa xmmC,xmmA
|
||||||
|
punpcklbw xmmA,xmmF ; xmmA=(00 02 04 06 08 0A 0C 0E)
|
||||||
|
punpckhbw xmmC,xmmF ; xmmC=(10 12 14 16 18 1A 1C 1E)
|
||||||
|
|
||||||
|
movdqa xmmD,xmmB
|
||||||
|
punpcklbw xmmB,xmmF ; xmmB=(01 03 05 07 09 0B 0D 0F)
|
||||||
|
punpckhbw xmmD,xmmF ; xmmD=(11 13 15 17 19 1B 1D 1F)
|
||||||
|
|
||||||
|
movdqa xmmG,xmmE
|
||||||
|
punpcklbw xmmE,xmmF ; xmmE=(20 22 24 26 28 2A 2C 2E)
|
||||||
|
punpckhbw xmmG,xmmF ; xmmG=(30 32 34 36 38 3A 3C 3E)
|
||||||
|
|
||||||
|
punpcklbw xmmF,xmmH
|
||||||
|
punpckhbw xmmH,xmmH
|
||||||
|
psrlw xmmF,BYTE_BIT ; xmmF=(21 23 25 27 29 2B 2D 2F)
|
||||||
|
psrlw xmmH,BYTE_BIT ; xmmH=(31 33 35 37 39 3B 3D 3F)
|
||||||
|
|
||||||
|
%endif ; RGB_PIXELSIZE ; ---------------
|
||||||
|
|
||||||
|
; xmm0=R(02468ACE)=RE, xmm2=G(02468ACE)=GE, xmm4=B(02468ACE)=BE
|
||||||
|
; xmm1=R(13579BDF)=RO, xmm3=G(13579BDF)=GO, xmm5=B(13579BDF)=BO
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; Y = 0.29900 * R + 0.58700 * G + 0.11400 * B
|
||||||
|
; Cb = -0.16874 * R - 0.33126 * G + 0.50000 * B + CENTERJSAMPLE
|
||||||
|
; Cr = 0.50000 * R - 0.41869 * G - 0.08131 * B + CENTERJSAMPLE
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; Y = 0.29900 * R + 0.33700 * G + 0.11400 * B + 0.25000 * G
|
||||||
|
; Cb = -0.16874 * R - 0.33126 * G + 0.50000 * B + CENTERJSAMPLE
|
||||||
|
; Cr = 0.50000 * R - 0.41869 * G - 0.08131 * B + CENTERJSAMPLE
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(0)], xmm0 ; wk(0)=RE
|
||||||
|
movdqa XMMWORD [wk(1)], xmm1 ; wk(1)=RO
|
||||||
|
movdqa XMMWORD [wk(2)], xmm4 ; wk(2)=BE
|
||||||
|
movdqa XMMWORD [wk(3)], xmm5 ; wk(3)=BO
|
||||||
|
|
||||||
|
movdqa xmm6,xmm1
|
||||||
|
punpcklwd xmm1,xmm3
|
||||||
|
punpckhwd xmm6,xmm3
|
||||||
|
movdqa xmm7,xmm1
|
||||||
|
movdqa xmm4,xmm6
|
||||||
|
pmaddwd xmm1,[GOTOFF(eax,PW_F0299_F0337)] ; xmm1=ROL*FIX(0.299)+GOL*FIX(0.337)
|
||||||
|
pmaddwd xmm6,[GOTOFF(eax,PW_F0299_F0337)] ; xmm6=ROH*FIX(0.299)+GOH*FIX(0.337)
|
||||||
|
pmaddwd xmm7,[GOTOFF(eax,PW_MF016_MF033)] ; xmm7=ROL*-FIX(0.168)+GOL*-FIX(0.331)
|
||||||
|
pmaddwd xmm4,[GOTOFF(eax,PW_MF016_MF033)] ; xmm4=ROH*-FIX(0.168)+GOH*-FIX(0.331)
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(4)], xmm1 ; wk(4)=ROL*FIX(0.299)+GOL*FIX(0.337)
|
||||||
|
movdqa XMMWORD [wk(5)], xmm6 ; wk(5)=ROH*FIX(0.299)+GOH*FIX(0.337)
|
||||||
|
|
||||||
|
pxor xmm1,xmm1
|
||||||
|
pxor xmm6,xmm6
|
||||||
|
punpcklwd xmm1,xmm5 ; xmm1=BOL
|
||||||
|
punpckhwd xmm6,xmm5 ; xmm6=BOH
|
||||||
|
psrld xmm1,1 ; xmm1=BOL*FIX(0.500)
|
||||||
|
psrld xmm6,1 ; xmm6=BOH*FIX(0.500)
|
||||||
|
|
||||||
|
movdqa xmm5,[GOTOFF(eax,PD_ONEHALFM1_CJ)] ; xmm5=[PD_ONEHALFM1_CJ]
|
||||||
|
|
||||||
|
paddd xmm7,xmm1
|
||||||
|
paddd xmm4,xmm6
|
||||||
|
paddd xmm7,xmm5
|
||||||
|
paddd xmm4,xmm5
|
||||||
|
psrld xmm7,SCALEBITS ; xmm7=CbOL
|
||||||
|
psrld xmm4,SCALEBITS ; xmm4=CbOH
|
||||||
|
packssdw xmm7,xmm4 ; xmm7=CbO
|
||||||
|
|
||||||
|
movdqa xmm1, XMMWORD [wk(2)] ; xmm1=BE
|
||||||
|
|
||||||
|
movdqa xmm6,xmm0
|
||||||
|
punpcklwd xmm0,xmm2
|
||||||
|
punpckhwd xmm6,xmm2
|
||||||
|
movdqa xmm5,xmm0
|
||||||
|
movdqa xmm4,xmm6
|
||||||
|
pmaddwd xmm0,[GOTOFF(eax,PW_F0299_F0337)] ; xmm0=REL*FIX(0.299)+GEL*FIX(0.337)
|
||||||
|
pmaddwd xmm6,[GOTOFF(eax,PW_F0299_F0337)] ; xmm6=REH*FIX(0.299)+GEH*FIX(0.337)
|
||||||
|
pmaddwd xmm5,[GOTOFF(eax,PW_MF016_MF033)] ; xmm5=REL*-FIX(0.168)+GEL*-FIX(0.331)
|
||||||
|
pmaddwd xmm4,[GOTOFF(eax,PW_MF016_MF033)] ; xmm4=REH*-FIX(0.168)+GEH*-FIX(0.331)
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(6)], xmm0 ; wk(6)=REL*FIX(0.299)+GEL*FIX(0.337)
|
||||||
|
movdqa XMMWORD [wk(7)], xmm6 ; wk(7)=REH*FIX(0.299)+GEH*FIX(0.337)
|
||||||
|
|
||||||
|
pxor xmm0,xmm0
|
||||||
|
pxor xmm6,xmm6
|
||||||
|
punpcklwd xmm0,xmm1 ; xmm0=BEL
|
||||||
|
punpckhwd xmm6,xmm1 ; xmm6=BEH
|
||||||
|
psrld xmm0,1 ; xmm0=BEL*FIX(0.500)
|
||||||
|
psrld xmm6,1 ; xmm6=BEH*FIX(0.500)
|
||||||
|
|
||||||
|
movdqa xmm1,[GOTOFF(eax,PD_ONEHALFM1_CJ)] ; xmm1=[PD_ONEHALFM1_CJ]
|
||||||
|
|
||||||
|
paddd xmm5,xmm0
|
||||||
|
paddd xmm4,xmm6
|
||||||
|
paddd xmm5,xmm1
|
||||||
|
paddd xmm4,xmm1
|
||||||
|
psrld xmm5,SCALEBITS ; xmm5=CbEL
|
||||||
|
psrld xmm4,SCALEBITS ; xmm4=CbEH
|
||||||
|
packssdw xmm5,xmm4 ; xmm5=CbE
|
||||||
|
|
||||||
|
psllw xmm7,BYTE_BIT
|
||||||
|
por xmm5,xmm7 ; xmm5=Cb
|
||||||
|
movdqa XMMWORD [ebx], xmm5 ; Save Cb
|
||||||
|
|
||||||
|
movdqa xmm0, XMMWORD [wk(3)] ; xmm0=BO
|
||||||
|
movdqa xmm6, XMMWORD [wk(2)] ; xmm6=BE
|
||||||
|
movdqa xmm1, XMMWORD [wk(1)] ; xmm1=RO
|
||||||
|
|
||||||
|
movdqa xmm4,xmm0
|
||||||
|
punpcklwd xmm0,xmm3
|
||||||
|
punpckhwd xmm4,xmm3
|
||||||
|
movdqa xmm7,xmm0
|
||||||
|
movdqa xmm5,xmm4
|
||||||
|
pmaddwd xmm0,[GOTOFF(eax,PW_F0114_F0250)] ; xmm0=BOL*FIX(0.114)+GOL*FIX(0.250)
|
||||||
|
pmaddwd xmm4,[GOTOFF(eax,PW_F0114_F0250)] ; xmm4=BOH*FIX(0.114)+GOH*FIX(0.250)
|
||||||
|
pmaddwd xmm7,[GOTOFF(eax,PW_MF008_MF041)] ; xmm7=BOL*-FIX(0.081)+GOL*-FIX(0.418)
|
||||||
|
pmaddwd xmm5,[GOTOFF(eax,PW_MF008_MF041)] ; xmm5=BOH*-FIX(0.081)+GOH*-FIX(0.418)
|
||||||
|
|
||||||
|
movdqa xmm3,[GOTOFF(eax,PD_ONEHALF)] ; xmm3=[PD_ONEHALF]
|
||||||
|
|
||||||
|
paddd xmm0, XMMWORD [wk(4)]
|
||||||
|
paddd xmm4, XMMWORD [wk(5)]
|
||||||
|
paddd xmm0,xmm3
|
||||||
|
paddd xmm4,xmm3
|
||||||
|
psrld xmm0,SCALEBITS ; xmm0=YOL
|
||||||
|
psrld xmm4,SCALEBITS ; xmm4=YOH
|
||||||
|
packssdw xmm0,xmm4 ; xmm0=YO
|
||||||
|
|
||||||
|
pxor xmm3,xmm3
|
||||||
|
pxor xmm4,xmm4
|
||||||
|
punpcklwd xmm3,xmm1 ; xmm3=ROL
|
||||||
|
punpckhwd xmm4,xmm1 ; xmm4=ROH
|
||||||
|
psrld xmm3,1 ; xmm3=ROL*FIX(0.500)
|
||||||
|
psrld xmm4,1 ; xmm4=ROH*FIX(0.500)
|
||||||
|
|
||||||
|
movdqa xmm1,[GOTOFF(eax,PD_ONEHALFM1_CJ)] ; xmm1=[PD_ONEHALFM1_CJ]
|
||||||
|
|
||||||
|
paddd xmm7,xmm3
|
||||||
|
paddd xmm5,xmm4
|
||||||
|
paddd xmm7,xmm1
|
||||||
|
paddd xmm5,xmm1
|
||||||
|
psrld xmm7,SCALEBITS ; xmm7=CrOL
|
||||||
|
psrld xmm5,SCALEBITS ; xmm5=CrOH
|
||||||
|
packssdw xmm7,xmm5 ; xmm7=CrO
|
||||||
|
|
||||||
|
movdqa xmm3, XMMWORD [wk(0)] ; xmm3=RE
|
||||||
|
|
||||||
|
movdqa xmm4,xmm6
|
||||||
|
punpcklwd xmm6,xmm2
|
||||||
|
punpckhwd xmm4,xmm2
|
||||||
|
movdqa xmm1,xmm6
|
||||||
|
movdqa xmm5,xmm4
|
||||||
|
pmaddwd xmm6,[GOTOFF(eax,PW_F0114_F0250)] ; xmm6=BEL*FIX(0.114)+GEL*FIX(0.250)
|
||||||
|
pmaddwd xmm4,[GOTOFF(eax,PW_F0114_F0250)] ; xmm4=BEH*FIX(0.114)+GEH*FIX(0.250)
|
||||||
|
pmaddwd xmm1,[GOTOFF(eax,PW_MF008_MF041)] ; xmm1=BEL*-FIX(0.081)+GEL*-FIX(0.418)
|
||||||
|
pmaddwd xmm5,[GOTOFF(eax,PW_MF008_MF041)] ; xmm5=BEH*-FIX(0.081)+GEH*-FIX(0.418)
|
||||||
|
|
||||||
|
movdqa xmm2,[GOTOFF(eax,PD_ONEHALF)] ; xmm2=[PD_ONEHALF]
|
||||||
|
|
||||||
|
paddd xmm6, XMMWORD [wk(6)]
|
||||||
|
paddd xmm4, XMMWORD [wk(7)]
|
||||||
|
paddd xmm6,xmm2
|
||||||
|
paddd xmm4,xmm2
|
||||||
|
psrld xmm6,SCALEBITS ; xmm6=YEL
|
||||||
|
psrld xmm4,SCALEBITS ; xmm4=YEH
|
||||||
|
packssdw xmm6,xmm4 ; xmm6=YE
|
||||||
|
|
||||||
|
psllw xmm0,BYTE_BIT
|
||||||
|
por xmm6,xmm0 ; xmm6=Y
|
||||||
|
movdqa XMMWORD [edi], xmm6 ; Save Y
|
||||||
|
|
||||||
|
pxor xmm2,xmm2
|
||||||
|
pxor xmm4,xmm4
|
||||||
|
punpcklwd xmm2,xmm3 ; xmm2=REL
|
||||||
|
punpckhwd xmm4,xmm3 ; xmm4=REH
|
||||||
|
psrld xmm2,1 ; xmm2=REL*FIX(0.500)
|
||||||
|
psrld xmm4,1 ; xmm4=REH*FIX(0.500)
|
||||||
|
|
||||||
|
movdqa xmm0,[GOTOFF(eax,PD_ONEHALFM1_CJ)] ; xmm0=[PD_ONEHALFM1_CJ]
|
||||||
|
|
||||||
|
paddd xmm1,xmm2
|
||||||
|
paddd xmm5,xmm4
|
||||||
|
paddd xmm1,xmm0
|
||||||
|
paddd xmm5,xmm0
|
||||||
|
psrld xmm1,SCALEBITS ; xmm1=CrEL
|
||||||
|
psrld xmm5,SCALEBITS ; xmm5=CrEH
|
||||||
|
packssdw xmm1,xmm5 ; xmm1=CrE
|
||||||
|
|
||||||
|
psllw xmm7,BYTE_BIT
|
||||||
|
por xmm1,xmm7 ; xmm1=Cr
|
||||||
|
movdqa XMMWORD [edx], xmm1 ; Save Cr
|
||||||
|
|
||||||
|
sub ecx, byte SIZEOF_XMMWORD
|
||||||
|
add esi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; inptr
|
||||||
|
add edi, byte SIZEOF_XMMWORD ; outptr0
|
||||||
|
add ebx, byte SIZEOF_XMMWORD ; outptr1
|
||||||
|
add edx, byte SIZEOF_XMMWORD ; outptr2
|
||||||
|
cmp ecx, byte SIZEOF_XMMWORD
|
||||||
|
jae near .columnloop
|
||||||
|
test ecx,ecx
|
||||||
|
jnz near .column_ld1
|
||||||
|
|
||||||
|
pop ecx ; col
|
||||||
|
pop esi
|
||||||
|
pop edi
|
||||||
|
pop ebx
|
||||||
|
pop edx
|
||||||
|
poppic eax
|
||||||
|
|
||||||
|
add esi, byte SIZEOF_JSAMPROW ; input_buf
|
||||||
|
add edi, byte SIZEOF_JSAMPROW
|
||||||
|
add ebx, byte SIZEOF_JSAMPROW
|
||||||
|
add edx, byte SIZEOF_JSAMPROW
|
||||||
|
dec eax ; num_rows
|
||||||
|
jg near .rowloop
|
||||||
|
|
||||||
|
.return:
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
mov esp,ebp ; esp <- aligned ebp
|
||||||
|
pop esp ; esp <- original ebp
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; JCCOLOR_RGBYCC_SSE2_SUPPORTED
|
||||||
|
%endif ; RGB_PIXELSIZE == 3 || RGB_PIXELSIZE == 4
|
||||||
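Note on the colour-conversion hunk above: the comment block lists the RGB to YCbCr equations and splits the green coefficient of Y into 0.337 + 0.250. A likely reason (an assumption, the diff does not state it) is that FIX(0.587) at 16 fractional bits is 38470, which does not fit the signed 16-bit operands that pmaddwd multiplies. The scalar C sketch below mirrors that arithmetic for a single pixel. SCALEBITS = 16, CENTERJSAMPLE = 128, the FIX macro, the reading of PD_ONEHALFM1_CJ as "one half minus one plus the scaled centre value", and the name rgb_to_ycc_pixel are all assumptions made for illustration; only the coefficients are taken from the diff.

```c
#include <stdint.h>

/* Assumed values; the diff shows only the names SCALEBITS and CENTERJSAMPLE. */
#define SCALEBITS      16
#define CENTERJSAMPLE  128
#define ONE_HALF       ((int32_t)1 << (SCALEBITS - 1))
#define FIX(x)         ((int32_t)((x) * (1L << SCALEBITS) + 0.5))

/* Hypothetical helper: convert one RGB pixel using the split-green Y formula. */
static void rgb_to_ycc_pixel(uint8_t r, uint8_t g, uint8_t b,
                             uint8_t *y, uint8_t *cb, uint8_t *cr)
{
  /* Rounding term plus the pre-scaled chroma centre offset
   * (assumed meaning of PD_ONEHALFM1_CJ). */
  const int32_t chroma_bias =
      (ONE_HALF - 1) + ((int32_t)CENTERJSAMPLE << SCALEBITS);

  /* Y: the 0.587 green weight is applied as 0.337 + 0.250. */
  int32_t yy = FIX(0.29900) * r + FIX(0.33700) * g
             + FIX(0.11400) * b + FIX(0.25000) * g + ONE_HALF;

  /* Cb and Cr: the 0.5 weight is a shift, matching "psrld reg,1". */
  int32_t cbv = -FIX(0.16874) * r - FIX(0.33126) * g
              + ((int32_t)b << (SCALEBITS - 1)) + chroma_bias;
  int32_t crv = ((int32_t)r << (SCALEBITS - 1))
              - FIX(0.41869) * g - FIX(0.08131) * b + chroma_bias;

  *y  = (uint8_t)(yy  >> SCALEBITS);
  *cb = (uint8_t)(cbv >> SCALEBITS);
  *cr = (uint8_t)(crv >> SCALEBITS);
}
```

With these constants the four Y weights sum to exactly 65536, so white maps to Y = 255, and any grey input gives Cb = Cr = 128.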
366
jcdctmgr.c
@@ -5,6 +5,13 @@
|
|||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
* x86 SIMD extension for IJG JPEG library
|
||||||
|
* Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
* This file has been modified for SIMD extension.
|
||||||
|
* Last Modified : December 24, 2005
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
*
|
||||||
* This file contains the forward-DCT management logic.
|
* This file contains the forward-DCT management logic.
|
||||||
* This code selects a particular DCT implementation to be used,
|
* This code selects a particular DCT implementation to be used,
|
||||||
* and it performs related housekeeping chores including coefficient
|
* and it performs related housekeeping chores including coefficient
|
||||||
@@ -24,6 +31,8 @@ typedef struct {
|
|||||||
|
|
||||||
/* Pointer to the DCT routine actually in use */
|
/* Pointer to the DCT routine actually in use */
|
||||||
forward_DCT_method_ptr do_dct;
|
forward_DCT_method_ptr do_dct;
|
||||||
|
convsamp_int_method_ptr convsamp;
|
||||||
|
quantize_int_method_ptr quantize;
|
||||||
|
|
||||||
/* The actual post-DCT divisors --- not identical to the quant table
|
/* The actual post-DCT divisors --- not identical to the quant table
|
||||||
* entries, because of scaling (especially for an unnormalized DCT).
|
* entries, because of scaling (especially for an unnormalized DCT).
|
||||||
@@ -34,12 +43,75 @@ typedef struct {
|
|||||||
#ifdef DCT_FLOAT_SUPPORTED
|
#ifdef DCT_FLOAT_SUPPORTED
|
||||||
/* Same as above for the floating-point case. */
|
/* Same as above for the floating-point case. */
|
||||||
float_DCT_method_ptr do_float_dct;
|
float_DCT_method_ptr do_float_dct;
|
||||||
|
convsamp_float_method_ptr float_convsamp;
|
||||||
|
quantize_float_method_ptr float_quantize;
|
||||||
FAST_FLOAT * float_divisors[NUM_QUANT_TBLS];
|
FAST_FLOAT * float_divisors[NUM_QUANT_TBLS];
|
||||||
#endif
|
#endif
|
||||||
} my_fdct_controller;
|
} my_fdct_controller;
|
||||||
|
|
||||||
typedef my_fdct_controller * my_fdct_ptr;
|
typedef my_fdct_controller * my_fdct_ptr;
|
||||||
|
|
||||||
|
/*
|
||||||
|
* SIMD Ext: Most of SSE/SSE2 instructions require that the memory address
|
||||||
|
* is aligned to a 16-byte boundary; if not, a general-protection exception
|
||||||
|
* (#GP) is generated.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#define ALIGN_SIZE 16 /* sizeof SSE/SSE2 register */
|
||||||
|
#define ALIGN_MEM(p,a) ((void *) (((size_t) (p) + (a) - 1) & -(a)))
|
||||||
|
|
||||||
|
#ifdef JFDCT_INT_QUANTIZE_WITH_DIVISION
|
||||||
|
#undef jpeg_quantize_int
|
||||||
|
#undef jpeg_quantize_int_mmx
|
||||||
|
#undef jpeg_quantize_int_sse2
|
||||||
|
#define jpeg_quantize_int jpeg_quantize_idiv
|
||||||
|
#define jpeg_quantize_int_mmx jpeg_quantize_idiv
|
||||||
|
#define jpeg_quantize_int_sse2 jpeg_quantize_idiv
|
||||||
|
#endif
|
||||||
|
|
||||||
|
|
||||||
|
#ifndef JFDCT_INT_QUANTIZE_WITH_DIVISION
|
||||||
|
|
||||||
|
/*
|
||||||
|
* SIMD Ext: compute the reciprocal of the divisor
|
||||||
|
*
|
||||||
|
* This implementation is based on an algorithm described in
|
||||||
|
* "How to optimize for the Pentium family of microprocessors"
|
||||||
|
* (http://www.agner.org/assem/).
|
||||||
|
*/
|
||||||
|
|
||||||
|
LOCAL(void)
|
||||||
|
compute_reciprocal (DCTELEM divisor, DCTELEM * dtbl)
|
||||||
|
{
|
||||||
|
unsigned long d = ((unsigned long) divisor) & 0x0000FFFF;
|
||||||
|
unsigned long fq, fr;
|
||||||
|
int b, r, c;
|
||||||
|
|
||||||
|
for (b = 0; (1UL << b) <= d; b++) ;
|
||||||
|
|
||||||
|
r = 16 + (--b);
|
||||||
|
fq = (1UL << r) / d;
|
||||||
|
fr = (1UL << r) % d;
|
||||||
|
r -= 16;
|
||||||
|
c = 0;
|
||||||
|
|
||||||
|
if (fr == 0) {
|
||||||
|
fq >>= 1;
|
||||||
|
r--;
|
||||||
|
} else if (fr <= (d / 2)) {
|
||||||
|
c++;
|
||||||
|
} else {
|
||||||
|
fq++;
|
||||||
|
}
|
||||||
|
|
||||||
|
dtbl[DCTSIZE2 * 0] = (DCTELEM) fq; /* reciprocal */
|
||||||
|
dtbl[DCTSIZE2 * 1] = (DCTELEM) (c + (d / 2)); /* correction + roundfactor */
|
||||||
|
dtbl[DCTSIZE2 * 2] = (DCTELEM) (1 << (16 - (r + 1 + 1))); /* scale */
|
||||||
|
dtbl[DCTSIZE2 * 3] = (DCTELEM) (r + 1); /* shift */
|
||||||
|
}
|
||||||
|
|
||||||
|
#endif /* JFDCT_INT_QUANTIZE_WITH_DIVISION */
|
||||||
|
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Initialize for a processing pass.
|
* Initialize for a processing pass.
|
||||||
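Note on compute_reciprocal in the hunk above: it precomputes, for each divisor, a 16-bit scaled reciprocal and a correction term so that quantization can use a multiply and a shift instead of a division. The sketch below reproduces the reciprocal and correction steps from the diff and checks them against ordinary rounded division. The struct recip_t and the functions make_recip and div_round are hypothetical names. The scale and shift entries that the diff also stores are consumed by the SIMD quantizers, which are not shown in this hunk, so the sketch keeps one total shift count instead of guessing how those two entries are used.

```c
#include <stdint.h>

/* Reciprocal data for one divisor (hypothetical layout). */
typedef struct {
  uint16_t recip;  /* scaled reciprocal, dtbl[DCTSIZE2 * 0] in the diff */
  uint16_t corr;   /* correction + d/2,  dtbl[DCTSIZE2 * 1] in the diff */
  int      shift;  /* total right shift applied to the 32-bit product   */
} recip_t;

/* Same steps as compute_reciprocal in the diff; d must be >= 1. */
static recip_t make_recip(uint16_t d)
{
  recip_t t;
  uint32_t fq, fr;
  int b, r, c = 0;

  for (b = 0; (1UL << b) <= d; b++) ;  /* b = number of significant bits */
  r = 16 + (--b);                      /* b is now floor(log2(d))        */
  fq = ((uint32_t)1 << r) / d;
  fr = ((uint32_t)1 << r) % d;

  if (fr == 0) {                       /* d is a power of two            */
    fq >>= 1;
    r--;
  } else if (fr <= (uint32_t)(d / 2)) {
    c++;                               /* reciprocal was rounded down    */
  } else {
    fq++;                              /* round the reciprocal up        */
  }

  t.recip = (uint16_t)fq;
  t.corr  = (uint16_t)(c + d / 2);
  t.shift = r;
  return t;
}

/* Rounded x / d without a divide; requires x + t.corr <= 65535. */
static uint16_t div_round(uint16_t x, recip_t t)
{
  return (uint16_t)(((uint32_t)(x + t.corr) * t.recip) >> t.shift);
}
```

For example, d = 3 yields recip = 43691, corr = 1 and shift = 17, so div_round(4, t) computes (5 * 43691) >> 17 = 1, the same as (4 + 1) / 3.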
@@ -75,6 +147,18 @@ start_pass_fdctmgr (j_compress_ptr cinfo)
|
|||||||
/* For LL&M IDCT method, divisors are equal to raw quantization
|
/* For LL&M IDCT method, divisors are equal to raw quantization
|
||||||
* coefficients multiplied by 8 (to counteract scaling).
|
* coefficients multiplied by 8 (to counteract scaling).
|
||||||
*/
|
*/
|
||||||
|
#ifndef JFDCT_INT_QUANTIZE_WITH_DIVISION
|
||||||
|
if (fdct->divisors[qtblno] == NULL) {
|
||||||
|
fdct->divisors[qtblno] = (DCTELEM *)
|
||||||
|
(*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,
|
||||||
|
(DCTSIZE2 * 4) * SIZEOF(DCTELEM));
|
||||||
|
}
|
||||||
|
dtbl = fdct->divisors[qtblno];
|
||||||
|
for (i = 0; i < DCTSIZE2; i++) {
|
||||||
|
compute_reciprocal ((DCTELEM) (qtbl->quantval[i] << 3), &dtbl[i]);
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
#else /* JFDCT_INT_QUANTIZE_WITH_DIVISION */
|
||||||
if (fdct->divisors[qtblno] == NULL) {
|
if (fdct->divisors[qtblno] == NULL) {
|
||||||
fdct->divisors[qtblno] = (DCTELEM *)
|
fdct->divisors[qtblno] = (DCTELEM *)
|
||||||
(*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,
|
(*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,
|
||||||
@@ -85,7 +169,8 @@ start_pass_fdctmgr (j_compress_ptr cinfo)
|
|||||||
dtbl[i] = ((DCTELEM) qtbl->quantval[i]) << 3;
|
dtbl[i] = ((DCTELEM) qtbl->quantval[i]) << 3;
|
||||||
}
|
}
|
||||||
break;
|
break;
|
||||||
#endif
|
#endif /* JFDCT_INT_QUANTIZE_WITH_DIVISION */
|
||||||
|
#endif /* DCT_ISLOW_SUPPORTED */
|
||||||
#ifdef DCT_IFAST_SUPPORTED
|
#ifdef DCT_IFAST_SUPPORTED
|
||||||
case JDCT_IFAST:
|
case JDCT_IFAST:
|
||||||
{
|
{
|
||||||
@@ -109,6 +194,21 @@ start_pass_fdctmgr (j_compress_ptr cinfo)
|
|||||||
};
|
};
|
||||||
SHIFT_TEMPS
|
SHIFT_TEMPS
|
||||||
|
|
||||||
|
#ifndef JFDCT_INT_QUANTIZE_WITH_DIVISION
|
||||||
|
if (fdct->divisors[qtblno] == NULL) {
|
||||||
|
fdct->divisors[qtblno] = (DCTELEM *)
|
||||||
|
(*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,
|
||||||
|
(DCTSIZE2 * 4) * SIZEOF(DCTELEM));
|
||||||
|
}
|
||||||
|
dtbl = fdct->divisors[qtblno];
|
||||||
|
for (i = 0; i < DCTSIZE2; i++) {
|
||||||
|
compute_reciprocal ((DCTELEM)
|
||||||
|
DESCALE(MULTIPLY16V16((INT32) qtbl->quantval[i],
|
||||||
|
(INT32) aanscales[i]),
|
||||||
|
CONST_BITS-3),
|
||||||
|
&dtbl[i]);
|
||||||
|
}
|
||||||
|
#else /* JFDCT_INT_QUANTIZE_WITH_DIVISION */
|
||||||
if (fdct->divisors[qtblno] == NULL) {
|
if (fdct->divisors[qtblno] == NULL) {
|
||||||
fdct->divisors[qtblno] = (DCTELEM *)
|
fdct->divisors[qtblno] = (DCTELEM *)
|
||||||
(*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,
|
(*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,
|
||||||
@@ -121,9 +221,10 @@ start_pass_fdctmgr (j_compress_ptr cinfo)
|
|||||||
(INT32) aanscales[i]),
|
(INT32) aanscales[i]),
|
||||||
CONST_BITS-3);
|
CONST_BITS-3);
|
||||||
}
|
}
|
||||||
|
#endif /* JFDCT_INT_QUANTIZE_WITH_DIVISION */
|
||||||
}
|
}
|
||||||
break;
|
break;
|
||||||
#endif
|
#endif /* DCT_IFAST_SUPPORTED */
|
||||||
#ifdef DCT_FLOAT_SUPPORTED
|
#ifdef DCT_FLOAT_SUPPORTED
|
||||||
case JDCT_FLOAT:
|
case JDCT_FLOAT:
|
||||||
{
|
{
|
||||||
@@ -183,83 +284,23 @@ forward_DCT (j_compress_ptr cinfo, jpeg_component_info * compptr,
|
|||||||
JDIMENSION num_blocks)
|
JDIMENSION num_blocks)
|
||||||
/* This version is used for integer DCT implementations. */
|
/* This version is used for integer DCT implementations. */
|
||||||
{
|
{
|
||||||
/* This routine is heavily used, so it's worth coding it tightly. */
|
|
||||||
my_fdct_ptr fdct = (my_fdct_ptr) cinfo->fdct;
|
my_fdct_ptr fdct = (my_fdct_ptr) cinfo->fdct;
|
||||||
forward_DCT_method_ptr do_dct = fdct->do_dct;
|
|
||||||
DCTELEM * divisors = fdct->divisors[compptr->quant_tbl_no];
|
DCTELEM * divisors = fdct->divisors[compptr->quant_tbl_no];
|
||||||
DCTELEM workspace[DCTSIZE2]; /* work area for FDCT subroutine */
|
DCTELEM workspace[DCTSIZE2 + ALIGN_SIZE/sizeof(DCTELEM)];
|
||||||
|
DCTELEM * wkptr = (DCTELEM *) ALIGN_MEM(workspace, ALIGN_SIZE);
|
||||||
JDIMENSION bi;
|
JDIMENSION bi;
|
||||||
|
|
||||||
sample_data += start_row; /* fold in the vertical offset once */
|
sample_data += start_row; /* fold in the vertical offset once */
|
||||||
|
|
||||||
for (bi = 0; bi < num_blocks; bi++, start_col += DCTSIZE) {
|
for (bi = 0; bi < num_blocks; bi++, start_col += DCTSIZE) {
|
||||||
/* Load data into workspace, applying unsigned->signed conversion */
|
/* Load data into workspace, applying unsigned->signed conversion */
|
||||||
{ register DCTELEM *workspaceptr;
|
(*fdct->convsamp) (sample_data, start_col, wkptr);
|
||||||
register JSAMPROW elemptr;
|
|
||||||
register int elemr;
|
|
||||||
|
|
||||||
workspaceptr = workspace;
|
|
||||||
for (elemr = 0; elemr < DCTSIZE; elemr++) {
|
|
||||||
elemptr = sample_data[elemr] + start_col;
|
|
||||||
#if DCTSIZE == 8 /* unroll the inner loop */
|
|
||||||
*workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
|
|
||||||
*workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
|
|
||||||
*workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
|
|
||||||
*workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
|
|
||||||
*workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
|
|
||||||
*workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
|
|
||||||
*workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
|
|
||||||
*workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
|
|
||||||
#else
|
|
||||||
{ register int elemc;
|
|
||||||
for (elemc = DCTSIZE; elemc > 0; elemc--) {
|
|
||||||
*workspaceptr++ = GETJSAMPLE(*elemptr++) - CENTERJSAMPLE;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
#endif
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
/* Perform the DCT */
|
/* Perform the DCT */
|
||||||
(*do_dct) (workspace);
|
(*fdct->do_dct) (wkptr);
|
||||||
|
|
||||||
/* Quantize/descale the coefficients, and store into coef_blocks[] */
|
/* Quantize/descale the coefficients, and store into coef_blocks[] */
|
||||||
{ register DCTELEM temp, qval;
|
(*fdct->quantize) (coef_blocks[bi], divisors, wkptr);
|
||||||
register int i;
|
|
||||||
register JCOEFPTR output_ptr = coef_blocks[bi];
|
|
||||||
|
|
||||||
for (i = 0; i < DCTSIZE2; i++) {
|
|
||||||
qval = divisors[i];
|
|
||||||
temp = workspace[i];
|
|
||||||
/* Divide the coefficient value by qval, ensuring proper rounding.
|
|
||||||
* Since C does not specify the direction of rounding for negative
|
|
||||||
* quotients, we have to force the dividend positive for portability.
|
|
||||||
*
|
|
||||||
* In most files, at least half of the output values will be zero
|
|
||||||
* (at default quantization settings, more like three-quarters...)
|
|
||||||
* so we should ensure that this case is fast. On many machines,
|
|
||||||
* a comparison is enough cheaper than a divide to make a special test
|
|
||||||
* a win. Since both inputs will be nonnegative, we need only test
|
|
||||||
* for a < b to discover whether a/b is 0.
|
|
||||||
* If your machine's division is fast enough, define FAST_DIVIDE.
|
|
||||||
*/
|
|
||||||
#ifdef FAST_DIVIDE
|
|
||||||
#define DIVIDE_BY(a,b) a /= b
|
|
||||||
#else
|
|
||||||
#define DIVIDE_BY(a,b) if (a >= b) a /= b; else a = 0
|
|
||||||
#endif
|
|
||||||
if (temp < 0) {
|
|
||||||
temp = -temp;
|
|
||||||
temp += qval>>1; /* for rounding */
|
|
||||||
DIVIDE_BY(temp, qval);
|
|
||||||
temp = -temp;
|
|
||||||
} else {
|
|
||||||
temp += qval>>1; /* for rounding */
|
|
||||||
DIVIDE_BY(temp, qval);
|
|
||||||
}
|
|
||||||
output_ptr[i] = (JCOEF) temp;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
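Note on the workspace change in the hunk above: forward_DCT now over-allocates its stack workspace by ALIGN_SIZE bytes and rounds the start address up to a 16-byte boundary with ALIGN_MEM before handing it to the SIMD routines. The sketch below shows the same round-up technique in isolation. The function name align_up is hypothetical, and it uses uintptr_t rather than the size_t cast in the diff; the requested alignment must be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

#define ALIGN_SIZE 16  /* same value as in the diff */

/* Round p up to the next multiple of a (a must be a power of two).
 * Adding a - 1 and masking off the low bits never moves the pointer
 * by more than a - 1 bytes, which is why a buffer padded by a bytes
 * always has room for the aligned block. */
static void *align_up(void *p, size_t a)
{
  uintptr_t addr = (uintptr_t)p;
  return (void *)((addr + (a - 1)) & ~(uintptr_t)(a - 1));
}
```

A caller would declare a buffer of the needed size plus ALIGN_SIZE bytes, pass its address to align_up, and use only the returned pointer, as the diff does with wkptr.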
@@ -273,64 +314,23 @@ forward_DCT_float (j_compress_ptr cinfo, jpeg_component_info * compptr,
|
|||||||
JDIMENSION num_blocks)
|
JDIMENSION num_blocks)
|
||||||
/* This version is used for floating-point DCT implementations. */
|
/* This version is used for floating-point DCT implementations. */
|
||||||
{
|
{
|
||||||
/* This routine is heavily used, so it's worth coding it tightly. */
|
|
||||||
my_fdct_ptr fdct = (my_fdct_ptr) cinfo->fdct;
|
my_fdct_ptr fdct = (my_fdct_ptr) cinfo->fdct;
|
||||||
float_DCT_method_ptr do_dct = fdct->do_float_dct;
|
|
||||||
FAST_FLOAT * divisors = fdct->float_divisors[compptr->quant_tbl_no];
|
FAST_FLOAT * divisors = fdct->float_divisors[compptr->quant_tbl_no];
|
||||||
FAST_FLOAT workspace[DCTSIZE2]; /* work area for FDCT subroutine */
|
FAST_FLOAT workspace[DCTSIZE2 + ALIGN_SIZE/sizeof(FAST_FLOAT)];
|
||||||
|
FAST_FLOAT * wkptr = (FAST_FLOAT *) ALIGN_MEM(workspace, ALIGN_SIZE);
|
||||||
JDIMENSION bi;
|
JDIMENSION bi;
|
||||||
|
|
||||||
sample_data += start_row; /* fold in the vertical offset once */
|
sample_data += start_row; /* fold in the vertical offset once */
|
||||||
|
|
||||||
for (bi = 0; bi < num_blocks; bi++, start_col += DCTSIZE) {
|
for (bi = 0; bi < num_blocks; bi++, start_col += DCTSIZE) {
|
||||||
/* Load data into workspace, applying unsigned->signed conversion */
|
/* Load data into workspace, applying unsigned->signed conversion */
|
||||||
{ register FAST_FLOAT *workspaceptr;
|
(*fdct->float_convsamp) (sample_data, start_col, wkptr);
|
||||||
register JSAMPROW elemptr;
|
|
||||||
register int elemr;
|
|
||||||
|
|
||||||
workspaceptr = workspace;
|
|
||||||
for (elemr = 0; elemr < DCTSIZE; elemr++) {
|
|
||||||
elemptr = sample_data[elemr] + start_col;
|
|
||||||
#if DCTSIZE == 8 /* unroll the inner loop */
|
|
||||||
*workspaceptr++ = (FAST_FLOAT)(GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
|
|
||||||
*workspaceptr++ = (FAST_FLOAT)(GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
|
|
||||||
*workspaceptr++ = (FAST_FLOAT)(GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
|
|
||||||
*workspaceptr++ = (FAST_FLOAT)(GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
|
|
||||||
*workspaceptr++ = (FAST_FLOAT)(GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
|
|
||||||
*workspaceptr++ = (FAST_FLOAT)(GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
|
|
||||||
*workspaceptr++ = (FAST_FLOAT)(GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
|
|
||||||
*workspaceptr++ = (FAST_FLOAT)(GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
|
|
||||||
#else
|
|
||||||
{ register int elemc;
|
|
||||||
for (elemc = DCTSIZE; elemc > 0; elemc--) {
|
|
||||||
*workspaceptr++ = (FAST_FLOAT)
|
|
||||||
(GETJSAMPLE(*elemptr++) - CENTERJSAMPLE);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
#endif
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
/* Perform the DCT */
|
/* Perform the DCT */
|
||||||
(*do_dct) (workspace);
|
(*fdct->do_float_dct) (wkptr);
|
||||||
|
|
||||||
/* Quantize/descale the coefficients, and store into coef_blocks[] */
|
/* Quantize/descale the coefficients, and store into coef_blocks[] */
|
||||||
{ register FAST_FLOAT temp;
|
(*fdct->float_quantize) (coef_blocks[bi], divisors, wkptr);
|
||||||
register int i;
|
|
||||||
register JCOEFPTR output_ptr = coef_blocks[bi];
|
|
||||||
|
|
||||||
for (i = 0; i < DCTSIZE2; i++) {
|
|
||||||
/* Apply the quantization and scaling factor */
|
|
||||||
temp = workspace[i] * divisors[i];
|
|
||||||
/* Round to nearest integer.
|
|
||||||
* Since C does not specify the direction of rounding for negative
|
|
||||||
* quotients, we have to force the dividend positive for portability.
|
|
||||||
* The maximum coefficient size is +-16K (for 12-bit data), so this
|
|
||||||
* code should work for either 16-bit or 32-bit ints.
|
|
||||||
*/
|
|
||||||
output_ptr[i] = (JCOEF) ((int) (temp + (FAST_FLOAT) 16384.5) - 16384);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -346,6 +346,7 @@ jinit_forward_dct (j_compress_ptr cinfo)
|
|||||||
{
|
{
|
||||||
my_fdct_ptr fdct;
|
my_fdct_ptr fdct;
|
||||||
int i;
|
int i;
|
||||||
|
unsigned int simd = jpeg_simd_support((j_common_ptr) cinfo);
|
||||||
|
|
||||||
fdct = (my_fdct_ptr)
|
fdct = (my_fdct_ptr)
|
||||||
(*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,
|
(*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,
|
||||||
@@ -357,21 +358,86 @@ jinit_forward_dct (j_compress_ptr cinfo)
|
|||||||
#ifdef DCT_ISLOW_SUPPORTED
|
#ifdef DCT_ISLOW_SUPPORTED
|
||||||
case JDCT_ISLOW:
|
case JDCT_ISLOW:
|
||||||
fdct->pub.forward_DCT = forward_DCT;
|
fdct->pub.forward_DCT = forward_DCT;
|
||||||
fdct->do_dct = jpeg_fdct_islow;
|
#ifdef JFDCT_INT_SSE2_SUPPORTED
|
||||||
break;
|
if (simd & JSIMD_SSE2 &&
|
||||||
|
IS_CONST_ALIGNED_16(jconst_fdct_islow_sse2)) {
|
||||||
|
fdct->do_dct = jpeg_fdct_islow_sse2;
|
||||||
|
fdct->convsamp = jpeg_convsamp_int_sse2;
|
||||||
|
fdct->quantize = jpeg_quantize_int_sse2;
|
||||||
|
} else
|
||||||
#endif
|
#endif
|
||||||
|
#ifdef JFDCT_INT_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_MMX) {
|
||||||
|
fdct->do_dct = jpeg_fdct_islow_mmx;
|
||||||
|
fdct->convsamp = jpeg_convsamp_int_mmx;
|
||||||
|
fdct->quantize = jpeg_quantize_int_mmx;
|
||||||
|
} else
|
||||||
|
#endif
|
||||||
|
{
|
||||||
|
fdct->do_dct = jpeg_fdct_islow;
|
||||||
|
fdct->convsamp = jpeg_convsamp_int;
|
||||||
|
fdct->quantize = jpeg_quantize_int;
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
#endif /* DCT_ISLOW_SUPPORTED */
|
||||||
#ifdef DCT_IFAST_SUPPORTED
|
#ifdef DCT_IFAST_SUPPORTED
|
||||||
case JDCT_IFAST:
|
case JDCT_IFAST:
|
||||||
fdct->pub.forward_DCT = forward_DCT;
|
fdct->pub.forward_DCT = forward_DCT;
|
||||||
fdct->do_dct = jpeg_fdct_ifast;
|
#ifdef JFDCT_INT_SSE2_SUPPORTED
|
||||||
break;
|
if (simd & JSIMD_SSE2 &&
|
||||||
|
IS_CONST_ALIGNED_16(jconst_fdct_ifast_sse2)) {
|
||||||
|
fdct->do_dct = jpeg_fdct_ifast_sse2;
|
||||||
|
fdct->convsamp = jpeg_convsamp_int_sse2;
|
||||||
|
fdct->quantize = jpeg_quantize_int_sse2;
|
||||||
|
} else
|
||||||
#endif
|
#endif
|
||||||
|
#ifdef JFDCT_INT_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_MMX) {
|
||||||
|
fdct->do_dct = jpeg_fdct_ifast_mmx;
|
||||||
|
fdct->convsamp = jpeg_convsamp_int_mmx;
|
||||||
|
fdct->quantize = jpeg_quantize_int_mmx;
|
||||||
|
} else
|
||||||
|
#endif
|
||||||
|
{
|
||||||
|
fdct->do_dct = jpeg_fdct_ifast;
|
||||||
|
fdct->convsamp = jpeg_convsamp_int;
|
||||||
|
fdct->quantize = jpeg_quantize_int;
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
#endif /* DCT_IFAST_SUPPORTED */
|
||||||
#ifdef DCT_FLOAT_SUPPORTED
|
#ifdef DCT_FLOAT_SUPPORTED
|
||||||
case JDCT_FLOAT:
|
case JDCT_FLOAT:
|
||||||
fdct->pub.forward_DCT = forward_DCT_float;
|
fdct->pub.forward_DCT = forward_DCT_float;
|
||||||
fdct->do_float_dct = jpeg_fdct_float;
|
#ifdef JFDCT_FLT_SSE_SSE2_SUPPORTED
|
||||||
break;
|
if (simd & JSIMD_SSE && simd & JSIMD_SSE2 &&
|
||||||
|
IS_CONST_ALIGNED_16(jconst_fdct_float_sse)) {
|
||||||
|
fdct->do_float_dct = jpeg_fdct_float_sse;
|
||||||
|
fdct->float_convsamp = jpeg_convsamp_flt_sse2;
|
||||||
|
fdct->float_quantize = jpeg_quantize_flt_sse2;
|
||||||
|
} else
|
||||||
#endif
|
#endif
|
||||||
|
#ifdef JFDCT_FLT_SSE_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE &&
|
||||||
|
IS_CONST_ALIGNED_16(jconst_fdct_float_sse)) {
|
||||||
|
fdct->do_float_dct = jpeg_fdct_float_sse;
|
||||||
|
fdct->float_convsamp = jpeg_convsamp_flt_sse;
|
||||||
|
fdct->float_quantize = jpeg_quantize_flt_sse;
|
||||||
|
} else
|
||||||
|
#endif
|
||||||
|
#ifdef JFDCT_FLT_3DNOW_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_3DNOW) {
|
||||||
|
fdct->do_float_dct = jpeg_fdct_float_3dnow;
|
||||||
|
fdct->float_convsamp = jpeg_convsamp_flt_3dnow;
|
||||||
|
fdct->float_quantize = jpeg_quantize_flt_3dnow;
|
||||||
|
} else
|
||||||
|
#endif
|
||||||
|
{
|
||||||
|
fdct->do_float_dct = jpeg_fdct_float;
|
||||||
|
fdct->float_convsamp = jpeg_convsamp_float;
|
||||||
|
fdct->float_quantize = jpeg_quantize_float;
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
#endif /* DCT_FLOAT_SUPPORTED */
|
||||||
default:
|
default:
|
||||||
ERREXIT(cinfo, JERR_NOT_COMPILED);
|
ERREXIT(cinfo, JERR_NOT_COMPILED);
|
||||||
break;
|
break;
|
||||||
@@ -385,3 +451,65 @@ jinit_forward_dct (j_compress_ptr cinfo)
|
|||||||
#endif
|
#endif
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
#ifndef JSIMD_MODEINFO_NOT_SUPPORTED
|
||||||
|
|
||||||
|
GLOBAL(unsigned int)
|
||||||
|
jpeg_simd_forward_dct (j_compress_ptr cinfo, int method)
|
||||||
|
{
|
||||||
|
unsigned int simd = jpeg_simd_support((j_common_ptr) cinfo);
|
||||||
|
|
||||||
|
switch (method) {
|
||||||
|
#ifdef DCT_ISLOW_SUPPORTED
|
||||||
|
case JDCT_ISLOW:
|
||||||
|
#ifdef JFDCT_INT_SSE2_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE2 &&
|
||||||
|
IS_CONST_ALIGNED_16(jconst_fdct_islow_sse2))
|
||||||
|
return JSIMD_SSE2;
|
||||||
|
#endif
|
||||||
|
#ifdef JFDCT_INT_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_MMX)
|
||||||
|
return JSIMD_MMX;
|
||||||
|
#endif
|
||||||
|
return JSIMD_NONE;
|
||||||
|
#endif /* DCT_ISLOW_SUPPORTED */
|
||||||
|
#ifdef DCT_IFAST_SUPPORTED
|
||||||
|
case JDCT_IFAST:
|
||||||
|
#ifdef JFDCT_INT_SSE2_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE2 &&
|
||||||
|
IS_CONST_ALIGNED_16(jconst_fdct_ifast_sse2))
|
||||||
|
return JSIMD_SSE2;
|
||||||
|
#endif
|
||||||
|
#ifdef JFDCT_INT_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_MMX)
|
||||||
|
return JSIMD_MMX;
|
||||||
|
#endif
|
||||||
|
return JSIMD_NONE;
|
||||||
|
#endif /* DCT_IFAST_SUPPORTED */
|
||||||
|
#ifdef DCT_FLOAT_SUPPORTED
|
||||||
|
case JDCT_FLOAT:
|
||||||
|
#ifdef JFDCT_FLT_SSE_SSE2_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE && simd & JSIMD_SSE2 &&
|
||||||
|
IS_CONST_ALIGNED_16(jconst_fdct_float_sse))
|
||||||
|
return JSIMD_SSE; /* (JSIMD_SSE | JSIMD_SSE2); */
|
||||||
|
#endif
|
||||||
|
#ifdef JFDCT_FLT_SSE_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE &&
|
||||||
|
IS_CONST_ALIGNED_16(jconst_fdct_float_sse))
|
||||||
|
return JSIMD_SSE; /* (JSIMD_SSE | JSIMD_MMX); */
|
||||||
|
#endif
|
||||||
|
#ifdef JFDCT_FLT_3DNOW_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_3DNOW)
|
||||||
|
return JSIMD_3DNOW; /* (JSIMD_3DNOW | JSIMD_MMX); */
|
||||||
|
#endif
|
||||||
|
return JSIMD_NONE;
|
||||||
|
#endif /* DCT_FLOAT_SUPPORTED */
|
||||||
|
default:
|
||||||
|
;
|
||||||
|
}
|
||||||
|
|
||||||
|
return JSIMD_NONE; /* not compiled */
|
||||||
|
}
|
||||||
|
|
||||||
|
#endif /* !JSIMD_MODEINFO_NOT_SUPPORTED */
|
||||||
|
|||||||
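Note on the dispatch logic in jinit_forward_dct and jpeg_simd_forward_dct above: each DCT method tries the most capable instruction set first (SSE2, and only if its constant table is 16-byte aligned), then falls back to MMX, then to the plain C routine. The sketch below isolates that selection chain. The capability bits CAP_MMX and CAP_SSE2, the stub routines, and select_dct are hypothetical stand-ins; the real JSIMD_* values and routine signatures are not shown in this diff.

```c
/* Hypothetical capability bits standing in for the JSIMD_* flags. */
enum { CAP_NONE = 0, CAP_MMX = 1 << 0, CAP_SSE2 = 1 << 1 };

/* Stub routines that only report which implementation was chosen. */
typedef const char *(*dct_fn)(void);
static const char *dct_c(void)    { return "c"; }
static const char *dct_mmx(void)  { return "mmx"; }
static const char *dct_sse2(void) { return "sse2"; }

/* Pick the best available routine, most capable first.
 * consts_aligned models the IS_CONST_ALIGNED_16 check in the diff:
 * the SSE2 path is skipped when its constant table is misaligned. */
static dct_fn select_dct(unsigned caps, int consts_aligned)
{
  if ((caps & CAP_SSE2) && consts_aligned)
    return dct_sse2;
  if (caps & CAP_MMX)
    return dct_mmx;
  return dct_c;
}
```

Resolving the choice once at initialisation and storing function pointers keeps the per-block loop free of feature tests, which appears to be the intent of the new convsamp and quantize members.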
139
jchuff.c
@@ -1,7 +1,7 @@
|
|||||||
/*
|
/*
|
||||||
* jchuff.c
|
* jchuff.c
|
||||||
*
|
*
|
||||||
* Copyright (C) 1991-1996, Thomas G. Lane.
|
* Copyright (C) 1991-1997, Thomas G. Lane.
|
||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
@@ -125,16 +125,14 @@ start_pass_huff (j_compress_ptr cinfo, boolean gather_statistics)
|
|||||||
compptr = cinfo->cur_comp_info[ci];
|
compptr = cinfo->cur_comp_info[ci];
|
||||||
dctbl = compptr->dc_tbl_no;
|
dctbl = compptr->dc_tbl_no;
|
||||||
actbl = compptr->ac_tbl_no;
|
actbl = compptr->ac_tbl_no;
|
||||||
/* Make sure requested tables are present */
|
|
||||||
/* (In gather mode, tables need not be allocated yet) */
|
|
||||||
if (dctbl < 0 || dctbl >= NUM_HUFF_TBLS ||
|
|
||||||
(cinfo->dc_huff_tbl_ptrs[dctbl] == NULL && !gather_statistics))
|
|
||||||
ERREXIT1(cinfo, JERR_NO_HUFF_TABLE, dctbl);
|
|
||||||
if (actbl < 0 || actbl >= NUM_HUFF_TBLS ||
|
|
||||||
(cinfo->ac_huff_tbl_ptrs[actbl] == NULL && !gather_statistics))
|
|
||||||
ERREXIT1(cinfo, JERR_NO_HUFF_TABLE, actbl);
|
|
||||||
if (gather_statistics) {
|
if (gather_statistics) {
|
||||||
#ifdef ENTROPY_OPT_SUPPORTED
|
#ifdef ENTROPY_OPT_SUPPORTED
|
||||||
|
/* Check for invalid table indexes */
|
||||||
|
/* (make_c_derived_tbl does this in the other path) */
|
||||||
|
if (dctbl < 0 || dctbl >= NUM_HUFF_TBLS)
|
||||||
|
ERREXIT1(cinfo, JERR_NO_HUFF_TABLE, dctbl);
|
||||||
|
if (actbl < 0 || actbl >= NUM_HUFF_TBLS)
|
||||||
|
ERREXIT1(cinfo, JERR_NO_HUFF_TABLE, actbl);
|
||||||
/* Allocate and zero the statistics tables */
|
/* Allocate and zero the statistics tables */
|
||||||
/* Note that jpeg_gen_optimal_table expects 257 entries in each table! */
|
/* Note that jpeg_gen_optimal_table expects 257 entries in each table! */
|
||||||
if (entropy->dc_count_ptrs[dctbl] == NULL)
|
if (entropy->dc_count_ptrs[dctbl] == NULL)
|
||||||
@@ -151,9 +149,9 @@ start_pass_huff (j_compress_ptr cinfo, boolean gather_statistics)
|
|||||||
} else {
|
} else {
|
||||||
/* Compute derived values for Huffman tables */
|
/* Compute derived values for Huffman tables */
|
||||||
/* We may do this more than once for a table, but it's not expensive */
|
/* We may do this more than once for a table, but it's not expensive */
|
||||||
jpeg_make_c_derived_tbl(cinfo, cinfo->dc_huff_tbl_ptrs[dctbl],
|
jpeg_make_c_derived_tbl(cinfo, TRUE, dctbl,
|
||||||
& entropy->dc_derived_tbls[dctbl]);
|
& entropy->dc_derived_tbls[dctbl]);
|
||||||
jpeg_make_c_derived_tbl(cinfo, cinfo->ac_huff_tbl_ptrs[actbl],
|
jpeg_make_c_derived_tbl(cinfo, FALSE, actbl,
|
||||||
& entropy->ac_derived_tbls[actbl]);
|
& entropy->ac_derived_tbls[actbl]);
|
||||||
}
|
}
|
||||||
/* Initialize DC predictions to 0 */
|
/* Initialize DC predictions to 0 */
|
||||||
@@ -172,19 +170,34 @@ start_pass_huff (j_compress_ptr cinfo, boolean gather_statistics)
|
|||||||
|
|
||||||
/*
|
/*
|
||||||
* Compute the derived values for a Huffman table.
|
* Compute the derived values for a Huffman table.
|
||||||
|
* This routine also performs some validation checks on the table.
|
||||||
|
*
|
||||||
* Note this is also used by jcphuff.c.
|
* Note this is also used by jcphuff.c.
|
||||||
*/
|
*/
|
||||||
|
|
||||||
GLOBAL(void)
|
GLOBAL(void)
|
||||||
jpeg_make_c_derived_tbl (j_compress_ptr cinfo, JHUFF_TBL * htbl,
|
jpeg_make_c_derived_tbl (j_compress_ptr cinfo, boolean isDC, int tblno,
|
||||||
c_derived_tbl ** pdtbl)
|
c_derived_tbl ** pdtbl)
|
||||||
{
|
{
|
||||||
|
JHUFF_TBL *htbl;
|
||||||
c_derived_tbl *dtbl;
|
c_derived_tbl *dtbl;
|
||||||
int p, i, l, lastp, si;
|
int p, i, l, lastp, si, maxsymbol;
|
||||||
char huffsize[257];
|
char huffsize[257];
|
||||||
unsigned int huffcode[257];
|
unsigned int huffcode[257];
|
||||||
unsigned int code;
|
unsigned int code;
|
||||||
|
|
||||||
|
/* Note that huffsize[] and huffcode[] are filled in code-length order,
|
||||||
|
* paralleling the order of the symbols themselves in htbl->huffval[].
|
||||||
|
*/
|
||||||
|
|
||||||
|
/* Find the input Huffman table */
|
||||||
|
if (tblno < 0 || tblno >= NUM_HUFF_TBLS)
|
||||||
|
ERREXIT1(cinfo, JERR_NO_HUFF_TABLE, tblno);
|
||||||
|
htbl =
|
||||||
|
isDC ? cinfo->dc_huff_tbl_ptrs[tblno] : cinfo->ac_huff_tbl_ptrs[tblno];
|
||||||
|
if (htbl == NULL)
|
||||||
|
ERREXIT1(cinfo, JERR_NO_HUFF_TABLE, tblno);
|
||||||
|
|
||||||
/* Allocate a workspace if we haven't already done so. */
|
/* Allocate a workspace if we haven't already done so. */
|
||||||
if (*pdtbl == NULL)
|
if (*pdtbl == NULL)
|
||||||
*pdtbl = (c_derived_tbl *)
|
*pdtbl = (c_derived_tbl *)
|
||||||
@@ -193,18 +206,20 @@ jpeg_make_c_derived_tbl (j_compress_ptr cinfo, JHUFF_TBL * htbl,
|
|||||||
dtbl = *pdtbl;
|
dtbl = *pdtbl;
|
||||||
|
|
||||||
/* Figure C.1: make table of Huffman code length for each symbol */
|
/* Figure C.1: make table of Huffman code length for each symbol */
|
||||||
/* Note that this is in code-length order. */
|
|
||||||
|
|
||||||
p = 0;
|
p = 0;
|
||||||
for (l = 1; l <= 16; l++) {
|
for (l = 1; l <= 16; l++) {
|
||||||
for (i = 1; i <= (int) htbl->bits[l]; i++)
|
i = (int) htbl->bits[l];
|
||||||
|
if (i < 0 || p + i > 256) /* protect against table overrun */
|
||||||
|
ERREXIT(cinfo, JERR_BAD_HUFF_TABLE);
|
||||||
|
while (i--)
|
||||||
huffsize[p++] = (char) l;
|
huffsize[p++] = (char) l;
|
||||||
}
|
}
|
||||||
huffsize[p] = 0;
|
huffsize[p] = 0;
|
||||||
lastp = p;
|
lastp = p;
|
||||||
|
|
||||||
/* Figure C.2: generate the codes themselves */
|
/* Figure C.2: generate the codes themselves */
|
||||||
/* Note that this is in code-length order. */
|
/* We also validate that the counts represent a legal Huffman code tree. */
|
||||||
|
|
||||||
code = 0;
|
code = 0;
|
||||||
si = huffsize[0];
|
si = huffsize[0];
|
||||||
@@ -214,6 +229,11 @@ jpeg_make_c_derived_tbl (j_compress_ptr cinfo, JHUFF_TBL * htbl,
|
|||||||
huffcode[p++] = code;
|
huffcode[p++] = code;
|
||||||
code++;
|
code++;
|
||||||
}
|
}
|
||||||
|
/* code is now 1 more than the last code used for codelength si; but
|
||||||
|
* it must still fit in si bits, since no code is allowed to be all ones.
|
||||||
|
*/
|
||||||
|
if (((INT32) code) >= (((INT32) 1) << si))
|
||||||
|
ERREXIT(cinfo, JERR_BAD_HUFF_TABLE);
|
||||||
code <<= 1;
|
code <<= 1;
|
||||||
si++;
|
si++;
|
||||||
}
|
}
|
||||||
@@ -221,14 +241,25 @@ jpeg_make_c_derived_tbl (j_compress_ptr cinfo, JHUFF_TBL * htbl,
|
|||||||
/* Figure C.3: generate encoding tables */
|
/* Figure C.3: generate encoding tables */
|
||||||
/* These are code and size indexed by symbol value */
|
/* These are code and size indexed by symbol value */
|
||||||
|
|
||||||
/* Set any codeless symbols to have code length 0;
|
/* Set all codeless symbols to have code length 0;
|
||||||
* this allows emit_bits to detect any attempt to emit such symbols.
|
* this lets us detect duplicate VAL entries here, and later
|
||||||
|
* allows emit_bits to detect any attempt to emit such symbols.
|
||||||
*/
|
*/
|
||||||
MEMZERO(dtbl->ehufsi, SIZEOF(dtbl->ehufsi));
|
MEMZERO(dtbl->ehufsi, SIZEOF(dtbl->ehufsi));
|
||||||
|
|
||||||
|
/* This is also a convenient place to check for out-of-range
|
||||||
|
* and duplicated VAL entries. We allow 0..255 for AC symbols
|
||||||
|
* but only 0..15 for DC. (We could constrain them further
|
||||||
|
* based on data depth and mode, but this seems enough.)
|
||||||
|
*/
|
||||||
|
maxsymbol = isDC ? 15 : 255;
|
||||||
|
|
||||||
for (p = 0; p < lastp; p++) {
|
for (p = 0; p < lastp; p++) {
|
||||||
dtbl->ehufco[htbl->huffval[p]] = huffcode[p];
|
i = htbl->huffval[p];
|
||||||
dtbl->ehufsi[htbl->huffval[p]] = huffsize[p];
|
if (i < 0 || i > maxsymbol || dtbl->ehufsi[i])
|
||||||
|
ERREXIT(cinfo, JERR_BAD_HUFF_TABLE);
|
||||||
|
dtbl->ehufco[i] = huffcode[p];
|
||||||
|
dtbl->ehufsi[i] = huffsize[p];
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -343,6 +374,11 @@ encode_one_block (working_state * state, JCOEFPTR block, int last_dc_val,
|
|||||||
nbits++;
|
nbits++;
|
||||||
temp >>= 1;
|
temp >>= 1;
|
||||||
}
|
}
|
||||||
|
/* Check for out-of-range coefficient values.
|
||||||
|
* Since we're encoding a difference, the range limit is twice as much.
|
||||||
|
*/
|
||||||
|
if (nbits > MAX_COEF_BITS+1)
|
||||||
|
ERREXIT(state->cinfo, JERR_BAD_DCT_COEF);
|
||||||
|
|
||||||
/* Emit the Huffman-coded symbol for the number of bits */
|
/* Emit the Huffman-coded symbol for the number of bits */
|
||||||
if (! emit_bits(state, dctbl->ehufco[nbits], dctbl->ehufsi[nbits]))
|
if (! emit_bits(state, dctbl->ehufco[nbits], dctbl->ehufsi[nbits]))
|
||||||
@@ -380,6 +416,9 @@ encode_one_block (working_state * state, JCOEFPTR block, int last_dc_val,
|
|||||||
nbits = 1; /* there must be at least one 1 bit */
|
nbits = 1; /* there must be at least one 1 bit */
|
||||||
while ((temp >>= 1))
|
while ((temp >>= 1))
|
||||||
nbits++;
|
nbits++;
|
||||||
|
/* Check for out-of-range coefficient values */
|
||||||
|
if (nbits > MAX_COEF_BITS)
|
||||||
|
ERREXIT(state->cinfo, JERR_BAD_DCT_COEF);
|
||||||
|
|
||||||
/* Emit Huffman symbol for run length / number of bits */
|
/* Emit Huffman symbol for run length / number of bits */
|
||||||
i = (r << 4) + nbits;
|
i = (r << 4) + nbits;
|
||||||
@@ -516,19 +555,12 @@ finish_pass_huff (j_compress_ptr cinfo)
|
|||||||
/*
|
/*
|
||||||
* Huffman coding optimization.
|
* Huffman coding optimization.
|
||||||
*
|
*
|
||||||
* This actually is optimization, in the sense that we find the best possible
|
* We first scan the supplied data and count the number of uses of each symbol
|
||||||
* Huffman table(s) for the given data. We first scan the supplied data and
|
* that is to be Huffman-coded. (This process MUST agree with the code above.)
|
||||||
* count the number of uses of each symbol that is to be Huffman-coded.
|
* Then we build a Huffman coding tree for the observed counts.
|
||||||
* (This process must agree with the code above.) Then we build an
|
* Symbols which are not needed at all for the particular image are not
|
||||||
* optimal Huffman coding tree for the observed counts.
|
* assigned any code, which saves space in the DHT marker as well as in
|
||||||
*
|
* the compressed data.
|
||||||
* The JPEG standard requires Huffman codes to be no more than 16 bits long.
|
|
||||||
* If some symbols have a very small but nonzero probability, the Huffman tree
|
|
||||||
* must be adjusted to meet the code length restriction. We currently use
|
|
||||||
* the adjustment method suggested in the JPEG spec. This method is *not*
|
|
||||||
* optimal; it may not choose the best possible limited-length code. But
|
|
||||||
* since the symbols involved are infrequently used, it's not clear that
|
|
||||||
* going to extra trouble is worthwhile.
|
|
||||||
*/
|
*/
|
||||||
|
|
||||||
#ifdef ENTROPY_OPT_SUPPORTED
|
#ifdef ENTROPY_OPT_SUPPORTED
|
||||||
@@ -537,7 +569,7 @@ finish_pass_huff (j_compress_ptr cinfo)
|
|||||||
/* Process a single block's worth of coefficients */
|
/* Process a single block's worth of coefficients */
|
||||||
|
|
||||||
LOCAL(void)
|
LOCAL(void)
|
||||||
htest_one_block (JCOEFPTR block, int last_dc_val,
|
htest_one_block (j_compress_ptr cinfo, JCOEFPTR block, int last_dc_val,
|
||||||
long dc_counts[], long ac_counts[])
|
long dc_counts[], long ac_counts[])
|
||||||
{
|
{
|
||||||
register int temp;
|
register int temp;
|
||||||
@@ -556,6 +588,11 @@ htest_one_block (JCOEFPTR block, int last_dc_val,
|
|||||||
nbits++;
|
nbits++;
|
||||||
temp >>= 1;
|
temp >>= 1;
|
||||||
}
|
}
|
||||||
|
/* Check for out-of-range coefficient values.
|
||||||
|
* Since we're encoding a difference, the range limit is twice as much.
|
||||||
|
*/
|
||||||
|
if (nbits > MAX_COEF_BITS+1)
|
||||||
|
ERREXIT(cinfo, JERR_BAD_DCT_COEF);
|
||||||
|
|
||||||
/* Count the Huffman symbol for the number of bits */
|
/* Count the Huffman symbol for the number of bits */
|
||||||
dc_counts[nbits]++;
|
dc_counts[nbits]++;
|
||||||
@@ -582,6 +619,9 @@ htest_one_block (JCOEFPTR block, int last_dc_val,
|
|||||||
nbits = 1; /* there must be at least one 1 bit */
|
nbits = 1; /* there must be at least one 1 bit */
|
||||||
while ((temp >>= 1))
|
while ((temp >>= 1))
|
||||||
nbits++;
|
nbits++;
|
||||||
|
/* Check for out-of-range coefficient values */
|
||||||
|
if (nbits > MAX_COEF_BITS)
|
||||||
|
ERREXIT(cinfo, JERR_BAD_DCT_COEF);
|
||||||
|
|
||||||
/* Count Huffman symbol for run length / number of bits */
|
/* Count Huffman symbol for run length / number of bits */
|
||||||
ac_counts[(r << 4) + nbits]++;
|
ac_counts[(r << 4) + nbits]++;
|
||||||
@@ -623,7 +663,7 @@ encode_mcu_gather (j_compress_ptr cinfo, JBLOCKROW *MCU_data)
|
|||||||
for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) {
|
for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) {
|
||||||
ci = cinfo->MCU_membership[blkn];
|
ci = cinfo->MCU_membership[blkn];
|
||||||
compptr = cinfo->cur_comp_info[ci];
|
compptr = cinfo->cur_comp_info[ci];
|
||||||
htest_one_block(MCU_data[blkn][0], entropy->saved.last_dc_val[ci],
|
htest_one_block(cinfo, MCU_data[blkn][0], entropy->saved.last_dc_val[ci],
|
||||||
entropy->dc_count_ptrs[compptr->dc_tbl_no],
|
entropy->dc_count_ptrs[compptr->dc_tbl_no],
|
||||||
entropy->ac_count_ptrs[compptr->ac_tbl_no]);
|
entropy->ac_count_ptrs[compptr->ac_tbl_no]);
|
||||||
entropy->saved.last_dc_val[ci] = MCU_data[blkn][0][0];
|
entropy->saved.last_dc_val[ci] = MCU_data[blkn][0][0];
|
||||||
@@ -634,8 +674,31 @@ encode_mcu_gather (j_compress_ptr cinfo, JBLOCKROW *MCU_data)
|
|||||||
|
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Generate the optimal coding for the given counts, fill htbl.
|
* Generate the best Huffman code table for the given counts, fill htbl.
|
||||||
* Note this is also used by jcphuff.c.
|
* Note this is also used by jcphuff.c.
|
||||||
|
*
|
||||||
|
* The JPEG standard requires that no symbol be assigned a codeword of all
|
||||||
|
* one bits (so that padding bits added at the end of a compressed segment
|
||||||
|
* can't look like a valid code). Because of the canonical ordering of
|
||||||
|
* codewords, this just means that there must be an unused slot in the
|
||||||
|
* longest codeword length category. Section K.2 of the JPEG spec suggests
|
||||||
|
* reserving such a slot by pretending that symbol 256 is a valid symbol
|
||||||
|
* with count 1. In theory that's not optimal; giving it count zero but
|
||||||
|
* including it in the symbol set anyway should give a better Huffman code.
|
||||||
|
* But the theoretically better code actually seems to come out worse in
|
||||||
|
* practice, because it produces more all-ones bytes (which incur stuffed
|
||||||
|
* zero bytes in the final file). In any case the difference is tiny.
|
||||||
|
*
|
||||||
|
* The JPEG standard requires Huffman codes to be no more than 16 bits long.
|
||||||
|
* If some symbols have a very small but nonzero probability, the Huffman tree
|
||||||
|
* must be adjusted to meet the code length restriction. We currently use
|
||||||
|
* the adjustment method suggested in JPEG section K.2. This method is *not*
|
||||||
|
* optimal; it may not choose the best possible limited-length code. But
|
||||||
|
* typically only very-low-frequency symbols will be given less-than-optimal
|
||||||
|
* lengths, so the code is almost optimal. Experimental comparisons against
|
||||||
|
* an optimal limited-length-code algorithm indicate that the difference is
|
||||||
|
* microscopic --- usually less than a hundredth of a percent of total size.
|
||||||
|
* So the extra complexity of an optimal algorithm doesn't seem worthwhile.
|
||||||
*/
|
*/
|
||||||
|
|
||||||
GLOBAL(void)
|
GLOBAL(void)
|
||||||
@@ -656,10 +719,10 @@ jpeg_gen_optimal_table (j_compress_ptr cinfo, JHUFF_TBL * htbl, long freq[])
|
|||||||
for (i = 0; i < 257; i++)
|
for (i = 0; i < 257; i++)
|
||||||
others[i] = -1; /* init links to empty */
|
others[i] = -1; /* init links to empty */
|
||||||
|
|
||||||
freq[256] = 1; /* make sure there is a nonzero count */
|
freq[256] = 1; /* make sure 256 has a nonzero count */
|
||||||
/* Including the pseudo-symbol 256 in the Huffman procedure guarantees
|
/* Including the pseudo-symbol 256 in the Huffman procedure guarantees
|
||||||
* that no real symbol is given code-value of all ones, because 256
|
* that no real symbol is given code-value of all ones, because 256
|
||||||
* will be placed in the largest codeword category.
|
* will be placed last in the largest codeword category.
|
||||||
*/
|
*/
|
||||||
|
|
||||||
/* Huffman's basic algorithm to assign optimal code lengths to symbols */
|
/* Huffman's basic algorithm to assign optimal code lengths to symbols */
|
||||||
|
|||||||
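The table checks added to jpeg_make_c_derived_tbl in this file can be tried outside the library. The following is only a sketch under stated assumptions: derive_codes is a hypothetical standalone function, not part of the IJG API, and it returns -1 where the library calls ERREXIT(cinfo, JERR_BAD_HUFF_TABLE). It follows the same three steps (Figures C.1 to C.3) and the same three validations: table overrun, an all-ones code, and out-of-range or duplicated symbols.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical standalone analogue of the patched table derivation.
 * bits[1..16] holds the number of codes of each length; huffval lists the
 * symbols in code-length order. Returns 0 on success, -1 on a bad table. */
int derive_codes(const unsigned char bits[17], const unsigned char *huffval,
                 int is_dc, unsigned int ehufco[256], char ehufsi[256])
{
  char huffsize[257];
  unsigned int huffcode[257];
  unsigned int code;
  int p, i, l, lastp, si, maxsymbol;

  /* Figure C.1: code length for each symbol, in code-length order */
  p = 0;
  for (l = 1; l <= 16; l++) {
    i = (int) bits[l];
    if (p + i > 256)            /* protect against table overrun */
      return -1;
    while (i--)
      huffsize[p++] = (char) l;
  }
  huffsize[p] = 0;
  lastp = p;
  assert(lastp <= 256);

  /* Figure C.2: generate the codes, checking for a legal code tree */
  code = 0;
  si = huffsize[0];
  p = 0;
  while (huffsize[p]) {
    while (((int) huffsize[p]) == si) {
      huffcode[p++] = code;
      code++;
    }
    /* code is one past the last code of length si; it must still fit in
     * si bits, since no code may be all ones */
    if ((long) code >= (1L << si))
      return -1;
    code <<= 1;
    si++;
  }

  /* Figure C.3: code and size indexed by symbol value */
  memset(ehufsi, 0, 256 * sizeof(char));
  maxsymbol = is_dc ? 15 : 255;
  for (p = 0; p < lastp; p++) {
    i = huffval[p];
    if (i > maxsymbol || ehufsi[i])   /* out of range or duplicate */
      return -1;
    ehufco[i] = huffcode[p];
    ehufsi[i] = huffsize[p];
  }
  return 0;
}
```

Because ehufsi is zeroed before the final loop, a nonzero entry can only mean the symbol was already assigned, which is how the duplicate check works without a separate "seen" array.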
17
jchuff.h
@@ -1,7 +1,7 @@
|
|||||||
/*
|
/*
|
||||||
* jchuff.h
|
* jchuff.h
|
||||||
*
|
*
|
||||||
* Copyright (C) 1991-1996, Thomas G. Lane.
|
* Copyright (C) 1991-1997, Thomas G. Lane.
|
||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
@@ -10,6 +10,18 @@
|
|||||||
* progressive encoder (jcphuff.c). No other modules need to see these.
|
* progressive encoder (jcphuff.c). No other modules need to see these.
|
||||||
*/
|
*/
|
||||||
|
|
||||||
|
/* The legal range of a DCT coefficient is
|
||||||
|
* -1024 .. +1023 for 8-bit data;
|
||||||
|
* -16384 .. +16383 for 12-bit data.
|
||||||
|
* Hence the magnitude should always fit in 10 or 14 bits respectively.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#if BITS_IN_JSAMPLE == 8
|
||||||
|
#define MAX_COEF_BITS 10
|
||||||
|
#else
|
||||||
|
#define MAX_COEF_BITS 14
|
||||||
|
#endif
|
||||||
|
|
||||||
/* Derived data constructed for each Huffman table */
|
/* Derived data constructed for each Huffman table */
|
||||||
|
|
||||||
typedef struct {
|
typedef struct {
|
||||||
@@ -27,7 +39,8 @@ typedef struct {
|
|||||||
|
|
||||||
/* Expand a Huffman table definition into the derived format */
|
/* Expand a Huffman table definition into the derived format */
|
||||||
EXTERN(void) jpeg_make_c_derived_tbl
|
EXTERN(void) jpeg_make_c_derived_tbl
|
||||||
JPP((j_compress_ptr cinfo, JHUFF_TBL * htbl, c_derived_tbl ** pdtbl));
|
JPP((j_compress_ptr cinfo, boolean isDC, int tblno,
|
||||||
|
c_derived_tbl ** pdtbl));
|
||||||
|
|
||||||
/* Generate an optimal table definition given the specified counts */
|
/* Generate an optimal table definition given the specified counts */
|
||||||
EXTERN(void) jpeg_gen_optimal_table
|
EXTERN(void) jpeg_gen_optimal_table
|
||||||
|
|||||||
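The MAX_COEF_BITS limit introduced in this header is applied in jchuff.c by counting the bits in a coefficient's magnitude. A minimal sketch of that check for the 8-bit case, assuming hypothetical helper names (magnitude_bits, ac_coef_in_range, dc_diff_in_range) that do not exist in the library:

```c
#include <assert.h>
#include <limits.h>

#define MAX_COEF_BITS 10   /* value the header defines for BITS_IN_JSAMPLE == 8 */

/* Number of bits needed to represent the magnitude of v; 0 when v == 0. */
int magnitude_bits(int v)
{
  int temp, nbits = 0;

  assert(v != INT_MIN);    /* -INT_MIN would overflow */
  temp = v < 0 ? -v : v;
  while (temp) {
    nbits++;
    temp >>= 1;
  }
  return nbits;
}

/* AC coefficients are limited to MAX_COEF_BITS bits of magnitude. */
int ac_coef_in_range(int coef)
{
  return magnitude_bits(coef) <= MAX_COEF_BITS;
}

/* The DC value is coded as a difference, so one extra bit is allowed. */
int dc_diff_in_range(int diff)
{
  return magnitude_bits(diff) <= MAX_COEF_BITS + 1;
}
```

In the library the failing case raises JERR_BAD_DCT_COEF instead of returning a flag; the boolean return here only keeps the sketch self-contained.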
4
jcinit.c
@@ -1,7 +1,7 @@
|
|||||||
/*
|
/*
|
||||||
* jcinit.c
|
* jcinit.c
|
||||||
*
|
*
|
||||||
* Copyright (C) 1991-1996, Thomas G. Lane.
|
* Copyright (C) 1991-1997, Thomas G. Lane.
|
||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
@@ -56,7 +56,7 @@ jinit_compress_master (j_compress_ptr cinfo)
|
|||||||
|
|
||||||
/* Need a full-image coefficient buffer in any multi-pass mode. */
|
/* Need a full-image coefficient buffer in any multi-pass mode. */
|
||||||
jinit_c_coef_controller(cinfo,
|
jinit_c_coef_controller(cinfo,
|
||||||
(cinfo->num_scans > 1 || cinfo->optimize_coding));
|
(boolean) (cinfo->num_scans > 1 || cinfo->optimize_coding));
|
||||||
jinit_c_main_controller(cinfo, FALSE /* never need full buffer here */);
|
jinit_c_main_controller(cinfo, FALSE /* never need full buffer here */);
|
||||||
|
|
||||||
jinit_marker_writer(cinfo);
|
jinit_marker_writer(cinfo);
|
||||||
|
|||||||
89
jcmarker.c
@@ -1,7 +1,7 @@
|
|||||||
/*
|
/*
|
||||||
* jcmarker.c
|
* jcmarker.c
|
||||||
*
|
*
|
||||||
* Copyright (C) 1991-1996, Thomas G. Lane.
|
* Copyright (C) 1991-1998, Thomas G. Lane.
|
||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
@@ -81,6 +81,17 @@ typedef enum { /* JPEG marker codes */
|
|||||||
} JPEG_MARKER;
|
} JPEG_MARKER;
|
||||||
|
|
||||||
|
|
||||||
|
/* Private state */
|
||||||
|
|
||||||
|
typedef struct {
|
||||||
|
struct jpeg_marker_writer pub; /* public fields */
|
||||||
|
|
||||||
|
unsigned int last_restart_interval; /* last DRI value emitted; 0 after SOI */
|
||||||
|
} my_marker_writer;
|
||||||
|
|
||||||
|
typedef my_marker_writer * my_marker_ptr;
|
||||||
|
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Basic output routines.
|
* Basic output routines.
|
||||||
*
|
*
|
||||||
@@ -158,8 +169,8 @@ emit_dqt (j_compress_ptr cinfo, int index)
|
|||||||
/* The table entries must be emitted in zigzag order. */
|
/* The table entries must be emitted in zigzag order. */
|
||||||
unsigned int qval = qtbl->quantval[jpeg_natural_order[i]];
|
unsigned int qval = qtbl->quantval[jpeg_natural_order[i]];
|
||||||
if (prec)
|
if (prec)
|
||||||
emit_byte(cinfo, qval >> 8);
|
emit_byte(cinfo, (int) (qval >> 8));
|
||||||
emit_byte(cinfo, qval & 0xFF);
|
emit_byte(cinfo, (int) (qval & 0xFF));
|
||||||
}
|
}
|
||||||
|
|
||||||
qtbl->sent_table = TRUE;
|
qtbl->sent_table = TRUE;
|
||||||
@@ -342,7 +353,7 @@ emit_jfif_app0 (j_compress_ptr cinfo)
|
|||||||
* Length of APP0 block (2 bytes)
|
* Length of APP0 block (2 bytes)
|
||||||
* Block ID (4 bytes - ASCII "JFIF")
|
* Block ID (4 bytes - ASCII "JFIF")
|
||||||
* Zero byte (1 byte to terminate the ID string)
|
* Zero byte (1 byte to terminate the ID string)
|
||||||
* Version Major, Minor (2 bytes - 0x01, 0x01)
|
* Version Major, Minor (2 bytes - major first)
|
||||||
* Units (1 byte - 0x00 = none, 0x01 = inch, 0x02 = cm)
|
* Units (1 byte - 0x00 = none, 0x01 = inch, 0x02 = cm)
|
||||||
* Xdpu (2 bytes - dots per unit horizontal)
|
* Xdpu (2 bytes - dots per unit horizontal)
|
||||||
* Ydpu (2 bytes - dots per unit vertical)
|
* Ydpu (2 bytes - dots per unit vertical)
|
||||||
@@ -359,11 +370,8 @@ emit_jfif_app0 (j_compress_ptr cinfo)
|
|||||||
emit_byte(cinfo, 0x49);
|
emit_byte(cinfo, 0x49);
|
||||||
emit_byte(cinfo, 0x46);
|
emit_byte(cinfo, 0x46);
|
||||||
emit_byte(cinfo, 0);
|
emit_byte(cinfo, 0);
|
||||||
/* We currently emit version code 1.01 since we use no 1.02 features.
|
emit_byte(cinfo, cinfo->JFIF_major_version); /* Version fields */
|
||||||
* This may avoid complaints from some older decoders.
|
emit_byte(cinfo, cinfo->JFIF_minor_version);
|
||||||
*/
|
|
||||||
emit_byte(cinfo, 1); /* Major version */
|
|
||||||
emit_byte(cinfo, 1); /* Minor version */
|
|
||||||
emit_byte(cinfo, cinfo->density_unit); /* Pixel size information */
|
emit_byte(cinfo, cinfo->density_unit); /* Pixel size information */
|
||||||
emit_2bytes(cinfo, (int) cinfo->X_density);
|
emit_2bytes(cinfo, (int) cinfo->X_density);
|
||||||
emit_2bytes(cinfo, (int) cinfo->Y_density);
|
emit_2bytes(cinfo, (int) cinfo->Y_density);
|
||||||
@@ -419,28 +427,30 @@ emit_adobe_app14 (j_compress_ptr cinfo)
|
|||||||
|
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* This routine is exported for possible use by applications.
|
* These routines allow writing an arbitrary marker with parameters.
|
||||||
* The intended use is to emit COM or APPn markers after calling
|
* The only intended use is to emit COM or APPn markers after calling
|
||||||
* jpeg_start_compress() and before the first jpeg_write_scanlines() call
|
* write_file_header and before calling write_frame_header.
|
||||||
* (hence, after write_file_header but before write_frame_header).
|
|
||||||
* Other uses are not guaranteed to produce desirable results.
|
* Other uses are not guaranteed to produce desirable results.
|
||||||
|
* Counting the parameter bytes properly is the caller's responsibility.
|
||||||
*/
|
*/
|
||||||
|
|
||||||
METHODDEF(void)
|
METHODDEF(void)
|
||||||
write_any_marker (j_compress_ptr cinfo, int marker,
|
write_marker_header (j_compress_ptr cinfo, int marker, unsigned int datalen)
|
||||||
const JOCTET *dataptr, unsigned int datalen)
|
/* Emit an arbitrary marker header */
|
||||||
/* Emit an arbitrary marker with parameters */
|
|
||||||
{
|
{
|
||||||
if (datalen <= (unsigned int) 65533) { /* safety check */
|
if (datalen > (unsigned int) 65533) /* safety check */
|
||||||
|
ERREXIT(cinfo, JERR_BAD_LENGTH);
|
||||||
|
|
||||||
emit_marker(cinfo, (JPEG_MARKER) marker);
|
emit_marker(cinfo, (JPEG_MARKER) marker);
|
||||||
|
|
||||||
emit_2bytes(cinfo, (int) (datalen + 2)); /* total length */
|
emit_2bytes(cinfo, (int) (datalen + 2)); /* total length */
|
||||||
|
}
|
||||||
|
|
||||||
while (datalen--) {
|
METHODDEF(void)
|
||||||
emit_byte(cinfo, *dataptr);
|
write_marker_byte (j_compress_ptr cinfo, int val)
|
||||||
dataptr++;
|
/* Emit one byte of marker parameters following write_marker_header */
|
||||||
}
|
{
|
||||||
}
|
emit_byte(cinfo, val);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
@@ -458,8 +468,13 @@ write_any_marker (j_compress_ptr cinfo, int marker,
|
|||||||
METHODDEF(void)
|
METHODDEF(void)
|
||||||
write_file_header (j_compress_ptr cinfo)
|
write_file_header (j_compress_ptr cinfo)
|
||||||
{
|
{
|
||||||
|
my_marker_ptr marker = (my_marker_ptr) cinfo->marker;
|
||||||
|
|
||||||
emit_marker(cinfo, M_SOI); /* first the SOI */
|
emit_marker(cinfo, M_SOI); /* first the SOI */
|
||||||
|
|
||||||
|
/* SOI is defined to reset restart interval to 0 */
|
||||||
|
marker->last_restart_interval = 0;
|
||||||
|
|
||||||
if (cinfo->write_JFIF_header) /* next an optional JFIF APP0 */
|
if (cinfo->write_JFIF_header) /* next an optional JFIF APP0 */
|
||||||
emit_jfif_app0(cinfo);
|
emit_jfif_app0(cinfo);
|
||||||
if (cinfo->write_Adobe_marker) /* next an optional Adobe APP14 */
|
if (cinfo->write_Adobe_marker) /* next an optional Adobe APP14 */
|
||||||
@@ -535,6 +550,7 @@ write_frame_header (j_compress_ptr cinfo)
|
|||||||
METHODDEF(void)
|
METHODDEF(void)
|
||||||
write_scan_header (j_compress_ptr cinfo)
|
write_scan_header (j_compress_ptr cinfo)
|
||||||
{
|
{
|
||||||
|
my_marker_ptr marker = (my_marker_ptr) cinfo->marker;
|
||||||
int i;
|
int i;
|
||||||
jpeg_component_info *compptr;
|
jpeg_component_info *compptr;
|
||||||
|
|
||||||
@@ -567,11 +583,12 @@ write_scan_header (j_compress_ptr cinfo)
|
|||||||
}
|
}
|
||||||
|
|
||||||
/* Emit DRI if required --- note that DRI value could change for each scan.
|
/* Emit DRI if required --- note that DRI value could change for each scan.
|
||||||
* If it doesn't, a tiny amount of space is wasted in multiple-scan files.
|
* We avoid wasting space with unnecessary DRIs, however.
|
||||||
* We assume DRI will never be nonzero for one scan and zero for a later one.
|
|
||||||
*/
|
*/
|
||||||
if (cinfo->restart_interval)
|
if (cinfo->restart_interval != marker->last_restart_interval) {
|
||||||
emit_dri(cinfo);
|
emit_dri(cinfo);
|
||||||
|
marker->last_restart_interval = cinfo->restart_interval;
|
||||||
|
}
|
||||||
|
|
||||||
emit_sos(cinfo);
|
emit_sos(cinfo);
|
||||||
}
|
}
|
||||||
@@ -627,15 +644,21 @@ write_tables_only (j_compress_ptr cinfo)
|
|||||||
GLOBAL(void)
|
GLOBAL(void)
|
||||||
jinit_marker_writer (j_compress_ptr cinfo)
|
jinit_marker_writer (j_compress_ptr cinfo)
|
||||||
{
|
{
|
||||||
|
my_marker_ptr marker;
|
||||||
|
|
||||||
/* Create the subobject */
|
/* Create the subobject */
|
||||||
cinfo->marker = (struct jpeg_marker_writer *)
|
marker = (my_marker_ptr)
|
||||||
(*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,
|
(*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,
|
||||||
SIZEOF(struct jpeg_marker_writer));
|
SIZEOF(my_marker_writer));
|
||||||
|
cinfo->marker = (struct jpeg_marker_writer *) marker;
|
||||||
/* Initialize method pointers */
|
/* Initialize method pointers */
|
||||||
cinfo->marker->write_any_marker = write_any_marker;
|
marker->pub.write_file_header = write_file_header;
|
||||||
cinfo->marker->write_file_header = write_file_header;
|
marker->pub.write_frame_header = write_frame_header;
|
||||||
cinfo->marker->write_frame_header = write_frame_header;
|
marker->pub.write_scan_header = write_scan_header;
|
||||||
cinfo->marker->write_scan_header = write_scan_header;
|
marker->pub.write_file_trailer = write_file_trailer;
|
||||||
cinfo->marker->write_file_trailer = write_file_trailer;
|
marker->pub.write_tables_only = write_tables_only;
|
||||||
cinfo->marker->write_tables_only = write_tables_only;
|
marker->pub.write_marker_header = write_marker_header;
|
||||||
|
marker->pub.write_marker_byte = write_marker_byte;
|
||||||
|
/* Initialize private state */
|
||||||
|
marker->last_restart_interval = 0;
|
||||||
}
|
}
|
||||||
|
|||||||
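The DRI change in this file replaces "emit whenever the interval is nonzero" with "emit only when the interval differs from the last one written". A rough sketch of that bookkeeping, assuming a hypothetical marker_state struct and a counter that stands in for the real emit_dri() call:

```c
#include <assert.h>

/* Hypothetical stand-in for the private marker-writer state. */
typedef struct {
  unsigned int last_restart_interval; /* last DRI value emitted; 0 after SOI */
  int dri_emitted;                    /* counts DRI markers, in place of emit_dri() */
} marker_state;

/* SOI is defined to reset the restart interval to 0. */
void start_file(marker_state *m)
{
  assert(m != 0);
  m->last_restart_interval = 0;
  m->dri_emitted = 0;
}

/* Called once per scan: write a DRI only if the value has changed. */
void start_scan(marker_state *m, unsigned int restart_interval)
{
  assert(m != 0);
  if (restart_interval != m->last_restart_interval) {
    m->dri_emitted++;
    m->last_restart_interval = restart_interval;
  }
}
```

Comparing against the last emitted value also covers the case the old comment ruled out: an interval that is nonzero in one scan and zero in a later one now produces a DRI that resets it.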
16
jcmaster.c
@@ -1,7 +1,7 @@
|
|||||||
/*
|
/*
|
||||||
* jcmaster.c
|
* jcmaster.c
|
||||||
*
|
*
|
||||||
* Copyright (C) 1991-1996, Thomas G. Lane.
|
* Copyright (C) 1991-1997, Thomas G. Lane.
|
||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
@@ -185,8 +185,20 @@ validate_script (j_compress_ptr cinfo)
|
|||||||
Al = scanptr->Al;
|
Al = scanptr->Al;
|
||||||
if (cinfo->progressive_mode) {
|
if (cinfo->progressive_mode) {
|
||||||
#ifdef C_PROGRESSIVE_SUPPORTED
|
#ifdef C_PROGRESSIVE_SUPPORTED
|
||||||
|
/* The JPEG spec simply gives the ranges 0..13 for Ah and Al, but that
|
||||||
|
* seems wrong: the upper bound ought to depend on data precision.
|
||||||
|
* Perhaps they really meant 0..N+1 for N-bit precision.
|
||||||
|
* Here we allow 0..10 for 8-bit data; Al larger than 10 results in
|
||||||
|
* out-of-range reconstructed DC values during the first DC scan,
|
||||||
|
* which might cause problems for some decoders.
|
||||||
|
*/
|
||||||
|
#if BITS_IN_JSAMPLE == 8
|
||||||
|
#define MAX_AH_AL 10
|
||||||
|
#else
|
||||||
|
#define MAX_AH_AL 13
|
||||||
|
#endif
|
||||||
if (Ss < 0 || Ss >= DCTSIZE2 || Se < Ss || Se >= DCTSIZE2 ||
|
if (Ss < 0 || Ss >= DCTSIZE2 || Se < Ss || Se >= DCTSIZE2 ||
|
||||||
Ah < 0 || Ah > 13 || Al < 0 || Al > 13)
|
Ah < 0 || Ah > MAX_AH_AL || Al < 0 || Al > MAX_AH_AL)
|
||||||
ERREXIT1(cinfo, JERR_BAD_PROG_SCRIPT, scanno);
|
ERREXIT1(cinfo, JERR_BAD_PROG_SCRIPT, scanno);
|
||||||
if (Ss == 0) {
|
if (Ss == 0) {
|
||||||
if (Se != 0) /* DC and AC together not OK */
|
if (Se != 0) /* DC and AC together not OK */
|
||||||
|
|||||||
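The tightened range test in validate_script can be exercised in isolation. This is a sketch for the 8-bit case only; scan_params_ok and scan_params_selfcheck are hypothetical names, and the function returns 0 where the library raises JERR_BAD_PROG_SCRIPT. The later per-scan rules (for example, DC and AC not mixed in one scan) are not reproduced here.

```c
#include <assert.h>

#define DCTSIZE2  64   /* coefficients per block */
#define MAX_AH_AL 10   /* limit used for 8-bit data; 13 otherwise */

/* Range check on the progressive scan parameters, mirroring the condition
 * in the diff. Returns 1 if the values are within range, 0 if not. */
int scan_params_ok(int Ss, int Se, int Ah, int Al)
{
  if (Ss < 0 || Ss >= DCTSIZE2 || Se < Ss || Se >= DCTSIZE2 ||
      Ah < 0 || Ah > MAX_AH_AL || Al < 0 || Al > MAX_AH_AL)
    return 0;
  return 1;
}

/* Small usage check: Al = 11 passed the old "> 13" test but fails now. */
void scan_params_selfcheck(void)
{
  assert(scan_params_ok(0, 0, 0, 1));
  assert(!scan_params_ok(1, 63, 0, 11));
}
```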
143
jcolsamp.h
Normal file
@@ -0,0 +1,143 @@
|
|||||||
|
/*
|
||||||
|
* jcolsamp.h - private declarations for color conversion & up/downsampling
|
||||||
|
*
|
||||||
|
* x86 SIMD extension for IJG JPEG library
|
||||||
|
* Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
* For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
*
|
||||||
|
* Last Modified : February 4, 2006
|
||||||
|
*
|
||||||
|
* [TAB8]
|
||||||
|
*/
|
||||||
|
|
||||||
|
|
||||||
|
/* configuration check: BITS_IN_JSAMPLE==8 (8-bit sample values) is the only
|
||||||
|
* valid setting on this SIMD extension.
|
||||||
|
*/
|
||||||
|
#if BITS_IN_JSAMPLE != 8
|
#error "Sorry, this SIMD code only copes with 8-bit sample values."
#endif

/* Short forms of external names for systems with brain-damaged linkers. */

#ifdef NEED_SHORT_EXTERNAL_NAMES
#define jpeg_rgb_ycc_convert_mmx jMRgbYccCnv /* jccolmmx.asm */
#define jpeg_rgb_ycc_convert_sse2 jSRgbYccCnv /* jccolss2.asm */
#define jpeg_h2v1_downsample_mmx jM21Downsample /* jcsammmx.asm */
#define jpeg_h2v2_downsample_mmx jM22Downsample /* jcsammmx.asm */
#define jpeg_h2v1_downsample_sse2 jS21Downsample /* jcsamss2.asm */
#define jpeg_h2v2_downsample_sse2 jS22Downsample /* jcsamss2.asm */
#define jpeg_ycc_rgb_convert_mmx jMYccRgbCnv /* jdcolmmx.asm */
#define jpeg_ycc_rgb_convert_sse2 jSYccRgbCnv /* jdcolss2.asm */
#define jpeg_h2v1_merged_upsample_mmx jM21MerUpsample /* jdmermmx.asm */
#define jpeg_h2v2_merged_upsample_mmx jM22MerUpsample /* jdmermmx.asm */
#define jpeg_h2v1_merged_upsample_sse2 jS21MerUpsample /* jdmerss2.asm */
#define jpeg_h2v2_merged_upsample_sse2 jS22MerUpsample /* jdmerss2.asm */
#define jpeg_h2v1_fancy_upsample_mmx jM21FanUpsample /* jdsammmx.asm */
#define jpeg_h2v2_fancy_upsample_mmx jM22FanUpsample /* jdsammmx.asm */
#define jpeg_h1v2_fancy_upsample_mmx jM12FanUpsample /* jdsammmx.asm */
#define jpeg_h2v1_upsample_mmx jM21Upsample /* jdsammmx.asm */
#define jpeg_h2v2_upsample_mmx jM22Upsample /* jdsammmx.asm */
#define jpeg_h2v1_fancy_upsample_sse2 jS21FanUpsample /* jdsamss2.asm */
#define jpeg_h2v2_fancy_upsample_sse2 jS22FanUpsample /* jdsamss2.asm */
#define jpeg_h1v2_fancy_upsample_sse2 jS12FanUpsample /* jdsamss2.asm */
#define jpeg_h2v1_upsample_sse2 jS21Upsample /* jdsamss2.asm */
#define jpeg_h2v2_upsample_sse2 jS22Upsample /* jdsamss2.asm */
#define jconst_rgb_ycc_convert_mmx jMCRgbYccCnv /* jccolmmx.asm */
#define jconst_rgb_ycc_convert_sse2 jSCRgbYccCnv /* jccolss2.asm */
#define jconst_ycc_rgb_convert_mmx jMCYccRgbCnv /* jdcolmmx.asm */
#define jconst_ycc_rgb_convert_sse2 jSCYccRgbCnv /* jdcolss2.asm */
#define jconst_merged_upsample_mmx jMCMerUpsample /* jdmermmx.asm */
#define jconst_merged_upsample_sse2 jSCMerUpsample /* jdmerss2.asm */
#define jconst_fancy_upsample_mmx jMCFanUpsample /* jdsammmx.asm */
#define jconst_fancy_upsample_sse2 jSCFanUpsample /* jdsamss2.asm */
#ifndef JSIMD_MODEINFO_NOT_SUPPORTED
#define jpeg_simd_merged_upsampler jSiMUpsampler /* jdmerge.c */
#endif
#endif /* NEED_SHORT_EXTERNAL_NAMES */

/* Extern declarations for color conversion & up/downsampling routines. */

EXTERN(void) jpeg_rgb_ycc_convert_mmx
        JPP((j_compress_ptr cinfo, JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
             JDIMENSION output_row, int num_rows));
EXTERN(void) jpeg_rgb_ycc_convert_sse2
        JPP((j_compress_ptr cinfo, JSAMPARRAY input_buf, JSAMPIMAGE output_buf,
             JDIMENSION output_row, int num_rows));

EXTERN(void) jpeg_h2v1_downsample_mmx
        JPP((j_compress_ptr cinfo, jpeg_component_info * compptr,
             JSAMPARRAY input_data, JSAMPARRAY output_data));
EXTERN(void) jpeg_h2v2_downsample_mmx
        JPP((j_compress_ptr cinfo, jpeg_component_info * compptr,
             JSAMPARRAY input_data, JSAMPARRAY output_data));
EXTERN(void) jpeg_h2v1_downsample_sse2
        JPP((j_compress_ptr cinfo, jpeg_component_info * compptr,
             JSAMPARRAY input_data, JSAMPARRAY output_data));
EXTERN(void) jpeg_h2v2_downsample_sse2
        JPP((j_compress_ptr cinfo, jpeg_component_info * compptr,
             JSAMPARRAY input_data, JSAMPARRAY output_data));

EXTERN(void) jpeg_ycc_rgb_convert_mmx
        JPP((j_decompress_ptr cinfo, JSAMPIMAGE input_buf, JDIMENSION input_row,
             JSAMPARRAY output_buf, int num_rows));
EXTERN(void) jpeg_ycc_rgb_convert_sse2
        JPP((j_decompress_ptr cinfo, JSAMPIMAGE input_buf, JDIMENSION input_row,
             JSAMPARRAY output_buf, int num_rows));

EXTERN(void) jpeg_h2v1_merged_upsample_mmx
        JPP((j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
             JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf));
EXTERN(void) jpeg_h2v2_merged_upsample_mmx
        JPP((j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
             JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf));
EXTERN(void) jpeg_h2v1_merged_upsample_sse2
        JPP((j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
             JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf));
EXTERN(void) jpeg_h2v2_merged_upsample_sse2
        JPP((j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
             JDIMENSION in_row_group_ctr, JSAMPARRAY output_buf));

EXTERN(void) jpeg_h2v1_fancy_upsample_mmx
        JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
             JSAMPARRAY input_data, JSAMPARRAY * output_data_ptr));
EXTERN(void) jpeg_h2v2_fancy_upsample_mmx
        JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
             JSAMPARRAY input_data, JSAMPARRAY * output_data_ptr));
EXTERN(void) jpeg_h1v2_fancy_upsample_mmx
        JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
             JSAMPARRAY input_data, JSAMPARRAY * output_data_ptr));
EXTERN(void) jpeg_h2v1_upsample_mmx
        JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
             JSAMPARRAY input_data, JSAMPARRAY * output_data_ptr));
EXTERN(void) jpeg_h2v2_upsample_mmx
        JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
             JSAMPARRAY input_data, JSAMPARRAY * output_data_ptr));
EXTERN(void) jpeg_h2v1_fancy_upsample_sse2
        JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
             JSAMPARRAY input_data, JSAMPARRAY * output_data_ptr));
EXTERN(void) jpeg_h2v2_fancy_upsample_sse2
        JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
             JSAMPARRAY input_data, JSAMPARRAY * output_data_ptr));
EXTERN(void) jpeg_h1v2_fancy_upsample_sse2
        JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
             JSAMPARRAY input_data, JSAMPARRAY * output_data_ptr));
EXTERN(void) jpeg_h2v1_upsample_sse2
        JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
             JSAMPARRAY input_data, JSAMPARRAY * output_data_ptr));
EXTERN(void) jpeg_h2v2_upsample_sse2
        JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
             JSAMPARRAY input_data, JSAMPARRAY * output_data_ptr));

extern const int jconst_rgb_ycc_convert_mmx[];
extern const int jconst_rgb_ycc_convert_sse2[];
extern const int jconst_ycc_rgb_convert_mmx[];
extern const int jconst_ycc_rgb_convert_sse2[];
extern const int jconst_merged_upsample_mmx[];
extern const int jconst_merged_upsample_sse2[];
extern const int jconst_fancy_upsample_mmx[];
extern const int jconst_fancy_upsample_sse2[];

#ifndef JSIMD_MODEINFO_NOT_SUPPORTED
EXTERN(unsigned int) jpeg_simd_merged_upsampler JPP((j_decompress_ptr cinfo));
#endif
156
jcolsamp.inc
Normal file
@@ -0,0 +1,156 @@
;
; jcolsamp.inc - private declarations for color conversion & up/downsampling
;
; x86 SIMD extension for IJG JPEG library
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; Last Modified : January 5, 2006
;
; [TAB8]

; --------------------------------------------------------------------------
;
; configuration check: BITS_IN_JSAMPLE==8 (8-bit sample values) is the only
; valid setting on this SIMD extension.
;
%if BITS_IN_JSAMPLE != 8
%error "Sorry, this SIMD code only copes with 8-bit sample values."
%endif

; Short forms of external names for systems with brain-damaged linkers.
;
%ifdef NEED_SHORT_EXTERNAL_NAMES
%define jpeg_rgb_ycc_convert_mmx jMRgbYccCnv ; jccolmmx.asm
%define jpeg_rgb_ycc_convert_sse2 jSRgbYccCnv ; jccolss2.asm
%define jpeg_h2v1_downsample_mmx jM21Downsample ; jcsammmx.asm
%define jpeg_h2v2_downsample_mmx jM22Downsample ; jcsammmx.asm
%define jpeg_h2v1_downsample_sse2 jS21Downsample ; jcsamss2.asm
%define jpeg_h2v2_downsample_sse2 jS22Downsample ; jcsamss2.asm
%define jpeg_ycc_rgb_convert_mmx jMYccRgbCnv ; jdcolmmx.asm
%define jpeg_ycc_rgb_convert_sse2 jSYccRgbCnv ; jdcolss2.asm
%define jpeg_h2v1_merged_upsample_mmx jM21MerUpsample ; jdmermmx.asm
%define jpeg_h2v2_merged_upsample_mmx jM22MerUpsample ; jdmermmx.asm
%define jpeg_h2v1_merged_upsample_sse2 jS21MerUpsample ; jdmerss2.asm
%define jpeg_h2v2_merged_upsample_sse2 jS22MerUpsample ; jdmerss2.asm
%define jpeg_h2v1_fancy_upsample_mmx jM21FanUpsample ; jdsammmx.asm
%define jpeg_h2v2_fancy_upsample_mmx jM22FanUpsample ; jdsammmx.asm
%define jpeg_h1v2_fancy_upsample_mmx jM12FanUpsample ; jdsammmx.asm
%define jpeg_h2v1_upsample_mmx jM21Upsample ; jdsammmx.asm
%define jpeg_h2v2_upsample_mmx jM22Upsample ; jdsammmx.asm
%define jpeg_h2v1_fancy_upsample_sse2 jS21FanUpsample ; jdsamss2.asm
%define jpeg_h2v2_fancy_upsample_sse2 jS22FanUpsample ; jdsamss2.asm
%define jpeg_h1v2_fancy_upsample_sse2 jS12FanUpsample ; jdsamss2.asm
%define jpeg_h2v1_upsample_sse2 jS21Upsample ; jdsamss2.asm
%define jpeg_h2v2_upsample_sse2 jS22Upsample ; jdsamss2.asm
%define jconst_rgb_ycc_convert_mmx jMCRgbYccCnv ; jccolmmx.asm
%define jconst_rgb_ycc_convert_sse2 jSCRgbYccCnv ; jccolss2.asm
%define jconst_ycc_rgb_convert_mmx jMCYccRgbCnv ; jdcolmmx.asm
%define jconst_ycc_rgb_convert_sse2 jSCYccRgbCnv ; jdcolss2.asm
%define jconst_merged_upsample_mmx jMCMerUpsample ; jdmermmx.asm
%define jconst_merged_upsample_sse2 jSCMerUpsample ; jdmerss2.asm
%define jconst_fancy_upsample_mmx jMCFanUpsample ; jdsammmx.asm
%define jconst_fancy_upsample_sse2 jSCFanUpsample ; jdsamss2.asm
%endif ; NEED_SHORT_EXTERNAL_NAMES

; --------------------------------------------------------------------------

; pseudo-registers to make ordering of RGB configurable
;
%if RGB_PIXELSIZE == 3 || RGB_PIXELSIZE == 4
%if RGB_RED < 0 || RGB_RED >= RGB_PIXELSIZE || RGB_GREEN < 0 || \
    RGB_GREEN >= RGB_PIXELSIZE || RGB_BLUE < 0 || RGB_BLUE >= RGB_PIXELSIZE || \
    RGB_RED == RGB_GREEN || RGB_GREEN == RGB_BLUE || RGB_RED == RGB_BLUE
%error "Incorrect RGB pixel offset."
%endif

%if RGB_RED == 0
%define mmA mm0
%define mmB mm1
%define xmmA xmm0
%define xmmB xmm1
%elif RGB_GREEN == 0
%define mmA mm2
%define mmB mm3
%define xmmA xmm2
%define xmmB xmm3
%elif RGB_BLUE == 0
%define mmA mm4
%define mmB mm5
%define xmmA xmm4
%define xmmB xmm5
%else
%define mmA mm6
%define mmB mm7
%define xmmA xmm6
%define xmmB xmm7
%endif

%if RGB_RED == 1
%define mmC mm0
%define mmD mm1
%define xmmC xmm0
%define xmmD xmm1
%elif RGB_GREEN == 1
%define mmC mm2
%define mmD mm3
%define xmmC xmm2
%define xmmD xmm3
%elif RGB_BLUE == 1
%define mmC mm4
%define mmD mm5
%define xmmC xmm4
%define xmmD xmm5
%else
%define mmC mm6
%define mmD mm7
%define xmmC xmm6
%define xmmD xmm7
%endif

%if RGB_RED == 2
%define mmE mm0
%define mmF mm1
%define xmmE xmm0
%define xmmF xmm1
%elif RGB_GREEN == 2
%define mmE mm2
%define mmF mm3
%define xmmE xmm2
%define xmmF xmm3
%elif RGB_BLUE == 2
%define mmE mm4
%define mmF mm5
%define xmmE xmm4
%define xmmF xmm5
%else
%define mmE mm6
%define mmF mm7
%define xmmE xmm6
%define xmmF xmm7
%endif

%if RGB_RED == 3
%define mmG mm0
%define mmH mm1
%define xmmG xmm0
%define xmmH xmm1
%elif RGB_GREEN == 3
%define mmG mm2
%define mmH mm3
%define xmmG xmm2
%define xmmH xmm3
%elif RGB_BLUE == 3
%define mmG mm4
%define mmH mm5
%define xmmG xmm4
%define xmmH xmm5
%else
%define mmG mm6
%define mmH mm7
%define xmmG xmm6
%define xmmH xmm7
%endif
%endif ; RGB_PIXELSIZE == 3 || RGB_PIXELSIZE == 4

; --------------------------------------------------------------------------
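The pseudo-register block in jcolsamp.inc can be read as a small lookup: for each byte position within a pixel, the channel stored at that position decides which register pair is used (red: mm0/mm1, green: mm2/mm3, blue: mm4/mm5, anything else: mm6/mm7). The following C sketch restates that rule and the offset sanity check. The function names are hypothetical and are not part of the patch; unlike the include file, which simply skips the block for other pixel sizes, the sketch reports them as invalid.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of the patch: true when the pixel size is
 * 3 or 4 and the three channel offsets are distinct and inside the pixel. */
static bool rgb_offsets_valid(int pixelsize, int red, int green, int blue)
{
    if (pixelsize != 3 && pixelsize != 4)
        return false;
    if (red < 0 || red >= pixelsize ||
        green < 0 || green >= pixelsize ||
        blue < 0 || blue >= pixelsize)
        return false;
    return red != green && green != blue && red != blue;
}

/* Hypothetical helper: number of the first register in the pair assigned
 * to byte position `pos` (0 means mm0/mm1, 2 means mm2/mm3, and so on).
 * Position 0 corresponds to mmA/mmB, 1 to mmC/mmD, 2 to mmE/mmF and
 * 3 to mmG/mmH in the include file. */
static int first_reg_for_position(int pos, int red, int green, int blue)
{
    if (red == pos)
        return 0;   /* red channel lives in mm0/mm1 */
    if (green == pos)
        return 2;   /* green channel lives in mm2/mm3 */
    if (blue == pos)
        return 4;   /* blue channel lives in mm4/mm5 */
    return 6;       /* unused or padding byte: mm6/mm7 */
}
```

With a BGR layout (red at offset 2, green at 1, blue at 0), position 0 maps to the blue pair, so mmA would stand for mm4.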
74
jcomapi.c
@@ -1,10 +1,17 @@
|
|||||||
/*
|
/*
|
||||||
* jcomapi.c
|
* jcomapi.c
|
||||||
*
|
*
|
||||||
* Copyright (C) 1994-1996, Thomas G. Lane.
|
* Copyright (C) 1994-1997, Thomas G. Lane.
|
||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
* x86 SIMD extension for IJG JPEG library
|
||||||
|
* Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
* This file has been modified for SIMD extension.
|
||||||
|
* Last Modified : March 11, 2005
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
*
|
||||||
* This file contains application interface routines that are used for both
|
* This file contains application interface routines that are used for both
|
||||||
* compression and decompression.
|
* compression and decompression.
|
||||||
*/
|
*/
|
||||||
@@ -30,6 +37,10 @@ jpeg_abort (j_common_ptr cinfo)
|
|||||||
{
|
{
|
||||||
int pool;
|
int pool;
|
||||||
|
|
||||||
|
/* Do nothing if called on a not-initialized or destroyed JPEG object. */
|
||||||
|
if (cinfo->mem == NULL)
|
||||||
|
return;
|
||||||
|
|
||||||
/* Releasing pools in reverse order might help avoid fragmentation
|
/* Releasing pools in reverse order might help avoid fragmentation
|
||||||
* with some (brain-damaged) malloc libraries.
|
* with some (brain-damaged) malloc libraries.
|
||||||
*/
|
*/
|
||||||
@@ -38,7 +49,15 @@ jpeg_abort (j_common_ptr cinfo)
|
|||||||
}
|
}
|
||||||
|
|
||||||
/* Reset overall state for possible reuse of object */
|
/* Reset overall state for possible reuse of object */
|
||||||
cinfo->global_state = (cinfo->is_decompressor ? DSTATE_START : CSTATE_START);
|
if (cinfo->is_decompressor) {
|
||||||
|
cinfo->global_state = DSTATE_START;
|
||||||
|
/* Try to keep application from accessing now-deleted marker list.
|
||||||
|
* A bit kludgy to do it here, but this is the most central place.
|
||||||
|
*/
|
||||||
|
((j_decompress_ptr) cinfo)->marker_list = NULL;
|
||||||
|
} else {
|
||||||
|
cinfo->global_state = CSTATE_START;
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
@@ -92,3 +111,54 @@ jpeg_alloc_huff_table (j_common_ptr cinfo)
|
|||||||
tbl->sent_table = FALSE; /* make sure this is false in any new table */
|
tbl->sent_table = FALSE; /* make sure this is false in any new table */
|
||||||
return tbl;
|
return tbl;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/*
|
||||||
|
* SIMD Ext: Checking for support of SIMD instruction set.
|
||||||
|
*/
|
||||||
|
|
||||||
|
GLOBAL(unsigned int)
|
||||||
|
jpeg_simd_support (j_common_ptr cinfo)
|
||||||
|
{
|
||||||
|
enum { JSIMD_INVALID = ~0 };
|
||||||
|
static volatile unsigned int simd_supported = JSIMD_INVALID;
|
||||||
|
|
||||||
|
if (simd_supported == JSIMD_INVALID)
|
||||||
|
simd_supported = jpeg_simd_os_support(jpeg_simd_cpu_support());
|
||||||
|
|
||||||
|
#ifndef JSIMD_MASKFUNC_NOT_SUPPORTED
|
||||||
|
if (cinfo != NULL) /* Turn off the masked flags */
|
||||||
|
return simd_supported & ~jpeg_simd_mask(cinfo, JSIMD_NONE, JSIMD_NONE);
|
||||||
|
#endif
|
||||||
|
return simd_supported;
|
||||||
|
}
|
||||||
|
|
||||||
|
#ifndef JSIMD_MASKFUNC_NOT_SUPPORTED
|
||||||
|
|
||||||
|
/*
|
||||||
|
* SIMD Ext: modify/retrieve SIMD instruction mask
|
||||||
|
*/
|
||||||
|
|
||||||
|
GLOBAL(unsigned int)
|
||||||
|
jpeg_simd_mask (j_common_ptr cinfo, unsigned int remove, unsigned int add)
|
||||||
|
{
|
||||||
|
unsigned long *gp;
|
||||||
|
unsigned int oldmask;
|
||||||
|
|
||||||
|
if (cinfo->is_decompressor)
|
||||||
|
gp = (unsigned long *) &((j_decompress_ptr) cinfo)->output_gamma;
|
||||||
|
else /* compressor */
|
||||||
|
gp = (unsigned long *) &((j_compress_ptr) cinfo)->input_gamma;
|
||||||
|
|
||||||
|
if ((gp[1] == 0x3FF00000 || gp[1] == 0x00000000) && /* +1.0 or +0.0 */
|
||||||
|
(gp[0] & ~JSIMD_ALL) == 0) {
|
||||||
|
oldmask = gp[0];
|
||||||
|
if (((remove | add) & ~JSIMD_ALL) == 0)
|
||||||
|
gp[0] = (oldmask & ~remove) | add;
|
||||||
|
} else {
|
||||||
|
oldmask = 0; /* error */
|
||||||
|
}
|
||||||
|
return oldmask;
|
||||||
|
}
|
||||||
|
|
||||||
|
#endif /* !JSIMD_MASKFUNC_NOT_SUPPORTED */
|
||||||
|
|||||||
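The mask handling added to jcomapi.c boils down to two bit operations: jpeg_simd_mask clears the `remove` bits and sets the `add` bits, returning the previous mask, and jpeg_simd_support reports the detected flags with the masked bits turned off. Below is a minimal C sketch of just that logic. The flag values and the `sk_` names are illustrative assumptions; the real JSIMD_* constants are not shown in this hunk. The patch appears to keep the mask inside the storage of the gamma field, which depends on the layout of a double; the sketch uses an ordinary variable instead.

```c
/* Illustrative flag values only (assumption); the real JSIMD_* constants
 * are defined elsewhere in the patch. */
enum {
    SK_NONE  = 0x00,
    SK_MMX   = 0x01,
    SK_3DNOW = 0x02,
    SK_SSE   = 0x04,
    SK_SSE2  = 0x08,
    SK_ALL   = SK_MMX | SK_3DNOW | SK_SSE | SK_SSE2
};

/* Clear the bits in `remove`, set the bits in `add`, and return the
 * previous mask. If either argument carries bits outside SK_ALL, the
 * mask is left unchanged and the previous value is still returned. */
static unsigned int sk_update_mask(unsigned int *mask,
                                   unsigned int remove, unsigned int add)
{
    unsigned int oldmask = *mask;

    if (((remove | add) & ~(unsigned int) SK_ALL) == 0)
        *mask = (oldmask & ~remove) | add;
    return oldmask;
}

/* Flags reported to the caller: detected support minus masked-off bits. */
static unsigned int sk_effective_support(unsigned int detected,
                                         unsigned int mask)
{
    return detected & ~mask;
}
```

Passing SK_NONE for both `remove` and `add` reads the mask without changing it, which is how the support query uses the mask function in the hunk above.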
48
jconfig.bc5
Normal file
@@ -0,0 +1,48 @@
/* jconfig.bc5 --- jconfig.h for Borland C++ Compiler 5.5 (win32) */
/* see jconfig.doc for explanations */

#define HAVE_PROTOTYPES
#define HAVE_UNSIGNED_CHAR
#define HAVE_UNSIGNED_SHORT
/* #define void char */
/* #define const */
#undef CHAR_IS_UNSIGNED
#define HAVE_STDDEF_H
#define HAVE_STDLIB_H
#undef NEED_BSD_STRINGS
#undef NEED_SYS_TYPES_H
#undef NEED_FAR_POINTERS /* we presume a 32-bit flat memory model */
#undef NEED_SHORT_EXTERNAL_NAMES
#undef INCOMPLETE_TYPES_BROKEN /* this assumes you have -w-stu in CFLAGS */

/* Define "boolean" as unsigned char, not int, per Windows custom */
#define TYPEDEF_UCHAR_BOOLEAN

#ifdef JPEG_INTERNALS

#undef RIGHT_SHIFT_IS_UNSIGNED

#endif /* JPEG_INTERNALS */

#if defined(JPEG_INTERNALS) || defined(JPEG_INTERNAL_OPTIONS)
#undef JSIMD_MMX_NOT_SUPPORTED
#undef JSIMD_3DNOW_NOT_SUPPORTED
#undef JSIMD_SSE_NOT_SUPPORTED
#undef JSIMD_SSE2_NOT_SUPPORTED
#endif

#ifdef JPEG_CJPEG_DJPEG

#define BMP_SUPPORTED /* BMP image file format */
#define GIF_SUPPORTED /* GIF image file format */
#define PPM_SUPPORTED /* PBMPLUS PPM/PGM image file format */
#undef RLE_SUPPORTED /* Utah RLE image file format */
#define TARGA_SUPPORTED /* Targa image file format */

#define TWO_FILE_COMMANDLINE
#define USE_SETMODE /* Borland has setmode() */
#undef NEED_SIGNAL_CATCHER /* Define this if you use jmemname.c */
#undef DONT_USE_B_MODE
#undef PROGRESS_REPORT /* optional */

#endif /* JPEG_CJPEG_DJPEG */
12
jconfig.cfg
@@ -16,6 +16,9 @@
|
|||||||
/* Define this if you get warnings about undefined structures. */
|
/* Define this if you get warnings about undefined structures. */
|
||||||
#undef INCOMPLETE_TYPES_BROKEN
|
#undef INCOMPLETE_TYPES_BROKEN
|
||||||
|
|
||||||
|
/* Define "boolean" as unsigned char, not int, per Windows custom */
|
||||||
|
#undef TYPEDEF_UCHAR_BOOLEAN
|
||||||
|
|
||||||
#ifdef JPEG_INTERNALS
|
#ifdef JPEG_INTERNALS
|
||||||
|
|
||||||
#undef RIGHT_SHIFT_IS_UNSIGNED
|
#undef RIGHT_SHIFT_IS_UNSIGNED
|
||||||
@@ -26,6 +29,13 @@
|
|||||||
|
|
||||||
#endif /* JPEG_INTERNALS */
|
#endif /* JPEG_INTERNALS */
|
||||||
|
|
||||||
|
#if defined(JPEG_INTERNALS) || defined(JPEG_INTERNAL_OPTIONS)
|
||||||
|
#undef JSIMD_MMX_NOT_SUPPORTED
|
||||||
|
#undef JSIMD_3DNOW_NOT_SUPPORTED
|
||||||
|
#undef JSIMD_SSE_NOT_SUPPORTED
|
||||||
|
#undef JSIMD_SSE2_NOT_SUPPORTED
|
||||||
|
#endif
|
||||||
|
|
||||||
#ifdef JPEG_CJPEG_DJPEG
|
#ifdef JPEG_CJPEG_DJPEG
|
||||||
|
|
||||||
#define BMP_SUPPORTED /* BMP image file format */
|
#define BMP_SUPPORTED /* BMP image file format */
|
||||||
@@ -35,6 +45,8 @@
|
|||||||
#define TARGA_SUPPORTED /* Targa image file format */
|
#define TARGA_SUPPORTED /* Targa image file format */
|
||||||
|
|
||||||
#undef TWO_FILE_COMMANDLINE
|
#undef TWO_FILE_COMMANDLINE
|
||||||
|
#undef USE_SETMODE
|
||||||
|
#undef USE_FDOPEN
|
||||||
#undef NEED_SIGNAL_CATCHER
|
#undef NEED_SIGNAL_CATCHER
|
||||||
#undef DONT_USE_B_MODE
|
#undef DONT_USE_B_MODE
|
||||||
|
|
||||||
|
|||||||
@@ -21,6 +21,13 @@
|
|||||||
|
|
||||||
#endif /* JPEG_INTERNALS */
|
#endif /* JPEG_INTERNALS */
|
||||||
|
|
||||||
|
#if defined(JPEG_INTERNALS) || defined(JPEG_INTERNAL_OPTIONS)
|
||||||
|
#undef JSIMD_MMX_NOT_SUPPORTED
|
||||||
|
#undef JSIMD_3DNOW_NOT_SUPPORTED
|
||||||
|
#undef JSIMD_SSE_NOT_SUPPORTED
|
||||||
|
#undef JSIMD_SSE2_NOT_SUPPORTED
|
||||||
|
#endif
|
||||||
|
|
||||||
#ifdef JPEG_CJPEG_DJPEG
|
#ifdef JPEG_CJPEG_DJPEG
|
||||||
|
|
||||||
#define BMP_SUPPORTED /* BMP image file format */
|
#define BMP_SUPPORTED /* BMP image file format */
|
||||||
@@ -35,4 +42,6 @@
|
|||||||
#undef DONT_USE_B_MODE
|
#undef DONT_USE_B_MODE
|
||||||
#undef PROGRESS_REPORT /* optional */
|
#undef PROGRESS_REPORT /* optional */
|
||||||
|
|
||||||
|
#define FREE_MEM_ESTIMATE 0 /* for alternate cjpeg/djpeg */
|
||||||
|
|
||||||
#endif /* JPEG_CJPEG_DJPEG */
|
#endif /* JPEG_CJPEG_DJPEG */
|
||||||
|
|||||||
44
jconfig.linux
Normal file
@@ -0,0 +1,44 @@
/* jconfig.linux --- jconfig.h for Linux ELF with gcc */
/* see jconfig.doc for explanations */

#define HAVE_PROTOTYPES
#define HAVE_UNSIGNED_CHAR
#define HAVE_UNSIGNED_SHORT
/* #define void char */
/* #define const */
#undef CHAR_IS_UNSIGNED
#define HAVE_STDDEF_H
#define HAVE_STDLIB_H
#undef NEED_BSD_STRINGS
#undef NEED_SYS_TYPES_H
#undef NEED_FAR_POINTERS
#undef NEED_SHORT_EXTERNAL_NAMES
#undef INCOMPLETE_TYPES_BROKEN

#ifdef JPEG_INTERNALS

#undef RIGHT_SHIFT_IS_UNSIGNED

#endif /* JPEG_INTERNALS */

#if defined(JPEG_INTERNALS) || defined(JPEG_INTERNAL_OPTIONS)
#undef JSIMD_MMX_NOT_SUPPORTED
#undef JSIMD_3DNOW_NOT_SUPPORTED
#undef JSIMD_SSE_NOT_SUPPORTED
#undef JSIMD_SSE2_NOT_SUPPORTED
#endif

#ifdef JPEG_CJPEG_DJPEG

#define BMP_SUPPORTED /* BMP image file format */
#define GIF_SUPPORTED /* GIF image file format */
#define PPM_SUPPORTED /* PBMPLUS PPM/PGM image file format */
#undef RLE_SUPPORTED /* Utah RLE image file format */
#define TARGA_SUPPORTED /* Targa image file format */

#undef TWO_FILE_COMMANDLINE
#undef NEED_SIGNAL_CATCHER /* Define this if you use jmemname.c */
#undef DONT_USE_B_MODE
#undef PROGRESS_REPORT /* optional */

#endif /* JPEG_CJPEG_DJPEG */
48
jconfig.mgw
Normal file
@@ -0,0 +1,48 @@
/* jconfig.mgw --- jconfig.h for MinGW */
/* see jconfig.doc for explanations */

#define HAVE_PROTOTYPES
#define HAVE_UNSIGNED_CHAR
#define HAVE_UNSIGNED_SHORT
/* #define void char */
/* #define const */
#undef CHAR_IS_UNSIGNED
#define HAVE_STDDEF_H
#define HAVE_STDLIB_H
#undef NEED_BSD_STRINGS
#undef NEED_SYS_TYPES_H
#undef NEED_FAR_POINTERS
#undef NEED_SHORT_EXTERNAL_NAMES
#undef INCOMPLETE_TYPES_BROKEN

/* Define "boolean" as unsigned char, not int, per Windows custom */
#define TYPEDEF_UCHAR_BOOLEAN

#ifdef JPEG_INTERNALS

#undef RIGHT_SHIFT_IS_UNSIGNED

#endif /* JPEG_INTERNALS */

#if defined(JPEG_INTERNALS) || defined(JPEG_INTERNAL_OPTIONS)
#undef JSIMD_MMX_NOT_SUPPORTED
#undef JSIMD_3DNOW_NOT_SUPPORTED
#undef JSIMD_SSE_NOT_SUPPORTED
#undef JSIMD_SSE2_NOT_SUPPORTED
#endif

#ifdef JPEG_CJPEG_DJPEG

#define BMP_SUPPORTED /* BMP image file format */
#define GIF_SUPPORTED /* GIF image file format */
#define PPM_SUPPORTED /* PBMPLUS PPM/PGM image file format */
#undef RLE_SUPPORTED /* Utah RLE image file format */
#define TARGA_SUPPORTED /* Targa image file format */

#define TWO_FILE_COMMANDLINE /* optional */
#define USE_SETMODE /* MinGW has setmode() */
#undef NEED_SIGNAL_CATCHER /* Define this if you use jmemname.c */
#undef DONT_USE_B_MODE
#undef PROGRESS_REPORT /* optional */

#endif /* JPEG_CJPEG_DJPEG */
48
jconfig.vc
Normal file
@@ -0,0 +1,48 @@
/* jconfig.vc --- jconfig.h for Microsoft Visual C++ on Windows 95 or NT. */
/* see jconfig.doc for explanations */

#define HAVE_PROTOTYPES
#define HAVE_UNSIGNED_CHAR
#define HAVE_UNSIGNED_SHORT
/* #define void char */
/* #define const */
#undef CHAR_IS_UNSIGNED
#define HAVE_STDDEF_H
#define HAVE_STDLIB_H
#undef NEED_BSD_STRINGS
#undef NEED_SYS_TYPES_H
#undef NEED_FAR_POINTERS /* we presume a 32-bit flat memory model */
#undef NEED_SHORT_EXTERNAL_NAMES
#undef INCOMPLETE_TYPES_BROKEN

/* Define "boolean" as unsigned char, not int, per Windows custom */
#define TYPEDEF_UCHAR_BOOLEAN

#ifdef JPEG_INTERNALS

#undef RIGHT_SHIFT_IS_UNSIGNED

#endif /* JPEG_INTERNALS */

#if defined(JPEG_INTERNALS) || defined(JPEG_INTERNAL_OPTIONS)
#undef JSIMD_MMX_NOT_SUPPORTED
#undef JSIMD_3DNOW_NOT_SUPPORTED
#undef JSIMD_SSE_NOT_SUPPORTED
#undef JSIMD_SSE2_NOT_SUPPORTED
#endif

#ifdef JPEG_CJPEG_DJPEG

#define BMP_SUPPORTED /* BMP image file format */
#define GIF_SUPPORTED /* GIF image file format */
#define PPM_SUPPORTED /* PBMPLUS PPM/PGM image file format */
#undef RLE_SUPPORTED /* Utah RLE image file format */
#define TARGA_SUPPORTED /* Targa image file format */

#define TWO_FILE_COMMANDLINE /* optional */
#define USE_SETMODE /* Microsoft has setmode() */
#undef NEED_SIGNAL_CATCHER
#undef DONT_USE_B_MODE
#undef PROGRESS_REPORT /* optional */

#endif /* JPEG_CJPEG_DJPEG */
50
jcparam.c
@@ -1,7 +1,7 @@
|
|||||||
/*
|
/*
|
||||||
* jcparam.c
|
* jcparam.c
|
||||||
*
|
*
|
||||||
* Copyright (C) 1991-1996, Thomas G. Lane.
|
* Copyright (C) 1991-1998, Thomas G. Lane.
|
||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
@@ -29,7 +29,7 @@ jpeg_add_quant_table (j_compress_ptr cinfo, int which_tbl,
|
|||||||
* are limited to 1..255 for JPEG baseline compatibility.
|
* are limited to 1..255 for JPEG baseline compatibility.
|
||||||
*/
|
*/
|
||||||
{
|
{
|
||||||
JQUANT_TBL ** qtblptr = & cinfo->quant_tbl_ptrs[which_tbl];
|
JQUANT_TBL ** qtblptr;
|
||||||
int i;
|
int i;
|
||||||
long temp;
|
long temp;
|
||||||
|
|
||||||
@@ -37,6 +37,11 @@ jpeg_add_quant_table (j_compress_ptr cinfo, int which_tbl,
|
|||||||
if (cinfo->global_state != CSTATE_START)
|
if (cinfo->global_state != CSTATE_START)
|
||||||
ERREXIT1(cinfo, JERR_BAD_STATE, cinfo->global_state);
|
ERREXIT1(cinfo, JERR_BAD_STATE, cinfo->global_state);
|
||||||
|
|
||||||
|
if (which_tbl < 0 || which_tbl >= NUM_QUANT_TBLS)
|
||||||
|
ERREXIT1(cinfo, JERR_DQT_INDEX, which_tbl);
|
||||||
|
|
||||||
|
qtblptr = & cinfo->quant_tbl_ptrs[which_tbl];
|
||||||
|
|
||||||
if (*qtblptr == NULL)
|
if (*qtblptr == NULL)
|
||||||
*qtblptr = jpeg_alloc_quant_table((j_common_ptr) cinfo);
|
*qtblptr = jpeg_alloc_quant_table((j_common_ptr) cinfo);
|
||||||
|
|
||||||
@@ -148,11 +153,25 @@ add_huff_table (j_compress_ptr cinfo,
|
|||||||
JHUFF_TBL **htblptr, const UINT8 *bits, const UINT8 *val)
|
JHUFF_TBL **htblptr, const UINT8 *bits, const UINT8 *val)
|
||||||
/* Define a Huffman table */
|
/* Define a Huffman table */
|
||||||
{
|
{
|
||||||
|
int nsymbols, len;
|
||||||
|
|
||||||
if (*htblptr == NULL)
|
if (*htblptr == NULL)
|
||||||
*htblptr = jpeg_alloc_huff_table((j_common_ptr) cinfo);
|
*htblptr = jpeg_alloc_huff_table((j_common_ptr) cinfo);
|
||||||
|
|
||||||
|
/* Copy the number-of-symbols-of-each-code-length counts */
|
||||||
MEMCOPY((*htblptr)->bits, bits, SIZEOF((*htblptr)->bits));
|
MEMCOPY((*htblptr)->bits, bits, SIZEOF((*htblptr)->bits));
|
||||||
MEMCOPY((*htblptr)->huffval, val, SIZEOF((*htblptr)->huffval));
|
|
||||||
|
/* Validate the counts. We do this here mainly so we can copy the right
|
||||||
|
* number of symbols from the val[] array, without risking marching off
|
||||||
|
* the end of memory. jchuff.c will do a more thorough test later.
|
||||||
|
*/
|
||||||
|
nsymbols = 0;
|
||||||
|
for (len = 1; len <= 16; len++)
|
||||||
|
nsymbols += bits[len];
|
||||||
|
if (nsymbols < 1 || nsymbols > 256)
|
||||||
|
ERREXIT(cinfo, JERR_BAD_HUFF_TABLE);
|
||||||
|
|
||||||
|
MEMCOPY((*htblptr)->huffval, val, nsymbols * SIZEOF(UINT8));
|
||||||
|
|
||||||
/* Initialize sent_table FALSE so table will be written to JPEG file. */
|
/* Initialize sent_table FALSE so table will be written to JPEG file. */
|
||||||
(*htblptr)->sent_table = FALSE;
|
(*htblptr)->sent_table = FALSE;
|
||||||
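The add_huff_table change above replaces a fixed-size copy of val[] with a copy sized by the symbol counts, after checking that the counts add up to something between 1 and 256. A minimal standalone C sketch of that idea follows. The function names are hypothetical, and the sketch returns -1 where the library would raise JERR_BAD_HUFF_TABLE through ERREXIT.

```c
#include <string.h>

/* Sum the per-code-length symbol counts bits[1..16] (bits[0] is unused).
 * Returns the total, or -1 when it falls outside the range 1..256. */
static int huff_symbol_count(const unsigned char bits[17])
{
    int nsymbols = 0;
    int len;

    for (len = 1; len <= 16; len++)
        nsymbols += bits[len];
    if (nsymbols < 1 || nsymbols > 256)
        return -1;
    return nsymbols;
}

/* Copy only as many symbol values as the counts describe, so a short
 * val[] array is never read past its end. Returns the number of bytes
 * copied, or -1 (dst untouched) when the counts are invalid. */
static int copy_huff_values(unsigned char dst[256],
                            const unsigned char bits[17],
                            const unsigned char *val)
{
    int n = huff_symbol_count(bits);

    if (n < 0)
        return -1;
    memcpy(dst, val, (size_t) n);
    return n;
}
```

For a table with twelve symbols, only twelve bytes are read from val[], which is the point of the change: the caller's array may be shorter than 256 entries.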
@@ -313,7 +332,15 @@ jpeg_set_defaults (j_compress_ptr cinfo)
|
|||||||
|
|
||||||
/* Fill in default JFIF marker parameters. Note that whether the marker
|
/* Fill in default JFIF marker parameters. Note that whether the marker
|
||||||
* will actually be written is determined by jpeg_set_colorspace.
|
* will actually be written is determined by jpeg_set_colorspace.
|
||||||
|
*
|
||||||
|
* By default, the library emits JFIF version code 1.01.
|
||||||
|
* An application that wants to emit JFIF 1.02 extension markers should set
|
||||||
|
* JFIF_minor_version to 2. We could probably get away with just defaulting
|
||||||
|
* to 1.02, but there may still be some decoders in use that will complain
|
||||||
|
* about that; saying 1.01 should minimize compatibility problems.
|
||||||
*/
|
*/
|
||||||
|
cinfo->JFIF_major_version = 1; /* Default JFIF version = 1.01 */
|
||||||
|
cinfo->JFIF_minor_version = 1;
|
||||||
cinfo->density_unit = 0; /* Pixel size is unknown by default */
|
cinfo->density_unit = 0; /* Pixel size is unknown by default */
|
||||||
cinfo->X_density = 1; /* Pixel aspect ratio is square by default */
|
cinfo->X_density = 1; /* Pixel aspect ratio is square by default */
|
||||||
cinfo->Y_density = 1;
|
cinfo->Y_density = 1;
|
||||||
@@ -529,11 +556,20 @@ jpeg_simple_progression (j_compress_ptr cinfo)
|
|||||||
nscans = 2 + 4 * ncomps; /* 2 DC scans; 4 AC scans per component */
|
nscans = 2 + 4 * ncomps; /* 2 DC scans; 4 AC scans per component */
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Allocate space for script. */
|
/* Allocate space for script.
|
||||||
/* We use permanent pool just in case application re-uses script. */
|
* We need to put it in the permanent pool in case the application performs
|
||||||
scanptr = (jpeg_scan_info *)
|
* multiple compressions without changing the settings. To avoid a memory
|
||||||
|
* leak if jpeg_simple_progression is called repeatedly for the same JPEG
|
||||||
|
* object, we try to re-use previously allocated space, and we allocate
|
||||||
|
* enough space to handle YCbCr even if initially asked for grayscale.
|
||||||
|
*/
|
||||||
|
if (cinfo->script_space == NULL || cinfo->script_space_size < nscans) {
|
||||||
|
cinfo->script_space_size = MAX(nscans, 10);
|
||||||
|
cinfo->script_space = (jpeg_scan_info *)
|
||||||
(*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_PERMANENT,
|
(*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_PERMANENT,
|
||||||
nscans * SIZEOF(jpeg_scan_info));
|
cinfo->script_space_size * SIZEOF(jpeg_scan_info));
|
||||||
|
}
|
||||||
|
scanptr = cinfo->script_space;
|
||||||
cinfo->scan_info = scanptr;
|
cinfo->scan_info = scanptr;
|
||||||
cinfo->num_scans = nscans;
|
cinfo->num_scans = nscans;
|
||||||
|
|
||||||
|
|||||||
34
jcphuff.c
@@ -1,7 +1,7 @@
|
|||||||
/*
|
/*
|
||||||
* jcphuff.c
|
* jcphuff.c
|
||||||
*
|
*
|
||||||
* Copyright (C) 1995-1996, Thomas G. Lane.
|
* Copyright (C) 1995-1997, Thomas G. Lane.
|
||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
@@ -147,22 +147,19 @@ start_pass_phuff (j_compress_ptr cinfo, boolean gather_statistics)
|
|||||||
compptr = cinfo->cur_comp_info[ci];
|
compptr = cinfo->cur_comp_info[ci];
|
||||||
/* Initialize DC predictions to 0 */
|
/* Initialize DC predictions to 0 */
|
||||||
entropy->last_dc_val[ci] = 0;
|
entropy->last_dc_val[ci] = 0;
|
||||||
/* Make sure requested tables are present */
|
/* Get table index */
|
||||||
/* (In gather mode, tables need not be allocated yet) */
|
|
||||||
if (is_DC_band) {
|
if (is_DC_band) {
|
||||||
if (cinfo->Ah != 0) /* DC refinement needs no table */
|
if (cinfo->Ah != 0) /* DC refinement needs no table */
|
||||||
continue;
|
continue;
|
||||||
tbl = compptr->dc_tbl_no;
|
tbl = compptr->dc_tbl_no;
|
||||||
if (tbl < 0 || tbl >= NUM_HUFF_TBLS ||
|
|
||||||
(cinfo->dc_huff_tbl_ptrs[tbl] == NULL && !gather_statistics))
|
|
||||||
ERREXIT1(cinfo,JERR_NO_HUFF_TABLE, tbl);
|
|
||||||
} else {
|
} else {
|
||||||
entropy->ac_tbl_no = tbl = compptr->ac_tbl_no;
|
entropy->ac_tbl_no = tbl = compptr->ac_tbl_no;
|
||||||
if (tbl < 0 || tbl >= NUM_HUFF_TBLS ||
|
|
||||||
(cinfo->ac_huff_tbl_ptrs[tbl] == NULL && !gather_statistics))
|
|
||||||
ERREXIT1(cinfo,JERR_NO_HUFF_TABLE, tbl);
|
|
||||||
}
|
}
|
||||||
if (gather_statistics) {
|
if (gather_statistics) {
|
||||||
|
/* Check for invalid table index */
|
||||||
|
/* (make_c_derived_tbl does this in the other path) */
|
||||||
|
if (tbl < 0 || tbl >= NUM_HUFF_TBLS)
|
||||||
|
ERREXIT1(cinfo, JERR_NO_HUFF_TABLE, tbl);
|
||||||
/* Allocate and zero the statistics tables */
|
/* Allocate and zero the statistics tables */
|
||||||
/* Note that jpeg_gen_optimal_table expects 257 entries in each table! */
|
/* Note that jpeg_gen_optimal_table expects 257 entries in each table! */
|
||||||
if (entropy->count_ptrs[tbl] == NULL)
|
if (entropy->count_ptrs[tbl] == NULL)
|
||||||
@@ -171,13 +168,9 @@ start_pass_phuff (j_compress_ptr cinfo, boolean gather_statistics)
|
|||||||
257 * SIZEOF(long));
|
257 * SIZEOF(long));
|
||||||
MEMZERO(entropy->count_ptrs[tbl], 257 * SIZEOF(long));
|
MEMZERO(entropy->count_ptrs[tbl], 257 * SIZEOF(long));
|
||||||
} else {
|
} else {
|
||||||
/* Compute derived values for Huffman tables */
|
/* Compute derived values for Huffman table */
|
||||||
/* We may do this more than once for a table, but it's not expensive */
|
/* We may do this more than once for a table, but it's not expensive */
|
||||||
if (is_DC_band)
|
jpeg_make_c_derived_tbl(cinfo, is_DC_band, tbl,
|
||||||
jpeg_make_c_derived_tbl(cinfo, cinfo->dc_huff_tbl_ptrs[tbl],
|
|
||||||
& entropy->derived_tbls[tbl]);
|
|
||||||
else
|
|
||||||
jpeg_make_c_derived_tbl(cinfo, cinfo->ac_huff_tbl_ptrs[tbl],
|
|
||||||
& entropy->derived_tbls[tbl]);
|
& entropy->derived_tbls[tbl]);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -329,6 +322,9 @@ emit_eobrun (phuff_entropy_ptr entropy)
|
|||||||
nbits = 0;
|
nbits = 0;
|
||||||
while ((temp >>= 1))
|
while ((temp >>= 1))
|
||||||
nbits++;
|
nbits++;
|
||||||
|
/* safety check: shouldn't happen given limited correction-bit buffer */
|
||||||
|
if (nbits > 14)
|
||||||
|
ERREXIT(entropy->cinfo, JERR_HUFF_MISSING_CODE);
|
||||||
|
|
||||||
emit_symbol(entropy, entropy->ac_tbl_no, nbits << 4);
|
emit_symbol(entropy, entropy->ac_tbl_no, nbits << 4);
|
||||||
if (nbits)
|
if (nbits)
|
||||||
@@ -427,6 +423,11 @@ encode_mcu_DC_first (j_compress_ptr cinfo, JBLOCKROW *MCU_data)
|
|||||||
nbits++;
|
nbits++;
|
||||||
temp >>= 1;
|
temp >>= 1;
|
||||||
}
|
}
|
||||||
|
/* Check for out-of-range coefficient values.
|
||||||
|
* Since we're encoding a difference, the range limit is twice as much.
|
||||||
|
*/
|
||||||
|
if (nbits > MAX_COEF_BITS+1)
|
||||||
|
ERREXIT(cinfo, JERR_BAD_DCT_COEF);
|
||||||
|
|
||||||
/* Count/emit the Huffman-coded symbol for the number of bits */
|
/* Count/emit the Huffman-coded symbol for the number of bits */
|
||||||
emit_symbol(entropy, compptr->dc_tbl_no, nbits);
|
emit_symbol(entropy, compptr->dc_tbl_no, nbits);
|
||||||
@@ -523,6 +524,9 @@ encode_mcu_AC_first (j_compress_ptr cinfo, JBLOCKROW *MCU_data)
|
|||||||
nbits = 1; /* there must be at least one 1 bit */
|
nbits = 1; /* there must be at least one 1 bit */
|
||||||
while ((temp >>= 1))
|
while ((temp >>= 1))
|
||||||
nbits++;
|
nbits++;
|
||||||
|
/* Check for out-of-range coefficient values */
|
||||||
|
if (nbits > MAX_COEF_BITS)
|
||||||
|
ERREXIT(cinfo, JERR_BAD_DCT_COEF);
|
||||||
|
|
||||||
/* Count/emit Huffman symbol for run length / number of bits */
|
/* Count/emit Huffman symbol for run length / number of bits */
|
||||||
emit_symbol(entropy, entropy->ac_tbl_no, (r << 4) + nbits);
|
emit_symbol(entropy, entropy->ac_tbl_no, (r << 4) + nbits);
|
||||||
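The range checks added to jcphuff.c above all rest on the same bit-counting loop. The following is a minimal sketch of that idea in isolation, not the library's code: `coef_nbits` is a hypothetical helper, and the value 10 for `MAX_COEF_BITS` is an assumption that corresponds to 8-bit sample precision.

```c
#include <stdlib.h>

/* Assumption: limit used for 8-bit sample precision. */
#define MAX_COEF_BITS 10

/*
 * Hypothetical helper: count the bits needed to represent |coef|.
 * Returns -1 when the count exceeds `limit`, which is where the
 * encoder in the diff would raise JERR_BAD_DCT_COEF.
 */
static int coef_nbits(int coef, int limit)
{
  int temp = abs(coef);
  int nbits = 0;

  while (temp) {
    nbits++;
    temp >>= 1;
  }
  return (nbits > limit) ? -1 : nbits;
}
```

A DC difference can span twice the range of a single coefficient, so the DC path would pass `MAX_COEF_BITS + 1` as the limit, while the AC path passes `MAX_COEF_BITS`.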
|
|||||||
240
jcqnt3dn.asm
Normal file
@@ -0,0 +1,240 @@
|
|||||||
|
;
|
||||||
|
; jcqnt3dn.asm - sample data conversion and quantization (3DNow! & MMX)
|
||||||
|
;
|
||||||
|
; x86 SIMD extension for IJG JPEG library
|
||||||
|
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
; For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
;
|
||||||
|
; This file should be assembled with NASM (Netwide Assembler),
|
||||||
|
; can *not* be assembled with Microsoft's MASM or any compatible
|
||||||
|
; assembler (including Borland's Turbo Assembler).
|
||||||
|
; NASM is available from http://nasm.sourceforge.net/ or
|
||||||
|
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||||
|
;
|
||||||
|
; Last Modified : January 23, 2006
|
||||||
|
;
|
||||||
|
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jdct.inc"
|
||||||
|
|
||||||
|
%ifdef DCT_FLOAT_SUPPORTED
|
||||||
|
%ifdef JFDCT_FLT_3DNOW_MMX_SUPPORTED
|
||||||
|
|
||||||
|
; This module is specialized to the case DCTSIZE = 8.
|
||||||
|
;
|
||||||
|
%if DCTSIZE != 8
|
||||||
|
%error "Sorry, this code only copes with 8x8 DCTs."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Load data into workspace, applying unsigned->signed conversion
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_convsamp_flt_3dnow (JSAMPARRAY sample_data, JDIMENSION start_col,
|
||||||
|
; FAST_FLOAT * workspace);
|
||||||
|
;
|
||||||
|
|
||||||
|
%define sample_data ebp+8 ; JSAMPARRAY sample_data
|
||||||
|
%define start_col ebp+12 ; JDIMENSION start_col
|
||||||
|
%define workspace ebp+16 ; FAST_FLOAT * workspace
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_convsamp_flt_3dnow)
|
||||||
|
|
||||||
|
EXTN(jpeg_convsamp_flt_3dnow):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
pcmpeqw mm7,mm7
|
||||||
|
psllw mm7,7
|
||||||
|
packsswb mm7,mm7 ; mm7 = PB_CENTERJSAMPLE (0x808080..)
|
||||||
|
|
||||||
|
mov esi, JSAMPARRAY [sample_data] ; (JSAMPROW *)
|
||||||
|
mov eax, JDIMENSION [start_col]
|
||||||
|
mov edi, POINTER [workspace] ; (FAST_FLOAT *)
|
||||||
|
mov ecx, DCTSIZE/2
|
||||||
|
alignx 16,7
|
||||||
|
.convloop:
|
||||||
|
mov ebx, JSAMPROW [esi+0*SIZEOF_JSAMPROW] ; (JSAMPLE *)
|
||||||
|
mov edx, JSAMPROW [esi+1*SIZEOF_JSAMPROW] ; (JSAMPLE *)
|
||||||
|
|
||||||
|
movq mm0, MMWORD [ebx+eax*SIZEOF_JSAMPLE]
|
||||||
|
movq mm1, MMWORD [edx+eax*SIZEOF_JSAMPLE]
|
||||||
|
|
||||||
|
psubb mm0,mm7 ; mm0=(01234567)
|
||||||
|
psubb mm1,mm7 ; mm1=(89ABCDEF)
|
||||||
|
|
||||||
|
punpcklbw mm2,mm0 ; mm2=(*0*1*2*3)
|
||||||
|
punpckhbw mm0,mm0 ; mm0=(*4*5*6*7)
|
||||||
|
punpcklbw mm3,mm1 ; mm3=(*8*9*A*B)
|
||||||
|
punpckhbw mm1,mm1 ; mm1=(*C*D*E*F)
|
||||||
|
|
||||||
|
punpcklwd mm4,mm2 ; mm4=(***0***1)
|
||||||
|
punpckhwd mm2,mm2 ; mm2=(***2***3)
|
||||||
|
punpcklwd mm5,mm0 ; mm5=(***4***5)
|
||||||
|
punpckhwd mm0,mm0 ; mm0=(***6***7)
|
||||||
|
|
||||||
|
psrad mm4,(DWORD_BIT-BYTE_BIT) ; mm4=(01)
|
||||||
|
psrad mm2,(DWORD_BIT-BYTE_BIT) ; mm2=(23)
|
||||||
|
pi2fd mm4,mm4
|
||||||
|
pi2fd mm2,mm2
|
||||||
|
psrad mm5,(DWORD_BIT-BYTE_BIT) ; mm5=(45)
|
||||||
|
psrad mm0,(DWORD_BIT-BYTE_BIT) ; mm0=(67)
|
||||||
|
pi2fd mm5,mm5
|
||||||
|
pi2fd mm0,mm0
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(0,0,edi,SIZEOF_FAST_FLOAT)], mm4
|
||||||
|
movq MMWORD [MMBLOCK(0,1,edi,SIZEOF_FAST_FLOAT)], mm2
|
||||||
|
movq MMWORD [MMBLOCK(0,2,edi,SIZEOF_FAST_FLOAT)], mm5
|
||||||
|
movq MMWORD [MMBLOCK(0,3,edi,SIZEOF_FAST_FLOAT)], mm0
|
||||||
|
|
||||||
|
punpcklwd mm6,mm3 ; mm6=(***8***9)
|
||||||
|
punpckhwd mm3,mm3 ; mm3=(***A***B)
|
||||||
|
punpcklwd mm4,mm1 ; mm4=(***C***D)
|
||||||
|
punpckhwd mm1,mm1 ; mm1=(***E***F)
|
||||||
|
|
||||||
|
psrad mm6,(DWORD_BIT-BYTE_BIT) ; mm6=(89)
|
||||||
|
psrad mm3,(DWORD_BIT-BYTE_BIT) ; mm3=(AB)
|
||||||
|
pi2fd mm6,mm6
|
||||||
|
pi2fd mm3,mm3
|
||||||
|
psrad mm4,(DWORD_BIT-BYTE_BIT) ; mm4=(CD)
|
||||||
|
psrad mm1,(DWORD_BIT-BYTE_BIT) ; mm1=(EF)
|
||||||
|
pi2fd mm4,mm4
|
||||||
|
pi2fd mm1,mm1
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(1,0,edi,SIZEOF_FAST_FLOAT)], mm6
|
||||||
|
movq MMWORD [MMBLOCK(1,1,edi,SIZEOF_FAST_FLOAT)], mm3
|
||||||
|
movq MMWORD [MMBLOCK(1,2,edi,SIZEOF_FAST_FLOAT)], mm4
|
||||||
|
movq MMWORD [MMBLOCK(1,3,edi,SIZEOF_FAST_FLOAT)], mm1
|
||||||
|
|
||||||
|
add esi, byte 2*SIZEOF_JSAMPROW
|
||||||
|
add edi, byte 2*DCTSIZE*SIZEOF_FAST_FLOAT
|
||||||
|
dec ecx
|
||||||
|
jnz near .convloop
|
||||||
|
|
||||||
|
femms ; empty MMX/3DNow! state
|
||||||
|
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
;
|
||||||
|
; Quantize/descale the coefficients, and store into coef_block
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_quantize_flt_3dnow (JCOEFPTR coef_block, FAST_FLOAT * divisors,
|
||||||
|
; FAST_FLOAT * workspace);
|
||||||
|
;
|
||||||
|
|
||||||
|
%define coef_block ebp+8 ; JCOEFPTR coef_block
|
||||||
|
%define divisors ebp+12 ; FAST_FLOAT * divisors
|
||||||
|
%define workspace ebp+16 ; FAST_FLOAT * workspace
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_quantize_flt_3dnow)
|
||||||
|
|
||||||
|
EXTN(jpeg_quantize_flt_3dnow):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
; push ebx ; unused
|
||||||
|
; push ecx ; unused
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
mov eax, 0x4B400000 ; (float)0x00C00000 (rndint_magic)
|
||||||
|
movd mm7,eax
|
||||||
|
punpckldq mm7,mm7 ; mm7={12582912.0F 12582912.0F}
|
||||||
|
|
||||||
|
mov esi, POINTER [workspace]
|
||||||
|
mov edx, POINTER [divisors]
|
||||||
|
mov edi, JCOEFPTR [coef_block]
|
||||||
|
mov eax, DCTSIZE2/16
|
||||||
|
alignx 16,7
|
||||||
|
.quantloop:
|
||||||
|
movq mm0, MMWORD [MMBLOCK(0,0,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(0,1,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
pfmul mm0, MMWORD [MMBLOCK(0,0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
pfmul mm1, MMWORD [MMBLOCK(0,1,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
movq mm2, MMWORD [MMBLOCK(0,2,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(0,3,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
pfmul mm2, MMWORD [MMBLOCK(0,2,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
pfmul mm3, MMWORD [MMBLOCK(0,3,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
|
||||||
|
pfadd mm0,mm7 ; mm0=(00 ** 01 **)
|
||||||
|
pfadd mm1,mm7 ; mm1=(02 ** 03 **)
|
||||||
|
pfadd mm2,mm7 ; mm2=(04 ** 05 **)
|
||||||
|
pfadd mm3,mm7 ; mm3=(06 ** 07 **)
|
||||||
|
|
||||||
|
movq mm4,mm0
|
||||||
|
punpcklwd mm0,mm1 ; mm0=(00 02 ** **)
|
||||||
|
punpckhwd mm4,mm1 ; mm4=(01 03 ** **)
|
||||||
|
movq mm5,mm2
|
||||||
|
punpcklwd mm2,mm3 ; mm2=(04 06 ** **)
|
||||||
|
punpckhwd mm5,mm3 ; mm5=(05 07 ** **)
|
||||||
|
|
||||||
|
punpcklwd mm0,mm4 ; mm0=(00 01 02 03)
|
||||||
|
punpcklwd mm2,mm5 ; mm2=(04 05 06 07)
|
||||||
|
|
||||||
|
movq mm6, MMWORD [MMBLOCK(1,0,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(1,1,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
pfmul mm6, MMWORD [MMBLOCK(1,0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
pfmul mm1, MMWORD [MMBLOCK(1,1,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(1,2,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
movq mm4, MMWORD [MMBLOCK(1,3,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
pfmul mm3, MMWORD [MMBLOCK(1,2,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
pfmul mm4, MMWORD [MMBLOCK(1,3,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
|
||||||
|
pfadd mm6,mm7 ; mm6=(10 ** 11 **)
|
||||||
|
pfadd mm1,mm7 ; mm1=(12 ** 13 **)
|
||||||
|
pfadd mm3,mm7 ; mm3=(14 ** 15 **)
|
||||||
|
pfadd mm4,mm7 ; mm4=(16 ** 17 **)
|
||||||
|
|
||||||
|
movq mm5,mm6
|
||||||
|
punpcklwd mm6,mm1 ; mm6=(10 12 ** **)
|
||||||
|
punpckhwd mm5,mm1 ; mm5=(11 13 ** **)
|
||||||
|
movq mm1,mm3
|
||||||
|
punpcklwd mm3,mm4 ; mm3=(14 16 ** **)
|
||||||
|
punpckhwd mm1,mm4 ; mm1=(15 17 ** **)
|
||||||
|
|
||||||
|
punpcklwd mm6,mm5 ; mm6=(10 11 12 13)
|
||||||
|
punpcklwd mm3,mm1 ; mm3=(14 15 16 17)
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(0,0,edi,SIZEOF_JCOEF)], mm0
|
||||||
|
movq MMWORD [MMBLOCK(0,1,edi,SIZEOF_JCOEF)], mm2
|
||||||
|
movq MMWORD [MMBLOCK(1,0,edi,SIZEOF_JCOEF)], mm6
|
||||||
|
movq MMWORD [MMBLOCK(1,1,edi,SIZEOF_JCOEF)], mm3
|
||||||
|
|
||||||
|
add esi, byte 16*SIZEOF_FAST_FLOAT
|
||||||
|
add edx, byte 16*SIZEOF_FAST_FLOAT
|
||||||
|
add edi, byte 16*SIZEOF_JCOEF
|
||||||
|
dec eax
|
||||||
|
jnz near .quantloop
|
||||||
|
|
||||||
|
femms ; empty MMX/3DNow! state
|
||||||
|
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; unused
|
||||||
|
; pop ebx ; unused
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; JFDCT_FLT_3DNOW_MMX_SUPPORTED
|
||||||
|
%endif ; DCT_FLOAT_SUPPORTED
|
||||||
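jpeg_quantize_flt_3dnow above rounds by adding the constant 0x4B400000 (12582912.0F) and keeping only the low 16 bits of each result. The following C sketch shows why that works. It is an illustration, not part of the library: `round_magic` and `quantize_flt_sketch` are hypothetical names, and the sketch assumes IEEE 754 single precision with the default round-to-nearest mode. Because the assembly multiplies by the `divisors` entries, the sketch assumes that table holds reciprocals.

```c
#include <stdint.h>
#include <string.h>

/*
 * Adding 1.5 * 2^23 pushes the sum into [2^23, 2^24), where floats have
 * a spacing of exactly 1, so the hardware rounds x to an integer and
 * stores it in the low mantissa bits. The low 16 bits of the bit
 * pattern are then the two's complement result. Valid for |x| < 32768.
 */
static int16_t round_magic(float x)
{
  volatile float sum = x + 12582912.0f;   /* bit pattern 0x4B400000 */
  float s = sum;                          /* force single precision */
  uint32_t bits;
  uint16_t lo;
  int16_t result;

  memcpy(&bits, &s, sizeof bits);
  lo = (uint16_t) (bits & 0xFFFFu);
  memcpy(&result, &lo, sizeof result);
  return result;
}

/* Hypothetical scalar counterpart of the quantization loop. */
static void quantize_flt_sketch(int16_t *coef_block, const float *divisors,
                                const float *workspace)
{
  int i;

  for (i = 0; i < 64; i++)
    coef_block[i] = round_magic(workspace[i] * divisors[i]);
}
```

Ties round to the nearest even integer under the default rounding mode, so an input of 2.5 yields 2 and 3.5 yields 4.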
202
jcqntflt.asm
Normal file
@@ -0,0 +1,202 @@
|
|||||||
|
;
|
||||||
|
; jcqntflt.asm - sample data conversion and quantization (non-SIMD, FP)
|
||||||
|
;
|
||||||
|
; x86 SIMD extension for IJG JPEG library
|
||||||
|
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
; For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
;
|
||||||
|
; This file should be assembled with NASM (Netwide Assembler),
|
||||||
|
; can *not* be assembled with Microsoft's MASM or any compatible
|
||||||
|
; assembler (including Borland's Turbo Assembler).
|
||||||
|
; NASM is available from http://nasm.sourceforge.net/ or
|
||||||
|
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||||
|
;
|
||||||
|
; Last Modified : March 21, 2004
|
||||||
|
;
|
||||||
|
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jdct.inc"
|
||||||
|
|
||||||
|
%ifdef DCT_FLOAT_SUPPORTED
|
||||||
|
|
||||||
|
; This module is specialized to the case DCTSIZE = 8.
|
||||||
|
;
|
||||||
|
%if DCTSIZE != 8
|
||||||
|
%error "Sorry, this code only copes with 8x8 DCTs."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Load data into workspace, applying unsigned->signed conversion
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_convsamp_float (JSAMPARRAY sample_data, JDIMENSION start_col,
|
||||||
|
; FAST_FLOAT * workspace);
|
||||||
|
;
|
||||||
|
|
||||||
|
%define sample_data ebp+8 ; JSAMPARRAY sample_data
|
||||||
|
%define start_col ebp+12 ; JDIMENSION start_col
|
||||||
|
%define workspace ebp+16 ; FAST_FLOAT * workspace
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_convsamp_float)
|
||||||
|
|
||||||
|
EXTN(jpeg_convsamp_float):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
mov esi, JSAMPARRAY [sample_data] ; (JSAMPROW *)
|
||||||
|
mov edi, POINTER [workspace] ; (FAST_FLOAT *)
|
||||||
|
mov ecx, DCTSIZE
|
||||||
|
alignx 16,7
|
||||||
|
.convloop:
|
||||||
|
mov ebx, JSAMPROW [esi] ; (JSAMPLE *)
|
||||||
|
add ebx, JDIMENSION [start_col]
|
||||||
|
|
||||||
|
%assign i 0 ; i=0
|
||||||
|
%rep 4 ; -- repeat 4 times ---
|
||||||
|
xor eax,eax
|
||||||
|
xor edx,edx
|
||||||
|
mov al, JSAMPLE [ebx+(i+0)*SIZEOF_JSAMPLE]
|
||||||
|
mov dl, JSAMPLE [ebx+(i+1)*SIZEOF_JSAMPLE]
|
||||||
|
add eax, byte -CENTERJSAMPLE
|
||||||
|
add edx, byte -CENTERJSAMPLE
|
||||||
|
push eax
|
||||||
|
push edx
|
||||||
|
%assign i i+2 ; i+=2
|
||||||
|
%endrep ; -- repeat end ---
|
||||||
|
|
||||||
|
fild INT32 [esp+0*SIZEOF_INT32]
|
||||||
|
fild INT32 [esp+1*SIZEOF_INT32]
|
||||||
|
fild INT32 [esp+2*SIZEOF_INT32]
|
||||||
|
fild INT32 [esp+3*SIZEOF_INT32]
|
||||||
|
fild INT32 [esp+4*SIZEOF_INT32]
|
||||||
|
fild INT32 [esp+5*SIZEOF_INT32]
|
||||||
|
fild INT32 [esp+6*SIZEOF_INT32]
|
||||||
|
fild INT32 [esp+7*SIZEOF_INT32]
|
||||||
|
|
||||||
|
add esp, byte DCTSIZE*SIZEOF_INT32
|
||||||
|
|
||||||
|
fstp FAST_FLOAT [edi+0*SIZEOF_FAST_FLOAT]
|
||||||
|
fstp FAST_FLOAT [edi+1*SIZEOF_FAST_FLOAT]
|
||||||
|
fstp FAST_FLOAT [edi+2*SIZEOF_FAST_FLOAT]
|
||||||
|
fstp FAST_FLOAT [edi+3*SIZEOF_FAST_FLOAT]
|
||||||
|
fstp FAST_FLOAT [edi+4*SIZEOF_FAST_FLOAT]
|
||||||
|
fstp FAST_FLOAT [edi+5*SIZEOF_FAST_FLOAT]
|
||||||
|
fstp FAST_FLOAT [edi+6*SIZEOF_FAST_FLOAT]
|
||||||
|
fstp FAST_FLOAT [edi+7*SIZEOF_FAST_FLOAT]
|
||||||
|
|
||||||
|
add esi, byte SIZEOF_JSAMPROW
|
||||||
|
add edi, byte DCTSIZE*SIZEOF_FAST_FLOAT
|
||||||
|
dec ecx
|
||||||
|
jnz near .convloop
|
||||||
|
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
;
|
||||||
|
; Quantize/descale the coefficients, and store into coef_block
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_quantize_float (JCOEFPTR coef_block, FAST_FLOAT * divisors,
|
||||||
|
; FAST_FLOAT * workspace);
|
||||||
|
;
|
||||||
|
|
||||||
|
%define coef_block ebp+8 ; JCOEFPTR coef_block
|
||||||
|
%define divisors ebp+12 ; FAST_FLOAT * divisors
|
||||||
|
%define workspace ebp+16 ; FAST_FLOAT * workspace
|
||||||
|
|
||||||
|
%define FLT_ROUNDS 1 ; from <float.h>
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_quantize_float)
|
||||||
|
|
||||||
|
EXTN(jpeg_quantize_float):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
push ebx
|
||||||
|
; push ecx ; unused
|
||||||
|
; push edx ; unused
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
%if (FLT_ROUNDS != 1)
|
||||||
|
push eax
|
||||||
|
fnstcw word [esp]
|
||||||
|
mov eax, [esp]
|
||||||
|
and eax, (~0x0C00) ; round to nearest integer
|
||||||
|
push eax
|
||||||
|
fldcw word [esp]
|
||||||
|
pop eax
|
||||||
|
%endif
|
||||||
|
mov esi, POINTER [workspace]
|
||||||
|
mov ebx, POINTER [divisors]
|
||||||
|
mov edi, JCOEFPTR [coef_block]
|
||||||
|
mov eax, DCTSIZE2/8
|
||||||
|
alignx 16,7
|
||||||
|
.quantloop:
|
||||||
|
fld FAST_FLOAT [esi+0*SIZEOF_FAST_FLOAT]
|
||||||
|
fmul FAST_FLOAT [ebx+0*SIZEOF_FAST_FLOAT]
|
||||||
|
fld FAST_FLOAT [esi+1*SIZEOF_FAST_FLOAT]
|
||||||
|
fmul FAST_FLOAT [ebx+1*SIZEOF_FAST_FLOAT]
|
||||||
|
fld FAST_FLOAT [esi+2*SIZEOF_FAST_FLOAT]
|
||||||
|
fmul FAST_FLOAT [ebx+2*SIZEOF_FAST_FLOAT]
|
||||||
|
fld FAST_FLOAT [esi+3*SIZEOF_FAST_FLOAT]
|
||||||
|
fmul FAST_FLOAT [ebx+3*SIZEOF_FAST_FLOAT]
|
||||||
|
|
||||||
|
fld FAST_FLOAT [esi+4*SIZEOF_FAST_FLOAT]
|
||||||
|
fmul FAST_FLOAT [ebx+4*SIZEOF_FAST_FLOAT]
|
||||||
|
fxch st0,st1
|
||||||
|
fld FAST_FLOAT [esi+5*SIZEOF_FAST_FLOAT]
|
||||||
|
fmul FAST_FLOAT [ebx+5*SIZEOF_FAST_FLOAT]
|
||||||
|
fxch st0,st3
|
||||||
|
fld FAST_FLOAT [esi+6*SIZEOF_FAST_FLOAT]
|
||||||
|
fmul FAST_FLOAT [ebx+6*SIZEOF_FAST_FLOAT]
|
||||||
|
fxch st0,st5
|
||||||
|
fld FAST_FLOAT [esi+7*SIZEOF_FAST_FLOAT]
|
||||||
|
fmul FAST_FLOAT [ebx+7*SIZEOF_FAST_FLOAT]
|
||||||
|
fxch st0,st7
|
||||||
|
|
||||||
|
fistp JCOEF [edi+0*SIZEOF_JCOEF]
|
||||||
|
fistp JCOEF [edi+1*SIZEOF_JCOEF]
|
||||||
|
fistp JCOEF [edi+2*SIZEOF_JCOEF]
|
||||||
|
fistp JCOEF [edi+3*SIZEOF_JCOEF]
|
||||||
|
fistp JCOEF [edi+4*SIZEOF_JCOEF]
|
||||||
|
fistp JCOEF [edi+5*SIZEOF_JCOEF]
|
||||||
|
fistp JCOEF [edi+6*SIZEOF_JCOEF]
|
||||||
|
fistp JCOEF [edi+7*SIZEOF_JCOEF]
|
||||||
|
|
||||||
|
add esi, byte 8*SIZEOF_FAST_FLOAT
|
||||||
|
add ebx, byte 8*SIZEOF_FAST_FLOAT
|
||||||
|
add edi, byte 8*SIZEOF_JCOEF
|
||||||
|
dec eax
|
||||||
|
jnz short .quantloop
|
||||||
|
|
||||||
|
%if (FLT_ROUNDS != 1)
|
||||||
|
fldcw word [esp]
|
||||||
|
pop eax ; pop old control word
|
||||||
|
%endif
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; unused
|
||||||
|
; pop ecx ; unused
|
||||||
|
pop ebx
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; DCT_FLOAT_SUPPORTED
|
||||||
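For readers who do not follow x87 assembly, jpeg_convsamp_float above can be read as the following scalar loop. This is a sketch under stated assumptions, not the library's C source: `convsamp_float_sketch` is a hypothetical name, samples are taken to be 8-bit (so `CENTERJSAMPLE` is 128), and `FAST_FLOAT` is taken to be `float`.

```c
#define DCTSIZE        8
#define CENTERJSAMPLE  128   /* assumption: 8-bit samples */

/*
 * Copy one 8x8 block starting at column start_col into the workspace,
 * shifting each unsigned sample to a signed value centered on zero.
 */
static void convsamp_float_sketch(const unsigned char *const *sample_data,
                                  unsigned int start_col, float *workspace)
{
  int row, col;

  for (row = 0; row < DCTSIZE; row++) {
    const unsigned char *in = sample_data[row] + start_col;

    for (col = 0; col < DCTSIZE; col++)
      workspace[row * DCTSIZE + col] = (float) ((int) in[col] - CENTERJSAMPLE);
  }
}
```

The assembly unrolls the inner loop, pushes the eight shifted integers on the stack, and converts them with `fild`/`fstp`; the result should match this loop element for element.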
243
jcqntint.asm
Normal file
@@ -0,0 +1,243 @@
|
|||||||
|
;
|
||||||
|
; jcqntint.asm - sample data conversion and quantization (non-SIMD, integer)
|
||||||
|
;
|
||||||
|
; x86 SIMD extension for IJG JPEG library
|
||||||
|
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
; For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
;
|
||||||
|
; This file should be assembled with NASM (Netwide Assembler),
|
||||||
|
; can *not* be assembled with Microsoft's MASM or any compatible
|
||||||
|
; assembler (including Borland's Turbo Assembler).
|
||||||
|
; NASM is available from http://nasm.sourceforge.net/ or
|
||||||
|
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||||
|
;
|
||||||
|
; Last Modified : January 27, 2005
|
||||||
|
;
|
||||||
|
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jdct.inc"
|
||||||
|
|
||||||
|
; This module is specialized to the case DCTSIZE = 8.
|
||||||
|
;
|
||||||
|
%if DCTSIZE != 8
|
||||||
|
%error "Sorry, this code only copes with 8x8 DCTs."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Load data into workspace, applying unsigned->signed conversion
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_convsamp_int (JSAMPARRAY sample_data, JDIMENSION start_col,
|
||||||
|
; DCTELEM * workspace);
|
||||||
|
;
|
||||||
|
|
||||||
|
%define sample_data ebp+8 ; JSAMPARRAY sample_data
|
||||||
|
%define start_col ebp+12 ; JDIMENSION start_col
|
||||||
|
%define workspace ebp+16 ; DCTELEM * workspace
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_convsamp_int)
|
||||||
|
|
||||||
|
EXTN(jpeg_convsamp_int):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
mov esi, JSAMPARRAY [sample_data] ; (JSAMPROW *)
|
||||||
|
mov edi, POINTER [workspace] ; (DCTELEM *)
|
||||||
|
mov ecx, DCTSIZE
|
||||||
|
alignx 16,7
|
||||||
|
.convloop:
|
||||||
|
mov ebx, JSAMPROW [esi] ; (JSAMPLE *)
|
||||||
|
add ebx, JDIMENSION [start_col]
|
||||||
|
|
||||||
|
%assign i 0 ; i=0
|
||||||
|
%rep 4 ; -- repeat 4 times ---
|
||||||
|
xor eax,eax
|
||||||
|
xor edx,edx
|
||||||
|
mov al, JSAMPLE [ebx+(i+0)*SIZEOF_JSAMPLE]
|
||||||
|
mov dl, JSAMPLE [ebx+(i+1)*SIZEOF_JSAMPLE]
|
||||||
|
add eax, byte -CENTERJSAMPLE
|
||||||
|
add edx, byte -CENTERJSAMPLE
|
||||||
|
mov DCTELEM [edi+(i+0)*SIZEOF_DCTELEM], ax
|
||||||
|
mov DCTELEM [edi+(i+1)*SIZEOF_DCTELEM], dx
|
||||||
|
%assign i i+2 ; i+=2
|
||||||
|
%endrep ; -- repeat end ---
|
||||||
|
|
||||||
|
add esi, byte SIZEOF_JSAMPROW
|
||||||
|
add edi, byte DCTSIZE*SIZEOF_DCTELEM
|
||||||
|
dec ecx
|
||||||
|
jnz short .convloop
|
||||||
|
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%ifndef JFDCT_INT_QUANTIZE_WITH_DIVISION
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
;
|
||||||
|
; Quantize/descale the coefficients, and store into coef_block
|
||||||
|
;
|
||||||
|
; This implementation is based on an algorithm described in
|
||||||
|
; "How to optimize for the Pentium family of microprocessors"
|
||||||
|
; (http://www.agner.org/assem/).
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_quantize_int (JCOEFPTR coef_block, DCTELEM * divisors,
|
||||||
|
; DCTELEM * workspace);
|
||||||
|
;
|
||||||
|
|
||||||
|
%define RECIPROCAL(i,b) ((b)+((i)+DCTSIZE2*0)*SIZEOF_DCTELEM)
|
||||||
|
%define CORRECTION(i,b) ((b)+((i)+DCTSIZE2*1)*SIZEOF_DCTELEM)
|
||||||
|
%define SHIFT(i,b) ((b)+((i)+DCTSIZE2*3)*SIZEOF_DCTELEM)
|
||||||
|
|
||||||
|
%define coef_block ebp+8 ; JCOEFPTR coef_block
|
||||||
|
%define divisors ebp+12 ; DCTELEM * divisors
|
||||||
|
%define workspace ebp+16 ; DCTELEM * workspace
|
||||||
|
|
||||||
|
%define UNROLL 2
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_quantize_int)
|
||||||
|
|
||||||
|
EXTN(jpeg_quantize_int):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
mov esi, POINTER [workspace]
|
||||||
|
mov ebx, POINTER [divisors]
|
||||||
|
mov edi, JCOEFPTR [coef_block]
|
||||||
|
mov ecx, DCTSIZE2/UNROLL
|
||||||
|
alignx 16,7
|
||||||
|
.quantloop:
|
||||||
|
push ecx
|
||||||
|
|
||||||
|
%assign i 0 ; i=0;
|
||||||
|
%rep UNROLL ; ---- repeat (UNROLL) times ----
|
||||||
|
mov cx, DCTELEM [esi+(i)*SIZEOF_DCTELEM]
|
||||||
|
mov ax,cx
|
||||||
|
sar cx,(WORD_BIT-1)
|
||||||
|
xor ax,cx ; if (ax < 0) ax = -ax;
|
||||||
|
sub ax,cx
|
||||||
|
add ax, DCTELEM [CORRECTION(i,ebx)] ; correction + roundfactor
|
||||||
|
shl ax,1
|
||||||
|
mul DCTELEM [RECIPROCAL(i,ebx)] ; reciprocal
|
||||||
|
mov ax,cx
|
||||||
|
mov cx, DCTELEM [SHIFT(i,ebx)] ; shift
|
||||||
|
shr dx,cl
|
||||||
|
xor dx,ax
|
||||||
|
sub dx,ax
|
||||||
|
mov JCOEF [edi+(i)*SIZEOF_JCOEF], dx
|
||||||
|
%assign i i+1 ; i++;
|
||||||
|
%endrep ; ---- repeat end ----
|
||||||
|
|
||||||
|
pop ecx
|
||||||
|
|
||||||
|
add esi, byte UNROLL*SIZEOF_DCTELEM
|
||||||
|
add ebx, byte UNROLL*SIZEOF_DCTELEM
|
||||||
|
add edi, byte UNROLL*SIZEOF_JCOEF
|
||||||
|
dec ecx
|
||||||
|
jnz .quantloop
|
||||||
|
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%else ; JFDCT_INT_QUANTIZE_WITH_DIVISION
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
;
|
||||||
|
; Quantize/descale the coefficients, and store into coef_block
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_quantize_idiv (JCOEFPTR coef_block, DCTELEM * divisors,
|
||||||
|
; DCTELEM * workspace);
|
||||||
|
;
|
||||||
|
|
||||||
|
%define coef_block ebp+8 ; JCOEFPTR coef_block
|
||||||
|
%define divisors ebp+12 ; DCTELEM * divisors
|
||||||
|
%define workspace ebp+16 ; DCTELEM * workspace
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_quantize_idiv)
|
||||||
|
|
||||||
|
EXTN(jpeg_quantize_idiv):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
mov esi, POINTER [workspace]
|
||||||
|
mov ebx, POINTER [divisors]
|
||||||
|
mov edi, JCOEFPTR [coef_block]
|
||||||
|
mov ecx, DCTSIZE2
|
||||||
|
alignx 16,7
|
||||||
|
.quantloop:
|
||||||
|
push ecx
|
||||||
|
|
||||||
|
movsx ecx, DCTELEM [esi] ; temp
|
||||||
|
mov eax,ecx
|
||||||
|
sar ecx,(DWORD_BIT-1)
|
||||||
|
xor edx,edx
|
||||||
|
mov dx, DCTELEM [ebx] ; qval
|
||||||
|
xor eax,ecx ; if (eax < 0) eax = -eax;
|
||||||
|
shr edx,1
|
||||||
|
sub eax,ecx
|
||||||
|
cmp eax,edx ; if (temp + qval/2 >= qval)
|
||||||
|
jge short .quant
|
||||||
|
; ---- if the quantized coefficient is zero
|
||||||
|
xor eax,eax
|
||||||
|
jmp short .output
|
||||||
|
alignx 16,7
|
||||||
|
.quant: ; ---- do quantization
|
||||||
|
add eax,edx
|
||||||
|
xor edx,edx
|
||||||
|
div DCTELEM [ebx] ; Q:ax,R:dx
|
||||||
|
xor ax,cx
|
||||||
|
sub ax,cx
|
||||||
|
alignx 16,7
|
||||||
|
.output:
|
||||||
|
mov JCOEF [edi], ax
|
||||||
|
|
||||||
|
pop ecx
|
||||||
|
|
||||||
|
add esi, byte SIZEOF_DCTELEM
|
||||||
|
add ebx, byte SIZEOF_DCTELEM
|
||||||
|
add edi, byte SIZEOF_JCOEF
|
||||||
|
dec ecx
|
||||||
|
jnz short .quantloop
|
||||||
|
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; !JFDCT_INT_QUANTIZE_WITH_DIVISION
|
||||||
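The division-based path, jpeg_quantize_idiv, takes the absolute value, adds half the divisor for rounding, divides, and restores the sign. A minimal C sketch of that per-coefficient step follows; `quantize_div_sketch` is a hypothetical name, and the 16-bit types stand in for `DCTELEM` and `JCOEF` as an assumption.

```c
#include <stdint.h>

/*
 * Quantize one coefficient by qval with round-half-up on the magnitude.
 * qval must be nonzero. Small magnitudes short-circuit to zero, which
 * mirrors the branch around the div instruction in the assembly.
 */
static int16_t quantize_div_sketch(int16_t coef, uint16_t qval)
{
  int32_t temp = coef;
  int negative = (temp < 0);

  if (negative)
    temp = -temp;
  temp += qval >> 1;                 /* rounding offset */
  if (temp >= (int32_t) qval)
    temp /= qval;
  else
    temp = 0;                        /* result rounds to zero, skip division */
  return (int16_t) (negative ? -temp : temp);
}
```

Working on the magnitude keeps the rounding symmetric about zero, so 10 and -10 with a divisor of 4 give 3 and -3 respectively.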
254
jcqntmmx.asm
Normal file
@@ -0,0 +1,254 @@
|
|||||||
|
;
|
||||||
|
; jcqntmmx.asm - sample data conversion and quantization (MMX)
|
||||||
|
;
|
||||||
|
; x86 SIMD extension for IJG JPEG library
|
||||||
|
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
; For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
;
|
||||||
|
; This file should be assembled with NASM (Netwide Assembler),
|
||||||
|
; can *not* be assembled with Microsoft's MASM or any compatible
|
||||||
|
; assembler (including Borland's Turbo Assembler).
|
||||||
|
; NASM is available from http://nasm.sourceforge.net/ or
|
||||||
|
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||||
|
;
|
||||||
|
; Last Modified : January 27, 2005
|
||||||
|
;
|
||||||
|
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jdct.inc"
|
||||||
|
|
||||||
|
%ifdef JFDCT_INT_MMX_SUPPORTED
|
||||||
|
|
||||||
|
; This module is specialized to the case DCTSIZE = 8.
|
||||||
|
;
|
||||||
|
%if DCTSIZE != 8
|
||||||
|
%error "Sorry, this code only copes with 8x8 DCTs."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
;
; Load data into workspace, applying unsigned->signed conversion
;
; GLOBAL(void)
; jpeg_convsamp_int_mmx (JSAMPARRAY sample_data, JDIMENSION start_col,
;                        DCTELEM * workspace);
;

%define sample_data	ebp+8		; JSAMPARRAY sample_data
%define start_col	ebp+12		; JDIMENSION start_col
%define workspace	ebp+16		; DCTELEM * workspace

	align	16
	global	EXTN(jpeg_convsamp_int_mmx)

EXTN(jpeg_convsamp_int_mmx):
	push	ebp
	mov	ebp,esp
	push	ebx
;	push	ecx		; need not be preserved
;	push	edx		; need not be preserved
	push	esi
	push	edi

	pxor	mm6,mm6			; mm6=(all 0's)
	pcmpeqw	mm7,mm7
	psllw	mm7,7			; mm7={0xFF80 0xFF80 0xFF80 0xFF80}

	mov	esi, JSAMPARRAY [sample_data]	; (JSAMPROW *)
	mov	eax, JDIMENSION [start_col]
	mov	edi, POINTER [workspace]	; (DCTELEM *)
	mov	ecx, DCTSIZE/4
	alignx	16,7
.convloop:
	mov	ebx, JSAMPROW [esi+0*SIZEOF_JSAMPROW]	; (JSAMPLE *)
	mov	edx, JSAMPROW [esi+1*SIZEOF_JSAMPROW]	; (JSAMPLE *)

	movq	mm0, MMWORD [ebx+eax*SIZEOF_JSAMPLE]	; mm0=(01234567)
	movq	mm1, MMWORD [edx+eax*SIZEOF_JSAMPLE]	; mm1=(89ABCDEF)

	mov	ebx, JSAMPROW [esi+2*SIZEOF_JSAMPROW]	; (JSAMPLE *)
	mov	edx, JSAMPROW [esi+3*SIZEOF_JSAMPROW]	; (JSAMPLE *)

	movq	mm2, MMWORD [ebx+eax*SIZEOF_JSAMPLE]	; mm2=(GHIJKLMN)
	movq	mm3, MMWORD [edx+eax*SIZEOF_JSAMPLE]	; mm3=(OPQRSTUV)

	movq	mm4,mm0
	punpcklbw mm0,mm6		; mm0=(0123)
	punpckhbw mm4,mm6		; mm4=(4567)
	movq	mm5,mm1
	punpcklbw mm1,mm6		; mm1=(89AB)
	punpckhbw mm5,mm6		; mm5=(CDEF)

	paddw	mm0,mm7
	paddw	mm4,mm7
	paddw	mm1,mm7
	paddw	mm5,mm7

	movq	MMWORD [MMBLOCK(0,0,edi,SIZEOF_DCTELEM)], mm0
	movq	MMWORD [MMBLOCK(0,1,edi,SIZEOF_DCTELEM)], mm4
	movq	MMWORD [MMBLOCK(1,0,edi,SIZEOF_DCTELEM)], mm1
	movq	MMWORD [MMBLOCK(1,1,edi,SIZEOF_DCTELEM)], mm5

	movq	mm0,mm2
	punpcklbw mm2,mm6		; mm2=(GHIJ)
	punpckhbw mm0,mm6		; mm0=(KLMN)
	movq	mm4,mm3
	punpcklbw mm3,mm6		; mm3=(OPQR)
	punpckhbw mm4,mm6		; mm4=(STUV)

	paddw	mm2,mm7
	paddw	mm0,mm7
	paddw	mm3,mm7
	paddw	mm4,mm7

	movq	MMWORD [MMBLOCK(2,0,edi,SIZEOF_DCTELEM)], mm2
	movq	MMWORD [MMBLOCK(2,1,edi,SIZEOF_DCTELEM)], mm0
	movq	MMWORD [MMBLOCK(3,0,edi,SIZEOF_DCTELEM)], mm3
	movq	MMWORD [MMBLOCK(3,1,edi,SIZEOF_DCTELEM)], mm4

	add	esi, byte 4*SIZEOF_JSAMPROW
	add	edi, byte 4*DCTSIZE*SIZEOF_DCTELEM
	dec	ecx
	jnz	short .convloop

	emms		; empty MMX state

	pop	edi
	pop	esi
;	pop	edx		; need not be preserved
;	pop	ecx		; need not be preserved
	pop	ebx
	pop	ebp
	ret
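
For readers who do not follow MMX, the effect of jpeg_convsamp_int_mmx can be approximated by a scalar C sketch. It assumes 8-bit samples and a 16-bit DCTELEM, which is what the byte-to-word unpacking and the 0xFF80 constant suggest. The name `convsamp_int_ref` and the plain pointer types are illustrative stand-ins, not part of the library.

```c
#include <stdint.h>

enum { DCTSIZE = 8, CENTERJSAMPLE = 128 };

/* Scalar model (assumption: 8-bit JSAMPLE, 16-bit DCTELEM).
 * rows      : DCTSIZE row pointers, a stand-in for JSAMPARRAY
 * start_col : first column of the 8x8 block within each row
 * workspace : receives 64 re-centred values in row-major order */
void convsamp_int_ref(const uint8_t *const *rows, unsigned start_col,
                      int16_t workspace[DCTSIZE * DCTSIZE])
{
    for (int r = 0; r < DCTSIZE; r++) {
        const uint8_t *p = rows[r] + start_col;
        for (int c = 0; c < DCTSIZE; c++) {
            /* punpck?bw zero-extends the byte; paddw with 0xFF80 adds -128 */
            workspace[r * DCTSIZE + c] = (int16_t)(p[c] - CENTERJSAMPLE);
        }
    }
}
```

The assembly handles four rows per loop iteration; the sketch visits one sample at a time and is meant only as a reference for checking results.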

%ifndef JFDCT_INT_QUANTIZE_WITH_DIVISION

; --------------------------------------------------------------------------
;
; Quantize/descale the coefficients, and store into coef_block
;
; This implementation is based on an algorithm described in
;   "How to optimize for the Pentium family of microprocessors"
;   (http://www.agner.org/assem/).
;
; GLOBAL(void)
; jpeg_quantize_int_mmx (JCOEFPTR coef_block, DCTELEM * divisors,
;                        DCTELEM * workspace);
;

%define RECIPROCAL(m,n,b) MMBLOCK(DCTSIZE*0+(m),(n),(b),SIZEOF_DCTELEM)
%define CORRECTION(m,n,b) MMBLOCK(DCTSIZE*1+(m),(n),(b),SIZEOF_DCTELEM)
%define SCALE(m,n,b)      MMBLOCK(DCTSIZE*2+(m),(n),(b),SIZEOF_DCTELEM)

%define coef_block	ebp+8		; JCOEFPTR coef_block
%define divisors	ebp+12		; DCTELEM * divisors
%define workspace	ebp+16		; DCTELEM * workspace

	align	16
	global	EXTN(jpeg_quantize_int_mmx)

EXTN(jpeg_quantize_int_mmx):
	push	ebp
	mov	ebp,esp
;	push	ebx		; unused
;	push	ecx		; unused
;	push	edx		; need not be preserved
	push	esi
	push	edi

	mov	esi, POINTER [workspace]
	mov	edx, POINTER [divisors]
	mov	edi, JCOEFPTR [coef_block]
	mov	ah, 2
	alignx	16,7
.quantloop1:
	mov	al, DCTSIZE2/8/2
	alignx	16,7
.quantloop2:
	movq	mm2, MMWORD [MMBLOCK(0,0,esi,SIZEOF_DCTELEM)]
	movq	mm3, MMWORD [MMBLOCK(0,1,esi,SIZEOF_DCTELEM)]
	movq	mm0,mm2
	movq	mm1,mm3
	psraw	mm2,(WORD_BIT-1)
	psraw	mm3,(WORD_BIT-1)
	pxor	mm0,mm2
	pxor	mm1,mm3
	psubw	mm0,mm2		; if (mm0 < 0) mm0 = -mm0;
	psubw	mm1,mm3		; if (mm1 < 0) mm1 = -mm1;

	; unsigned long unsigned_multiply(unsigned short x, unsigned short y)
	; {
	;   enum { SHORT_BIT = 16 };
	;   signed short sx = (signed short) x;
	;   signed short sy = (signed short) y;
	;   signed long sz;
	;
	;   sz = (long) sx * (long) sy;     /* signed multiply */
	;
	;   if (sx < 0) sz += (long) sy << SHORT_BIT;
	;   if (sy < 0) sz += (long) sx << SHORT_BIT;
	;
	;   return (unsigned long) sz;
	; }

	paddw	mm0, MMWORD [CORRECTION(0,0,edx)]  ; correction + roundfactor
	paddw	mm1, MMWORD [CORRECTION(0,1,edx)]
	psllw	mm0,1
	psllw	mm1,1
	movq	mm4,mm0
	movq	mm5,mm1
	pmulhw	mm0, MMWORD [RECIPROCAL(0,0,edx)]  ; reciprocal
	pmulhw	mm1, MMWORD [RECIPROCAL(0,1,edx)]
	movq	mm6, MMWORD [SCALE(0,0,edx)]	   ; scale
	movq	mm7, MMWORD [SCALE(0,1,edx)]
	paddw	mm0,mm4		; reciprocal is always negative (MSB=1)
	paddw	mm1,mm5
	psllw	mm0,1
	psllw	mm1,1
	movq	mm4,mm0
	movq	mm5,mm1
	pmulhw	mm0,mm6
	pmulhw	mm1,mm7
	psraw	mm6,(WORD_BIT-1)
	psraw	mm7,(WORD_BIT-1)
	pand	mm6,mm4
	pand	mm7,mm5
	paddw	mm0,mm6
	paddw	mm1,mm7
	psraw	mm4,(WORD_BIT-1)
	psraw	mm5,(WORD_BIT-1)
	pand	mm4, MMWORD [SCALE(0,0,edx)]	   ; scale
	pand	mm5, MMWORD [SCALE(0,1,edx)]
	paddw	mm0,mm4
	paddw	mm1,mm5

	pxor	mm0,mm2
	pxor	mm1,mm3
	psubw	mm0,mm2
	psubw	mm1,mm3
	movq	MMWORD [MMBLOCK(0,0,edi,SIZEOF_DCTELEM)], mm0
	movq	MMWORD [MMBLOCK(0,1,edi,SIZEOF_DCTELEM)], mm1

	add	esi, byte 8*SIZEOF_DCTELEM
	add	edx, byte 8*SIZEOF_DCTELEM
	add	edi, byte 8*SIZEOF_JCOEF
	dec	al
	jnz	near .quantloop2
	dec	ah
	jnz	near .quantloop1	; to avoid branch misprediction

	emms		; empty MMX state

	pop	edi
	pop	esi
;	pop	edx		; need not be preserved
;	pop	ecx		; unused
;	pop	ebx		; unused
	pop	ebp
	ret

%endif ; !JFDCT_INT_QUANTIZE_WITH_DIVISION
%endif ; JFDCT_INT_MMX_SUPPORTED
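
The commented C fragment inside jpeg_quantize_int_mmx explains why the routine adds masked copies of its operands after each pmulhw: MMX only offers a signed high multiply, so the unsigned product has to be patched up. That fragment relies on a 32-bit long. The version below is a portable sketch of the same identity using fixed-width types. `mulhi_u16_via_signed` is a hypothetical helper name added here to show the high-half form that the pand/paddw pairs compute.

```c
#include <stdint.h>

/* Sign-extend a 16-bit pattern without relying on implementation-defined casts. */
static int32_t as_signed16(uint16_t v)
{
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Full 16x16 -> 32-bit unsigned product built from a signed multiply. */
uint32_t unsigned_multiply(uint16_t x, uint16_t y)
{
    int32_t sx = as_signed16(x);
    int32_t sy = as_signed16(y);
    uint32_t z = (uint32_t)(sx * sy);        /* signed multiply, |product| <= 2^30 */

    if (sx < 0) z += (uint32_t)y << 16;      /* x was read as x - 65536 */
    if (sy < 0) z += (uint32_t)x << 16;      /* y was read as y - 65536 */
    return z;                                /* arithmetic is modulo 2^32 */
}

/* High 16 bits only: signed high multiply plus the two conditional corrections. */
uint16_t mulhi_u16_via_signed(uint16_t x, uint16_t y)
{
    return (uint16_t)(unsigned_multiply(x, y) >> 16);
}
```

In the assembly the reciprocal is known to have its top bit set, so the first correction is applied unconditionally (`paddw mm0,mm4`); only the multiply by the scale factor needs the masked form.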
178
jcqnts2f.asm
Normal file
@@ -0,0 +1,178 @@
;
; jcqnts2f.asm - sample data conversion and quantization (SSE & SSE2)
;
; x86 SIMD extension for IJG JPEG library
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
;
; Last Modified : January 18, 2005
;
; [TAB8]

%include "jsimdext.inc"
%include "jdct.inc"

%ifdef DCT_FLOAT_SUPPORTED
%ifdef JFDCT_FLT_SSE_SSE2_SUPPORTED

; This module is specialized to the case DCTSIZE = 8.
;
%if DCTSIZE != 8
%error "Sorry, this code only copes with 8x8 DCTs."
%endif

; --------------------------------------------------------------------------
	SECTION	SEG_TEXT
	BITS	32
;
; Load data into workspace, applying unsigned->signed conversion
;
; GLOBAL(void)
; jpeg_convsamp_flt_sse2 (JSAMPARRAY sample_data, JDIMENSION start_col,
;                         FAST_FLOAT * workspace);
;

%define sample_data	ebp+8		; JSAMPARRAY sample_data
%define start_col	ebp+12		; JDIMENSION start_col
%define workspace	ebp+16		; FAST_FLOAT * workspace

	align	16
	global	EXTN(jpeg_convsamp_flt_sse2)

EXTN(jpeg_convsamp_flt_sse2):
	push	ebp
	mov	ebp,esp
	push	ebx
;	push	ecx		; need not be preserved
;	push	edx		; need not be preserved
	push	esi
	push	edi

	pcmpeqw	xmm7,xmm7
	psllw	xmm7,7
	packsswb xmm7,xmm7		; xmm7 = PB_CENTERJSAMPLE (0x808080..)

	mov	esi, JSAMPARRAY [sample_data]	; (JSAMPROW *)
	mov	eax, JDIMENSION [start_col]
	mov	edi, POINTER [workspace]	; (FAST_FLOAT *)
	mov	ecx, DCTSIZE/2
	alignx	16,7
.convloop:
	mov	ebx, JSAMPROW [esi+0*SIZEOF_JSAMPROW]	; (JSAMPLE *)
	mov	edx, JSAMPROW [esi+1*SIZEOF_JSAMPROW]	; (JSAMPLE *)

	movq	xmm0, _MMWORD [ebx+eax*SIZEOF_JSAMPLE]
	movq	xmm1, _MMWORD [edx+eax*SIZEOF_JSAMPLE]

	psubb	xmm0,xmm7			; xmm0=(01234567)
	psubb	xmm1,xmm7			; xmm1=(89ABCDEF)

	punpcklbw xmm0,xmm0			; xmm0=(*0*1*2*3*4*5*6*7)
	punpcklbw xmm1,xmm1			; xmm1=(*8*9*A*B*C*D*E*F)

	punpcklwd xmm2,xmm0			; xmm2=(***0***1***2***3)
	punpckhwd xmm0,xmm0			; xmm0=(***4***5***6***7)
	punpcklwd xmm3,xmm1			; xmm3=(***8***9***A***B)
	punpckhwd xmm1,xmm1			; xmm1=(***C***D***E***F)

	psrad	xmm2,(DWORD_BIT-BYTE_BIT)	; xmm2=(0123)
	psrad	xmm0,(DWORD_BIT-BYTE_BIT)	; xmm0=(4567)
	cvtdq2ps xmm2,xmm2			; xmm2=(0123)
	cvtdq2ps xmm0,xmm0			; xmm0=(4567)
	psrad	xmm3,(DWORD_BIT-BYTE_BIT)	; xmm3=(89AB)
	psrad	xmm1,(DWORD_BIT-BYTE_BIT)	; xmm1=(CDEF)
	cvtdq2ps xmm3,xmm3			; xmm3=(89AB)
	cvtdq2ps xmm1,xmm1			; xmm1=(CDEF)

	movaps	XMMWORD [XMMBLOCK(0,0,edi,SIZEOF_FAST_FLOAT)], xmm2
	movaps	XMMWORD [XMMBLOCK(0,1,edi,SIZEOF_FAST_FLOAT)], xmm0
	movaps	XMMWORD [XMMBLOCK(1,0,edi,SIZEOF_FAST_FLOAT)], xmm3
	movaps	XMMWORD [XMMBLOCK(1,1,edi,SIZEOF_FAST_FLOAT)], xmm1

	add	esi, byte 2*SIZEOF_JSAMPROW
	add	edi, byte 2*DCTSIZE*SIZEOF_FAST_FLOAT
	dec	ecx
	jnz	short .convloop

	pop	edi
	pop	esi
;	pop	edx		; need not be preserved
;	pop	ecx		; need not be preserved
	pop	ebx
	pop	ebp
	ret
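
The float variant re-centres samples differently from the integer one: it subtracts 0x80 at byte width (wrapping modulo 256), parks each byte in the top of a dword, and lets an arithmetic right shift sign-extend it before the int-to-float conversion. The single-lane sketch below mirrors those steps in C. It assumes 8-bit samples and that FAST_FLOAT is a 32-bit float (the code uses cvtdq2ps and movaps). `convsamp_flt_lane` is an illustrative name, not a library function.

```c
#include <stdint.h>

/* One lane of the float sample conversion (illustrative model). */
float convsamp_flt_lane(uint8_t sample)
{
    /* psubb with 0x80: subtraction wraps modulo 256 */
    uint8_t b = (uint8_t)(sample - 0x80u);

    /* punpck + psrad by (32 - 8): net effect is sign-extending the byte */
    int32_t v = (b & 0x80u) ? (int32_t)b - 256 : (int32_t)b;

    /* cvtdq2ps: exact, since |v| <= 128 */
    return (float)v;
}
```

The result equals `(float)sample - 128.0f` for every 8-bit input; the wrap-then-sign-extend route is simply cheaper to express with packed instructions.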


; --------------------------------------------------------------------------
;
; Quantize/descale the coefficients, and store into coef_block
;
; GLOBAL(void)
; jpeg_quantize_flt_sse2 (JCOEFPTR coef_block, FAST_FLOAT * divisors,
;                         FAST_FLOAT * workspace);
;

%define coef_block	ebp+8		; JCOEFPTR coef_block
%define divisors	ebp+12		; FAST_FLOAT * divisors
%define workspace	ebp+16		; FAST_FLOAT * workspace

	align	16
	global	EXTN(jpeg_quantize_flt_sse2)

EXTN(jpeg_quantize_flt_sse2):
	push	ebp
	mov	ebp,esp
;	push	ebx		; unused
;	push	ecx		; unused
;	push	edx		; need not be preserved
	push	esi
	push	edi

	mov	esi, POINTER [workspace]
	mov	edx, POINTER [divisors]
	mov	edi, JCOEFPTR [coef_block]
	mov	eax, DCTSIZE2/16
	alignx	16,7
.quantloop:
	movaps	xmm0, XMMWORD [XMMBLOCK(0,0,esi,SIZEOF_FAST_FLOAT)]
	movaps	xmm1, XMMWORD [XMMBLOCK(0,1,esi,SIZEOF_FAST_FLOAT)]
	mulps	xmm0, XMMWORD [XMMBLOCK(0,0,edx,SIZEOF_FAST_FLOAT)]
	mulps	xmm1, XMMWORD [XMMBLOCK(0,1,edx,SIZEOF_FAST_FLOAT)]
	movaps	xmm2, XMMWORD [XMMBLOCK(1,0,esi,SIZEOF_FAST_FLOAT)]
	movaps	xmm3, XMMWORD [XMMBLOCK(1,1,esi,SIZEOF_FAST_FLOAT)]
	mulps	xmm2, XMMWORD [XMMBLOCK(1,0,edx,SIZEOF_FAST_FLOAT)]
	mulps	xmm3, XMMWORD [XMMBLOCK(1,1,edx,SIZEOF_FAST_FLOAT)]

	cvtps2dq xmm0,xmm0
	cvtps2dq xmm1,xmm1
	cvtps2dq xmm2,xmm2
	cvtps2dq xmm3,xmm3

	packssdw xmm0,xmm1
	packssdw xmm2,xmm3

	movdqa	XMMWORD [XMMBLOCK(0,0,edi,SIZEOF_JCOEF)], xmm0
	movdqa	XMMWORD [XMMBLOCK(1,0,edi,SIZEOF_JCOEF)], xmm2

	add	esi, byte 16*SIZEOF_FAST_FLOAT
	add	edx, byte 16*SIZEOF_FAST_FLOAT
	add	edi, byte 16*SIZEOF_JCOEF
	dec	eax
	jnz	short .quantloop

	pop	edi
	pop	esi
;	pop	edx		; need not be preserved
;	pop	ecx		; unused
;	pop	ebx		; unused
	pop	ebp
	ret
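
The float quantizer is a multiply, a round and a saturating narrow. A scalar sketch is given below under a few assumptions: FAST_FLOAT is a 32-bit float, JCOEF is 16 bits wide, and the rounding mode is the processor default (round to nearest, ties to even), which is what cvtps2dq uses unless MXCSR has been changed. Because the table is multiplied rather than divided, it presumably holds reciprocal-style factors, but this file does not show how it is built. The function names are illustrative only.

```c
#include <stdint.h>

enum { DCTSIZE2 = 64 };

/* Round to nearest, ties to even; valid for |x| < 2^31 (model of cvtps2dq). */
static int32_t round_half_even(float x)
{
    int32_t t = (int32_t)x;              /* truncates toward zero */
    float frac = x - (float)t;           /* in (-1, 1) */

    if (frac > 0.5f || (frac == 0.5f && (t & 1)))
        t += 1;
    else if (frac < -0.5f || (frac == -0.5f && (t & 1)))
        t -= 1;
    return t;
}

/* Clamp to the int16 range (model of packssdw). */
static int16_t saturate_i16(int32_t v)
{
    if (v > 32767) return 32767;
    if (v < -32768) return -32768;
    return (int16_t)v;
}

/* Scalar model of the float quantization loop. */
void quantize_flt_ref(int16_t coef_block[DCTSIZE2],
                      const float divisors[DCTSIZE2],
                      const float workspace[DCTSIZE2])
{
    for (int i = 0; i < DCTSIZE2; i++)
        coef_block[i] = saturate_i16(round_half_even(workspace[i] * divisors[i]));
}
```

The sketch does not reproduce the out-of-range behaviour of cvtps2dq (which yields 0x80000000 for values that do not fit in 32 bits); it is intended for inputs in the normal coefficient range.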

%endif ; JFDCT_FLT_SSE_SSE2_SUPPORTED
%endif ; DCT_FLOAT_SUPPORTED
216
jcqnts2i.asm
Normal file
@@ -0,0 +1,216 @@
;
; jcqnts2i.asm - sample data conversion and quantization (SSE2)
;
; x86 SIMD extension for IJG JPEG library
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
;
; Last Modified : January 27, 2005
;
; [TAB8]

%include "jsimdext.inc"
%include "jdct.inc"

%ifdef JFDCT_INT_SSE2_SUPPORTED

; This module is specialized to the case DCTSIZE = 8.
;
%if DCTSIZE != 8
%error "Sorry, this code only copes with 8x8 DCTs."
%endif

; --------------------------------------------------------------------------
	SECTION	SEG_TEXT
	BITS	32
;
; Load data into workspace, applying unsigned->signed conversion
;
; GLOBAL(void)
; jpeg_convsamp_int_sse2 (JSAMPARRAY sample_data, JDIMENSION start_col,
;                         DCTELEM * workspace);
;

%define sample_data	ebp+8		; JSAMPARRAY sample_data
%define start_col	ebp+12		; JDIMENSION start_col
%define workspace	ebp+16		; DCTELEM * workspace

	align	16
	global	EXTN(jpeg_convsamp_int_sse2)

EXTN(jpeg_convsamp_int_sse2):
	push	ebp
	mov	ebp,esp
	push	ebx
;	push	ecx		; need not be preserved
;	push	edx		; need not be preserved
	push	esi
	push	edi

	pxor	xmm6,xmm6		; xmm6=(all 0's)
	pcmpeqw	xmm7,xmm7
	psllw	xmm7,7			; xmm7={0xFF80 0xFF80 0xFF80 0xFF80 ..}

	mov	esi, JSAMPARRAY [sample_data]	; (JSAMPROW *)
	mov	eax, JDIMENSION [start_col]
	mov	edi, POINTER [workspace]	; (DCTELEM *)
	mov	ecx, DCTSIZE/4
	alignx	16,7
.convloop:
	mov	ebx, JSAMPROW [esi+0*SIZEOF_JSAMPROW]	; (JSAMPLE *)
	mov	edx, JSAMPROW [esi+1*SIZEOF_JSAMPROW]	; (JSAMPLE *)

	movq	xmm0, _MMWORD [ebx+eax*SIZEOF_JSAMPLE]	; xmm0=(01234567)
	movq	xmm1, _MMWORD [edx+eax*SIZEOF_JSAMPLE]	; xmm1=(89ABCDEF)

	mov	ebx, JSAMPROW [esi+2*SIZEOF_JSAMPROW]	; (JSAMPLE *)
	mov	edx, JSAMPROW [esi+3*SIZEOF_JSAMPROW]	; (JSAMPLE *)

	movq	xmm2, _MMWORD [ebx+eax*SIZEOF_JSAMPLE]	; xmm2=(GHIJKLMN)
	movq	xmm3, _MMWORD [edx+eax*SIZEOF_JSAMPLE]	; xmm3=(OPQRSTUV)

	punpcklbw xmm0,xmm6		; xmm0=(01234567)
	punpcklbw xmm1,xmm6		; xmm1=(89ABCDEF)
	paddw	xmm0,xmm7
	paddw	xmm1,xmm7
	punpcklbw xmm2,xmm6		; xmm2=(GHIJKLMN)
	punpcklbw xmm3,xmm6		; xmm3=(OPQRSTUV)
	paddw	xmm2,xmm7
	paddw	xmm3,xmm7

	movdqa	XMMWORD [XMMBLOCK(0,0,edi,SIZEOF_DCTELEM)], xmm0
	movdqa	XMMWORD [XMMBLOCK(1,0,edi,SIZEOF_DCTELEM)], xmm1
	movdqa	XMMWORD [XMMBLOCK(2,0,edi,SIZEOF_DCTELEM)], xmm2
	movdqa	XMMWORD [XMMBLOCK(3,0,edi,SIZEOF_DCTELEM)], xmm3

	add	esi, byte 4*SIZEOF_JSAMPROW
	add	edi, byte 4*DCTSIZE*SIZEOF_DCTELEM
	dec	ecx
	jnz	short .convloop

	pop	edi
	pop	esi
;	pop	edx		; need not be preserved
;	pop	ecx		; need not be preserved
	pop	ebx
	pop	ebp
	ret

%ifndef JFDCT_INT_QUANTIZE_WITH_DIVISION

; --------------------------------------------------------------------------
;
; Quantize/descale the coefficients, and store into coef_block
;
; This implementation is based on an algorithm described in
;   "How to optimize for the Pentium family of microprocessors"
;   (http://www.agner.org/assem/).
;
; GLOBAL(void)
; jpeg_quantize_int_sse2 (JCOEFPTR coef_block, DCTELEM * divisors,
;                         DCTELEM * workspace);
;

%define RECIPROCAL(m,n,b) XMMBLOCK(DCTSIZE*0+(m),(n),(b),SIZEOF_DCTELEM)
%define CORRECTION(m,n,b) XMMBLOCK(DCTSIZE*1+(m),(n),(b),SIZEOF_DCTELEM)
%define SCALE(m,n,b)      XMMBLOCK(DCTSIZE*2+(m),(n),(b),SIZEOF_DCTELEM)

%define coef_block	ebp+8		; JCOEFPTR coef_block
%define divisors	ebp+12		; DCTELEM * divisors
%define workspace	ebp+16		; DCTELEM * workspace

	align	16
	global	EXTN(jpeg_quantize_int_sse2)

EXTN(jpeg_quantize_int_sse2):
	push	ebp
	mov	ebp,esp
;	push	ebx		; unused
;	push	ecx		; unused
;	push	edx		; need not be preserved
	push	esi
	push	edi

	mov	esi, POINTER [workspace]
	mov	edx, POINTER [divisors]
	mov	edi, JCOEFPTR [coef_block]
	mov	eax, DCTSIZE2/32
	alignx	16,7
.quantloop:
	movdqa	xmm4, XMMWORD [XMMBLOCK(0,0,esi,SIZEOF_DCTELEM)]
	movdqa	xmm5, XMMWORD [XMMBLOCK(1,0,esi,SIZEOF_DCTELEM)]
	movdqa	xmm6, XMMWORD [XMMBLOCK(2,0,esi,SIZEOF_DCTELEM)]
	movdqa	xmm7, XMMWORD [XMMBLOCK(3,0,esi,SIZEOF_DCTELEM)]
	movdqa	xmm0,xmm4
	movdqa	xmm1,xmm5
	movdqa	xmm2,xmm6
	movdqa	xmm3,xmm7
	psraw	xmm4,(WORD_BIT-1)
	psraw	xmm5,(WORD_BIT-1)
	psraw	xmm6,(WORD_BIT-1)
	psraw	xmm7,(WORD_BIT-1)
	pxor	xmm0,xmm4
	pxor	xmm1,xmm5
	pxor	xmm2,xmm6
	pxor	xmm3,xmm7
	psubw	xmm0,xmm4		; if (xmm0 < 0) xmm0 = -xmm0;
	psubw	xmm1,xmm5		; if (xmm1 < 0) xmm1 = -xmm1;
	psubw	xmm2,xmm6		; if (xmm2 < 0) xmm2 = -xmm2;
	psubw	xmm3,xmm7		; if (xmm3 < 0) xmm3 = -xmm3;

	paddw	xmm0, XMMWORD [CORRECTION(0,0,edx)]  ; correction + roundfactor
	paddw	xmm1, XMMWORD [CORRECTION(1,0,edx)]
	paddw	xmm2, XMMWORD [CORRECTION(2,0,edx)]
	paddw	xmm3, XMMWORD [CORRECTION(3,0,edx)]
	psllw	xmm0,1
	psllw	xmm1,1
	psllw	xmm2,1
	psllw	xmm3,1
	pmulhuw	xmm0, XMMWORD [RECIPROCAL(0,0,edx)]  ; reciprocal
	pmulhuw	xmm1, XMMWORD [RECIPROCAL(1,0,edx)]
	pmulhuw	xmm2, XMMWORD [RECIPROCAL(2,0,edx)]
	pmulhuw	xmm3, XMMWORD [RECIPROCAL(3,0,edx)]
	psllw	xmm0,1
	psllw	xmm1,1
	psllw	xmm2,1
	psllw	xmm3,1
	pmulhuw	xmm0, XMMWORD [SCALE(0,0,edx)]	; scale
	pmulhuw	xmm1, XMMWORD [SCALE(1,0,edx)]
	pmulhuw	xmm2, XMMWORD [SCALE(2,0,edx)]
	pmulhuw	xmm3, XMMWORD [SCALE(3,0,edx)]

	pxor	xmm0,xmm4
	pxor	xmm1,xmm5
	pxor	xmm2,xmm6
	pxor	xmm3,xmm7
	psubw	xmm0,xmm4
	psubw	xmm1,xmm5
	psubw	xmm2,xmm6
	psubw	xmm3,xmm7
	movdqa	XMMWORD [XMMBLOCK(0,0,edi,SIZEOF_DCTELEM)], xmm0
	movdqa	XMMWORD [XMMBLOCK(1,0,edi,SIZEOF_DCTELEM)], xmm1
	movdqa	XMMWORD [XMMBLOCK(2,0,edi,SIZEOF_DCTELEM)], xmm2
	movdqa	XMMWORD [XMMBLOCK(3,0,edi,SIZEOF_DCTELEM)], xmm3

	add	esi, byte 32*SIZEOF_DCTELEM
	add	edx, byte 32*SIZEOF_DCTELEM
	add	edi, byte 32*SIZEOF_JCOEF
	dec	eax
	jnz	near .quantloop

	pop	edi
	pop	esi
;	pop	edx		; need not be preserved
;	pop	ecx		; unused
;	pop	ebx		; unused
	pop	ebp
	ret

%endif ; !JFDCT_INT_QUANTIZE_WITH_DIVISION
%endif ; JFDCT_INT_SSE2_SUPPORTED
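
The SSE2 integer quantizer avoids division by working on the magnitude of each coefficient and multiplying by precomputed reciprocal and scale words. The sketch below follows the instruction sequence of jpeg_quantize_int_sse2 for a single 16-bit lane. It models only what the assembly does to its inputs; how the reciprocal, correction and scale tables are derived from the quantization table is not shown in this file, so the table values used with it are the caller's responsibility. `quantize_lane` and `mulhi_u16` are illustrative names.

```c
#include <stdint.h>

/* High 16 bits of an unsigned 16x16 multiply (model of pmulhuw). */
static uint16_t mulhi_u16(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a * (uint32_t)b) >> 16);
}

/* One lane of the reciprocal-based quantization (illustrative model). */
int16_t quantize_lane(int16_t coef, uint16_t recip, uint16_t corr, uint16_t scale)
{
    uint16_t sign = (coef < 0) ? 0xFFFFu : 0u;                 /* psraw 15 */
    uint16_t t = (uint16_t)(((uint16_t)coef ^ sign) - sign);   /* pxor, psubw: |coef| */

    t = (uint16_t)(t + corr);        /* paddw: correction + roundfactor */
    t = (uint16_t)(t << 1);          /* psllw 1 */
    t = mulhi_u16(t, recip);         /* pmulhuw by reciprocal */
    t = (uint16_t)(t << 1);          /* psllw 1 */
    t = mulhi_u16(t, scale);         /* pmulhuw by scale */

    t = (uint16_t)((t ^ sign) - sign);                         /* restore the sign */
    return (t & 0x8000u) ? (int16_t)((int32_t)t - 0x10000) : (int16_t)t;
}
```

With the arbitrary test values recip = 0x8000, corr = 0 and scale = 0x8000 each multiply-and-shift pair is an identity, so the lane returns its input unchanged; halving the scale to 0x4000 halves the result with truncation. These values are chosen only to exercise the arithmetic and are not taken from the library's tables.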
218
jcqntsse.asm
Normal file
@@ -0,0 +1,218 @@
;
; jcqntsse.asm - sample data conversion and quantization (SSE & MMX)
;
; x86 SIMD extension for IJG JPEG library
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
;
; Last Modified : January 12, 2005
;
; [TAB8]

%include "jsimdext.inc"
%include "jdct.inc"

%ifdef DCT_FLOAT_SUPPORTED
%ifdef JFDCT_FLT_SSE_MMX_SUPPORTED

; This module is specialized to the case DCTSIZE = 8.
;
%if DCTSIZE != 8
%error "Sorry, this code only copes with 8x8 DCTs."
%endif

; --------------------------------------------------------------------------
	SECTION	SEG_TEXT
	BITS	32
;
; Load data into workspace, applying unsigned->signed conversion
;
; GLOBAL(void)
; jpeg_convsamp_flt_sse (JSAMPARRAY sample_data, JDIMENSION start_col,
;                        FAST_FLOAT * workspace);
;

%define sample_data	ebp+8		; JSAMPARRAY sample_data
%define start_col	ebp+12		; JDIMENSION start_col
%define workspace	ebp+16		; FAST_FLOAT * workspace

	align	16
	global	EXTN(jpeg_convsamp_flt_sse)

EXTN(jpeg_convsamp_flt_sse):
	push	ebp
	mov	ebp,esp
	push	ebx
;	push	ecx		; need not be preserved
;	push	edx		; need not be preserved
	push	esi
	push	edi

	pcmpeqw	mm7,mm7
	psllw	mm7,7
	packsswb mm7,mm7		; mm7 = PB_CENTERJSAMPLE (0x808080..)

	mov	esi, JSAMPARRAY [sample_data]	; (JSAMPROW *)
	mov	eax, JDIMENSION [start_col]
	mov	edi, POINTER [workspace]	; (FAST_FLOAT *)
	mov	ecx, DCTSIZE/2
	alignx	16,7
.convloop:
	mov	ebx, JSAMPROW [esi+0*SIZEOF_JSAMPROW]	; (JSAMPLE *)
	mov	edx, JSAMPROW [esi+1*SIZEOF_JSAMPROW]	; (JSAMPLE *)

	movq	mm0, MMWORD [ebx+eax*SIZEOF_JSAMPLE]
	movq	mm1, MMWORD [edx+eax*SIZEOF_JSAMPLE]

	psubb	mm0,mm7				; mm0=(01234567)
	psubb	mm1,mm7				; mm1=(89ABCDEF)

	punpcklbw mm2,mm0			; mm2=(*0*1*2*3)
	punpckhbw mm0,mm0			; mm0=(*4*5*6*7)
	punpcklbw mm3,mm1			; mm3=(*8*9*A*B)
	punpckhbw mm1,mm1			; mm1=(*C*D*E*F)

	punpcklwd mm4,mm2			; mm4=(***0***1)
	punpckhwd mm2,mm2			; mm2=(***2***3)
	punpcklwd mm5,mm0			; mm5=(***4***5)
	punpckhwd mm0,mm0			; mm0=(***6***7)

	psrad	mm4,(DWORD_BIT-BYTE_BIT)	; mm4=(01)
	psrad	mm2,(DWORD_BIT-BYTE_BIT)	; mm2=(23)
	cvtpi2ps xmm0,mm4			; xmm0=(01**)
	cvtpi2ps xmm1,mm2			; xmm1=(23**)
	psrad	mm5,(DWORD_BIT-BYTE_BIT)	; mm5=(45)
	psrad	mm0,(DWORD_BIT-BYTE_BIT)	; mm0=(67)
	cvtpi2ps xmm2,mm5			; xmm2=(45**)
	cvtpi2ps xmm3,mm0			; xmm3=(67**)

	punpcklwd mm6,mm3			; mm6=(***8***9)
	punpckhwd mm3,mm3			; mm3=(***A***B)
	punpcklwd mm4,mm1			; mm4=(***C***D)
	punpckhwd mm1,mm1			; mm1=(***E***F)

	psrad	mm6,(DWORD_BIT-BYTE_BIT)	; mm6=(89)
	psrad	mm3,(DWORD_BIT-BYTE_BIT)	; mm3=(AB)
	cvtpi2ps xmm4,mm6			; xmm4=(89**)
	cvtpi2ps xmm5,mm3			; xmm5=(AB**)
	psrad	mm4,(DWORD_BIT-BYTE_BIT)	; mm4=(CD)
	psrad	mm1,(DWORD_BIT-BYTE_BIT)	; mm1=(EF)
	cvtpi2ps xmm6,mm4			; xmm6=(CD**)
	cvtpi2ps xmm7,mm1			; xmm7=(EF**)

	movlhps	xmm0,xmm1			; xmm0=(0123)
	movlhps	xmm2,xmm3			; xmm2=(4567)
	movlhps	xmm4,xmm5			; xmm4=(89AB)
	movlhps	xmm6,xmm7			; xmm6=(CDEF)

	movaps	XMMWORD [XMMBLOCK(0,0,edi,SIZEOF_FAST_FLOAT)], xmm0
	movaps	XMMWORD [XMMBLOCK(0,1,edi,SIZEOF_FAST_FLOAT)], xmm2
	movaps	XMMWORD [XMMBLOCK(1,0,edi,SIZEOF_FAST_FLOAT)], xmm4
	movaps	XMMWORD [XMMBLOCK(1,1,edi,SIZEOF_FAST_FLOAT)], xmm6

	add	esi, byte 2*SIZEOF_JSAMPROW
	add	edi, byte 2*DCTSIZE*SIZEOF_FAST_FLOAT
	dec	ecx
	jnz	near .convloop

	emms		; empty MMX state

	pop	edi
	pop	esi
;	pop	edx		; need not be preserved
;	pop	ecx		; need not be preserved
	pop	ebx
	pop	ebp
	ret


; --------------------------------------------------------------------------
;
; Quantize/descale the coefficients, and store into coef_block
;
; GLOBAL(void)
; jpeg_quantize_flt_sse (JCOEFPTR coef_block, FAST_FLOAT * divisors,
;                        FAST_FLOAT * workspace);
;

%define coef_block	ebp+8		; JCOEFPTR coef_block
%define divisors	ebp+12		; FAST_FLOAT * divisors
%define workspace	ebp+16		; FAST_FLOAT * workspace

	align	16
	global	EXTN(jpeg_quantize_flt_sse)

EXTN(jpeg_quantize_flt_sse):
	push	ebp
	mov	ebp,esp
;	push	ebx		; unused
;	push	ecx		; unused
;	push	edx		; need not be preserved
	push	esi
	push	edi

	mov	esi, POINTER [workspace]
	mov	edx, POINTER [divisors]
	mov	edi, JCOEFPTR [coef_block]
	mov	eax, DCTSIZE2/16
	alignx	16,7
.quantloop:
	movaps	xmm0, XMMWORD [XMMBLOCK(0,0,esi,SIZEOF_FAST_FLOAT)]
	movaps	xmm1, XMMWORD [XMMBLOCK(0,1,esi,SIZEOF_FAST_FLOAT)]
	mulps	xmm0, XMMWORD [XMMBLOCK(0,0,edx,SIZEOF_FAST_FLOAT)]
	mulps	xmm1, XMMWORD [XMMBLOCK(0,1,edx,SIZEOF_FAST_FLOAT)]
	movaps	xmm2, XMMWORD [XMMBLOCK(1,0,esi,SIZEOF_FAST_FLOAT)]
	movaps	xmm3, XMMWORD [XMMBLOCK(1,1,esi,SIZEOF_FAST_FLOAT)]
	mulps	xmm2, XMMWORD [XMMBLOCK(1,0,edx,SIZEOF_FAST_FLOAT)]
	mulps	xmm3, XMMWORD [XMMBLOCK(1,1,edx,SIZEOF_FAST_FLOAT)]

	movhlps	xmm4,xmm0
	movhlps	xmm5,xmm1

	cvtps2pi mm0,xmm0
	cvtps2pi mm1,xmm1
	cvtps2pi mm4,xmm4
	cvtps2pi mm5,xmm5

	movhlps	xmm6,xmm2
	movhlps	xmm7,xmm3

	cvtps2pi mm2,xmm2
	cvtps2pi mm3,xmm3
	cvtps2pi mm6,xmm6
	cvtps2pi mm7,xmm7

	packssdw mm0,mm4
	packssdw mm1,mm5
	packssdw mm2,mm6
	packssdw mm3,mm7

	movq	MMWORD [MMBLOCK(0,0,edi,SIZEOF_JCOEF)], mm0
	movq	MMWORD [MMBLOCK(0,1,edi,SIZEOF_JCOEF)], mm1
	movq	MMWORD [MMBLOCK(1,0,edi,SIZEOF_JCOEF)], mm2
	movq	MMWORD [MMBLOCK(1,1,edi,SIZEOF_JCOEF)], mm3

	add	esi, byte 16*SIZEOF_FAST_FLOAT
	add	edx, byte 16*SIZEOF_FAST_FLOAT
	add	edi, byte 16*SIZEOF_JCOEF
	dec	eax
	jnz	short .quantloop

	emms		; empty MMX state

	pop	edi
	pop	esi
;	pop	edx		; need not be preserved
;	pop	ecx		; unused
;	pop	ebx		; unused
	pop	ebp
	ret

%endif ; JFDCT_FLT_SSE_MMX_SUPPORTED
%endif ; DCT_FLOAT_SUPPORTED
328
jcsammmx.asm
Normal file
@@ -0,0 +1,328 @@
;
; jcsammmx.asm - downsampling (MMX)
;
; x86 SIMD extension for IJG JPEG library
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
;
; Last Modified : January 23, 2006
;
; [TAB8]

%include "jsimdext.inc"
%include "jcolsamp.inc"

%ifdef JCSAMPLE_MMX_SUPPORTED

; --------------------------------------------------------------------------
	SECTION	SEG_TEXT
	BITS	32
;
; Downsample pixel values of a single component.
; This version handles the common case of 2:1 horizontal and 1:1 vertical,
; without smoothing.
;
; GLOBAL(void)
; jpeg_h2v1_downsample_mmx (j_compress_ptr cinfo,
;                           jpeg_component_info * compptr,
;                           JSAMPARRAY input_data, JSAMPARRAY output_data);
;

%define cinfo(b)	(b)+8		; j_compress_ptr cinfo
%define compptr(b)	(b)+12		; jpeg_component_info * compptr
%define input_data(b)	(b)+16		; JSAMPARRAY input_data
%define output_data(b)	(b)+20		; JSAMPARRAY output_data

	align	16
	global	EXTN(jpeg_h2v1_downsample_mmx)

EXTN(jpeg_h2v1_downsample_mmx):
	push	ebp
	mov	ebp,esp
;	push	ebx		; unused
;	push	ecx		; need not be preserved
;	push	edx		; need not be preserved
	push	esi
	push	edi

	mov	ecx, POINTER [compptr(ebp)]
	mov	ecx, JDIMENSION [jcompinfo_width_in_blocks(ecx)]
	shl	ecx,3			; imul ecx,DCTSIZE (ecx = output_cols)
	jz	near .return

	mov	edx, POINTER [cinfo(ebp)]
	mov	edx, JDIMENSION [jcstruct_image_width(edx)]

	; -- expand_right_edge

	push	ecx
	shl	ecx,1				; output_cols * 2
	sub	ecx,edx
	jle	short .expand_end

	mov	eax, POINTER [cinfo(ebp)]
	mov	eax, INT [jcstruct_max_v_samp_factor(eax)]
	test	eax,eax
	jle	short .expand_end

	cld
	mov	esi, JSAMPARRAY [input_data(ebp)]	; input_data
	alignx	16,7
.expandloop:
	push	eax
	push	ecx

	mov	edi, JSAMPROW [esi]
	add	edi,edx
	mov	al, JSAMPLE [edi-1]

	rep stosb

	pop	ecx
	pop	eax

	add	esi, byte SIZEOF_JSAMPROW
	dec	eax
	jg	short .expandloop

.expand_end:
	pop	ecx				; output_cols
|
||||||
|
|
||||||
|
; -- h2v1_downsample
|
||||||
|
|
||||||
|
mov eax, POINTER [compptr(ebp)]
|
||||||
|
mov eax, JDIMENSION [jcompinfo_v_samp_factor(eax)] ; rowctr
|
||||||
|
test eax,eax
|
||||||
|
jle short .return
|
||||||
|
|
||||||
|
mov edx, 0x00010000 ; bias pattern
|
||||||
|
movd mm7,edx
|
||||||
|
pcmpeqw mm6,mm6
|
||||||
|
punpckldq mm7,mm7 ; mm7={0, 1, 0, 1}
|
||||||
|
psrlw mm6,BYTE_BIT ; mm6={0xFF 0x00 0xFF 0x00 ..}
|
||||||
|
|
||||||
|
mov esi, JSAMPARRAY [input_data(ebp)] ; input_data
|
||||||
|
mov edi, JSAMPARRAY [output_data(ebp)] ; output_data
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
push ecx
|
||||||
|
push edi
|
||||||
|
push esi
|
||||||
|
|
||||||
|
mov esi, JSAMPROW [esi] ; inptr
|
||||||
|
mov edi, JSAMPROW [edi] ; outptr
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
|
||||||
|
movq mm0, MMWORD [esi+0*SIZEOF_MMWORD]
|
||||||
|
movq mm1, MMWORD [esi+1*SIZEOF_MMWORD]
|
||||||
|
movq mm2,mm0
|
||||||
|
movq mm3,mm1
|
||||||
|
|
||||||
|
pand mm0,mm6
|
||||||
|
psrlw mm2,BYTE_BIT
|
||||||
|
pand mm1,mm6
|
||||||
|
psrlw mm3,BYTE_BIT
|
||||||
|
|
||||||
|
paddw mm0,mm2
|
||||||
|
paddw mm1,mm3
|
||||||
|
paddw mm0,mm7
|
||||||
|
paddw mm1,mm7
|
||||||
|
psrlw mm0,1
|
||||||
|
psrlw mm1,1
|
||||||
|
|
||||||
|
packuswb mm0,mm1
|
||||||
|
|
||||||
|
movq MMWORD [edi+0*SIZEOF_MMWORD], mm0
|
||||||
|
|
||||||
|
add esi, byte 2*SIZEOF_MMWORD ; inptr
|
||||||
|
add edi, byte 1*SIZEOF_MMWORD ; outptr
|
||||||
|
sub ecx, byte SIZEOF_MMWORD ; outcol
|
||||||
|
jnz short .columnloop
|
||||||
|
|
||||||
|
pop esi
|
||||||
|
pop edi
|
||||||
|
pop ecx
|
||||||
|
|
||||||
|
add esi, byte SIZEOF_JSAMPROW ; input_data
|
||||||
|
add edi, byte SIZEOF_JSAMPROW ; output_data
|
||||||
|
dec eax ; rowctr
|
||||||
|
jg short .rowloop
|
||||||
|
|
||||||
|
emms ; empty MMX state
|
||||||
|
|
||||||
|
.return:
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
; pop ebx ; unused
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
;
|
||||||
|
; Downsample pixel values of a single component.
|
||||||
|
; This version handles the standard case of 2:1 horizontal and 2:1 vertical,
|
||||||
|
; without smoothing.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_h2v2_downsample_mmx (j_compress_ptr cinfo,
|
||||||
|
; jpeg_component_info * compptr,
|
||||||
|
; JSAMPARRAY input_data, JSAMPARRAY output_data);
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_compress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define input_data(b) (b)+16 ; JSAMPARRAY input_data
|
||||||
|
%define output_data(b) (b)+20 ; JSAMPARRAY output_data
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_h2v2_downsample_mmx)
|
||||||
|
|
||||||
|
EXTN(jpeg_h2v2_downsample_mmx):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
; push ebx ; unused
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
mov ecx, POINTER [compptr(ebp)]
|
||||||
|
mov ecx, JDIMENSION [jcompinfo_width_in_blocks(ecx)]
|
||||||
|
shl ecx,3 ; imul ecx,DCTSIZE (ecx = output_cols)
|
||||||
|
jz near .return
|
||||||
|
|
||||||
|
mov edx, POINTER [cinfo(ebp)]
|
||||||
|
mov edx, JDIMENSION [jcstruct_image_width(edx)]
|
||||||
|
|
||||||
|
; -- expand_right_edge
|
||||||
|
|
||||||
|
push ecx
|
||||||
|
shl ecx,1 ; output_cols * 2
|
||||||
|
sub ecx,edx
|
||||||
|
jle short .expand_end
|
||||||
|
|
||||||
|
mov eax, POINTER [cinfo(ebp)]
|
||||||
|
mov eax, INT [jcstruct_max_v_samp_factor(eax)]
|
||||||
|
test eax,eax
|
||||||
|
jle short .expand_end
|
||||||
|
|
||||||
|
cld
|
||||||
|
mov esi, JSAMPARRAY [input_data(ebp)] ; input_data
|
||||||
|
alignx 16,7
|
||||||
|
.expandloop:
|
||||||
|
push eax
|
||||||
|
push ecx
|
||||||
|
|
||||||
|
mov edi, JSAMPROW [esi]
|
||||||
|
add edi,edx
|
||||||
|
mov al, JSAMPLE [edi-1]
|
||||||
|
|
||||||
|
rep stosb
|
||||||
|
|
||||||
|
pop ecx
|
||||||
|
pop eax
|
||||||
|
|
||||||
|
add esi, byte SIZEOF_JSAMPROW
|
||||||
|
dec eax
|
||||||
|
jg short .expandloop
|
||||||
|
|
||||||
|
.expand_end:
|
||||||
|
pop ecx ; output_cols
|
||||||
|
|
||||||
|
; -- h2v2_downsample
|
||||||
|
|
||||||
|
mov eax, POINTER [compptr(ebp)]
|
||||||
|
mov eax, JDIMENSION [jcompinfo_v_samp_factor(eax)] ; rowctr
|
||||||
|
test eax,eax
|
||||||
|
jle near .return
|
||||||
|
|
||||||
|
mov edx, 0x00020001 ; bias pattern
|
||||||
|
movd mm7,edx
|
||||||
|
pcmpeqw mm6,mm6
|
||||||
|
punpckldq mm7,mm7 ; mm7={1, 2, 1, 2}
|
||||||
|
psrlw mm6,BYTE_BIT ; mm6={0xFF 0x00 0xFF 0x00 ..}
|
||||||
|
|
||||||
|
mov esi, JSAMPARRAY [input_data(ebp)] ; input_data
|
||||||
|
mov edi, JSAMPARRAY [output_data(ebp)] ; output_data
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
push ecx
|
||||||
|
push edi
|
||||||
|
push esi
|
||||||
|
|
||||||
|
mov edx, JSAMPROW [esi+0*SIZEOF_JSAMPROW] ; inptr0
|
||||||
|
mov esi, JSAMPROW [esi+1*SIZEOF_JSAMPROW] ; inptr1
|
||||||
|
mov edi, JSAMPROW [edi] ; outptr
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
|
||||||
|
movq mm0, MMWORD [edx+0*SIZEOF_MMWORD]
|
||||||
|
movq mm1, MMWORD [esi+0*SIZEOF_MMWORD]
|
||||||
|
movq mm2, MMWORD [edx+1*SIZEOF_MMWORD]
|
||||||
|
movq mm3, MMWORD [esi+1*SIZEOF_MMWORD]
|
||||||
|
|
||||||
|
movq mm4,mm0
|
||||||
|
movq mm5,mm1
|
||||||
|
pand mm0,mm6
|
||||||
|
psrlw mm4,BYTE_BIT
|
||||||
|
pand mm1,mm6
|
||||||
|
psrlw mm5,BYTE_BIT
|
||||||
|
paddw mm0,mm4
|
||||||
|
paddw mm1,mm5
|
||||||
|
|
||||||
|
movq mm4,mm2
|
||||||
|
movq mm5,mm3
|
||||||
|
pand mm2,mm6
|
||||||
|
psrlw mm4,BYTE_BIT
|
||||||
|
pand mm3,mm6
|
||||||
|
psrlw mm5,BYTE_BIT
|
||||||
|
paddw mm2,mm4
|
||||||
|
paddw mm3,mm5
|
||||||
|
|
||||||
|
paddw mm0,mm1
|
||||||
|
paddw mm2,mm3
|
||||||
|
paddw mm0,mm7
|
||||||
|
paddw mm2,mm7
|
||||||
|
psrlw mm0,2
|
||||||
|
psrlw mm2,2
|
||||||
|
|
||||||
|
packuswb mm0,mm2
|
||||||
|
|
||||||
|
movq MMWORD [edi+0*SIZEOF_MMWORD], mm0
|
||||||
|
|
||||||
|
add edx, byte 2*SIZEOF_MMWORD ; inptr0
|
||||||
|
add esi, byte 2*SIZEOF_MMWORD ; inptr1
|
||||||
|
add edi, byte 1*SIZEOF_MMWORD ; outptr
|
||||||
|
sub ecx, byte SIZEOF_MMWORD ; outcol
|
||||||
|
jnz near .columnloop
|
||||||
|
|
||||||
|
pop esi
|
||||||
|
pop edi
|
||||||
|
pop ecx
|
||||||
|
|
||||||
|
add esi, byte 2*SIZEOF_JSAMPROW ; input_data
|
||||||
|
add edi, byte 1*SIZEOF_JSAMPROW ; output_data
|
||||||
|
dec eax ; rowctr
|
||||||
|
jg near .rowloop
|
||||||
|
|
||||||
|
emms ; empty MMX state
|
||||||
|
|
||||||
|
.return:
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
; pop ebx ; unused
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; JCSAMPLE_MMX_SUPPORTED
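For comparison with the two MMX routines in jcsammmx.asm, here is a minimal scalar sketch of the same arithmetic. It assumes 8-bit samples and rows already padded to twice the output width. The names `h2v1_row_ref` and `h2v2_row_ref` are hypothetical and are not part of the library. The alternating rounding bias mirrors the `mm7` patterns {0, 1, 0, 1} and {1, 2, 1, 2} loaded in the assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference, not library code.
 * h2v1: average each horizontal pair of samples. The rounding bias
 * alternates 0,1,0,1 across output columns so that the results are
 * not consistently rounded in one direction. */
static void h2v1_row_ref(const uint8_t *in, uint8_t *out, size_t output_cols)
{
    int bias = 0;
    for (size_t i = 0; i < output_cols; i++) {
        out[i] = (uint8_t)((in[2 * i] + in[2 * i + 1] + bias) >> 1);
        bias ^= 1;              /* 0 -> 1 -> 0 -> ... */
    }
}

/* h2v2: average each 2x2 block taken from two input rows.
 * The bias alternates 1,2,1,2 across output columns. */
static void h2v2_row_ref(const uint8_t *in0, const uint8_t *in1,
                         uint8_t *out, size_t output_cols)
{
    int bias = 1;
    for (size_t i = 0; i < output_cols; i++) {
        int sum = in0[2 * i] + in0[2 * i + 1] +
                  in1[2 * i] + in1[2 * i + 1];
        out[i] = (uint8_t)((sum + bias) >> 2);
        bias ^= 3;              /* 1 -> 2 -> 1 -> ... */
    }
}
```

A reference of this kind can help when checking a SIMD routine, since its output should match the vector version sample for sample on the same padded input.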
|
||||||
51
jcsample.c
@@ -5,6 +5,13 @@
|
|||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
* x86 SIMD extension for IJG JPEG library
|
||||||
|
* Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
* This file has been modified for SIMD extension.
|
||||||
|
* Last Modified : January 5, 2006
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
*
|
||||||
* This file contains downsampling routines.
|
* This file contains downsampling routines.
|
||||||
*
|
*
|
||||||
* Downsampling input data is counted in "row groups". A row group
|
* Downsampling input data is counted in "row groups". A row group
|
||||||
@@ -48,6 +55,7 @@
|
|||||||
#define JPEG_INTERNALS
|
#define JPEG_INTERNALS
|
||||||
#include "jinclude.h"
|
#include "jinclude.h"
|
||||||
#include "jpeglib.h"
|
#include "jpeglib.h"
|
||||||
|
#include "jcolsamp.h" /* Private declarations */
|
||||||
|
|
||||||
|
|
||||||
/* Pointer to routine to downsample a single component */
|
/* Pointer to routine to downsample a single component */
|
||||||
@@ -467,6 +475,7 @@ jinit_downsampler (j_compress_ptr cinfo)
|
|||||||
int ci;
|
int ci;
|
||||||
jpeg_component_info * compptr;
|
jpeg_component_info * compptr;
|
||||||
boolean smoothok = TRUE;
|
boolean smoothok = TRUE;
|
||||||
|
unsigned int simd = jpeg_simd_support((j_common_ptr) cinfo);
|
||||||
|
|
||||||
downsample = (my_downsample_ptr)
|
downsample = (my_downsample_ptr)
|
||||||
(*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,
|
(*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,
|
||||||
@@ -494,6 +503,16 @@ jinit_downsampler (j_compress_ptr cinfo)
|
|||||||
} else if (compptr->h_samp_factor * 2 == cinfo->max_h_samp_factor &&
|
} else if (compptr->h_samp_factor * 2 == cinfo->max_h_samp_factor &&
|
||||||
compptr->v_samp_factor == cinfo->max_v_samp_factor) {
|
compptr->v_samp_factor == cinfo->max_v_samp_factor) {
|
||||||
smoothok = FALSE;
|
smoothok = FALSE;
|
||||||
|
#ifdef JCSAMPLE_SSE2_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE2)
|
||||||
|
downsample->methods[ci] = jpeg_h2v1_downsample_sse2;
|
||||||
|
else
|
||||||
|
#endif
|
||||||
|
#ifdef JCSAMPLE_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_MMX)
|
||||||
|
downsample->methods[ci] = jpeg_h2v1_downsample_mmx;
|
||||||
|
else
|
||||||
|
#endif
|
||||||
downsample->methods[ci] = h2v1_downsample;
|
downsample->methods[ci] = h2v1_downsample;
|
||||||
} else if (compptr->h_samp_factor * 2 == cinfo->max_h_samp_factor &&
|
} else if (compptr->h_samp_factor * 2 == cinfo->max_h_samp_factor &&
|
||||||
compptr->v_samp_factor * 2 == cinfo->max_v_samp_factor) {
|
compptr->v_samp_factor * 2 == cinfo->max_v_samp_factor) {
|
||||||
@@ -502,6 +521,16 @@ jinit_downsampler (j_compress_ptr cinfo)
|
|||||||
downsample->methods[ci] = h2v2_smooth_downsample;
|
downsample->methods[ci] = h2v2_smooth_downsample;
|
||||||
downsample->pub.need_context_rows = TRUE;
|
downsample->pub.need_context_rows = TRUE;
|
||||||
} else
|
} else
|
||||||
|
#endif
|
||||||
|
#ifdef JCSAMPLE_SSE2_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE2)
|
||||||
|
downsample->methods[ci] = jpeg_h2v2_downsample_sse2;
|
||||||
|
else
|
||||||
|
#endif
|
||||||
|
#ifdef JCSAMPLE_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_MMX)
|
||||||
|
downsample->methods[ci] = jpeg_h2v2_downsample_mmx;
|
||||||
|
else
|
||||||
#endif
|
#endif
|
||||||
downsample->methods[ci] = h2v2_downsample;
|
downsample->methods[ci] = h2v2_downsample;
|
||||||
} else if ((cinfo->max_h_samp_factor % compptr->h_samp_factor) == 0 &&
|
} else if ((cinfo->max_h_samp_factor % compptr->h_samp_factor) == 0 &&
|
||||||
@@ -517,3 +546,25 @@ jinit_downsampler (j_compress_ptr cinfo)
|
|||||||
TRACEMS(cinfo, 0, JTRC_SMOOTH_NOTIMPL);
|
TRACEMS(cinfo, 0, JTRC_SMOOTH_NOTIMPL);
|
||||||
#endif
|
#endif
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
#ifndef JSIMD_MODEINFO_NOT_SUPPORTED
|
||||||
|
|
||||||
|
GLOBAL(unsigned int)
|
||||||
|
jpeg_simd_downsampler (j_compress_ptr cinfo)
|
||||||
|
{
|
||||||
|
unsigned int simd = jpeg_simd_support((j_common_ptr) cinfo);
|
||||||
|
|
||||||
|
#ifdef JCSAMPLE_SSE2_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE2)
|
||||||
|
return JSIMD_SSE2;
|
||||||
|
#endif
|
||||||
|
#ifdef JCSAMPLE_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_MMX)
|
||||||
|
return JSIMD_MMX;
|
||||||
|
#endif
|
||||||
|
|
||||||
|
return JSIMD_NONE;
|
||||||
|
}
|
||||||
|
|
||||||
|
#endif /* !JSIMD_MODEINFO_NOT_SUPPORTED */
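The selection logic added to jinit_downsampler relies on an `if ... else` chain in which each SIMD branch is wrapped in its own `#ifdef`, so the scalar assignment serves as the final `else` whichever branches are compiled in. The sketch below reproduces that shape in isolation. All `demo_*` and `DEMO_*` names and the flag values are hypothetical stand-ins; the real `JSIMD_*` constants and `jpeg_simd_support()` are defined elsewhere in the SIMD extension and are not shown in this diff.

```c
#include <assert.h>

/* Hypothetical flag values, chosen only for this illustration. */
enum { DEMO_SIMD_NONE = 0, DEMO_SIMD_MMX = 1, DEMO_SIMD_SSE2 = 2 };

/* Comment either line out to simulate a build without that routine. */
#define DEMO_MMX_SUPPORTED
#define DEMO_SSE2_SUPPORTED

typedef const char *(*demo_method)(void);

static const char *demo_scalar(void) { return "scalar"; }
static const char *demo_mmx(void)    { return "mmx"; }
static const char *demo_sse2(void)   { return "sse2"; }

/* Pick the most capable routine that is both compiled in and
 * reported as available in the `simd` bit mask. */
static demo_method demo_select(unsigned int simd)
{
    demo_method m;
    (void) simd;                /* unused if both macros are undefined */
#ifdef DEMO_SSE2_SUPPORTED
    if (simd & DEMO_SIMD_SSE2)
        m = demo_sse2;
    else
#endif
#ifdef DEMO_MMX_SUPPORTED
    if (simd & DEMO_SIMD_MMX)
        m = demo_mmx;
    else
#endif
        m = demo_scalar;        /* always-present fallback */
    return m;
}
```

With both macros defined, a mask with both bits set selects the SSE2 routine, an MMX-only mask selects the MMX routine, and an empty mask falls through to the scalar code.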
|
||||||
|
|||||||
355
jcsamss2.asm
Normal file
@@ -0,0 +1,355 @@
|
|||||||
|
;
|
||||||
|
; jcsamss2.asm - downsampling (SSE2)
|
||||||
|
;
|
||||||
|
; x86 SIMD extension for IJG JPEG library
|
||||||
|
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
; For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
;
|
||||||
|
; This file should be assembled with NASM (Netwide Assembler),
|
||||||
|
; can *not* be assembled with Microsoft's MASM or any compatible
|
||||||
|
; assembler (including Borland's Turbo Assembler).
|
||||||
|
; NASM is available from http://nasm.sourceforge.net/ or
|
||||||
|
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||||
|
;
|
||||||
|
; Last Modified : January 23, 2006
|
||||||
|
;
|
||||||
|
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jcolsamp.inc"
|
||||||
|
|
||||||
|
%ifdef JCSAMPLE_SSE2_SUPPORTED
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Downsample pixel values of a single component.
|
||||||
|
; This version handles the common case of 2:1 horizontal and 1:1 vertical,
|
||||||
|
; without smoothing.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_h2v1_downsample_sse2 (j_compress_ptr cinfo,
|
||||||
|
; jpeg_component_info * compptr,
|
||||||
|
; JSAMPARRAY input_data, JSAMPARRAY output_data);
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_compress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define input_data(b) (b)+16 ; JSAMPARRAY input_data
|
||||||
|
%define output_data(b) (b)+20 ; JSAMPARRAY output_data
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_h2v1_downsample_sse2)
|
||||||
|
|
||||||
|
EXTN(jpeg_h2v1_downsample_sse2):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
; push ebx ; unused
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
mov ecx, POINTER [compptr(ebp)]
|
||||||
|
mov ecx, JDIMENSION [jcompinfo_width_in_blocks(ecx)]
|
||||||
|
shl ecx,3 ; imul ecx,DCTSIZE (ecx = output_cols)
|
||||||
|
jz near .return
|
||||||
|
|
||||||
|
mov edx, POINTER [cinfo(ebp)]
|
||||||
|
mov edx, JDIMENSION [jcstruct_image_width(edx)]
|
||||||
|
|
||||||
|
; -- expand_right_edge
|
||||||
|
|
||||||
|
push ecx
|
||||||
|
shl ecx,1 ; output_cols * 2
|
||||||
|
sub ecx,edx
|
||||||
|
jle short .expand_end
|
||||||
|
|
||||||
|
mov eax, POINTER [cinfo(ebp)]
|
||||||
|
mov eax, INT [jcstruct_max_v_samp_factor(eax)]
|
||||||
|
test eax,eax
|
||||||
|
jle short .expand_end
|
||||||
|
|
||||||
|
cld
|
||||||
|
mov esi, JSAMPARRAY [input_data(ebp)] ; input_data
|
||||||
|
alignx 16,7
|
||||||
|
.expandloop:
|
||||||
|
push eax
|
||||||
|
push ecx
|
||||||
|
|
||||||
|
mov edi, JSAMPROW [esi]
|
||||||
|
add edi,edx
|
||||||
|
mov al, JSAMPLE [edi-1]
|
||||||
|
|
||||||
|
rep stosb
|
||||||
|
|
||||||
|
pop ecx
|
||||||
|
pop eax
|
||||||
|
|
||||||
|
add esi, byte SIZEOF_JSAMPROW
|
||||||
|
dec eax
|
||||||
|
jg short .expandloop
|
||||||
|
|
||||||
|
.expand_end:
|
||||||
|
pop ecx ; output_cols
|
||||||
|
|
||||||
|
; -- h2v1_downsample
|
||||||
|
|
||||||
|
mov eax, POINTER [compptr(ebp)]
|
||||||
|
mov eax, JDIMENSION [jcompinfo_v_samp_factor(eax)] ; rowctr
|
||||||
|
test eax,eax
|
||||||
|
jle near .return
|
||||||
|
|
||||||
|
mov edx, 0x00010000 ; bias pattern
|
||||||
|
movd xmm7,edx
|
||||||
|
pcmpeqw xmm6,xmm6
|
||||||
|
pshufd xmm7,xmm7,0x00 ; xmm7={0, 1, 0, 1, 0, 1, 0, 1}
|
||||||
|
psrlw xmm6,BYTE_BIT ; xmm6={0xFF 0x00 0xFF 0x00 ..}
|
||||||
|
|
||||||
|
mov esi, JSAMPARRAY [input_data(ebp)] ; input_data
|
||||||
|
mov edi, JSAMPARRAY [output_data(ebp)] ; output_data
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
push ecx
|
||||||
|
push edi
|
||||||
|
push esi
|
||||||
|
|
||||||
|
mov esi, JSAMPROW [esi] ; inptr
|
||||||
|
mov edi, JSAMPROW [edi] ; outptr
|
||||||
|
|
||||||
|
cmp ecx, byte SIZEOF_XMMWORD
|
||||||
|
jae short .columnloop
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.columnloop_r8:
|
||||||
|
movdqa xmm0, XMMWORD [esi+0*SIZEOF_XMMWORD]
|
||||||
|
pxor xmm1,xmm1
|
||||||
|
mov ecx, SIZEOF_XMMWORD
|
||||||
|
jmp short .downsample
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.columnloop:
|
||||||
|
movdqa xmm0, XMMWORD [esi+0*SIZEOF_XMMWORD]
|
||||||
|
movdqa xmm1, XMMWORD [esi+1*SIZEOF_XMMWORD]
|
||||||
|
|
||||||
|
.downsample:
|
||||||
|
movdqa xmm2,xmm0
|
||||||
|
movdqa xmm3,xmm1
|
||||||
|
|
||||||
|
pand xmm0,xmm6
|
||||||
|
psrlw xmm2,BYTE_BIT
|
||||||
|
pand xmm1,xmm6
|
||||||
|
psrlw xmm3,BYTE_BIT
|
||||||
|
|
||||||
|
paddw xmm0,xmm2
|
||||||
|
paddw xmm1,xmm3
|
||||||
|
paddw xmm0,xmm7
|
||||||
|
paddw xmm1,xmm7
|
||||||
|
psrlw xmm0,1
|
||||||
|
psrlw xmm1,1
|
||||||
|
|
||||||
|
packuswb xmm0,xmm1
|
||||||
|
|
||||||
|
movdqa XMMWORD [edi+0*SIZEOF_XMMWORD], xmm0
|
||||||
|
|
||||||
|
sub ecx, byte SIZEOF_XMMWORD ; outcol
|
||||||
|
add esi, byte 2*SIZEOF_XMMWORD ; inptr
|
||||||
|
add edi, byte 1*SIZEOF_XMMWORD ; outptr
|
||||||
|
cmp ecx, byte SIZEOF_XMMWORD
|
||||||
|
jae short .columnloop
|
||||||
|
test ecx,ecx
|
||||||
|
jnz short .columnloop_r8
|
||||||
|
|
||||||
|
pop esi
|
||||||
|
pop edi
|
||||||
|
pop ecx
|
||||||
|
|
||||||
|
add esi, byte SIZEOF_JSAMPROW ; input_data
|
||||||
|
add edi, byte SIZEOF_JSAMPROW ; output_data
|
||||||
|
dec eax ; rowctr
|
||||||
|
jg near .rowloop
|
||||||
|
|
||||||
|
.return:
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
; pop ebx ; unused
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
;
|
||||||
|
; Downsample pixel values of a single component.
|
||||||
|
; This version handles the standard case of 2:1 horizontal and 2:1 vertical,
|
||||||
|
; without smoothing.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_h2v2_downsample_sse2 (j_compress_ptr cinfo,
|
||||||
|
; jpeg_component_info * compptr,
|
||||||
|
; JSAMPARRAY input_data, JSAMPARRAY output_data);
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_compress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define input_data(b) (b)+16 ; JSAMPARRAY input_data
|
||||||
|
%define output_data(b) (b)+20 ; JSAMPARRAY output_data
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_h2v2_downsample_sse2)
|
||||||
|
|
||||||
|
EXTN(jpeg_h2v2_downsample_sse2):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
; push ebx ; unused
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
mov ecx, POINTER [compptr(ebp)]
|
||||||
|
mov ecx, JDIMENSION [jcompinfo_width_in_blocks(ecx)]
|
||||||
|
shl ecx,3 ; imul ecx,DCTSIZE (ecx = output_cols)
|
||||||
|
jz near .return
|
||||||
|
|
||||||
|
mov edx, POINTER [cinfo(ebp)]
|
||||||
|
mov edx, JDIMENSION [jcstruct_image_width(edx)]
|
||||||
|
|
||||||
|
; -- expand_right_edge
|
||||||
|
|
||||||
|
push ecx
|
||||||
|
shl ecx,1 ; output_cols * 2
|
||||||
|
sub ecx,edx
|
||||||
|
jle short .expand_end
|
||||||
|
|
||||||
|
mov eax, POINTER [cinfo(ebp)]
|
||||||
|
mov eax, INT [jcstruct_max_v_samp_factor(eax)]
|
||||||
|
test eax,eax
|
||||||
|
jle short .expand_end
|
||||||
|
|
||||||
|
cld
|
||||||
|
mov esi, JSAMPARRAY [input_data(ebp)] ; input_data
|
||||||
|
alignx 16,7
|
||||||
|
.expandloop:
|
||||||
|
push eax
|
||||||
|
push ecx
|
||||||
|
|
||||||
|
mov edi, JSAMPROW [esi]
|
||||||
|
add edi,edx
|
||||||
|
mov al, JSAMPLE [edi-1]
|
||||||
|
|
||||||
|
rep stosb
|
||||||
|
|
||||||
|
pop ecx
|
||||||
|
pop eax
|
||||||
|
|
||||||
|
add esi, byte SIZEOF_JSAMPROW
|
||||||
|
dec eax
|
||||||
|
jg short .expandloop
|
||||||
|
|
||||||
|
.expand_end:
|
||||||
|
pop ecx ; output_cols
|
||||||
|
|
||||||
|
; -- h2v2_downsample
|
||||||
|
|
||||||
|
mov eax, POINTER [compptr(ebp)]
|
||||||
|
mov eax, JDIMENSION [jcompinfo_v_samp_factor(eax)] ; rowctr
|
||||||
|
test eax,eax
|
||||||
|
jle near .return
|
||||||
|
|
||||||
|
mov edx, 0x00020001 ; bias pattern
|
||||||
|
movd xmm7,edx
|
||||||
|
pcmpeqw xmm6,xmm6
|
||||||
|
pshufd xmm7,xmm7,0x00 ; xmm7={1, 2, 1, 2, 1, 2, 1, 2}
|
||||||
|
psrlw xmm6,BYTE_BIT ; xmm6={0xFF 0x00 0xFF 0x00 ..}
|
||||||
|
|
||||||
|
mov esi, JSAMPARRAY [input_data(ebp)] ; input_data
|
||||||
|
mov edi, JSAMPARRAY [output_data(ebp)] ; output_data
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
push ecx
|
||||||
|
push edi
|
||||||
|
push esi
|
||||||
|
|
||||||
|
mov edx, JSAMPROW [esi+0*SIZEOF_JSAMPROW] ; inptr0
|
||||||
|
mov esi, JSAMPROW [esi+1*SIZEOF_JSAMPROW] ; inptr1
|
||||||
|
mov edi, JSAMPROW [edi] ; outptr
|
||||||
|
|
||||||
|
cmp ecx, byte SIZEOF_XMMWORD
|
||||||
|
jae short .columnloop
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.columnloop_r8:
|
||||||
|
movdqa xmm0, XMMWORD [edx+0*SIZEOF_XMMWORD]
|
||||||
|
movdqa xmm1, XMMWORD [esi+0*SIZEOF_XMMWORD]
|
||||||
|
pxor xmm2,xmm2
|
||||||
|
pxor xmm3,xmm3
|
||||||
|
mov ecx, SIZEOF_XMMWORD
|
||||||
|
jmp short .downsample
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.columnloop:
|
||||||
|
movdqa xmm0, XMMWORD [edx+0*SIZEOF_XMMWORD]
|
||||||
|
movdqa xmm1, XMMWORD [esi+0*SIZEOF_XMMWORD]
|
||||||
|
movdqa xmm2, XMMWORD [edx+1*SIZEOF_XMMWORD]
|
||||||
|
movdqa xmm3, XMMWORD [esi+1*SIZEOF_XMMWORD]
|
||||||
|
|
||||||
|
.downsample:
|
||||||
|
movdqa xmm4,xmm0
|
||||||
|
movdqa xmm5,xmm1
|
||||||
|
pand xmm0,xmm6
|
||||||
|
psrlw xmm4,BYTE_BIT
|
||||||
|
pand xmm1,xmm6
|
||||||
|
psrlw xmm5,BYTE_BIT
|
||||||
|
paddw xmm0,xmm4
|
||||||
|
paddw xmm1,xmm5
|
||||||
|
|
||||||
|
movdqa xmm4,xmm2
|
||||||
|
movdqa xmm5,xmm3
|
||||||
|
pand xmm2,xmm6
|
||||||
|
psrlw xmm4,BYTE_BIT
|
||||||
|
pand xmm3,xmm6
|
||||||
|
psrlw xmm5,BYTE_BIT
|
||||||
|
paddw xmm2,xmm4
|
||||||
|
paddw xmm3,xmm5
|
||||||
|
|
||||||
|
paddw xmm0,xmm1
|
||||||
|
paddw xmm2,xmm3
|
||||||
|
paddw xmm0,xmm7
|
||||||
|
paddw xmm2,xmm7
|
||||||
|
psrlw xmm0,2
|
||||||
|
psrlw xmm2,2
|
||||||
|
|
||||||
|
packuswb xmm0,xmm2
|
||||||
|
|
||||||
|
movdqa XMMWORD [edi+0*SIZEOF_XMMWORD], xmm0
|
||||||
|
|
||||||
|
sub ecx, byte SIZEOF_XMMWORD ; outcol
|
||||||
|
add edx, byte 2*SIZEOF_XMMWORD ; inptr0
|
||||||
|
add esi, byte 2*SIZEOF_XMMWORD ; inptr1
|
||||||
|
add edi, byte 1*SIZEOF_XMMWORD ; outptr
|
||||||
|
cmp ecx, byte SIZEOF_XMMWORD
|
||||||
|
jae near .columnloop
|
||||||
|
test ecx,ecx
|
||||||
|
jnz near .columnloop_r8
|
||||||
|
|
||||||
|
pop esi
|
||||||
|
pop edi
|
||||||
|
pop ecx
|
||||||
|
|
||||||
|
add esi, byte 2*SIZEOF_JSAMPROW ; input_data
|
||||||
|
add edi, byte 1*SIZEOF_JSAMPROW ; output_data
|
||||||
|
dec eax ; rowctr
|
||||||
|
jg near .rowloop
|
||||||
|
|
||||||
|
.return:
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
; pop ebx ; unused
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; JCSAMPLE_SSE2_SUPPORTED
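Each of the four assembly routines (MMX and SSE2) begins with the same `expand_right_edge` step: every input row is padded from `image_width` out to `output_cols * 2` by replicating its last real sample (`rep stosb`). A minimal scalar sketch of that step follows. The name `expand_right_edge_ref` and its parameter names are hypothetical, and it assumes each row buffer is at least `padded_cols` bytes long with `input_cols >= 1`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference, not library code.
 * Pad each row on the right by repeating its last valid sample,
 * so the downsampling loops can read whole vectors safely. */
static void expand_right_edge_ref(uint8_t **rows, int num_rows,
                                  size_t input_cols, size_t padded_cols)
{
    if (padded_cols <= input_cols)
        return;                         /* nothing to pad */

    size_t numcols = padded_cols - input_cols;
    for (int r = 0; r < num_rows; r++) {
        uint8_t *p = rows[r] + input_cols;
        memset(p, p[-1], numcols);      /* replicate last real sample */
    }
}
```

Padding with the edge sample rather than zero keeps the averaged values at the right border close to the real image content.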
|
||||||
21
jctrans.c
@@ -1,7 +1,7 @@
|
|||||||
/*
|
/*
|
||||||
* jctrans.c
|
* jctrans.c
|
||||||
*
|
*
|
||||||
* Copyright (C) 1995-1996, Thomas G. Lane.
|
* Copyright (C) 1995-1998, Thomas G. Lane.
|
||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
@@ -129,6 +129,23 @@ jpeg_copy_critical_parameters (j_decompress_ptr srcinfo,
|
|||||||
* instead we rely on jpeg_set_colorspace to have made a suitable choice.
|
* instead we rely on jpeg_set_colorspace to have made a suitable choice.
|
||||||
*/
|
*/
|
||||||
}
|
}
|
||||||
|
/* Also copy JFIF version and resolution information, if available.
|
||||||
|
* Strictly speaking this isn't "critical" info, but it's nearly
|
||||||
|
* always appropriate to copy it if available. In particular,
|
||||||
|
* if the application chooses to copy JFIF 1.02 extension markers from
|
||||||
|
* the source file, we need to copy the version to make sure we don't
|
||||||
|
* emit a file that has 1.02 extensions but a claimed version of 1.01.
|
||||||
|
* We will *not*, however, copy version info from mislabeled "2.01" files.
|
||||||
|
*/
|
||||||
|
if (srcinfo->saw_JFIF_marker) {
|
||||||
|
if (srcinfo->JFIF_major_version == 1) {
|
||||||
|
dstinfo->JFIF_major_version = srcinfo->JFIF_major_version;
|
||||||
|
dstinfo->JFIF_minor_version = srcinfo->JFIF_minor_version;
|
||||||
|
}
|
||||||
|
dstinfo->density_unit = srcinfo->density_unit;
|
||||||
|
dstinfo->X_density = srcinfo->X_density;
|
||||||
|
dstinfo->Y_density = srcinfo->Y_density;
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
@@ -170,7 +187,7 @@ transencode_master_selection (j_compress_ptr cinfo,
|
|||||||
/* We can now tell the memory manager to allocate virtual arrays. */
|
/* We can now tell the memory manager to allocate virtual arrays. */
|
||||||
(*cinfo->mem->realize_virt_arrays) ((j_common_ptr) cinfo);
|
(*cinfo->mem->realize_virt_arrays) ((j_common_ptr) cinfo);
|
||||||
|
|
||||||
/* Write the datastream header (SOI) immediately.
|
/* Write the datastream header (SOI, JFIF) immediately.
|
||||||
* Frame and scan headers are postponed till later.
|
* Frame and scan headers are postponed till later.
|
||||||
* This lets application insert special markers after the SOI.
|
* This lets application insert special markers after the SOI.
|
||||||
*/
|
*/
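The JFIF handling added to jpeg_copy_critical_parameters in this file can be tried in isolation with the sketch below. The struct `demo_jfif` and the function `demo_copy_jfif` are hypothetical; they carry only the fields touched by the hunk and are not the library's own types. The rule shown is the one stated in the new comment: the version is copied only when the source major version is 1, while the density fields are copied whenever a JFIF marker was seen.

```c
#include <assert.h>

/* Hypothetical subset of the fields used by the hunk. */
struct demo_jfif {
    int saw_JFIF_marker;
    unsigned char major_version;
    unsigned char minor_version;
    unsigned char density_unit;
    unsigned short X_density;
    unsigned short Y_density;
};

static void demo_copy_jfif(const struct demo_jfif *src, struct demo_jfif *dst)
{
    if (!src->saw_JFIF_marker)
        return;                         /* no JFIF info to copy */

    /* Do not propagate version info from mislabeled "2.01" files. */
    if (src->major_version == 1) {
        dst->major_version = src->major_version;
        dst->minor_version = src->minor_version;
    }
    dst->density_unit = src->density_unit;
    dst->X_density = src->X_density;
    dst->Y_density = src->Y_density;
}
```

A source labeled 1.02 therefore updates the destination version, while a source labeled 2.01 leaves the destination's default version untouched but still passes its resolution through.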
|
||||||
|
|||||||
29
jdapimin.c
@@ -1,7 +1,7 @@
|
|||||||
/*
|
/*
|
||||||
* jdapimin.c
|
* jdapimin.c
|
||||||
*
|
*
|
||||||
* Copyright (C) 1994-1996, Thomas G. Lane.
|
* Copyright (C) 1994-1998, Thomas G. Lane.
|
||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
@@ -39,13 +39,18 @@ jpeg_CreateDecompress (j_decompress_ptr cinfo, int version, size_t structsize)
|
|||||||
ERREXIT2(cinfo, JERR_BAD_STRUCT_SIZE,
|
ERREXIT2(cinfo, JERR_BAD_STRUCT_SIZE,
|
||||||
(int) SIZEOF(struct jpeg_decompress_struct), (int) structsize);
|
(int) SIZEOF(struct jpeg_decompress_struct), (int) structsize);
|
||||||
|
|
||||||
/* For debugging purposes, zero the whole master structure.
|
/* For debugging purposes, we zero the whole master structure.
|
||||||
* But error manager pointer is already there, so save and restore it.
|
* But the application has already set the err pointer, and may have set
|
||||||
|
* client_data, so we have to save and restore those fields.
|
||||||
|
* Note: if application hasn't set client_data, tools like Purify may
|
||||||
|
* complain here.
|
||||||
*/
|
*/
|
||||||
{
|
{
|
||||||
struct jpeg_error_mgr * err = cinfo->err;
|
struct jpeg_error_mgr * err = cinfo->err;
|
||||||
|
void * client_data = cinfo->client_data; /* ignore Purify complaint here */
|
||||||
MEMZERO(cinfo, SIZEOF(struct jpeg_decompress_struct));
|
MEMZERO(cinfo, SIZEOF(struct jpeg_decompress_struct));
|
||||||
cinfo->err = err;
|
cinfo->err = err;
|
||||||
|
cinfo->client_data = client_data;
|
||||||
}
|
}
|
||||||
cinfo->is_decompressor = TRUE;
|
cinfo->is_decompressor = TRUE;
|
||||||
|
|
||||||
@@ -67,6 +72,7 @@ jpeg_CreateDecompress (j_decompress_ptr cinfo, int version, size_t structsize)
|
|||||||
/* Initialize marker processor so application can override methods
|
/* Initialize marker processor so application can override methods
|
||||||
* for COM, APPn markers before calling jpeg_read_header.
|
* for COM, APPn markers before calling jpeg_read_header.
|
||||||
*/
|
*/
|
||||||
|
cinfo->marker_list = NULL;
|
||||||
jinit_marker_reader(cinfo);
|
jinit_marker_reader(cinfo);
|
||||||
|
|
||||||
/* And initialize the overall input controller. */
|
/* And initialize the overall input controller. */
|
||||||
@@ -100,23 +106,6 @@ jpeg_abort_decompress (j_decompress_ptr cinfo)
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
/*
|
|
||||||
* Install a special processing method for COM or APPn markers.
|
|
||||||
*/
|
|
||||||
|
|
||||||
GLOBAL(void)
|
|
||||||
jpeg_set_marker_processor (j_decompress_ptr cinfo, int marker_code,
|
|
||||||
jpeg_marker_parser_method routine)
|
|
||||||
{
|
|
||||||
if (marker_code == JPEG_COM)
|
|
||||||
cinfo->marker->process_COM = routine;
|
|
||||||
else if (marker_code >= JPEG_APP0 && marker_code <= JPEG_APP0+15)
|
|
||||||
cinfo->marker->process_APPn[marker_code-JPEG_APP0] = routine;
|
|
||||||
else
|
|
||||||
ERREXIT1(cinfo, JERR_UNKNOWN_MARKER, marker_code);
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Set default decompression parameters.
|
* Set default decompression parameters.
|
||||||
*/
|
*/
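The change to jpeg_CreateDecompress in this file preserves two application-owned fields across the zeroing of the master structure. The sketch below shows that save, zero, restore pattern on a hypothetical struct; `demo_cinfo` and `demo_create` are illustrative names, and `memset` stands in for the library's `MEMZERO` macro.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the master structure. */
struct demo_cinfo {
    void *err;              /* set by the application before creation */
    void *client_data;      /* optionally set by the application */
    int is_decompressor;
    int global_state;
};

static void demo_create(struct demo_cinfo *cinfo)
{
    /* Save the fields the application may already have filled in. */
    void *err = cinfo->err;
    void *client_data = cinfo->client_data;

    /* Zero everything, then put the saved fields back. */
    memset(cinfo, 0, sizeof(*cinfo));
    cinfo->err = err;
    cinfo->client_data = client_data;

    cinfo->is_decompressor = 1;
}
```

If the application never assigned `client_data`, the saved value is read from uninitialized memory, which is why the diff's comment warns that tools such as Purify may complain at that line.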
|
||||||
|
|||||||
162
jdcoefct.c
@@ -1,10 +1,17 @@
|
|||||||
/*
|
/*
|
||||||
* jdcoefct.c
|
* jdcoefct.c
|
||||||
*
|
*
|
||||||
* Copyright (C) 1994-1996, Thomas G. Lane.
|
* Copyright (C) 1994-1997, Thomas G. Lane.
|
||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
* x86 SIMD extension for IJG JPEG library
|
||||||
|
* Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
* This file has been modified to improve performance.
|
||||||
|
* Last Modified : December 18, 2005
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
*
|
||||||
* This file contains the coefficient buffer controller for decompression.
|
* This file contains the coefficient buffer controller for decompression.
|
||||||
* This controller is the top level of the JPEG decompressor proper.
|
* This controller is the top level of the JPEG decompressor proper.
|
||||||
* The coefficient buffer lies between entropy decoding and inverse-DCT steps.
|
* The coefficient buffer lies between entropy decoding and inverse-DCT steps.
|
||||||
@@ -133,14 +140,19 @@ start_output_pass (j_decompress_ptr cinfo)
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
#ifndef NEED_FAR_POINTERS
|
||||||
|
#undef jzero_far
|
||||||
|
#define jzero_far(target, bytestozero) MEMZERO(target, bytestozero)
|
||||||
|
#endif
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Decompress and return some data in the single-pass case.
|
* Decompress and return some data in the single-pass case.
|
||||||
* Always attempts to emit one fully interleaved MCU row ("iMCU" row).
|
* Always attempts to emit one fully interleaved MCU row ("iMCU" row).
|
||||||
* Input and output must run in lockstep since we have only a one-MCU buffer.
|
* Input and output must run in lockstep since we have only a one-MCU buffer.
|
||||||
* Return value is JPEG_ROW_COMPLETED, JPEG_SCAN_COMPLETED, or JPEG_SUSPENDED.
|
* Return value is JPEG_ROW_COMPLETED, JPEG_SCAN_COMPLETED, or JPEG_SUSPENDED.
|
||||||
*
|
*
|
||||||
* NB: output_buf contains a plane for each component in image.
|
* NB: output_buf contains a plane for each component in image,
|
||||||
* For single pass, this is the same as the components in the scan.
|
* which we index according to the component's SOF position.
|
||||||
*/
|
*/
|
||||||
|
|
||||||
METHODDEF(int)
|
METHODDEF(int)
|
||||||
@@ -150,15 +162,61 @@ decompress_onepass (j_decompress_ptr cinfo, JSAMPIMAGE output_buf)
|
|||||||
JDIMENSION MCU_col_num; /* index of current MCU within row */
|
JDIMENSION MCU_col_num; /* index of current MCU within row */
|
||||||
JDIMENSION last_MCU_col = cinfo->MCUs_per_row - 1;
|
JDIMENSION last_MCU_col = cinfo->MCUs_per_row - 1;
|
||||||
JDIMENSION last_iMCU_row = cinfo->total_iMCU_rows - 1;
|
JDIMENSION last_iMCU_row = cinfo->total_iMCU_rows - 1;
|
||||||
int blkn, ci, xindex, yindex, yoffset, useful_width;
|
int blkn, ci, ctr, xindex, yindex, yoffset;
|
||||||
JSAMPARRAY output_ptr;
|
JSAMPARRAY output_ptr;
|
||||||
JDIMENSION start_col, output_col;
|
JDIMENSION output_col;
|
||||||
jpeg_component_info *compptr;
|
jpeg_component_info *compptr;
|
||||||
inverse_DCT_method_ptr inverse_DCT;
|
inverse_DCT_method_ptr inverse_DCT;
|
||||||
|
JSAMPARRAY output_ptr_blk[D_MAX_BLOCKS_IN_MCU];
|
||||||
|
JDIMENSION output_col_off[D_MAX_BLOCKS_IN_MCU];
|
||||||
|
jpeg_component_info *compptr_blk[D_MAX_BLOCKS_IN_MCU];
|
||||||
|
inverse_DCT_method_ptr inverse_DCT_blk_1[D_MAX_BLOCKS_IN_MCU];
|
||||||
|
inverse_DCT_method_ptr inverse_DCT_blk_2[D_MAX_BLOCKS_IN_MCU];
|
||||||
|
inverse_DCT_method_ptr *inverse_DCT_blk;
|
||||||
|
|
||||||
/* Loop to process as much as one whole iMCU row */
|
/* Loop to process as much as one whole iMCU row */
|
||||||
for (yoffset = coef->MCU_vert_offset; yoffset < coef->MCU_rows_per_iMCU_row;
|
for (yoffset = coef->MCU_vert_offset; yoffset < coef->MCU_rows_per_iMCU_row;
|
||||||
yoffset++) {
|
yoffset++) {
|
||||||
|
/* Determine where data should go in output_buf and do the IDCT thing.
|
||||||
|
* We skip dummy blocks at the right and bottom edges (but blkn gets
|
||||||
|
* incremented past them!). Note the inner loop relies on having
|
||||||
|
* allocated the MCU_buffer[] blocks sequentially.
|
||||||
|
*/
|
||||||
|
blkn = 0; /* index of current DCT block within MCU */
|
||||||
|
for (ci = 0; ci < cinfo->comps_in_scan; ci++) {
|
||||||
|
compptr = cinfo->cur_comp_info[ci];
|
||||||
|
/* Don't bother to IDCT an uninteresting component. */
|
||||||
|
if (! compptr->component_needed) {
|
||||||
|
for (ctr = compptr->MCU_blocks; ctr > 0; ctr--) {
|
||||||
|
inverse_DCT_blk_1[blkn] = inverse_DCT_blk_2[blkn] = NULL;
|
||||||
|
blkn++;
|
||||||
|
}
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
inverse_DCT = cinfo->idct->inverse_DCT[compptr->component_index];
|
||||||
|
output_ptr = output_buf[compptr->component_index] +
|
||||||
|
yoffset * compptr->DCT_scaled_size;
|
||||||
|
for (yindex = 0; yindex < compptr->MCU_height; yindex++) {
|
||||||
|
if (cinfo->input_iMCU_row < last_iMCU_row ||
|
||||||
|
yoffset+yindex < compptr->last_row_height) {
|
||||||
|
for (xindex = 0; xindex < compptr->MCU_width; xindex++) {
|
||||||
|
compptr_blk[blkn] = compptr;
|
||||||
|
output_ptr_blk[blkn] = output_ptr;
|
||||||
|
output_col_off[blkn] = xindex * compptr->DCT_scaled_size;
|
||||||
|
inverse_DCT_blk_1[blkn] = inverse_DCT;
|
||||||
|
inverse_DCT_blk_2[blkn] = (xindex < compptr->last_col_width) ?
|
||||||
|
inverse_DCT : NULL;
|
||||||
|
blkn++;
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
for (ctr = compptr->MCU_width; ctr > 0; ctr--) {
|
||||||
|
inverse_DCT_blk_1[blkn] = inverse_DCT_blk_2[blkn] = NULL;
|
||||||
|
blkn++;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
output_ptr += compptr->DCT_scaled_size;
|
||||||
|
}
|
||||||
|
}
|
||||||
for (MCU_col_num = coef->MCU_ctr; MCU_col_num <= last_MCU_col;
|
for (MCU_col_num = coef->MCU_ctr; MCU_col_num <= last_MCU_col;
|
||||||
MCU_col_num++) {
|
MCU_col_num++) {
|
||||||
/* Try to fetch an MCU. Entropy decoder expects buffer to be zeroed. */
|
/* Try to fetch an MCU. Entropy decoder expects buffer to be zeroed. */
|
||||||
@@ -170,38 +228,17 @@ decompress_onepass (j_decompress_ptr cinfo, JSAMPIMAGE output_buf)
|
|||||||
coef->MCU_ctr = MCU_col_num;
|
coef->MCU_ctr = MCU_col_num;
|
||||||
return JPEG_SUSPENDED;
|
return JPEG_SUSPENDED;
|
||||||
}
|
}
|
||||||
/* Determine where data should go in output_buf and do the IDCT thing.
|
inverse_DCT_blk = (MCU_col_num < last_MCU_col) ? inverse_DCT_blk_1
|
||||||
* We skip dummy blocks at the right and bottom edges (but blkn gets
|
: inverse_DCT_blk_2;
|
||||||
* incremented past them!). Note the inner loop relies on having
|
for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) {
|
||||||
* allocated the MCU_buffer[] blocks sequentially.
|
inverse_DCT = inverse_DCT_blk[blkn];
|
||||||
*/
|
if (inverse_DCT == NULL)
|
||||||
blkn = 0; /* index of current DCT block within MCU */
|
|
||||||
for (ci = 0; ci < cinfo->comps_in_scan; ci++) {
|
|
||||||
compptr = cinfo->cur_comp_info[ci];
|
|
||||||
/* Don't bother to IDCT an uninteresting component. */
|
|
||||||
if (! compptr->component_needed) {
|
|
||||||
blkn += compptr->MCU_blocks;
|
|
||||||
continue;
|
continue;
|
||||||
}
|
compptr = compptr_blk[blkn];
|
||||||
inverse_DCT = cinfo->idct->inverse_DCT[compptr->component_index];
|
output_col = MCU_col_num * compptr->MCU_sample_width +
|
||||||
useful_width = (MCU_col_num < last_MCU_col) ? compptr->MCU_width
|
output_col_off[blkn];
|
||||||
: compptr->last_col_width;
|
(*inverse_DCT) (cinfo, compptr, (JCOEFPTR) coef->MCU_buffer[blkn],
|
||||||
output_ptr = output_buf[ci] + yoffset * compptr->DCT_scaled_size;
|
output_ptr_blk[blkn], output_col);
|
||||||
start_col = MCU_col_num * compptr->MCU_sample_width;
|
|
||||||
for (yindex = 0; yindex < compptr->MCU_height; yindex++) {
|
|
||||||
if (cinfo->input_iMCU_row < last_iMCU_row ||
|
|
||||||
yoffset+yindex < compptr->last_row_height) {
|
|
||||||
output_col = start_col;
|
|
||||||
for (xindex = 0; xindex < useful_width; xindex++) {
|
|
||||||
(*inverse_DCT) (cinfo, compptr,
|
|
||||||
(JCOEFPTR) coef->MCU_buffer[blkn+xindex],
|
|
||||||
output_ptr, output_col);
|
|
||||||
output_col += compptr->DCT_scaled_size;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
blkn += compptr->MCU_width;
|
|
||||||
output_ptr += compptr->DCT_scaled_size;
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
/* Completed an MCU row, but perhaps not an iMCU row */
|
/* Completed an MCU row, but perhaps not an iMCU row */
|
||||||
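The change to decompress_onepass above moves the per-component bookkeeping out of the MCU column loop: two tables of IDCT method pointers are filled once per MCU row, one for interior columns and one for the last column, where dummy blocks get NULL. The following is a minimal sketch of that idea only, not the library's code. The names `block_fn`, `comp_t`, `count_block`, `build_tables` and `run_row` are hypothetical stand-ins, and the bottom-edge handling and output pointers of the real function are left out.

```c
#include <stddef.h>

#define MAX_BLOCKS 10                    /* stand-in for D_MAX_BLOCKS_IN_MCU */

typedef void (*block_fn)(int *calls);    /* stand-in for inverse_DCT_method_ptr */

typedef struct {
  int mcu_width;        /* blocks per MCU in this component */
  int last_col_width;   /* real (non-dummy) blocks in the last MCU column */
  block_fn fn;          /* per-component transform */
} comp_t;

/* Toy "transform" that only counts how often it is called. */
static void count_block(int *calls) { (*calls)++; }

/* Fill both dispatch tables once; returns the number of blocks per MCU. */
static int build_tables(const comp_t *comps, int ncomps,
                        block_fn interior[], block_fn last_col[])
{
  int blkn = 0;
  for (int ci = 0; ci < ncomps; ci++) {
    for (int x = 0; x < comps[ci].mcu_width; x++) {
      interior[blkn] = comps[ci].fn;
      /* Dummy blocks past the right edge are marked NULL. */
      last_col[blkn] = (x < comps[ci].last_col_width) ? comps[ci].fn : NULL;
      blkn++;
    }
  }
  return blkn;
}

/* Process one MCU row: the inner loop only picks a table and skips NULLs. */
static int run_row(const comp_t *comps, int ncomps, int mcus_per_row)
{
  block_fn interior[MAX_BLOCKS], last_col[MAX_BLOCKS];
  int nblocks = build_tables(comps, ncomps, interior, last_col);
  int calls = 0;
  for (int col = 0; col < mcus_per_row; col++) {
    block_fn *table = (col < mcus_per_row - 1) ? interior : last_col;
    for (int blkn = 0; blkn < nblocks; blkn++) {
      if (table[blkn] != NULL)
        table[blkn](&calls);
    }
  }
  return calls;
}
```

With a two-block component that has one real block in the last column, plus a one-block component, three MCU columns give 3 + 3 + 2 calls. The trade-off is a few small stack arrays in exchange for a branch-light inner loop.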
@@ -249,6 +286,8 @@ consume_data (j_decompress_ptr cinfo)
|
|||||||
JBLOCKARRAY buffer[MAX_COMPS_IN_SCAN];
|
JBLOCKARRAY buffer[MAX_COMPS_IN_SCAN];
|
||||||
JBLOCKROW buffer_ptr;
|
JBLOCKROW buffer_ptr;
|
||||||
jpeg_component_info *compptr;
|
jpeg_component_info *compptr;
|
||||||
|
int MCU_width[D_MAX_BLOCKS_IN_MCU];
|
||||||
|
JBLOCKROW MCU_buffer_base[D_MAX_BLOCKS_IN_MCU];
|
||||||
|
|
||||||
/* Align the virtual buffers for the components used in this scan. */
|
/* Align the virtual buffers for the components used in this scan. */
|
||||||
for (ci = 0; ci < cinfo->comps_in_scan; ci++) {
|
for (ci = 0; ci < cinfo->comps_in_scan; ci++) {
|
||||||
@@ -266,20 +305,25 @@ consume_data (j_decompress_ptr cinfo)
|
|||||||
/* Loop to process one whole iMCU row */
|
/* Loop to process one whole iMCU row */
|
||||||
for (yoffset = coef->MCU_vert_offset; yoffset < coef->MCU_rows_per_iMCU_row;
|
for (yoffset = coef->MCU_vert_offset; yoffset < coef->MCU_rows_per_iMCU_row;
|
||||||
yoffset++) {
|
yoffset++) {
|
||||||
for (MCU_col_num = coef->MCU_ctr; MCU_col_num < cinfo->MCUs_per_row;
|
|
||||||
MCU_col_num++) {
|
|
||||||
/* Construct list of pointers to DCT blocks belonging to this MCU */
|
/* Construct list of pointers to DCT blocks belonging to this MCU */
|
||||||
blkn = 0; /* index of current DCT block within MCU */
|
blkn = 0; /* index of current DCT block within MCU */
|
||||||
for (ci = 0; ci < cinfo->comps_in_scan; ci++) {
|
for (ci = 0; ci < cinfo->comps_in_scan; ci++) {
|
||||||
compptr = cinfo->cur_comp_info[ci];
|
compptr = cinfo->cur_comp_info[ci];
|
||||||
start_col = MCU_col_num * compptr->MCU_width;
|
|
||||||
for (yindex = 0; yindex < compptr->MCU_height; yindex++) {
|
for (yindex = 0; yindex < compptr->MCU_height; yindex++) {
|
||||||
buffer_ptr = buffer[ci][yindex+yoffset] + start_col;
|
buffer_ptr = buffer[ci][yindex+yoffset];
|
||||||
for (xindex = 0; xindex < compptr->MCU_width; xindex++) {
|
for (xindex = 0; xindex < compptr->MCU_width; xindex++) {
|
||||||
coef->MCU_buffer[blkn++] = buffer_ptr++;
|
MCU_width[blkn] = compptr->MCU_width;
|
||||||
|
MCU_buffer_base[blkn] = buffer_ptr++;
|
||||||
|
blkn++;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
for (MCU_col_num = coef->MCU_ctr; MCU_col_num < cinfo->MCUs_per_row;
|
||||||
|
MCU_col_num++) {
|
||||||
|
for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) {
|
||||||
|
start_col = MCU_col_num * MCU_width[blkn];
|
||||||
|
coef->MCU_buffer[blkn] = MCU_buffer_base[blkn] + start_col;
|
||||||
|
}
|
||||||
/* Try to fetch the MCU. */
|
/* Try to fetch the MCU. */
|
||||||
if (! (*cinfo->entropy->decode_mcu) (cinfo, coef->MCU_buffer)) {
|
if (! (*cinfo->entropy->decode_mcu) (cinfo, coef->MCU_buffer)) {
|
||||||
/* Suspension forced; update state counters and exit */
|
/* Suspension forced; update state counters and exit */
|
||||||
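In consume_data the same hoisting is applied to the block pointers: the base pointer and MCU width of every block are recorded once per MCU row, and each column then needs only `base + MCU_col_num * width`. A minimal sketch under simplifying assumptions (one block row per component; `block_t`, `record_bases` and `point_at_column` are hypothetical names, not library API):

```c
#include <stddef.h>

#define MAX_BLOCKS 10          /* stand-in for D_MAX_BLOCKS_IN_MCU */

typedef short block_t[64];     /* stand-in for JBLOCK */

/* Once per row: remember each block's base pointer and per-MCU stride. */
static int record_bases(block_t *rows[], const int widths[], int ncomps,
                        block_t *base[], int stride[])
{
  int blkn = 0;
  for (int ci = 0; ci < ncomps; ci++) {
    for (int x = 0; x < widths[ci]; x++) {
      base[blkn] = rows[ci] + x;   /* position within MCU column 0 */
      stride[blkn] = widths[ci];   /* blocks to skip per MCU column */
      blkn++;
    }
  }
  return blkn;
}

/* Per MCU column: one multiply and one add per block remain. */
static void point_at_column(block_t *base[], const int stride[], int nblocks,
                            int col, block_t *out[])
{
  for (int blkn = 0; blkn < nblocks; blkn++)
    out[blkn] = base[blkn] + (size_t) col * (size_t) stride[blkn];
}
```

For a component two blocks wide, column 3 resolves to blocks 6 and 7 of its row, which matches what the original per-column `start_col` computation would have produced.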
@@ -452,6 +496,15 @@ smoothing_ok (j_decompress_ptr cinfo)
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/*
|
||||||
|
* SIMD Ext: Most SSE/SSE2 instructions require that the memory address
|
||||||
|
* is aligned to a 16-byte boundary; if not, a general-protection exception
|
||||||
|
* (#GP) is generated.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#define ALIGN_SIZE 16 /* sizeof SSE/SSE2 register */
|
||||||
|
#define ALIGN_MEM(p,a) ((void *) (((size_t) (p) + (a) - 1) & -(a)))
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Variant of decompress_data for use when doing block smoothing.
|
* Variant of decompress_data for use when doing block smoothing.
|
||||||
*/
|
*/
|
||||||
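The ALIGN_MEM macro introduced above rounds an address up to the next multiple of a power-of-two alignment. The usual way to use it on a stack buffer is to over-allocate by one alignment unit and work through the rounded-up pointer. The sketch below shows that pattern with `uintptr_t` instead of the macro's `size_t` cast. `align_up` and `is_aligned` are hypothetical helper names, and `short` stands in for JCOEF.

```c
#include <stddef.h>
#include <stdint.h>

#define ALIGN_SIZE 16   /* size of an SSE/SSE2 register in bytes */

/* Round p up to the next multiple of a; a must be a power of two. */
static void *align_up(void *p, size_t a)
{
  uintptr_t addr = (uintptr_t) p;
  return (void *) ((addr + a - 1) & ~(uintptr_t) (a - 1));
}

/* Returns 1 if p is a multiple of ALIGN_SIZE, 0 otherwise. */
static int is_aligned(const void *p)
{
  return ((uintptr_t) p % ALIGN_SIZE) == 0;
}
```

A buffer declared as `short workspace[64 + ALIGN_SIZE / sizeof(short)]` leaves enough slack that `align_up(workspace, ALIGN_SIZE)` still has 64 usable elements behind it, which is the same sizing the patch applies to the smoothing workspace.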
@@ -470,7 +523,8 @@ decompress_smooth_data (j_decompress_ptr cinfo, JSAMPIMAGE output_buf)
|
|||||||
jpeg_component_info *compptr;
|
jpeg_component_info *compptr;
|
||||||
inverse_DCT_method_ptr inverse_DCT;
|
inverse_DCT_method_ptr inverse_DCT;
|
||||||
boolean first_row, last_row;
|
boolean first_row, last_row;
|
||||||
JBLOCK workspace;
|
JCOEF workspace[DCTSIZE2 + ALIGN_SIZE/sizeof(JCOEF)];
|
||||||
|
JCOEF * workptr = (JCOEF *) ALIGN_MEM(workspace, ALIGN_SIZE);
|
||||||
int *coef_bits;
|
int *coef_bits;
|
||||||
JQUANT_TBL *quanttbl;
|
JQUANT_TBL *quanttbl;
|
||||||
INT32 Q00,Q01,Q02,Q10,Q11,Q20, num;
|
INT32 Q00,Q01,Q02,Q10,Q11,Q20, num;
|
||||||
@@ -559,7 +613,7 @@ decompress_smooth_data (j_decompress_ptr cinfo, JSAMPIMAGE output_buf)
|
|||||||
last_block_column = compptr->width_in_blocks - 1;
|
last_block_column = compptr->width_in_blocks - 1;
|
||||||
for (block_num = 0; block_num <= last_block_column; block_num++) {
|
for (block_num = 0; block_num <= last_block_column; block_num++) {
|
||||||
/* Fetch current DCT block into workspace so we can modify it. */
|
/* Fetch current DCT block into workspace so we can modify it. */
|
||||||
jcopy_block_row(buffer_ptr, (JBLOCKROW) workspace, (JDIMENSION) 1);
|
jcopy_block_row(buffer_ptr, (JBLOCKROW) workptr, (JDIMENSION) 1);
|
||||||
/* Update DC values */
|
/* Update DC values */
|
||||||
if (block_num < last_block_column) {
|
if (block_num < last_block_column) {
|
||||||
DC3 = (int) prev_block_row[1][0];
|
DC3 = (int) prev_block_row[1][0];
|
||||||
@@ -571,7 +625,7 @@ decompress_smooth_data (j_decompress_ptr cinfo, JSAMPIMAGE output_buf)
|
|||||||
* and is not known to be fully accurate.
|
* and is not known to be fully accurate.
|
||||||
*/
|
*/
|
||||||
/* AC01 */
|
/* AC01 */
|
||||||
if ((Al=coef_bits[1]) != 0 && workspace[1] == 0) {
|
if ((Al=coef_bits[1]) != 0 && workptr[1] == 0) {
|
||||||
num = 36 * Q00 * (DC4 - DC6);
|
num = 36 * Q00 * (DC4 - DC6);
|
||||||
if (num >= 0) {
|
if (num >= 0) {
|
||||||
pred = (int) (((Q01<<7) + num) / (Q01<<8));
|
pred = (int) (((Q01<<7) + num) / (Q01<<8));
|
||||||
@@ -583,10 +637,10 @@ decompress_smooth_data (j_decompress_ptr cinfo, JSAMPIMAGE output_buf)
|
|||||||
pred = (1<<Al)-1;
|
pred = (1<<Al)-1;
|
||||||
pred = -pred;
|
pred = -pred;
|
||||||
}
|
}
|
||||||
workspace[1] = (JCOEF) pred;
|
workptr[1] = (JCOEF) pred;
|
||||||
}
|
}
|
||||||
/* AC10 */
|
/* AC10 */
|
||||||
if ((Al=coef_bits[2]) != 0 && workspace[8] == 0) {
|
if ((Al=coef_bits[2]) != 0 && workptr[8] == 0) {
|
||||||
num = 36 * Q00 * (DC2 - DC8);
|
num = 36 * Q00 * (DC2 - DC8);
|
||||||
if (num >= 0) {
|
if (num >= 0) {
|
||||||
pred = (int) (((Q10<<7) + num) / (Q10<<8));
|
pred = (int) (((Q10<<7) + num) / (Q10<<8));
|
||||||
@@ -598,10 +652,10 @@ decompress_smooth_data (j_decompress_ptr cinfo, JSAMPIMAGE output_buf)
|
|||||||
pred = (1<<Al)-1;
|
pred = (1<<Al)-1;
|
||||||
pred = -pred;
|
pred = -pred;
|
||||||
}
|
}
|
||||||
workspace[8] = (JCOEF) pred;
|
workptr[8] = (JCOEF) pred;
|
||||||
}
|
}
|
||||||
/* AC20 */
|
/* AC20 */
|
||||||
if ((Al=coef_bits[3]) != 0 && workspace[16] == 0) {
|
if ((Al=coef_bits[3]) != 0 && workptr[16] == 0) {
|
||||||
num = 9 * Q00 * (DC2 + DC8 - 2*DC5);
|
num = 9 * Q00 * (DC2 + DC8 - 2*DC5);
|
||||||
if (num >= 0) {
|
if (num >= 0) {
|
||||||
pred = (int) (((Q20<<7) + num) / (Q20<<8));
|
pred = (int) (((Q20<<7) + num) / (Q20<<8));
|
||||||
@@ -613,10 +667,10 @@ decompress_smooth_data (j_decompress_ptr cinfo, JSAMPIMAGE output_buf)
|
|||||||
pred = (1<<Al)-1;
|
pred = (1<<Al)-1;
|
||||||
pred = -pred;
|
pred = -pred;
|
||||||
}
|
}
|
||||||
workspace[16] = (JCOEF) pred;
|
workptr[16] = (JCOEF) pred;
|
||||||
}
|
}
|
||||||
/* AC11 */
|
/* AC11 */
|
||||||
if ((Al=coef_bits[4]) != 0 && workspace[9] == 0) {
|
if ((Al=coef_bits[4]) != 0 && workptr[9] == 0) {
|
||||||
num = 5 * Q00 * (DC1 - DC3 - DC7 + DC9);
|
num = 5 * Q00 * (DC1 - DC3 - DC7 + DC9);
|
||||||
if (num >= 0) {
|
if (num >= 0) {
|
||||||
pred = (int) (((Q11<<7) + num) / (Q11<<8));
|
pred = (int) (((Q11<<7) + num) / (Q11<<8));
|
||||||
@@ -628,10 +682,10 @@ decompress_smooth_data (j_decompress_ptr cinfo, JSAMPIMAGE output_buf)
|
|||||||
pred = (1<<Al)-1;
|
pred = (1<<Al)-1;
|
||||||
pred = -pred;
|
pred = -pred;
|
||||||
}
|
}
|
||||||
workspace[9] = (JCOEF) pred;
|
workptr[9] = (JCOEF) pred;
|
||||||
}
|
}
|
||||||
/* AC02 */
|
/* AC02 */
|
||||||
if ((Al=coef_bits[5]) != 0 && workspace[2] == 0) {
|
if ((Al=coef_bits[5]) != 0 && workptr[2] == 0) {
|
||||||
num = 9 * Q00 * (DC4 + DC6 - 2*DC5);
|
num = 9 * Q00 * (DC4 + DC6 - 2*DC5);
|
||||||
if (num >= 0) {
|
if (num >= 0) {
|
||||||
pred = (int) (((Q02<<7) + num) / (Q02<<8));
|
pred = (int) (((Q02<<7) + num) / (Q02<<8));
|
||||||
@@ -643,10 +697,10 @@ decompress_smooth_data (j_decompress_ptr cinfo, JSAMPIMAGE output_buf)
|
|||||||
pred = (1<<Al)-1;
|
pred = (1<<Al)-1;
|
||||||
pred = -pred;
|
pred = -pred;
|
||||||
}
|
}
|
||||||
workspace[2] = (JCOEF) pred;
|
workptr[2] = (JCOEF) pred;
|
||||||
}
|
}
|
||||||
/* OK, do the IDCT */
|
/* OK, do the IDCT */
|
||||||
(*inverse_DCT) (cinfo, compptr, (JCOEFPTR) workspace,
|
(*inverse_DCT) (cinfo, compptr, (JCOEFPTR) workptr,
|
||||||
output_ptr, output_col);
|
output_ptr, output_col);
|
||||||
/* Advance for next column */
|
/* Advance for next column */
|
||||||
DC1 = DC2; DC2 = DC3;
|
DC1 = DC2; DC2 = DC3;
|
||||||
|
|||||||
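Each AC prediction in decompress_smooth_data follows the same pattern: a rounded division of `num` by `Q << 8` (adding `Q << 7` is adding half the divisor), then a clamp to the precision `Al` that is already known for that coefficient. The sketch below factors that pattern into one function. `predict_ac` is a hypothetical name, and the clamp condition `Al > 0 && pred >= (1 << Al)` is not visible in the hunks above, so treat it as an assumption.

```c
/* Predict one zero AC coefficient.
 * num : weighted difference of neighbouring DC values
 * q   : quantization value of the coefficient being predicted
 * Al  : successive-approximation precision known so far
 */
static int predict_ac(long num, long q, int Al)
{
  int pred;
  if (num >= 0) {
    /* Rounded division: add half the divisor before dividing. */
    pred = (int) (((q << 7) + num) / (q << 8));
    if (Al > 0 && pred >= (1 << Al))
      pred = (1 << Al) - 1;          /* clamp to known precision (assumed) */
  } else {
    /* Work on the magnitude, then restore the sign. */
    pred = (int) (((q << 7) - num) / (q << 8));
    if (Al > 0 && pred >= (1 << Al))
      pred = (1 << Al) - 1;
    pred = -pred;
  }
  return pred;
}
```

With `q = 16` and `num = 12288` the rounded quotient is 3, and for `Al = 1` it is clamped to 1. Negative `num` mirrors the result.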
438
jdcolmmx.asm
Normal file
@@ -0,0 +1,438 @@
|
|||||||
|
;
|
||||||
|
; jdcolmmx.asm - colorspace conversion (MMX)
|
||||||
|
;
|
||||||
|
; x86 SIMD extension for IJG JPEG library
|
||||||
|
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
; For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
;
|
||||||
|
; This file should be assembled with NASM (Netwide Assembler),
|
||||||
|
; and can *not* be assembled with Microsoft's MASM or any compatible
|
||||||
|
; assembler (including Borland's Turbo Assembler).
|
||||||
|
; NASM is available from http://nasm.sourceforge.net/ or
|
||||||
|
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||||
|
;
|
||||||
|
; Last Modified : February 4, 2006
|
||||||
|
;
|
||||||
|
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jcolsamp.inc"
|
||||||
|
|
||||||
|
%if RGB_PIXELSIZE == 3 || RGB_PIXELSIZE == 4
|
||||||
|
%ifdef JDCOLOR_YCCRGB_MMX_SUPPORTED
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
|
||||||
|
%define SCALEBITS 16
|
||||||
|
|
||||||
|
F_0_344 equ 22554 ; FIX(0.34414)
|
||||||
|
F_0_714 equ 46802 ; FIX(0.71414)
|
||||||
|
F_1_402 equ 91881 ; FIX(1.40200)
|
||||||
|
F_1_772 equ 116130 ; FIX(1.77200)
|
||||||
|
F_0_402 equ (F_1_402 - 65536) ; FIX(1.40200) - FIX(1)
|
||||||
|
F_0_285 equ ( 65536 - F_0_714) ; FIX(1) - FIX(0.71414)
|
||||||
|
F_0_228 equ (131072 - F_1_772) ; FIX(2) - FIX(1.77200)
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_CONST
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
global EXTN(jconst_ycc_rgb_convert_mmx)
|
||||||
|
|
||||||
|
EXTN(jconst_ycc_rgb_convert_mmx):
|
||||||
|
|
||||||
|
PW_F0402 times 4 dw F_0_402
|
||||||
|
PW_MF0228 times 4 dw -F_0_228
|
||||||
|
PW_MF0344_F0285 times 2 dw -F_0_344, F_0_285
|
||||||
|
PW_ONE times 4 dw 1
|
||||||
|
PD_ONEHALF times 2 dd 1 << (SCALEBITS-1)
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Convert some rows of samples to the output colorspace.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_ycc_rgb_convert_mmx (j_decompress_ptr cinfo,
|
||||||
|
; JSAMPIMAGE input_buf, JDIMENSION input_row,
|
||||||
|
; JSAMPARRAY output_buf, int num_rows)
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define input_buf(b) (b)+12 ; JSAMPIMAGE input_buf
|
||||||
|
%define input_row(b) (b)+16 ; JDIMENSION input_row
|
||||||
|
%define output_buf(b) (b)+20 ; JSAMPARRAY output_buf
|
||||||
|
%define num_rows(b) (b)+24 ; int num_rows
|
||||||
|
|
||||||
|
%define original_ebp ebp+0
|
||||||
|
%define wk(i) ebp-(WK_NUM-(i))*SIZEOF_MMWORD ; mmword wk[WK_NUM]
|
||||||
|
%define WK_NUM 2
|
||||||
|
%define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_ycc_rgb_convert_mmx)
|
||||||
|
|
||||||
|
EXTN(jpeg_ycc_rgb_convert_mmx):
|
||||||
|
push ebp
|
||||||
|
mov eax,esp ; eax = original ebp
|
||||||
|
sub esp, byte 4
|
||||||
|
and esp, byte (-SIZEOF_MMWORD) ; align to 64 bits
|
||||||
|
mov [esp],eax
|
||||||
|
mov ebp,esp ; ebp = aligned ebp
|
||||||
|
lea esp, [wk(0)]
|
||||||
|
pushpic eax ; make room for the GOT address
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
movpic POINTER [gotptr], ebx ; save GOT address
|
||||||
|
|
||||||
|
mov ecx, POINTER [cinfo(eax)]
|
||||||
|
mov ecx, JDIMENSION [jdstruct_output_width(ecx)] ; num_cols
|
||||||
|
test ecx,ecx
|
||||||
|
jz near .return
|
||||||
|
|
||||||
|
push ecx
|
||||||
|
|
||||||
|
mov edi, JSAMPIMAGE [input_buf(eax)]
|
||||||
|
mov ecx, JDIMENSION [input_row(eax)]
|
||||||
|
mov esi, JSAMPARRAY [edi+0*SIZEOF_JSAMPARRAY]
|
||||||
|
mov ebx, JSAMPARRAY [edi+1*SIZEOF_JSAMPARRAY]
|
||||||
|
mov edx, JSAMPARRAY [edi+2*SIZEOF_JSAMPARRAY]
|
||||||
|
lea esi, [esi+ecx*SIZEOF_JSAMPROW]
|
||||||
|
lea ebx, [ebx+ecx*SIZEOF_JSAMPROW]
|
||||||
|
lea edx, [edx+ecx*SIZEOF_JSAMPROW]
|
||||||
|
|
||||||
|
pop ecx
|
||||||
|
|
||||||
|
mov edi, JSAMPARRAY [output_buf(eax)]
|
||||||
|
mov eax, INT [num_rows(eax)]
|
||||||
|
test eax,eax
|
||||||
|
jle near .return
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
push eax
|
||||||
|
push edi
|
||||||
|
push edx
|
||||||
|
push ebx
|
||||||
|
push esi
|
||||||
|
push ecx ; col
|
||||||
|
|
||||||
|
mov esi, JSAMPROW [esi] ; inptr0
|
||||||
|
mov ebx, JSAMPROW [ebx] ; inptr1
|
||||||
|
mov edx, JSAMPROW [edx] ; inptr2
|
||||||
|
mov edi, JSAMPROW [edi] ; outptr
|
||||||
|
movpic eax, POINTER [gotptr] ; load GOT address (eax)
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
|
||||||
|
movq mm5, MMWORD [ebx] ; mm5=Cb(01234567)
|
||||||
|
movq mm1, MMWORD [edx] ; mm1=Cr(01234567)
|
||||||
|
|
||||||
|
pcmpeqw mm4,mm4
|
||||||
|
pcmpeqw mm7,mm7
|
||||||
|
psrlw mm4,BYTE_BIT
|
||||||
|
psllw mm7,7 ; mm7={0xFF80 0xFF80 0xFF80 0xFF80}
|
||||||
|
movq mm0,mm4 ; mm0=mm4={0xFF 0x00 0xFF 0x00 ..}
|
||||||
|
|
||||||
|
pand mm4,mm5 ; mm4=Cb(0246)=CbE
|
||||||
|
psrlw mm5,BYTE_BIT ; mm5=Cb(1357)=CbO
|
||||||
|
pand mm0,mm1 ; mm0=Cr(0246)=CrE
|
||||||
|
psrlw mm1,BYTE_BIT ; mm1=Cr(1357)=CrO
|
||||||
|
|
||||||
|
paddw mm4,mm7
|
||||||
|
paddw mm5,mm7
|
||||||
|
paddw mm0,mm7
|
||||||
|
paddw mm1,mm7
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; R = Y + 1.40200 * Cr
|
||||||
|
; G = Y - 0.34414 * Cb - 0.71414 * Cr
|
||||||
|
; B = Y + 1.77200 * Cb
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; R = Y + 0.40200 * Cr + Cr
|
||||||
|
; G = Y - 0.34414 * Cb + 0.28586 * Cr - Cr
|
||||||
|
; B = Y - 0.22800 * Cb + Cb + Cb
|
||||||
|
|
||||||
|
movq mm2,mm4 ; mm2=CbE
|
||||||
|
movq mm3,mm5 ; mm3=CbO
|
||||||
|
paddw mm4,mm4 ; mm4=2*CbE
|
||||||
|
paddw mm5,mm5 ; mm5=2*CbO
|
||||||
|
movq mm6,mm0 ; mm6=CrE
|
||||||
|
movq mm7,mm1 ; mm7=CrO
|
||||||
|
paddw mm0,mm0 ; mm0=2*CrE
|
||||||
|
paddw mm1,mm1 ; mm1=2*CrO
|
||||||
|
|
||||||
|
pmulhw mm4,[GOTOFF(eax,PW_MF0228)] ; mm4=(2*CbE * -FIX(0.22800))
|
||||||
|
pmulhw mm5,[GOTOFF(eax,PW_MF0228)] ; mm5=(2*CbO * -FIX(0.22800))
|
||||||
|
pmulhw mm0,[GOTOFF(eax,PW_F0402)] ; mm0=(2*CrE * FIX(0.40200))
|
||||||
|
pmulhw mm1,[GOTOFF(eax,PW_F0402)] ; mm1=(2*CrO * FIX(0.40200))
|
||||||
|
|
||||||
|
paddw mm4,[GOTOFF(eax,PW_ONE)]
|
||||||
|
paddw mm5,[GOTOFF(eax,PW_ONE)]
|
||||||
|
psraw mm4,1 ; mm4=(CbE * -FIX(0.22800))
|
||||||
|
psraw mm5,1 ; mm5=(CbO * -FIX(0.22800))
|
||||||
|
paddw mm0,[GOTOFF(eax,PW_ONE)]
|
||||||
|
paddw mm1,[GOTOFF(eax,PW_ONE)]
|
||||||
|
psraw mm0,1 ; mm0=(CrE * FIX(0.40200))
|
||||||
|
psraw mm1,1 ; mm1=(CrO * FIX(0.40200))
|
||||||
|
|
||||||
|
paddw mm4,mm2
|
||||||
|
paddw mm5,mm3
|
||||||
|
paddw mm4,mm2 ; mm4=(CbE * FIX(1.77200))=(B-Y)E
|
||||||
|
paddw mm5,mm3 ; mm5=(CbO * FIX(1.77200))=(B-Y)O
|
||||||
|
paddw mm0,mm6 ; mm0=(CrE * FIX(1.40200))=(R-Y)E
|
||||||
|
paddw mm1,mm7 ; mm1=(CrO * FIX(1.40200))=(R-Y)O
|
||||||
|
|
||||||
|
movq MMWORD [wk(0)], mm4 ; wk(0)=(B-Y)E
|
||||||
|
movq MMWORD [wk(1)], mm5 ; wk(1)=(B-Y)O
|
||||||
|
|
||||||
|
movq mm4,mm2
|
||||||
|
movq mm5,mm3
|
||||||
|
punpcklwd mm2,mm6
|
||||||
|
punpckhwd mm4,mm6
|
||||||
|
pmaddwd mm2,[GOTOFF(eax,PW_MF0344_F0285)]
|
||||||
|
pmaddwd mm4,[GOTOFF(eax,PW_MF0344_F0285)]
|
||||||
|
punpcklwd mm3,mm7
|
||||||
|
punpckhwd mm5,mm7
|
||||||
|
pmaddwd mm3,[GOTOFF(eax,PW_MF0344_F0285)]
|
||||||
|
pmaddwd mm5,[GOTOFF(eax,PW_MF0344_F0285)]
|
||||||
|
|
||||||
|
paddd mm2,[GOTOFF(eax,PD_ONEHALF)]
|
||||||
|
paddd mm4,[GOTOFF(eax,PD_ONEHALF)]
|
||||||
|
psrad mm2,SCALEBITS
|
||||||
|
psrad mm4,SCALEBITS
|
||||||
|
paddd mm3,[GOTOFF(eax,PD_ONEHALF)]
|
||||||
|
paddd mm5,[GOTOFF(eax,PD_ONEHALF)]
|
||||||
|
psrad mm3,SCALEBITS
|
||||||
|
psrad mm5,SCALEBITS
|
||||||
|
|
||||||
|
packssdw mm2,mm4 ; mm2=CbE*-FIX(0.344)+CrE*FIX(0.285)
|
||||||
|
packssdw mm3,mm5 ; mm3=CbO*-FIX(0.344)+CrO*FIX(0.285)
|
||||||
|
psubw mm2,mm6 ; mm2=CbE*-FIX(0.344)+CrE*-FIX(0.714)=(G-Y)E
|
||||||
|
psubw mm3,mm7 ; mm3=CbO*-FIX(0.344)+CrO*-FIX(0.714)=(G-Y)O
|
||||||
|
|
||||||
|
movq mm5, MMWORD [esi] ; mm5=Y(01234567)
|
||||||
|
|
||||||
|
pcmpeqw mm4,mm4
|
||||||
|
psrlw mm4,BYTE_BIT ; mm4={0xFF 0x00 0xFF 0x00 ..}
|
||||||
|
pand mm4,mm5 ; mm4=Y(0246)=YE
|
||||||
|
psrlw mm5,BYTE_BIT ; mm5=Y(1357)=YO
|
||||||
|
|
||||||
|
paddw mm0,mm4 ; mm0=((R-Y)E+YE)=RE=(R0 R2 R4 R6)
|
||||||
|
paddw mm1,mm5 ; mm1=((R-Y)O+YO)=RO=(R1 R3 R5 R7)
|
||||||
|
packuswb mm0,mm0 ; mm0=(R0 R2 R4 R6 ** ** ** **)
|
||||||
|
packuswb mm1,mm1 ; mm1=(R1 R3 R5 R7 ** ** ** **)
|
||||||
|
|
||||||
|
paddw mm2,mm4 ; mm2=((G-Y)E+YE)=GE=(G0 G2 G4 G6)
|
||||||
|
paddw mm3,mm5 ; mm3=((G-Y)O+YO)=GO=(G1 G3 G5 G7)
|
||||||
|
packuswb mm2,mm2 ; mm2=(G0 G2 G4 G6 ** ** ** **)
|
||||||
|
packuswb mm3,mm3 ; mm3=(G1 G3 G5 G7 ** ** ** **)
|
||||||
|
|
||||||
|
paddw mm4, MMWORD [wk(0)] ; mm4=(YE+(B-Y)E)=BE=(B0 B2 B4 B6)
|
||||||
|
paddw mm5, MMWORD [wk(1)] ; mm5=(YO+(B-Y)O)=BO=(B1 B3 B5 B7)
|
||||||
|
packuswb mm4,mm4 ; mm4=(B0 B2 B4 B6 ** ** ** **)
|
||||||
|
packuswb mm5,mm5 ; mm5=(B1 B3 B5 B7 ** ** ** **)
|
||||||
|
|
||||||
|
%if RGB_PIXELSIZE == 3 ; ---------------
|
||||||
|
|
||||||
|
; mmA=(00 02 04 06 ** ** ** **), mmB=(01 03 05 07 ** ** ** **)
|
||||||
|
; mmC=(10 12 14 16 ** ** ** **), mmD=(11 13 15 17 ** ** ** **)
|
||||||
|
; mmE=(20 22 24 26 ** ** ** **), mmF=(21 23 25 27 ** ** ** **)
|
||||||
|
; mmG=(** ** ** ** ** ** ** **), mmH=(** ** ** ** ** ** ** **)
|
||||||
|
|
||||||
|
punpcklbw mmA,mmC ; mmA=(00 10 02 12 04 14 06 16)
|
||||||
|
punpcklbw mmE,mmB ; mmE=(20 01 22 03 24 05 26 07)
|
||||||
|
punpcklbw mmD,mmF ; mmD=(11 21 13 23 15 25 17 27)
|
||||||
|
|
||||||
|
movq mmG,mmA
|
||||||
|
movq mmH,mmA
|
||||||
|
punpcklwd mmA,mmE ; mmA=(00 10 20 01 02 12 22 03)
|
||||||
|
punpckhwd mmG,mmE ; mmG=(04 14 24 05 06 16 26 07)
|
||||||
|
|
||||||
|
psrlq mmH,2*BYTE_BIT ; mmH=(02 12 04 14 06 16 -- --)
|
||||||
|
psrlq mmE,2*BYTE_BIT ; mmE=(22 03 24 05 26 07 -- --)
|
||||||
|
|
||||||
|
movq mmC,mmD
|
||||||
|
movq mmB,mmD
|
||||||
|
punpcklwd mmD,mmH ; mmD=(11 21 02 12 13 23 04 14)
|
||||||
|
punpckhwd mmC,mmH ; mmC=(15 25 06 16 17 27 -- --)
|
||||||
|
|
||||||
|
psrlq mmB,2*BYTE_BIT ; mmB=(13 23 15 25 17 27 -- --)
|
||||||
|
|
||||||
|
movq mmF,mmE
|
||||||
|
punpcklwd mmE,mmB ; mmE=(22 03 13 23 24 05 15 25)
|
||||||
|
punpckhwd mmF,mmB ; mmF=(26 07 17 27 -- -- -- --)
|
||||||
|
|
||||||
|
punpckldq mmA,mmD ; mmA=(00 10 20 01 11 21 02 12)
|
||||||
|
punpckldq mmE,mmG ; mmE=(22 03 13 23 04 14 24 05)
|
||||||
|
punpckldq mmC,mmF ; mmC=(15 25 06 16 26 07 17 27)
|
||||||
|
|
||||||
|
cmp ecx, byte SIZEOF_MMWORD
|
||||||
|
jb short .column_st16
|
||||||
|
|
||||||
|
movq MMWORD [edi+0*SIZEOF_MMWORD], mmA
|
||||||
|
movq MMWORD [edi+1*SIZEOF_MMWORD], mmE
|
||||||
|
movq MMWORD [edi+2*SIZEOF_MMWORD], mmC
|
||||||
|
|
||||||
|
sub ecx, byte SIZEOF_MMWORD
|
||||||
|
jz short .nextrow
|
||||||
|
|
||||||
|
add esi, byte SIZEOF_MMWORD ; inptr0
|
||||||
|
add ebx, byte SIZEOF_MMWORD ; inptr1
|
||||||
|
add edx, byte SIZEOF_MMWORD ; inptr2
|
||||||
|
add edi, byte RGB_PIXELSIZE*SIZEOF_MMWORD ; outptr
|
||||||
|
jmp near .columnloop
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.column_st16:
|
||||||
|
lea ecx, [ecx+ecx*2] ; imul ecx, RGB_PIXELSIZE
|
||||||
|
cmp ecx, byte 2*SIZEOF_MMWORD
|
||||||
|
jb short .column_st8
|
||||||
|
movq MMWORD [edi+0*SIZEOF_MMWORD], mmA
|
||||||
|
movq MMWORD [edi+1*SIZEOF_MMWORD], mmE
|
||||||
|
movq mmA,mmC
|
||||||
|
sub ecx, byte 2*SIZEOF_MMWORD
|
||||||
|
add edi, byte 2*SIZEOF_MMWORD
|
||||||
|
jmp short .column_st4
|
||||||
|
.column_st8:
|
||||||
|
cmp ecx, byte SIZEOF_MMWORD
|
||||||
|
jb short .column_st4
|
||||||
|
movq MMWORD [edi+0*SIZEOF_MMWORD], mmA
|
||||||
|
movq mmA,mmE
|
||||||
|
sub ecx, byte SIZEOF_MMWORD
|
||||||
|
add edi, byte SIZEOF_MMWORD
|
||||||
|
.column_st4:
|
||||||
|
movd eax,mmA
|
||||||
|
cmp ecx, byte SIZEOF_DWORD
|
||||||
|
jb short .column_st2
|
||||||
|
mov DWORD [edi+0*SIZEOF_DWORD], eax
|
||||||
|
psrlq mmA,DWORD_BIT
|
||||||
|
movd eax,mmA
|
||||||
|
sub ecx, byte SIZEOF_DWORD
|
||||||
|
add edi, byte SIZEOF_DWORD
|
||||||
|
.column_st2:
|
||||||
|
cmp ecx, byte SIZEOF_WORD
|
||||||
|
jb short .column_st1
|
||||||
|
mov WORD [edi+0*SIZEOF_WORD], ax
|
||||||
|
shr eax,WORD_BIT
|
||||||
|
sub ecx, byte SIZEOF_WORD
|
||||||
|
add edi, byte SIZEOF_WORD
|
||||||
|
.column_st1:
|
||||||
|
cmp ecx, byte SIZEOF_BYTE
|
||||||
|
jb short .nextrow
|
||||||
|
mov BYTE [edi+0*SIZEOF_BYTE], al
|
||||||
|
|
||||||
|
%else ; RGB_PIXELSIZE == 4 ; -----------
|
||||||
|
|
||||||
|
%ifdef RGBX_FILLER_0XFF
|
||||||
|
pcmpeqb mm6,mm6 ; mm6=(X0 X2 X4 X6 ** ** ** **)
|
||||||
|
pcmpeqb mm7,mm7 ; mm7=(X1 X3 X5 X7 ** ** ** **)
|
||||||
|
%else
|
||||||
|
pxor mm6,mm6 ; mm6=(X0 X2 X4 X6 ** ** ** **)
|
||||||
|
pxor mm7,mm7 ; mm7=(X1 X3 X5 X7 ** ** ** **)
|
||||||
|
%endif
|
||||||
|
; mmA=(00 02 04 06 ** ** ** **), mmB=(01 03 05 07 ** ** ** **)
|
||||||
|
; mmC=(10 12 14 16 ** ** ** **), mmD=(11 13 15 17 ** ** ** **)
|
||||||
|
; mmE=(20 22 24 26 ** ** ** **), mmF=(21 23 25 27 ** ** ** **)
|
||||||
|
; mmG=(30 32 34 36 ** ** ** **), mmH=(31 33 35 37 ** ** ** **)
|
||||||
|
|
||||||
|
punpcklbw mmA,mmC ; mmA=(00 10 02 12 04 14 06 16)
|
||||||
|
punpcklbw mmE,mmG ; mmE=(20 30 22 32 24 34 26 36)
|
||||||
|
punpcklbw mmB,mmD ; mmB=(01 11 03 13 05 15 07 17)
|
||||||
|
punpcklbw mmF,mmH ; mmF=(21 31 23 33 25 35 27 37)
|
||||||
|
|
||||||
|
movq mmC,mmA
|
||||||
|
punpcklwd mmA,mmE ; mmA=(00 10 20 30 02 12 22 32)
|
||||||
|
punpckhwd mmC,mmE ; mmC=(04 14 24 34 06 16 26 36)
|
||||||
|
movq mmG,mmB
|
||||||
|
punpcklwd mmB,mmF ; mmB=(01 11 21 31 03 13 23 33)
|
||||||
|
punpckhwd mmG,mmF ; mmG=(05 15 25 35 07 17 27 37)
|
||||||
|
|
||||||
|
movq mmD,mmA
|
||||||
|
punpckldq mmA,mmB ; mmA=(00 10 20 30 01 11 21 31)
|
||||||
|
punpckhdq mmD,mmB ; mmD=(02 12 22 32 03 13 23 33)
|
||||||
|
movq mmH,mmC
|
||||||
|
punpckldq mmC,mmG ; mmC=(04 14 24 34 05 15 25 35)
|
||||||
|
punpckhdq mmH,mmG ; mmH=(06 16 26 36 07 17 27 37)
|
||||||
|
|
||||||
|
cmp ecx, byte SIZEOF_MMWORD
|
||||||
|
jb short .column_st16
|
||||||
|
|
||||||
|
movq MMWORD [edi+0*SIZEOF_MMWORD], mmA
|
||||||
|
movq MMWORD [edi+1*SIZEOF_MMWORD], mmD
|
||||||
|
movq MMWORD [edi+2*SIZEOF_MMWORD], mmC
|
||||||
|
movq MMWORD [edi+3*SIZEOF_MMWORD], mmH
|
||||||
|
|
||||||
|
sub ecx, byte SIZEOF_MMWORD
|
||||||
|
jz short .nextrow
|
||||||
|
|
||||||
|
add esi, byte SIZEOF_MMWORD ; inptr0
|
||||||
|
add ebx, byte SIZEOF_MMWORD ; inptr1
|
||||||
|
add edx, byte SIZEOF_MMWORD ; inptr2
|
||||||
|
add edi, byte RGB_PIXELSIZE*SIZEOF_MMWORD ; outptr
|
||||||
|
jmp near .columnloop
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.column_st16:
|
||||||
|
cmp ecx, byte SIZEOF_MMWORD/2
|
||||||
|
jb short .column_st8
|
||||||
|
movq MMWORD [edi+0*SIZEOF_MMWORD], mmA
|
||||||
|
movq MMWORD [edi+1*SIZEOF_MMWORD], mmD
|
||||||
|
movq mmA,mmC
|
||||||
|
movq mmD,mmH
|
||||||
|
sub ecx, byte SIZEOF_MMWORD/2
|
||||||
|
add edi, byte 2*SIZEOF_MMWORD
|
||||||
|
.column_st8:
|
||||||
|
cmp ecx, byte SIZEOF_MMWORD/4
|
||||||
|
jb short .column_st4
|
||||||
|
movq MMWORD [edi+0*SIZEOF_MMWORD], mmA
|
||||||
|
movq mmA,mmD
|
||||||
|
sub ecx, byte SIZEOF_MMWORD/4
|
||||||
|
add edi, byte 1*SIZEOF_MMWORD
|
||||||
|
.column_st4:
|
||||||
|
cmp ecx, byte SIZEOF_MMWORD/8
|
||||||
|
jb short .nextrow
|
||||||
|
movd DWORD [edi+0*SIZEOF_DWORD], mmA
|
||||||
|
|
||||||
|
%endif ; RGB_PIXELSIZE ; ---------------
|
||||||
|
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.nextrow:
|
||||||
|
pop ecx
|
||||||
|
pop esi
|
||||||
|
pop ebx
|
||||||
|
pop edx
|
||||||
|
pop edi
|
||||||
|
pop eax
|
||||||
|
|
||||||
|
add esi, byte SIZEOF_JSAMPROW
|
||||||
|
add ebx, byte SIZEOF_JSAMPROW
|
||||||
|
add edx, byte SIZEOF_JSAMPROW
|
||||||
|
add edi, byte SIZEOF_JSAMPROW ; output_buf
|
||||||
|
dec eax ; num_rows
|
||||||
|
jg near .rowloop
|
||||||
|
|
||||||
|
emms ; empty MMX state
|
||||||
|
|
||||||
|
.return:
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
mov esp,ebp ; esp <- aligned ebp
|
||||||
|
pop esp ; esp <- original ebp
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; JDCOLOR_YCCRGB_MMX_SUPPORTED
|
||||||
|
%endif ; RGB_PIXELSIZE == 3 || RGB_PIXELSIZE == 4
|
||||||
97
jdcolor.c
@@ -1,16 +1,24 @@
|
|||||||
/*
|
/*
|
||||||
* jdcolor.c
|
* jdcolor.c
|
||||||
*
|
*
|
||||||
* Copyright (C) 1991-1996, Thomas G. Lane.
|
* Copyright (C) 1991-1997, Thomas G. Lane.
|
||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
* x86 SIMD extension for IJG JPEG library
|
||||||
|
* Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
* This file has been modified for SIMD extension.
|
||||||
|
* Last Modified : January 5, 2006
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
*
|
||||||
* This file contains output colorspace conversion routines.
|
* This file contains output colorspace conversion routines.
|
||||||
*/
|
*/
|
||||||
|
|
||||||
#define JPEG_INTERNALS
|
#define JPEG_INTERNALS
|
||||||
#include "jinclude.h"
|
#include "jinclude.h"
|
||||||
#include "jpeglib.h"
|
#include "jpeglib.h"
|
||||||
|
#include "jcolsamp.h" /* Private declarations */
|
||||||
|
|
||||||
|
|
||||||
/* Private subobject */
|
/* Private subobject */
|
||||||
@@ -105,6 +113,17 @@ build_ycc_rgb_table (j_decompress_ptr cinfo)
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
#if RGB_PIXELSIZE == 4
|
||||||
|
/* offset of filler byte */
|
||||||
|
#define RGB_FILLER (6 - (RGB_RED) - (RGB_GREEN) - (RGB_BLUE))
|
||||||
|
/* byte pattern to fill with */
|
||||||
|
#ifdef RGBX_FILLER_0XFF
|
||||||
|
#define RGB_FILLER_BYTE 0xFF
|
||||||
|
#else
|
||||||
|
#define RGB_FILLER_BYTE 0x00
|
||||||
|
#endif
|
||||||
|
#endif /* RGB_PIXELSIZE == 4 */
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Convert some rows of samples to the output colorspace.
|
* Convert some rows of samples to the output colorspace.
|
||||||
*
|
*
|
||||||
@@ -151,6 +170,9 @@ ycc_rgb_convert (j_decompress_ptr cinfo,
|
|||||||
((int) RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr],
|
((int) RIGHT_SHIFT(Cbgtab[cb] + Crgtab[cr],
|
||||||
SCALEBITS))];
|
SCALEBITS))];
|
||||||
outptr[RGB_BLUE] = range_limit[y + Cbbtab[cb]];
|
outptr[RGB_BLUE] = range_limit[y + Cbbtab[cb]];
|
||||||
|
#if RGB_PIXELSIZE == 4
|
||||||
|
outptr[RGB_FILLER] = RGB_FILLER_BYTE;
|
||||||
|
#endif
|
||||||
outptr += RGB_PIXELSIZE;
|
outptr += RGB_PIXELSIZE;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -207,6 +229,36 @@ grayscale_convert (j_decompress_ptr cinfo,
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Convert grayscale to RGB: just duplicate the graylevel three times.
|
||||||
|
* This is provided to support applications that don't want to cope
|
||||||
|
* with grayscale as a separate case.
|
||||||
|
*/
|
||||||
|
|
||||||
|
METHODDEF(void)
|
||||||
|
gray_rgb_convert (j_decompress_ptr cinfo,
|
||||||
|
JSAMPIMAGE input_buf, JDIMENSION input_row,
|
||||||
|
JSAMPARRAY output_buf, int num_rows)
|
||||||
|
{
|
||||||
|
register JSAMPROW inptr, outptr;
|
||||||
|
register JDIMENSION col;
|
||||||
|
JDIMENSION num_cols = cinfo->output_width;
|
||||||
|
|
||||||
|
while (--num_rows >= 0) {
|
||||||
|
inptr = input_buf[0][input_row++];
|
||||||
|
outptr = *output_buf++;
|
||||||
|
for (col = 0; col < num_cols; col++) {
|
||||||
|
/* We can dispense with GETJSAMPLE() here */
|
||||||
|
outptr[RGB_RED] = outptr[RGB_GREEN] = outptr[RGB_BLUE] = inptr[col];
|
||||||
|
#if RGB_PIXELSIZE == 4
|
||||||
|
outptr[RGB_FILLER] = RGB_FILLER_BYTE;
|
||||||
|
#endif
|
||||||
|
outptr += RGB_PIXELSIZE;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Adobe-style YCCK->CMYK conversion.
|
* Adobe-style YCCK->CMYK conversion.
|
||||||
* We convert YCbCr to R=1-C, G=1-M, and B=1-Y using the same
|
* We convert YCbCr to R=1-C, G=1-M, and B=1-Y using the same
|
||||||
@@ -278,6 +330,7 @@ jinit_color_deconverter (j_decompress_ptr cinfo)
|
|||||||
{
|
{
|
||||||
my_cconvert_ptr cconvert;
|
my_cconvert_ptr cconvert;
|
||||||
int ci;
|
int ci;
|
||||||
|
unsigned int simd = jpeg_simd_support((j_common_ptr) cinfo);
|
||||||
|
|
||||||
cconvert = (my_cconvert_ptr)
|
cconvert = (my_cconvert_ptr)
|
||||||
(*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,
|
(*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,
|
||||||
@@ -331,8 +384,25 @@ jinit_color_deconverter (j_decompress_ptr cinfo)
|
|||||||
case JCS_RGB:
|
case JCS_RGB:
|
||||||
cinfo->out_color_components = RGB_PIXELSIZE;
|
cinfo->out_color_components = RGB_PIXELSIZE;
|
||||||
if (cinfo->jpeg_color_space == JCS_YCbCr) {
|
if (cinfo->jpeg_color_space == JCS_YCbCr) {
|
||||||
|
#if RGB_PIXELSIZE == 3 || RGB_PIXELSIZE == 4
|
||||||
|
#ifdef JDCOLOR_YCCRGB_SSE2_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE2 &&
|
||||||
|
IS_CONST_ALIGNED_16(jconst_ycc_rgb_convert_sse2)) {
|
||||||
|
cconvert->pub.color_convert = jpeg_ycc_rgb_convert_sse2;
|
||||||
|
} else
|
||||||
|
#endif
|
||||||
|
#ifdef JDCOLOR_YCCRGB_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_MMX) {
|
||||||
|
cconvert->pub.color_convert = jpeg_ycc_rgb_convert_mmx;
|
||||||
|
} else
|
||||||
|
#endif
|
||||||
|
#endif /* RGB_PIXELSIZE == 3 || RGB_PIXELSIZE == 4 */
|
||||||
|
{
|
||||||
cconvert->pub.color_convert = ycc_rgb_convert;
|
cconvert->pub.color_convert = ycc_rgb_convert;
|
||||||
build_ycc_rgb_table(cinfo);
|
build_ycc_rgb_table(cinfo);
|
||||||
|
}
|
||||||
|
} else if (cinfo->jpeg_color_space == JCS_GRAYSCALE) {
|
||||||
|
cconvert->pub.color_convert = gray_rgb_convert;
|
||||||
} else if (cinfo->jpeg_color_space == JCS_RGB && RGB_PIXELSIZE == 3) {
|
} else if (cinfo->jpeg_color_space == JCS_RGB && RGB_PIXELSIZE == 3) {
|
||||||
cconvert->pub.color_convert = null_convert;
|
cconvert->pub.color_convert = null_convert;
|
||||||
} else
|
} else
|
||||||
@@ -365,3 +435,28 @@ jinit_color_deconverter (j_decompress_ptr cinfo)
|
|||||||
else
|
else
|
||||||
cinfo->output_components = cinfo->out_color_components;
|
cinfo->output_components = cinfo->out_color_components;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
#ifndef JSIMD_MODEINFO_NOT_SUPPORTED
|
||||||
|
|
||||||
|
GLOBAL(unsigned int)
|
||||||
|
jpeg_simd_color_deconverter (j_decompress_ptr cinfo)
|
||||||
|
{
|
||||||
|
unsigned int simd = jpeg_simd_support((j_common_ptr) cinfo);
|
||||||
|
|
||||||
|
#if RGB_PIXELSIZE == 3 || RGB_PIXELSIZE == 4
|
||||||
|
#ifdef JDCOLOR_YCCRGB_SSE2_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE2 &&
|
||||||
|
IS_CONST_ALIGNED_16(jconst_ycc_rgb_convert_sse2))
|
||||||
|
return JSIMD_SSE2;
|
||||||
|
#endif
|
||||||
|
#ifdef JDCOLOR_YCCRGB_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_MMX)
|
||||||
|
return JSIMD_MMX;
|
||||||
|
#endif
|
||||||
|
#endif /* RGB_PIXELSIZE == 3 || RGB_PIXELSIZE == 4 */
|
||||||
|
|
||||||
|
return JSIMD_NONE;
|
||||||
|
}
|
||||||
|
|
||||||
|
#endif /* !JSIMD_MODEINFO_NOT_SUPPORTED */
|
||||||
|
|||||||
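Note on the `RGB_FILLER` macro added to jdcolor.c above: for a 4-byte pixel the byte offsets 0, 1, 2 and 3 sum to 6, so subtracting the three colour offsets from 6 leaves the offset of the unused filler byte. The following is a minimal sketch of that arithmetic only. The helper name `rgbx_filler_offset` is hypothetical and is not part of the patch, which uses a preprocessor macro.

```c
/* Hypothetical helper mirroring the RGB_FILLER macro from the diff:
 *   #define RGB_FILLER (6 - (RGB_RED) - (RGB_GREEN) - (RGB_BLUE))
 * The three arguments are the byte offsets of R, G and B inside a
 * 4-byte pixel; they are assumed to be distinct values in 0..3. */
static int rgbx_filler_offset(int red, int green, int blue)
{
    /* 0 + 1 + 2 + 3 == 6, so the remaining offset is 6 minus the others. */
    return 6 - red - green - blue;
}

/* Store one pixel, writing the filler byte the same way the patched
 * ycc_rgb_convert() and gray_rgb_convert() loops do. */
static void store_rgbx(unsigned char *px, int red, int green, int blue,
                       unsigned char r, unsigned char g, unsigned char b,
                       unsigned char filler_byte)
{
    px[red] = r;
    px[green] = g;
    px[blue] = b;
    px[rgbx_filler_offset(red, green, blue)] = filler_byte;
}
```

With an RGBX layout (offsets 0, 1, 2) the filler lands at offset 3, and with an XRGB layout (offsets 1, 2, 3) it lands at offset 0, so a single macro covers every ordering without a per-layout table.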
536
jdcolss2.asm
Normal file
@@ -0,0 +1,536 @@
|
|||||||
|
;
|
||||||
|
; jdcolss2.asm - colorspace conversion (SSE2)
|
||||||
|
;
|
||||||
|
; x86 SIMD extension for IJG JPEG library
|
||||||
|
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
; For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
;
|
||||||
|
; This file should be assembled with NASM (Netwide Assembler),
|
||||||
|
; and can *not* be assembled with Microsoft's MASM or any compatible
|
||||||
|
; assembler (including Borland's Turbo Assembler).
|
||||||
|
; NASM is available from http://nasm.sourceforge.net/ or
|
||||||
|
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||||
|
;
|
||||||
|
; Last Modified : February 4, 2006
|
||||||
|
;
|
||||||
|
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jcolsamp.inc"
|
||||||
|
|
||||||
|
%if RGB_PIXELSIZE == 3 || RGB_PIXELSIZE == 4
|
||||||
|
%ifdef JDCOLOR_YCCRGB_SSE2_SUPPORTED
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
|
||||||
|
%define SCALEBITS 16
|
||||||
|
|
||||||
|
F_0_344 equ 22554 ; FIX(0.34414)
|
||||||
|
F_0_714 equ 46802 ; FIX(0.71414)
|
||||||
|
F_1_402 equ 91881 ; FIX(1.40200)
|
||||||
|
F_1_772 equ 116130 ; FIX(1.77200)
|
||||||
|
F_0_402 equ (F_1_402 - 65536) ; FIX(1.40200) - FIX(1)
|
||||||
|
F_0_285 equ ( 65536 - F_0_714) ; FIX(1) - FIX(0.71414)
|
||||||
|
F_0_228 equ (131072 - F_1_772) ; FIX(2) - FIX(1.77200)
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_CONST
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
global EXTN(jconst_ycc_rgb_convert_sse2)
|
||||||
|
|
||||||
|
EXTN(jconst_ycc_rgb_convert_sse2):
|
||||||
|
|
||||||
|
PW_F0402 times 8 dw F_0_402
|
||||||
|
PW_MF0228 times 8 dw -F_0_228
|
||||||
|
PW_MF0344_F0285 times 4 dw -F_0_344, F_0_285
|
||||||
|
PW_ONE times 8 dw 1
|
||||||
|
PD_ONEHALF times 4 dd 1 << (SCALEBITS-1)
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Convert some rows of samples to the output colorspace.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_ycc_rgb_convert_sse2 (j_decompress_ptr cinfo,
|
||||||
|
; JSAMPIMAGE input_buf, JDIMENSION input_row,
|
||||||
|
; JSAMPARRAY output_buf, int num_rows)
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define input_buf(b) (b)+12 ; JSAMPIMAGE input_buf
|
||||||
|
%define input_row(b) (b)+16 ; JDIMENSION input_row
|
||||||
|
%define output_buf(b) (b)+20 ; JSAMPARRAY output_buf
|
||||||
|
%define num_rows(b) (b)+24 ; int num_rows
|
||||||
|
|
||||||
|
%define original_ebp ebp+0
|
||||||
|
%define wk(i) ebp-(WK_NUM-(i))*SIZEOF_XMMWORD ; xmmword wk[WK_NUM]
|
||||||
|
%define WK_NUM 2
|
||||||
|
%define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_ycc_rgb_convert_sse2)
|
||||||
|
|
||||||
|
EXTN(jpeg_ycc_rgb_convert_sse2):
|
||||||
|
push ebp
|
||||||
|
mov eax,esp ; eax = original ebp
|
||||||
|
sub esp, byte 4
|
||||||
|
and esp, byte (-SIZEOF_XMMWORD) ; align to 128 bits
|
||||||
|
mov [esp],eax
|
||||||
|
mov ebp,esp ; ebp = aligned ebp
|
||||||
|
lea esp, [wk(0)]
|
||||||
|
pushpic eax ; make room for the GOT address
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
movpic POINTER [gotptr], ebx ; save GOT address
|
||||||
|
|
||||||
|
mov ecx, POINTER [cinfo(eax)]
|
||||||
|
mov ecx, JDIMENSION [jdstruct_output_width(ecx)] ; num_cols
|
||||||
|
test ecx,ecx
|
||||||
|
jz near .return
|
||||||
|
|
||||||
|
push ecx
|
||||||
|
|
||||||
|
mov edi, JSAMPIMAGE [input_buf(eax)]
|
||||||
|
mov ecx, JDIMENSION [input_row(eax)]
|
||||||
|
mov esi, JSAMPARRAY [edi+0*SIZEOF_JSAMPARRAY]
|
||||||
|
mov ebx, JSAMPARRAY [edi+1*SIZEOF_JSAMPARRAY]
|
||||||
|
mov edx, JSAMPARRAY [edi+2*SIZEOF_JSAMPARRAY]
|
||||||
|
lea esi, [esi+ecx*SIZEOF_JSAMPROW]
|
||||||
|
lea ebx, [ebx+ecx*SIZEOF_JSAMPROW]
|
||||||
|
lea edx, [edx+ecx*SIZEOF_JSAMPROW]
|
||||||
|
|
||||||
|
pop ecx
|
||||||
|
|
||||||
|
mov edi, JSAMPARRAY [output_buf(eax)]
|
||||||
|
mov eax, INT [num_rows(eax)]
|
||||||
|
test eax,eax
|
||||||
|
jle near .return
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
push eax
|
||||||
|
push edi
|
||||||
|
push edx
|
||||||
|
push ebx
|
||||||
|
push esi
|
||||||
|
push ecx ; col
|
||||||
|
|
||||||
|
mov esi, JSAMPROW [esi] ; inptr0
|
||||||
|
mov ebx, JSAMPROW [ebx] ; inptr1
|
||||||
|
mov edx, JSAMPROW [edx] ; inptr2
|
||||||
|
mov edi, JSAMPROW [edi] ; outptr
|
||||||
|
movpic eax, POINTER [gotptr] ; load GOT address (eax)
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
|
||||||
|
movdqa xmm5, XMMWORD [ebx] ; xmm5=Cb(0123456789ABCDEF)
|
||||||
|
movdqa xmm1, XMMWORD [edx] ; xmm1=Cr(0123456789ABCDEF)
|
||||||
|
|
||||||
|
pcmpeqw xmm4,xmm4
|
||||||
|
pcmpeqw xmm7,xmm7
|
||||||
|
psrlw xmm4,BYTE_BIT
|
||||||
|
psllw xmm7,7 ; xmm7={0xFF80 0xFF80 0xFF80 0xFF80 ..}
|
||||||
|
movdqa xmm0,xmm4 ; xmm0=xmm4={0xFF 0x00 0xFF 0x00 ..}
|
||||||
|
|
||||||
|
pand xmm4,xmm5 ; xmm4=Cb(02468ACE)=CbE
|
||||||
|
psrlw xmm5,BYTE_BIT ; xmm5=Cb(13579BDF)=CbO
|
||||||
|
pand xmm0,xmm1 ; xmm0=Cr(02468ACE)=CrE
|
||||||
|
psrlw xmm1,BYTE_BIT ; xmm1=Cr(13579BDF)=CrO
|
||||||
|
|
||||||
|
paddw xmm4,xmm7
|
||||||
|
paddw xmm5,xmm7
|
||||||
|
paddw xmm0,xmm7
|
||||||
|
paddw xmm1,xmm7
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; R = Y + 1.40200 * Cr
|
||||||
|
; G = Y - 0.34414 * Cb - 0.71414 * Cr
|
||||||
|
; B = Y + 1.77200 * Cb
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; R = Y + 0.40200 * Cr + Cr
|
||||||
|
; G = Y - 0.34414 * Cb + 0.28586 * Cr - Cr
|
||||||
|
; B = Y - 0.22800 * Cb + Cb + Cb
|
||||||
|
|
||||||
|
movdqa xmm2,xmm4 ; xmm2=CbE
|
||||||
|
movdqa xmm3,xmm5 ; xmm3=CbO
|
||||||
|
paddw xmm4,xmm4 ; xmm4=2*CbE
|
||||||
|
paddw xmm5,xmm5 ; xmm5=2*CbO
|
||||||
|
movdqa xmm6,xmm0 ; xmm6=CrE
|
||||||
|
movdqa xmm7,xmm1 ; xmm7=CrO
|
||||||
|
paddw xmm0,xmm0 ; xmm0=2*CrE
|
||||||
|
paddw xmm1,xmm1 ; xmm1=2*CrO
|
||||||
|
|
||||||
|
pmulhw xmm4,[GOTOFF(eax,PW_MF0228)] ; xmm4=(2*CbE * -FIX(0.22800))
|
||||||
|
pmulhw xmm5,[GOTOFF(eax,PW_MF0228)] ; xmm5=(2*CbO * -FIX(0.22800))
|
||||||
|
pmulhw xmm0,[GOTOFF(eax,PW_F0402)] ; xmm0=(2*CrE * FIX(0.40200))
|
||||||
|
pmulhw xmm1,[GOTOFF(eax,PW_F0402)] ; xmm1=(2*CrO * FIX(0.40200))
|
||||||
|
|
||||||
|
paddw xmm4,[GOTOFF(eax,PW_ONE)]
|
||||||
|
paddw xmm5,[GOTOFF(eax,PW_ONE)]
|
||||||
|
psraw xmm4,1 ; xmm4=(CbE * -FIX(0.22800))
|
||||||
|
psraw xmm5,1 ; xmm5=(CbO * -FIX(0.22800))
|
||||||
|
paddw xmm0,[GOTOFF(eax,PW_ONE)]
|
||||||
|
paddw xmm1,[GOTOFF(eax,PW_ONE)]
|
||||||
|
psraw xmm0,1 ; xmm0=(CrE * FIX(0.40200))
|
||||||
|
psraw xmm1,1 ; xmm1=(CrO * FIX(0.40200))
|
||||||
|
|
||||||
|
paddw xmm4,xmm2
|
||||||
|
paddw xmm5,xmm3
|
||||||
|
paddw xmm4,xmm2 ; xmm4=(CbE * FIX(1.77200))=(B-Y)E
|
||||||
|
paddw xmm5,xmm3 ; xmm5=(CbO * FIX(1.77200))=(B-Y)O
|
||||||
|
paddw xmm0,xmm6 ; xmm0=(CrE * FIX(1.40200))=(R-Y)E
|
||||||
|
paddw xmm1,xmm7 ; xmm1=(CrO * FIX(1.40200))=(R-Y)O
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(0)], xmm4 ; wk(0)=(B-Y)E
|
||||||
|
movdqa XMMWORD [wk(1)], xmm5 ; wk(1)=(B-Y)O
|
||||||
|
|
||||||
|
movdqa xmm4,xmm2
|
||||||
|
movdqa xmm5,xmm3
|
||||||
|
punpcklwd xmm2,xmm6
|
||||||
|
punpckhwd xmm4,xmm6
|
||||||
|
pmaddwd xmm2,[GOTOFF(eax,PW_MF0344_F0285)]
|
||||||
|
pmaddwd xmm4,[GOTOFF(eax,PW_MF0344_F0285)]
|
||||||
|
punpcklwd xmm3,xmm7
|
||||||
|
punpckhwd xmm5,xmm7
|
||||||
|
pmaddwd xmm3,[GOTOFF(eax,PW_MF0344_F0285)]
|
||||||
|
pmaddwd xmm5,[GOTOFF(eax,PW_MF0344_F0285)]
|
||||||
|
|
||||||
|
paddd xmm2,[GOTOFF(eax,PD_ONEHALF)]
|
||||||
|
paddd xmm4,[GOTOFF(eax,PD_ONEHALF)]
|
||||||
|
psrad xmm2,SCALEBITS
|
||||||
|
psrad xmm4,SCALEBITS
|
||||||
|
paddd xmm3,[GOTOFF(eax,PD_ONEHALF)]
|
||||||
|
paddd xmm5,[GOTOFF(eax,PD_ONEHALF)]
|
||||||
|
psrad xmm3,SCALEBITS
|
||||||
|
psrad xmm5,SCALEBITS
|
||||||
|
|
||||||
|
packssdw xmm2,xmm4 ; xmm2=CbE*-FIX(0.344)+CrE*FIX(0.285)
|
||||||
|
packssdw xmm3,xmm5 ; xmm3=CbO*-FIX(0.344)+CrO*FIX(0.285)
|
||||||
|
psubw xmm2,xmm6 ; xmm2=CbE*-FIX(0.344)+CrE*-FIX(0.714)=(G-Y)E
|
||||||
|
psubw xmm3,xmm7 ; xmm3=CbO*-FIX(0.344)+CrO*-FIX(0.714)=(G-Y)O
|
||||||
|
|
||||||
|
movdqa xmm5, XMMWORD [esi] ; xmm5=Y(0123456789ABCDEF)
|
||||||
|
|
||||||
|
pcmpeqw xmm4,xmm4
|
||||||
|
psrlw xmm4,BYTE_BIT ; xmm4={0xFF 0x00 0xFF 0x00 ..}
|
||||||
|
pand xmm4,xmm5 ; xmm4=Y(02468ACE)=YE
|
||||||
|
psrlw xmm5,BYTE_BIT ; xmm5=Y(13579BDF)=YO
|
||||||
|
|
||||||
|
paddw xmm0,xmm4 ; xmm0=((R-Y)E+YE)=RE=R(02468ACE)
|
||||||
|
paddw xmm1,xmm5 ; xmm1=((R-Y)O+YO)=RO=R(13579BDF)
|
||||||
|
packuswb xmm0,xmm0 ; xmm0=R(02468ACE********)
|
||||||
|
packuswb xmm1,xmm1 ; xmm1=R(13579BDF********)
|
||||||
|
|
||||||
|
paddw xmm2,xmm4 ; xmm2=((G-Y)E+YE)=GE=G(02468ACE)
|
||||||
|
paddw xmm3,xmm5 ; xmm3=((G-Y)O+YO)=GO=G(13579BDF)
|
||||||
|
packuswb xmm2,xmm2 ; xmm2=G(02468ACE********)
|
||||||
|
packuswb xmm3,xmm3 ; xmm3=G(13579BDF********)
|
||||||
|
|
||||||
|
paddw xmm4, XMMWORD [wk(0)] ; xmm4=(YE+(B-Y)E)=BE=B(02468ACE)
|
||||||
|
paddw xmm5, XMMWORD [wk(1)] ; xmm5=(YO+(B-Y)O)=BO=B(13579BDF)
|
||||||
|
packuswb xmm4,xmm4 ; xmm4=B(02468ACE********)
|
||||||
|
packuswb xmm5,xmm5 ; xmm5=B(13579BDF********)
|
||||||
|
|
||||||
|
%if RGB_PIXELSIZE == 3 ; ---------------
|
||||||
|
|
||||||
|
; xmmA=(00 02 04 06 08 0A 0C 0E **), xmmB=(01 03 05 07 09 0B 0D 0F **)
|
||||||
|
; xmmC=(10 12 14 16 18 1A 1C 1E **), xmmD=(11 13 15 17 19 1B 1D 1F **)
|
||||||
|
; xmmE=(20 22 24 26 28 2A 2C 2E **), xmmF=(21 23 25 27 29 2B 2D 2F **)
|
||||||
|
; xmmG=(** ** ** ** ** ** ** ** **), xmmH=(** ** ** ** ** ** ** ** **)
|
||||||
|
|
||||||
|
punpcklbw xmmA,xmmC ; xmmA=(00 10 02 12 04 14 06 16 08 18 0A 1A 0C 1C 0E 1E)
|
||||||
|
punpcklbw xmmE,xmmB ; xmmE=(20 01 22 03 24 05 26 07 28 09 2A 0B 2C 0D 2E 0F)
|
||||||
|
punpcklbw xmmD,xmmF ; xmmD=(11 21 13 23 15 25 17 27 19 29 1B 2B 1D 2D 1F 2F)
|
||||||
|
|
||||||
|
movdqa xmmG,xmmA
|
||||||
|
movdqa xmmH,xmmA
|
||||||
|
punpcklwd xmmA,xmmE ; xmmA=(00 10 20 01 02 12 22 03 04 14 24 05 06 16 26 07)
|
||||||
|
punpckhwd xmmG,xmmE ; xmmG=(08 18 28 09 0A 1A 2A 0B 0C 1C 2C 0D 0E 1E 2E 0F)
|
||||||
|
|
||||||
|
psrldq xmmH,2 ; xmmH=(02 12 04 14 06 16 08 18 0A 1A 0C 1C 0E 1E -- --)
|
||||||
|
psrldq xmmE,2 ; xmmE=(22 03 24 05 26 07 28 09 2A 0B 2C 0D 2E 0F -- --)
|
||||||
|
|
||||||
|
movdqa xmmC,xmmD
|
||||||
|
movdqa xmmB,xmmD
|
||||||
|
punpcklwd xmmD,xmmH ; xmmD=(11 21 02 12 13 23 04 14 15 25 06 16 17 27 08 18)
|
||||||
|
punpckhwd xmmC,xmmH ; xmmC=(19 29 0A 1A 1B 2B 0C 1C 1D 2D 0E 1E 1F 2F -- --)
|
||||||
|
|
||||||
|
psrldq xmmB,2 ; xmmB=(13 23 15 25 17 27 19 29 1B 2B 1D 2D 1F 2F -- --)
|
||||||
|
|
||||||
|
movdqa xmmF,xmmE
|
||||||
|
punpcklwd xmmE,xmmB ; xmmE=(22 03 13 23 24 05 15 25 26 07 17 27 28 09 19 29)
|
||||||
|
punpckhwd xmmF,xmmB ; xmmF=(2A 0B 1B 2B 2C 0D 1D 2D 2E 0F 1F 2F -- -- -- --)
|
||||||
|
|
||||||
|
pshufd xmmH,xmmA,0x4E; xmmH=(04 14 24 05 06 16 26 07 00 10 20 01 02 12 22 03)
|
||||||
|
movdqa xmmB,xmmE
|
||||||
|
punpckldq xmmA,xmmD ; xmmA=(00 10 20 01 11 21 02 12 02 12 22 03 13 23 04 14)
|
||||||
|
punpckldq xmmE,xmmH ; xmmE=(22 03 13 23 04 14 24 05 24 05 15 25 06 16 26 07)
|
||||||
|
punpckhdq xmmD,xmmB ; xmmD=(15 25 06 16 26 07 17 27 17 27 08 18 28 09 19 29)
|
||||||
|
|
||||||
|
pshufd xmmH,xmmG,0x4E; xmmH=(0C 1C 2C 0D 0E 1E 2E 0F 08 18 28 09 0A 1A 2A 0B)
|
||||||
|
movdqa xmmB,xmmF
|
||||||
|
punpckldq xmmG,xmmC ; xmmG=(08 18 28 09 19 29 0A 1A 0A 1A 2A 0B 1B 2B 0C 1C)
|
||||||
|
punpckldq xmmF,xmmH ; xmmF=(2A 0B 1B 2B 0C 1C 2C 0D 2C 0D 1D 2D 0E 1E 2E 0F)
|
||||||
|
punpckhdq xmmC,xmmB ; xmmC=(1D 2D 0E 1E 2E 0F 1F 2F 1F 2F -- -- -- -- -- --)
|
||||||
|
|
||||||
|
punpcklqdq xmmA,xmmE ; xmmA=(00 10 20 01 11 21 02 12 22 03 13 23 04 14 24 05)
|
||||||
|
punpcklqdq xmmD,xmmG ; xmmD=(15 25 06 16 26 07 17 27 08 18 28 09 19 29 0A 1A)
|
||||||
|
punpcklqdq xmmF,xmmC ; xmmF=(2A 0B 1B 2B 0C 1C 2C 0D 1D 2D 0E 1E 2E 0F 1F 2F)
|
||||||
|
|
||||||
|
cmp ecx, byte SIZEOF_XMMWORD
|
||||||
|
jb short .column_st32
|
||||||
|
|
||||||
|
test edi, SIZEOF_XMMWORD-1
|
||||||
|
jnz short .out1
|
||||||
|
; --(aligned)-------------------
|
||||||
|
movntdq XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA
|
||||||
|
movntdq XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD
|
||||||
|
movntdq XMMWORD [edi+2*SIZEOF_XMMWORD], xmmF
|
||||||
|
add edi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr
|
||||||
|
jmp short .out0
|
||||||
|
.out1: ; --(unaligned)-----------------
|
||||||
|
pcmpeqb xmmH,xmmH ; xmmH=(all 1's)
|
||||||
|
maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [edi], xmmA
|
||||||
|
add edi, byte SIZEOF_XMMWORD ; outptr
|
||||||
|
maskmovdqu xmmD,xmmH ; movntdqu XMMWORD [edi], xmmD
|
||||||
|
add edi, byte SIZEOF_XMMWORD ; outptr
|
||||||
|
maskmovdqu xmmF,xmmH ; movntdqu XMMWORD [edi], xmmF
|
||||||
|
add edi, byte SIZEOF_XMMWORD ; outptr
|
||||||
|
.out0:
|
||||||
|
sub ecx, byte SIZEOF_XMMWORD
|
||||||
|
jz near .nextrow
|
||||||
|
|
||||||
|
add esi, byte SIZEOF_XMMWORD ; inptr0
|
||||||
|
add ebx, byte SIZEOF_XMMWORD ; inptr1
|
||||||
|
add edx, byte SIZEOF_XMMWORD ; inptr2
|
||||||
|
jmp near .columnloop
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.column_st32:
|
||||||
|
pcmpeqb xmmH,xmmH ; xmmH=(all 1's)
|
||||||
|
lea ecx, [ecx+ecx*2] ; imul ecx, RGB_PIXELSIZE
|
||||||
|
cmp ecx, byte 2*SIZEOF_XMMWORD
|
||||||
|
jb short .column_st16
|
||||||
|
maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [edi], xmmA
|
||||||
|
add edi, byte SIZEOF_XMMWORD ; outptr
|
||||||
|
maskmovdqu xmmD,xmmH ; movntdqu XMMWORD [edi], xmmD
|
||||||
|
add edi, byte SIZEOF_XMMWORD ; outptr
|
||||||
|
movdqa xmmA,xmmF
|
||||||
|
sub ecx, byte 2*SIZEOF_XMMWORD
|
||||||
|
jmp short .column_st15
|
||||||
|
.column_st16:
|
||||||
|
cmp ecx, byte SIZEOF_XMMWORD
|
||||||
|
jb short .column_st15
|
||||||
|
maskmovdqu xmmA,xmmH ; movntdqu XMMWORD [edi], xmmA
|
||||||
|
add edi, byte SIZEOF_XMMWORD ; outptr
|
||||||
|
movdqa xmmA,xmmD
|
||||||
|
sub ecx, byte SIZEOF_XMMWORD
|
||||||
|
.column_st15:
|
||||||
|
mov eax,ecx
|
||||||
|
xor ecx, byte 0x0F
|
||||||
|
shl ecx, 2
|
||||||
|
movd xmmB,ecx
|
||||||
|
psrlq xmmH,4
|
||||||
|
pcmpeqb xmmE,xmmE
|
||||||
|
psrlq xmmH,xmmB
|
||||||
|
psrlq xmmE,xmmB
|
||||||
|
punpcklbw xmmE,xmmH
|
||||||
|
; ----------------
|
||||||
|
mov ecx,edi
|
||||||
|
and ecx, byte SIZEOF_XMMWORD-1
|
||||||
|
jz short .adj0
|
||||||
|
add eax,ecx
|
||||||
|
cmp eax, byte SIZEOF_XMMWORD
|
||||||
|
ja short .adj0
|
||||||
|
and edi, byte (-SIZEOF_XMMWORD) ; align to 16-byte boundary
|
||||||
|
shl ecx, 3 ; pslldq xmmA,ecx & pslldq xmmE,ecx
|
||||||
|
movdqa xmmG,xmmA
|
||||||
|
movdqa xmmC,xmmE
|
||||||
|
pslldq xmmA, SIZEOF_XMMWORD/2
|
||||||
|
pslldq xmmE, SIZEOF_XMMWORD/2
|
||||||
|
movd xmmD,ecx
|
||||||
|
sub ecx, byte (SIZEOF_XMMWORD/2)*BYTE_BIT
|
||||||
|
jb short .adj1
|
||||||
|
movd xmmF,ecx
|
||||||
|
psllq xmmA,xmmF
|
||||||
|
psllq xmmE,xmmF
|
||||||
|
jmp short .adj0
|
||||||
|
.adj1: neg ecx
|
||||||
|
movd xmmF,ecx
|
||||||
|
psrlq xmmA,xmmF
|
||||||
|
psrlq xmmE,xmmF
|
||||||
|
psllq xmmG,xmmD
|
||||||
|
psllq xmmC,xmmD
|
||||||
|
por xmmA,xmmG
|
||||||
|
por xmmE,xmmC
|
||||||
|
.adj0: ; ----------------
|
||||||
|
maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA
|
||||||
|
|
||||||
|
%else ; RGB_PIXELSIZE == 4 ; -----------
|
||||||
|
|
||||||
|
%ifdef RGBX_FILLER_0XFF
|
||||||
|
pcmpeqb xmm6,xmm6 ; xmm6=XE=X(02468ACE********)
|
||||||
|
pcmpeqb xmm7,xmm7 ; xmm7=XO=X(13579BDF********)
|
||||||
|
%else
|
||||||
|
pxor xmm6,xmm6 ; xmm6=XE=X(02468ACE********)
|
||||||
|
pxor xmm7,xmm7 ; xmm7=XO=X(13579BDF********)
|
||||||
|
%endif
|
||||||
|
; xmmA=(00 02 04 06 08 0A 0C 0E **), xmmB=(01 03 05 07 09 0B 0D 0F **)
|
||||||
|
; xmmC=(10 12 14 16 18 1A 1C 1E **), xmmD=(11 13 15 17 19 1B 1D 1F **)
|
||||||
|
; xmmE=(20 22 24 26 28 2A 2C 2E **), xmmF=(21 23 25 27 29 2B 2D 2F **)
|
||||||
|
; xmmG=(30 32 34 36 38 3A 3C 3E **), xmmH=(31 33 35 37 39 3B 3D 3F **)
|
||||||
|
|
||||||
|
punpcklbw xmmA,xmmC ; xmmA=(00 10 02 12 04 14 06 16 08 18 0A 1A 0C 1C 0E 1E)
|
||||||
|
punpcklbw xmmE,xmmG ; xmmE=(20 30 22 32 24 34 26 36 28 38 2A 3A 2C 3C 2E 3E)
|
||||||
|
punpcklbw xmmB,xmmD ; xmmB=(01 11 03 13 05 15 07 17 09 19 0B 1B 0D 1D 0F 1F)
|
||||||
|
punpcklbw xmmF,xmmH ; xmmF=(21 31 23 33 25 35 27 37 29 39 2B 3B 2D 3D 2F 3F)
|
||||||
|
|
||||||
|
movdqa xmmC,xmmA
|
||||||
|
punpcklwd xmmA,xmmE ; xmmA=(00 10 20 30 02 12 22 32 04 14 24 34 06 16 26 36)
|
||||||
|
punpckhwd xmmC,xmmE ; xmmC=(08 18 28 38 0A 1A 2A 3A 0C 1C 2C 3C 0E 1E 2E 3E)
|
||||||
|
movdqa xmmG,xmmB
|
||||||
|
punpcklwd xmmB,xmmF ; xmmB=(01 11 21 31 03 13 23 33 05 15 25 35 07 17 27 37)
|
||||||
|
punpckhwd xmmG,xmmF ; xmmG=(09 19 29 39 0B 1B 2B 3B 0D 1D 2D 3D 0F 1F 2F 3F)
|
||||||
|
|
||||||
|
movdqa xmmD,xmmA
|
||||||
|
punpckldq xmmA,xmmB ; xmmA=(00 10 20 30 01 11 21 31 02 12 22 32 03 13 23 33)
|
||||||
|
punpckhdq xmmD,xmmB ; xmmD=(04 14 24 34 05 15 25 35 06 16 26 36 07 17 27 37)
|
||||||
|
movdqa xmmH,xmmC
|
||||||
|
punpckldq xmmC,xmmG ; xmmC=(08 18 28 38 09 19 29 39 0A 1A 2A 3A 0B 1B 2B 3B)
|
||||||
|
punpckhdq xmmH,xmmG ; xmmH=(0C 1C 2C 3C 0D 1D 2D 3D 0E 1E 2E 3E 0F 1F 2F 3F)
|
||||||
|
|
||||||
|
cmp ecx, byte SIZEOF_XMMWORD
|
||||||
|
jb short .column_st32
|
||||||
|
|
||||||
|
test edi, SIZEOF_XMMWORD-1
|
||||||
|
jnz short .out1
|
||||||
|
; --(aligned)-------------------
|
||||||
|
movntdq XMMWORD [edi+0*SIZEOF_XMMWORD], xmmA
|
||||||
|
movntdq XMMWORD [edi+1*SIZEOF_XMMWORD], xmmD
|
||||||
|
movntdq XMMWORD [edi+2*SIZEOF_XMMWORD], xmmC
|
||||||
|
movntdq XMMWORD [edi+3*SIZEOF_XMMWORD], xmmH
|
||||||
|
add edi, byte RGB_PIXELSIZE*SIZEOF_XMMWORD ; outptr
|
||||||
|
jmp short .out0
|
||||||
|
.out1: ; --(unaligned)-----------------
|
||||||
|
pcmpeqb xmmE,xmmE ; xmmE=(all 1's)
|
||||||
|
maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA
|
||||||
|
add edi, byte SIZEOF_XMMWORD ; outptr
|
||||||
|
maskmovdqu xmmD,xmmE ; movntdqu XMMWORD [edi], xmmD
|
||||||
|
add edi, byte SIZEOF_XMMWORD ; outptr
|
||||||
|
maskmovdqu xmmC,xmmE ; movntdqu XMMWORD [edi], xmmC
|
||||||
|
add edi, byte SIZEOF_XMMWORD ; outptr
|
||||||
|
maskmovdqu xmmH,xmmE ; movntdqu XMMWORD [edi], xmmH
|
||||||
|
add edi, byte SIZEOF_XMMWORD ; outptr
|
||||||
|
.out0:
|
||||||
|
sub ecx, byte SIZEOF_XMMWORD
|
||||||
|
jz near .nextrow
|
||||||
|
|
||||||
|
add esi, byte SIZEOF_XMMWORD ; inptr0
|
||||||
|
add ebx, byte SIZEOF_XMMWORD ; inptr1
|
||||||
|
add edx, byte SIZEOF_XMMWORD ; inptr2
|
||||||
|
jmp near .columnloop
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.column_st32:
|
||||||
|
pcmpeqb xmmE,xmmE ; xmmE=(all 1's)
|
||||||
|
cmp ecx, byte SIZEOF_XMMWORD/2
|
||||||
|
jb short .column_st16
|
||||||
|
maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA
|
||||||
|
add edi, byte SIZEOF_XMMWORD ; outptr
|
||||||
|
maskmovdqu xmmD,xmmE ; movntdqu XMMWORD [edi], xmmD
|
||||||
|
add edi, byte SIZEOF_XMMWORD ; outptr
|
||||||
|
movdqa xmmA,xmmC
|
||||||
|
movdqa xmmD,xmmH
|
||||||
|
sub ecx, byte SIZEOF_XMMWORD/2
|
||||||
|
.column_st16:
|
||||||
|
cmp ecx, byte SIZEOF_XMMWORD/4
|
||||||
|
jb short .column_st15
|
||||||
|
maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA
|
||||||
|
add edi, byte SIZEOF_XMMWORD ; outptr
|
||||||
|
movdqa xmmA,xmmD
|
||||||
|
sub ecx, byte SIZEOF_XMMWORD/4
|
||||||
|
.column_st15:
|
||||||
|
cmp ecx, byte SIZEOF_XMMWORD/16
|
||||||
|
jb short .nextrow
|
||||||
|
mov eax,ecx
|
||||||
|
xor ecx, byte 0x03
|
||||||
|
inc ecx
|
||||||
|
shl ecx, 4
|
||||||
|
movd xmmF,ecx
|
||||||
|
psrlq xmmE,xmmF
|
||||||
|
punpcklbw xmmE,xmmE
|
||||||
|
; ----------------
|
||||||
|
mov ecx,edi
|
||||||
|
and ecx, byte SIZEOF_XMMWORD-1
|
||||||
|
jz short .adj0
|
||||||
|
lea eax, [ecx+eax*4] ; RGB_PIXELSIZE
|
||||||
|
cmp eax, byte SIZEOF_XMMWORD
|
||||||
|
ja short .adj0
|
||||||
|
and edi, byte (-SIZEOF_XMMWORD) ; align to 16-byte boundary
|
||||||
|
shl ecx, 3 ; pslldq xmmA,ecx & pslldq xmmE,ecx
|
||||||
|
movdqa xmmB,xmmA
|
||||||
|
movdqa xmmG,xmmE
|
||||||
|
pslldq xmmA, SIZEOF_XMMWORD/2
|
||||||
|
pslldq xmmE, SIZEOF_XMMWORD/2
|
||||||
|
movd xmmC,ecx
|
||||||
|
sub ecx, byte (SIZEOF_XMMWORD/2)*BYTE_BIT
|
||||||
|
jb short .adj1
|
||||||
|
movd xmmH,ecx
|
||||||
|
psllq xmmA,xmmH
|
||||||
|
psllq xmmE,xmmH
|
||||||
|
jmp short .adj0
|
||||||
|
.adj1: neg ecx
|
||||||
|
movd xmmH,ecx
|
||||||
|
psrlq xmmA,xmmH
|
||||||
|
psrlq xmmE,xmmH
|
||||||
|
psllq xmmB,xmmC
|
||||||
|
psllq xmmG,xmmC
|
||||||
|
por xmmA,xmmB
|
||||||
|
por xmmE,xmmG
|
||||||
|
.adj0: ; ----------------
|
||||||
|
maskmovdqu xmmA,xmmE ; movntdqu XMMWORD [edi], xmmA
|
||||||
|
|
||||||
|
%endif ; RGB_PIXELSIZE ; ---------------
|
||||||
|
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.nextrow:
|
||||||
|
pop ecx
|
||||||
|
pop esi
|
||||||
|
pop ebx
|
||||||
|
pop edx
|
||||||
|
pop edi
|
||||||
|
pop eax
|
||||||
|
|
||||||
|
add esi, byte SIZEOF_JSAMPROW
|
||||||
|
add ebx, byte SIZEOF_JSAMPROW
|
||||||
|
add edx, byte SIZEOF_JSAMPROW
|
||||||
|
add edi, byte SIZEOF_JSAMPROW ; output_buf
|
||||||
|
dec eax ; num_rows
|
||||||
|
jg near .rowloop
|
||||||
|
|
||||||
|
sfence ; flush the write buffer
|
||||||
|
|
||||||
|
.return:
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
mov esp,ebp ; esp <- aligned ebp
|
||||||
|
pop esp ; esp <- original ebp
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; JDCOLOR_YCCRGB_SSE2_SUPPORTED
|
||||||
|
%endif ; RGB_PIXELSIZE == 3 || RGB_PIXELSIZE == 4
|
||||||
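Note on the arithmetic in jdcolss2.asm above: the "(This implementation)" comment splits each coefficient so that the fractional part fits a signed 16-bit multiplier (for example 1.402 = 1 + 0.402 and 1.772 = 2 - 0.228). The following is a scalar sketch of what one lane appears to compute, written from a reading of the assembly rather than taken from the patch. The function names are hypothetical, and the result can differ by one level from a floating-point conversion because of the intermediate rounding.

```c
#include <stdint.h>

#define SCALEBITS 16
/* Constants copied from jdcolss2.asm. */
#define F_0_344 22554                 /* FIX(0.34414) */
#define F_0_714 46802                 /* FIX(0.71414) */
#define F_1_402 91881                 /* FIX(1.40200) */
#define F_1_772 116130                /* FIX(1.77200) */
#define F_0_402 (F_1_402 - 65536)     /* FIX(1.40200) - FIX(1) */
#define F_0_285 (65536 - F_0_714)     /* FIX(1) - FIX(0.71414) */
#define F_0_228 (131072 - F_1_772)    /* FIX(2) - FIX(1.77200) */

/* Floor division by 2^n, spelled out so the sketch does not depend on
 * how the compiler shifts negative values (psraw/psrad shift arithmetically). */
static int32_t floor_shr(int32_t v, int n)
{
    return v >= 0 ? (v >> n) : -((-v + ((int32_t)1 << n) - 1) >> n);
}

/* Scalar stand-in for: paddw x,x / pmulhw x,k / paddw x,PW_ONE / psraw x,1 */
static int32_t mulhi_round(int32_t c, int32_t k)
{
    int32_t hi = floor_shr(2 * c * k, 16);   /* pmulhw keeps the high word */
    return floor_shr(hi + 1, 1);
}

/* Scalar stand-in for packuswb: saturate to 0..255. */
static uint8_t clamp_u8(int32_t v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical one-pixel version of the SSE2 column loop; writes R, G, B. */
void ycc_to_rgb_sketch(uint8_t y, uint8_t cb, uint8_t cr, uint8_t rgb[3])
{
    int32_t cbc = (int32_t)cb - 128;   /* paddw with 0xFF80 */
    int32_t crc = (int32_t)cr - 128;

    int32_t r_y = mulhi_round(crc, F_0_402) + crc;         /* 0.402*Cr + Cr  */
    int32_t b_y = mulhi_round(cbc, -F_0_228) + cbc + cbc;  /* -0.228*Cb + 2Cb */
    /* pmaddwd, add PD_ONEHALF, psrad SCALEBITS, then subtract Cr */
    int32_t g_y = floor_shr(cbc * -F_0_344 + crc * F_0_285
                            + ((int32_t)1 << (SCALEBITS - 1)), SCALEBITS) - crc;

    rgb[0] = clamp_u8((int32_t)y + r_y);
    rgb[1] = clamp_u8((int32_t)y + g_y);
    rgb[2] = clamp_u8((int32_t)y + b_y);
}
```

Calling `ycc_to_rgb_sketch(128, 128, 128, rgb)` leaves all three outputs at 128, since both chroma terms are zero after the -128 bias. The split matters because `pmulhw` and `pmaddwd` take signed 16-bit operands, so FIX(1.402) and FIX(1.772) would not fit as multipliers directly.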
216
jdct.h
@@ -5,6 +5,13 @@
|
|||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
* x86 SIMD extension for IJG JPEG library
|
||||||
|
* Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
* This file has been modified for SIMD extension.
|
||||||
|
* Last Modified : January 5, 2006
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
*
|
||||||
* This include file contains common declarations for the forward and
|
* This include file contains common declarations for the forward and
|
||||||
* inverse DCT modules. These declarations are private to the DCT managers
|
* inverse DCT modules. These declarations are private to the DCT managers
|
||||||
* (jcdctmgr.c, jddctmgr.c) and the individual DCT algorithms.
|
* (jcdctmgr.c, jddctmgr.c) and the individual DCT algorithms.
|
||||||
@@ -13,6 +20,13 @@
|
|||||||
*/
|
*/
|
||||||
|
|
||||||
|
|
||||||
|
/* SIMD Ext: configuration check */
|
||||||
|
|
||||||
|
#if BITS_IN_JSAMPLE != 8
|
||||||
|
#error "Sorry, this SIMD code only copes with 8-bit sample values."
|
||||||
|
#endif
|
||||||
|
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* A forward DCT routine is given a pointer to a work area of type DCTELEM[];
|
* A forward DCT routine is given a pointer to a work area of type DCTELEM[];
|
||||||
* the DCT is to be performed in-place in that buffer. Type DCTELEM is int
|
* the DCT is to be performed in-place in that buffer. Type DCTELEM is int
|
||||||
@@ -26,14 +40,25 @@
|
|||||||
* Quantization of the output coefficients is done by jcdctmgr.c.
|
* Quantization of the output coefficients is done by jcdctmgr.c.
|
||||||
*/
|
*/
|
||||||
|
|
||||||
#if BITS_IN_JSAMPLE == 8
|
/* SIMD Ext: To maximize parallelism, Type DCTELEM is changed to short
|
||||||
typedef int DCTELEM; /* 16 or 32 bits is fine */
|
* (originally, int).
|
||||||
#else
|
*/
|
||||||
typedef INT32 DCTELEM; /* must have 32 bits */
|
typedef short DCTELEM; /* SIMD Ext: must be short */
|
||||||
#endif
|
|
||||||
|
|
||||||
typedef JMETHOD(void, forward_DCT_method_ptr, (DCTELEM * data));
|
typedef JMETHOD(void, forward_DCT_method_ptr, (DCTELEM * data));
|
||||||
typedef JMETHOD(void, float_DCT_method_ptr, (FAST_FLOAT * data));
|
typedef JMETHOD(void, float_DCT_method_ptr, (FAST_FLOAT * data));
|
||||||
|
typedef JMETHOD(void, convsamp_int_method_ptr,
|
||||||
|
(JSAMPARRAY sample_data, JDIMENSION start_col,
|
||||||
|
DCTELEM * workspace));
|
||||||
|
typedef JMETHOD(void, convsamp_float_method_ptr,
|
||||||
|
(JSAMPARRAY sample_data, JDIMENSION start_col,
|
||||||
|
FAST_FLOAT *workspace));
|
||||||
|
typedef JMETHOD(void, quantize_int_method_ptr,
|
||||||
|
(JCOEFPTR coef_block, DCTELEM * divisors,
|
||||||
|
DCTELEM * workspace));
|
||||||
|
typedef JMETHOD(void, quantize_float_method_ptr,
|
||||||
|
(JCOEFPTR coef_block, FAST_FLOAT * divisors,
|
||||||
|
FAST_FLOAT * workspace));
|
||||||
|
|
||||||
|
|
||||||
/*
|
/*
|
||||||
@@ -49,19 +74,22 @@ typedef JMETHOD(void, float_DCT_method_ptr, (FAST_FLOAT * data));
|
|||||||
|
|
||||||
/* typedef inverse_DCT_method_ptr is declared in jpegint.h */
|
/* typedef inverse_DCT_method_ptr is declared in jpegint.h */
|
||||||
|
|
||||||
|
/* SIMD Ext: To maximize parallelism, Type MULTIPLIER is changed to short.
|
||||||
|
* Macro definitions of MULTIPLIER and FAST_FLOAT in jmorecfg.h are ignored.
|
||||||
|
*/
|
||||||
|
#undef MULTIPLIER
|
||||||
|
#define MULTIPLIER short /* SIMD Ext: must be short */
|
||||||
|
#undef FAST_FLOAT
|
||||||
|
#define FAST_FLOAT float /* SIMD Ext: must be float */
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Each IDCT routine has its own ideas about the best dct_table element type.
|
* Each IDCT routine has its own ideas about the best dct_table element type.
|
||||||
*/
|
*/
|
||||||
|
|
||||||
typedef MULTIPLIER ISLOW_MULT_TYPE; /* short or int, whichever is faster */
|
typedef MULTIPLIER ISLOW_MULT_TYPE; /* SIMD Ext: must be short */
|
||||||
#if BITS_IN_JSAMPLE == 8
|
typedef MULTIPLIER IFAST_MULT_TYPE; /* SIMD Ext: must be short */
|
||||||
typedef MULTIPLIER IFAST_MULT_TYPE; /* 16 bits is OK, use short if faster */
|
|
||||||
#define IFAST_SCALE_BITS 2 /* fractional bits in scale factors */
|
#define IFAST_SCALE_BITS 2 /* fractional bits in scale factors */
|
||||||
#else
|
typedef FAST_FLOAT FLOAT_MULT_TYPE; /* SIMD Ext: must be float */
|
||||||
typedef INT32 IFAST_MULT_TYPE; /* need 32 bits for scaled quantizers */
|
|
||||||
#define IFAST_SCALE_BITS 13 /* fractional bits in scale factors */
|
|
||||||
#endif
|
|
||||||
typedef FAST_FLOAT FLOAT_MULT_TYPE; /* preferred floating type */
|
|
||||||
|
|
||||||
|
|
||||||
/*
|
/*
|
||||||
@@ -81,15 +109,64 @@ typedef FAST_FLOAT FLOAT_MULT_TYPE; /* preferred floating type */
|
|||||||
/* Short forms of external names for systems with brain-damaged linkers. */
|
/* Short forms of external names for systems with brain-damaged linkers. */
|
||||||
|
|
||||||
#ifdef NEED_SHORT_EXTERNAL_NAMES
|
#ifdef NEED_SHORT_EXTERNAL_NAMES
|
||||||
#define jpeg_fdct_islow jFDislow
|
#define jpeg_fdct_islow jFDislow /* jfdctint.asm */
|
||||||
#define jpeg_fdct_ifast jFDifast
|
#define jpeg_fdct_ifast jFDifast /* jfdctfst.asm */
|
||||||
#define jpeg_fdct_float jFDfloat
|
#define jpeg_fdct_float jFDfloat /* jfdctflt.asm */
|
||||||
#define jpeg_idct_islow jRDislow
|
#define jpeg_fdct_islow_mmx jFDMislow /* jfmmxint.asm */
|
||||||
#define jpeg_idct_ifast jRDifast
|
#define jpeg_fdct_ifast_mmx jFDMifast /* jfmmxfst.asm */
|
||||||
#define jpeg_idct_float jRDfloat
|
#define jpeg_fdct_float_3dnow jFD3float /* jf3dnflt.asm */
|
||||||
#define jpeg_idct_4x4 jRD4x4
|
#define jpeg_fdct_islow_sse2 jFDSislow /* jfss2int.asm */
|
||||||
#define jpeg_idct_2x2 jRD2x2
|
#define jpeg_fdct_ifast_sse2 jFDSifast /* jfss2fst.asm */
|
||||||
#define jpeg_idct_1x1 jRD1x1
|
#define jpeg_fdct_float_sse jFDSfloat /* jfsseflt.asm */
|
||||||
|
#define jpeg_convsamp_int jCnvInt /* jcqntint.asm */
|
||||||
|
#define jpeg_quantize_int jQntInt /* jcqntint.asm */
|
||||||
|
#define jpeg_quantize_idiv jQntIDiv /* jcqntint.asm */
|
||||||
|
#define jpeg_convsamp_float jCnvFloat /* jcqntflt.asm */
|
||||||
|
#define jpeg_quantize_float jQntFloat /* jcqntflt.asm */
|
||||||
|
#define jpeg_convsamp_int_mmx jCnvMmx /* jcqntmmx.asm */
|
||||||
|
#define jpeg_quantize_int_mmx jQntMmx /* jcqntmmx.asm */
|
||||||
|
#define jpeg_convsamp_flt_3dnow jCnv3dnow /* jcqnt3dn.asm */
|
||||||
|
#define jpeg_quantize_flt_3dnow jQnt3dnow /* jcqnt3dn.asm */
|
||||||
|
#define jpeg_convsamp_int_sse2 jCnvISse2 /* jcqnts2i.asm */
|
||||||
|
#define jpeg_quantize_int_sse2 jQntISse2 /* jcqnts2i.asm */
|
||||||
|
#define jpeg_convsamp_flt_sse jCnvSse /* jcqntsse.asm */
|
||||||
|
#define jpeg_quantize_flt_sse jQntSse /* jcqntsse.asm */
|
||||||
|
#define jpeg_convsamp_flt_sse2 jCnvFSse2 /* jcqnts2f.asm */
|
||||||
|
#define jpeg_quantize_flt_sse2 jQntFSse2 /* jcqnts2f.asm */
|
||||||
|
#define jpeg_idct_islow jRDislow /* jidctint.asm */
|
||||||
|
#define jpeg_idct_ifast jRDifast /* jidctfst.asm */
|
||||||
|
#define jpeg_idct_float jRDfloat /* jidctflt.asm */
|
||||||
|
#define jpeg_idct_4x4 jRD4x4 /* jidctred.asm */
|
||||||
|
#define jpeg_idct_2x2 jRD2x2 /* jidctred.asm */
|
||||||
|
#define jpeg_idct_1x1 jRD1x1 /* jidctred.asm */
|
||||||
|
#define jpeg_idct_islow_mmx jRDMislow /* jimmxint.asm */
|
||||||
|
#define jpeg_idct_ifast_mmx jRDMifast /* jimmxfst.asm */
|
||||||
|
#define jpeg_idct_float_3dnow jRD3float /* ji3dnflt.asm */
|
||||||
|
#define jpeg_idct_4x4_mmx jRDM4x4 /* jimmxred.asm */
|
||||||
|
#define jpeg_idct_2x2_mmx jRDM2x2 /* jimmxred.asm */
|
||||||
|
#define jpeg_idct_islow_sse2 jRDSislow /* jiss2int.asm */
|
||||||
|
#define jpeg_idct_ifast_sse2 jRDSifast /* jiss2fst.asm */
|
||||||
|
#define jpeg_idct_float_sse jRDSfloat /* jisseflt.asm */
|
||||||
|
#define jpeg_idct_float_sse2 jRD2float /* jiss2flt.asm */
|
||||||
|
#define jpeg_idct_4x4_sse2 jRDS4x4 /* jiss2red.asm */
|
||||||
|
#define jpeg_idct_2x2_sse2 jRDS2x2 /* jiss2red.asm */
|
||||||
|
#define jconst_fdct_float jFCfloat /* jfdctflt.asm */
|
||||||
|
#define jconst_fdct_islow_mmx jFCMislow /* jfmmxint.asm */
|
||||||
|
#define jconst_fdct_ifast_mmx jFCMifast /* jfmmxfst.asm */
|
||||||
|
#define jconst_fdct_float_3dnow jFC3float /* jf3dnflt.asm */
|
||||||
|
#define jconst_fdct_islow_sse2 jFCSislow /* jfss2int.asm */
|
||||||
|
#define jconst_fdct_ifast_sse2 jFCSifast /* jfss2fst.asm */
|
||||||
|
#define jconst_fdct_float_sse jFCSfloat /* jfsseflt.asm */
|
||||||
|
#define jconst_idct_float jRCfloat /* jidctflt.asm */
|
||||||
|
#define jconst_idct_islow_mmx jRCMislow /* jimmxint.asm */
|
||||||
|
#define jconst_idct_ifast_mmx jRCMifast /* jimmxfst.asm */
|
||||||
|
#define jconst_idct_float_3dnow jRC3float /* ji3dnflt.asm */
|
||||||
|
#define jconst_idct_red_mmx jRCMred /* jimmxred.asm */
|
||||||
|
#define jconst_idct_islow_sse2 jRCSislow /* jiss2int.asm */
|
||||||
|
#define jconst_idct_ifast_sse2 jRCSifast /* jiss2fst.asm */
|
||||||
|
#define jconst_idct_float_sse jRCSfloat /* jisseflt.asm */
|
||||||
|
#define jconst_idct_float_sse2 jRC2float /* jiss2flt.asm */
|
||||||
|
#define jconst_idct_red_sse2 jRCSred /* jiss2red.asm */
|
||||||
#endif /* NEED_SHORT_EXTERNAL_NAMES */
|
#endif /* NEED_SHORT_EXTERNAL_NAMES */
|
||||||
|
|
||||||
/* Extern declarations for the forward and inverse DCT routines. */
|
/* Extern declarations for the forward and inverse DCT routines. */
|
||||||
@@ -98,6 +175,47 @@ EXTERN(void) jpeg_fdct_islow JPP((DCTELEM * data));
|
|||||||
EXTERN(void) jpeg_fdct_ifast JPP((DCTELEM * data));
|
EXTERN(void) jpeg_fdct_ifast JPP((DCTELEM * data));
|
||||||
EXTERN(void) jpeg_fdct_float JPP((FAST_FLOAT * data));
|
EXTERN(void) jpeg_fdct_float JPP((FAST_FLOAT * data));
|
||||||
|
|
||||||
|
EXTERN(void) jpeg_fdct_islow_mmx JPP((DCTELEM * data));
|
||||||
|
EXTERN(void) jpeg_fdct_ifast_mmx JPP((DCTELEM * data));
|
||||||
|
EXTERN(void) jpeg_fdct_float_3dnow JPP((FAST_FLOAT * data));
|
||||||
|
|
||||||
|
EXTERN(void) jpeg_fdct_islow_sse2 JPP((DCTELEM * data));
|
||||||
|
EXTERN(void) jpeg_fdct_ifast_sse2 JPP((DCTELEM * data));
|
||||||
|
EXTERN(void) jpeg_fdct_float_sse JPP((FAST_FLOAT * data));
|
||||||
|
|
||||||
|
EXTERN(void) jpeg_convsamp_int
|
||||||
|
JPP((JSAMPARRAY sample_data, JDIMENSION start_col, DCTELEM * workspace));
|
||||||
|
EXTERN(void) jpeg_quantize_int
|
||||||
|
JPP((JCOEFPTR coef_block, DCTELEM * divisors, DCTELEM * workspace));
|
||||||
|
EXTERN(void) jpeg_quantize_idiv
|
||||||
|
JPP((JCOEFPTR coef_block, DCTELEM * divisors, DCTELEM * workspace));
|
||||||
|
EXTERN(void) jpeg_convsamp_float
|
||||||
|
JPP((JSAMPARRAY sample_data, JDIMENSION start_col, FAST_FLOAT *workspace));
|
||||||
|
EXTERN(void) jpeg_quantize_float
|
||||||
|
JPP((JCOEFPTR coef_block, FAST_FLOAT * divisors, FAST_FLOAT * workspace));
|
||||||
|
|
||||||
|
EXTERN(void) jpeg_convsamp_int_mmx
|
||||||
|
JPP((JSAMPARRAY sample_data, JDIMENSION start_col, DCTELEM * workspace));
|
||||||
|
EXTERN(void) jpeg_quantize_int_mmx
|
||||||
|
JPP((JCOEFPTR coef_block, DCTELEM * divisors, DCTELEM * workspace));
|
||||||
|
EXTERN(void) jpeg_convsamp_flt_3dnow
|
||||||
|
JPP((JSAMPARRAY sample_data, JDIMENSION start_col, FAST_FLOAT *workspace));
|
||||||
|
EXTERN(void) jpeg_quantize_flt_3dnow
|
||||||
|
JPP((JCOEFPTR coef_block, FAST_FLOAT * divisors, FAST_FLOAT * workspace));
|
||||||
|
|
||||||
|
EXTERN(void) jpeg_convsamp_int_sse2
|
||||||
|
JPP((JSAMPARRAY sample_data, JDIMENSION start_col, DCTELEM * workspace));
|
||||||
|
EXTERN(void) jpeg_quantize_int_sse2
|
||||||
|
JPP((JCOEFPTR coef_block, DCTELEM * divisors, DCTELEM * workspace));
|
||||||
|
EXTERN(void) jpeg_convsamp_flt_sse
|
||||||
|
JPP((JSAMPARRAY sample_data, JDIMENSION start_col, FAST_FLOAT *workspace));
|
||||||
|
EXTERN(void) jpeg_quantize_flt_sse
|
||||||
|
JPP((JCOEFPTR coef_block, FAST_FLOAT * divisors, FAST_FLOAT * workspace));
|
||||||
|
EXTERN(void) jpeg_convsamp_flt_sse2
|
||||||
|
JPP((JSAMPARRAY sample_data, JDIMENSION start_col, FAST_FLOAT *workspace));
|
||||||
|
EXTERN(void) jpeg_quantize_flt_sse2
|
||||||
|
JPP((JCOEFPTR coef_block, FAST_FLOAT * divisors, FAST_FLOAT * workspace));
|
||||||
|
|
||||||
EXTERN(void) jpeg_idct_islow
|
EXTERN(void) jpeg_idct_islow
|
||||||
JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
|
JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
|
||||||
@@ -117,6 +235,60 @@ EXTERN(void) jpeg_idct_1x1
|
|||||||
JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
|
JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
|
||||||
|
|
||||||
|
EXTERN(void) jpeg_idct_islow_mmx
|
||||||
|
JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
|
JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
|
||||||
|
EXTERN(void) jpeg_idct_ifast_mmx
|
||||||
|
JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
|
JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
|
||||||
|
EXTERN(void) jpeg_idct_4x4_mmx
|
||||||
|
JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
|
JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
|
||||||
|
EXTERN(void) jpeg_idct_2x2_mmx
|
||||||
|
JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
|
JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
|
||||||
|
|
||||||
|
EXTERN(void) jpeg_idct_float_3dnow
|
||||||
|
JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
|
JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
|
||||||
|
EXTERN(void) jpeg_idct_float_sse
|
||||||
|
JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
|
JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
|
||||||
|
EXTERN(void) jpeg_idct_float_sse2
|
||||||
|
JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
|
JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
|
||||||
|
|
||||||
|
EXTERN(void) jpeg_idct_islow_sse2
|
||||||
|
JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
|
JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
|
||||||
|
EXTERN(void) jpeg_idct_ifast_sse2
|
||||||
|
JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
|
JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
|
||||||
|
EXTERN(void) jpeg_idct_4x4_sse2
|
||||||
|
JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
|
JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
|
||||||
|
EXTERN(void) jpeg_idct_2x2_sse2
|
||||||
|
JPP((j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
|
JCOEFPTR coef_block, JSAMPARRAY output_buf, JDIMENSION output_col));
|
||||||
|
|
||||||
|
extern const int jconst_fdct_float[];
|
||||||
|
extern const int jconst_fdct_islow_mmx[];
|
||||||
|
extern const int jconst_fdct_ifast_mmx[];
|
||||||
|
extern const int jconst_fdct_float_3dnow[];
|
||||||
|
extern const int jconst_fdct_islow_sse2[];
|
||||||
|
extern const int jconst_fdct_ifast_sse2[];
|
||||||
|
extern const int jconst_fdct_float_sse[];
|
||||||
|
extern const int jconst_idct_float[];
|
||||||
|
extern const int jconst_idct_islow_mmx[];
|
||||||
|
extern const int jconst_idct_ifast_mmx[];
|
||||||
|
extern const int jconst_idct_float_3dnow[];
|
||||||
|
extern const int jconst_idct_red_mmx[];
|
||||||
|
extern const int jconst_idct_islow_sse2[];
|
||||||
|
extern const int jconst_idct_ifast_sse2[];
|
||||||
|
extern const int jconst_idct_float_sse[];
|
||||||
|
extern const int jconst_idct_float_sse2[];
|
||||||
|
extern const int jconst_idct_red_sse2[];
|
||||||
|
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Macros for handling fixed-point arithmetic; these are used by many
|
* Macros for handling fixed-point arithmetic; these are used by many
|
||||||
|
|||||||
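The jdct.h changes above split sample loading and quantization into separate convsamp/quantize entry points that operate on a short-typed DCTELEM workspace. The following is only a plain-C sketch of what such a pair might do, modelled on the classic jcdctmgr.c behaviour (centre samples around zero, then divide with rounding). The names convsamp_sketch and quantize_sketch, the simplified typedefs, and the rounding rule are assumptions for illustration and are not taken from the SIMD sources.

```c
#include <assert.h>
#include <stddef.h>

#define DCTSIZE        8     /* block edge length, as in libjpeg */
#define DCTSIZE2       64    /* elements per 8x8 block */
#define CENTERJSAMPLE  128   /* midpoint of an 8-bit sample */

typedef short         DCTELEM;  /* matches the "must be short" typedef in the diff */
typedef unsigned char JSAMPLE;  /* simplified stand-in for libjpeg's JSAMPLE */
typedef short         JCOEF;    /* simplified stand-in for libjpeg's JCOEF */

/* Load one 8x8 block starting at column start_col and centre it around 0. */
void convsamp_sketch(JSAMPLE *const *sample_data, size_t start_col,
                     DCTELEM *workspace)
{
  assert(sample_data != NULL && workspace != NULL);
  for (int row = 0; row < DCTSIZE; row++) {
    const JSAMPLE *src = sample_data[row] + start_col;
    for (int col = 0; col < DCTSIZE; col++)
      workspace[row * DCTSIZE + col] = (DCTELEM)(src[col] - CENTERJSAMPLE);
  }
}

/* Divide each coefficient by its divisor, rounding half away from zero. */
void quantize_sketch(JCOEF *coef_block, const DCTELEM *divisors,
                     const DCTELEM *workspace)
{
  for (int i = 0; i < DCTSIZE2; i++) {
    int v = workspace[i];
    int q = divisors[i];
    assert(q > 0);  /* a zero divisor would be a corrupt table */
    coef_block[i] = (JCOEF)(v < 0 ? -((-v + q / 2) / q) : (v + q / 2) / q);
  }
}
```

In a real encoder a forward DCT would run on the workspace between the two calls; it is omitted here to keep the sketch short.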
125
jdct.inc
Normal file
@@ -0,0 +1,125 @@
|
|||||||
|
;
|
||||||
|
; jdct.inc - private declarations for forward & reverse DCT subsystems
|
||||||
|
;
|
||||||
|
; x86 SIMD extension for IJG JPEG library
|
||||||
|
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
; For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
;
|
||||||
|
; Last Modified : January 5, 2006
|
||||||
|
;
|
||||||
|
; [TAB8]
|
||||||
|
|
||||||
|
; ---- jdct.h --------------------------------------------------------------
|
||||||
|
;
|
||||||
|
; configuration check: BITS_IN_JSAMPLE==8 (8-bit sample values) is the only
|
||||||
|
; valid setting on this SIMD extension.
|
||||||
|
;
|
||||||
|
%if BITS_IN_JSAMPLE != 8
|
||||||
|
%error "Sorry, this SIMD code only copes with 8-bit sample values."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; A forward DCT routine is given a pointer to a work area of type DCTELEM[];
|
||||||
|
; the DCT is to be performed in-place in that buffer.
|
||||||
|
; To maximize parallelism, Type DCTELEM is changed to short (originally, int).
|
||||||
|
;
|
||||||
|
%define DCTELEM word ; short
|
||||||
|
%define SIZEOF_DCTELEM SIZEOF_WORD ; sizeof(DCTELEM)
|
||||||
|
|
||||||
|
; To maximize parallelism, Type MULTIPLIER is changed to short.
|
||||||
|
;
|
||||||
|
%define MULTIPLIER word ; short
|
||||||
|
%define SIZEOF_MULTIPLIER SIZEOF_WORD ; sizeof(MULTIPLIER)
|
||||||
|
%define FAST_FLOAT FP32 ; float
|
||||||
|
%define SIZEOF_FAST_FLOAT SIZEOF_FP32 ; sizeof(FAST_FLOAT)
|
||||||
|
|
||||||
|
; Each IDCT routine has its own ideas about the best dct_table element type.
|
||||||
|
;
|
||||||
|
%define ISLOW_MULT_TYPE MULTIPLIER ; must be short
|
||||||
|
%define SIZEOF_ISLOW_MULT_TYPE SIZEOF_MULTIPLIER ; sizeof(ISLOW_MULT_TYPE)
|
||||||
|
%define IFAST_MULT_TYPE MULTIPLIER ; must be short
|
||||||
|
%define SIZEOF_IFAST_MULT_TYPE SIZEOF_MULTIPLIER ; sizeof(IFAST_MULT_TYPE)
|
||||||
|
%define IFAST_SCALE_BITS 2 ; fractional bits in scale factors
|
||||||
|
%define FLOAT_MULT_TYPE FAST_FLOAT ; must be float
|
||||||
|
%define SIZEOF_FLOAT_MULT_TYPE SIZEOF_FAST_FLOAT ; sizeof(FLOAT_MULT_TYPE)
|
||||||
|
|
||||||
|
; Each IDCT routine is responsible for range-limiting its results and
|
||||||
|
; converting them to unsigned form (0..MAXJSAMPLE). The raw outputs could
|
||||||
|
; be quite far out of range if the input data is corrupt, so a bulletproof
|
||||||
|
; range-limiting step is required. We use a mask-and-table-lookup method
|
||||||
|
; to do the combined operations quickly.
|
||||||
|
;
|
||||||
|
%define RANGE_MASK (MAXJSAMPLE * 4 + 3) ; 2 bits wider than legal samples
|
||||||
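The RANGE_MASK comment above describes a mask-and-table-lookup method for clamping IDCT output to 0..MAXJSAMPLE. A simplified C sketch of the idea follows. The table layout here is an assumption chosen for clarity and is not libjpeg's actual range-limit table; build_range_table and range_limit are hypothetical names.

```c
#include <assert.h>

#define MAXJSAMPLE 255
#define RANGE_MASK (MAXJSAMPLE * 4 + 3)   /* 1023: 2 bits wider than a sample */

static unsigned char range_table[RANGE_MASK + 1];

/* Fill the lookup table once:
 *   0..255     in-range values map to themselves
 *   256..511   positive overshoot clamps to MAXJSAMPLE
 *   512..1023  negative values (wrapped by the mask) clamp to 0
 */
void build_range_table(void)
{
  for (int i = 0; i <= RANGE_MASK; i++) {
    if (i <= MAXJSAMPLE)
      range_table[i] = (unsigned char)i;
    else if (i < (RANGE_MASK + 1) / 2)
      range_table[i] = MAXJSAMPLE;
    else
      range_table[i] = 0;
  }
}

/* Clamp x with one mask and one load, with no branches. */
unsigned char range_limit(int x)
{
  assert(range_table[MAXJSAMPLE] == MAXJSAMPLE);  /* table must be built first */
  return range_table[(unsigned)x & RANGE_MASK];
}
```

Values further than about two sample ranges outside the legal interval alias back into the table, so the result is always a valid sample even if it is not the mathematically nearest one; this is the sense in which the lookup is bulletproof against corrupt input.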
|
|
||||||
|
; Short forms of external names for systems with brain-damaged linkers.
|
||||||
|
;
|
||||||
|
%ifdef NEED_SHORT_EXTERNAL_NAMES
|
||||||
|
%define jpeg_fdct_islow jFDislow ; jfdctint.asm
|
||||||
|
%define jpeg_fdct_ifast jFDifast ; jfdctfst.asm
|
||||||
|
%define jpeg_fdct_float jFDfloat ; jfdctflt.asm
|
||||||
|
%define jpeg_fdct_islow_mmx jFDMislow ; jfmmxint.asm
|
||||||
|
%define jpeg_fdct_ifast_mmx jFDMifast ; jfmmxfst.asm
|
||||||
|
%define jpeg_fdct_float_3dnow jFD3float ; jf3dnflt.asm
|
||||||
|
%define jpeg_fdct_islow_sse2 jFDSislow ; jfss2int.asm
|
||||||
|
%define jpeg_fdct_ifast_sse2 jFDSifast ; jfss2fst.asm
|
||||||
|
%define jpeg_fdct_float_sse jFDSfloat ; jfsseflt.asm
|
||||||
|
%define jpeg_convsamp_int jCnvInt ; jcqntint.asm
|
||||||
|
%define jpeg_quantize_int jQntInt ; jcqntint.asm
|
||||||
|
%define jpeg_quantize_idiv jQntIDiv ; jcqntint.asm
|
||||||
|
%define jpeg_convsamp_float jCnvFloat ; jcqntflt.asm
|
||||||
|
%define jpeg_quantize_float jQntFloat ; jcqntflt.asm
|
||||||
|
%define jpeg_convsamp_int_mmx jCnvMmx ; jcqntmmx.asm
|
||||||
|
%define jpeg_quantize_int_mmx jQntMmx ; jcqntmmx.asm
|
||||||
|
%define jpeg_convsamp_flt_3dnow jCnv3dnow ; jcqnt3dn.asm
|
||||||
|
%define jpeg_quantize_flt_3dnow jQnt3dnow ; jcqnt3dn.asm
|
||||||
|
%define jpeg_convsamp_int_sse2 jCnvISse2 ; jcqnts2i.asm
|
||||||
|
%define jpeg_quantize_int_sse2 jQntISse2 ; jcqnts2i.asm
|
||||||
|
%define jpeg_convsamp_flt_sse jCnvSse ; jcqntsse.asm
|
||||||
|
%define jpeg_quantize_flt_sse jQntSse ; jcqntsse.asm
|
||||||
|
%define jpeg_convsamp_flt_sse2 jCnvFSse2 ; jcqnts2f.asm
|
||||||
|
%define jpeg_quantize_flt_sse2 jQntFSse2 ; jcqnts2f.asm
|
||||||
|
%define jpeg_idct_islow jRDislow ; jidctint.asm
|
||||||
|
%define jpeg_idct_ifast jRDifast ; jidctfst.asm
|
||||||
|
%define jpeg_idct_float jRDfloat ; jidctflt.asm
|
||||||
|
%define jpeg_idct_4x4 jRD4x4 ; jidctred.asm
|
||||||
|
%define jpeg_idct_2x2 jRD2x2 ; jidctred.asm
|
||||||
|
%define jpeg_idct_1x1 jRD1x1 ; jidctred.asm
|
||||||
|
%define jpeg_idct_islow_mmx jRDMislow ; jimmxint.asm
|
||||||
|
%define jpeg_idct_ifast_mmx jRDMifast ; jimmxfst.asm
|
||||||
|
%define jpeg_idct_float_3dnow jRD3float ; ji3dnflt.asm
|
||||||
|
%define jpeg_idct_4x4_mmx jRDM4x4 ; jimmxred.asm
|
||||||
|
%define jpeg_idct_2x2_mmx jRDM2x2 ; jimmxred.asm
|
||||||
|
%define jpeg_idct_islow_sse2 jRDSislow ; jiss2int.asm
|
||||||
|
%define jpeg_idct_ifast_sse2 jRDSifast ; jiss2fst.asm
|
||||||
|
%define jpeg_idct_float_sse jRDSfloat ; jisseflt.asm
|
||||||
|
%define jpeg_idct_float_sse2 jRD2float ; jiss2flt.asm
|
||||||
|
%define jpeg_idct_4x4_sse2 jRDS4x4 ; jiss2red.asm
|
||||||
|
%define jpeg_idct_2x2_sse2 jRDS2x2 ; jiss2red.asm
|
||||||
|
%define jconst_fdct_float jFCfloat ; jfdctflt.asm
|
||||||
|
%define jconst_fdct_islow_mmx jFCMislow ; jfmmxint.asm
|
||||||
|
%define jconst_fdct_ifast_mmx jFCMifast ; jfmmxfst.asm
|
||||||
|
%define jconst_fdct_float_3dnow jFC3float ; jf3dnflt.asm
|
||||||
|
%define jconst_fdct_islow_sse2 jFCSislow ; jfss2int.asm
|
||||||
|
%define jconst_fdct_ifast_sse2 jFCSifast ; jfss2fst.asm
|
||||||
|
%define jconst_fdct_float_sse jFCSfloat ; jfsseflt.asm
|
||||||
|
%define jconst_idct_float jRCfloat ; jidctflt.asm
|
||||||
|
%define jconst_idct_islow_mmx jRCMislow ; jimmxint.asm
|
||||||
|
%define jconst_idct_ifast_mmx jRCMifast ; jimmxfst.asm
|
||||||
|
%define jconst_idct_float_3dnow jRC3float ; ji3dnflt.asm
|
||||||
|
%define jconst_idct_red_mmx jRCMred ; jimmxred.asm
|
||||||
|
%define jconst_idct_islow_sse2 jRCSislow ; jiss2int.asm
|
||||||
|
%define jconst_idct_ifast_sse2 jRCSifast ; jiss2fst.asm
|
||||||
|
%define jconst_idct_float_sse jRCSfloat ; jisseflt.asm
|
||||||
|
%define jconst_idct_float_sse2 jRC2float ; jiss2flt.asm
|
||||||
|
%define jconst_idct_red_sse2 jRCSred ; jiss2red.asm
|
||||||
|
%endif ; NEED_SHORT_EXTERNAL_NAMES
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
|
||||||
|
%define ROW(n,b,s) ((b)+(n)*(s))
|
||||||
|
%define COL(n,b,s) ((b)+(n)*(s)*DCTSIZE)
|
||||||
|
|
||||||
|
%define DWBLOCK(m,n,b,s) ((b)+(m)*DCTSIZE*(s)+(n)*SIZEOF_DWORD)
|
||||||
|
%define MMBLOCK(m,n,b,s) ((b)+(m)*DCTSIZE*(s)+(n)*SIZEOF_MMWORD)
|
||||||
|
%define XMMBLOCK(m,n,b,s) ((b)+(m)*DCTSIZE*(s)+(n)*SIZEOF_XMMWORD)
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
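The ROW/COL and *BLOCK macros at the end of jdct.inc compute byte offsets into an 8x8 block from a base b and an element size s. The C sketch below restates them so the arithmetic can be checked. The SIZEOF_* values are assumptions (they are defined elsewhere, presumably in jsimdext.inc, and are not shown in this diff), and the two helper functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DCTSIZE         8
#define SIZEOF_WORD     2    /* assumed: 16-bit DCTELEM */
#define SIZEOF_MMWORD   8    /* assumed: one 64-bit MMX register */
#define SIZEOF_XMMWORD  16   /* assumed: one 128-bit SSE register */

/* Same shape as the NASM macros in jdct.inc. */
#define ROW(n,b,s)         ((b)+(n)*(s))
#define COL(n,b,s)         ((b)+(n)*(s)*DCTSIZE)
#define MMBLOCK(m,n,b,s)   ((b)+(m)*DCTSIZE*(s)+(n)*SIZEOF_MMWORD)
#define XMMBLOCK(m,n,b,s)  ((b)+(m)*DCTSIZE*(s)+(n)*SIZEOF_XMMWORD)

/* Byte offset of the n-th 8-byte chunk within the m-th group of 8 elements. */
size_t mmblock_offset(size_t m, size_t n, size_t elem_size)
{
  assert(elem_size > 0);
  return MMBLOCK(m, n, (size_t)0, elem_size);
}

/* Byte offset of the n-th 16-byte chunk within the m-th group of 8 elements. */
size_t xmmblock_offset(size_t m, size_t n, size_t elem_size)
{
  assert(elem_size > 0);
  return XMMBLOCK(m, n, (size_t)0, elem_size);
}
```

With 2-byte elements, one group of 8 elements is exactly 16 bytes, so XMMBLOCK(m,0,b,SIZEOF_WORD) steps through the block one full register at a time, while MMBLOCK needs n = 0 and n = 1 to cover the same group.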
152
jddctmgr.c
@@ -5,6 +5,13 @@
|
|||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
* x86 SIMD extension for IJG JPEG library
|
||||||
|
* Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
* This file has been modified for SIMD extension.
|
||||||
|
* Last Modified : December 24, 2005
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
*
|
||||||
* This file contains the inverse-DCT management logic.
|
* This file contains the inverse-DCT management logic.
|
||||||
* This code selects a particular IDCT implementation to be used,
|
* This code selects a particular IDCT implementation to be used,
|
||||||
* and it performs related housekeeping chores. No code in this file
|
* and it performs related housekeeping chores. No code in this file
|
||||||
@@ -94,6 +101,7 @@ start_pass (j_decompress_ptr cinfo)
|
|||||||
int method = 0;
|
int method = 0;
|
||||||
inverse_DCT_method_ptr method_ptr = NULL;
|
inverse_DCT_method_ptr method_ptr = NULL;
|
||||||
JQUANT_TBL * qtbl;
|
JQUANT_TBL * qtbl;
|
||||||
|
unsigned int simd = jpeg_simd_support((j_common_ptr) cinfo);
|
||||||
|
|
||||||
for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;
|
for (ci = 0, compptr = cinfo->comp_info; ci < cinfo->num_components;
|
||||||
ci++, compptr++) {
|
ci++, compptr++) {
|
||||||
@@ -105,34 +113,95 @@ start_pass (j_decompress_ptr cinfo)
|
|||||||
method = JDCT_ISLOW; /* jidctred uses islow-style table */
|
method = JDCT_ISLOW; /* jidctred uses islow-style table */
|
||||||
break;
|
break;
|
||||||
case 2:
|
case 2:
|
||||||
|
#ifdef JIDCT_INT_SSE2_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE2 &&
|
||||||
|
IS_CONST_ALIGNED_16(jconst_idct_red_sse2))
|
||||||
|
method_ptr = jpeg_idct_2x2_sse2;
|
||||||
|
else
|
||||||
|
#endif
|
||||||
|
#ifdef JIDCT_INT_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_MMX)
|
||||||
|
method_ptr = jpeg_idct_2x2_mmx;
|
||||||
|
else
|
||||||
|
#endif
|
||||||
method_ptr = jpeg_idct_2x2;
|
method_ptr = jpeg_idct_2x2;
|
||||||
method = JDCT_ISLOW; /* jidctred uses islow-style table */
|
method = JDCT_ISLOW; /* jidctred uses islow-style table */
|
||||||
break;
|
break;
|
||||||
case 4:
|
case 4:
|
||||||
|
#ifdef JIDCT_INT_SSE2_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE2 &&
|
||||||
|
IS_CONST_ALIGNED_16(jconst_idct_red_sse2))
|
||||||
|
method_ptr = jpeg_idct_4x4_sse2;
|
||||||
|
else
|
||||||
|
#endif
|
||||||
|
#ifdef JIDCT_INT_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_MMX)
|
||||||
|
method_ptr = jpeg_idct_4x4_mmx;
|
||||||
|
else
|
||||||
|
#endif
|
||||||
method_ptr = jpeg_idct_4x4;
|
method_ptr = jpeg_idct_4x4;
|
||||||
method = JDCT_ISLOW; /* jidctred uses islow-style table */
|
method = JDCT_ISLOW; /* jidctred uses islow-style table */
|
||||||
break;
|
break;
|
||||||
#endif
|
#endif /* IDCT_SCALING_SUPPORTED */
|
||||||
case DCTSIZE:
|
case DCTSIZE:
|
||||||
switch (cinfo->dct_method) {
|
switch (cinfo->dct_method) {
|
||||||
#ifdef DCT_ISLOW_SUPPORTED
|
#ifdef DCT_ISLOW_SUPPORTED
|
||||||
case JDCT_ISLOW:
|
case JDCT_ISLOW:
|
||||||
|
#ifdef JIDCT_INT_SSE2_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE2 &&
|
||||||
|
IS_CONST_ALIGNED_16(jconst_idct_islow_sse2))
|
||||||
|
method_ptr = jpeg_idct_islow_sse2;
|
||||||
|
else
|
||||||
|
#endif
|
||||||
|
#ifdef JIDCT_INT_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_MMX)
|
||||||
|
method_ptr = jpeg_idct_islow_mmx;
|
||||||
|
else
|
||||||
|
#endif
|
||||||
method_ptr = jpeg_idct_islow;
|
method_ptr = jpeg_idct_islow;
|
||||||
method = JDCT_ISLOW;
|
method = JDCT_ISLOW;
|
||||||
break;
|
break;
|
||||||
#endif
|
#endif /* DCT_ISLOW_SUPPORTED */
|
||||||
#ifdef DCT_IFAST_SUPPORTED
|
#ifdef DCT_IFAST_SUPPORTED
|
||||||
case JDCT_IFAST:
|
case JDCT_IFAST:
|
||||||
|
#ifdef JIDCT_INT_SSE2_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE2 &&
|
||||||
|
IS_CONST_ALIGNED_16(jconst_idct_ifast_sse2))
|
||||||
|
method_ptr = jpeg_idct_ifast_sse2;
|
||||||
|
else
|
||||||
|
#endif
|
||||||
|
#ifdef JIDCT_INT_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_MMX)
|
||||||
|
method_ptr = jpeg_idct_ifast_mmx;
|
||||||
|
else
|
||||||
|
#endif
|
||||||
method_ptr = jpeg_idct_ifast;
|
method_ptr = jpeg_idct_ifast;
|
||||||
method = JDCT_IFAST;
|
method = JDCT_IFAST;
|
||||||
break;
|
break;
|
||||||
#endif
|
#endif /* DCT_IFAST_SUPPORTED */
|
||||||
#ifdef DCT_FLOAT_SUPPORTED
|
#ifdef DCT_FLOAT_SUPPORTED
|
||||||
case JDCT_FLOAT:
|
case JDCT_FLOAT:
|
||||||
|
#ifdef JIDCT_FLT_SSE_SSE2_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE && simd & JSIMD_SSE2 &&
|
||||||
|
IS_CONST_ALIGNED_16(jconst_idct_float_sse2))
|
||||||
|
method_ptr = jpeg_idct_float_sse2;
|
||||||
|
else
|
||||||
|
#endif
|
||||||
|
#ifdef JIDCT_FLT_SSE_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE &&
|
||||||
|
IS_CONST_ALIGNED_16(jconst_idct_float_sse))
|
||||||
|
method_ptr = jpeg_idct_float_sse;
|
||||||
|
else
|
||||||
|
#endif
|
||||||
|
#ifdef JIDCT_FLT_3DNOW_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_3DNOW)
|
||||||
|
method_ptr = jpeg_idct_float_3dnow;
|
||||||
|
else
|
||||||
|
#endif
|
||||||
method_ptr = jpeg_idct_float;
|
method_ptr = jpeg_idct_float;
|
||||||
method = JDCT_FLOAT;
|
method = JDCT_FLOAT;
|
||||||
break;
|
break;
|
||||||
#endif
|
#endif /* DCT_FLOAT_SUPPORTED */
|
||||||
default:
|
default:
|
||||||
ERREXIT(cinfo, JERR_NOT_COMPILED);
|
ERREXIT(cinfo, JERR_NOT_COMPILED);
|
||||||
break;
|
break;
|
||||||
@@ -267,3 +336,78 @@ jinit_inverse_dct (j_decompress_ptr cinfo)
|
|||||||
idct->cur_method[ci] = -1;
|
idct->cur_method[ci] = -1;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
#ifndef JSIMD_MODEINFO_NOT_SUPPORTED
|
||||||
|
|
||||||
|
GLOBAL(unsigned int)
|
||||||
|
jpeg_simd_inverse_dct (j_decompress_ptr cinfo, int method)
|
||||||
|
{
|
||||||
|
unsigned int simd = jpeg_simd_support((j_common_ptr) cinfo);
|
||||||
|
|
||||||
|
switch (method) {
|
||||||
|
#ifdef DCT_ISLOW_SUPPORTED
|
||||||
|
case JDCT_ISLOW:
|
||||||
|
#ifdef JIDCT_INT_SSE2_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE2 &&
|
||||||
|
IS_CONST_ALIGNED_16(jconst_idct_islow_sse2))
|
||||||
|
return JSIMD_SSE2;
|
||||||
|
#endif
|
||||||
|
#ifdef JIDCT_INT_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_MMX)
|
||||||
|
return JSIMD_MMX;
|
||||||
|
#endif
|
||||||
|
return JSIMD_NONE;
|
||||||
|
#endif /* DCT_ISLOW_SUPPORTED */
|
||||||
|
#ifdef DCT_IFAST_SUPPORTED
|
||||||
|
case JDCT_IFAST:
|
||||||
|
#ifdef JIDCT_INT_SSE2_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE2 &&
|
||||||
|
IS_CONST_ALIGNED_16(jconst_idct_ifast_sse2))
|
||||||
|
return JSIMD_SSE2;
|
||||||
|
#endif
|
||||||
|
#ifdef JIDCT_INT_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_MMX)
|
||||||
|
return JSIMD_MMX;
|
||||||
|
#endif
|
||||||
|
return JSIMD_NONE;
|
||||||
|
#endif /* DCT_IFAST_SUPPORTED */
|
||||||
|
#ifdef DCT_FLOAT_SUPPORTED
|
||||||
|
case JDCT_FLOAT:
|
||||||
|
#ifdef JIDCT_FLT_SSE_SSE2_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE && simd & JSIMD_SSE2 &&
|
||||||
|
IS_CONST_ALIGNED_16(jconst_idct_float_sse2))
|
||||||
|
return JSIMD_SSE; /* (JSIMD_SSE | JSIMD_SSE2); */
|
||||||
|
#endif
|
||||||
|
#ifdef JIDCT_FLT_SSE_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE &&
|
||||||
|
IS_CONST_ALIGNED_16(jconst_idct_float_sse))
|
||||||
|
return JSIMD_SSE; /* (JSIMD_SSE | JSIMD_MMX); */
|
||||||
|
#endif
|
||||||
|
#ifdef JIDCT_FLT_3DNOW_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_3DNOW)
|
||||||
|
return JSIMD_3DNOW; /* (JSIMD_3DNOW | JSIMD_MMX); */
|
||||||
|
#endif
|
||||||
|
return JSIMD_NONE;
|
||||||
|
#endif /* DCT_FLOAT_SUPPORTED */
|
||||||
|
#ifdef IDCT_SCALING_SUPPORTED
|
||||||
|
case JDCT_FLOAT + 1:
|
||||||
|
#ifdef JIDCT_INT_SSE2_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE2 &&
|
||||||
|
IS_CONST_ALIGNED_16(jconst_idct_red_sse2))
|
||||||
|
return JSIMD_SSE2;
|
||||||
|
#endif
|
||||||
|
#ifdef JIDCT_INT_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_MMX)
|
||||||
|
return JSIMD_MMX;
|
||||||
|
#endif
|
||||||
|
return JSIMD_NONE;
|
||||||
|
#endif /* IDCT_SCALING_SUPPORTED */
|
||||||
|
default:
|
||||||
|
;
|
||||||
|
}
|
||||||
|
|
||||||
|
return JSIMD_NONE; /* not compiled */
|
||||||
|
}
|
||||||
|
|
||||||
|
#endif /* !JSIMD_MODEINFO_NOT_SUPPORTED */
|
||||||
|
|||||||
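The jddctmgr.c changes above pick an IDCT routine by testing a CPU-capability bitmask, and for the SSE2 paths also check that the constant table is 16-byte aligned before falling back to MMX and then to the plain C routine. A minimal C sketch of that fallback chain is given below. All names in it (the SIMD_* flags, is_aligned_16, select_idct, the stub routines, demo_consts) are hypothetical stand-ins; the real IS_CONST_ALIGNED_16 macro and JSIMD_* flags are defined elsewhere and are not shown in this diff.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical capability bits, standing in for JSIMD_MMX / JSIMD_SSE2. */
enum { SIMD_NONE = 0, SIMD_MMX = 1, SIMD_SSE2 = 2 };

typedef int (*idct_fn)(void);

/* Stub routines; each returns a distinct tag so callers can tell them apart. */
static int idct_scalar(void) { return 0; }
static int idct_mmx(void)    { return 1; }
static int idct_sse2(void)   { return 2; }

/* A 16-byte aligned constant table, as the SSE2 routines would require. */
static _Alignas(16) const int demo_consts[4] = { 0, 0, 0, 0 };

/* Nonzero if p sits on a 16-byte boundary. */
static int is_aligned_16(const void *p)
{
  return ((uintptr_t)p & 15u) == 0;
}

/* Prefer SSE2 when supported and its constants are aligned, then MMX,
 * then the portable routine. */
idct_fn select_idct(unsigned int simd, const void *sse2_consts)
{
  assert(sse2_consts != NULL);
  if ((simd & SIMD_SSE2) && is_aligned_16(sse2_consts))
    return idct_sse2;
  if (simd & SIMD_MMX)
    return idct_mmx;
  return idct_scalar;
}
```

The alignment test matters because aligned SSE2 loads fault on misaligned addresses; checking at selection time lets a build whose linker misplaced the table degrade to MMX instead of crashing.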
475
jdhuff.c
@@ -1,10 +1,17 @@
|
|||||||
/*
|
/*
|
||||||
* jdhuff.c
|
* jdhuff.c
|
||||||
*
|
*
|
||||||
* Copyright (C) 1991-1996, Thomas G. Lane.
|
* Copyright (C) 1991-1997, Thomas G. Lane.
|
||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
* x86 SIMD extension for IJG JPEG library
|
||||||
|
* Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
* This file has been modified to improve performance.
|
||||||
|
* Last Modified : October 31, 2004
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
*
|
||||||
* This file contains Huffman entropy decoding routines.
|
* This file contains Huffman entropy decoding routines.
|
||||||
*
|
*
|
||||||
* Much of the complexity here has to do with supporting input suspension.
|
* Much of the complexity here has to do with supporting input suspension.
|
||||||
@@ -64,6 +71,15 @@ typedef struct {
|
|||||||
/* Pointers to derived tables (these workspaces have image lifespan) */
|
/* Pointers to derived tables (these workspaces have image lifespan) */
|
||||||
d_derived_tbl * dc_derived_tbls[NUM_HUFF_TBLS];
|
d_derived_tbl * dc_derived_tbls[NUM_HUFF_TBLS];
|
||||||
d_derived_tbl * ac_derived_tbls[NUM_HUFF_TBLS];
|
d_derived_tbl * ac_derived_tbls[NUM_HUFF_TBLS];
|
||||||
|
|
||||||
|
/* Precalculated info set up by start_pass for use in decode_mcu: */
|
||||||
|
|
||||||
|
/* Pointers to derived tables to be used for each block within an MCU */
|
||||||
|
d_derived_tbl * dc_cur_tbls[D_MAX_BLOCKS_IN_MCU];
|
||||||
|
d_derived_tbl * ac_cur_tbls[D_MAX_BLOCKS_IN_MCU];
|
||||||
|
/* Whether we care about the DC and AC coefficient values for each block */
|
||||||
|
boolean dc_needed[D_MAX_BLOCKS_IN_MCU];
|
||||||
|
boolean ac_needed[D_MAX_BLOCKS_IN_MCU];
|
||||||
} huff_entropy_decoder;
|
} huff_entropy_decoder;
|
||||||
|
|
||||||
typedef huff_entropy_decoder * huff_entropy_ptr;
|
typedef huff_entropy_decoder * huff_entropy_ptr;
|
||||||
@@ -77,7 +93,7 @@ METHODDEF(void)
|
|||||||
start_pass_huff_decoder (j_decompress_ptr cinfo)
|
start_pass_huff_decoder (j_decompress_ptr cinfo)
|
||||||
{
|
{
|
||||||
huff_entropy_ptr entropy = (huff_entropy_ptr) cinfo->entropy;
|
huff_entropy_ptr entropy = (huff_entropy_ptr) cinfo->entropy;
|
||||||
int ci, dctbl, actbl;
|
int ci, blkn, dctbl, actbl;
|
||||||
jpeg_component_info * compptr;
|
jpeg_component_info * compptr;
|
||||||
|
|
||||||
/* Check that the scan parameters Ss, Se, Ah/Al are OK for sequential JPEG.
|
/* Check that the scan parameters Ss, Se, Ah/Al are OK for sequential JPEG.
|
||||||
@@ -92,27 +108,37 @@ start_pass_huff_decoder (j_decompress_ptr cinfo)
|
|||||||
compptr = cinfo->cur_comp_info[ci];
|
compptr = cinfo->cur_comp_info[ci];
|
||||||
dctbl = compptr->dc_tbl_no;
|
dctbl = compptr->dc_tbl_no;
|
||||||
actbl = compptr->ac_tbl_no;
|
actbl = compptr->ac_tbl_no;
|
||||||
/* Make sure requested tables are present */
|
|
||||||
if (dctbl < 0 || dctbl >= NUM_HUFF_TBLS ||
|
|
||||||
cinfo->dc_huff_tbl_ptrs[dctbl] == NULL)
|
|
||||||
ERREXIT1(cinfo, JERR_NO_HUFF_TABLE, dctbl);
|
|
||||||
if (actbl < 0 || actbl >= NUM_HUFF_TBLS ||
|
|
||||||
cinfo->ac_huff_tbl_ptrs[actbl] == NULL)
|
|
||||||
ERREXIT1(cinfo, JERR_NO_HUFF_TABLE, actbl);
|
|
||||||
/* Compute derived values for Huffman tables */
|
/* Compute derived values for Huffman tables */
|
||||||
/* We may do this more than once for a table, but it's not expensive */
|
/* We may do this more than once for a table, but it's not expensive */
|
||||||
jpeg_make_d_derived_tbl(cinfo, cinfo->dc_huff_tbl_ptrs[dctbl],
|
jpeg_make_d_derived_tbl(cinfo, TRUE, dctbl,
|
||||||
& entropy->dc_derived_tbls[dctbl]);
|
& entropy->dc_derived_tbls[dctbl]);
|
||||||
jpeg_make_d_derived_tbl(cinfo, cinfo->ac_huff_tbl_ptrs[actbl],
|
jpeg_make_d_derived_tbl(cinfo, FALSE, actbl,
|
||||||
& entropy->ac_derived_tbls[actbl]);
|
& entropy->ac_derived_tbls[actbl]);
|
||||||
/* Initialize DC predictions to 0 */
|
/* Initialize DC predictions to 0 */
|
||||||
entropy->saved.last_dc_val[ci] = 0;
|
entropy->saved.last_dc_val[ci] = 0;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* Precalculate decoding info for each block in an MCU of this scan */
|
||||||
|
for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) {
|
||||||
|
ci = cinfo->MCU_membership[blkn];
|
||||||
|
compptr = cinfo->cur_comp_info[ci];
|
||||||
|
/* Precalculate which table to use for each block */
|
||||||
|
entropy->dc_cur_tbls[blkn] = entropy->dc_derived_tbls[compptr->dc_tbl_no];
|
||||||
|
entropy->ac_cur_tbls[blkn] = entropy->ac_derived_tbls[compptr->ac_tbl_no];
|
||||||
|
/* Decide whether we really care about the coefficient values */
|
||||||
|
if (compptr->component_needed) {
|
||||||
|
entropy->dc_needed[blkn] = TRUE;
|
||||||
|
/* we don't need the ACs if producing a 1/8th-size image */
|
||||||
|
entropy->ac_needed[blkn] = (compptr->DCT_scaled_size > 1);
|
||||||
|
} else {
|
||||||
|
entropy->dc_needed[blkn] = entropy->ac_needed[blkn] = FALSE;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
/* Initialize bitread state variables */
|
/* Initialize bitread state variables */
|
||||||
entropy->bitstate.bits_left = 0;
|
entropy->bitstate.bits_left = 0;
|
||||||
entropy->bitstate.get_buffer = 0; /* unnecessary, but keeps Purify quiet */
|
entropy->bitstate.get_buffer = 0; /* unnecessary, but keeps Purify quiet */
|
||||||
entropy->bitstate.printed_eod = FALSE;
|
entropy->pub.insufficient_data = FALSE;
|
||||||
|
|
||||||
/* Initialize restart counter */
|
/* Initialize restart counter */
|
||||||
entropy->restarts_to_go = cinfo->restart_interval;
|
entropy->restarts_to_go = cinfo->restart_interval;
|
||||||
@@ -121,20 +147,35 @@ start_pass_huff_decoder (j_decompress_ptr cinfo)
|
|||||||
|
|
||||||
/*
|
/*
|
||||||
* Compute the derived values for a Huffman table.
|
* Compute the derived values for a Huffman table.
|
||||||
|
* This routine also performs some validation checks on the table.
|
||||||
|
*
|
||||||
* Note this is also used by jdphuff.c.
|
* Note this is also used by jdphuff.c.
|
||||||
*/
|
*/
|
||||||
|
|
||||||
GLOBAL(void)
|
GLOBAL(void)
|
||||||
jpeg_make_d_derived_tbl (j_decompress_ptr cinfo, JHUFF_TBL * htbl,
|
jpeg_make_d_derived_tbl (j_decompress_ptr cinfo, boolean isDC, int tblno,
|
||||||
d_derived_tbl ** pdtbl)
|
d_derived_tbl ** pdtbl)
|
||||||
{
|
{
|
||||||
|
JHUFF_TBL *htbl;
|
||||||
d_derived_tbl *dtbl;
|
d_derived_tbl *dtbl;
|
||||||
int p, i, l, si;
|
int p, i, l, la, lx, si, numsymbols;
|
||||||
int lookbits, ctr;
|
int lookbits, look_end, sym, val, ctr;
|
||||||
char huffsize[257];
|
char huffsize[257];
|
||||||
unsigned int huffcode[257];
|
unsigned int huffcode[257];
|
||||||
unsigned int code;
|
unsigned int code;
|
||||||
|
|
||||||
|
/* Note that huffsize[] and huffcode[] are filled in code-length order,
|
||||||
|
* paralleling the order of the symbols themselves in htbl->huffval[].
|
||||||
|
*/
|
||||||
|
|
||||||
|
/* Find the input Huffman table */
|
||||||
|
if (tblno < 0 || tblno >= NUM_HUFF_TBLS)
|
||||||
|
ERREXIT1(cinfo, JERR_NO_HUFF_TABLE, tblno);
|
||||||
|
htbl =
|
||||||
|
isDC ? cinfo->dc_huff_tbl_ptrs[tblno] : cinfo->ac_huff_tbl_ptrs[tblno];
|
||||||
|
if (htbl == NULL)
|
||||||
|
ERREXIT1(cinfo, JERR_NO_HUFF_TABLE, tblno);
|
||||||
|
|
||||||
/* Allocate a workspace if we haven't already done so. */
|
/* Allocate a workspace if we haven't already done so. */
|
||||||
if (*pdtbl == NULL)
|
if (*pdtbl == NULL)
|
||||||
*pdtbl = (d_derived_tbl *)
|
*pdtbl = (d_derived_tbl *)
|
||||||
@@ -144,17 +185,20 @@ jpeg_make_d_derived_tbl (j_decompress_ptr cinfo, JHUFF_TBL * htbl,
|
|||||||
dtbl->pub = htbl; /* fill in back link */
|
dtbl->pub = htbl; /* fill in back link */
|
||||||
|
|
||||||
/* Figure C.1: make table of Huffman code length for each symbol */
|
/* Figure C.1: make table of Huffman code length for each symbol */
|
||||||
/* Note that this is in code-length order. */
|
|
||||||
|
|
||||||
p = 0;
|
p = 0;
|
||||||
for (l = 1; l <= 16; l++) {
|
for (l = 1; l <= 16; l++) {
|
||||||
for (i = 1; i <= (int) htbl->bits[l]; i++)
|
i = (int) htbl->bits[l];
|
||||||
|
if (i < 0 || p + i > 256) /* protect against table overrun */
|
||||||
|
ERREXIT(cinfo, JERR_BAD_HUFF_TABLE);
|
||||||
|
while (i--)
|
||||||
huffsize[p++] = (char) l;
|
huffsize[p++] = (char) l;
|
||||||
}
|
}
|
||||||
huffsize[p] = 0;
|
huffsize[p] = 0;
|
||||||
|
numsymbols = p;
|
||||||
|
|
||||||
/* Figure C.2: generate the codes themselves */
|
/* Figure C.2: generate the codes themselves */
|
||||||
/* Note that this is in code-length order. */
|
/* We also validate that the counts represent a legal Huffman code tree. */
|
||||||
|
|
||||||
code = 0;
|
code = 0;
|
||||||
si = huffsize[0];
|
si = huffsize[0];
|
||||||
@@ -164,6 +208,11 @@ jpeg_make_d_derived_tbl (j_decompress_ptr cinfo, JHUFF_TBL * htbl,
|
|||||||
huffcode[p++] = code;
|
huffcode[p++] = code;
|
||||||
code++;
|
code++;
|
||||||
}
|
}
|
||||||
|
/* code is now 1 more than the last code used for codelength si; but
|
||||||
|
* it must still fit in si bits, since no code is allowed to be all ones.
|
||||||
|
*/
|
||||||
|
if (((INT32) code) >= (((INT32) 1) << si))
|
||||||
|
ERREXIT(cinfo, JERR_BAD_HUFF_TABLE);
|
||||||
code <<= 1;
|
code <<= 1;
|
||||||
si++;
|
si++;
|
||||||
}
|
}
|
||||||
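Review note: the check added in this hunk rejects count tables that cannot form a prefix code. A minimal standalone sketch of the same idea follows. It assumes a 17-entry `bits[]` array where `bits[l]` is the number of codes of length `l`, as in `JHUFF_TBL`. The function name `huff_counts_valid` is hypothetical and is not part of the library.

```c
/* Hypothetical helper, not part of libjpeg: returns 1 if bits[1..16]
 * describes a legal JPEG Huffman code, 0 otherwise.
 * bits[0] is unused, matching the JHUFF_TBL layout.
 */
static int huff_counts_valid(const unsigned char bits[17])
{
  int l;
  int total = 0;   /* running number of symbols */
  long code = 0;   /* next unassigned code at the current length */

  for (l = 1; l <= 16; l++) {
    total += bits[l];
    if (total > 256)          /* would overrun huffsize[]/huffcode[] */
      return 0;
    code += bits[l];          /* codes of length l are consecutive */
    /* code is now one more than the last code of length l; it must still
     * fit in l bits, because no code may consist entirely of one bits.
     */
    if (code >= (1L << l))
      return 0;
    code <<= 1;               /* move on to the next code length */
  }
  return 1;
}
```

Unlike the loop in the diff, this sketch also visits lengths with no codes. That does not change the result, since shifting a value that fits in `l` bits always yields one that fits in `l + 1` bits.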
@@ -173,8 +222,10 @@ jpeg_make_d_derived_tbl (j_decompress_ptr cinfo, JHUFF_TBL * htbl,
|
|||||||
p = 0;
|
p = 0;
|
||||||
for (l = 1; l <= 16; l++) {
|
for (l = 1; l <= 16; l++) {
|
||||||
if (htbl->bits[l]) {
|
if (htbl->bits[l]) {
|
||||||
dtbl->valptr[l] = p; /* huffval[] index of 1st symbol of code length l */
|
/* valoffset[l] = huffval[] index of 1st symbol of code length l,
|
||||||
dtbl->mincode[l] = huffcode[p]; /* minimum code of length l */
|
* minus the minimum code of length l
|
||||||
|
*/
|
||||||
|
dtbl->valoffset[l] = (INT32) p - (INT32) huffcode[p];
|
||||||
p += htbl->bits[l];
|
p += htbl->bits[l];
|
||||||
dtbl->maxcode[l] = huffcode[p-1]; /* maximum code of length l */
|
dtbl->maxcode[l] = huffcode[p-1]; /* maximum code of length l */
|
||||||
} else {
|
} else {
|
||||||
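Review note: this hunk folds `valptr[l]` and `mincode[l]` into a single `valoffset[l]`, so the symbol lookup needs one addition instead of a subtraction plus an addition. The sketch below checks that the two forms agree for a single code length. The struct `len_entry` and the helper names are hypothetical simplifications of `d_derived_tbl`, used only for illustration.

```c
/* Hypothetical, simplified stand-in for one code length of d_derived_tbl. */
typedef struct {
  int valptr;     /* old field: huffval[] index of 1st symbol of this length */
  int mincode;    /* old field: smallest code of this length */
  int valoffset;  /* new field: valptr - mincode */
} len_entry;

/* Build an entry carrying both the old and the new representation. */
static len_entry make_entry(int valptr, int mincode)
{
  len_entry e;
  e.valptr = valptr;
  e.mincode = mincode;
  e.valoffset = valptr - mincode;
  return e;
}

/* Old lookup: huffval index = valptr + (code - mincode). */
static int lookup_old(const len_entry *e, int code)
{
  return e->valptr + (code - e->mincode);
}

/* New lookup: huffval index = code + valoffset. */
static int lookup_new(const len_entry *e, int code)
{
  return code + e->valoffset;
}
```

Both lookups return the same index for any code, because `code + (valptr - mincode)` is just a regrouping of the old expression.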
@@ -190,21 +241,51 @@ jpeg_make_d_derived_tbl (j_decompress_ptr cinfo, JHUFF_TBL * htbl,
|
|||||||
* with that code.
|
* with that code.
|
||||||
*/
|
*/
|
||||||
|
|
||||||
MEMZERO(dtbl->look_nbits, SIZEOF(dtbl->look_nbits));
|
MEMZERO(dtbl->lookx_nbits, SIZEOF(dtbl->lookx_nbits));
|
||||||
|
|
||||||
p = 0;
|
p = 0;
|
||||||
for (l = 1; l <= HUFF_LOOKAHEAD; l++) {
|
for (l = 1; l <= HUFFX_LOOKAHEAD-1; l++) {
|
||||||
for (i = 1; i <= (int) htbl->bits[l]; i++, p++) {
|
for (i = 1; i <= (int) htbl->bits[l]; i++, p++) {
|
||||||
/* l = current code's length, p = its index in huffcode[] & huffval[]. */
|
/* l = current code's length, p = its index in huffcode[] & huffval[]. */
|
||||||
/* Generate left-justified code followed by all possible bit sequences */
|
/* Generate left-justified code followed by all possible bit sequences */
|
||||||
lookbits = huffcode[p] << (HUFF_LOOKAHEAD-l);
|
sym = htbl->huffval[p]; /* current symbol */
|
||||||
for (ctr = 1 << (HUFF_LOOKAHEAD-l); ctr > 0; ctr--) {
|
la = sym & 15; /* length of additional bits field */
|
||||||
dtbl->look_nbits[lookbits] = l;
|
lx = HUFFX_LOOKAHEAD - l;
|
||||||
dtbl->look_sym[lookbits] = htbl->huffval[p];
|
lookbits = huffcode[p] << lx;
|
||||||
|
look_end = lookbits + (1 << lx);
|
||||||
|
lx -= la;
|
||||||
|
while (lookbits < look_end) {
|
||||||
|
if (lx >= 0) {
|
||||||
|
val = (lookbits >> lx) & ((1 << la) - 1);
|
||||||
|
ctr = 1 << lx;
|
||||||
|
} else {
|
||||||
|
val = (lookbits << -lx) & ((1 << la) - 1);
|
||||||
|
ctr = 1;
|
||||||
|
}
|
||||||
|
val = HUFF_EXTEND(val, la);
|
||||||
|
for (; ctr > 0; ctr--) {
|
||||||
|
dtbl->lookx_nbits[lookbits] = l + la;
|
||||||
|
dtbl->lookx_val[lookbits] = val;
|
||||||
|
dtbl->lookx_sym[lookbits] = sym;
|
||||||
lookbits++;
|
lookbits++;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Validate symbols as being reasonable.
|
||||||
|
* For AC tables, we make no check, but accept all byte values 0..255.
|
||||||
|
* For DC tables, we require the symbols to be in range 0..15.
|
||||||
|
* (Tighter bounds could be applied depending on the data depth and mode,
|
||||||
|
* but this is sufficient to ensure safe decoding.)
|
||||||
|
*/
|
||||||
|
if (isDC) {
|
||||||
|
for (i = 0; i < numsymbols; i++) {
|
||||||
|
int sym = htbl->huffval[i];
|
||||||
|
if (sym < 0 || sym > 15)
|
||||||
|
ERREXIT(cinfo, JERR_BAD_HUFF_TABLE);
|
||||||
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
@@ -213,23 +294,8 @@ jpeg_make_d_derived_tbl (j_decompress_ptr cinfo, JHUFF_TBL * htbl,
|
|||||||
* See jdhuff.h for info about usage.
|
* See jdhuff.h for info about usage.
|
||||||
* Note: current values of get_buffer and bits_left are passed as parameters,
|
* Note: current values of get_buffer and bits_left are passed as parameters,
|
||||||
* but are returned in the corresponding fields of the state struct.
|
* but are returned in the corresponding fields of the state struct.
|
||||||
*
|
|
||||||
* On most machines MIN_GET_BITS should be 25 to allow the full 32-bit width
|
|
||||||
* of get_buffer to be used. (On machines with wider words, an even larger
|
|
||||||
* buffer could be used.) However, on some machines 32-bit shifts are
|
|
||||||
* quite slow and take time proportional to the number of places shifted.
|
|
||||||
* (This is true with most PC compilers, for instance.) In this case it may
|
|
||||||
* be a win to set MIN_GET_BITS to the minimum value of 15. This reduces the
|
|
||||||
* average shift distance at the cost of more calls to jpeg_fill_bit_buffer.
|
|
||||||
*/
|
*/
|
||||||
|
|
||||||
#ifdef SLOW_SHIFT_32
|
|
||||||
#define MIN_GET_BITS 15 /* minimum allowable value */
|
|
||||||
#else
|
|
||||||
#define MIN_GET_BITS (BIT_BUF_SIZE-7)
|
|
||||||
#endif
|
|
||||||
|
|
||||||
|
|
||||||
GLOBAL(boolean)
|
GLOBAL(boolean)
|
||||||
jpeg_fill_bit_buffer (bitread_working_state * state,
|
jpeg_fill_bit_buffer (bitread_working_state * state,
|
||||||
register bit_buf_type get_buffer, register int bits_left,
|
register bit_buf_type get_buffer, register int bits_left,
|
||||||
@@ -239,33 +305,39 @@ jpeg_fill_bit_buffer (bitread_working_state * state,
|
|||||||
/* Copy heavily used state fields into locals (hopefully registers) */
|
/* Copy heavily used state fields into locals (hopefully registers) */
|
||||||
register const JOCTET * next_input_byte = state->next_input_byte;
|
register const JOCTET * next_input_byte = state->next_input_byte;
|
||||||
register size_t bytes_in_buffer = state->bytes_in_buffer;
|
register size_t bytes_in_buffer = state->bytes_in_buffer;
|
||||||
register int c;
|
j_decompress_ptr cinfo = state->cinfo;
|
||||||
|
|
||||||
/* Attempt to load at least MIN_GET_BITS bits into get_buffer. */
|
/* Attempt to load at least MIN_GET_BITS bits into get_buffer. */
|
||||||
/* (It is assumed that no request will be for more than that many bits.) */
|
/* (It is assumed that no request will be for more than that many bits.) */
|
||||||
|
/* We fail to do so only if we hit a marker or are forced to suspend. */
|
||||||
|
|
||||||
|
if (cinfo->unread_marker == 0) { /* cannot advance past a marker */
|
||||||
while (bits_left < MIN_GET_BITS) {
|
while (bits_left < MIN_GET_BITS) {
|
||||||
/* Attempt to read a byte */
|
register int c;
|
||||||
if (state->unread_marker != 0)
|
|
||||||
goto no_more_data; /* can't advance past a marker */
|
|
||||||
|
|
||||||
|
/* Attempt to read a byte */
|
||||||
if (bytes_in_buffer == 0) {
|
if (bytes_in_buffer == 0) {
|
||||||
if (! (*state->cinfo->src->fill_input_buffer) (state->cinfo))
|
if (! (*cinfo->src->fill_input_buffer) (cinfo))
|
||||||
return FALSE;
|
return FALSE;
|
||||||
next_input_byte = state->cinfo->src->next_input_byte;
|
next_input_byte = cinfo->src->next_input_byte;
|
||||||
bytes_in_buffer = state->cinfo->src->bytes_in_buffer;
|
bytes_in_buffer = cinfo->src->bytes_in_buffer;
|
||||||
}
|
}
|
||||||
bytes_in_buffer--;
|
bytes_in_buffer--;
|
||||||
c = GETJOCTET(*next_input_byte++);
|
c = GETJOCTET(*next_input_byte++);
|
||||||
|
|
||||||
/* If it's 0xFF, check and discard stuffed zero byte */
|
/* If it's 0xFF, check and discard stuffed zero byte */
|
||||||
if (c == 0xFF) {
|
if (c == 0xFF) {
|
||||||
|
/* Loop here to discard any padding FF's on terminating marker,
|
||||||
|
* so that we can save a valid unread_marker value. NOTE: we will
|
||||||
|
* accept multiple FF's followed by a 0 as meaning a single FF data
|
||||||
|
* byte. This data pattern is not valid according to the standard.
|
||||||
|
*/
|
||||||
do {
|
do {
|
||||||
if (bytes_in_buffer == 0) {
|
if (bytes_in_buffer == 0) {
|
||||||
if (! (*state->cinfo->src->fill_input_buffer) (state->cinfo))
|
if (! (*cinfo->src->fill_input_buffer) (cinfo))
|
||||||
return FALSE;
|
return FALSE;
|
||||||
next_input_byte = state->cinfo->src->next_input_byte;
|
next_input_byte = cinfo->src->next_input_byte;
|
||||||
bytes_in_buffer = state->cinfo->src->bytes_in_buffer;
|
bytes_in_buffer = cinfo->src->bytes_in_buffer;
|
||||||
}
|
}
|
||||||
bytes_in_buffer--;
|
bytes_in_buffer--;
|
||||||
c = GETJOCTET(*next_input_byte++);
|
c = GETJOCTET(*next_input_byte++);
|
||||||
@@ -275,32 +347,44 @@ jpeg_fill_bit_buffer (bitread_working_state * state,
|
|||||||
/* Found FF/00, which represents an FF data byte */
|
/* Found FF/00, which represents an FF data byte */
|
||||||
c = 0xFF;
|
c = 0xFF;
|
||||||
} else {
|
} else {
|
||||||
/* Oops, it's actually a marker indicating end of compressed data. */
|
/* Oops, it's actually a marker indicating end of compressed data.
|
||||||
/* Better put it back for use later */
|
* Save the marker code for later use.
|
||||||
state->unread_marker = c;
|
* Fine point: it might appear that we should save the marker into
|
||||||
|
* bitread working state, not straight into permanent state. But
|
||||||
no_more_data:
|
* once we have hit a marker, we cannot need to suspend within the
|
||||||
/* There should be enough bits still left in the data segment; */
|
* current MCU, because we will read no more bytes from the data
|
||||||
/* if so, just break out of the outer while loop. */
|
* source. So it is OK to update permanent state right away.
|
||||||
if (bits_left >= nbits)
|
|
||||||
break;
|
|
||||||
/* Uh-oh. Report corrupted data to user and stuff zeroes into
|
|
||||||
* the data stream, so that we can produce some kind of image.
|
|
||||||
* Note that this code will be repeated for each byte demanded
|
|
||||||
* for the rest of the segment. We use a nonvolatile flag to ensure
|
|
||||||
* that only one warning message appears.
|
|
||||||
*/
|
*/
|
||||||
if (! *(state->printed_eod_ptr)) {
|
cinfo->unread_marker = c;
|
||||||
WARNMS(state->cinfo, JWRN_HIT_MARKER);
|
/* See if we need to insert some fake zero bits. */
|
||||||
*(state->printed_eod_ptr) = TRUE;
|
goto no_more_bytes;
|
||||||
}
|
|
||||||
c = 0; /* insert a zero byte into bit buffer */
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/* OK, load c into get_buffer */
|
/* OK, load c into get_buffer */
|
||||||
get_buffer = (get_buffer << 8) | c;
|
get_buffer = (get_buffer << 8) | c;
|
||||||
bits_left += 8;
|
bits_left += 8;
|
||||||
|
} /* end while */
|
||||||
|
} else {
|
||||||
|
no_more_bytes:
|
||||||
|
/* We get here if we've read the marker that terminates the compressed
|
||||||
|
* data segment. There should be enough bits in the buffer register
|
||||||
|
* to satisfy the request; if so, no problem.
|
||||||
|
*/
|
||||||
|
if (nbits > bits_left) {
|
||||||
|
/* Uh-oh. Report corrupted data to user and stuff zeroes into
|
||||||
|
* the data stream, so that we can produce some kind of image.
|
||||||
|
* We use a nonvolatile flag to ensure that only one warning message
|
||||||
|
* appears per data segment.
|
||||||
|
*/
|
||||||
|
if (! cinfo->entropy->insufficient_data) {
|
||||||
|
WARNMS(cinfo, JWRN_HIT_MARKER);
|
||||||
|
cinfo->entropy->insufficient_data = TRUE;
|
||||||
|
}
|
||||||
|
/* Fill the buffer with zero bits */
|
||||||
|
get_buffer <<= MIN_GET_BITS - bits_left;
|
||||||
|
bits_left = MIN_GET_BITS;
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Unload the local registers */
|
/* Unload the local registers */
|
||||||
@@ -353,37 +437,10 @@ jpeg_huff_decode (bitread_working_state * state,
|
|||||||
return 0; /* fake a zero as the safest result */
|
return 0; /* fake a zero as the safest result */
|
||||||
}
|
}
|
||||||
|
|
||||||
return htbl->pub->huffval[ htbl->valptr[l] +
|
return htbl->pub->huffval[ (int) (code + htbl->valoffset[l]) ];
|
||||||
((int) (code - htbl->mincode[l])) ];
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
/*
|
|
||||||
* Figure F.12: extend sign bit.
|
|
||||||
* On some machines, a shift and add will be faster than a table lookup.
|
|
||||||
*/
|
|
||||||
|
|
||||||
#ifdef AVOID_TABLES
|
|
||||||
|
|
||||||
#define HUFF_EXTEND(x,s) ((x) < (1<<((s)-1)) ? (x) + (((-1)<<(s)) + 1) : (x))
|
|
||||||
|
|
||||||
#else
|
|
||||||
|
|
||||||
#define HUFF_EXTEND(x,s) ((x) < extend_test[s] ? (x) + extend_offset[s] : (x))
|
|
||||||
|
|
||||||
static const int extend_test[16] = /* entry n is 2**(n-1) */
|
|
||||||
{ 0, 0x0001, 0x0002, 0x0004, 0x0008, 0x0010, 0x0020, 0x0040, 0x0080,
|
|
||||||
0x0100, 0x0200, 0x0400, 0x0800, 0x1000, 0x2000, 0x4000 };
|
|
||||||
|
|
||||||
static const int extend_offset[16] = /* entry n is (-1 << n) + 1 */
|
|
||||||
{ 0, ((-1)<<1) + 1, ((-1)<<2) + 1, ((-1)<<3) + 1, ((-1)<<4) + 1,
|
|
||||||
((-1)<<5) + 1, ((-1)<<6) + 1, ((-1)<<7) + 1, ((-1)<<8) + 1,
|
|
||||||
((-1)<<9) + 1, ((-1)<<10) + 1, ((-1)<<11) + 1, ((-1)<<12) + 1,
|
|
||||||
((-1)<<13) + 1, ((-1)<<14) + 1, ((-1)<<15) + 1 };
|
|
||||||
|
|
||||||
#endif /* AVOID_TABLES */
|
|
||||||
|
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Check for a restart marker & resynchronize decoder.
|
* Check for a restart marker & resynchronize decoder.
|
||||||
* Returns FALSE if must suspend.
|
* Returns FALSE if must suspend.
|
||||||
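Review note: the hunk above removes the `HUFF_EXTEND` definitions from jdhuff.c, while the new table-building code still calls `HUFF_EXTEND(val, la)`. For reference, here is a minimal sketch of the sign extension from Figure F.12, written as a function. The name `huff_extend` is hypothetical. It uses `x - (1 << s) + 1` in place of the original `((-1) << s) + 1` offset, which gives the same value without left-shifting a negative number.

```c
/* Hypothetical function form of HUFF_EXTEND (Figure F.12).
 * x is the s-bit magnitude field read from the bit stream, 1 <= s <= 15.
 * Values below 2^(s-1) denote negative numbers.
 */
static int huff_extend(int x, int s)
{
  if (s == 0)
    return 0;                      /* no additional bits: value is zero */
  if (x < (1 << (s - 1)))
    return x - (1 << s) + 1;       /* map 0 .. 2^(s-1)-1 to the negatives */
  return x;                        /* upper half is already the value */
}
```

For example, with `s = 3` the inputs 0..3 map to -7..-4, and the inputs 4..7 are returned unchanged.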
@@ -411,8 +468,13 @@ process_restart (j_decompress_ptr cinfo)
|
|||||||
/* Reset restart counter */
|
/* Reset restart counter */
|
||||||
entropy->restarts_to_go = cinfo->restart_interval;
|
entropy->restarts_to_go = cinfo->restart_interval;
|
||||||
|
|
||||||
/* Next segment can get another out-of-data warning */
|
/* Reset out-of-data flag, unless read_restart_marker left us smack up
|
||||||
entropy->bitstate.printed_eod = FALSE;
|
* against a marker. In that case we will end up treating the next data
|
||||||
|
* segment as empty, and we can avoid producing bogus output pixels by
|
||||||
|
* leaving the flag set.
|
||||||
|
*/
|
||||||
|
if (cinfo->unread_marker == 0)
|
||||||
|
entropy->pub.insufficient_data = FALSE;
|
||||||
|
|
||||||
return TRUE;
|
return TRUE;
|
||||||
}
|
}
|
||||||
@@ -437,14 +499,9 @@ METHODDEF(boolean)
|
|||||||
decode_mcu (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
|
decode_mcu (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
|
||||||
{
|
{
|
||||||
huff_entropy_ptr entropy = (huff_entropy_ptr) cinfo->entropy;
|
huff_entropy_ptr entropy = (huff_entropy_ptr) cinfo->entropy;
|
||||||
register int s, k, r;
|
int blkn;
|
||||||
int blkn, ci;
|
|
||||||
JBLOCKROW block;
|
|
||||||
BITREAD_STATE_VARS;
|
BITREAD_STATE_VARS;
|
||||||
savable_state state;
|
savable_state state;
|
||||||
d_derived_tbl * dctbl;
|
|
||||||
d_derived_tbl * actbl;
|
|
||||||
jpeg_component_info * compptr;
|
|
||||||
|
|
||||||
/* Process restart marker if needed; may have to suspend */
|
/* Process restart marker if needed; may have to suspend */
|
||||||
if (cinfo->restart_interval) {
|
if (cinfo->restart_interval) {
|
||||||
@@ -453,6 +510,11 @@ decode_mcu (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
|
|||||||
return FALSE;
|
return FALSE;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* If we've run out of data, just leave the MCU set to zeroes.
|
||||||
|
* This way, we return uniform gray for the remainder of the segment.
|
||||||
|
*/
|
||||||
|
if (! entropy->pub.insufficient_data) {
|
||||||
|
|
||||||
/* Load up working state */
|
/* Load up working state */
|
||||||
BITREAD_LOAD_STATE(cinfo,entropy->bitstate);
|
BITREAD_LOAD_STATE(cinfo,entropy->bitstate);
|
||||||
ASSIGN_STATE(state, entropy->saved);
|
ASSIGN_STATE(state, entropy->saved);
|
||||||
@@ -460,48 +522,140 @@ decode_mcu (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
|
|||||||
/* Outer loop handles each block in the MCU */
|
/* Outer loop handles each block in the MCU */
|
||||||
|
|
||||||
for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) {
|
for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) {
|
||||||
block = MCU_data[blkn];
|
JBLOCKROW block = MCU_data[blkn];
|
||||||
ci = cinfo->MCU_membership[blkn];
|
d_derived_tbl * dctbl = entropy->dc_cur_tbls[blkn];
|
||||||
compptr = cinfo->cur_comp_info[ci];
|
d_derived_tbl * actbl = entropy->ac_cur_tbls[blkn];
|
||||||
dctbl = entropy->dc_derived_tbls[compptr->dc_tbl_no];
|
register int s, k, r;
|
||||||
actbl = entropy->ac_derived_tbls[compptr->ac_tbl_no];
|
|
||||||
|
|
||||||
/* Decode a single block's worth of coefficients */
|
/* Decode a single block's worth of coefficients */
|
||||||
|
|
||||||
/* Section F.2.2.1: decode the DC coefficient difference */
|
/* Section F.2.2.1: decode the DC coefficient difference */
|
||||||
HUFF_DECODE(s, br_state, dctbl, return FALSE, label1);
|
{ /* HUFFX_DECODE */
|
||||||
|
register int nb, look, t;
|
||||||
|
if (bits_left < HUFFX_LOOKAHEAD) {
|
||||||
|
register const JOCTET * next_input_byte = br_state.next_input_byte;
|
||||||
|
register size_t bytes_in_buffer = br_state.bytes_in_buffer;
|
||||||
|
if (cinfo->unread_marker == 0) {
|
||||||
|
while (bits_left < MIN_GET_BITS) {
|
||||||
|
register int c;
|
||||||
|
if (bytes_in_buffer == 0 ||
|
||||||
|
(c = GETJOCTET(*next_input_byte)) == 0xFF) {
|
||||||
|
goto label11; }
|
||||||
|
bytes_in_buffer--; next_input_byte++;
|
||||||
|
get_buffer = (get_buffer << 8) | c;
|
||||||
|
bits_left += 8;
|
||||||
|
}
|
||||||
|
br_state.next_input_byte = next_input_byte;
|
||||||
|
br_state.bytes_in_buffer = bytes_in_buffer;
|
||||||
|
} else {
|
||||||
|
label11:
|
||||||
|
br_state.next_input_byte = next_input_byte;
|
||||||
|
br_state.bytes_in_buffer = bytes_in_buffer;
|
||||||
|
if (! jpeg_fill_bit_buffer(&br_state,get_buffer,bits_left, 0)) {
|
||||||
|
return FALSE; }
|
||||||
|
get_buffer = br_state.get_buffer; bits_left = br_state.bits_left;
|
||||||
|
if (bits_left < HUFFX_LOOKAHEAD) {
|
||||||
|
nb = 1; goto label1;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
look = PEEK_BITS(HUFFX_LOOKAHEAD);
|
||||||
|
if ((nb = dctbl->lookx_nbits[look]) != 0) {
|
||||||
|
s = dctbl->lookx_val[look];
|
||||||
|
if (nb <= HUFFX_LOOKAHEAD) {
|
||||||
|
DROP_BITS(nb);
|
||||||
|
} else {
|
||||||
|
DROP_BITS(HUFFX_LOOKAHEAD);
|
||||||
|
nb -= HUFFX_LOOKAHEAD;
|
||||||
|
CHECK_BIT_BUFFER(br_state, nb, return FALSE);
|
||||||
|
s += GET_BITS(nb);
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
nb = HUFFX_LOOKAHEAD;
|
||||||
|
label1:
|
||||||
|
if ((s=jpeg_huff_decode(&br_state,get_buffer,bits_left,dctbl,nb))
|
||||||
|
< 0) { return FALSE; }
|
||||||
|
get_buffer = br_state.get_buffer; bits_left = br_state.bits_left;
|
||||||
if (s) {
|
if (s) {
|
||||||
CHECK_BIT_BUFFER(br_state, s, return FALSE);
|
CHECK_BIT_BUFFER(br_state, s, return FALSE);
|
||||||
r = GET_BITS(s);
|
t = GET_BITS(s);
|
||||||
s = HUFF_EXTEND(r, s);
|
s = HUFF_EXTEND(t, s);
|
||||||
}
|
}
|
||||||
|
}
|
||||||
/* Shortcut if component's values are not interesting */
|
}
|
||||||
if (! compptr->component_needed)
|
if (entropy->dc_needed[blkn]) {
|
||||||
goto skip_ACs;
|
|
||||||
|
|
||||||
/* Convert DC difference to actual value, update last_dc_val */
|
/* Convert DC difference to actual value, update last_dc_val */
|
||||||
|
int ci = cinfo->MCU_membership[blkn];
|
||||||
s += state.last_dc_val[ci];
|
s += state.last_dc_val[ci];
|
||||||
state.last_dc_val[ci] = s;
|
state.last_dc_val[ci] = s;
|
||||||
/* Output the DC coefficient (assumes jpeg_natural_order[0] = 0) */
|
/* Output the DC coefficient (assumes jpeg_natural_order[0] = 0) */
|
||||||
(*block)[0] = (JCOEF) s;
|
(*block)[0] = (JCOEF) s;
|
||||||
|
}
|
||||||
|
|
||||||
/* Do we need to decode the AC coefficients for this component? */
|
if (entropy->ac_needed[blkn]) {
|
||||||
if (compptr->DCT_scaled_size > 1) {
|
|
||||||
|
|
||||||
/* Section F.2.2.2: decode the AC coefficients */
|
/* Section F.2.2.2: decode the AC coefficients */
|
||||||
/* Since zeroes are skipped, output area must be cleared beforehand */
|
/* Since zeroes are skipped, output area must be cleared beforehand */
|
||||||
for (k = 1; k < DCTSIZE2; k++) {
|
for (k = 1; k < DCTSIZE2; k++) {
|
||||||
HUFF_DECODE(s, br_state, actbl, return FALSE, label2);
|
{ /* HUFFX_DECODE */
|
||||||
|
register int nb, look, t;
|
||||||
r = s >> 4;
|
if (bits_left < HUFFX_LOOKAHEAD) {
|
||||||
s &= 15;
|
register const JOCTET * next_input_byte
|
||||||
|
= br_state.next_input_byte;
|
||||||
|
register size_t bytes_in_buffer = br_state.bytes_in_buffer;
|
||||||
|
if (cinfo->unread_marker == 0) {
|
||||||
|
while (bits_left < MIN_GET_BITS) {
|
||||||
|
register int c;
|
||||||
|
if (bytes_in_buffer == 0 ||
|
||||||
|
(c = GETJOCTET(*next_input_byte)) == 0xFF) {
|
||||||
|
goto label21; }
|
||||||
|
bytes_in_buffer--; next_input_byte++;
|
||||||
|
get_buffer = (get_buffer << 8) | c;
|
||||||
|
bits_left += 8;
|
||||||
|
}
|
||||||
|
br_state.next_input_byte = next_input_byte;
|
||||||
|
br_state.bytes_in_buffer = bytes_in_buffer;
|
||||||
|
} else {
|
||||||
|
label21:
|
||||||
|
br_state.next_input_byte = next_input_byte;
|
||||||
|
br_state.bytes_in_buffer = bytes_in_buffer;
|
||||||
|
if (! jpeg_fill_bit_buffer(&br_state,get_buffer,bits_left,0)) {
|
||||||
|
return FALSE; }
|
||||||
|
get_buffer = br_state.get_buffer;
|
||||||
|
bits_left = br_state.bits_left;
|
||||||
|
if (bits_left < HUFFX_LOOKAHEAD) {
|
||||||
|
nb = 1; goto label2;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
look = PEEK_BITS(HUFFX_LOOKAHEAD);
|
||||||
|
if ((nb = actbl->lookx_nbits[look]) != 0) {
|
||||||
|
s = actbl->lookx_val[look];
|
||||||
|
r = actbl->lookx_sym[look] >> 4;
|
||||||
|
if (nb <= HUFFX_LOOKAHEAD) {
|
||||||
|
DROP_BITS(nb);
|
||||||
|
} else {
|
||||||
|
DROP_BITS(HUFFX_LOOKAHEAD);
|
||||||
|
nb -= HUFFX_LOOKAHEAD;
|
||||||
|
CHECK_BIT_BUFFER(br_state, nb, return FALSE);
|
||||||
|
s += GET_BITS(nb);
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
nb = HUFFX_LOOKAHEAD;
|
||||||
|
label2:
|
||||||
|
if ((s=jpeg_huff_decode(&br_state,get_buffer,bits_left,actbl,nb))
|
||||||
|
< 0) { return FALSE; }
|
||||||
|
get_buffer = br_state.get_buffer; bits_left = br_state.bits_left;
|
||||||
|
r = s >> 4; s &= 15;
|
||||||
|
if (s) {
|
||||||
|
CHECK_BIT_BUFFER(br_state, s, return FALSE);
|
||||||
|
t = GET_BITS(s);
|
||||||
|
s = HUFF_EXTEND(t, s);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
if (s) {
|
if (s) {
|
||||||
k += r;
|
k += r;
|
||||||
CHECK_BIT_BUFFER(br_state, s, return FALSE);
|
|
||||||
r = GET_BITS(s);
|
|
||||||
s = HUFF_EXTEND(r, s);
|
|
||||||
/* Output coefficient in natural (dezigzagged) order.
|
/* Output coefficient in natural (dezigzagged) order.
|
||||||
* Note: the extra entries in jpeg_natural_order[] will save us
|
* Note: the extra entries in jpeg_natural_order[] will save us
|
||||||
* if k >= DCTSIZE2, which could happen if the data is corrupted.
|
* if k >= DCTSIZE2, which could happen if the data is corrupted.
|
||||||
@@ -515,20 +669,68 @@ decode_mcu (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
|
|||||||
}
|
}
|
||||||
|
|
||||||
} else {
|
} else {
|
||||||
skip_ACs:
|
|
||||||
|
|
||||||
/* Section F.2.2.2: decode the AC coefficients */
|
/* Section F.2.2.2: decode the AC coefficients */
|
||||||
/* In this path we just discard the values */
|
/* In this path we just discard the values */
|
||||||
for (k = 1; k < DCTSIZE2; k++) {
|
for (k = 1; k < DCTSIZE2; k++) {
|
||||||
HUFF_DECODE(s, br_state, actbl, return FALSE, label3);
|
{ /* HUFFX_DECODE */
|
||||||
|
register int nb, look;
|
||||||
r = s >> 4;
|
if (bits_left < HUFFX_LOOKAHEAD) {
|
||||||
s &= 15;
|
register const JOCTET * next_input_byte
|
||||||
|
= br_state.next_input_byte;
|
||||||
|
register size_t bytes_in_buffer = br_state.bytes_in_buffer;
|
||||||
|
if (cinfo->unread_marker == 0) {
|
||||||
|
while (bits_left < MIN_GET_BITS) {
|
||||||
|
register int c;
|
||||||
|
if (bytes_in_buffer == 0 ||
|
||||||
|
(c = GETJOCTET(*next_input_byte)) == 0xFF) {
|
||||||
|
goto label31; }
|
||||||
|
bytes_in_buffer--; next_input_byte++;
|
||||||
|
get_buffer = (get_buffer << 8) | c;
|
||||||
|
bits_left += 8;
|
||||||
|
}
|
||||||
|
br_state.next_input_byte = next_input_byte;
|
||||||
|
br_state.bytes_in_buffer = bytes_in_buffer;
|
||||||
|
} else {
|
||||||
|
label31:
|
||||||
|
br_state.next_input_byte = next_input_byte;
|
||||||
|
br_state.bytes_in_buffer = bytes_in_buffer;
|
||||||
|
if (! jpeg_fill_bit_buffer(&br_state,get_buffer,bits_left,0)) {
|
||||||
|
return FALSE; }
|
||||||
|
get_buffer = br_state.get_buffer;
|
||||||
|
bits_left = br_state.bits_left;
|
||||||
|
if (bits_left < HUFFX_LOOKAHEAD) {
|
||||||
|
nb = 1; goto label3;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
look = PEEK_BITS(HUFFX_LOOKAHEAD);
|
||||||
|
if ((nb = actbl->lookx_nbits[look]) != 0) {
|
||||||
|
s = actbl->lookx_sym[look];
|
||||||
|
r = s >> 4; s &= 15;
|
||||||
|
if (nb <= HUFFX_LOOKAHEAD) {
|
||||||
|
DROP_BITS(nb);
|
||||||
|
} else {
|
||||||
|
DROP_BITS(HUFFX_LOOKAHEAD);
|
||||||
|
nb -= HUFFX_LOOKAHEAD;
|
||||||
|
CHECK_BIT_BUFFER(br_state, nb, return FALSE);
|
||||||
|
DROP_BITS(nb);
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
nb = HUFFX_LOOKAHEAD;
|
||||||
|
label3:
|
||||||
|
if ((s=jpeg_huff_decode(&br_state,get_buffer,bits_left,actbl,nb))
|
||||||
|
< 0) { return FALSE; }
|
||||||
|
get_buffer = br_state.get_buffer; bits_left = br_state.bits_left;
|
||||||
|
r = s >> 4; s &= 15;
|
||||||
if (s) {
|
if (s) {
|
||||||
k += r;
|
|
||||||
CHECK_BIT_BUFFER(br_state, s, return FALSE);
|
CHECK_BIT_BUFFER(br_state, s, return FALSE);
|
||||||
DROP_BITS(s);
|
DROP_BITS(s);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if (s) {
|
||||||
|
k += r;
|
||||||
} else {
|
} else {
|
||||||
if (r != 15)
|
if (r != 15)
|
||||||
break;
|
break;
|
||||||
@@ -542,6 +744,7 @@ skip_ACs:
|
|||||||
/* Completed MCU, so update state */
|
/* Completed MCU, so update state */
|
||||||
BITREAD_SAVE_STATE(cinfo,entropy->bitstate);
|
BITREAD_SAVE_STATE(cinfo,entropy->bitstate);
|
||||||
ASSIGN_STATE(entropy->saved, state);
|
ASSIGN_STATE(entropy->saved, state);
|
||||||
|
}
|
||||||
|
|
||||||
/* Account for restart interval (no-op if not using restarts) */
|
/* Account for restart interval (no-op if not using restarts) */
|
||||||
entropy->restarts_to_go--;
|
entropy->restarts_to_go--;
|
||||||
|
|||||||
116
jdhuff.h
@@ -1,10 +1,17 @@
|
|||||||
/*
|
/*
|
||||||
* jdhuff.h
|
* jdhuff.h
|
||||||
*
|
*
|
||||||
* Copyright (C) 1991-1996, Thomas G. Lane.
|
* Copyright (C) 1991-1997, Thomas G. Lane.
|
||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
* x86 SIMD extension for IJG JPEG library
|
||||||
|
* Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
* This file has been modified to improve performance.
|
||||||
|
* Last Modified : October 31, 2004
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
*
|
||||||
* This file contains declarations for Huffman entropy decoding routines
|
* This file contains declarations for Huffman entropy decoding routines
|
||||||
* that are shared between the sequential decoder (jdhuff.c) and the
|
* that are shared between the sequential decoder (jdhuff.c) and the
|
||||||
* progressive decoder (jdphuff.c). No other modules need to see these.
|
* progressive decoder (jdphuff.c). No other modules need to see these.
|
||||||
@@ -21,30 +28,36 @@
|
|||||||
|
|
||||||
/* Derived data constructed for each Huffman table */
|
/* Derived data constructed for each Huffman table */
|
||||||
|
|
||||||
#define HUFF_LOOKAHEAD 8 /* # of bits of lookahead */
|
#define HUFFX_LOOKAHEAD 9 /* # of bits of lookahead */
|
||||||
|
|
||||||
typedef struct {
|
typedef struct {
|
||||||
/* Basic tables: (element [0] of each array is unused) */
|
/* Basic tables: (element [0] of each array is unused) */
|
||||||
INT32 mincode[17]; /* smallest code of length k */
|
|
||||||
INT32 maxcode[18]; /* largest code of length k (-1 if none) */
|
INT32 maxcode[18]; /* largest code of length k (-1 if none) */
|
||||||
/* (maxcode[17] is a sentinel to ensure jpeg_huff_decode terminates) */
|
/* (maxcode[17] is a sentinel to ensure jpeg_huff_decode terminates) */
|
||||||
int valptr[17]; /* huffval[] index of 1st symbol of length k */
|
INT32 valoffset[17]; /* huffval[] offset for codes of length k */
|
||||||
|
/* valoffset[k] = huffval[] index of 1st symbol of code length k, less
|
||||||
|
* the smallest code of length k; so given a code of length k, the
|
||||||
|
* corresponding symbol is huffval[code + valoffset[k]]
|
||||||
|
*/
|
||||||
|
|
||||||
/* Link to public Huffman table (needed only in jpeg_huff_decode) */
|
/* Link to public Huffman table (needed only in jpeg_huff_decode) */
|
||||||
JHUFF_TBL *pub;
|
JHUFF_TBL *pub;
|
||||||
|
|
||||||
/* Lookahead tables: indexed by the next HUFF_LOOKAHEAD bits of
|
/* Lookahead tables: indexed by the next HUFFX_LOOKAHEAD bits of
|
||||||
* the input data stream. If the next Huffman code is no more
|
* the input data stream. If the next Huffman code is no more
|
||||||
* than HUFF_LOOKAHEAD bits long, we can obtain its length and
|
* than HUFFX_LOOKAHEAD-1 bits long, we can obtain its length,
|
||||||
* the corresponding symbol directly from these tables.
|
* the corresponding symbol, and the encoded coefficient value
|
||||||
|
* directly from these tables.
|
||||||
*/
|
*/
|
||||||
int look_nbits[1<<HUFF_LOOKAHEAD]; /* # bits, or 0 if too long */
|
UINT8 lookx_nbits[1<<HUFFX_LOOKAHEAD]; /* # bits, or 0 if too long */
|
||||||
UINT8 look_sym[1<<HUFF_LOOKAHEAD]; /* symbol, or unused */
|
INT16 lookx_val[1<<HUFFX_LOOKAHEAD]; /* coefficient value, or unused */
|
||||||
|
UINT8 lookx_sym[1<<HUFFX_LOOKAHEAD]; /* symbol, or unused */
|
||||||
} d_derived_tbl;
|
} d_derived_tbl;
|
||||||
|
|
||||||
/* Expand a Huffman table definition into the derived format */
|
/* Expand a Huffman table definition into the derived format */
|
||||||
EXTERN(void) jpeg_make_d_derived_tbl JPP((j_decompress_ptr cinfo,
|
EXTERN(void) jpeg_make_d_derived_tbl
|
||||||
JHUFF_TBL * htbl, d_derived_tbl ** pdtbl));
|
JPP((j_decompress_ptr cinfo, boolean isDC, int tblno,
|
||||||
|
d_derived_tbl ** pdtbl));
|
||||||
|
|
||||||
|
|
||||||
/*
|
/*
|
||||||
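The `valoffset[]` comment in the hunk above says that, for a code of length k, the symbol is `huffval[code + valoffset[k]]`. The following is a minimal standalone sketch of that indexing, assuming canonical code assignment in order of increasing length. The names `toy_tbl`, `toy_build` and `toy_symbol` are hypothetical and are not part of the library; the real table construction lives in jdhuff.c and performs more validation.

```c
/* Hypothetical toy table; NOT libjpeg's d_derived_tbl. */
typedef struct {
  long maxcode[18];           /* largest code of length k, or -1 if none */
  long valoffset[17];         /* huffval[] index of 1st symbol of length k,
                               * less the smallest code of length k */
  unsigned char huffval[256]; /* symbols in order of increasing code length */
} toy_tbl;

/* Build maxcode[] and valoffset[] from the per-length counts bits[1..16].
 * Assumes canonical codes: consecutive values within each length, and the
 * running code doubled when moving to the next length.
 */
static void toy_build(toy_tbl *t, const unsigned char bits[17],
                      const unsigned char *vals, int nvals)
{
  long code = 0;  /* smallest unassigned code of the current length */
  int p = 0;      /* huffval[] index of the next symbol */
  int i, k;

  for (i = 0; i < nvals; i++)
    t->huffval[i] = vals[i];
  for (k = 1; k <= 16; k++) {
    if (bits[k]) {
      t->valoffset[k] = (long) p - code;
      p += bits[k];
      code += bits[k];
      t->maxcode[k] = code - 1;
    } else {
      t->valoffset[k] = 0;
      t->maxcode[k] = -1;     /* no codes of this length */
    }
    code <<= 1;
  }
  t->maxcode[17] = 0xFFFFFL;  /* sentinel so a length search terminates */
}

/* Given a code already known to have length k, return its symbol. */
static int toy_symbol(const toy_tbl *t, long code, int k)
{
  return t->huffval[code + t->valoffset[k]];
}
```

With two codes of length 2 and one of length 3, the codes are 00, 01 and 100, so `valoffset[3]` is 2 - 4 = -2 and code 100 (value 4) maps to `huffval[2]`. A single precomputed offset per length replaces the separate `mincode[]` and `valptr[]` lookups of the old struct.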
@@ -70,30 +83,43 @@ typedef INT32 bit_buf_type; /* type of bit-extraction buffer */
|
|||||||
|
|
||||||
/* If long is > 32 bits on your machine, and shifting/masking longs is
|
/* If long is > 32 bits on your machine, and shifting/masking longs is
|
||||||
* reasonably fast, making bit_buf_type be long and setting BIT_BUF_SIZE
|
* reasonably fast, making bit_buf_type be long and setting BIT_BUF_SIZE
|
||||||
* appropriately should be a win. Unfortunately we can't do this with
|
* appropriately should be a win. Unfortunately we can't define the size
|
||||||
* something like #define BIT_BUF_SIZE (sizeof(bit_buf_type)*8)
|
* with something like #define BIT_BUF_SIZE (sizeof(bit_buf_type)*8)
|
||||||
* because not all machines measure sizeof in 8-bit bytes.
|
* because not all machines measure sizeof in 8-bit bytes.
|
||||||
*/
|
*/
|
||||||
|
|
||||||
|
#ifdef SLOW_SHIFT_32
|
||||||
|
#define MIN_GET_BITS 15 /* minimum allowable value */
|
||||||
|
#else
|
||||||
|
#define MIN_GET_BITS (BIT_BUF_SIZE-7)
|
||||||
|
#endif
|
||||||
|
|
||||||
|
/* On most machines MIN_GET_BITS should be 25 to allow the full 32-bit width
|
||||||
|
* of get_buffer to be used. (On machines with wider words, an even larger
|
||||||
|
* buffer could be used.) However, on some machines 32-bit shifts are
|
||||||
|
* quite slow and take time proportional to the number of places shifted.
|
||||||
|
* (This is true with most PC compilers, for instance.) In this case it may
|
||||||
|
* be a win to set MIN_GET_BITS to the minimum value of 15. This reduces the
|
||||||
|
* average shift distance at the cost of more calls to jpeg_fill_bit_buffer.
|
||||||
|
*/
|
||||||
|
|
||||||
typedef struct { /* Bitreading state saved across MCUs */
|
typedef struct { /* Bitreading state saved across MCUs */
|
||||||
bit_buf_type get_buffer; /* current bit-extraction buffer */
|
bit_buf_type get_buffer; /* current bit-extraction buffer */
|
||||||
int bits_left; /* # of unused bits in it */
|
int bits_left; /* # of unused bits in it */
|
||||||
boolean printed_eod; /* flag to suppress multiple warning msgs */
|
|
||||||
} bitread_perm_state;
|
} bitread_perm_state;
|
||||||
|
|
||||||
typedef struct { /* Bitreading working state within an MCU */
|
typedef struct { /* Bitreading working state within an MCU */
|
||||||
/* current data source state */
|
/* Current data source location */
|
||||||
|
/* We need a copy, rather than munging the original, in case of suspension */
|
||||||
const JOCTET * next_input_byte; /* => next byte to read from source */
|
const JOCTET * next_input_byte; /* => next byte to read from source */
|
||||||
size_t bytes_in_buffer; /* # of bytes remaining in source buffer */
|
size_t bytes_in_buffer; /* # of bytes remaining in source buffer */
|
||||||
int unread_marker; /* nonzero if we have hit a marker */
|
/* Bit input buffer --- note these values are kept in register variables,
|
||||||
/* bit input buffer --- note these values are kept in register variables,
|
|
||||||
* not in this struct, inside the inner loops.
|
* not in this struct, inside the inner loops.
|
||||||
*/
|
*/
|
||||||
bit_buf_type get_buffer; /* current bit-extraction buffer */
|
bit_buf_type get_buffer; /* current bit-extraction buffer */
|
||||||
int bits_left; /* # of unused bits in it */
|
int bits_left; /* # of unused bits in it */
|
||||||
/* pointers needed by jpeg_fill_bit_buffer */
|
/* Pointer needed by jpeg_fill_bit_buffer. */
|
||||||
j_decompress_ptr cinfo; /* back link to decompress master record */
|
j_decompress_ptr cinfo; /* back link to decompress master record */
|
||||||
boolean * printed_eod_ptr; /* => flag in permanent state */
|
|
||||||
} bitread_working_state;
|
} bitread_working_state;
|
||||||
|
|
||||||
/* Macros to declare and load/save bitread local variables. */
|
/* Macros to declare and load/save bitread local variables. */
|
||||||
@@ -106,15 +132,12 @@ typedef struct { /* Bitreading working state within an MCU */
|
|||||||
br_state.cinfo = cinfop; \
|
br_state.cinfo = cinfop; \
|
||||||
br_state.next_input_byte = cinfop->src->next_input_byte; \
|
br_state.next_input_byte = cinfop->src->next_input_byte; \
|
||||||
br_state.bytes_in_buffer = cinfop->src->bytes_in_buffer; \
|
br_state.bytes_in_buffer = cinfop->src->bytes_in_buffer; \
|
||||||
br_state.unread_marker = cinfop->unread_marker; \
|
|
||||||
get_buffer = permstate.get_buffer; \
|
get_buffer = permstate.get_buffer; \
|
||||||
bits_left = permstate.bits_left; \
|
bits_left = permstate.bits_left
|
||||||
br_state.printed_eod_ptr = & permstate.printed_eod
|
|
||||||
|
|
||||||
#define BITREAD_SAVE_STATE(cinfop,permstate) \
|
#define BITREAD_SAVE_STATE(cinfop,permstate) \
|
||||||
cinfop->src->next_input_byte = br_state.next_input_byte; \
|
cinfop->src->next_input_byte = br_state.next_input_byte; \
|
||||||
cinfop->src->bytes_in_buffer = br_state.bytes_in_buffer; \
|
cinfop->src->bytes_in_buffer = br_state.bytes_in_buffer; \
|
||||||
cinfop->unread_marker = br_state.unread_marker; \
|
|
||||||
permstate.get_buffer = get_buffer; \
|
permstate.get_buffer = get_buffer; \
|
||||||
permstate.bits_left = bits_left
|
permstate.bits_left = bits_left
|
||||||
|
|
||||||
@@ -156,47 +179,14 @@ EXTERN(boolean) jpeg_fill_bit_buffer
|
|||||||
JPP((bitread_working_state * state, register bit_buf_type get_buffer,
|
JPP((bitread_working_state * state, register bit_buf_type get_buffer,
|
||||||
register int bits_left, int nbits));
|
register int bits_left, int nbits));
|
||||||
|
|
||||||
|
|
||||||
/*
|
|
||||||
* Code for extracting next Huffman-coded symbol from input bit stream.
|
|
||||||
* Again, this is time-critical and we make the main paths be macros.
|
|
||||||
*
|
|
||||||
* We use a lookahead table to process codes of up to HUFF_LOOKAHEAD bits
|
|
||||||
* without looping. Usually, more than 95% of the Huffman codes will be 8
|
|
||||||
* or fewer bits long. The few overlength codes are handled with a loop,
|
|
||||||
* which need not be inline code.
|
|
||||||
*
|
|
||||||
* Notes about the HUFF_DECODE macro:
|
|
||||||
* 1. Near the end of the data segment, we may fail to get enough bits
|
|
||||||
* for a lookahead. In that case, we do it the hard way.
|
|
||||||
* 2. If the lookahead table contains no entry, the next code must be
|
|
||||||
* more than HUFF_LOOKAHEAD bits long.
|
|
||||||
* 3. jpeg_huff_decode returns -1 if forced to suspend.
|
|
||||||
*/
|
|
||||||
|
|
||||||
#define HUFF_DECODE(result,state,htbl,failaction,slowlabel) \
|
|
||||||
{ register int nb, look; \
|
|
||||||
if (bits_left < HUFF_LOOKAHEAD) { \
|
|
||||||
if (! jpeg_fill_bit_buffer(&state,get_buffer,bits_left, 0)) {failaction;} \
|
|
||||||
get_buffer = state.get_buffer; bits_left = state.bits_left; \
|
|
||||||
if (bits_left < HUFF_LOOKAHEAD) { \
|
|
||||||
nb = 1; goto slowlabel; \
|
|
||||||
} \
|
|
||||||
} \
|
|
||||||
look = PEEK_BITS(HUFF_LOOKAHEAD); \
|
|
||||||
if ((nb = htbl->look_nbits[look]) != 0) { \
|
|
||||||
DROP_BITS(nb); \
|
|
||||||
result = htbl->look_sym[look]; \
|
|
||||||
} else { \
|
|
||||||
nb = HUFF_LOOKAHEAD+1; \
|
|
||||||
slowlabel: \
|
|
||||||
if ((result=jpeg_huff_decode(&state,get_buffer,bits_left,htbl,nb)) < 0) \
|
|
||||||
{ failaction; } \
|
|
||||||
get_buffer = state.get_buffer; bits_left = state.bits_left; \
|
|
||||||
} \
|
|
||||||
}
|
|
||||||
|
|
||||||
/* Out-of-line case for Huffman code fetching */
|
/* Out-of-line case for Huffman code fetching */
|
||||||
EXTERN(int) jpeg_huff_decode
|
EXTERN(int) jpeg_huff_decode
|
||||||
JPP((bitread_working_state * state, register bit_buf_type get_buffer,
|
JPP((bitread_working_state * state, register bit_buf_type get_buffer,
|
||||||
register int bits_left, d_derived_tbl * htbl, int min_bits));
|
register int bits_left, d_derived_tbl * htbl, int min_bits));
|
||||||
|
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Figure F.12: extend sign bit.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#define HUFF_EXTEND(x,s) ((x) < (1<<((s)-1)) ? (x) + (((-1)<<(s)) + 1) : (x))
|
||||||
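The `HUFF_EXTEND` macro added above implements the sign extension of Figure F.12: an s-bit value x whose top bit is clear stands for the negative number x - (2^s - 1). The sketch below is a hypothetical function form (`huff_extend` is not a library name) that computes the same result without left-shifting a negative constant, which newer C standards leave undefined. It assumes 1 <= s <= 15 and 0 <= x < 2^s.

```c
/* Function form of the Figure F.12 sign extension.
 * Hypothetical helper for illustration; the library uses the macro.
 * Assumes 1 <= s <= 15 and 0 <= x < (1 << s).
 */
static int huff_extend(int x, int s)
{
  int half = 1 << (s - 1);  /* smallest value whose top bit is set */

  /* Values below half encode negatives: x - (2^s - 1). */
  return (x < half) ? x - ((1 << s) - 1) : x;
}
```

For s = 3 the raw values 0..3 map to -7..-4 and 4..7 map to themselves, which matches `(x) + (((-1)<<(s)) + 1)` in the macro.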
|
|||||||
@@ -1,7 +1,7 @@
|
|||||||
/*
|
/*
|
||||||
* jdinput.c
|
* jdinput.c
|
||||||
*
|
*
|
||||||
* Copyright (C) 1991-1996, Thomas G. Lane.
|
* Copyright (C) 1991-1997, Thomas G. Lane.
|
||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
@@ -301,7 +301,7 @@ consume_markers (j_decompress_ptr cinfo)
|
|||||||
initial_setup(cinfo);
|
initial_setup(cinfo);
|
||||||
inputctl->inheaders = FALSE;
|
inputctl->inheaders = FALSE;
|
||||||
/* Note: start_input_pass must be called by jdmaster.c
|
/* Note: start_input_pass must be called by jdmaster.c
|
||||||
* before any more input can be consumed. jdapi.c is
|
* before any more input can be consumed. jdapimin.c is
|
||||||
* responsible for enforcing this sequencing.
|
* responsible for enforcing this sequencing.
|
||||||
*/
|
*/
|
||||||
} else { /* 2nd or later SOS marker */
|
} else { /* 2nd or later SOS marker */
|
||||||
|
|||||||
577
jdmarker.c
@@ -1,7 +1,7 @@
|
|||||||
/*
|
/*
|
||||||
* jdmarker.c
|
* jdmarker.c
|
||||||
*
|
*
|
||||||
* Copyright (C) 1991-1996, Thomas G. Lane.
|
* Copyright (C) 1991-1998, Thomas G. Lane.
|
||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
@@ -85,6 +85,28 @@ typedef enum { /* JPEG marker codes */
|
|||||||
} JPEG_MARKER;
|
} JPEG_MARKER;
|
||||||
|
|
||||||
|
|
||||||
|
/* Private state */
|
||||||
|
|
||||||
|
typedef struct {
|
||||||
|
struct jpeg_marker_reader pub; /* public fields */
|
||||||
|
|
||||||
|
/* Application-overridable marker processing methods */
|
||||||
|
jpeg_marker_parser_method process_COM;
|
||||||
|
jpeg_marker_parser_method process_APPn[16];
|
||||||
|
|
||||||
|
/* Limit on marker data length to save for each marker type */
|
||||||
|
unsigned int length_limit_COM;
|
||||||
|
unsigned int length_limit_APPn[16];
|
||||||
|
|
||||||
|
/* Status of COM/APPn marker saving */
|
||||||
|
jpeg_saved_marker_ptr cur_marker; /* NULL if not processing a marker */
|
||||||
|
unsigned int bytes_read; /* data bytes read so far in marker */
|
||||||
|
/* Note: cur_marker is not linked into marker_list until it's all read. */
|
||||||
|
} my_marker_reader;
|
||||||
|
|
||||||
|
typedef my_marker_reader * my_marker_ptr;
|
||||||
|
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Macros for fetching data from the data source module.
|
* Macros for fetching data from the data source module.
|
||||||
*
|
*
|
||||||
@@ -104,7 +126,7 @@ typedef enum { /* JPEG marker codes */
|
|||||||
( datasrc->next_input_byte = next_input_byte, \
|
( datasrc->next_input_byte = next_input_byte, \
|
||||||
datasrc->bytes_in_buffer = bytes_in_buffer )
|
datasrc->bytes_in_buffer = bytes_in_buffer )
|
||||||
|
|
||||||
/* Reload the local copies --- seldom used except in MAKE_BYTE_AVAIL */
|
/* Reload the local copies --- used only in MAKE_BYTE_AVAIL */
|
||||||
#define INPUT_RELOAD(cinfo) \
|
#define INPUT_RELOAD(cinfo) \
|
||||||
( next_input_byte = datasrc->next_input_byte, \
|
( next_input_byte = datasrc->next_input_byte, \
|
||||||
bytes_in_buffer = datasrc->bytes_in_buffer )
|
bytes_in_buffer = datasrc->bytes_in_buffer )
|
||||||
@@ -118,14 +140,14 @@ typedef enum { /* JPEG marker codes */
|
|||||||
if (! (*datasrc->fill_input_buffer) (cinfo)) \
|
if (! (*datasrc->fill_input_buffer) (cinfo)) \
|
||||||
{ action; } \
|
{ action; } \
|
||||||
INPUT_RELOAD(cinfo); \
|
INPUT_RELOAD(cinfo); \
|
||||||
} \
|
}
|
||||||
bytes_in_buffer--
|
|
||||||
|
|
||||||
/* Read a byte into variable V.
|
/* Read a byte into variable V.
|
||||||
* If must suspend, take the specified action (typically "return FALSE").
|
* If must suspend, take the specified action (typically "return FALSE").
|
||||||
*/
|
*/
|
||||||
#define INPUT_BYTE(cinfo,V,action) \
|
#define INPUT_BYTE(cinfo,V,action) \
|
||||||
MAKESTMT( MAKE_BYTE_AVAIL(cinfo,action); \
|
MAKESTMT( MAKE_BYTE_AVAIL(cinfo,action); \
|
||||||
|
bytes_in_buffer--; \
|
||||||
V = GETJOCTET(*next_input_byte++); )
|
V = GETJOCTET(*next_input_byte++); )
|
||||||
|
|
||||||
/* As above, but read two bytes interpreted as an unsigned 16-bit integer.
|
/* As above, but read two bytes interpreted as an unsigned 16-bit integer.
|
||||||
@@ -133,8 +155,10 @@ typedef enum { /* JPEG marker codes */
|
|||||||
*/
|
*/
|
||||||
#define INPUT_2BYTES(cinfo,V,action) \
|
#define INPUT_2BYTES(cinfo,V,action) \
|
||||||
MAKESTMT( MAKE_BYTE_AVAIL(cinfo,action); \
|
MAKESTMT( MAKE_BYTE_AVAIL(cinfo,action); \
|
||||||
|
bytes_in_buffer--; \
|
||||||
V = ((unsigned int) GETJOCTET(*next_input_byte++)) << 8; \
|
V = ((unsigned int) GETJOCTET(*next_input_byte++)) << 8; \
|
||||||
MAKE_BYTE_AVAIL(cinfo,action); \
|
MAKE_BYTE_AVAIL(cinfo,action); \
|
||||||
|
bytes_in_buffer--; \
|
||||||
V += GETJOCTET(*next_input_byte++); )
|
V += GETJOCTET(*next_input_byte++); )
|
||||||
|
|
||||||
|
|
||||||
@@ -150,11 +174,18 @@ typedef enum { /* JPEG marker codes */
|
|||||||
* marker parameters; restart point has not been moved. Same routine
|
* marker parameters; restart point has not been moved. Same routine
|
||||||
* will be called again after application supplies more input data.
|
* will be called again after application supplies more input data.
|
||||||
*
|
*
|
||||||
* This approach to suspension assumes that all of a marker's parameters can
|
* This approach to suspension assumes that all of a marker's parameters
|
||||||
* fit into a single input bufferload. This should hold for "normal"
|
* can fit into a single input bufferload. This should hold for "normal"
|
||||||
* markers. Some COM/APPn markers might have large parameter segments,
|
* markers. Some COM/APPn markers might have large parameter segments
|
||||||
* but we use skip_input_data to get past those, and thereby put the problem
|
* that might not fit. If we are simply dropping such a marker, we use
|
||||||
* on the source manager's shoulders.
|
* skip_input_data to get past it, and thereby put the problem on the
|
||||||
|
* source manager's shoulders. If we are saving the marker's contents
|
||||||
|
* into memory, we use a slightly different convention: when forced to
|
||||||
|
* suspend, the marker processor updates the restart point to the end of
|
||||||
|
* what it's consumed (ie, the end of the buffer) before returning FALSE.
|
||||||
|
* On resumption, cinfo->unread_marker still contains the marker code,
|
||||||
|
* but the data source will point to the next chunk of marker data.
|
||||||
|
* The marker processor must retain internal state to deal with this.
|
||||||
*
|
*
|
||||||
* Note that we don't bother to avoid duplicate trace messages if a
|
* Note that we don't bother to avoid duplicate trace messages if a
|
||||||
* suspension occurs within marker parameters. Other side effects
|
* suspension occurs within marker parameters. Other side effects
|
||||||
@@ -188,7 +219,9 @@ get_soi (j_decompress_ptr cinfo)
|
|||||||
cinfo->CCIR601_sampling = FALSE; /* Assume non-CCIR sampling??? */
|
cinfo->CCIR601_sampling = FALSE; /* Assume non-CCIR sampling??? */
|
||||||
|
|
||||||
cinfo->saw_JFIF_marker = FALSE;
|
cinfo->saw_JFIF_marker = FALSE;
|
||||||
cinfo->density_unit = 0; /* set default JFIF APP0 values */
|
cinfo->JFIF_major_version = 1; /* set default JFIF APP0 values */
|
||||||
|
cinfo->JFIF_minor_version = 1;
|
||||||
|
cinfo->density_unit = 0;
|
||||||
cinfo->X_density = 1;
|
cinfo->X_density = 1;
|
||||||
cinfo->Y_density = 1;
|
cinfo->Y_density = 1;
|
||||||
cinfo->saw_Adobe_marker = FALSE;
|
cinfo->saw_Adobe_marker = FALSE;
|
||||||
@@ -280,11 +313,11 @@ get_sos (j_decompress_ptr cinfo)
|
|||||||
|
|
||||||
INPUT_BYTE(cinfo, n, return FALSE); /* Number of components */
|
INPUT_BYTE(cinfo, n, return FALSE); /* Number of components */
|
||||||
|
|
||||||
|
TRACEMS1(cinfo, 1, JTRC_SOS, n);
|
||||||
|
|
||||||
if (length != (n * 2 + 6) || n < 1 || n > MAX_COMPS_IN_SCAN)
|
if (length != (n * 2 + 6) || n < 1 || n > MAX_COMPS_IN_SCAN)
|
||||||
ERREXIT(cinfo, JERR_BAD_LENGTH);
|
ERREXIT(cinfo, JERR_BAD_LENGTH);
|
||||||
|
|
||||||
TRACEMS1(cinfo, 1, JTRC_SOS, n);
|
|
||||||
|
|
||||||
cinfo->comps_in_scan = n;
|
cinfo->comps_in_scan = n;
|
||||||
|
|
||||||
/* Collect the component-spec parameters */
|
/* Collect the component-spec parameters */
|
||||||
@@ -334,111 +367,7 @@ get_sos (j_decompress_ptr cinfo)
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
METHODDEF(boolean)
|
#ifdef D_ARITH_CODING_SUPPORTED
|
||||||
get_app0 (j_decompress_ptr cinfo)
|
|
||||||
/* Process an APP0 marker */
|
|
||||||
{
|
|
||||||
#define JFIF_LEN 14
|
|
||||||
INT32 length;
|
|
||||||
UINT8 b[JFIF_LEN];
|
|
||||||
int buffp;
|
|
||||||
INPUT_VARS(cinfo);
|
|
||||||
|
|
||||||
INPUT_2BYTES(cinfo, length, return FALSE);
|
|
||||||
length -= 2;
|
|
||||||
|
|
||||||
/* See if a JFIF APP0 marker is present */
|
|
||||||
|
|
||||||
if (length >= JFIF_LEN) {
|
|
||||||
for (buffp = 0; buffp < JFIF_LEN; buffp++)
|
|
||||||
INPUT_BYTE(cinfo, b[buffp], return FALSE);
|
|
||||||
length -= JFIF_LEN;
|
|
||||||
|
|
||||||
if (b[0]==0x4A && b[1]==0x46 && b[2]==0x49 && b[3]==0x46 && b[4]==0) {
|
|
||||||
/* Found JFIF APP0 marker: check version */
|
|
||||||
/* Major version must be 1, anything else signals an incompatible change.
|
|
||||||
* We used to treat this as an error, but now it's a nonfatal warning,
|
|
||||||
* because some bozo at Hijaak couldn't read the spec.
|
|
||||||
* Minor version should be 0..2, but process anyway if newer.
|
|
||||||
*/
|
|
||||||
if (b[5] != 1)
|
|
||||||
WARNMS2(cinfo, JWRN_JFIF_MAJOR, b[5], b[6]);
|
|
||||||
else if (b[6] > 2)
|
|
||||||
TRACEMS2(cinfo, 1, JTRC_JFIF_MINOR, b[5], b[6]);
|
|
||||||
/* Save info */
|
|
||||||
cinfo->saw_JFIF_marker = TRUE;
|
|
||||||
cinfo->density_unit = b[7];
|
|
||||||
cinfo->X_density = (b[8] << 8) + b[9];
|
|
||||||
cinfo->Y_density = (b[10] << 8) + b[11];
|
|
||||||
TRACEMS3(cinfo, 1, JTRC_JFIF,
|
|
||||||
cinfo->X_density, cinfo->Y_density, cinfo->density_unit);
|
|
||||||
if (b[12] | b[13])
|
|
||||||
TRACEMS2(cinfo, 1, JTRC_JFIF_THUMBNAIL, b[12], b[13]);
|
|
||||||
if (length != ((INT32) b[12] * (INT32) b[13] * (INT32) 3))
|
|
||||||
TRACEMS1(cinfo, 1, JTRC_JFIF_BADTHUMBNAILSIZE, (int) length);
|
|
||||||
} else {
|
|
||||||
/* Start of APP0 does not match "JFIF" */
|
|
||||||
TRACEMS1(cinfo, 1, JTRC_APP0, (int) length + JFIF_LEN);
|
|
||||||
}
|
|
||||||
} else {
|
|
||||||
/* Too short to be JFIF marker */
|
|
||||||
TRACEMS1(cinfo, 1, JTRC_APP0, (int) length);
|
|
||||||
}
|
|
||||||
|
|
||||||
INPUT_SYNC(cinfo);
|
|
||||||
if (length > 0) /* skip any remaining data -- could be lots */
|
|
||||||
(*cinfo->src->skip_input_data) (cinfo, (long) length);
|
|
||||||
|
|
||||||
return TRUE;
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
METHODDEF(boolean)
|
|
||||||
get_app14 (j_decompress_ptr cinfo)
|
|
||||||
/* Process an APP14 marker */
|
|
||||||
{
|
|
||||||
#define ADOBE_LEN 12
|
|
||||||
INT32 length;
|
|
||||||
UINT8 b[ADOBE_LEN];
|
|
||||||
int buffp;
|
|
||||||
unsigned int version, flags0, flags1, transform;
|
|
||||||
INPUT_VARS(cinfo);
|
|
||||||
|
|
||||||
INPUT_2BYTES(cinfo, length, return FALSE);
|
|
||||||
length -= 2;
|
|
||||||
|
|
||||||
/* See if an Adobe APP14 marker is present */
|
|
||||||
|
|
||||||
if (length >= ADOBE_LEN) {
|
|
||||||
for (buffp = 0; buffp < ADOBE_LEN; buffp++)
|
|
||||||
INPUT_BYTE(cinfo, b[buffp], return FALSE);
|
|
||||||
length -= ADOBE_LEN;
|
|
||||||
|
|
||||||
if (b[0]==0x41 && b[1]==0x64 && b[2]==0x6F && b[3]==0x62 && b[4]==0x65) {
|
|
||||||
/* Found Adobe APP14 marker */
|
|
||||||
version = (b[5] << 8) + b[6];
|
|
||||||
flags0 = (b[7] << 8) + b[8];
|
|
||||||
flags1 = (b[9] << 8) + b[10];
|
|
||||||
transform = b[11];
|
|
||||||
TRACEMS4(cinfo, 1, JTRC_ADOBE, version, flags0, flags1, transform);
|
|
||||||
cinfo->saw_Adobe_marker = TRUE;
|
|
||||||
cinfo->Adobe_transform = (UINT8) transform;
|
|
||||||
} else {
|
|
||||||
/* Start of APP14 does not match "Adobe" */
|
|
||||||
TRACEMS1(cinfo, 1, JTRC_APP14, (int) length + ADOBE_LEN);
|
|
||||||
}
|
|
||||||
} else {
|
|
||||||
/* Too short to be Adobe marker */
|
|
||||||
TRACEMS1(cinfo, 1, JTRC_APP14, (int) length);
|
|
||||||
}
|
|
||||||
|
|
||||||
INPUT_SYNC(cinfo);
|
|
||||||
if (length > 0) /* skip any remaining data -- could be lots */
|
|
||||||
(*cinfo->src->skip_input_data) (cinfo, (long) length);
|
|
||||||
|
|
||||||
return TRUE;
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
LOCAL(boolean)
|
LOCAL(boolean)
|
||||||
get_dac (j_decompress_ptr cinfo)
|
get_dac (j_decompress_ptr cinfo)
|
||||||
@@ -472,10 +401,19 @@ get_dac (j_decompress_ptr cinfo)
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if (length != 0)
|
||||||
|
ERREXIT(cinfo, JERR_BAD_LENGTH);
|
||||||
|
|
||||||
INPUT_SYNC(cinfo);
|
INPUT_SYNC(cinfo);
|
||||||
return TRUE;
|
return TRUE;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
#else /* ! D_ARITH_CODING_SUPPORTED */
|
||||||
|
|
||||||
|
#define get_dac(cinfo) skip_variable(cinfo)
|
||||||
|
|
||||||
|
#endif /* D_ARITH_CODING_SUPPORTED */
|
||||||
|
|
||||||
|
|
||||||
LOCAL(boolean)
|
LOCAL(boolean)
|
||||||
get_dht (j_decompress_ptr cinfo)
|
get_dht (j_decompress_ptr cinfo)
|
||||||
@@ -491,7 +429,7 @@ get_dht (j_decompress_ptr cinfo)
|
|||||||
INPUT_2BYTES(cinfo, length, return FALSE);
|
INPUT_2BYTES(cinfo, length, return FALSE);
|
||||||
length -= 2;
|
length -= 2;
|
||||||
|
|
||||||
while (length > 0) {
|
while (length > 16) {
|
||||||
INPUT_BYTE(cinfo, index, return FALSE);
|
INPUT_BYTE(cinfo, index, return FALSE);
|
||||||
|
|
||||||
TRACEMS1(cinfo, 1, JTRC_DHT, index);
|
TRACEMS1(cinfo, 1, JTRC_DHT, index);
|
||||||
@@ -512,8 +450,11 @@ get_dht (j_decompress_ptr cinfo)
|
|||||||
bits[9], bits[10], bits[11], bits[12],
|
bits[9], bits[10], bits[11], bits[12],
|
||||||
bits[13], bits[14], bits[15], bits[16]);
|
bits[13], bits[14], bits[15], bits[16]);
|
||||||
|
|
||||||
|
/* Here we just do minimal validation of the counts to avoid walking
|
||||||
|
* off the end of our table space. jdhuff.c will check more carefully.
|
||||||
|
*/
|
||||||
if (count > 256 || ((INT32) count) > length)
|
if (count > 256 || ((INT32) count) > length)
|
||||||
ERREXIT(cinfo, JERR_DHT_COUNTS);
|
ERREXIT(cinfo, JERR_BAD_HUFF_TABLE);
|
||||||
|
|
||||||
for (i = 0; i < count; i++)
|
for (i = 0; i < count; i++)
|
||||||
INPUT_BYTE(cinfo, huffval[i], return FALSE);
|
INPUT_BYTE(cinfo, huffval[i], return FALSE);
|
||||||
@@ -537,6 +478,9 @@ get_dht (j_decompress_ptr cinfo)
|
|||||||
MEMCOPY((*htblptr)->huffval, huffval, SIZEOF((*htblptr)->huffval));
|
MEMCOPY((*htblptr)->huffval, huffval, SIZEOF((*htblptr)->huffval));
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if (length != 0)
|
||||||
|
ERREXIT(cinfo, JERR_BAD_LENGTH);
|
||||||
|
|
||||||
INPUT_SYNC(cinfo);
|
INPUT_SYNC(cinfo);
|
||||||
return TRUE;
|
return TRUE;
|
||||||
}
|
}
|
||||||
@@ -592,6 +536,9 @@ get_dqt (j_decompress_ptr cinfo)
|
|||||||
if (prec) length -= DCTSIZE2;
|
if (prec) length -= DCTSIZE2;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if (length != 0)
|
||||||
|
ERREXIT(cinfo, JERR_BAD_LENGTH);
|
||||||
|
|
||||||
INPUT_SYNC(cinfo);
|
INPUT_SYNC(cinfo);
|
||||||
return TRUE;
|
return TRUE;
|
||||||
}
|
}
|
||||||
@@ -621,6 +568,279 @@ get_dri (j_decompress_ptr cinfo)
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Routines for processing APPn and COM markers.
|
||||||
|
* These are either saved in memory or discarded, per application request.
|
||||||
|
* APP0 and APP14 are specially checked to see if they are
|
||||||
|
* JFIF and Adobe markers, respectively.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#define APP0_DATA_LEN 14 /* Length of interesting data in APP0 */
|
||||||
|
#define APP14_DATA_LEN 12 /* Length of interesting data in APP14 */
|
+#define APPN_DATA_LEN	14	/* Must be the largest of the above!! */
+
+
+LOCAL(void)
+examine_app0 (j_decompress_ptr cinfo, JOCTET FAR * data,
+	      unsigned int datalen, INT32 remaining)
+/* Examine first few bytes from an APP0.
+ * Take appropriate action if it is a JFIF marker.
+ * datalen is # of bytes at data[], remaining is length of rest of marker data.
+ */
+{
+  INT32 totallen = (INT32) datalen + remaining;
+
+  if (datalen >= APP0_DATA_LEN &&
+      GETJOCTET(data[0]) == 0x4A &&
+      GETJOCTET(data[1]) == 0x46 &&
+      GETJOCTET(data[2]) == 0x49 &&
+      GETJOCTET(data[3]) == 0x46 &&
+      GETJOCTET(data[4]) == 0) {
+    /* Found JFIF APP0 marker: save info */
+    cinfo->saw_JFIF_marker = TRUE;
+    cinfo->JFIF_major_version = GETJOCTET(data[5]);
+    cinfo->JFIF_minor_version = GETJOCTET(data[6]);
+    cinfo->density_unit = GETJOCTET(data[7]);
+    cinfo->X_density = (GETJOCTET(data[8]) << 8) + GETJOCTET(data[9]);
+    cinfo->Y_density = (GETJOCTET(data[10]) << 8) + GETJOCTET(data[11]);
+    /* Check version.
+     * Major version must be 1, anything else signals an incompatible change.
+     * (We used to treat this as an error, but now it's a nonfatal warning,
+     * because some bozo at Hijaak couldn't read the spec.)
+     * Minor version should be 0..2, but process anyway if newer.
+     */
+    if (cinfo->JFIF_major_version != 1)
+      WARNMS2(cinfo, JWRN_JFIF_MAJOR,
+	      cinfo->JFIF_major_version, cinfo->JFIF_minor_version);
+    /* Generate trace messages */
+    TRACEMS5(cinfo, 1, JTRC_JFIF,
+	     cinfo->JFIF_major_version, cinfo->JFIF_minor_version,
+	     cinfo->X_density, cinfo->Y_density, cinfo->density_unit);
+    /* Validate thumbnail dimensions and issue appropriate messages */
+    if (GETJOCTET(data[12]) | GETJOCTET(data[13]))
+      TRACEMS2(cinfo, 1, JTRC_JFIF_THUMBNAIL,
+	       GETJOCTET(data[12]), GETJOCTET(data[13]));
+    totallen -= APP0_DATA_LEN;
+    if (totallen !=
+	((INT32)GETJOCTET(data[12]) * (INT32)GETJOCTET(data[13]) * (INT32) 3))
+      TRACEMS1(cinfo, 1, JTRC_JFIF_BADTHUMBNAILSIZE, (int) totallen);
+  } else if (datalen >= 6 &&
+      GETJOCTET(data[0]) == 0x4A &&
+      GETJOCTET(data[1]) == 0x46 &&
+      GETJOCTET(data[2]) == 0x58 &&
+      GETJOCTET(data[3]) == 0x58 &&
+      GETJOCTET(data[4]) == 0) {
+    /* Found JFIF "JFXX" extension APP0 marker */
+    /* The library doesn't actually do anything with these,
+     * but we try to produce a helpful trace message.
+     */
+    switch (GETJOCTET(data[5])) {
+    case 0x10:
+      TRACEMS1(cinfo, 1, JTRC_THUMB_JPEG, (int) totallen);
+      break;
+    case 0x11:
+      TRACEMS1(cinfo, 1, JTRC_THUMB_PALETTE, (int) totallen);
+      break;
+    case 0x13:
+      TRACEMS1(cinfo, 1, JTRC_THUMB_RGB, (int) totallen);
+      break;
+    default:
+      TRACEMS2(cinfo, 1, JTRC_JFIF_EXTENSION,
+	       GETJOCTET(data[5]), (int) totallen);
+      break;
+    }
+  } else {
+    /* Start of APP0 does not match "JFIF" or "JFXX", or too short */
+    TRACEMS1(cinfo, 1, JTRC_APP0, (int) totallen);
+  }
+}
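The byte layout that examine_app0 tests can be tried outside the library. The following is a minimal standalone sketch, not library code: `struct jfif_header`, `parse_jfif_app0` and `JFIF_FIXED_LEN` are hypothetical names, plain `unsigned char` stands in for `JOCTET`/`GETJOCTET`, and the fixed length of 14 is assumed from the highest index (`data[13]`) read above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the fields examine_app0 stores in cinfo. */
struct jfif_header {
  unsigned int major_version;
  unsigned int minor_version;
  unsigned int density_unit;
  unsigned int x_density;
  unsigned int y_density;
  unsigned int thumb_width;
  unsigned int thumb_height;
};

/* Assumed fixed-part length: identifier (5) + version (2) + unit (1)
 * + densities (4) + thumbnail size (2).
 */
#define JFIF_FIXED_LEN 14

/* Return 1 and fill *out if data[] begins with "JFIF\0", else return 0. */
static int
parse_jfif_app0 (const unsigned char *data, size_t datalen,
                 struct jfif_header *out)
{
  assert(data != NULL && out != NULL);
  if (datalen < JFIF_FIXED_LEN ||
      data[0] != 0x4A || data[1] != 0x46 ||
      data[2] != 0x49 || data[3] != 0x46 || data[4] != 0)
    return 0;
  out->major_version = data[5];
  out->minor_version = data[6];
  out->density_unit = data[7];
  /* Densities are stored as 16-bit big-endian values. */
  out->x_density = ((unsigned int) data[8] << 8) + data[9];
  out->y_density = ((unsigned int) data[10] << 8) + data[11];
  out->thumb_width = data[12];
  out->thumb_height = data[13];
  return 1;
}
```

A caller would pass the first bytes following the two-byte marker length word, which is what get_interesting_appn collects into its local buffer.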
+
+
+LOCAL(void)
+examine_app14 (j_decompress_ptr cinfo, JOCTET FAR * data,
+	       unsigned int datalen, INT32 remaining)
+/* Examine first few bytes from an APP14.
+ * Take appropriate action if it is an Adobe marker.
+ * datalen is # of bytes at data[], remaining is length of rest of marker data.
+ */
+{
+  unsigned int version, flags0, flags1, transform;
+
+  if (datalen >= APP14_DATA_LEN &&
+      GETJOCTET(data[0]) == 0x41 &&
+      GETJOCTET(data[1]) == 0x64 &&
+      GETJOCTET(data[2]) == 0x6F &&
+      GETJOCTET(data[3]) == 0x62 &&
+      GETJOCTET(data[4]) == 0x65) {
+    /* Found Adobe APP14 marker */
+    version = (GETJOCTET(data[5]) << 8) + GETJOCTET(data[6]);
+    flags0 = (GETJOCTET(data[7]) << 8) + GETJOCTET(data[8]);
+    flags1 = (GETJOCTET(data[9]) << 8) + GETJOCTET(data[10]);
+    transform = GETJOCTET(data[11]);
+    TRACEMS4(cinfo, 1, JTRC_ADOBE, version, flags0, flags1, transform);
+    cinfo->saw_Adobe_marker = TRUE;
+    cinfo->Adobe_transform = (UINT8) transform;
+  } else {
+    /* Start of APP14 does not match "Adobe", or too short */
+    TRACEMS1(cinfo, 1, JTRC_APP14, (int) (datalen + remaining));
+  }
+}
+
+
+METHODDEF(boolean)
+get_interesting_appn (j_decompress_ptr cinfo)
+/* Process an APP0 or APP14 marker without saving it */
+{
+  INT32 length;
+  JOCTET b[APPN_DATA_LEN];
+  unsigned int i, numtoread;
+  INPUT_VARS(cinfo);
+
+  INPUT_2BYTES(cinfo, length, return FALSE);
+  length -= 2;
+
+  /* get the interesting part of the marker data */
+  if (length >= APPN_DATA_LEN)
+    numtoread = APPN_DATA_LEN;
+  else if (length > 0)
+    numtoread = (unsigned int) length;
+  else
+    numtoread = 0;
+  for (i = 0; i < numtoread; i++)
+    INPUT_BYTE(cinfo, b[i], return FALSE);
+  length -= numtoread;
+
+  /* process it */
+  switch (cinfo->unread_marker) {
+  case M_APP0:
+    examine_app0(cinfo, (JOCTET FAR *) b, numtoread, length);
+    break;
+  case M_APP14:
+    examine_app14(cinfo, (JOCTET FAR *) b, numtoread, length);
+    break;
+  default:
+    /* can't get here unless jpeg_save_markers chooses wrong processor */
+    ERREXIT1(cinfo, JERR_UNKNOWN_MARKER, cinfo->unread_marker);
+    break;
+  }
+
+  /* skip any remaining data -- could be lots */
+  INPUT_SYNC(cinfo);
+  if (length > 0)
+    (*cinfo->src->skip_input_data) (cinfo, (long) length);
+
+  return TRUE;
+}
+
+
+#ifdef SAVE_MARKERS_SUPPORTED
+
+METHODDEF(boolean)
+save_marker (j_decompress_ptr cinfo)
+/* Save an APPn or COM marker into the marker list */
+{
+  my_marker_ptr marker = (my_marker_ptr) cinfo->marker;
+  jpeg_saved_marker_ptr cur_marker = marker->cur_marker;
+  unsigned int bytes_read, data_length;
+  JOCTET FAR * data;
+  INT32 length = 0;
+  INPUT_VARS(cinfo);
+
+  if (cur_marker == NULL) {
+    /* begin reading a marker */
+    INPUT_2BYTES(cinfo, length, return FALSE);
+    length -= 2;
+    if (length >= 0) {		/* watch out for bogus length word */
+      /* figure out how much we want to save */
+      unsigned int limit;
+      if (cinfo->unread_marker == (int) M_COM)
+	limit = marker->length_limit_COM;
+      else
+	limit = marker->length_limit_APPn[cinfo->unread_marker - (int) M_APP0];
+      if ((unsigned int) length < limit)
+	limit = (unsigned int) length;
+      /* allocate and initialize the marker item */
+      cur_marker = (jpeg_saved_marker_ptr)
+	(*cinfo->mem->alloc_large) ((j_common_ptr) cinfo, JPOOL_IMAGE,
+				    SIZEOF(struct jpeg_marker_struct) + limit);
+      cur_marker->next = NULL;
+      cur_marker->marker = (UINT8) cinfo->unread_marker;
+      cur_marker->original_length = (unsigned int) length;
+      cur_marker->data_length = limit;
+      /* data area is just beyond the jpeg_marker_struct */
+      data = cur_marker->data = (JOCTET FAR *) (cur_marker + 1);
+      marker->cur_marker = cur_marker;
+      marker->bytes_read = 0;
+      bytes_read = 0;
+      data_length = limit;
+    } else {
+      /* deal with bogus length word */
+      bytes_read = data_length = 0;
+      data = NULL;
+    }
+  } else {
+    /* resume reading a marker */
+    bytes_read = marker->bytes_read;
+    data_length = cur_marker->data_length;
+    data = cur_marker->data + bytes_read;
+  }
+
+  while (bytes_read < data_length) {
+    INPUT_SYNC(cinfo);		/* move the restart point to here */
+    marker->bytes_read = bytes_read;
+    /* If there's not at least one byte in buffer, suspend */
+    MAKE_BYTE_AVAIL(cinfo, return FALSE);
+    /* Copy bytes with reasonable rapidity */
+    while (bytes_read < data_length && bytes_in_buffer > 0) {
+      *data++ = *next_input_byte++;
+      bytes_in_buffer--;
+      bytes_read++;
+    }
+  }
+
+  /* Done reading what we want to read */
+  if (cur_marker != NULL) {	/* will be NULL if bogus length word */
+    /* Add new marker to end of list */
+    if (cinfo->marker_list == NULL) {
+      cinfo->marker_list = cur_marker;
+    } else {
+      jpeg_saved_marker_ptr prev = cinfo->marker_list;
+      while (prev->next != NULL)
+	prev = prev->next;
+      prev->next = cur_marker;
+    }
+    /* Reset pointer & calc remaining data length */
+    data = cur_marker->data;
+    length = cur_marker->original_length - data_length;
+  }
+  /* Reset to initial state for next marker */
+  marker->cur_marker = NULL;
+
+  /* Process the marker if interesting; else just make a generic trace msg */
+  switch (cinfo->unread_marker) {
+  case M_APP0:
+    examine_app0(cinfo, data, data_length, length);
+    break;
+  case M_APP14:
+    examine_app14(cinfo, data, data_length, length);
+    break;
+  default:
+    TRACEMS2(cinfo, 1, JTRC_MISC_MARKER, cinfo->unread_marker,
+	     (int) (data_length + length));
+    break;
+  }
+
+  /* skip any remaining data -- could be lots */
+  INPUT_SYNC(cinfo);		/* do before skip_input_data */
+  if (length > 0)
+    (*cinfo->src->skip_input_data) (cinfo, (long) length);
+
+  return TRUE;
+}
+
+#endif /* SAVE_MARKERS_SUPPORTED */
+
+
 METHODDEF(boolean)
 skip_variable (j_decompress_ptr cinfo)
 /* Skip over an unknown or uninteresting variable-length marker */
@@ -629,11 +849,13 @@ skip_variable (j_decompress_ptr cinfo)
   INPUT_VARS(cinfo);
 
   INPUT_2BYTES(cinfo, length, return FALSE);
+  length -= 2;
 
   TRACEMS2(cinfo, 1, JTRC_MISC_MARKER, cinfo->unread_marker, (int) length);
 
   INPUT_SYNC(cinfo);		/* do before skip_input_data */
-  (*cinfo->src->skip_input_data) (cinfo, (long) length - 2L);
+  if (length > 0)
+    (*cinfo->src->skip_input_data) (cinfo, (long) length);
 
   return TRUE;
 }
@@ -833,12 +1055,13 @@ read_markers (j_decompress_ptr cinfo)
     case M_APP13:
     case M_APP14:
     case M_APP15:
-      if (! (*cinfo->marker->process_APPn[cinfo->unread_marker - (int) M_APP0]) (cinfo))
+      if (! (*((my_marker_ptr) cinfo->marker)->process_APPn[
+		cinfo->unread_marker - (int) M_APP0]) (cinfo))
 	return JPEG_SUSPENDED;
       break;
 
     case M_COM:
-      if (! (*cinfo->marker->process_COM) (cinfo))
+      if (! (*((my_marker_ptr) cinfo->marker)->process_COM) (cinfo))
 	return JPEG_SUSPENDED;
       break;
 
@@ -1018,12 +1241,15 @@ jpeg_resync_to_restart (j_decompress_ptr cinfo, int desired)
 METHODDEF(void)
 reset_marker_reader (j_decompress_ptr cinfo)
 {
+  my_marker_ptr marker = (my_marker_ptr) cinfo->marker;
+
   cinfo->comp_info = NULL;		/* until allocated by get_sof */
   cinfo->input_scan_number = 0;		/* no SOS seen yet */
   cinfo->unread_marker = 0;		/* no pending marker */
-  cinfo->marker->saw_SOI = FALSE;	/* set internal state too */
-  cinfo->marker->saw_SOF = FALSE;
-  cinfo->marker->discarded_bytes = 0;
+  marker->pub.saw_SOI = FALSE;		/* set internal state too */
+  marker->pub.saw_SOF = FALSE;
+  marker->pub.discarded_bytes = 0;
+  marker->cur_marker = NULL;
 }
 
 
@@ -1035,21 +1261,100 @@ reset_marker_reader (j_decompress_ptr cinfo)
 GLOBAL(void)
 jinit_marker_reader (j_decompress_ptr cinfo)
 {
+  my_marker_ptr marker;
   int i;
 
   /* Create subobject in permanent pool */
-  cinfo->marker = (struct jpeg_marker_reader *)
+  marker = (my_marker_ptr)
     (*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_PERMANENT,
-				SIZEOF(struct jpeg_marker_reader));
-  /* Initialize method pointers */
-  cinfo->marker->reset_marker_reader = reset_marker_reader;
-  cinfo->marker->read_markers = read_markers;
-  cinfo->marker->read_restart_marker = read_restart_marker;
-  cinfo->marker->process_COM = skip_variable;
-  for (i = 0; i < 16; i++)
-    cinfo->marker->process_APPn[i] = skip_variable;
-  cinfo->marker->process_APPn[0] = get_app0;
-  cinfo->marker->process_APPn[14] = get_app14;
+				SIZEOF(my_marker_reader));
+  cinfo->marker = (struct jpeg_marker_reader *) marker;
+  /* Initialize public method pointers */
+  marker->pub.reset_marker_reader = reset_marker_reader;
+  marker->pub.read_markers = read_markers;
+  marker->pub.read_restart_marker = read_restart_marker;
+  /* Initialize COM/APPn processing.
+   * By default, we examine and then discard APP0 and APP14,
+   * but simply discard COM and all other APPn.
+   */
+  marker->process_COM = skip_variable;
+  marker->length_limit_COM = 0;
+  for (i = 0; i < 16; i++) {
+    marker->process_APPn[i] = skip_variable;
+    marker->length_limit_APPn[i] = 0;
+  }
+  marker->process_APPn[0] = get_interesting_appn;
+  marker->process_APPn[14] = get_interesting_appn;
   /* Reset marker processing state */
   reset_marker_reader(cinfo);
 }
+
+
+/*
+ * Control saving of COM and APPn markers into marker_list.
+ */
+
+#ifdef SAVE_MARKERS_SUPPORTED
+
+GLOBAL(void)
+jpeg_save_markers (j_decompress_ptr cinfo, int marker_code,
+		   unsigned int length_limit)
+{
+  my_marker_ptr marker = (my_marker_ptr) cinfo->marker;
+  long maxlength;
+  jpeg_marker_parser_method processor;
+
+  /* Length limit mustn't be larger than what we can allocate
+   * (should only be a concern in a 16-bit environment).
+   */
+  maxlength = cinfo->mem->max_alloc_chunk - SIZEOF(struct jpeg_marker_struct);
+  if (((long) length_limit) > maxlength)
+    length_limit = (unsigned int) maxlength;
+
+  /* Choose processor routine to use.
+   * APP0/APP14 have special requirements.
+   */
+  if (length_limit) {
+    processor = save_marker;
+    /* If saving APP0/APP14, save at least enough for our internal use. */
+    if (marker_code == (int) M_APP0 && length_limit < APP0_DATA_LEN)
+      length_limit = APP0_DATA_LEN;
+    else if (marker_code == (int) M_APP14 && length_limit < APP14_DATA_LEN)
+      length_limit = APP14_DATA_LEN;
+  } else {
+    processor = skip_variable;
+    /* If discarding APP0/APP14, use our regular on-the-fly processor. */
+    if (marker_code == (int) M_APP0 || marker_code == (int) M_APP14)
+      processor = get_interesting_appn;
+  }
+
+  if (marker_code == (int) M_COM) {
+    marker->process_COM = processor;
+    marker->length_limit_COM = length_limit;
+  } else if (marker_code >= (int) M_APP0 && marker_code <= (int) M_APP15) {
+    marker->process_APPn[marker_code - (int) M_APP0] = processor;
+    marker->length_limit_APPn[marker_code - (int) M_APP0] = length_limit;
+  } else
+    ERREXIT1(cinfo, JERR_UNKNOWN_MARKER, marker_code);
+}
+
+#endif /* SAVE_MARKERS_SUPPORTED */
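The way jpeg_save_markers picks a processor and adjusts the length limit can be restated as a small pure function. This is a sketch under stated assumptions, not the library's code: `choose_marker_policy`, `struct marker_policy` and the `PROC_*` names are hypothetical, the minimum lengths 14 and 12 are assumed values for `APP0_DATA_LEN` and `APP14_DATA_LEN` (inferred from the highest indices read by examine_app0 and examine_app14), and the `max_alloc_chunk` clamp is left out.

```c
#include <assert.h>

#define APP0_CODE      0xE0  /* JPEG marker code of APP0 */
#define APP14_CODE     0xEE  /* JPEG marker code of APP14 */
#define APP15_CODE     0xEF  /* JPEG marker code of APP15 */
#define COM_CODE       0xFE  /* JPEG marker code of COM */
#define APP0_MIN_LEN   14    /* assumed value of APP0_DATA_LEN */
#define APP14_MIN_LEN  12    /* assumed value of APP14_DATA_LEN */

/* Stand-ins for skip_variable, get_interesting_appn and save_marker. */
enum processor_kind { PROC_SKIP, PROC_EXAMINE, PROC_SAVE };

struct marker_policy {
  enum processor_kind processor;
  unsigned int length_limit;
};

static struct marker_policy
choose_marker_policy (int marker_code, unsigned int length_limit)
{
  struct marker_policy p;
  int is_app0 = (marker_code == APP0_CODE);
  int is_app14 = (marker_code == APP14_CODE);

  assert(marker_code == COM_CODE ||
         (marker_code >= APP0_CODE && marker_code <= APP15_CODE));
  if (length_limit != 0) {
    /* Saving: keep at least the bytes the library inspects itself. */
    p.processor = PROC_SAVE;
    if (is_app0 && length_limit < APP0_MIN_LEN)
      length_limit = APP0_MIN_LEN;
    else if (is_app14 && length_limit < APP14_MIN_LEN)
      length_limit = APP14_MIN_LEN;
  } else {
    /* Discarding: APP0/APP14 are still examined on the fly. */
    p.processor = (is_app0 || is_app14) ? PROC_EXAMINE : PROC_SKIP;
  }
  p.length_limit = length_limit;
  return p;
}
```

The point of the raised minimum is that save_marker later hands the saved bytes to examine_app0 or examine_app14, so a smaller limit would make those routines treat a valid marker as too short.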
+
+
+/*
+ * Install a special processing method for COM or APPn markers.
+ */
+
+GLOBAL(void)
+jpeg_set_marker_processor (j_decompress_ptr cinfo, int marker_code,
+			   jpeg_marker_parser_method routine)
+{
+  my_marker_ptr marker = (my_marker_ptr) cinfo->marker;
+
+  if (marker_code == (int) M_COM)
+    marker->process_COM = routine;
+  else if (marker_code >= (int) M_APP0 && marker_code <= (int) M_APP15)
+    marker->process_APPn[marker_code - (int) M_APP0] = routine;
+  else
+    ERREXIT1(cinfo, JERR_UNKNOWN_MARKER, marker_code);
+}
@@ -1,7 +1,7 @@
 /*
  * jdmaster.c
  *
- * Copyright (C) 1991-1996, Thomas G. Lane.
+ * Copyright (C) 1991-1997, Thomas G. Lane.
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
@@ -84,8 +84,10 @@ GLOBAL(void)
 jpeg_calc_output_dimensions (j_decompress_ptr cinfo)
 /* Do computations that are needed before master selection phase */
 {
+#ifdef IDCT_SCALING_SUPPORTED
   int ci;
   jpeg_component_info *compptr;
+#endif
 
   /* Prevent application from calling me at wrong times */
   if (cinfo->global_state != DSTATE_READY)
@@ -429,7 +431,7 @@ master_selection (j_decompress_ptr cinfo)
  * modules will be active during this pass and give them appropriate
  * start_pass calls.  We also set is_dummy_pass to indicate whether this
  * is a "real" output pass or a dummy pass for color quantization.
- * (In the latter case, jdapi.c will crank the pass to completion.)
+ * (In the latter case, jdapistd.c will crank the pass to completion.)
  */
 
 METHODDEF(void)
105
jdmerge.c
@@ -5,6 +5,13 @@
  * This file is part of the Independent JPEG Group's software.
  * For conditions of distribution and use, see the accompanying README file.
  *
+ * ---------------------------------------------------------------------
+ * x86 SIMD extension for IJG JPEG library
+ * Copyright (C) 1999-2006, MIYASAKA Masaru.
+ * This file has been modified for SIMD extension.
+ * Last Modified : January 5, 2006
+ * ---------------------------------------------------------------------
+ *
  * This file contains code for merged upsampling/color conversion.
  *
  * This file combines functions from jdsample.c and jdcolor.c;
@@ -35,6 +42,7 @@
 #define JPEG_INTERNALS
 #include "jinclude.h"
 #include "jpeglib.h"
+#include "jcolsamp.h"		/* Private declarations */
 
 #ifdef UPSAMPLE_MERGING_SUPPORTED
 
@@ -218,6 +226,17 @@ merged_1v_upsample (j_decompress_ptr cinfo,
  */
 
 
+#if RGB_PIXELSIZE == 4
+/* offset of filler byte */
+#define RGB_FILLER  (6 - (RGB_RED) - (RGB_GREEN) - (RGB_BLUE))
+/* byte pattern to fill with */
+#ifdef RGBX_FILLER_0XFF
+#define RGB_FILLER_BYTE  0xFF
+#else
+#define RGB_FILLER_BYTE  0x00
+#endif
+#endif /* RGB_PIXELSIZE == 4 */
+
 /*
  * Upsample and color convert for the case of 2:1 horizontal and 1:1 vertical.
  */
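The `RGB_FILLER` definition in the hunk above relies on a small piece of arithmetic: the four byte offsets in a 4-byte pixel are 0, 1, 2 and 3, which sum to 6, so subtracting the three color offsets leaves the one unused offset. A minimal sketch of that computation follows; `rgb_filler_offset` is a hypothetical helper written for illustration and does not appear in the patch.

```c
#include <assert.h>

/* Offset of the unused byte in a 4-byte pixel, given the offsets of the
 * red, green and blue bytes (three distinct values in 0..3).
 */
static int
rgb_filler_offset (int red, int green, int blue)
{
  assert(red >= 0 && red <= 3);
  assert(green >= 0 && green <= 3);
  assert(blue >= 0 && blue <= 3);
  assert(red != green && green != blue && red != blue);
  return 6 - red - green - blue;	/* 0 + 1 + 2 + 3 == 6 */
}
```

For an RGBX layout (red 0, green 1, blue 2) this yields offset 3, and for an XRGB layout (red 1, green 2, blue 3) it yields offset 0, so the same macro covers any channel order without a lookup table.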
@@ -258,11 +277,17 @@ h2v1_merged_upsample (j_decompress_ptr cinfo,
     outptr[RGB_RED] = range_limit[y + cred];
     outptr[RGB_GREEN] = range_limit[y + cgreen];
     outptr[RGB_BLUE] = range_limit[y + cblue];
+#if RGB_PIXELSIZE == 4
+    outptr[RGB_FILLER] = RGB_FILLER_BYTE;
+#endif
     outptr += RGB_PIXELSIZE;
     y = GETJSAMPLE(*inptr0++);
     outptr[RGB_RED] = range_limit[y + cred];
     outptr[RGB_GREEN] = range_limit[y + cgreen];
     outptr[RGB_BLUE] = range_limit[y + cblue];
+#if RGB_PIXELSIZE == 4
+    outptr[RGB_FILLER] = RGB_FILLER_BYTE;
+#endif
     outptr += RGB_PIXELSIZE;
   }
   /* If image width is odd, do the last output column separately */
@@ -276,6 +301,9 @@ h2v1_merged_upsample (j_decompress_ptr cinfo,
     outptr[RGB_RED] = range_limit[y + cred];
     outptr[RGB_GREEN] = range_limit[y + cgreen];
     outptr[RGB_BLUE] = range_limit[y + cblue];
+#if RGB_PIXELSIZE == 4
+    outptr[RGB_FILLER] = RGB_FILLER_BYTE;
+#endif
   }
 }
 
@@ -322,21 +350,33 @@ h2v2_merged_upsample (j_decompress_ptr cinfo,
     outptr0[RGB_RED] = range_limit[y + cred];
     outptr0[RGB_GREEN] = range_limit[y + cgreen];
     outptr0[RGB_BLUE] = range_limit[y + cblue];
+#if RGB_PIXELSIZE == 4
+    outptr0[RGB_FILLER] = RGB_FILLER_BYTE;
+#endif
     outptr0 += RGB_PIXELSIZE;
     y = GETJSAMPLE(*inptr00++);
     outptr0[RGB_RED] = range_limit[y + cred];
     outptr0[RGB_GREEN] = range_limit[y + cgreen];
     outptr0[RGB_BLUE] = range_limit[y + cblue];
+#if RGB_PIXELSIZE == 4
+    outptr0[RGB_FILLER] = RGB_FILLER_BYTE;
+#endif
     outptr0 += RGB_PIXELSIZE;
     y = GETJSAMPLE(*inptr01++);
     outptr1[RGB_RED] = range_limit[y + cred];
     outptr1[RGB_GREEN] = range_limit[y + cgreen];
     outptr1[RGB_BLUE] = range_limit[y + cblue];
+#if RGB_PIXELSIZE == 4
+    outptr1[RGB_FILLER] = RGB_FILLER_BYTE;
+#endif
     outptr1 += RGB_PIXELSIZE;
     y = GETJSAMPLE(*inptr01++);
     outptr1[RGB_RED] = range_limit[y + cred];
     outptr1[RGB_GREEN] = range_limit[y + cgreen];
     outptr1[RGB_BLUE] = range_limit[y + cblue];
+#if RGB_PIXELSIZE == 4
+    outptr1[RGB_FILLER] = RGB_FILLER_BYTE;
+#endif
     outptr1 += RGB_PIXELSIZE;
   }
   /* If image width is odd, do the last output column separately */
@@ -350,10 +390,16 @@ h2v2_merged_upsample (j_decompress_ptr cinfo,
     outptr0[RGB_RED] = range_limit[y + cred];
     outptr0[RGB_GREEN] = range_limit[y + cgreen];
     outptr0[RGB_BLUE] = range_limit[y + cblue];
+#if RGB_PIXELSIZE == 4
+    outptr0[RGB_FILLER] = RGB_FILLER_BYTE;
+#endif
     y = GETJSAMPLE(*inptr01);
     outptr1[RGB_RED] = range_limit[y + cred];
     outptr1[RGB_GREEN] = range_limit[y + cgreen];
     outptr1[RGB_BLUE] = range_limit[y + cblue];
+#if RGB_PIXELSIZE == 4
+    outptr1[RGB_FILLER] = RGB_FILLER_BYTE;
+#endif
   }
 }
 
@@ -370,6 +416,7 @@ GLOBAL(void)
 jinit_merged_upsampler (j_decompress_ptr cinfo)
 {
   my_upsample_ptr upsample;
+  unsigned int simd = jpeg_simd_support((j_common_ptr) cinfo);
 
   upsample = (my_upsample_ptr)
     (*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,
@@ -382,19 +429,73 @@ jinit_merged_upsampler (j_decompress_ptr cinfo)
 
   if (cinfo->max_v_samp_factor == 2) {
     upsample->pub.upsample = merged_2v_upsample;
+#if RGB_PIXELSIZE == 3 || RGB_PIXELSIZE == 4
+#ifdef JDMERGE_SSE2_SUPPORTED
+    if (simd & JSIMD_SSE2 &&
+        IS_CONST_ALIGNED_16(jconst_merged_upsample_sse2)) {
+      upsample->upmethod = jpeg_h2v2_merged_upsample_sse2;
+    } else
+#endif
+#ifdef JDMERGE_MMX_SUPPORTED
+    if (simd & JSIMD_MMX) {
+      upsample->upmethod = jpeg_h2v2_merged_upsample_mmx;
+    } else
+#endif
+#endif /* RGB_PIXELSIZE == 3 || RGB_PIXELSIZE == 4 */
+    {
     upsample->upmethod = h2v2_merged_upsample;
+      build_ycc_rgb_table(cinfo);
+    }
     /* Allocate a spare row buffer */
     upsample->spare_row = (JSAMPROW)
       (*cinfo->mem->alloc_large) ((j_common_ptr) cinfo, JPOOL_IMAGE,
 		(size_t) (upsample->out_row_width * SIZEOF(JSAMPLE)));
   } else {
     upsample->pub.upsample = merged_1v_upsample;
+#if RGB_PIXELSIZE == 3 || RGB_PIXELSIZE == 4
+#ifdef JDMERGE_SSE2_SUPPORTED
+    if (simd & JSIMD_SSE2 &&
+        IS_CONST_ALIGNED_16(jconst_merged_upsample_sse2)) {
+      upsample->upmethod = jpeg_h2v1_merged_upsample_sse2;
+    } else
+#endif
+#ifdef JDMERGE_MMX_SUPPORTED
+    if (simd & JSIMD_MMX) {
+      upsample->upmethod = jpeg_h2v1_merged_upsample_mmx;
+    } else
+#endif
+#endif /* RGB_PIXELSIZE == 3 || RGB_PIXELSIZE == 4 */
+    {
     upsample->upmethod = h2v1_merged_upsample;
+      build_ycc_rgb_table(cinfo);
+    }
     /* No spare row needed */
     upsample->spare_row = NULL;
   }
-
-  build_ycc_rgb_table(cinfo);
 }
 
+
+#ifndef JSIMD_MODEINFO_NOT_SUPPORTED
+
+GLOBAL(unsigned int)
+jpeg_simd_merged_upsampler (j_decompress_ptr cinfo)
+{
+  unsigned int simd = jpeg_simd_support((j_common_ptr) cinfo);
+
+#if RGB_PIXELSIZE == 3 || RGB_PIXELSIZE == 4
+#ifdef JDMERGE_SSE2_SUPPORTED
+  if (simd & JSIMD_SSE2 &&
+      IS_CONST_ALIGNED_16(jconst_merged_upsample_sse2))
+    return JSIMD_SSE2;
+#endif
+#ifdef JDMERGE_MMX_SUPPORTED
+  if (simd & JSIMD_MMX)
+    return JSIMD_MMX;
+#endif
+#endif /* RGB_PIXELSIZE == 3 || RGB_PIXELSIZE == 4 */
+
+  return JSIMD_NONE;
+}
+
+#endif /* !JSIMD_MODEINFO_NOT_SUPPORTED */
 #endif /* UPSAMPLE_MERGING_SUPPORTED */
981
jdmermmx.asm
Normal file
@@ -0,0 +1,981 @@
+;
+; jdmermmx.asm - merged upsampling/color conversion (MMX)
+;
+; x86 SIMD extension for IJG JPEG library
+; Copyright (C) 1999-2006, MIYASAKA Masaru.
+; For conditions of distribution and use, see copyright notice in jsimdext.inc
+;
+; This file should be assembled with NASM (Netwide Assembler),
+; can *not* be assembled with Microsoft's MASM or any compatible
+; assembler (including Borland's Turbo Assembler).
+; NASM is available from http://nasm.sourceforge.net/ or
+; http://sourceforge.net/project/showfiles.php?group_id=6208
+;
+; Last Modified : February 4, 2006
+;
+; [TAB8]
+
+%include "jsimdext.inc"
+%include "jcolsamp.inc"
+
+%if RGB_PIXELSIZE == 3 || RGB_PIXELSIZE == 4
+%ifdef UPSAMPLE_MERGING_SUPPORTED
+%ifdef JDMERGE_MMX_SUPPORTED
+
+; --------------------------------------------------------------------------
+
+%define SCALEBITS	16
+
+F_0_344	equ	 22554			; FIX(0.34414)
+F_0_714	equ	 46802			; FIX(0.71414)
+F_1_402	equ	 91881			; FIX(1.40200)
+F_1_772	equ	116130			; FIX(1.77200)
+F_0_402	equ	(F_1_402 - 65536)	; FIX(1.40200) - FIX(1)
+F_0_285	equ	( 65536 - F_0_714)	; FIX(1) - FIX(0.71414)
+F_0_228	equ	(131072 - F_1_772)	; FIX(2) - FIX(1.77200)
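The constants above are the YCbCr-to-RGB coefficients scaled by 2^SCALEBITS and rounded. The sketch below reproduces them in C; `fix` and `fits_in_int16` are hypothetical helpers, not part of the patch. The reason offered here for the derived constants (F_0_402, F_0_285, F_0_228) is an assumption rather than something the source states: they appear to be formed by subtracting a whole multiple of 65536 so that each multiplier fits in a signed 16-bit word, which matches their use in the packed-word (`dw`) tables that follow.

```c
#include <assert.h>

#define SCALEBITS  16
#define ONE_FIXED  (1L << SCALEBITS)	/* FIX(1) == 65536 */

/* Convert a non-negative coefficient to fixed point, rounding to nearest. */
static long
fix (double x)
{
  assert(x >= 0.0);
  return (long) (x * ONE_FIXED + 0.5);
}

/* Return 1 if v is representable as a signed 16-bit word, else 0. */
static int
fits_in_int16 (long v)
{
  return v >= -32768L && v <= 32767L;
}
```

With these helpers, `fix(1.40200) - ONE_FIXED` gives the value of F_0_402, and `fits_in_int16` shows that the full-scale constant would not fit in a word while the derived one does.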
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_CONST
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
global EXTN(jconst_merged_upsample_mmx)
|
||||||
|
|
||||||
|
EXTN(jconst_merged_upsample_mmx):
|
||||||
|
|
||||||
|
PW_F0402 times 4 dw F_0_402
|
||||||
|
PW_MF0228 times 4 dw -F_0_228
|
||||||
|
PW_MF0344_F0285 times 2 dw -F_0_344, F_0_285
|
||||||
|
PW_ONE times 4 dw 1
|
||||||
|
PD_ONEHALF times 2 dd 1 << (SCALEBITS-1)
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Upsample and color convert for the case of 2:1 horizontal and 1:1 vertical.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_h2v1_merged_upsample_mmx (j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
|
||||||
|
; JDIMENSION in_row_group_ctr,
|
||||||
|
; JSAMPARRAY output_buf);
|
||||||
|
;
|
||||||
|
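For orientation, here is a scalar C sketch of what "2:1 horizontal, 1:1 vertical" merged upsampling means for one row: every chroma sample is reused for two horizontally adjacent luma samples, and colour conversion happens in the same pass. This is not the MMX routine's structure (which works on 8 luma samples at a time). The function name `h2v1_merged_row_ref`, the flat row pointers, the 3-byte R,G,B output order and the helpers `asr` and `clamp_u8` are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Arithmetic (sign-preserving) right shift, written portably. */
static inline int32_t asr(int32_t v, int n)
{
    return v < 0 ? ~((~v) >> n) : v >> n;
}

/* Saturate to the 0..255 sample range. */
static inline uint8_t clamp_u8(int32_t v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical scalar reference: y has `width` samples, cb and cr have
 * (width + 1) / 2 samples, out receives 3 * width bytes in R,G,B order. */
static inline void h2v1_merged_row_ref(const uint8_t *y, const uint8_t *cb,
                                       const uint8_t *cr, uint8_t *out,
                                       size_t width)
{
    for (size_t x = 0; x < width; x++) {
        size_t c = x / 2;                 /* pixels 2k and 2k+1 share chroma k */
        int32_t b = (int32_t)cb[c] - 128; /* centre chroma around zero */
        int32_t r = (int32_t)cr[c] - 128;

        /* 16.16 fixed-point forms of 1.402, 0.34414, 0.71414 and 1.772 */
        int32_t r_y = asr(91881 * r + 32768, 16);
        int32_t g_y = asr(-22554 * b - 46802 * r + 32768, 16);
        int32_t b_y = asr(116130 * b + 32768, 16);

        out[3 * x + 0] = clamp_u8(y[x] + r_y);
        out[3 * x + 1] = clamp_u8(y[x] + g_y);
        out[3 * x + 2] = clamp_u8(y[x] + b_y);
    }
}
```

Because the chroma terms depend only on `c`, an optimized version can compute them once per chroma sample and add them to both luma samples, which is the saving that merging the two steps is after.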
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define input_buf(b) (b)+12 ; JSAMPIMAGE input_buf
|
||||||
|
%define in_row_group_ctr(b) (b)+16 ; JDIMENSION in_row_group_ctr
|
||||||
|
%define output_buf(b) (b)+20 ; JSAMPARRAY output_buf
|
||||||
|
|
||||||
|
%define original_ebp ebp+0
|
||||||
|
%define wk(i) ebp-(WK_NUM-(i))*SIZEOF_MMWORD ; mmword wk[WK_NUM]
|
||||||
|
%define WK_NUM 3
|
||||||
|
%define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_h2v1_merged_upsample_mmx)
|
||||||
|
|
||||||
|
EXTN(jpeg_h2v1_merged_upsample_mmx):
|
||||||
|
push ebp
|
||||||
|
mov eax,esp ; eax = original ebp
|
||||||
|
sub esp, byte 4
|
||||||
|
and esp, byte (-SIZEOF_MMWORD) ; align to 64 bits
|
||||||
|
mov [esp],eax
|
||||||
|
mov ebp,esp ; ebp = aligned ebp
|
||||||
|
lea esp, [wk(0)]
|
||||||
|
pushpic eax ; make room for the GOT address
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
movpic POINTER [gotptr], ebx ; save GOT address
|
||||||
|
|
||||||
|
mov ecx, POINTER [cinfo(eax)]
|
||||||
|
mov ecx, JDIMENSION [jdstruct_output_width(ecx)] ; col
|
||||||
|
test ecx,ecx
|
||||||
|
jz near .return
|
||||||
|
|
||||||
|
push ecx
|
||||||
|
|
||||||
|
mov edi, JSAMPIMAGE [input_buf(eax)]
|
||||||
|
mov ecx, JDIMENSION [in_row_group_ctr(eax)]
|
||||||
|
mov esi, JSAMPARRAY [edi+0*SIZEOF_JSAMPARRAY]
|
||||||
|
mov ebx, JSAMPARRAY [edi+1*SIZEOF_JSAMPARRAY]
|
||||||
|
mov edx, JSAMPARRAY [edi+2*SIZEOF_JSAMPARRAY]
|
||||||
|
mov edi, JSAMPARRAY [output_buf(eax)]
|
||||||
|
mov esi, JSAMPROW [esi+ecx*SIZEOF_JSAMPROW] ; inptr0
|
||||||
|
mov ebx, JSAMPROW [ebx+ecx*SIZEOF_JSAMPROW] ; inptr1
|
||||||
|
mov edx, JSAMPROW [edx+ecx*SIZEOF_JSAMPROW] ; inptr2
|
||||||
|
mov edi, JSAMPROW [edi] ; outptr
|
||||||
|
|
||||||
|
pop ecx ; col
|
||||||
|
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
movpic eax, POINTER [gotptr] ; load GOT address (eax)
|
||||||
|
|
||||||
|
movq mm6, MMWORD [ebx] ; mm6=Cb(01234567)
|
||||||
|
movq mm7, MMWORD [edx] ; mm7=Cr(01234567)
|
||||||
|
|
||||||
|
pxor mm1,mm1 ; mm1=(all 0's)
|
||||||
|
pcmpeqw mm3,mm3
|
||||||
|
psllw mm3,7 ; mm3={0xFF80 0xFF80 0xFF80 0xFF80}
|
||||||
|
|
||||||
|
movq mm4,mm6
|
||||||
|
punpckhbw mm6,mm1 ; mm6=Cb(4567)=CbH
|
||||||
|
punpcklbw mm4,mm1 ; mm4=Cb(0123)=CbL
|
||||||
|
movq mm0,mm7
|
||||||
|
punpckhbw mm7,mm1 ; mm7=Cr(4567)=CrH
|
||||||
|
punpcklbw mm0,mm1 ; mm0=Cr(0123)=CrL
|
||||||
|
|
||||||
|
paddw mm6,mm3
|
||||||
|
paddw mm4,mm3
|
||||||
|
paddw mm7,mm3
|
||||||
|
paddw mm0,mm3
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; R = Y + 1.40200 * Cr
|
||||||
|
; G = Y - 0.34414 * Cb - 0.71414 * Cr
|
||||||
|
; B = Y + 1.77200 * Cb
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; R = Y + 0.40200 * Cr + Cr
|
||||||
|
; G = Y - 0.34414 * Cb + 0.28586 * Cr - Cr
|
||||||
|
; B = Y - 0.22800 * Cb + Cb + Cb
|
||||||
|
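The rewritten equations in the comment above can be modelled per pixel in plain C. The sketch below follows the same decomposition (a small fractional multiply plus whole copies of Cb or Cr) and the same rounding steps as the instructions that follow, under the assumption that pmulhw keeps the high 16 bits of a signed product and packuswb saturates to 0..255. The names `ycc_to_rgb_model`, `asr` and `clamp_u8` are hypothetical, and the constants are the equ values defined earlier in this file.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic (sign-preserving) right shift, written portably. */
static inline int32_t asr(int32_t v, int n)
{
    return v < 0 ? ~((~v) >> n) : v >> n;
}

/* Saturate to 0..255, as packuswb does for each signed word. */
static inline uint8_t clamp_u8(int32_t v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Scalar model of one pixel (hypothetical helper, not part of the library). */
static inline void ycc_to_rgb_model(uint8_t y, uint8_t cb_in, uint8_t cr_in,
                                    uint8_t rgb[3])
{
    int32_t cb = (int32_t)cb_in - 128;  /* paddw with 0xFF80 words */
    int32_t cr = (int32_t)cr_in - 128;

    /* R - Y = 0.40200 * Cr + Cr: pmulhw on 2*Cr, then +1 and >>1 to round */
    int32_t r_y = asr(asr(2 * cr * 26345, 16) + 1, 1) + cr;

    /* G - Y = -0.34414 * Cb + 0.28586 * Cr - Cr: pmaddwd, +ONEHALF, >>16 */
    int32_t g_y = asr(cb * -22554 + cr * 18734 + (1 << 15), 16) - cr;

    /* B - Y = -0.22800 * Cb + Cb + Cb: pmulhw on 2*Cb, then +1 and >>1 */
    int32_t b_y = asr(asr(2 * cb * -14942, 16) + 1, 1) + cb + cb;

    rgb[0] = clamp_u8(y + r_y);
    rgb[1] = clamp_u8(y + g_y);
    rgb[2] = clamp_u8(y + b_y);
}
```

With this model, (Y, Cb, Cr) = (76, 85, 255) yields (254, 0, 0) and neutral chroma (128, 128) leaves R = G = B = Y.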
|
||||||
|
movq mm5,mm6 ; mm5=CbH
|
||||||
|
movq mm2,mm4 ; mm2=CbL
|
||||||
|
paddw mm6,mm6 ; mm6=2*CbH
|
||||||
|
paddw mm4,mm4 ; mm4=2*CbL
|
||||||
|
movq mm1,mm7 ; mm1=CrH
|
||||||
|
movq mm3,mm0 ; mm3=CrL
|
||||||
|
paddw mm7,mm7 ; mm7=2*CrH
|
||||||
|
paddw mm0,mm0 ; mm0=2*CrL
|
||||||
|
|
||||||
|
pmulhw mm6,[GOTOFF(eax,PW_MF0228)] ; mm6=(2*CbH * -FIX(0.22800))
|
||||||
|
pmulhw mm4,[GOTOFF(eax,PW_MF0228)] ; mm4=(2*CbL * -FIX(0.22800))
|
||||||
|
pmulhw mm7,[GOTOFF(eax,PW_F0402)] ; mm7=(2*CrH * FIX(0.40200))
|
||||||
|
pmulhw mm0,[GOTOFF(eax,PW_F0402)] ; mm0=(2*CrL * FIX(0.40200))
|
||||||
|
|
||||||
|
paddw mm6,[GOTOFF(eax,PW_ONE)]
|
||||||
|
paddw mm4,[GOTOFF(eax,PW_ONE)]
|
||||||
|
psraw mm6,1 ; mm6=(CbH * -FIX(0.22800))
|
||||||
|
psraw mm4,1 ; mm4=(CbL * -FIX(0.22800))
|
||||||
|
paddw mm7,[GOTOFF(eax,PW_ONE)]
|
||||||
|
paddw mm0,[GOTOFF(eax,PW_ONE)]
|
||||||
|
psraw mm7,1 ; mm7=(CrH * FIX(0.40200))
|
||||||
|
psraw mm0,1 ; mm0=(CrL * FIX(0.40200))
|
||||||
|
|
||||||
|
paddw mm6,mm5
|
||||||
|
paddw mm4,mm2
|
||||||
|
paddw mm6,mm5 ; mm6=(CbH * FIX(1.77200))=(B-Y)H
|
||||||
|
paddw mm4,mm2 ; mm4=(CbL * FIX(1.77200))=(B-Y)L
|
||||||
|
paddw mm7,mm1 ; mm7=(CrH * FIX(1.40200))=(R-Y)H
|
||||||
|
paddw mm0,mm3 ; mm0=(CrL * FIX(1.40200))=(R-Y)L
|
||||||
|
|
||||||
|
movq MMWORD [wk(0)], mm6 ; wk(0)=(B-Y)H
|
||||||
|
movq MMWORD [wk(1)], mm7 ; wk(1)=(R-Y)H
|
||||||
|
|
||||||
|
movq mm6,mm5
|
||||||
|
movq mm7,mm2
|
||||||
|
punpcklwd mm5,mm1 ; mm5=(Cb4 Cr4 Cb5 Cr5)
|
||||||
|
punpckhwd mm6,mm1 ; mm6=(Cb6 Cr6 Cb7 Cr7)
|
||||||
|
pmaddwd mm5,[GOTOFF(eax,PW_MF0344_F0285)]
|
||||||
|
pmaddwd mm6,[GOTOFF(eax,PW_MF0344_F0285)]
|
||||||
|
punpcklwd mm2,mm3 ; mm2=(Cb0 Cr0 Cb1 Cr1)
|
||||||
|
punpckhwd mm7,mm3 ; mm7=(Cb2 Cr2 Cb3 Cr3)
|
||||||
|
pmaddwd mm2,[GOTOFF(eax,PW_MF0344_F0285)]
|
||||||
|
pmaddwd mm7,[GOTOFF(eax,PW_MF0344_F0285)]
|
||||||
|
|
||||||
|
paddd mm5,[GOTOFF(eax,PD_ONEHALF)]
|
||||||
|
paddd mm6,[GOTOFF(eax,PD_ONEHALF)]
|
||||||
|
psrad mm5,SCALEBITS
|
||||||
|
psrad mm6,SCALEBITS
|
||||||
|
paddd mm2,[GOTOFF(eax,PD_ONEHALF)]
|
||||||
|
paddd mm7,[GOTOFF(eax,PD_ONEHALF)]
|
||||||
|
psrad mm2,SCALEBITS
|
||||||
|
psrad mm7,SCALEBITS
|
||||||
|
|
||||||
|
packssdw mm5,mm6 ; mm5=CbH*-FIX(0.344)+CrH*FIX(0.285)
|
||||||
|
packssdw mm2,mm7 ; mm2=CbL*-FIX(0.344)+CrL*FIX(0.285)
|
||||||
|
psubw mm5,mm1 ; mm5=CbH*-FIX(0.344)+CrH*-FIX(0.714)=(G-Y)H
|
||||||
|
psubw mm2,mm3 ; mm2=CbL*-FIX(0.344)+CrL*-FIX(0.714)=(G-Y)L
|
||||||
|
|
||||||
|
movq MMWORD [wk(2)], mm5 ; wk(2)=(G-Y)H
|
||||||
|
|
||||||
|
mov al,2 ; Yctr
|
||||||
|
jmp short .Yloop_1st
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.Yloop_2nd:
|
||||||
|
movq mm0, MMWORD [wk(1)] ; mm0=(R-Y)H
|
||||||
|
movq mm2, MMWORD [wk(2)] ; mm2=(G-Y)H
|
||||||
|
movq mm4, MMWORD [wk(0)] ; mm4=(B-Y)H
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.Yloop_1st:
|
||||||
|
movq mm7, MMWORD [esi] ; mm7=Y(01234567)
|
||||||
|
|
||||||
|
pcmpeqw mm6,mm6
|
||||||
|
psrlw mm6,BYTE_BIT ; mm6={0xFF 0x00 0xFF 0x00 ..}
|
||||||
|
pand mm6,mm7 ; mm6=Y(0246)=YE
|
||||||
|
psrlw mm7,BYTE_BIT ; mm7=Y(1357)=YO
|
||||||
|
|
||||||
|
movq mm1,mm0 ; mm1=mm0=(R-Y)(L/H)
|
||||||
|
movq mm3,mm2 ; mm3=mm2=(G-Y)(L/H)
|
||||||
|
movq mm5,mm4 ; mm5=mm4=(B-Y)(L/H)
|
||||||
|
|
||||||
|
paddw mm0,mm6 ; mm0=((R-Y)+YE)=RE=(R0 R2 R4 R6)
|
||||||
|
paddw mm1,mm7 ; mm1=((R-Y)+YO)=RO=(R1 R3 R5 R7)
|
||||||
|
packuswb mm0,mm0 ; mm0=(R0 R2 R4 R6 ** ** ** **)
|
||||||
|
packuswb mm1,mm1 ; mm1=(R1 R3 R5 R7 ** ** ** **)
|
||||||
|
|
||||||
|
paddw mm2,mm6 ; mm2=((G-Y)+YE)=GE=(G0 G2 G4 G6)
|
||||||
|
paddw mm3,mm7 ; mm3=((G-Y)+YO)=GO=(G1 G3 G5 G7)
|
||||||
|
packuswb mm2,mm2 ; mm2=(G0 G2 G4 G6 ** ** ** **)
|
||||||
|
packuswb mm3,mm3 ; mm3=(G1 G3 G5 G7 ** ** ** **)
|
||||||
|
|
||||||
|
paddw mm4,mm6 ; mm4=((B-Y)+YE)=BE=(B0 B2 B4 B6)
|
||||||
|
paddw mm5,mm7 ; mm5=((B-Y)+YO)=BO=(B1 B3 B5 B7)
|
||||||
|
packuswb mm4,mm4 ; mm4=(B0 B2 B4 B6 ** ** ** **)
|
||||||
|
packuswb mm5,mm5 ; mm5=(B1 B3 B5 B7 ** ** ** **)
|
||||||
|
|
||||||
|
%if RGB_PIXELSIZE == 3 ; ---------------
|
||||||
|
|
||||||
|
; mmA=(00 02 04 06 ** ** ** **), mmB=(01 03 05 07 ** ** ** **)
|
||||||
|
; mmC=(10 12 14 16 ** ** ** **), mmD=(11 13 15 17 ** ** ** **)
|
||||||
|
; mmE=(20 22 24 26 ** ** ** **), mmF=(21 23 25 27 ** ** ** **)
|
||||||
|
; mmG=(** ** ** ** ** ** ** **), mmH=(** ** ** ** ** ** ** **)
|
||||||
|
|
||||||
|
punpcklbw mmA,mmC ; mmA=(00 10 02 12 04 14 06 16)
|
||||||
|
punpcklbw mmE,mmB ; mmE=(20 01 22 03 24 05 26 07)
|
||||||
|
punpcklbw mmD,mmF ; mmD=(11 21 13 23 15 25 17 27)
|
||||||
|
|
||||||
|
movq mmG,mmA
|
||||||
|
movq mmH,mmA
|
||||||
|
punpcklwd mmA,mmE ; mmA=(00 10 20 01 02 12 22 03)
|
||||||
|
punpckhwd mmG,mmE ; mmG=(04 14 24 05 06 16 26 07)
|
||||||
|
|
||||||
|
psrlq mmH,2*BYTE_BIT ; mmH=(02 12 04 14 06 16 -- --)
|
||||||
|
psrlq mmE,2*BYTE_BIT ; mmE=(22 03 24 05 26 07 -- --)
|
||||||
|
|
||||||
|
movq mmC,mmD
|
||||||
|
movq mmB,mmD
|
||||||
|
punpcklwd mmD,mmH ; mmD=(11 21 02 12 13 23 04 14)
|
||||||
|
punpckhwd mmC,mmH ; mmC=(15 25 06 16 17 27 -- --)
|
||||||
|
|
||||||
|
psrlq mmB,2*BYTE_BIT ; mmB=(13 23 15 25 17 27 -- --)
|
||||||
|
|
||||||
|
movq mmF,mmE
|
||||||
|
punpcklwd mmE,mmB ; mmE=(22 03 13 23 24 05 15 25)
|
||||||
|
punpckhwd mmF,mmB ; mmF=(26 07 17 27 -- -- -- --)
|
||||||
|
|
||||||
|
punpckldq mmA,mmD ; mmA=(00 10 20 01 11 21 02 12)
|
||||||
|
punpckldq mmE,mmG ; mmE=(22 03 13 23 04 14 24 05)
|
||||||
|
punpckldq mmC,mmF ; mmC=(15 25 06 16 26 07 17 27)
|
||||||
|
|
||||||
|
cmp ecx, byte SIZEOF_MMWORD
|
||||||
|
jb short .column_st16
|
||||||
|
|
||||||
|
movq MMWORD [edi+0*SIZEOF_MMWORD], mmA
|
||||||
|
movq MMWORD [edi+1*SIZEOF_MMWORD], mmE
|
||||||
|
movq MMWORD [edi+2*SIZEOF_MMWORD], mmC
|
||||||
|
|
||||||
|
sub ecx, byte SIZEOF_MMWORD
|
||||||
|
jz short .endcolumn
|
||||||
|
|
||||||
|
add edi, byte RGB_PIXELSIZE*SIZEOF_MMWORD ; outptr
|
||||||
|
add esi, byte SIZEOF_MMWORD ; inptr0
|
||||||
|
dec al ; Yctr
|
||||||
|
jnz near .Yloop_2nd
|
||||||
|
|
||||||
|
add ebx, byte SIZEOF_MMWORD ; inptr1
|
||||||
|
add edx, byte SIZEOF_MMWORD ; inptr2
|
||||||
|
jmp near .columnloop
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.column_st16:
|
||||||
|
lea ecx, [ecx+ecx*2] ; imul ecx, RGB_PIXELSIZE
|
||||||
|
cmp ecx, byte 2*SIZEOF_MMWORD
|
||||||
|
jb short .column_st8
|
||||||
|
movq MMWORD [edi+0*SIZEOF_MMWORD], mmA
|
||||||
|
movq MMWORD [edi+1*SIZEOF_MMWORD], mmE
|
||||||
|
movq mmA,mmC
|
||||||
|
sub ecx, byte 2*SIZEOF_MMWORD
|
||||||
|
add edi, byte 2*SIZEOF_MMWORD
|
||||||
|
jmp short .column_st4
|
||||||
|
.column_st8:
|
||||||
|
cmp ecx, byte SIZEOF_MMWORD
|
||||||
|
jb short .column_st4
|
||||||
|
movq MMWORD [edi+0*SIZEOF_MMWORD], mmA
|
||||||
|
movq mmA,mmE
|
||||||
|
sub ecx, byte SIZEOF_MMWORD
|
||||||
|
add edi, byte SIZEOF_MMWORD
|
||||||
|
.column_st4:
|
||||||
|
movd eax,mmA
|
||||||
|
cmp ecx, byte SIZEOF_DWORD
|
||||||
|
jb short .column_st2
|
||||||
|
mov DWORD [edi+0*SIZEOF_DWORD], eax
|
||||||
|
psrlq mmA,DWORD_BIT
|
||||||
|
movd eax,mmA
|
||||||
|
sub ecx, byte SIZEOF_DWORD
|
||||||
|
add edi, byte SIZEOF_DWORD
|
||||||
|
.column_st2:
|
||||||
|
cmp ecx, byte SIZEOF_WORD
|
||||||
|
jb short .column_st1
|
||||||
|
mov WORD [edi+0*SIZEOF_WORD], ax
|
||||||
|
shr eax,WORD_BIT
|
||||||
|
sub ecx, byte SIZEOF_WORD
|
||||||
|
add edi, byte SIZEOF_WORD
|
||||||
|
.column_st1:
|
||||||
|
cmp ecx, byte SIZEOF_BYTE
|
||||||
|
jb short .endcolumn
|
||||||
|
mov BYTE [edi+0*SIZEOF_BYTE], al
|
||||||
|
|
||||||
|
%else ; RGB_PIXELSIZE == 4 ; -----------
|
||||||
|
|
||||||
|
%ifdef RGBX_FILLER_0XFF
|
||||||
|
pcmpeqb mm6,mm6 ; mm6=(X0 X2 X4 X6 ** ** ** **)
|
||||||
|
pcmpeqb mm7,mm7 ; mm7=(X1 X3 X5 X7 ** ** ** **)
|
||||||
|
%else
|
||||||
|
pxor mm6,mm6 ; mm6=(X0 X2 X4 X6 ** ** ** **)
|
||||||
|
pxor mm7,mm7 ; mm7=(X1 X3 X5 X7 ** ** ** **)
|
||||||
|
%endif
|
||||||
|
; mmA=(00 02 04 06 ** ** ** **), mmB=(01 03 05 07 ** ** ** **)
|
||||||
|
; mmC=(10 12 14 16 ** ** ** **), mmD=(11 13 15 17 ** ** ** **)
|
||||||
|
; mmE=(20 22 24 26 ** ** ** **), mmF=(21 23 25 27 ** ** ** **)
|
||||||
|
; mmG=(30 32 34 36 ** ** ** **), mmH=(31 33 35 37 ** ** ** **)
|
||||||
|
|
||||||
|
punpcklbw mmA,mmC ; mmA=(00 10 02 12 04 14 06 16)
|
||||||
|
punpcklbw mmE,mmG ; mmE=(20 30 22 32 24 34 26 36)
|
||||||
|
punpcklbw mmB,mmD ; mmB=(01 11 03 13 05 15 07 17)
|
||||||
|
punpcklbw mmF,mmH ; mmF=(21 31 23 33 25 35 27 37)
|
||||||
|
|
||||||
|
movq mmC,mmA
|
||||||
|
punpcklwd mmA,mmE ; mmA=(00 10 20 30 02 12 22 32)
|
||||||
|
punpckhwd mmC,mmE ; mmC=(04 14 24 34 06 16 26 36)
|
||||||
|
movq mmG,mmB
|
||||||
|
punpcklwd mmB,mmF ; mmB=(01 11 21 31 03 13 23 33)
|
||||||
|
punpckhwd mmG,mmF ; mmG=(05 15 25 35 07 17 27 37)
|
||||||
|
|
||||||
|
movq mmD,mmA
|
||||||
|
punpckldq mmA,mmB ; mmA=(00 10 20 30 01 11 21 31)
|
||||||
|
punpckhdq mmD,mmB ; mmD=(02 12 22 32 03 13 23 33)
|
||||||
|
movq mmH,mmC
|
||||||
|
punpckldq mmC,mmG ; mmC=(04 14 24 34 05 15 25 35)
|
||||||
|
punpckhdq mmH,mmG ; mmH=(06 16 26 36 07 17 27 37)
|
||||||
|
|
||||||
|
cmp ecx, byte SIZEOF_MMWORD
|
||||||
|
jb short .column_st16
|
||||||
|
|
||||||
|
movq MMWORD [edi+0*SIZEOF_MMWORD], mmA
|
||||||
|
movq MMWORD [edi+1*SIZEOF_MMWORD], mmD
|
||||||
|
movq MMWORD [edi+2*SIZEOF_MMWORD], mmC
|
||||||
|
movq MMWORD [edi+3*SIZEOF_MMWORD], mmH
|
||||||
|
|
||||||
|
sub ecx, byte SIZEOF_MMWORD
|
||||||
|
jz short .endcolumn
|
||||||
|
|
||||||
|
add edi, byte RGB_PIXELSIZE*SIZEOF_MMWORD ; outptr
|
||||||
|
add esi, byte SIZEOF_MMWORD ; inptr0
|
||||||
|
dec al ; Yctr
|
||||||
|
jnz near .Yloop_2nd
|
||||||
|
|
||||||
|
add ebx, byte SIZEOF_MMWORD ; inptr1
|
||||||
|
add edx, byte SIZEOF_MMWORD ; inptr2
|
||||||
|
jmp near .columnloop
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.column_st16:
|
||||||
|
cmp ecx, byte SIZEOF_MMWORD/2
|
||||||
|
jb short .column_st8
|
||||||
|
movq MMWORD [edi+0*SIZEOF_MMWORD], mmA
|
||||||
|
movq MMWORD [edi+1*SIZEOF_MMWORD], mmD
|
||||||
|
movq mmA,mmC
|
||||||
|
movq mmD,mmH
|
||||||
|
sub ecx, byte SIZEOF_MMWORD/2
|
||||||
|
add edi, byte 2*SIZEOF_MMWORD
|
||||||
|
.column_st8:
|
||||||
|
cmp ecx, byte SIZEOF_MMWORD/4
|
||||||
|
jb short .column_st4
|
||||||
|
movq MMWORD [edi+0*SIZEOF_MMWORD], mmA
|
||||||
|
movq mmA,mmD
|
||||||
|
sub ecx, byte SIZEOF_MMWORD/4
|
||||||
|
add edi, byte 1*SIZEOF_MMWORD
|
||||||
|
.column_st4:
|
||||||
|
cmp ecx, byte SIZEOF_MMWORD/8
|
||||||
|
jb short .endcolumn
|
||||||
|
movd DWORD [edi+0*SIZEOF_DWORD], mmA
|
||||||
|
|
||||||
|
%endif ; RGB_PIXELSIZE ; ---------------
|
||||||
|
|
||||||
|
.endcolumn:
|
||||||
|
emms ; empty MMX state
|
||||||
|
|
||||||
|
.return:
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
mov esp,ebp ; esp <- aligned ebp
|
||||||
|
pop esp ; esp <- original ebp
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%ifndef USE_DEDICATED_H2V2_MERGED_UPSAMPLE_MMX
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
;
|
||||||
|
; Upsample and color convert for the case of 2:1 horizontal and 2:1 vertical.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_h2v2_merged_upsample_mmx (j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
|
||||||
|
; JDIMENSION in_row_group_ctr,
|
||||||
|
; JSAMPARRAY output_buf);
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define input_buf(b) (b)+12 ; JSAMPIMAGE input_buf
|
||||||
|
%define in_row_group_ctr(b) (b)+16 ; JDIMENSION in_row_group_ctr
|
||||||
|
%define output_buf(b) (b)+20 ; JSAMPARRAY output_buf
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_h2v2_merged_upsample_mmx)
|
||||||
|
|
||||||
|
EXTN(jpeg_h2v2_merged_upsample_mmx):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
mov eax, POINTER [cinfo(ebp)]
|
||||||
|
|
||||||
|
mov edi, JSAMPIMAGE [input_buf(ebp)]
|
||||||
|
mov ecx, JDIMENSION [in_row_group_ctr(ebp)]
|
||||||
|
mov esi, JSAMPARRAY [edi+0*SIZEOF_JSAMPARRAY]
|
||||||
|
mov ebx, JSAMPARRAY [edi+1*SIZEOF_JSAMPARRAY]
|
||||||
|
mov edx, JSAMPARRAY [edi+2*SIZEOF_JSAMPARRAY]
|
||||||
|
mov edi, JSAMPARRAY [output_buf(ebp)]
|
||||||
|
lea esi, [esi+ecx*SIZEOF_JSAMPROW]
|
||||||
|
|
||||||
|
push edx ; inptr2
|
||||||
|
push ebx ; inptr1
|
||||||
|
push esi ; inptr00
|
||||||
|
mov ebx,esp
|
||||||
|
|
||||||
|
push edi ; output_buf (outptr0)
|
||||||
|
push ecx ; in_row_group_ctr
|
||||||
|
push ebx ; input_buf
|
||||||
|
push eax ; cinfo
|
||||||
|
|
||||||
|
call near EXTN(jpeg_h2v1_merged_upsample_mmx)
|
||||||
|
|
||||||
|
add esi, byte SIZEOF_JSAMPROW ; inptr01
|
||||||
|
add edi, byte SIZEOF_JSAMPROW ; outptr1
|
||||||
|
mov POINTER [ebx+0*SIZEOF_POINTER], esi
|
||||||
|
mov POINTER [ebx-1*SIZEOF_POINTER], edi
|
||||||
|
|
||||||
|
call near EXTN(jpeg_h2v1_merged_upsample_mmx)
|
||||||
|
|
||||||
|
add esp, byte 7*SIZEOF_DWORD
|
||||||
|
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%else ; USE_DEDICATED_H2V2_MERGED_UPSAMPLE_MMX
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
;
|
||||||
|
; Upsample and color convert for the case of 2:1 horizontal and 2:1 vertical.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_h2v2_merged_upsample_mmx (j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
|
||||||
|
; JDIMENSION in_row_group_ctr,
|
||||||
|
; JSAMPARRAY output_buf);
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define input_buf(b) (b)+12 ; JSAMPIMAGE input_buf
|
||||||
|
%define in_row_group_ctr(b) (b)+16 ; JDIMENSION in_row_group_ctr
|
||||||
|
%define output_buf(b) (b)+20 ; JSAMPARRAY output_buf
|
||||||
|
|
||||||
|
%define original_ebp ebp+0
|
||||||
|
%define wk(i) ebp-(WK_NUM-(i))*SIZEOF_MMWORD ; mmword wk[WK_NUM]
|
||||||
|
%define WK_NUM 10
|
||||||
|
%define inptr1 wk(0)-SIZEOF_JSAMPROW ; JSAMPROW inptr1
|
||||||
|
%define inptr2 inptr1-SIZEOF_JSAMPROW ; JSAMPROW inptr2
|
||||||
|
%define gotptr inptr2-SIZEOF_POINTER ; void * gotptr
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_h2v2_merged_upsample_mmx)
|
||||||
|
|
||||||
|
EXTN(jpeg_h2v2_merged_upsample_mmx):
|
||||||
|
push ebp
|
||||||
|
mov eax,esp ; eax = original ebp
|
||||||
|
sub esp, byte 4
|
||||||
|
and esp, byte (-SIZEOF_MMWORD) ; align to 64 bits
|
||||||
|
mov [esp],eax
|
||||||
|
mov ebp,esp ; ebp = aligned ebp
|
||||||
|
lea esp, [inptr2]
|
||||||
|
pushpic eax ; make room for the GOT address
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
movpic POINTER [gotptr], ebx ; save GOT address
|
||||||
|
|
||||||
|
mov ecx, POINTER [cinfo(eax)]
|
||||||
|
mov ecx, JDIMENSION [jdstruct_output_width(ecx)] ; col
|
||||||
|
test ecx,ecx
|
||||||
|
jz near .return
|
||||||
|
|
||||||
|
push ecx
|
||||||
|
|
||||||
|
mov edi, JSAMPIMAGE [input_buf(eax)]
|
||||||
|
mov ecx, JDIMENSION [in_row_group_ctr(eax)]
|
||||||
|
mov esi, JSAMPARRAY [edi+0*SIZEOF_JSAMPARRAY]
|
||||||
|
mov ebx, JSAMPARRAY [edi+1*SIZEOF_JSAMPARRAY]
|
||||||
|
mov edx, JSAMPARRAY [edi+2*SIZEOF_JSAMPARRAY]
|
||||||
|
mov edi, JSAMPARRAY [output_buf(eax)]
|
||||||
|
mov eax, JSAMPROW [esi+(ecx*2+0)*SIZEOF_JSAMPROW] ; inptr00
|
||||||
|
mov esi, JSAMPROW [esi+(ecx*2+1)*SIZEOF_JSAMPROW] ; inptr01
|
||||||
|
mov ebx, JSAMPROW [ebx+ecx*SIZEOF_JSAMPROW] ; inptr1
|
||||||
|
mov edx, JSAMPROW [edx+ecx*SIZEOF_JSAMPROW] ; inptr2
|
||||||
|
|
||||||
|
pop ecx ; col
|
||||||
|
push eax ; inptr00
|
||||||
|
push esi ; inptr01
|
||||||
|
|
||||||
|
mov esi, JSAMPROW [edi+0*SIZEOF_JSAMPROW] ; outptr0
|
||||||
|
mov edi, JSAMPROW [edi+1*SIZEOF_JSAMPROW] ; outptr1
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
movpic eax, POINTER [gotptr] ; load GOT address (eax)
|
||||||
|
|
||||||
|
movq mm6, MMWORD [ebx] ; mm6=Cb(01234567)
|
||||||
|
movq mm7, MMWORD [edx] ; mm7=Cr(01234567)
|
||||||
|
|
||||||
|
mov JSAMPROW [inptr1], ebx ; inptr1
|
||||||
|
mov JSAMPROW [inptr2], edx ; inptr2
|
||||||
|
pop edx ; edx=inptr01
|
||||||
|
pop ebx ; ebx=inptr00
|
||||||
|
|
||||||
|
pxor mm1,mm1 ; mm1=(all 0's)
|
||||||
|
pcmpeqw mm3,mm3
|
||||||
|
psllw mm3,7 ; mm3={0xFF80 0xFF80 0xFF80 0xFF80}
|
||||||
|
|
||||||
|
movq mm4,mm6
|
||||||
|
punpckhbw mm6,mm1 ; mm6=Cb(4567)=CbH
|
||||||
|
punpcklbw mm4,mm1 ; mm4=Cb(0123)=CbL
|
||||||
|
movq mm0,mm7
|
||||||
|
punpckhbw mm7,mm1 ; mm7=Cr(4567)=CrH
|
||||||
|
punpcklbw mm0,mm1 ; mm0=Cr(0123)=CrL
|
||||||
|
|
||||||
|
paddw mm6,mm3
|
||||||
|
paddw mm4,mm3
|
||||||
|
paddw mm7,mm3
|
||||||
|
paddw mm0,mm3
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; R = Y + 1.40200 * Cr
|
||||||
|
; G = Y - 0.34414 * Cb - 0.71414 * Cr
|
||||||
|
; B = Y + 1.77200 * Cb
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; R = Y + 0.40200 * Cr + Cr
|
||||||
|
; G = Y - 0.34414 * Cb + 0.28586 * Cr - Cr
|
||||||
|
; B = Y - 0.22800 * Cb + Cb + Cb
|
||||||
|
|
||||||
|
movq mm5,mm6 ; mm5=CbH
|
||||||
|
movq mm2,mm4 ; mm2=CbL
|
||||||
|
paddw mm6,mm6 ; mm6=2*CbH
|
||||||
|
paddw mm4,mm4 ; mm4=2*CbL
|
||||||
|
movq mm1,mm7 ; mm1=CrH
|
||||||
|
movq mm3,mm0 ; mm3=CrL
|
||||||
|
paddw mm7,mm7 ; mm7=2*CrH
|
||||||
|
paddw mm0,mm0 ; mm0=2*CrL
|
||||||
|
|
||||||
|
pmulhw mm6,[GOTOFF(eax,PW_MF0228)] ; mm6=(2*CbH * -FIX(0.22800))
|
||||||
|
pmulhw mm4,[GOTOFF(eax,PW_MF0228)] ; mm4=(2*CbL * -FIX(0.22800))
|
||||||
|
pmulhw mm7,[GOTOFF(eax,PW_F0402)] ; mm7=(2*CrH * FIX(0.40200))
|
||||||
|
pmulhw mm0,[GOTOFF(eax,PW_F0402)] ; mm0=(2*CrL * FIX(0.40200))
|
||||||
|
|
||||||
|
paddw mm6,[GOTOFF(eax,PW_ONE)]
|
||||||
|
paddw mm4,[GOTOFF(eax,PW_ONE)]
|
||||||
|
psraw mm6,1 ; mm6=(CbH * -FIX(0.22800))
|
||||||
|
psraw mm4,1 ; mm4=(CbL * -FIX(0.22800))
|
||||||
|
paddw mm7,[GOTOFF(eax,PW_ONE)]
|
||||||
|
paddw mm0,[GOTOFF(eax,PW_ONE)]
|
||||||
|
psraw mm7,1 ; mm7=(CrH * FIX(0.40200))
|
||||||
|
psraw mm0,1 ; mm0=(CrL * FIX(0.40200))
|
||||||
|
|
||||||
|
paddw mm6,mm5
|
||||||
|
paddw mm4,mm2
|
||||||
|
paddw mm6,mm5 ; mm6=(CbH * FIX(1.77200))=(B-Y)H
|
||||||
|
paddw mm4,mm2 ; mm4=(CbL * FIX(1.77200))=(B-Y)L
|
||||||
|
paddw mm7,mm1 ; mm7=(CrH * FIX(1.40200))=(R-Y)H
|
||||||
|
paddw mm0,mm3 ; mm0=(CrL * FIX(1.40200))=(R-Y)L
|
||||||
|
|
||||||
|
movq MMWORD [wk(0)], mm6 ; wk(0)=(B-Y)H
|
||||||
|
movq MMWORD [wk(1)], mm7 ; wk(1)=(R-Y)H
|
||||||
|
|
||||||
|
movq mm6,mm5
|
||||||
|
movq mm7,mm2
|
||||||
|
punpcklwd mm5,mm1 ; mm5=(Cb4 Cr4 Cb5 Cr5)
|
||||||
|
punpckhwd mm6,mm1 ; mm6=(Cb6 Cr6 Cb7 Cr7)
|
||||||
|
pmaddwd mm5,[GOTOFF(eax,PW_MF0344_F0285)]
|
||||||
|
pmaddwd mm6,[GOTOFF(eax,PW_MF0344_F0285)]
|
||||||
|
punpcklwd mm2,mm3 ; mm2=(Cb0 Cr0 Cb1 Cr1)
|
||||||
|
punpckhwd mm7,mm3 ; mm7=(Cb2 Cr2 Cb3 Cr3)
|
||||||
|
pmaddwd mm2,[GOTOFF(eax,PW_MF0344_F0285)]
|
||||||
|
pmaddwd mm7,[GOTOFF(eax,PW_MF0344_F0285)]
|
||||||
|
|
||||||
|
paddd mm5,[GOTOFF(eax,PD_ONEHALF)]
|
||||||
|
paddd mm6,[GOTOFF(eax,PD_ONEHALF)]
|
||||||
|
psrad mm5,SCALEBITS
|
||||||
|
psrad mm6,SCALEBITS
|
||||||
|
paddd mm2,[GOTOFF(eax,PD_ONEHALF)]
|
||||||
|
paddd mm7,[GOTOFF(eax,PD_ONEHALF)]
|
||||||
|
psrad mm2,SCALEBITS
|
||||||
|
psrad mm7,SCALEBITS
|
||||||
|
|
||||||
|
packssdw mm5,mm6 ; mm5=CbH*-FIX(0.344)+CrH*FIX(0.285)
|
||||||
|
packssdw mm2,mm7 ; mm2=CbL*-FIX(0.344)+CrL*FIX(0.285)
|
||||||
|
psubw mm5,mm1 ; mm5=CbH*-FIX(0.344)+CrH*-FIX(0.714)=(G-Y)H
|
||||||
|
psubw mm2,mm3 ; mm2=CbL*-FIX(0.344)+CrL*-FIX(0.714)=(G-Y)L
|
||||||
|
|
||||||
|
movq MMWORD [wk(2)], mm5 ; wk(2)=(G-Y)H
|
||||||
|
|
||||||
|
mov ah,2 ; YHctr
|
||||||
|
jmp short .YHloop_1st
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.YHloop_2nd:
|
||||||
|
movq mm0, MMWORD [wk(1)] ; mm0=(R-Y)H
|
||||||
|
movq mm2, MMWORD [wk(2)] ; mm2=(G-Y)H
|
||||||
|
movq mm4, MMWORD [wk(0)] ; mm4=(B-Y)H
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.YHloop_1st:
|
||||||
|
movq MMWORD [wk(3)], mm0 ; wk(3)=(R-Y)(L/H)
|
||||||
|
movq MMWORD [wk(4)], mm2 ; wk(4)=(G-Y)(L/H)
|
||||||
|
movq MMWORD [wk(5)], mm4 ; wk(5)=(B-Y)(L/H)
|
||||||
|
|
||||||
|
movq mm7, MMWORD [ebx] ; mm7=Y(01234567)
|
||||||
|
|
||||||
|
mov al,2 ; YVctr
|
||||||
|
jmp short .YVloop_1st
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.YVloop_2nd:
|
||||||
|
movq mm0, MMWORD [wk(3)] ; mm0=(R-Y)(L/H)
|
||||||
|
movq mm2, MMWORD [wk(4)] ; mm2=(G-Y)(L/H)
|
||||||
|
movq mm4, MMWORD [wk(5)] ; mm4=(B-Y)(L/H)
|
||||||
|
|
||||||
|
movq mm7, MMWORD [edx] ; mm7=Y(01234567)
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.YVloop_1st:
|
||||||
|
pcmpeqw mm6,mm6
|
||||||
|
psrlw mm6,BYTE_BIT ; mm6={0xFF 0x00 0xFF 0x00 ..}
|
||||||
|
pand mm6,mm7 ; mm6=Y(0246)=YE
|
||||||
|
psrlw mm7,BYTE_BIT ; mm7=Y(1357)=YO
|
||||||
|
|
||||||
|
movq mm1,mm0 ; mm1=mm0=(R-Y)(L/H)
|
||||||
|
movq mm3,mm2 ; mm3=mm2=(G-Y)(L/H)
|
||||||
|
movq mm5,mm4 ; mm5=mm4=(B-Y)(L/H)
|
||||||
|
|
||||||
|
paddw mm0,mm6 ; mm0=((R-Y)+YE)=RE=(R0 R2 R4 R6)
|
||||||
|
paddw mm1,mm7 ; mm1=((R-Y)+YO)=RO=(R1 R3 R5 R7)
|
||||||
|
packuswb mm0,mm0 ; mm0=(R0 R2 R4 R6 ** ** ** **)
|
||||||
|
packuswb mm1,mm1 ; mm1=(R1 R3 R5 R7 ** ** ** **)
|
||||||
|
|
||||||
|
paddw mm2,mm6 ; mm2=((G-Y)+YE)=GE=(G0 G2 G4 G6)
|
||||||
|
paddw mm3,mm7 ; mm3=((G-Y)+YO)=GO=(G1 G3 G5 G7)
|
||||||
|
packuswb mm2,mm2 ; mm2=(G0 G2 G4 G6 ** ** ** **)
|
||||||
|
packuswb mm3,mm3 ; mm3=(G1 G3 G5 G7 ** ** ** **)
|
||||||
|
|
||||||
|
paddw mm4,mm6 ; mm4=((B-Y)+YE)=BE=(B0 B2 B4 B6)
|
||||||
|
paddw mm5,mm7 ; mm5=((B-Y)+YO)=BO=(B1 B3 B5 B7)
|
||||||
|
packuswb mm4,mm4 ; mm4=(B0 B2 B4 B6 ** ** ** **)
|
||||||
|
packuswb mm5,mm5 ; mm5=(B1 B3 B5 B7 ** ** ** **)
|
||||||
|
|
||||||
|
%if RGB_PIXELSIZE == 3 ; ---------------
|
||||||
|
|
||||||
|
; mmA=(00 02 04 06 ** ** ** **), mmB=(01 03 05 07 ** ** ** **)
|
||||||
|
; mmC=(10 12 14 16 ** ** ** **), mmD=(11 13 15 17 ** ** ** **)
|
||||||
|
; mmE=(20 22 24 26 ** ** ** **), mmF=(21 23 25 27 ** ** ** **)
|
||||||
|
; mmG=(** ** ** ** ** ** ** **), mmH=(** ** ** ** ** ** ** **)
|
||||||
|
|
||||||
|
punpcklbw mmA,mmC ; mmA=(00 10 02 12 04 14 06 16)
|
||||||
|
punpcklbw mmE,mmB ; mmE=(20 01 22 03 24 05 26 07)
|
||||||
|
punpcklbw mmD,mmF ; mmD=(11 21 13 23 15 25 17 27)
|
||||||
|
|
||||||
|
movq mmG,mmA
|
||||||
|
movq mmH,mmA
|
||||||
|
punpcklwd mmA,mmE ; mmA=(00 10 20 01 02 12 22 03)
|
||||||
|
punpckhwd mmG,mmE ; mmG=(04 14 24 05 06 16 26 07)
|
||||||
|
|
||||||
|
psrlq mmH,2*BYTE_BIT ; mmH=(02 12 04 14 06 16 -- --)
|
||||||
|
psrlq mmE,2*BYTE_BIT ; mmE=(22 03 24 05 26 07 -- --)
|
||||||
|
|
||||||
|
movq mmC,mmD
|
||||||
|
movq mmB,mmD
|
||||||
|
punpcklwd mmD,mmH ; mmD=(11 21 02 12 13 23 04 14)
|
||||||
|
punpckhwd mmC,mmH ; mmC=(15 25 06 16 17 27 -- --)
|
||||||
|
|
||||||
|
psrlq mmB,2*BYTE_BIT ; mmB=(13 23 15 25 17 27 -- --)
|
||||||
|
|
||||||
|
movq mmF,mmE
|
||||||
|
punpcklwd mmE,mmB ; mmE=(22 03 13 23 24 05 15 25)
|
||||||
|
punpckhwd mmF,mmB ; mmF=(26 07 17 27 -- -- -- --)
|
||||||
|
|
||||||
|
punpckldq mmA,mmD ; mmA=(00 10 20 01 11 21 02 12)
|
||||||
|
punpckldq mmE,mmG ; mmE=(22 03 13 23 04 14 24 05)
|
||||||
|
punpckldq mmC,mmF ; mmC=(15 25 06 16 26 07 17 27)
|
||||||
|
|
||||||
|
dec al ; YVctr
|
||||||
|
jz short .YVloop_break
|
||||||
|
|
||||||
|
movq MMWORD [wk(6)], mmA
|
||||||
|
movq MMWORD [wk(7)], mmE
|
||||||
|
movq MMWORD [wk(8)], mmC
|
||||||
|
|
||||||
|
jmp near .YVloop_2nd
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.YVloop_break:
|
||||||
|
movq mmH, MMWORD [wk(6)]
|
||||||
|
movq mmB, MMWORD [wk(7)]
|
||||||
|
movq mmD, MMWORD [wk(8)]
|
||||||
|
|
||||||
|
cmp ecx, byte SIZEOF_MMWORD
|
||||||
|
jb short .column_st16
|
||||||
|
|
||||||
|
movq MMWORD [esi+0*SIZEOF_MMWORD], mmH
|
||||||
|
movq MMWORD [esi+1*SIZEOF_MMWORD], mmB
|
||||||
|
movq MMWORD [esi+2*SIZEOF_MMWORD], mmD
|
||||||
|
movq MMWORD [edi+0*SIZEOF_MMWORD], mmA
|
||||||
|
movq MMWORD [edi+1*SIZEOF_MMWORD], mmE
|
||||||
|
movq MMWORD [edi+2*SIZEOF_MMWORD], mmC
|
||||||
|
|
||||||
|
sub ecx, byte SIZEOF_MMWORD
|
||||||
|
jz near .endcolumn
|
||||||
|
|
||||||
|
add esi, byte RGB_PIXELSIZE*SIZEOF_MMWORD ; outptr0
|
||||||
|
add edi, byte RGB_PIXELSIZE*SIZEOF_MMWORD ; outptr1
|
||||||
|
add ebx, byte SIZEOF_MMWORD ; inptr00
|
||||||
|
add edx, byte SIZEOF_MMWORD ; inptr01
|
||||||
|
dec ah ; YHctr
|
||||||
|
jnz near .YHloop_2nd
|
||||||
|
|
||||||
|
push ebx ; inptr00
|
||||||
|
push edx ; inptr01
|
||||||
|
mov ebx, JSAMPROW [inptr1] ; ebx=inptr1
|
||||||
|
mov edx, JSAMPROW [inptr2] ; edx=inptr2
|
||||||
|
add ebx, byte SIZEOF_MMWORD ; inptr1
|
||||||
|
add edx, byte SIZEOF_MMWORD ; inptr2
|
||||||
|
jmp near .columnloop
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.column_st16:
|
||||||
|
lea ecx, [ecx+ecx*2] ; imul ecx, RGB_PIXELSIZE
|
||||||
|
cmp ecx, byte 2*SIZEOF_MMWORD
|
||||||
|
jb short .column_st8
|
||||||
|
movq MMWORD [esi+0*SIZEOF_MMWORD], mmH
|
||||||
|
movq MMWORD [esi+1*SIZEOF_MMWORD], mmB
|
||||||
|
movq MMWORD [edi+0*SIZEOF_MMWORD], mmA
|
||||||
|
movq MMWORD [edi+1*SIZEOF_MMWORD], mmE
|
||||||
|
movq mmH,mmD
|
||||||
|
movq mmA,mmC
|
||||||
|
sub ecx, byte 2*SIZEOF_MMWORD
|
||||||
|
add esi, byte 2*SIZEOF_MMWORD
|
||||||
|
add edi, byte 2*SIZEOF_MMWORD
|
||||||
|
jmp short .column_st4
|
||||||
|
.column_st8:
|
||||||
|
cmp ecx, byte SIZEOF_MMWORD
|
||||||
|
jb short .column_st4
|
||||||
|
movq MMWORD [esi+0*SIZEOF_MMWORD], mmH
|
||||||
|
movq MMWORD [edi+0*SIZEOF_MMWORD], mmA
|
||||||
|
movq mmH,mmB
|
||||||
|
movq mmA,mmE
|
||||||
|
sub ecx, byte SIZEOF_MMWORD
|
||||||
|
add esi, byte SIZEOF_MMWORD
|
||||||
|
add edi, byte SIZEOF_MMWORD
|
||||||
|
.column_st4:
|
||||||
|
movd eax,mmH
|
||||||
|
movd edx,mmA
|
||||||
|
cmp ecx, byte SIZEOF_DWORD
|
||||||
|
jb short .column_st2
|
||||||
|
mov DWORD [esi+0*SIZEOF_DWORD], eax
|
||||||
|
mov DWORD [edi+0*SIZEOF_DWORD], edx
|
||||||
|
psrlq mmH,DWORD_BIT
|
||||||
|
psrlq mmA,DWORD_BIT
|
||||||
|
movd eax,mmH
|
||||||
|
movd edx,mmA
|
||||||
|
sub ecx, byte SIZEOF_DWORD
|
||||||
|
add esi, byte SIZEOF_DWORD
|
||||||
|
add edi, byte SIZEOF_DWORD
|
||||||
|
.column_st2:
|
||||||
|
cmp ecx, byte SIZEOF_WORD
|
||||||
|
jb short .column_st1
|
||||||
|
mov WORD [esi+0*SIZEOF_WORD], ax
|
||||||
|
mov WORD [edi+0*SIZEOF_WORD], dx
|
||||||
|
shr eax,WORD_BIT
|
||||||
|
shr edx,WORD_BIT
|
||||||
|
sub ecx, byte SIZEOF_WORD
|
||||||
|
add esi, byte SIZEOF_WORD
|
||||||
|
add edi, byte SIZEOF_WORD
|
||||||
|
.column_st1:
|
||||||
|
cmp ecx, byte SIZEOF_BYTE
|
||||||
|
jb short .endcolumn
|
||||||
|
mov BYTE [esi+0*SIZEOF_BYTE], al
|
||||||
|
mov BYTE [edi+0*SIZEOF_BYTE], dl
|
||||||
|
|
||||||
|
%else ; RGB_PIXELSIZE == 4 ; -----------
|
||||||
|
|
||||||
|
%ifdef RGBX_FILLER_0XFF
|
||||||
|
pcmpeqb mm6,mm6 ; mm6=(X0 X2 X4 X6 ** ** ** **)
|
||||||
|
pcmpeqb mm7,mm7 ; mm7=(X1 X3 X5 X7 ** ** ** **)
|
||||||
|
%else
|
||||||
|
pxor mm6,mm6 ; mm6=(X0 X2 X4 X6 ** ** ** **)
|
||||||
|
pxor mm7,mm7 ; mm7=(X1 X3 X5 X7 ** ** ** **)
|
||||||
|
%endif
|
||||||
|
; mmA=(00 02 04 06 ** ** ** **), mmB=(01 03 05 07 ** ** ** **)
|
||||||
|
; mmC=(10 12 14 16 ** ** ** **), mmD=(11 13 15 17 ** ** ** **)
|
||||||
|
; mmE=(20 22 24 26 ** ** ** **), mmF=(21 23 25 27 ** ** ** **)
|
||||||
|
; mmG=(30 32 34 36 ** ** ** **), mmH=(31 33 35 37 ** ** ** **)
|
||||||
|
|
||||||
|
punpcklbw mmA,mmC ; mmA=(00 10 02 12 04 14 06 16)
|
||||||
|
punpcklbw mmE,mmG ; mmE=(20 30 22 32 24 34 26 36)
|
||||||
|
punpcklbw mmB,mmD ; mmB=(01 11 03 13 05 15 07 17)
|
||||||
|
punpcklbw mmF,mmH ; mmF=(21 31 23 33 25 35 27 37)
|
||||||
|
|
||||||
|
movq mmC,mmA
|
||||||
|
punpcklwd mmA,mmE ; mmA=(00 10 20 30 02 12 22 32)
|
||||||
|
punpckhwd mmC,mmE ; mmC=(04 14 24 34 06 16 26 36)
|
||||||
|
movq mmG,mmB
|
||||||
|
punpcklwd mmB,mmF ; mmB=(01 11 21 31 03 13 23 33)
|
||||||
|
punpckhwd mmG,mmF ; mmG=(05 15 25 35 07 17 27 37)
|
||||||
|
|
||||||
|
movq mmD,mmA
|
||||||
|
punpckldq mmA,mmB ; mmA=(00 10 20 30 01 11 21 31)
|
||||||
|
punpckhdq mmD,mmB ; mmD=(02 12 22 32 03 13 23 33)
|
||||||
|
movq mmH,mmC
|
||||||
|
punpckldq mmC,mmG ; mmC=(04 14 24 34 05 15 25 35)
|
||||||
|
punpckhdq mmH,mmG ; mmH=(06 16 26 36 07 17 27 37)
|
||||||
|
|
||||||
|
dec al ; YVctr
|
||||||
|
jz short .YVloop_break
|
||||||
|
|
||||||
|
movq MMWORD [wk(6)], mmA
|
||||||
|
movq MMWORD [wk(7)], mmD
|
||||||
|
movq MMWORD [wk(8)], mmC
|
||||||
|
movq MMWORD [wk(9)], mmH
|
||||||
|
|
||||||
|
jmp near .YVloop_2nd
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.YVloop_break:
|
||||||
|
movq mmE, MMWORD [wk(6)]
|
||||||
|
movq mmF, MMWORD [wk(7)]
|
||||||
|
movq mmB, MMWORD [wk(8)]
|
||||||
|
movq mmG, MMWORD [wk(9)]
|
||||||
|
|
||||||
|
cmp ecx, byte SIZEOF_MMWORD
|
||||||
|
jb short .column_st16
|
||||||
|
|
||||||
|
movq MMWORD [esi+0*SIZEOF_MMWORD], mmE
|
||||||
|
movq MMWORD [esi+1*SIZEOF_MMWORD], mmF
|
||||||
|
movq MMWORD [esi+2*SIZEOF_MMWORD], mmB
|
||||||
|
movq MMWORD [esi+3*SIZEOF_MMWORD], mmG
|
||||||
|
movq MMWORD [edi+0*SIZEOF_MMWORD], mmA
|
||||||
|
movq MMWORD [edi+1*SIZEOF_MMWORD], mmD
|
||||||
|
movq MMWORD [edi+2*SIZEOF_MMWORD], mmC
|
||||||
|
movq MMWORD [edi+3*SIZEOF_MMWORD], mmH
|
||||||
|
|
||||||
|
sub ecx, byte SIZEOF_MMWORD
|
||||||
|
jz short .endcolumn
|
||||||
|
|
||||||
|
add esi, byte RGB_PIXELSIZE*SIZEOF_MMWORD ; outptr0
|
||||||
|
add edi, byte RGB_PIXELSIZE*SIZEOF_MMWORD ; outptr1
|
||||||
|
add ebx, byte SIZEOF_MMWORD ; inptr00
|
||||||
|
add edx, byte SIZEOF_MMWORD ; inptr01
|
||||||
|
dec ah ; YHctr
|
||||||
|
jnz near .YHloop_2nd
|
||||||
|
|
||||||
|
push ebx ; inptr00
|
||||||
|
push edx ; inptr01
|
||||||
|
mov ebx, JSAMPROW [inptr1] ; ebx=inptr1
|
||||||
|
mov edx, JSAMPROW [inptr2] ; edx=inptr2
|
||||||
|
add ebx, byte SIZEOF_MMWORD ; inptr1
|
||||||
|
add edx, byte SIZEOF_MMWORD ; inptr2
|
||||||
|
jmp near .columnloop
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.column_st16:
|
||||||
|
cmp ecx, byte SIZEOF_MMWORD/2
|
||||||
|
jb short .column_st8
|
||||||
|
movq MMWORD [esi+0*SIZEOF_MMWORD], mmE
|
||||||
|
movq MMWORD [esi+1*SIZEOF_MMWORD], mmF
|
||||||
|
movq MMWORD [edi+0*SIZEOF_MMWORD], mmA
|
||||||
|
movq MMWORD [edi+1*SIZEOF_MMWORD], mmD
|
||||||
|
movq mmE,mmB
|
||||||
|
movq mmF,mmG
|
||||||
|
movq mmA,mmC
|
||||||
|
movq mmD,mmH
|
||||||
|
sub ecx, byte SIZEOF_MMWORD/2
|
||||||
|
add esi, byte 2*SIZEOF_MMWORD
|
||||||
|
add edi, byte 2*SIZEOF_MMWORD
|
||||||
|
.column_st8:
|
||||||
|
cmp ecx, byte SIZEOF_MMWORD/4
|
||||||
|
jb short .column_st4
|
||||||
|
movq MMWORD [esi+0*SIZEOF_MMWORD], mmE
|
||||||
|
movq MMWORD [edi+0*SIZEOF_MMWORD], mmA
|
||||||
|
movq mmE,mmF
|
||||||
|
movq mmA,mmD
|
||||||
|
sub ecx, byte SIZEOF_MMWORD/4
|
||||||
|
add esi, byte 1*SIZEOF_MMWORD
|
||||||
|
add edi, byte 1*SIZEOF_MMWORD
|
||||||
|
.column_st4:
|
||||||
|
cmp ecx, byte SIZEOF_MMWORD/8
|
||||||
|
jb short .endcolumn
|
||||||
|
movd DWORD [esi+0*SIZEOF_DWORD], mmE
|
||||||
|
movd DWORD [edi+0*SIZEOF_DWORD], mmA
|
||||||
|
|
||||||
|
%endif ; RGB_PIXELSIZE ; ---------------
|
||||||
|
|
||||||
|
.endcolumn:
|
||||||
|
emms ; empty MMX state
|
||||||
|
|
||||||
|
.return:
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
mov esp,ebp ; esp <- aligned ebp
|
||||||
|
pop esp ; esp <- original ebp
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; !USE_DEDICATED_H2V2_MERGED_UPSAMPLE_MMX
|
||||||
|
|
||||||
|
%endif ; JDMERGE_MMX_SUPPORTED
|
||||||
|
%endif ; UPSAMPLE_MERGING_SUPPORTED
|
||||||
|
%endif ; RGB_PIXELSIZE == 3 || RGB_PIXELSIZE == 4
|
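Note on the byte-interleave steps in the MMX code above: the register comments such as `mmA=(00 10 02 12 04 14 06 16)` follow from how `punpcklbw` merges the low four bytes of its two operands. A minimal C sketch of that one instruction is given here for readers who do not have the MMX semantics memorized. The function name `sketch_punpcklbw` and the separate output array are assumptions made for illustration; the real instruction writes the result back into its destination register.

```c
#include <stdint.h>

/* Hypothetical scalar model of "punpcklbw dst,src" (not part of the library).
 * a = destination register contents, b = source register contents,
 * both viewed as 8 bytes with index 0 the least significant byte. */
void sketch_punpcklbw(const uint8_t a[8], const uint8_t b[8], uint8_t out[8])
{
    int i;
    /* Only the low 4 bytes of each operand are used; they alternate a,b,a,b. */
    for (i = 0; i < 4; i++) {
        out[2 * i]     = a[i];
        out[2 * i + 1] = b[i];
    }
}
```

Applied to mmA=(00 02 04 06 ...) and mmC=(10 12 14 16 ...), this yields (00 10 02 12 04 14 06 16), which matches the comment on the first `punpcklbw` above.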
||||||
1272
jdmerss2.asm
Normal file
File diff suppressed because it is too large
333
jdphuff.c
@@ -1,10 +1,17 @@
|
|||||||
/*
|
/*
|
||||||
* jdphuff.c
|
* jdphuff.c
|
||||||
*
|
*
|
||||||
* Copyright (C) 1995-1996, Thomas G. Lane.
|
* Copyright (C) 1995-1997, Thomas G. Lane.
|
||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
* x86 SIMD extension for IJG JPEG library
|
||||||
|
* Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
* This file has been modified to improve performance.
|
||||||
|
* Last Modified : October 31, 2004
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
*
|
||||||
* This file contains Huffman entropy decoding routines for progressive JPEG.
|
* This file contains Huffman entropy decoding routines for progressive JPEG.
|
||||||
*
|
*
|
||||||
* Much of the complexity here has to do with supporting input suspension.
|
* Much of the complexity here has to do with supporting input suspension.
|
||||||
@@ -69,6 +76,7 @@ typedef struct {
|
|||||||
d_derived_tbl * derived_tbls[NUM_HUFF_TBLS];
|
d_derived_tbl * derived_tbls[NUM_HUFF_TBLS];
|
||||||
|
|
||||||
d_derived_tbl * ac_derived_tbl; /* active table during an AC scan */
|
d_derived_tbl * ac_derived_tbl; /* active table during an AC scan */
|
||||||
|
d_derived_tbl * dc_derived_tbls[MAX_COMPS_IN_SCAN];
|
||||||
} phuff_entropy_decoder;
|
} phuff_entropy_decoder;
|
||||||
|
|
||||||
typedef phuff_entropy_decoder * phuff_entropy_ptr;
|
typedef phuff_entropy_decoder * phuff_entropy_ptr;
|
||||||
@@ -119,6 +127,12 @@ start_pass_phuff_decoder (j_decompress_ptr cinfo)
|
|||||||
}
|
}
|
||||||
if (cinfo->Al > 13) /* need not check for < 0 */
|
if (cinfo->Al > 13) /* need not check for < 0 */
|
||||||
bad = TRUE;
|
bad = TRUE;
|
||||||
|
/* Arguably the maximum Al value should be less than 13 for 8-bit precision,
|
||||||
|
* but the spec doesn't say so, and we try to be liberal about what we
|
||||||
|
* accept. Note: large Al values could result in out-of-range DC
|
||||||
|
* coefficients during early scans, leading to bizarre displays due to
|
||||||
|
* overflows in the IDCT math. But we won't crash.
|
||||||
|
*/
|
||||||
if (bad)
|
if (bad)
|
||||||
ERREXIT4(cinfo, JERR_BAD_PROGRESSION,
|
ERREXIT4(cinfo, JERR_BAD_PROGRESSION,
|
||||||
cinfo->Ss, cinfo->Se, cinfo->Ah, cinfo->Al);
|
cinfo->Ss, cinfo->Se, cinfo->Ah, cinfo->Al);
|
||||||
@@ -160,18 +174,13 @@ start_pass_phuff_decoder (j_decompress_ptr cinfo)
|
|||||||
if (is_DC_band) {
|
if (is_DC_band) {
|
||||||
if (cinfo->Ah == 0) { /* DC refinement needs no table */
|
if (cinfo->Ah == 0) { /* DC refinement needs no table */
|
||||||
tbl = compptr->dc_tbl_no;
|
tbl = compptr->dc_tbl_no;
|
||||||
if (tbl < 0 || tbl >= NUM_HUFF_TBLS ||
|
jpeg_make_d_derived_tbl(cinfo, TRUE, tbl,
|
||||||
cinfo->dc_huff_tbl_ptrs[tbl] == NULL)
|
|
||||||
ERREXIT1(cinfo, JERR_NO_HUFF_TABLE, tbl);
|
|
||||||
jpeg_make_d_derived_tbl(cinfo, cinfo->dc_huff_tbl_ptrs[tbl],
|
|
||||||
& entropy->derived_tbls[tbl]);
|
& entropy->derived_tbls[tbl]);
|
||||||
|
entropy->dc_derived_tbls[ci] = entropy->derived_tbls[tbl];
|
||||||
}
|
}
|
||||||
} else {
|
} else {
|
||||||
tbl = compptr->ac_tbl_no;
|
tbl = compptr->ac_tbl_no;
|
||||||
if (tbl < 0 || tbl >= NUM_HUFF_TBLS ||
|
jpeg_make_d_derived_tbl(cinfo, FALSE, tbl,
|
||||||
cinfo->ac_huff_tbl_ptrs[tbl] == NULL)
|
|
||||||
ERREXIT1(cinfo, JERR_NO_HUFF_TABLE, tbl);
|
|
||||||
jpeg_make_d_derived_tbl(cinfo, cinfo->ac_huff_tbl_ptrs[tbl],
|
|
||||||
& entropy->derived_tbls[tbl]);
|
& entropy->derived_tbls[tbl]);
|
||||||
/* remember the single active table */
|
/* remember the single active table */
|
||||||
entropy->ac_derived_tbl = entropy->derived_tbls[tbl];
|
entropy->ac_derived_tbl = entropy->derived_tbls[tbl];
|
||||||
@@ -183,7 +192,7 @@ start_pass_phuff_decoder (j_decompress_ptr cinfo)
|
|||||||
/* Initialize bitread state variables */
|
/* Initialize bitread state variables */
|
||||||
entropy->bitstate.bits_left = 0;
|
entropy->bitstate.bits_left = 0;
|
||||||
entropy->bitstate.get_buffer = 0; /* unnecessary, but keeps Purify quiet */
|
entropy->bitstate.get_buffer = 0; /* unnecessary, but keeps Purify quiet */
|
||||||
entropy->bitstate.printed_eod = FALSE;
|
entropy->pub.insufficient_data = FALSE;
|
||||||
|
|
||||||
/* Initialize private state variables */
|
/* Initialize private state variables */
|
||||||
entropy->saved.EOBRUN = 0;
|
entropy->saved.EOBRUN = 0;
|
||||||
@@ -193,32 +202,6 @@ start_pass_phuff_decoder (j_decompress_ptr cinfo)
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
/*
|
|
||||||
* Figure F.12: extend sign bit.
|
|
||||||
* On some machines, a shift and add will be faster than a table lookup.
|
|
||||||
*/
|
|
||||||
|
|
||||||
#ifdef AVOID_TABLES
|
|
||||||
|
|
||||||
#define HUFF_EXTEND(x,s) ((x) < (1<<((s)-1)) ? (x) + (((-1)<<(s)) + 1) : (x))
|
|
||||||
|
|
||||||
#else
|
|
||||||
|
|
||||||
#define HUFF_EXTEND(x,s) ((x) < extend_test[s] ? (x) + extend_offset[s] : (x))
|
|
||||||
|
|
||||||
static const int extend_test[16] = /* entry n is 2**(n-1) */
|
|
||||||
{ 0, 0x0001, 0x0002, 0x0004, 0x0008, 0x0010, 0x0020, 0x0040, 0x0080,
|
|
||||||
0x0100, 0x0200, 0x0400, 0x0800, 0x1000, 0x2000, 0x4000 };
|
|
||||||
|
|
||||||
static const int extend_offset[16] = /* entry n is (-1 << n) + 1 */
|
|
||||||
{ 0, ((-1)<<1) + 1, ((-1)<<2) + 1, ((-1)<<3) + 1, ((-1)<<4) + 1,
|
|
||||||
((-1)<<5) + 1, ((-1)<<6) + 1, ((-1)<<7) + 1, ((-1)<<8) + 1,
|
|
||||||
((-1)<<9) + 1, ((-1)<<10) + 1, ((-1)<<11) + 1, ((-1)<<12) + 1,
|
|
||||||
((-1)<<13) + 1, ((-1)<<14) + 1, ((-1)<<15) + 1 };
|
|
||||||
|
|
||||||
#endif /* AVOID_TABLES */
|
|
||||||
|
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Check for a restart marker & resynchronize decoder.
|
* Check for a restart marker & resynchronize decoder.
|
||||||
* Returns FALSE if must suspend.
|
* Returns FALSE if must suspend.
|
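Note on the `HUFF_EXTEND` block removed in the hunk above: the macro implements the Figure F.12 sign extension, mapping `s` raw bits to a signed value. A rough C sketch of the shift-and-add (AVOID_TABLES) form follows. The function name `huff_extend` is hypothetical, and the offset is written as `1 - (1 << s)` rather than `((-1) << s) + 1` only to avoid left-shifting a negative value in the sketch; both expressions denote the same number.

```c
/* Hypothetical helper illustrating Figure F.12; not a libjpeg function.
 * x holds s raw bits read from the stream, with 1 <= s <= 15. */
int huff_extend(int x, int s)
{
    int half   = 1 << (s - 1);   /* entry s of extend_test[]: 2**(s-1)      */
    int offset = 1 - (1 << s);   /* entry s of extend_offset[]: (-1<<s) + 1 */

    /* Raw values below 2**(s-1) encode negative numbers. */
    return (x < half) ? x + offset : x;
}
```

For example, with `s = 3` the raw values 0..3 map to -7..-4 and 4..7 map to themselves.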
||||||
@@ -248,8 +231,13 @@ process_restart (j_decompress_ptr cinfo)
|
|||||||
/* Reset restart counter */
|
/* Reset restart counter */
|
||||||
entropy->restarts_to_go = cinfo->restart_interval;
|
entropy->restarts_to_go = cinfo->restart_interval;
|
||||||
|
|
||||||
/* Next segment can get another out-of-data warning */
|
/* Reset out-of-data flag, unless read_restart_marker left us smack up
|
||||||
entropy->bitstate.printed_eod = FALSE;
|
* against a marker. In that case we will end up treating the next data
|
||||||
|
* segment as empty, and we can avoid producing bogus output pixels by
|
||||||
|
* leaving the flag set.
|
||||||
|
*/
|
||||||
|
if (cinfo->unread_marker == 0)
|
||||||
|
entropy->pub.insufficient_data = FALSE;
|
||||||
|
|
||||||
return TRUE;
|
return TRUE;
|
||||||
}
|
}
|
||||||
@@ -282,13 +270,9 @@ decode_mcu_DC_first (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
|
|||||||
{
|
{
|
||||||
phuff_entropy_ptr entropy = (phuff_entropy_ptr) cinfo->entropy;
|
phuff_entropy_ptr entropy = (phuff_entropy_ptr) cinfo->entropy;
|
||||||
int Al = cinfo->Al;
|
int Al = cinfo->Al;
|
||||||
register int s, r;
|
int blkn;
|
||||||
int blkn, ci;
|
|
||||||
JBLOCKROW block;
|
|
||||||
BITREAD_STATE_VARS;
|
BITREAD_STATE_VARS;
|
||||||
savable_state state;
|
savable_state state;
|
||||||
d_derived_tbl * tbl;
|
|
||||||
jpeg_component_info * compptr;
|
|
||||||
|
|
||||||
/* Process restart marker if needed; may have to suspend */
|
/* Process restart marker if needed; may have to suspend */
|
||||||
if (cinfo->restart_interval) {
|
if (cinfo->restart_interval) {
|
||||||
@@ -297,6 +281,11 @@ decode_mcu_DC_first (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
|
|||||||
return FALSE;
|
return FALSE;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* If we've run out of data, just leave the MCU set to zeroes.
|
||||||
|
* This way, we return uniform gray for the remainder of the segment.
|
||||||
|
*/
|
||||||
|
if (! entropy->pub.insufficient_data) {
|
||||||
|
|
||||||
/* Load up working state */
|
/* Load up working state */
|
||||||
BITREAD_LOAD_STATE(cinfo,entropy->bitstate);
|
BITREAD_LOAD_STATE(cinfo,entropy->bitstate);
|
||||||
ASSIGN_STATE(state, entropy->saved);
|
ASSIGN_STATE(state, entropy->saved);
|
||||||
@@ -304,31 +293,78 @@ decode_mcu_DC_first (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
|
|||||||
/* Outer loop handles each block in the MCU */
|
/* Outer loop handles each block in the MCU */
|
||||||
|
|
||||||
for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) {
|
for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) {
|
||||||
block = MCU_data[blkn];
|
JBLOCKROW block = MCU_data[blkn];
|
||||||
ci = cinfo->MCU_membership[blkn];
|
int ci = cinfo->MCU_membership[blkn];
|
||||||
compptr = cinfo->cur_comp_info[ci];
|
d_derived_tbl * tbl = entropy->dc_derived_tbls[ci];
|
||||||
tbl = entropy->derived_tbls[compptr->dc_tbl_no];
|
register int s;
|
||||||
|
|
||||||
/* Decode a single block's worth of coefficients */
|
/* Decode a single block's worth of coefficients */
|
||||||
|
|
||||||
/* Section F.2.2.1: decode the DC coefficient difference */
|
/* Section F.2.2.1: decode the DC coefficient difference */
|
||||||
HUFF_DECODE(s, br_state, tbl, return FALSE, label1);
|
{ /* HUFFX_DECODE */
|
||||||
|
register int nb, look, t;
|
||||||
|
if (bits_left < HUFFX_LOOKAHEAD) {
|
||||||
|
register const JOCTET * next_input_byte = br_state.next_input_byte;
|
||||||
|
register size_t bytes_in_buffer = br_state.bytes_in_buffer;
|
||||||
|
if (cinfo->unread_marker == 0) {
|
||||||
|
while (bits_left < MIN_GET_BITS) {
|
||||||
|
register int c;
|
||||||
|
if (bytes_in_buffer == 0 ||
|
||||||
|
(c = GETJOCTET(*next_input_byte)) == 0xFF) {
|
||||||
|
goto label11; }
|
||||||
|
bytes_in_buffer--; next_input_byte++;
|
||||||
|
get_buffer = (get_buffer << 8) | c;
|
||||||
|
bits_left += 8;
|
||||||
|
}
|
||||||
|
br_state.next_input_byte = next_input_byte;
|
||||||
|
br_state.bytes_in_buffer = bytes_in_buffer;
|
||||||
|
} else {
|
||||||
|
label11:
|
||||||
|
br_state.next_input_byte = next_input_byte;
|
||||||
|
br_state.bytes_in_buffer = bytes_in_buffer;
|
||||||
|
if (! jpeg_fill_bit_buffer(&br_state,get_buffer,bits_left, 0)) {
|
||||||
|
return FALSE; }
|
||||||
|
get_buffer = br_state.get_buffer; bits_left = br_state.bits_left;
|
||||||
|
if (bits_left < HUFFX_LOOKAHEAD) {
|
||||||
|
nb = 1; goto label1;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
look = PEEK_BITS(HUFFX_LOOKAHEAD);
|
||||||
|
if ((nb = tbl->lookx_nbits[look]) != 0) {
|
||||||
|
s = tbl->lookx_val[look];
|
||||||
|
if (nb <= HUFFX_LOOKAHEAD) {
|
||||||
|
DROP_BITS(nb);
|
||||||
|
} else {
|
||||||
|
DROP_BITS(HUFFX_LOOKAHEAD);
|
||||||
|
nb -= HUFFX_LOOKAHEAD;
|
||||||
|
CHECK_BIT_BUFFER(br_state, nb, return FALSE);
|
||||||
|
s += GET_BITS(nb);
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
nb = HUFFX_LOOKAHEAD;
|
||||||
|
label1:
|
||||||
|
if ((s=jpeg_huff_decode(&br_state,get_buffer,bits_left,tbl,nb))
|
||||||
|
< 0) { return FALSE; }
|
||||||
|
get_buffer = br_state.get_buffer; bits_left = br_state.bits_left;
|
||||||
if (s) {
|
if (s) {
|
||||||
CHECK_BIT_BUFFER(br_state, s, return FALSE);
|
CHECK_BIT_BUFFER(br_state, s, return FALSE);
|
||||||
r = GET_BITS(s);
|
t = GET_BITS(s);
|
||||||
s = HUFF_EXTEND(r, s);
|
s = HUFF_EXTEND(t, s);
|
||||||
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Convert DC difference to actual value, update last_dc_val */
|
/* Convert DC difference to actual value, update last_dc_val */
|
||||||
s += state.last_dc_val[ci];
|
s += state.last_dc_val[ci];
|
||||||
state.last_dc_val[ci] = s;
|
state.last_dc_val[ci] = s;
|
||||||
/* Scale and output the DC coefficient (assumes jpeg_natural_order[0]=0) */
|
/* Scale and output the coefficient (assumes jpeg_natural_order[0]=0) */
|
||||||
(*block)[0] = (JCOEF) (s << Al);
|
(*block)[0] = (JCOEF) (s << Al);
|
||||||
}
|
}
|
||||||
|
|
||||||
/* Completed MCU, so update state */
|
/* Completed MCU, so update state */
|
||||||
BITREAD_SAVE_STATE(cinfo,entropy->bitstate);
|
BITREAD_SAVE_STATE(cinfo,entropy->bitstate);
|
||||||
ASSIGN_STATE(entropy->saved, state);
|
ASSIGN_STATE(entropy->saved, state);
|
||||||
|
}
|
||||||
|
|
||||||
/* Account for restart interval (no-op if not using restarts) */
|
/* Account for restart interval (no-op if not using restarts) */
|
||||||
entropy->restarts_to_go--;
|
entropy->restarts_to_go--;
|
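Note on the DC handling at the end of the hunk above: the decoded value is a difference against the previous DC value of the same component, and the result is scaled by the successive-approximation shift `Al`. A small sketch of just that bookkeeping is shown below. `dc_first_coef` is a hypothetical helper, and it multiplies by `1 << Al` instead of shifting `s` directly so that the sketch stays well defined for negative values.

```c
/* Hypothetical helper, not part of libjpeg.
 * last_dc_val : running DC predictor for one component (updated in place)
 * diff        : decoded, sign-extended DC difference
 * Al          : successive approximation low bit position
 * Returns the value that would be stored in (*block)[0]. */
int dc_first_coef(int *last_dc_val, int diff, int Al)
{
    int s = diff + *last_dc_val;   /* convert difference to actual value */
    *last_dc_val = s;              /* remember it for the next block     */
    return s * (1 << Al);          /* same result as s << Al             */
}
```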
||||||
@@ -348,11 +384,8 @@ decode_mcu_AC_first (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
|
|||||||
phuff_entropy_ptr entropy = (phuff_entropy_ptr) cinfo->entropy;
|
phuff_entropy_ptr entropy = (phuff_entropy_ptr) cinfo->entropy;
|
||||||
int Se = cinfo->Se;
|
int Se = cinfo->Se;
|
||||||
int Al = cinfo->Al;
|
int Al = cinfo->Al;
|
||||||
register int s, k, r;
|
|
||||||
unsigned int EOBRUN;
|
unsigned int EOBRUN;
|
||||||
JBLOCKROW block;
|
|
||||||
BITREAD_STATE_VARS;
|
BITREAD_STATE_VARS;
|
||||||
d_derived_tbl * tbl;
|
|
||||||
|
|
||||||
/* Process restart marker if needed; may have to suspend */
|
/* Process restart marker if needed; may have to suspend */
|
||||||
if (cinfo->restart_interval) {
|
if (cinfo->restart_interval) {
|
||||||
@@ -361,29 +394,86 @@ decode_mcu_AC_first (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
|
|||||||
return FALSE;
|
return FALSE;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* If we've run out of data, just leave the MCU set to zeroes.
|
||||||
|
* This way, we return uniform gray for the remainder of the segment.
|
||||||
|
*/
|
||||||
|
if (! entropy->pub.insufficient_data) {
|
||||||
|
|
||||||
/* Load up working state.
|
/* Load up working state.
|
||||||
* We can avoid loading/saving bitread state if in an EOB run.
|
* We can avoid loading/saving bitread state if in an EOB run.
|
||||||
*/
|
*/
|
||||||
EOBRUN = entropy->saved.EOBRUN; /* only part of saved state we care about */
|
EOBRUN = entropy->saved.EOBRUN; /* only part of saved state we need */
|
||||||
|
|
||||||
/* There is always only one block per MCU */
|
/* There is always only one block per MCU */
|
||||||
|
|
||||||
if (EOBRUN > 0) /* if it's a band of zeroes... */
|
if (EOBRUN > 0) { /* if it's a band of zeroes... */
|
||||||
EOBRUN--; /* ...process it now (we do nothing) */
|
EOBRUN--; /* ...process it now (we do nothing) */
|
||||||
else {
|
} else {
|
||||||
|
JBLOCKROW block = MCU_data[0];
|
||||||
|
d_derived_tbl * tbl = entropy->ac_derived_tbl;
|
||||||
|
register int s, k, r;
|
||||||
|
|
||||||
|
/* Load up working state */
|
||||||
BITREAD_LOAD_STATE(cinfo,entropy->bitstate);
|
BITREAD_LOAD_STATE(cinfo,entropy->bitstate);
|
||||||
block = MCU_data[0];
|
|
||||||
tbl = entropy->ac_derived_tbl;
|
|
||||||
|
|
||||||
for (k = cinfo->Ss; k <= Se; k++) {
|
for (k = cinfo->Ss; k <= Se; k++) {
|
||||||
HUFF_DECODE(s, br_state, tbl, return FALSE, label2);
|
{ /* HUFFX_DECODE */
|
||||||
r = s >> 4;
|
register int nb, look, t;
|
||||||
s &= 15;
|
if (bits_left < HUFFX_LOOKAHEAD) {
|
||||||
|
register const JOCTET * next_input_byte = br_state.next_input_byte;
|
||||||
|
register size_t bytes_in_buffer = br_state.bytes_in_buffer;
|
||||||
|
if (cinfo->unread_marker == 0) {
|
||||||
|
while (bits_left < MIN_GET_BITS) {
|
||||||
|
register int c;
|
||||||
|
if (bytes_in_buffer == 0 ||
|
||||||
|
(c = GETJOCTET(*next_input_byte)) == 0xFF) {
|
||||||
|
goto label21; }
|
||||||
|
bytes_in_buffer--; next_input_byte++;
|
||||||
|
get_buffer = (get_buffer << 8) | c;
|
||||||
|
bits_left += 8;
|
||||||
|
}
|
||||||
|
br_state.next_input_byte = next_input_byte;
|
||||||
|
br_state.bytes_in_buffer = bytes_in_buffer;
|
||||||
|
} else {
|
||||||
|
label21:
|
||||||
|
br_state.next_input_byte = next_input_byte;
|
||||||
|
br_state.bytes_in_buffer = bytes_in_buffer;
|
||||||
|
if (! jpeg_fill_bit_buffer(&br_state,get_buffer,bits_left, 0)) {
|
||||||
|
return FALSE; }
|
||||||
|
get_buffer = br_state.get_buffer; bits_left = br_state.bits_left;
|
||||||
|
if (bits_left < HUFFX_LOOKAHEAD) {
|
||||||
|
nb = 1; goto label2;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
look = PEEK_BITS(HUFFX_LOOKAHEAD);
|
||||||
|
if ((nb = tbl->lookx_nbits[look]) != 0) {
|
||||||
|
s = tbl->lookx_val[look];
|
||||||
|
r = tbl->lookx_sym[look] >> 4;
|
||||||
|
if (nb <= HUFFX_LOOKAHEAD) {
|
||||||
|
DROP_BITS(nb);
|
||||||
|
} else {
|
||||||
|
DROP_BITS(HUFFX_LOOKAHEAD);
|
||||||
|
nb -= HUFFX_LOOKAHEAD;
|
||||||
|
CHECK_BIT_BUFFER(br_state, nb, return FALSE);
|
||||||
|
s += GET_BITS(nb);
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
nb = HUFFX_LOOKAHEAD;
|
||||||
|
label2:
|
||||||
|
if ((s=jpeg_huff_decode(&br_state,get_buffer,bits_left,tbl,nb))
|
||||||
|
< 0) { return FALSE; }
|
||||||
|
get_buffer = br_state.get_buffer; bits_left = br_state.bits_left;
|
||||||
|
r = s >> 4; s &= 15;
|
||||||
|
if (s) {
|
||||||
|
CHECK_BIT_BUFFER(br_state, s, return FALSE);
|
||||||
|
t = GET_BITS(s);
|
||||||
|
s = HUFF_EXTEND(t, s);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
if (s) {
|
if (s) {
|
||||||
k += r;
|
k += r;
|
||||||
CHECK_BIT_BUFFER(br_state, s, return FALSE);
|
|
||||||
r = GET_BITS(s);
|
|
||||||
s = HUFF_EXTEND(r, s);
|
|
||||||
/* Scale and output coefficient in natural (dezigzagged) order */
|
/* Scale and output coefficient in natural (dezigzagged) order */
|
||||||
(*block)[jpeg_natural_order[k]] = (JCOEF) (s << Al);
|
(*block)[jpeg_natural_order[k]] = (JCOEF) (s << Al);
|
||||||
} else {
|
} else {
|
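Note on the inlined HUFFX_DECODE block in the hunk above: its first part is a fast path that tops up the bit buffer one byte at a time and bails out to `jpeg_fill_bit_buffer` as soon as the input runs dry or a 0xFF byte (possible marker or stuffed byte) shows up. The sketch below isolates that loop. The struct, the function name, and the value 25 for the fill threshold are assumptions for illustration (the threshold assumes a 32-bit bit buffer); the real code works on `br_state`, `get_buffer` and `bits_left` directly.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MIN_GET_BITS 25   /* assumed threshold for a 32-bit buffer */

/* Hypothetical stand-in for the bit-reader working state. */
typedef struct {
    const unsigned char *next_input_byte;
    size_t   bytes_in_buffer;
    uint32_t get_buffer;
    int      bits_left;
} sketch_bitreader;

/* Returns 1 if the fast path filled the buffer, 0 if it stopped early and
 * the caller would have to take the slow path (the "goto label21" case). */
int sketch_fast_fill(sketch_bitreader *br)
{
    while (br->bits_left < SKETCH_MIN_GET_BITS) {
        if (br->bytes_in_buffer == 0 || *br->next_input_byte == 0xFF)
            return 0;
        br->get_buffer = (br->get_buffer << 8) | *br->next_input_byte;
        br->next_input_byte++;
        br->bytes_in_buffer--;
        br->bits_left += 8;
    }
    return 1;
}
```

The point of the split is that the common case (plain data bytes) never leaves the inlined loop, while anything that needs marker handling or suspension is left to the out-of-line routine.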
||||||
@@ -406,7 +496,8 @@ decode_mcu_AC_first (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
|
|||||||
}
|
}
|
||||||
|
|
||||||
/* Completed MCU, so update state */
|
/* Completed MCU, so update state */
|
||||||
entropy->saved.EOBRUN = EOBRUN; /* only part of saved state we care about */
|
entropy->saved.EOBRUN = EOBRUN; /* only part of saved state we need */
|
||||||
|
}
|
||||||
|
|
||||||
/* Account for restart interval (no-op if not using restarts) */
|
/* Account for restart interval (no-op if not using restarts) */
|
||||||
entropy->restarts_to_go--;
|
entropy->restarts_to_go--;
|
||||||
@@ -427,7 +518,6 @@ decode_mcu_DC_refine (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
|
|||||||
phuff_entropy_ptr entropy = (phuff_entropy_ptr) cinfo->entropy;
|
phuff_entropy_ptr entropy = (phuff_entropy_ptr) cinfo->entropy;
|
||||||
int p1 = 1 << cinfo->Al; /* 1 in the bit position being coded */
|
int p1 = 1 << cinfo->Al; /* 1 in the bit position being coded */
|
||||||
int blkn;
|
int blkn;
|
||||||
JBLOCKROW block;
|
|
||||||
BITREAD_STATE_VARS;
|
BITREAD_STATE_VARS;
|
||||||
|
|
||||||
/* Process restart marker if needed; may have to suspend */
|
/* Process restart marker if needed; may have to suspend */
|
||||||
@@ -437,13 +527,17 @@ decode_mcu_DC_refine (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
|
|||||||
return FALSE;
|
return FALSE;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* Not worth the cycles to check insufficient_data here,
|
||||||
|
* since we will not change the data anyway if we read zeroes.
|
||||||
|
*/
|
||||||
|
|
||||||
/* Load up working state */
|
/* Load up working state */
|
||||||
BITREAD_LOAD_STATE(cinfo,entropy->bitstate);
|
BITREAD_LOAD_STATE(cinfo,entropy->bitstate);
|
||||||
|
|
||||||
/* Outer loop handles each block in the MCU */
|
/* Outer loop handles each block in the MCU */
|
||||||
|
|
||||||
for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) {
|
for (blkn = 0; blkn < cinfo->blocks_in_MCU; blkn++) {
|
||||||
block = MCU_data[blkn];
|
JBLOCKROW block = MCU_data[blkn];
|
||||||
|
|
||||||
/* Encoded data is simply the next bit of the two's-complement DC value */
|
/* Encoded data is simply the next bit of the two's-complement DC value */
|
||||||
CHECK_BIT_BUFFER(br_state, 1, return FALSE);
|
CHECK_BIT_BUFFER(br_state, 1, return FALSE);
|
||||||
@@ -471,14 +565,14 @@ decode_mcu_AC_refine (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
|
|||||||
{
|
{
|
||||||
phuff_entropy_ptr entropy = (phuff_entropy_ptr) cinfo->entropy;
|
phuff_entropy_ptr entropy = (phuff_entropy_ptr) cinfo->entropy;
|
||||||
int Se = cinfo->Se;
|
int Se = cinfo->Se;
|
||||||
int p1 = 1 << cinfo->Al; /* 1 in the bit position being coded */
|
int Al = cinfo->Al;
|
||||||
int m1 = (-1) << cinfo->Al; /* -1 in the bit position being coded */
|
|
||||||
register int s, k, r;
|
register int s, k, r;
|
||||||
unsigned int EOBRUN;
|
unsigned int EOBRUN;
|
||||||
JBLOCKROW block;
|
JBLOCKROW block;
|
||||||
JCOEFPTR thiscoef;
|
JCOEFPTR thiscoef;
|
||||||
BITREAD_STATE_VARS;
|
BITREAD_STATE_VARS;
|
||||||
d_derived_tbl * tbl;
|
d_derived_tbl * tbl;
|
||||||
|
int pm1[2];
|
||||||
int num_newnz;
|
int num_newnz;
|
||||||
int newnz_pos[DCTSIZE2];
|
int newnz_pos[DCTSIZE2];
|
||||||
|
|
||||||
@@ -489,19 +583,30 @@ decode_mcu_AC_refine (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
|
|||||||
return FALSE;
|
return FALSE;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/* If we've run out of data, don't modify the MCU.
|
||||||
|
*/
|
||||||
|
if (! entropy->pub.insufficient_data) {
|
||||||
|
|
||||||
/* Load up working state */
|
/* Load up working state */
|
||||||
BITREAD_LOAD_STATE(cinfo,entropy->bitstate);
|
BITREAD_LOAD_STATE(cinfo,entropy->bitstate);
|
||||||
EOBRUN = entropy->saved.EOBRUN; /* only part of saved state we care about */
|
EOBRUN = entropy->saved.EOBRUN; /* only part of saved state we need */
|
||||||
|
|
||||||
/* There is always only one block per MCU */
|
/* There is always only one block per MCU */
|
||||||
block = MCU_data[0];
|
block = MCU_data[0];
|
||||||
tbl = entropy->ac_derived_tbl;
|
tbl = entropy->ac_derived_tbl;
|
||||||
|
|
||||||
|
/* The pm1[] array is indexed by a value from relational operator.
|
||||||
|
* This method eliminates conditional branches depending on random data,
|
||||||
|
* which result in lower performance on recent processors.
|
||||||
|
*/
|
||||||
|
pm1[0] = 1 << cinfo->Al; /* +1 in the bit position being coded */
|
||||||
|
pm1[1] = (-1) << cinfo->Al; /* -1 in the bit position being coded */
|
||||||
|
|
||||||
/* If we are forced to suspend, we must undo the assignments to any newly
|
/* If we are forced to suspend, we must undo the assignments to any newly
|
||||||
* nonzero coefficients in the block, because otherwise we'd get confused
|
* nonzero coefficients in the block, because otherwise we'd get confused
|
||||||
* next time about which coefficients were already nonzero.
|
* next time about which coefficients were already nonzero.
|
||||||
* But we need not undo addition of bits to already-nonzero coefficients;
|
* But we need not undo addition of bits to already-nonzero coefficients;
|
||||||
* instead, we can test the current bit position to see if we already did it.
|
* instead, we can test the current bit to see if we already did it.
|
||||||
*/
|
*/
|
||||||
num_newnz = 0;
|
num_newnz = 0;
|
||||||
|
|
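Note on the `pm1[]` array introduced in the hunk above: indexing by the result of `(*thiscoef < 0)` selects +1 or -1 in the coded bit position without a data-dependent branch on the sign. A stand-alone sketch of the refinement step for an already-nonzero coefficient follows. `refine_nonzero_coef` is a hypothetical name, and `pm1[1]` is computed as `-(1 << Al)` here instead of `(-1) << Al` to keep the sketch free of negative left shifts.

```c
/* Hypothetical helper, not part of libjpeg.
 * Applies one correction bit to a coefficient that is already nonzero,
 * assuming the correction bit read from the stream was 1. */
int refine_nonzero_coef(int coef, int Al)
{
    int pm1[2];
    pm1[0] = 1 << Al;      /* +1 in the bit position being coded */
    pm1[1] = -(1 << Al);   /* -1 in the bit position being coded */

    /* Do nothing if the bit is already set (e.g. after a suspension). */
    if ((coef & pm1[0]) == 0)
        coef += pm1[coef < 0];   /* index 0 for coef >= 0, 1 for coef < 0 */
    return coef;
}
```

The relational expression evaluates to 0 or 1 in C, so it can serve directly as the array index.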
||||||
@@ -510,18 +615,63 @@ decode_mcu_AC_refine (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
|
|||||||
|
|
||||||
if (EOBRUN == 0) {
|
if (EOBRUN == 0) {
|
||||||
for (; k <= Se; k++) {
|
for (; k <= Se; k++) {
|
||||||
HUFF_DECODE(s, br_state, tbl, goto undoit, label3);
|
{ /* HUFFX_DECODE */
|
||||||
r = s >> 4;
|
register int nb, look, t;
|
||||||
s &= 15;
|
if (bits_left < HUFFX_LOOKAHEAD) {
|
||||||
|
register const JOCTET * next_input_byte = br_state.next_input_byte;
|
||||||
|
register size_t bytes_in_buffer = br_state.bytes_in_buffer;
|
||||||
|
if (cinfo->unread_marker == 0) {
|
||||||
|
while (bits_left < MIN_GET_BITS) {
|
||||||
|
register int c;
|
||||||
|
if (bytes_in_buffer == 0 ||
|
||||||
|
(c = GETJOCTET(*next_input_byte)) == 0xFF) {
|
||||||
|
goto label31; }
|
||||||
|
bytes_in_buffer--; next_input_byte++;
|
||||||
|
get_buffer = (get_buffer << 8) | c;
|
||||||
|
bits_left += 8;
|
||||||
|
}
|
||||||
|
br_state.next_input_byte = next_input_byte;
|
||||||
|
br_state.bytes_in_buffer = bytes_in_buffer;
|
||||||
|
} else {
|
||||||
|
label31:
|
||||||
|
br_state.next_input_byte = next_input_byte;
|
||||||
|
br_state.bytes_in_buffer = bytes_in_buffer;
|
||||||
|
if (! jpeg_fill_bit_buffer(&br_state,get_buffer,bits_left, 0)) {
|
||||||
|
goto undoit; }
|
||||||
|
get_buffer = br_state.get_buffer; bits_left = br_state.bits_left;
|
||||||
|
if (bits_left < HUFFX_LOOKAHEAD) {
|
||||||
|
nb = 1; goto label3;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
look = PEEK_BITS(HUFFX_LOOKAHEAD);
|
||||||
|
if ((nb = tbl->lookx_nbits[look]) != 0) {
|
||||||
|
t = tbl->lookx_sym[look];
|
||||||
|
s = tbl->lookx_val[look];
|
||||||
|
r = t >> 4; t &= 15;
|
||||||
|
if (t <= 1) {
|
||||||
|
DROP_BITS(nb);
|
||||||
|
} else { /* size of new coef should always be 1 */
|
||||||
|
WARNMS(cinfo, JWRN_HUFF_BAD_CODE);
|
||||||
|
DROP_BITS(nb - (t - 1));
|
||||||
|
s = (s >= 0) ? 1 : -1;
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
nb = HUFFX_LOOKAHEAD;
|
||||||
|
label3:
|
||||||
|
if ((s=jpeg_huff_decode(&br_state,get_buffer,bits_left,tbl,nb))
|
||||||
|
< 0) { goto undoit; }
|
||||||
|
get_buffer = br_state.get_buffer; bits_left = br_state.bits_left;
|
||||||
|
r = s >> 4; s &= 15;
|
||||||
if (s) {
|
if (s) {
|
||||||
if (s != 1) /* size of new coef should always be 1 */
|
if (s != 1) /* size of new coef should always be 1 */
|
||||||
WARNMS(cinfo, JWRN_HUFF_BAD_CODE);
|
WARNMS(cinfo, JWRN_HUFF_BAD_CODE);
|
||||||
CHECK_BIT_BUFFER(br_state, 1, goto undoit);
|
CHECK_BIT_BUFFER(br_state, 1, goto undoit);
|
||||||
if (GET_BITS(1))
|
s = GET_BITS(1) ? 1 : -1;
|
||||||
s = p1; /* newly nonzero coef is positive */
|
}
|
||||||
else
|
}
|
||||||
s = m1; /* newly nonzero coef is negative */
|
}
|
||||||
} else {
|
if (s == 0) {
|
||||||
if (r != 15) {
|
if (r != 15) {
|
||||||
EOBRUN = 1 << r; /* EOBr, run length is 2^r + appended bits */
|
EOBRUN = 1 << r; /* EOBr, run length is 2^r + appended bits */
|
||||||
if (r) {
|
if (r) {
|
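Note on the EOBr case in the hunk above: per the comment, the run length is 2^r plus `r` appended bits read from the stream. As a tiny worked example, assuming a hypothetical helper `eob_run_length` that receives the already-read appended bits:

```c
/* Hypothetical helper, not part of libjpeg.
 * r             : run-length category from the Huffman symbol (0..14)
 * appended_bits : the r extra bits read from the stream (0 when r == 0) */
unsigned int eob_run_length(int r, unsigned int appended_bits)
{
    return (1u << r) + appended_bits;   /* 2^r + appended bits */
}
```

With `r = 3` and appended bits `101` (5), the run covers 8 + 5 = 13 blocks.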
||||||
@@ -542,12 +692,8 @@ decode_mcu_AC_refine (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
|
|||||||
if (*thiscoef != 0) {
|
if (*thiscoef != 0) {
|
||||||
CHECK_BIT_BUFFER(br_state, 1, goto undoit);
|
CHECK_BIT_BUFFER(br_state, 1, goto undoit);
|
||||||
if (GET_BITS(1)) {
|
if (GET_BITS(1)) {
|
||||||
if ((*thiscoef & p1) == 0) { /* do nothing if already changed it */
|
if ((*thiscoef & pm1[0]) == 0) /* do nothing if already set it */
|
||||||
if (*thiscoef >= 0)
|
*thiscoef += pm1[(*thiscoef < 0)];
|
||||||
*thiscoef += p1;
|
|
||||||
else
|
|
||||||
*thiscoef += m1;
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
} else {
|
} else {
|
||||||
if (--r < 0)
|
if (--r < 0)
|
||||||
@@ -558,7 +704,7 @@ decode_mcu_AC_refine (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
|
|||||||
if (s) {
|
if (s) {
|
||||||
int pos = jpeg_natural_order[k];
|
int pos = jpeg_natural_order[k];
|
||||||
/* Output newly nonzero coefficient */
|
/* Output newly nonzero coefficient */
|
||||||
(*block)[pos] = (JCOEF) s;
|
(*block)[pos] = (JCOEF) (s << Al);
|
||||||
/* Remember its position in case we have to suspend */
|
/* Remember its position in case we have to suspend */
|
||||||
newnz_pos[num_newnz++] = pos;
|
newnz_pos[num_newnz++] = pos;
|
||||||
}
|
}
|
||||||
@@ -576,12 +722,8 @@ decode_mcu_AC_refine (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
|
|||||||
if (*thiscoef != 0) {
|
if (*thiscoef != 0) {
|
||||||
CHECK_BIT_BUFFER(br_state, 1, goto undoit);
|
CHECK_BIT_BUFFER(br_state, 1, goto undoit);
|
||||||
if (GET_BITS(1)) {
|
if (GET_BITS(1)) {
|
||||||
if ((*thiscoef & p1) == 0) { /* do nothing if already changed it */
|
if ((*thiscoef & pm1[0]) == 0) /* do nothing if already set it */
|
||||||
if (*thiscoef >= 0)
|
*thiscoef += pm1[(*thiscoef < 0)];
|
||||||
*thiscoef += p1;
|
|
||||||
else
|
|
||||||
*thiscoef += m1;
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -591,7 +733,8 @@ decode_mcu_AC_refine (j_decompress_ptr cinfo, JBLOCKROW *MCU_data)
|
|||||||
|
|
||||||
/* Completed MCU, so update state */
|
/* Completed MCU, so update state */
|
||||||
BITREAD_SAVE_STATE(cinfo,entropy->bitstate);
|
BITREAD_SAVE_STATE(cinfo,entropy->bitstate);
|
||||||
entropy->saved.EOBRUN = EOBRUN; /* only part of saved state we care about */
|
entropy->saved.EOBRUN = EOBRUN; /* only part of saved state we need */
|
||||||
|
}
|
||||||
|
|
||||||
/* Account for restart interval (no-op if not using restarts) */
|
/* Account for restart interval (no-op if not using restarts) */
|
||||||
entropy->restarts_to_go--;
|
entropy->restarts_to_go--;
|
||||||
|
|||||||
893
jdsammmx.asm
Normal file
@@ -0,0 +1,893 @@
|
|||||||
|
;
|
||||||
|
; jdsammmx.asm - upsampling (MMX)
|
||||||
|
;
|
||||||
|
; x86 SIMD extension for IJG JPEG library
|
||||||
|
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
; For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
;
|
||||||
|
; This file should be assembled with NASM (Netwide Assembler),
|
||||||
|
; can *not* be assembled with Microsoft's MASM or any compatible
|
||||||
|
; assembler (including Borland's Turbo Assembler).
|
||||||
|
; NASM is available from http://nasm.sourceforge.net/ or
|
||||||
|
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||||
|
;
|
||||||
|
; Last Modified : February 4, 2006
|
||||||
|
;
|
||||||
|
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jcolsamp.inc"
|
||||||
|
|
||||||
|
%ifdef JDSAMPLE_FANCY_MMX_SUPPORTED
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_CONST
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
global EXTN(jconst_fancy_upsample_mmx)
|
||||||
|
|
||||||
|
EXTN(jconst_fancy_upsample_mmx):
|
||||||
|
|
||||||
|
PW_ONE times 4 dw 1
|
||||||
|
PW_TWO times 4 dw 2
|
||||||
|
PW_THREE times 4 dw 3
|
||||||
|
PW_SEVEN times 4 dw 7
|
||||||
|
PW_EIGHT times 4 dw 8
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Fancy processing for the common case of 2:1 horizontal and 1:1 vertical.
|
||||||
|
;
|
||||||
|
; The upsampling algorithm is linear interpolation between pixel centers,
|
||||||
|
; also known as a "triangle filter". This is a good compromise between
|
||||||
|
; speed and visual quality. The centers of the output pixels are 1/4 and 3/4
|
||||||
|
; of the way between input pixel centers.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_h2v1_fancy_upsample_mmx (j_decompress_ptr cinfo,
|
||||||
|
; jpeg_component_info * compptr,
|
||||||
|
; JSAMPARRAY input_data,
|
||||||
|
; JSAMPARRAY * output_data_ptr);
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define input_data(b) (b)+16 ; JSAMPARRAY input_data
|
||||||
|
%define output_data_ptr(b) (b)+20 ; JSAMPARRAY * output_data_ptr
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_h2v1_fancy_upsample_mmx)
|
||||||
|
|
||||||
|
EXTN(jpeg_h2v1_fancy_upsample_mmx):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
pushpic ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
|
||||||
|
mov eax, POINTER [compptr(ebp)]
|
||||||
|
mov eax, JDIMENSION [jcompinfo_downsampled_width(eax)] ; colctr
|
||||||
|
test eax,eax
|
||||||
|
jz near .return
|
||||||
|
|
||||||
|
mov ecx, POINTER [cinfo(ebp)]
|
||||||
|
mov ecx, INT [jdstruct_max_v_samp_factor(ecx)] ; rowctr
|
||||||
|
test ecx,ecx
|
||||||
|
jz near .return
|
||||||
|
|
||||||
|
mov esi, JSAMPARRAY [input_data(ebp)] ; input_data
|
||||||
|
mov edi, POINTER [output_data_ptr(ebp)]
|
||||||
|
mov edi, JSAMPARRAY [edi] ; output_data
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
push eax ; colctr
|
||||||
|
push edi
|
||||||
|
push esi
|
||||||
|
|
||||||
|
mov esi, JSAMPROW [esi] ; inptr
|
||||||
|
mov edi, JSAMPROW [edi] ; outptr
|
||||||
|
|
||||||
|
test eax, SIZEOF_MMWORD-1
|
||||||
|
jz short .skip
|
||||||
|
mov dl, JSAMPLE [esi+(eax-1)*SIZEOF_JSAMPLE]
|
||||||
|
mov JSAMPLE [esi+eax*SIZEOF_JSAMPLE], dl ; insert a dummy sample
|
||||||
|
.skip:
|
||||||
|
pxor mm0,mm0 ; mm0=(all 0's)
|
||||||
|
pcmpeqb mm7,mm7
|
||||||
|
psrlq mm7,(SIZEOF_MMWORD-1)*BYTE_BIT
|
||||||
|
pand mm7, MMWORD [esi+0*SIZEOF_MMWORD]
|
||||||
|
|
||||||
|
add eax, byte SIZEOF_MMWORD-1
|
||||||
|
and eax, byte -SIZEOF_MMWORD
|
||||||
|
cmp eax, byte SIZEOF_MMWORD
|
||||||
|
ja short .columnloop
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.columnloop_last:
|
||||||
|
pcmpeqb mm6,mm6
|
||||||
|
psllq mm6,(SIZEOF_MMWORD-1)*BYTE_BIT
|
||||||
|
pand mm6, MMWORD [esi+0*SIZEOF_MMWORD]
|
||||||
|
jmp short .upsample
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.columnloop:
|
||||||
|
movq mm6, MMWORD [esi+1*SIZEOF_MMWORD]
|
||||||
|
psllq mm6,(SIZEOF_MMWORD-1)*BYTE_BIT
|
||||||
|
|
||||||
|
.upsample:
|
||||||
|
movq mm1, MMWORD [esi+0*SIZEOF_MMWORD]
|
||||||
|
movq mm2,mm1
|
||||||
|
movq mm3,mm1 ; mm1=( 0 1 2 3 4 5 6 7)
|
||||||
|
psllq mm2,BYTE_BIT ; mm2=( - 0 1 2 3 4 5 6)
|
||||||
|
psrlq mm3,BYTE_BIT ; mm3=( 1 2 3 4 5 6 7 -)
|
||||||
|
|
||||||
|
por mm2,mm7 ; mm2=(-1 0 1 2 3 4 5 6)
|
||||||
|
por mm3,mm6 ; mm3=( 1 2 3 4 5 6 7 8)
|
||||||
|
|
||||||
|
movq mm7,mm1
|
||||||
|
psrlq mm7,(SIZEOF_MMWORD-1)*BYTE_BIT ; mm7=( 7 - - - - - - -)
|
||||||
|
|
||||||
|
movq mm4,mm1
|
||||||
|
punpcklbw mm1,mm0 ; mm1=( 0 1 2 3)
|
||||||
|
punpckhbw mm4,mm0 ; mm4=( 4 5 6 7)
|
||||||
|
movq mm5,mm2
|
||||||
|
punpcklbw mm2,mm0 ; mm2=(-1 0 1 2)
|
||||||
|
punpckhbw mm5,mm0 ; mm5=( 3 4 5 6)
|
||||||
|
movq mm6,mm3
|
||||||
|
punpcklbw mm3,mm0 ; mm3=( 1 2 3 4)
|
||||||
|
punpckhbw mm6,mm0 ; mm6=( 5 6 7 8)
|
||||||
|
|
||||||
|
pmullw mm1,[GOTOFF(ebx,PW_THREE)]
|
||||||
|
pmullw mm4,[GOTOFF(ebx,PW_THREE)]
|
||||||
|
paddw mm2,[GOTOFF(ebx,PW_ONE)]
|
||||||
|
paddw mm5,[GOTOFF(ebx,PW_ONE)]
|
||||||
|
paddw mm3,[GOTOFF(ebx,PW_TWO)]
|
||||||
|
paddw mm6,[GOTOFF(ebx,PW_TWO)]
|
||||||
|
|
||||||
|
paddw mm2,mm1
|
||||||
|
paddw mm5,mm4
|
||||||
|
psrlw mm2,2 ; mm2=OutLE=( 0 2 4 6)
|
||||||
|
psrlw mm5,2 ; mm5=OutHE=( 8 10 12 14)
|
||||||
|
paddw mm3,mm1
|
||||||
|
paddw mm6,mm4
|
||||||
|
psrlw mm3,2 ; mm3=OutLO=( 1 3 5 7)
|
||||||
|
psrlw mm6,2 ; mm6=OutHO=( 9 11 13 15)
|
||||||
|
|
||||||
|
psllw mm3,BYTE_BIT
|
||||||
|
psllw mm6,BYTE_BIT
|
||||||
|
por mm2,mm3 ; mm2=OutL=( 0 1 2 3 4 5 6 7)
|
||||||
|
por mm5,mm6 ; mm5=OutH=( 8 9 10 11 12 13 14 15)
|
||||||
|
|
||||||
|
movq MMWORD [edi+0*SIZEOF_MMWORD], mm2
|
||||||
|
movq MMWORD [edi+1*SIZEOF_MMWORD], mm5
|
||||||
|
|
||||||
|
sub eax, byte SIZEOF_MMWORD
|
||||||
|
add esi, byte 1*SIZEOF_MMWORD ; inptr
|
||||||
|
add edi, byte 2*SIZEOF_MMWORD ; outptr
|
||||||
|
cmp eax, byte SIZEOF_MMWORD
|
||||||
|
ja near .columnloop
|
||||||
|
test eax,eax
|
||||||
|
jnz near .columnloop_last
|
||||||
|
|
||||||
|
pop esi
|
||||||
|
pop edi
|
||||||
|
pop eax
|
||||||
|
|
||||||
|
add esi, byte SIZEOF_JSAMPROW ; input_data
|
||||||
|
add edi, byte SIZEOF_JSAMPROW ; output_data
|
||||||
|
dec ecx ; rowctr
|
||||||
|
jg near .rowloop
|
||||||
|
|
||||||
|
emms ; empty MMX state
|
||||||
|
|
||||||
|
.return:
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
poppic ebx
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
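As a reading aid for the MMX routine above, here is a scalar C sketch of the same triangle filter, written from the constants visible in the assembly (multiply by `PW_THREE`, add `PW_ONE` or `PW_TWO`, shift right by 2). The function name `h2v1_fancy_ref` is hypothetical, and the edge handling (replicating the first and last sample) is an interpretation of how the routine builds its "-1" and "8" neighbours, not code taken from the library:

```c
#include <assert.h>
#include <stddef.h>

/* in: width samples of one row; out: 2*width samples. */
static void h2v1_fancy_ref(const unsigned char *in, size_t width,
                           unsigned char *out)
{
  size_t i;

  assert(width > 0);
  for (i = 0; i < width; i++) {
    int cur   = in[i];
    int left  = in[i > 0 ? i - 1 : 0];                  /* replicate at left edge */
    int right = in[i + 1 < width ? i + 1 : width - 1];  /* replicate at right edge */

    out[2 * i]     = (unsigned char) ((3 * cur + left  + 1) >> 2);  /* even output */
    out[2 * i + 1] = (unsigned char) ((3 * cur + right + 2) >> 2);  /* odd output */
  }
}
```

With a replicated neighbour the formulas reduce to `(4*x + 1) >> 2` and `(4*x + 2) >> 2`, both equal to `x`, so the first and last output samples simply copy the edge input samples.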
|
; --------------------------------------------------------------------------
|
||||||
|
;
|
||||||
|
; Fancy processing for the common case of 2:1 horizontal and 2:1 vertical.
|
||||||
|
; Again a triangle filter; see comments for h2v1 case, above.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_h2v2_fancy_upsample_mmx (j_decompress_ptr cinfo,
|
||||||
|
; jpeg_component_info * compptr,
|
||||||
|
; JSAMPARRAY input_data,
|
||||||
|
; JSAMPARRAY * output_data_ptr);
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define input_data(b) (b)+16 ; JSAMPARRAY input_data
|
||||||
|
%define output_data_ptr(b) (b)+20 ; JSAMPARRAY * output_data_ptr
|
||||||
|
|
||||||
|
%define original_ebp ebp+0
|
||||||
|
%define wk(i) ebp-(WK_NUM-(i))*SIZEOF_MMWORD ; mmword wk[WK_NUM]
|
||||||
|
%define WK_NUM 4
|
||||||
|
%define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_h2v2_fancy_upsample_mmx)
|
||||||
|
|
||||||
|
EXTN(jpeg_h2v2_fancy_upsample_mmx):
|
||||||
|
push ebp
|
||||||
|
mov eax,esp ; eax = original ebp
|
||||||
|
sub esp, byte 4
|
||||||
|
and esp, byte (-SIZEOF_MMWORD) ; align to 64 bits
|
||||||
|
mov [esp],eax
|
||||||
|
mov ebp,esp ; ebp = aligned ebp
|
||||||
|
lea esp, [wk(0)]
|
||||||
|
pushpic eax ; make room for the GOT address
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
movpic POINTER [gotptr], ebx ; save GOT address
|
||||||
|
|
||||||
|
mov edx,eax ; edx = original ebp
|
||||||
|
mov eax, POINTER [compptr(edx)]
|
||||||
|
mov eax, JDIMENSION [jcompinfo_downsampled_width(eax)] ; colctr
|
||||||
|
test eax,eax
|
||||||
|
jz near .return
|
||||||
|
|
||||||
|
mov ecx, POINTER [cinfo(edx)]
|
||||||
|
mov ecx, INT [jdstruct_max_v_samp_factor(ecx)] ; rowctr
|
||||||
|
test ecx,ecx
|
||||||
|
jz near .return
|
||||||
|
|
||||||
|
mov esi, JSAMPARRAY [input_data(edx)] ; input_data
|
||||||
|
mov edi, POINTER [output_data_ptr(edx)]
|
||||||
|
mov edi, JSAMPARRAY [edi] ; output_data
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
push eax ; colctr
|
||||||
|
push ecx
|
||||||
|
push edi
|
||||||
|
push esi
|
||||||
|
|
||||||
|
mov ecx, JSAMPROW [esi-1*SIZEOF_JSAMPROW] ; inptr1(above)
|
||||||
|
mov ebx, JSAMPROW [esi+0*SIZEOF_JSAMPROW] ; inptr0
|
||||||
|
mov esi, JSAMPROW [esi+1*SIZEOF_JSAMPROW] ; inptr1(below)
|
||||||
|
mov edx, JSAMPROW [edi+0*SIZEOF_JSAMPROW] ; outptr0
|
||||||
|
mov edi, JSAMPROW [edi+1*SIZEOF_JSAMPROW] ; outptr1
|
||||||
|
|
||||||
|
test eax, SIZEOF_MMWORD-1
|
||||||
|
jz short .skip
|
||||||
|
push edx
|
||||||
|
mov dl, JSAMPLE [ecx+(eax-1)*SIZEOF_JSAMPLE]
|
||||||
|
mov JSAMPLE [ecx+eax*SIZEOF_JSAMPLE], dl
|
||||||
|
mov dl, JSAMPLE [ebx+(eax-1)*SIZEOF_JSAMPLE]
|
||||||
|
mov JSAMPLE [ebx+eax*SIZEOF_JSAMPLE], dl
|
||||||
|
mov dl, JSAMPLE [esi+(eax-1)*SIZEOF_JSAMPLE]
|
||||||
|
mov JSAMPLE [esi+eax*SIZEOF_JSAMPLE], dl ; insert a dummy sample
|
||||||
|
pop edx
|
||||||
|
.skip:
|
||||||
|
; -- process the first column block
|
||||||
|
|
||||||
|
movq mm0, MMWORD [ebx+0*SIZEOF_MMWORD] ; mm0=row[ 0][0]
|
||||||
|
movq mm1, MMWORD [ecx+0*SIZEOF_MMWORD] ; mm1=row[-1][0]
|
||||||
|
movq mm2, MMWORD [esi+0*SIZEOF_MMWORD] ; mm2=row[+1][0]
|
||||||
|
|
||||||
|
pushpic ebx
|
||||||
|
movpic ebx, POINTER [gotptr] ; load GOT address
|
||||||
|
|
||||||
|
pxor mm3,mm3 ; mm3=(all 0's)
|
||||||
|
movq mm4,mm0
|
||||||
|
punpcklbw mm0,mm3 ; mm0=row[ 0][0]( 0 1 2 3)
|
||||||
|
punpckhbw mm4,mm3 ; mm4=row[ 0][0]( 4 5 6 7)
|
||||||
|
movq mm5,mm1
|
||||||
|
punpcklbw mm1,mm3 ; mm1=row[-1][0]( 0 1 2 3)
|
||||||
|
punpckhbw mm5,mm3 ; mm5=row[-1][0]( 4 5 6 7)
|
||||||
|
movq mm6,mm2
|
||||||
|
punpcklbw mm2,mm3 ; mm2=row[+1][0]( 0 1 2 3)
|
||||||
|
punpckhbw mm6,mm3 ; mm6=row[+1][0]( 4 5 6 7)
|
||||||
|
|
||||||
|
pmullw mm0,[GOTOFF(ebx,PW_THREE)]
|
||||||
|
pmullw mm4,[GOTOFF(ebx,PW_THREE)]
|
||||||
|
|
||||||
|
pcmpeqb mm7,mm7
|
||||||
|
psrlq mm7,(SIZEOF_MMWORD-2)*BYTE_BIT
|
||||||
|
|
||||||
|
paddw mm1,mm0 ; mm1=Int0L=( 0 1 2 3)
|
||||||
|
paddw mm5,mm4 ; mm5=Int0H=( 4 5 6 7)
|
||||||
|
paddw mm2,mm0 ; mm2=Int1L=( 0 1 2 3)
|
||||||
|
paddw mm6,mm4 ; mm6=Int1H=( 4 5 6 7)
|
||||||
|
|
||||||
|
movq MMWORD [edx+0*SIZEOF_MMWORD], mm1 ; temporarily save
|
||||||
|
movq MMWORD [edx+1*SIZEOF_MMWORD], mm5 ; the intermediate data
|
||||||
|
movq MMWORD [edi+0*SIZEOF_MMWORD], mm2
|
||||||
|
movq MMWORD [edi+1*SIZEOF_MMWORD], mm6
|
||||||
|
|
||||||
|
pand mm1,mm7 ; mm1=( 0 - - -)
|
||||||
|
pand mm2,mm7 ; mm2=( 0 - - -)
|
||||||
|
|
||||||
|
movq MMWORD [wk(0)], mm1
|
||||||
|
movq MMWORD [wk(1)], mm2
|
||||||
|
|
||||||
|
poppic ebx
|
||||||
|
|
||||||
|
add eax, byte SIZEOF_MMWORD-1
|
||||||
|
and eax, byte -SIZEOF_MMWORD
|
||||||
|
cmp eax, byte SIZEOF_MMWORD
|
||||||
|
ja short .columnloop
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.columnloop_last:
|
||||||
|
; -- process the last column block
|
||||||
|
|
||||||
|
pushpic ebx
|
||||||
|
movpic ebx, POINTER [gotptr] ; load GOT address
|
||||||
|
|
||||||
|
pcmpeqb mm1,mm1
|
||||||
|
psllq mm1,(SIZEOF_MMWORD-2)*BYTE_BIT
|
||||||
|
movq mm2,mm1
|
||||||
|
|
||||||
|
pand mm1, MMWORD [edx+1*SIZEOF_MMWORD] ; mm1=( - - - 7)
|
||||||
|
pand mm2, MMWORD [edi+1*SIZEOF_MMWORD] ; mm2=( - - - 7)
|
||||||
|
|
||||||
|
movq MMWORD [wk(2)], mm1
|
||||||
|
movq MMWORD [wk(3)], mm2
|
||||||
|
|
||||||
|
jmp short .upsample
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.columnloop:
|
||||||
|
; -- process the next column block
|
||||||
|
|
||||||
|
movq mm0, MMWORD [ebx+1*SIZEOF_MMWORD] ; mm0=row[ 0][1]
|
||||||
|
movq mm1, MMWORD [ecx+1*SIZEOF_MMWORD] ; mm1=row[-1][1]
|
||||||
|
movq mm2, MMWORD [esi+1*SIZEOF_MMWORD] ; mm2=row[+1][1]
|
||||||
|
|
||||||
|
pushpic ebx
|
||||||
|
movpic ebx, POINTER [gotptr] ; load GOT address
|
||||||
|
|
||||||
|
pxor mm3,mm3 ; mm3=(all 0's)
|
||||||
|
movq mm4,mm0
|
||||||
|
punpcklbw mm0,mm3 ; mm0=row[ 0][1]( 0 1 2 3)
|
||||||
|
punpckhbw mm4,mm3 ; mm4=row[ 0][1]( 4 5 6 7)
|
||||||
|
movq mm5,mm1
|
||||||
|
punpcklbw mm1,mm3 ; mm1=row[-1][1]( 0 1 2 3)
|
||||||
|
punpckhbw mm5,mm3 ; mm5=row[-1][1]( 4 5 6 7)
|
||||||
|
movq mm6,mm2
|
||||||
|
punpcklbw mm2,mm3 ; mm2=row[+1][1]( 0 1 2 3)
|
||||||
|
punpckhbw mm6,mm3 ; mm6=row[+1][1]( 4 5 6 7)
|
||||||
|
|
||||||
|
pmullw mm0,[GOTOFF(ebx,PW_THREE)]
|
||||||
|
pmullw mm4,[GOTOFF(ebx,PW_THREE)]
|
||||||
|
|
||||||
|
paddw mm1,mm0 ; mm1=Int0L=( 0 1 2 3)
|
||||||
|
paddw mm5,mm4 ; mm5=Int0H=( 4 5 6 7)
|
||||||
|
paddw mm2,mm0 ; mm2=Int1L=( 0 1 2 3)
|
||||||
|
paddw mm6,mm4 ; mm6=Int1H=( 4 5 6 7)
|
||||||
|
|
||||||
|
movq MMWORD [edx+2*SIZEOF_MMWORD], mm1 ; temporarily save
|
||||||
|
movq MMWORD [edx+3*SIZEOF_MMWORD], mm5 ; the intermediate data
|
||||||
|
movq MMWORD [edi+2*SIZEOF_MMWORD], mm2
|
||||||
|
movq MMWORD [edi+3*SIZEOF_MMWORD], mm6
|
||||||
|
|
||||||
|
psllq mm1,(SIZEOF_MMWORD-2)*BYTE_BIT ; mm1=( - - - 0)
|
||||||
|
psllq mm2,(SIZEOF_MMWORD-2)*BYTE_BIT ; mm2=( - - - 0)
|
||||||
|
|
||||||
|
movq MMWORD [wk(2)], mm1
|
||||||
|
movq MMWORD [wk(3)], mm2
|
||||||
|
|
||||||
|
.upsample:
|
||||||
|
; -- process the upper row
|
||||||
|
|
||||||
|
movq mm7, MMWORD [edx+0*SIZEOF_MMWORD] ; mm7=Int0L=( 0 1 2 3)
|
||||||
|
movq mm3, MMWORD [edx+1*SIZEOF_MMWORD] ; mm3=Int0H=( 4 5 6 7)
|
||||||
|
|
||||||
|
movq mm0,mm7
|
||||||
|
movq mm4,mm3
|
||||||
|
psrlq mm0,2*BYTE_BIT ; mm0=( 1 2 3 -)
|
||||||
|
psllq mm4,(SIZEOF_MMWORD-2)*BYTE_BIT ; mm4=( - - - 4)
|
||||||
|
movq mm5,mm7
|
||||||
|
movq mm6,mm3
|
||||||
|
psrlq mm5,(SIZEOF_MMWORD-2)*BYTE_BIT ; mm5=( 3 - - -)
|
||||||
|
psllq mm6,2*BYTE_BIT ; mm6=( - 4 5 6)
|
||||||
|
|
||||||
|
por mm0,mm4 ; mm0=( 1 2 3 4)
|
||||||
|
por mm5,mm6 ; mm5=( 3 4 5 6)
|
||||||
|
|
||||||
|
movq mm1,mm7
|
||||||
|
movq mm2,mm3
|
||||||
|
psllq mm1,2*BYTE_BIT ; mm1=( - 0 1 2)
|
||||||
|
psrlq mm2,2*BYTE_BIT ; mm2=( 5 6 7 -)
|
||||||
|
movq mm4,mm3
|
||||||
|
psrlq mm4,(SIZEOF_MMWORD-2)*BYTE_BIT ; mm4=( 7 - - -)
|
||||||
|
|
||||||
|
por mm1, MMWORD [wk(0)] ; mm1=(-1 0 1 2)
|
||||||
|
por mm2, MMWORD [wk(2)] ; mm2=( 5 6 7 8)
|
||||||
|
|
||||||
|
movq MMWORD [wk(0)], mm4
|
||||||
|
|
||||||
|
pmullw mm7,[GOTOFF(ebx,PW_THREE)]
|
||||||
|
pmullw mm3,[GOTOFF(ebx,PW_THREE)]
|
||||||
|
paddw mm1,[GOTOFF(ebx,PW_EIGHT)]
|
||||||
|
paddw mm5,[GOTOFF(ebx,PW_EIGHT)]
|
||||||
|
paddw mm0,[GOTOFF(ebx,PW_SEVEN)]
|
||||||
|
paddw mm2,[GOTOFF(ebx,PW_SEVEN)]
|
||||||
|
|
||||||
|
paddw mm1,mm7
|
||||||
|
paddw mm5,mm3
|
||||||
|
psrlw mm1,4 ; mm1=Out0LE=( 0 2 4 6)
|
||||||
|
psrlw mm5,4 ; mm5=Out0HE=( 8 10 12 14)
|
||||||
|
paddw mm0,mm7
|
||||||
|
paddw mm2,mm3
|
||||||
|
psrlw mm0,4 ; mm0=Out0LO=( 1 3 5 7)
|
||||||
|
psrlw mm2,4 ; mm2=Out0HO=( 9 11 13 15)
|
||||||
|
|
||||||
|
psllw mm0,BYTE_BIT
|
||||||
|
psllw mm2,BYTE_BIT
|
||||||
|
por mm1,mm0 ; mm1=Out0L=( 0 1 2 3 4 5 6 7)
|
||||||
|
por mm5,mm2 ; mm5=Out0H=( 8 9 10 11 12 13 14 15)
|
||||||
|
|
||||||
|
movq MMWORD [edx+0*SIZEOF_MMWORD], mm1
|
||||||
|
movq MMWORD [edx+1*SIZEOF_MMWORD], mm5
|
||||||
|
|
||||||
|
; -- process the lower row
|
||||||
|
|
||||||
|
movq mm6, MMWORD [edi+0*SIZEOF_MMWORD] ; mm6=Int1L=( 0 1 2 3)
|
||||||
|
movq mm4, MMWORD [edi+1*SIZEOF_MMWORD] ; mm4=Int1H=( 4 5 6 7)
|
||||||
|
|
||||||
|
movq mm7,mm6
|
||||||
|
movq mm3,mm4
|
||||||
|
psrlq mm7,2*BYTE_BIT ; mm7=( 1 2 3 -)
|
||||||
|
psllq mm3,(SIZEOF_MMWORD-2)*BYTE_BIT ; mm3=( - - - 4)
|
||||||
|
movq mm0,mm6
|
||||||
|
movq mm2,mm4
|
||||||
|
psrlq mm0,(SIZEOF_MMWORD-2)*BYTE_BIT ; mm0=( 3 - - -)
|
||||||
|
psllq mm2,2*BYTE_BIT ; mm2=( - 4 5 6)
|
||||||
|
|
||||||
|
por mm7,mm3 ; mm7=( 1 2 3 4)
|
||||||
|
por mm0,mm2 ; mm0=( 3 4 5 6)
|
||||||
|
|
||||||
|
movq mm1,mm6
|
||||||
|
movq mm5,mm4
|
||||||
|
psllq mm1,2*BYTE_BIT ; mm1=( - 0 1 2)
|
||||||
|
psrlq mm5,2*BYTE_BIT ; mm5=( 5 6 7 -)
|
||||||
|
movq mm3,mm4
|
||||||
|
psrlq mm3,(SIZEOF_MMWORD-2)*BYTE_BIT ; mm3=( 7 - - -)
|
||||||
|
|
||||||
|
por mm1, MMWORD [wk(1)] ; mm1=(-1 0 1 2)
|
||||||
|
por mm5, MMWORD [wk(3)] ; mm5=( 5 6 7 8)
|
||||||
|
|
||||||
|
movq MMWORD [wk(1)], mm3
|
||||||
|
|
||||||
|
pmullw mm6,[GOTOFF(ebx,PW_THREE)]
|
||||||
|
pmullw mm4,[GOTOFF(ebx,PW_THREE)]
|
||||||
|
paddw mm1,[GOTOFF(ebx,PW_EIGHT)]
|
||||||
|
paddw mm0,[GOTOFF(ebx,PW_EIGHT)]
|
||||||
|
paddw mm7,[GOTOFF(ebx,PW_SEVEN)]
|
||||||
|
paddw mm5,[GOTOFF(ebx,PW_SEVEN)]
|
||||||
|
|
||||||
|
paddw mm1,mm6
|
||||||
|
paddw mm0,mm4
|
||||||
|
psrlw mm1,4 ; mm1=Out1LE=( 0 2 4 6)
|
||||||
|
psrlw mm0,4 ; mm0=Out1HE=( 8 10 12 14)
|
||||||
|
paddw mm7,mm6
|
||||||
|
paddw mm5,mm4
|
||||||
|
psrlw mm7,4 ; mm7=Out1LO=( 1 3 5 7)
|
||||||
|
psrlw mm5,4 ; mm5=Out1HO=( 9 11 13 15)
|
||||||
|
|
||||||
|
psllw mm7,BYTE_BIT
|
||||||
|
psllw mm5,BYTE_BIT
|
||||||
|
por mm1,mm7 ; mm1=Out1L=( 0 1 2 3 4 5 6 7)
|
||||||
|
por mm0,mm5 ; mm0=Out1H=( 8 9 10 11 12 13 14 15)
|
||||||
|
|
||||||
|
movq MMWORD [edi+0*SIZEOF_MMWORD], mm1
|
||||||
|
movq MMWORD [edi+1*SIZEOF_MMWORD], mm0
|
||||||
|
|
||||||
|
poppic ebx
|
||||||
|
|
||||||
|
sub eax, byte SIZEOF_MMWORD
|
||||||
|
add ecx, byte 1*SIZEOF_MMWORD ; inptr1(above)
|
||||||
|
add ebx, byte 1*SIZEOF_MMWORD ; inptr0
|
||||||
|
add esi, byte 1*SIZEOF_MMWORD ; inptr1(below)
|
||||||
|
add edx, byte 2*SIZEOF_MMWORD ; outptr0
|
||||||
|
add edi, byte 2*SIZEOF_MMWORD ; outptr1
|
||||||
|
cmp eax, byte SIZEOF_MMWORD
|
||||||
|
ja near .columnloop
|
||||||
|
test eax,eax
|
||||||
|
jnz near .columnloop_last
|
||||||
|
|
||||||
|
pop esi
|
||||||
|
pop edi
|
||||||
|
pop ecx
|
||||||
|
pop eax
|
||||||
|
|
||||||
|
add esi, byte 1*SIZEOF_JSAMPROW ; input_data
|
||||||
|
add edi, byte 2*SIZEOF_JSAMPROW ; output_data
|
||||||
|
sub ecx, byte 2 ; rowctr
|
||||||
|
jg near .rowloop
|
||||||
|
|
||||||
|
emms ; empty MMX state
|
||||||
|
|
||||||
|
.return:
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
mov esp,ebp ; esp <- aligned ebp
|
||||||
|
pop esp ; esp <- original ebp
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
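The h2v2 routine above works in two passes: a vertical pass that forms 16-bit intermediates `3*row[0] + row[-1]` (or `row[+1]` for the lower output row), then a horizontal pass with rounding constants `PW_EIGHT` and `PW_SEVEN` and a shift by 4. A scalar C sketch of one output row under that reading follows; the name `h2v2_fancy_row_ref` and the parameter `nbr` (the nearer neighbouring input row) are hypothetical, and the edge replication is inferred from how `wk(0)` through `wk(3)` are filled:

```c
#include <assert.h>
#include <stddef.h>

/* cur: current input row; nbr: row above (for outptr0) or below (for outptr1);
 * out: 2*width samples of one output row. */
static void h2v2_fancy_row_ref(const unsigned char *cur,
                               const unsigned char *nbr,
                               size_t width, unsigned char *out)
{
  size_t i;

  assert(width > 0);
  for (i = 0; i < width; i++) {
    /* Vertical pass: intermediate values, at most 4*255, fit in 16 bits. */
    int c = 3 * cur[i] + nbr[i];
    int l = (i > 0)         ? 3 * cur[i - 1] + nbr[i - 1] : c;
    int r = (i + 1 < width) ? 3 * cur[i + 1] + nbr[i + 1] : c;

    /* Horizontal pass: 3/4 of the nearer column, 1/4 of the farther one. */
    out[2 * i]     = (unsigned char) ((3 * c + l + 8) >> 4);
    out[2 * i + 1] = (unsigned char) ((3 * c + r + 7) >> 4);
  }
}
```

Calling it once with the row above and once with the row below yields the two output rows that the assembly stores through `outptr0` and `outptr1`.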
|
%ifdef UPSAMPLE_H1V2_SUPPORTED
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
;
|
||||||
|
; Fancy processing for the common case of 1:1 horizontal and 2:1 vertical.
|
||||||
|
; Again a triangle filter; see comments for h2v1 case, above.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_h1v2_fancy_upsample_mmx (j_decompress_ptr cinfo,
|
||||||
|
; jpeg_component_info * compptr,
|
||||||
|
; JSAMPARRAY input_data,
|
||||||
|
; JSAMPARRAY * output_data_ptr);
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define input_data(b) (b)+16 ; JSAMPARRAY input_data
|
||||||
|
%define output_data_ptr(b) (b)+20 ; JSAMPARRAY * output_data_ptr
|
||||||
|
|
||||||
|
%define gotptr ebp-SIZEOF_POINTER ; void * gotptr
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_h1v2_fancy_upsample_mmx)
|
||||||
|
|
||||||
|
EXTN(jpeg_h1v2_fancy_upsample_mmx):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
pushpic eax ; make room for the GOT address
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
movpic POINTER [gotptr], ebx ; save GOT address
|
||||||
|
|
||||||
|
mov eax, POINTER [compptr(ebp)]
|
||||||
|
mov eax, JDIMENSION [jcompinfo_downsampled_width(eax)] ; colctr
|
||||||
|
add eax, byte SIZEOF_MMWORD-1
|
||||||
|
and eax, byte -SIZEOF_MMWORD
|
||||||
|
jz near .return
|
||||||
|
|
||||||
|
mov ecx, POINTER [cinfo(ebp)]
|
||||||
|
mov ecx, INT [jdstruct_max_v_samp_factor(ecx)] ; rowctr
|
||||||
|
test ecx,ecx
|
||||||
|
jz near .return
|
||||||
|
|
||||||
|
mov esi, JSAMPARRAY [input_data(ebp)] ; input_data
|
||||||
|
mov edi, POINTER [output_data_ptr(ebp)]
|
||||||
|
mov edi, JSAMPARRAY [edi] ; output_data
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
push eax ; colctr
|
||||||
|
push ecx
|
||||||
|
push edi
|
||||||
|
push esi
|
||||||
|
|
||||||
|
mov ecx, JSAMPROW [esi-1*SIZEOF_JSAMPROW] ; inptr1(above)
|
||||||
|
mov ebx, JSAMPROW [esi+0*SIZEOF_JSAMPROW] ; inptr0
|
||||||
|
mov esi, JSAMPROW [esi+1*SIZEOF_JSAMPROW] ; inptr1(below)
|
||||||
|
mov edx, JSAMPROW [edi+0*SIZEOF_JSAMPROW] ; outptr0
|
||||||
|
mov edi, JSAMPROW [edi+1*SIZEOF_JSAMPROW] ; outptr1
|
||||||
|
|
||||||
|
pxor mm0,mm0 ; mm0=(all 0's)
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.columnloop:
|
||||||
|
movq mm1, MMWORD [ebx] ; mm1=row[ 0]( 0 1 2 3 4 5 6 7)
|
||||||
|
movq mm2, MMWORD [ecx] ; mm2=row[-1]( 0 1 2 3 4 5 6 7)
|
||||||
|
movq mm3, MMWORD [esi] ; mm3=row[+1]( 0 1 2 3 4 5 6 7)
|
||||||
|
|
||||||
|
pushpic ebx
|
||||||
|
movpic ebx, POINTER [gotptr] ; load GOT address
|
||||||
|
|
||||||
|
movq mm4,mm1
|
||||||
|
punpcklbw mm1,mm0 ; mm1=row[ 0]( 0 1 2 3)
|
||||||
|
punpckhbw mm4,mm0 ; mm4=row[ 0]( 4 5 6 7)
|
||||||
|
movq mm5,mm2
|
||||||
|
punpcklbw mm2,mm0 ; mm2=row[-1]( 0 1 2 3)
|
||||||
|
punpckhbw mm5,mm0 ; mm5=row[-1]( 4 5 6 7)
|
||||||
|
movq mm6,mm3
|
||||||
|
punpcklbw mm3,mm0 ; mm3=row[+1]( 0 1 2 3)
|
||||||
|
punpckhbw mm6,mm0 ; mm6=row[+1]( 4 5 6 7)
|
||||||
|
|
||||||
|
pmullw mm1,[GOTOFF(ebx,PW_THREE)]
|
||||||
|
pmullw mm4,[GOTOFF(ebx,PW_THREE)]
|
||||||
|
paddw mm2,[GOTOFF(ebx,PW_ONE)]
|
||||||
|
paddw mm5,[GOTOFF(ebx,PW_ONE)]
|
||||||
|
paddw mm3,[GOTOFF(ebx,PW_TWO)]
|
||||||
|
paddw mm6,[GOTOFF(ebx,PW_TWO)]
|
||||||
|
|
||||||
|
paddw mm2,mm1
|
||||||
|
paddw mm5,mm4
|
||||||
|
psrlw mm2,2 ; mm2=Out0L=( 0 1 2 3)
|
||||||
|
psrlw mm5,2 ; mm5=Out0H=( 4 5 6 7)
|
||||||
|
paddw mm3,mm1
|
||||||
|
paddw mm6,mm4
|
||||||
|
psrlw mm3,2 ; mm3=Out1L=( 0 1 2 3)
|
||||||
|
psrlw mm6,2 ; mm6=Out1H=( 4 5 6 7)
|
||||||
|
|
||||||
|
packuswb mm2,mm5 ; mm2=Out0=( 0 1 2 3 4 5 6 7)
|
||||||
|
packuswb mm3,mm6 ; mm3=Out1=( 0 1 2 3 4 5 6 7)
|
||||||
|
|
||||||
|
movq MMWORD [edx], mm2
|
||||||
|
movq MMWORD [edi], mm3
|
||||||
|
|
||||||
|
poppic ebx
|
||||||
|
|
||||||
|
add ecx, byte 1*SIZEOF_MMWORD ; inptr1(above)
|
||||||
|
add ebx, byte 1*SIZEOF_MMWORD ; inptr0
|
||||||
|
add esi, byte 1*SIZEOF_MMWORD ; inptr1(below)
|
||||||
|
add edx, byte 1*SIZEOF_MMWORD ; outptr0
|
||||||
|
add edi, byte 1*SIZEOF_MMWORD ; outptr1
|
||||||
|
sub eax, byte SIZEOF_MMWORD
|
||||||
|
jnz near .columnloop
|
||||||
|
|
||||||
|
pop esi
|
||||||
|
pop edi
|
||||||
|
pop ecx
|
||||||
|
pop eax
|
||||||
|
|
||||||
|
add esi, byte 1*SIZEOF_JSAMPROW ; input_data
|
||||||
|
add edi, byte 2*SIZEOF_JSAMPROW ; output_data
|
||||||
|
sub ecx, byte 2 ; rowctr
|
||||||
|
jg near .rowloop
|
||||||
|
|
||||||
|
emms ; empty MMX state
|
||||||
|
|
||||||
|
.return:
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
poppic eax ; remove gotptr
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; UPSAMPLE_H1V2_SUPPORTED
|
||||||
|
%endif ; JDSAMPLE_FANCY_MMX_SUPPORTED
|
||||||
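The h1v2 routine above needs no horizontal neighbours, which is why it can use a single `.columnloop` and `packuswb` instead of the shift-and-merge steps of the h2v1 and h2v2 cases. A scalar C sketch of what it appears to compute, using the constants seen in the assembly (`PW_THREE`, `PW_ONE`, `PW_TWO`, shift by 2); the name `h1v2_fancy_ref` is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* above/cur/below: three input rows of `width` samples each;
 * out0/out1: the two output rows produced from `cur`. */
static void h1v2_fancy_ref(const unsigned char *above,
                           const unsigned char *cur,
                           const unsigned char *below,
                           size_t width,
                           unsigned char *out0, unsigned char *out1)
{
  size_t i;

  assert(width > 0);
  for (i = 0; i < width; i++) {
    out0[i] = (unsigned char) ((3 * cur[i] + above[i] + 1) >> 2);  /* upper row */
    out1[i] = (unsigned char) ((3 * cur[i] + below[i] + 2) >> 2);  /* lower row */
  }
}
```

Note that the assembly rounds the column count up to a multiple of `SIZEOF_MMWORD`, so it may process a few samples beyond `downsampled_width`; the sketch stops exactly at `width`.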
|
|
||||||
|
%ifdef JDSAMPLE_SIMPLE_MMX_SUPPORTED
|
||||||
|
|
||||||
|
%ifndef JDSAMPLE_FANCY_MMX_SUPPORTED
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
%endif
|
||||||
|
;
|
||||||
|
; Fast processing for the common case of 2:1 horizontal and 1:1 vertical.
|
||||||
|
; It's still a box filter.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_h2v1_upsample_mmx (j_decompress_ptr cinfo,
|
||||||
|
; jpeg_component_info * compptr,
|
||||||
|
; JSAMPARRAY input_data,
|
||||||
|
; JSAMPARRAY * output_data_ptr);
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define input_data(b) (b)+16 ; JSAMPARRAY input_data
|
||||||
|
%define output_data_ptr(b) (b)+20 ; JSAMPARRAY * output_data_ptr
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_h2v1_upsample_mmx)
|
||||||
|
|
||||||
|
EXTN(jpeg_h2v1_upsample_mmx):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
; push ebx ; unused
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
mov edx, POINTER [cinfo(ebp)]
|
||||||
|
mov edx, JDIMENSION [jdstruct_output_width(edx)]
|
||||||
|
add edx, byte (2*SIZEOF_MMWORD)-1
|
||||||
|
and edx, byte -(2*SIZEOF_MMWORD)
|
||||||
|
jz short .return
|
||||||
|
|
||||||
|
mov ecx, POINTER [cinfo(ebp)]
|
||||||
|
mov ecx, INT [jdstruct_max_v_samp_factor(ecx)] ; rowctr
|
||||||
|
test ecx,ecx
|
||||||
|
jz short .return
|
||||||
|
|
||||||
|
mov esi, JSAMPARRAY [input_data(ebp)] ; input_data
|
||||||
|
mov edi, POINTER [output_data_ptr(ebp)]
|
||||||
|
mov edi, JSAMPARRAY [edi] ; output_data
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
push edi
|
||||||
|
push esi
|
||||||
|
|
||||||
|
mov esi, JSAMPROW [esi] ; inptr
|
||||||
|
mov edi, JSAMPROW [edi] ; outptr
|
||||||
|
mov eax,edx ; colctr
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
|
||||||
|
movq mm0, MMWORD [esi+0*SIZEOF_MMWORD]
|
||||||
|
|
||||||
|
movq mm1,mm0
|
||||||
|
punpcklbw mm0,mm0
|
||||||
|
punpckhbw mm1,mm1
|
||||||
|
|
||||||
|
movq MMWORD [edi+0*SIZEOF_MMWORD], mm0
|
||||||
|
movq MMWORD [edi+1*SIZEOF_MMWORD], mm1
|
||||||
|
|
||||||
|
sub eax, byte 2*SIZEOF_MMWORD
|
||||||
|
jz short .nextrow
|
||||||
|
|
||||||
|
movq mm2, MMWORD [esi+1*SIZEOF_MMWORD]
|
||||||
|
|
||||||
|
movq mm3,mm2
|
||||||
|
punpcklbw mm2,mm2
|
||||||
|
punpckhbw mm3,mm3
|
||||||
|
|
||||||
|
movq MMWORD [edi+2*SIZEOF_MMWORD], mm2
|
||||||
|
movq MMWORD [edi+3*SIZEOF_MMWORD], mm3
|
||||||
|
|
||||||
|
sub eax, byte 2*SIZEOF_MMWORD
|
||||||
|
jz short .nextrow
|
||||||
|
|
||||||
|
add esi, byte 2*SIZEOF_MMWORD ; inptr
|
||||||
|
add edi, byte 4*SIZEOF_MMWORD ; outptr
|
||||||
|
jmp short .columnloop
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.nextrow:
|
||||||
|
pop esi
|
||||||
|
pop edi
|
||||||
|
|
||||||
|
add esi, byte SIZEOF_JSAMPROW ; input_data
|
||||||
|
add edi, byte SIZEOF_JSAMPROW ; output_data
|
||||||
|
dec ecx ; rowctr
|
||||||
|
jg short .rowloop
|
||||||
|
|
||||||
|
emms ; empty MMX state
|
||||||
|
|
||||||
|
.return:
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
; pop ebx ; unused
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
;
|
||||||
|
; Fast processing for the common case of 2:1 horizontal and 2:1 vertical.
|
||||||
|
; It's still a box filter.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_h2v2_upsample_mmx (j_decompress_ptr cinfo,
|
||||||
|
; jpeg_component_info * compptr,
|
||||||
|
; JSAMPARRAY input_data,
|
||||||
|
; JSAMPARRAY * output_data_ptr);
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define input_data(b) (b)+16 ; JSAMPARRAY input_data
|
||||||
|
%define output_data_ptr(b) (b)+20 ; JSAMPARRAY * output_data_ptr
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_h2v2_upsample_mmx)
|
||||||
|
|
||||||
|
EXTN(jpeg_h2v2_upsample_mmx):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
mov edx, POINTER [cinfo(ebp)]
|
||||||
|
mov edx, JDIMENSION [jdstruct_output_width(edx)]
|
||||||
|
add edx, byte (2*SIZEOF_MMWORD)-1
|
||||||
|
and edx, byte -(2*SIZEOF_MMWORD)
|
||||||
|
jz near .return
|
||||||
|
|
||||||
|
mov ecx, POINTER [cinfo(ebp)]
|
||||||
|
mov ecx, INT [jdstruct_max_v_samp_factor(ecx)] ; rowctr
|
||||||
|
test ecx,ecx
|
||||||
|
jz short .return
|
||||||
|
|
||||||
|
mov esi, JSAMPARRAY [input_data(ebp)] ; input_data
|
||||||
|
mov edi, POINTER [output_data_ptr(ebp)]
|
||||||
|
mov edi, JSAMPARRAY [edi] ; output_data
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
push edi
|
||||||
|
push esi
|
||||||
|
|
||||||
|
mov esi, JSAMPROW [esi] ; inptr
|
||||||
|
mov ebx, JSAMPROW [edi+0*SIZEOF_JSAMPROW] ; outptr0
|
||||||
|
mov edi, JSAMPROW [edi+1*SIZEOF_JSAMPROW] ; outptr1
|
||||||
|
mov eax,edx ; colctr
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
|
||||||
|
movq mm0, MMWORD [esi+0*SIZEOF_MMWORD]
|
||||||
|
|
||||||
|
movq mm1,mm0
|
||||||
|
punpcklbw mm0,mm0
|
||||||
|
punpckhbw mm1,mm1
|
||||||
|
|
||||||
|
movq MMWORD [ebx+0*SIZEOF_MMWORD], mm0
|
||||||
|
movq MMWORD [ebx+1*SIZEOF_MMWORD], mm1
|
||||||
|
movq MMWORD [edi+0*SIZEOF_MMWORD], mm0
|
||||||
|
movq MMWORD [edi+1*SIZEOF_MMWORD], mm1
|
||||||
|
|
||||||
|
sub eax, byte 2*SIZEOF_MMWORD
|
||||||
|
jz short .nextrow
|
||||||
|
|
||||||
|
movq mm2, MMWORD [esi+1*SIZEOF_MMWORD]
|
||||||
|
|
||||||
|
movq mm3,mm2
|
||||||
|
punpcklbw mm2,mm2
|
||||||
|
punpckhbw mm3,mm3
|
||||||
|
|
||||||
|
movq MMWORD [ebx+2*SIZEOF_MMWORD], mm2
|
||||||
|
movq MMWORD [ebx+3*SIZEOF_MMWORD], mm3
|
||||||
|
movq MMWORD [edi+2*SIZEOF_MMWORD], mm2
|
||||||
|
movq MMWORD [edi+3*SIZEOF_MMWORD], mm3
|
||||||
|
|
||||||
|
sub eax, byte 2*SIZEOF_MMWORD
|
||||||
|
jz short .nextrow
|
||||||
|
|
||||||
|
add esi, byte 2*SIZEOF_MMWORD ; inptr
|
||||||
|
add ebx, byte 4*SIZEOF_MMWORD ; outptr0
|
||||||
|
add edi, byte 4*SIZEOF_MMWORD ; outptr1
|
||||||
|
jmp short .columnloop
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.nextrow:
|
||||||
|
pop esi
|
||||||
|
pop edi
|
||||||
|
|
||||||
|
add esi, byte 1*SIZEOF_JSAMPROW ; input_data
|
||||||
|
add edi, byte 2*SIZEOF_JSAMPROW ; output_data
|
||||||
|
sub ecx, byte 2 ; rowctr
|
||||||
|
jg short .rowloop
|
||||||
|
|
||||||
|
emms ; empty MMX state
|
||||||
|
|
||||||
|
.return:
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; JDSAMPLE_SIMPLE_MMX_SUPPORTED
|
||||||
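The two "simple" routines above implement the box filter by interleaving a register with itself (`punpcklbw mm0,mm0` and `punpckhbw mm1,mm1`), which duplicates every input byte. A scalar C sketch of that behaviour and of the column-count rounding done with `add`/`and` on `edx`; the names `round_up_16` and `h2v1_box_ref` are hypothetical, and the value 16 assumes `2*SIZEOF_MMWORD` with an 8-byte MMX word:

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors: add edx, (2*SIZEOF_MMWORD)-1 ; and edx, -(2*SIZEOF_MMWORD) */
static size_t round_up_16(size_t n)
{
  return (n + 15) & ~(size_t) 15;
}

/* Box filter: every input sample is written twice. */
static void h2v1_box_ref(const unsigned char *in, size_t out_width,
                         unsigned char *out)
{
  size_t i;

  assert(out_width % 2 == 0);
  for (i = 0; i < out_width; i++)
    out[i] = in[i / 2];
}
```

Because the assembly loops over the rounded-up count, it can write up to 15 bytes past `output_width`; it presumably relies on the row buffers being padded accordingly, whereas the sketch writes exactly `out_width` samples. The h2v2 variant stores the same duplicated bytes to two output rows.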
200
jdsample.c
@@ -5,6 +5,13 @@
|
|||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
* x86 SIMD extension for IJG JPEG library
|
||||||
|
* Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
* This file has been modified for SIMD extension.
|
||||||
|
* Last Modified : January 5, 2006
|
||||||
|
* ---------------------------------------------------------------------
|
||||||
|
*
|
||||||
* This file contains upsampling routines.
|
* This file contains upsampling routines.
|
||||||
*
|
*
|
||||||
* Upsampling input data is counted in "row groups". A row group
|
* Upsampling input data is counted in "row groups". A row group
|
||||||
@@ -21,6 +28,7 @@
|
|||||||
#define JPEG_INTERNALS
|
#define JPEG_INTERNALS
|
||||||
#include "jinclude.h"
|
#include "jinclude.h"
|
||||||
#include "jpeglib.h"
|
#include "jpeglib.h"
|
||||||
|
#include "jcolsamp.h" /* Private declarations */
|
||||||
|
|
||||||
|
|
||||||
/* Pointer to routine to upsample a single component */
|
/* Pointer to routine to upsample a single component */
|
||||||
@@ -285,6 +293,37 @@ h2v2_upsample (j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
#ifdef UPSAMPLE_H1V2_SUPPORTED
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Fast processing for the common case of 1:1 horizontal and 2:1 vertical.
|
||||||
|
* It's still a box filter.
|
||||||
|
*
|
||||||
|
* SIMD Ext: This routine is for files that are rotated or transposed
|
||||||
|
* by jpegtran.
|
||||||
|
*/
|
||||||
|
|
||||||
|
METHODDEF(void)
|
||||||
|
h1v2_upsample (j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
|
JSAMPARRAY input_data, JSAMPARRAY * output_data_ptr)
|
||||||
|
{
|
||||||
|
JSAMPARRAY output_data = *output_data_ptr;
|
||||||
|
int inrow, outrow;
|
||||||
|
|
||||||
|
inrow = outrow = 0;
|
||||||
|
while (outrow < cinfo->max_v_samp_factor) {
|
||||||
|
jcopy_sample_rows(input_data, inrow, output_data, outrow,
|
||||||
|
1, cinfo->output_width);
|
||||||
|
jcopy_sample_rows(input_data, inrow, output_data, outrow+1,
|
||||||
|
1, cinfo->output_width);
|
||||||
|
inrow++;
|
||||||
|
outrow += 2;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#endif /* UPSAMPLE_H1V2_SUPPORTED */
|
||||||
|
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Fancy processing for the common case of 2:1 horizontal and 1:1 vertical.
|
* Fancy processing for the common case of 2:1 horizontal and 1:1 vertical.
|
||||||
*
|
*
|
||||||
@@ -391,6 +430,52 @@ h2v2_fancy_upsample (j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
#ifdef UPSAMPLE_H1V2_SUPPORTED
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Fancy processing for the common case of 1:1 horizontal and 2:1 vertical.
|
||||||
|
* Again a triangle filter; see comments for h2v1 case, above.
|
||||||
|
*
|
||||||
|
* It is OK for us to reference the adjacent input rows because we demanded
|
||||||
|
* context from the main buffer controller (see initialization code).
|
||||||
|
*
|
||||||
|
* SIMD Ext: This routine is for files that are rotated or transposed
|
||||||
|
* by jpegtran.
|
||||||
|
*/
|
||||||
|
|
||||||
|
METHODDEF(void)
|
||||||
|
h1v2_fancy_upsample (j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
|
JSAMPARRAY input_data, JSAMPARRAY * output_data_ptr)
|
||||||
|
{
|
||||||
|
JSAMPARRAY output_data = *output_data_ptr;
|
||||||
|
register JSAMPROW inptr0, inptr1, outptr;
|
||||||
|
register int colsum;
|
||||||
|
register JDIMENSION colctr;
|
||||||
|
int inrow, outrow, v;
|
||||||
|
|
||||||
|
inrow = outrow = 0;
|
||||||
|
while (outrow < cinfo->max_v_samp_factor) {
|
||||||
|
for (v = 0; v < 2; v++) {
|
||||||
|
/* inptr0 points to nearest input row, inptr1 points to next nearest */
|
||||||
|
inptr0 = input_data[inrow];
|
||||||
|
if (v == 0) /* next nearest is row above */
|
||||||
|
inptr1 = input_data[inrow-1];
|
||||||
|
else /* next nearest is row below */
|
||||||
|
inptr1 = input_data[inrow+1];
|
||||||
|
outptr = output_data[outrow++];
|
||||||
|
|
||||||
|
for (colctr = compptr->downsampled_width; colctr > 0; colctr--) {
|
||||||
|
colsum = GETJSAMPLE(*inptr0++) * 3 + GETJSAMPLE(*inptr1++);
|
||||||
|
*outptr++ = (JSAMPLE) ((colsum + v + 1) >> 2);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
inrow++;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#endif /* UPSAMPLE_H1V2_SUPPORTED */
|
||||||
|
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Module initialization routine for upsampling.
|
* Module initialization routine for upsampling.
|
||||||
*/
|
*/
|
||||||
@@ -403,6 +488,7 @@ jinit_upsampler (j_decompress_ptr cinfo)
|
|||||||
jpeg_component_info * compptr;
|
jpeg_component_info * compptr;
|
||||||
boolean need_buffer, do_fancy;
|
boolean need_buffer, do_fancy;
|
||||||
int h_in_group, v_in_group, h_out_group, v_out_group;
|
int h_in_group, v_in_group, h_out_group, v_out_group;
|
||||||
|
unsigned int simd = jpeg_simd_support((j_common_ptr) cinfo);
|
||||||
|
|
||||||
upsample = (my_upsample_ptr)
|
upsample = (my_upsample_ptr)
|
||||||
(*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,
|
(*cinfo->mem->alloc_small) ((j_common_ptr) cinfo, JPOOL_IMAGE,
|
||||||
@@ -447,18 +533,83 @@ jinit_upsampler (j_decompress_ptr cinfo)
|
|||||||
} else if (h_in_group * 2 == h_out_group &&
|
} else if (h_in_group * 2 == h_out_group &&
|
||||||
v_in_group == v_out_group) {
|
v_in_group == v_out_group) {
|
||||||
/* Special cases for 2h1v upsampling */
|
/* Special cases for 2h1v upsampling */
|
||||||
if (do_fancy && compptr->downsampled_width > 2)
|
if (do_fancy && compptr->downsampled_width > 2) {
|
||||||
upsample->methods[ci] = h2v1_fancy_upsample;
|
#ifdef JDSAMPLE_FANCY_SSE2_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE2 &&
|
||||||
|
IS_CONST_ALIGNED_16(jconst_fancy_upsample_sse2))
|
||||||
|
upsample->methods[ci] = jpeg_h2v1_fancy_upsample_sse2;
|
||||||
else
|
else
|
||||||
|
#endif
|
||||||
|
#ifdef JDSAMPLE_FANCY_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_MMX)
|
||||||
|
upsample->methods[ci] = jpeg_h2v1_fancy_upsample_mmx;
|
||||||
|
else
|
||||||
|
#endif
|
||||||
|
upsample->methods[ci] = h2v1_fancy_upsample;
|
||||||
|
} else {
|
||||||
|
#ifdef JDSAMPLE_SIMPLE_SSE2_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE2)
|
||||||
|
upsample->methods[ci] = jpeg_h2v1_upsample_sse2;
|
||||||
|
else
|
||||||
|
#endif
|
||||||
|
#ifdef JDSAMPLE_SIMPLE_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_MMX)
|
||||||
|
upsample->methods[ci] = jpeg_h2v1_upsample_mmx;
|
||||||
|
else
|
||||||
|
#endif
|
||||||
upsample->methods[ci] = h2v1_upsample;
|
upsample->methods[ci] = h2v1_upsample;
|
||||||
|
}
|
||||||
} else if (h_in_group * 2 == h_out_group &&
|
} else if (h_in_group * 2 == h_out_group &&
|
||||||
v_in_group * 2 == v_out_group) {
|
v_in_group * 2 == v_out_group) {
|
||||||
/* Special cases for 2h2v upsampling */
|
/* Special cases for 2h2v upsampling */
|
||||||
if (do_fancy && compptr->downsampled_width > 2) {
|
if (do_fancy && compptr->downsampled_width > 2) {
|
||||||
|
#ifdef JDSAMPLE_FANCY_SSE2_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE2 &&
|
||||||
|
IS_CONST_ALIGNED_16(jconst_fancy_upsample_sse2))
|
||||||
|
upsample->methods[ci] = jpeg_h2v2_fancy_upsample_sse2;
|
||||||
|
else
|
||||||
|
#endif
|
||||||
|
#ifdef JDSAMPLE_FANCY_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_MMX)
|
||||||
|
upsample->methods[ci] = jpeg_h2v2_fancy_upsample_mmx;
|
||||||
|
else
|
||||||
|
#endif
|
||||||
upsample->methods[ci] = h2v2_fancy_upsample;
|
upsample->methods[ci] = h2v2_fancy_upsample;
|
||||||
upsample->pub.need_context_rows = TRUE;
|
upsample->pub.need_context_rows = TRUE;
|
||||||
} else
|
} else {
|
||||||
|
#ifdef JDSAMPLE_SIMPLE_SSE2_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE2)
|
||||||
|
upsample->methods[ci] = jpeg_h2v2_upsample_sse2;
|
||||||
|
else
|
||||||
|
#endif
|
||||||
|
#ifdef JDSAMPLE_SIMPLE_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_MMX)
|
||||||
|
upsample->methods[ci] = jpeg_h2v2_upsample_mmx;
|
||||||
|
else
|
||||||
|
#endif
|
||||||
upsample->methods[ci] = h2v2_upsample;
|
upsample->methods[ci] = h2v2_upsample;
|
||||||
|
}
|
||||||
|
#ifdef UPSAMPLE_H1V2_SUPPORTED
|
||||||
|
} else if (h_in_group == h_out_group &&
|
||||||
|
v_in_group * 2 == v_out_group) {
|
||||||
|
/* Special cases for 1h2v upsampling */
|
||||||
|
if (do_fancy) {
|
||||||
|
#ifdef JDSAMPLE_FANCY_SSE2_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE2 &&
|
||||||
|
IS_CONST_ALIGNED_16(jconst_fancy_upsample_sse2))
|
||||||
|
upsample->methods[ci] = jpeg_h1v2_fancy_upsample_sse2;
|
||||||
|
else
|
||||||
|
#endif
|
||||||
|
#ifdef JDSAMPLE_FANCY_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_MMX)
|
||||||
|
upsample->methods[ci] = jpeg_h1v2_fancy_upsample_mmx;
|
||||||
|
else
|
||||||
|
#endif
|
||||||
|
upsample->methods[ci] = h1v2_fancy_upsample;
|
||||||
|
upsample->pub.need_context_rows = TRUE;
|
||||||
|
} else
|
||||||
|
upsample->methods[ci] = h1v2_upsample;
|
||||||
|
#endif /* UPSAMPLE_H1V2_SUPPORTED */
|
||||||
} else if ((h_out_group % h_in_group) == 0 &&
|
} else if ((h_out_group % h_in_group) == 0 &&
|
||||||
(v_out_group % v_in_group) == 0) {
|
(v_out_group % v_in_group) == 0) {
|
||||||
/* Generic integral-factors upsampling method */
|
/* Generic integral-factors upsampling method */
|
||||||
@@ -468,11 +619,52 @@ jinit_upsampler (j_decompress_ptr cinfo)
|
|||||||
} else
|
} else
|
||||||
ERREXIT(cinfo, JERR_FRACT_SAMPLE_NOTIMPL);
|
ERREXIT(cinfo, JERR_FRACT_SAMPLE_NOTIMPL);
|
||||||
if (need_buffer) {
|
if (need_buffer) {
|
||||||
|
enum { SIZEOF_XMMWORD = 16 }; /* from jsimdext.inc */
|
||||||
upsample->color_buf[ci] = (*cinfo->mem->alloc_sarray)
|
upsample->color_buf[ci] = (*cinfo->mem->alloc_sarray)
|
||||||
((j_common_ptr) cinfo, JPOOL_IMAGE,
|
((j_common_ptr) cinfo, JPOOL_IMAGE,
|
||||||
(JDIMENSION) jround_up((long) cinfo->output_width,
|
(JDIMENSION) jround_up(jround_up((long) cinfo->output_width,
|
||||||
(long) cinfo->max_h_samp_factor),
|
(long) cinfo->max_h_samp_factor),
|
||||||
|
(long) (2 * SIZEOF_XMMWORD)),
|
||||||
(JDIMENSION) cinfo->max_v_samp_factor);
|
(JDIMENSION) cinfo->max_v_samp_factor);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
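The widened `color_buf` allocation above rounds the row width twice: first up to a multiple of `max_h_samp_factor`, then up to a multiple of `2 * SIZEOF_XMMWORD` (32 bytes). This is presumably so the SSE2 routines, which store two XMM words per step, never write past the end of a row. The following is a minimal sketch of that arithmetic only. `round_up` and `padded_row_width` are hypothetical names introduced for illustration, and `round_up` is assumed to behave like `jround_up` (smallest multiple of `b` that is not less than `a`).

```c
#include <assert.h>

/* Hypothetical stand-in for jround_up(): smallest multiple of b that is
 * greater than or equal to a.  Assumes a >= 0 and b > 0. */
static long round_up(long a, long b)
{
    assert(a >= 0 && b > 0);
    a += b - 1L;
    return a - (a % b);
}

/* Illustrative only: the padded row width requested for color_buf. */
static long padded_row_width(long output_width, long max_h_samp_factor)
{
    enum { SIZEOF_XMMWORD = 16 };   /* same value the patch hard-codes */

    return round_up(round_up(output_width, max_h_samp_factor),
                    (long) (2 * SIZEOF_XMMWORD));
}
```

With this sketch, an output width of 100 and a horizontal factor of 2 gives a 128-sample row, and a width that is already a multiple of 32 is left unchanged.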
|
|
||||||
|
|
||||||
|
#ifndef JSIMD_MODEINFO_NOT_SUPPORTED
|
||||||
|
|
||||||
|
GLOBAL(unsigned int)
|
||||||
|
jpeg_simd_upsampler (j_decompress_ptr cinfo, int do_fancy)
|
||||||
|
{
|
||||||
|
unsigned int simd = jpeg_simd_support((j_common_ptr) cinfo);
|
||||||
|
|
||||||
|
#ifdef UPSAMPLE_MERGING_SUPPORTED
|
||||||
|
if (!do_fancy)
|
||||||
|
return jpeg_simd_merged_upsampler(cinfo);
|
||||||
|
#endif
|
||||||
|
|
||||||
|
if (do_fancy) {
|
||||||
|
#ifdef JDSAMPLE_FANCY_SSE2_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE2 &&
|
||||||
|
IS_CONST_ALIGNED_16(jconst_fancy_upsample_sse2))
|
||||||
|
return JSIMD_SSE2;
|
||||||
|
#endif
|
||||||
|
#ifdef JDSAMPLE_FANCY_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_MMX)
|
||||||
|
return JSIMD_MMX;
|
||||||
|
#endif
|
||||||
|
} else {
|
||||||
|
#ifdef JDSAMPLE_SIMPLE_SSE2_SUPPORTED
|
||||||
|
if (simd & JSIMD_SSE2)
|
||||||
|
return JSIMD_SSE2;
|
||||||
|
#endif
|
||||||
|
#ifdef JDSAMPLE_SIMPLE_MMX_SUPPORTED
|
||||||
|
if (simd & JSIMD_MMX)
|
||||||
|
return JSIMD_MMX;
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
return JSIMD_NONE;
|
||||||
|
}
|
||||||
|
|
||||||
|
#endif /* !JSIMD_MODEINFO_NOT_SUPPORTED */
|
||||||
|
|||||||
883
jdsamss2.asm
Normal file
@@ -0,0 +1,883 @@
|
|||||||
|
;
|
||||||
|
; jdsamss2.asm - upsampling (SSE2)
|
||||||
|
;
|
||||||
|
; x86 SIMD extension for IJG JPEG library
|
||||||
|
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
; For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
;
|
||||||
|
; This file should be assembled with NASM (Netwide Assembler),
|
||||||
|
; and can *not* be assembled with Microsoft's MASM or any compatible
|
||||||
|
; assembler (including Borland's Turbo Assembler).
|
||||||
|
; NASM is available from http://nasm.sourceforge.net/ or
|
||||||
|
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||||
|
;
|
||||||
|
; Last Modified : February 4, 2006
|
||||||
|
;
|
||||||
|
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jcolsamp.inc"
|
||||||
|
|
||||||
|
%ifdef JDSAMPLE_FANCY_SSE2_SUPPORTED
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_CONST
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
global EXTN(jconst_fancy_upsample_sse2)
|
||||||
|
|
||||||
|
EXTN(jconst_fancy_upsample_sse2):
|
||||||
|
|
||||||
|
PW_ONE times 8 dw 1
|
||||||
|
PW_TWO times 8 dw 2
|
||||||
|
PW_THREE times 8 dw 3
|
||||||
|
PW_SEVEN times 8 dw 7
|
||||||
|
PW_EIGHT times 8 dw 8
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Fancy processing for the common case of 2:1 horizontal and 1:1 vertical.
|
||||||
|
;
|
||||||
|
; The upsampling algorithm is linear interpolation between pixel centers,
|
||||||
|
; also known as a "triangle filter". This is a good compromise between
|
||||||
|
; speed and visual quality. The centers of the output pixels are 1/4 and 3/4
|
||||||
|
; of the way between input pixel centers.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_h2v1_fancy_upsample_sse2 (j_decompress_ptr cinfo,
|
||||||
|
; jpeg_component_info * compptr,
|
||||||
|
; JSAMPARRAY input_data,
|
||||||
|
; JSAMPARRAY * output_data_ptr);
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define input_data(b) (b)+16 ; JSAMPARRAY input_data
|
||||||
|
%define output_data_ptr(b) (b)+20 ; JSAMPARRAY * output_data_ptr
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_h2v1_fancy_upsample_sse2)
|
||||||
|
|
||||||
|
EXTN(jpeg_h2v1_fancy_upsample_sse2):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
pushpic ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
|
||||||
|
mov eax, POINTER [compptr(ebp)]
|
||||||
|
mov eax, JDIMENSION [jcompinfo_downsampled_width(eax)] ; colctr
|
||||||
|
test eax,eax
|
||||||
|
jz near .return
|
||||||
|
|
||||||
|
mov ecx, POINTER [cinfo(ebp)]
|
||||||
|
mov ecx, INT [jdstruct_max_v_samp_factor(ecx)] ; rowctr
|
||||||
|
test ecx,ecx
|
||||||
|
jz near .return
|
||||||
|
|
||||||
|
mov esi, JSAMPARRAY [input_data(ebp)] ; input_data
|
||||||
|
mov edi, POINTER [output_data_ptr(ebp)]
|
||||||
|
mov edi, JSAMPARRAY [edi] ; output_data
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
push eax ; colctr
|
||||||
|
push edi
|
||||||
|
push esi
|
||||||
|
|
||||||
|
mov esi, JSAMPROW [esi] ; inptr
|
||||||
|
mov edi, JSAMPROW [edi] ; outptr
|
||||||
|
|
||||||
|
test eax, SIZEOF_XMMWORD-1
|
||||||
|
jz short .skip
|
||||||
|
mov dl, JSAMPLE [esi+(eax-1)*SIZEOF_JSAMPLE]
|
||||||
|
mov JSAMPLE [esi+eax*SIZEOF_JSAMPLE], dl ; insert a dummy sample
|
||||||
|
.skip:
|
||||||
|
pxor xmm0,xmm0 ; xmm0=(all 0's)
|
||||||
|
pcmpeqb xmm7,xmm7
|
||||||
|
psrldq xmm7,(SIZEOF_XMMWORD-1)
|
||||||
|
pand xmm7, XMMWORD [esi+0*SIZEOF_XMMWORD]
|
||||||
|
|
||||||
|
add eax, byte SIZEOF_XMMWORD-1
|
||||||
|
and eax, byte -SIZEOF_XMMWORD
|
||||||
|
cmp eax, byte SIZEOF_XMMWORD
|
||||||
|
ja short .columnloop
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.columnloop_last:
|
||||||
|
pcmpeqb xmm6,xmm6
|
||||||
|
pslldq xmm6,(SIZEOF_XMMWORD-1)
|
||||||
|
pand xmm6, XMMWORD [esi+0*SIZEOF_XMMWORD]
|
||||||
|
jmp short .upsample
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.columnloop:
|
||||||
|
movdqa xmm6, XMMWORD [esi+1*SIZEOF_XMMWORD]
|
||||||
|
pslldq xmm6,(SIZEOF_XMMWORD-1)
|
||||||
|
|
||||||
|
.upsample:
|
||||||
|
movdqa xmm1, XMMWORD [esi+0*SIZEOF_XMMWORD]
|
||||||
|
movdqa xmm2,xmm1
|
||||||
|
movdqa xmm3,xmm1 ; xmm1=( 0 1 2 ... 13 14 15)
|
||||||
|
pslldq xmm2,1 ; xmm2=(-- 0 1 ... 12 13 14)
|
||||||
|
psrldq xmm3,1 ; xmm3=( 1 2 3 ... 14 15 --)
|
||||||
|
|
||||||
|
por xmm2,xmm7 ; xmm2=(-1 0 1 ... 12 13 14)
|
||||||
|
por xmm3,xmm6 ; xmm3=( 1 2 3 ... 14 15 16)
|
||||||
|
|
||||||
|
movdqa xmm7,xmm1
|
||||||
|
psrldq xmm7,(SIZEOF_XMMWORD-1) ; xmm7=(15 -- -- ... -- -- --)
|
||||||
|
|
||||||
|
movdqa xmm4,xmm1
|
||||||
|
punpcklbw xmm1,xmm0 ; xmm1=( 0 1 2 3 4 5 6 7)
|
||||||
|
punpckhbw xmm4,xmm0 ; xmm4=( 8 9 10 11 12 13 14 15)
|
||||||
|
movdqa xmm5,xmm2
|
||||||
|
punpcklbw xmm2,xmm0 ; xmm2=(-1 0 1 2 3 4 5 6)
|
||||||
|
punpckhbw xmm5,xmm0 ; xmm5=( 7 8 9 10 11 12 13 14)
|
||||||
|
movdqa xmm6,xmm3
|
||||||
|
punpcklbw xmm3,xmm0 ; xmm3=( 1 2 3 4 5 6 7 8)
|
||||||
|
punpckhbw xmm6,xmm0 ; xmm6=( 9 10 11 12 13 14 15 16)
|
||||||
|
|
||||||
|
pmullw xmm1,[GOTOFF(ebx,PW_THREE)]
|
||||||
|
pmullw xmm4,[GOTOFF(ebx,PW_THREE)]
|
||||||
|
paddw xmm2,[GOTOFF(ebx,PW_ONE)]
|
||||||
|
paddw xmm5,[GOTOFF(ebx,PW_ONE)]
|
||||||
|
paddw xmm3,[GOTOFF(ebx,PW_TWO)]
|
||||||
|
paddw xmm6,[GOTOFF(ebx,PW_TWO)]
|
||||||
|
|
||||||
|
paddw xmm2,xmm1
|
||||||
|
paddw xmm5,xmm4
|
||||||
|
psrlw xmm2,2 ; xmm2=OutLE=( 0 2 4 6 8 10 12 14)
|
||||||
|
psrlw xmm5,2 ; xmm5=OutHE=(16 18 20 22 24 26 28 30)
|
||||||
|
paddw xmm3,xmm1
|
||||||
|
paddw xmm6,xmm4
|
||||||
|
psrlw xmm3,2 ; xmm3=OutLO=( 1 3 5 7 9 11 13 15)
|
||||||
|
psrlw xmm6,2 ; xmm6=OutHO=(17 19 21 23 25 27 29 31)
|
||||||
|
|
||||||
|
psllw xmm3,BYTE_BIT
|
||||||
|
psllw xmm6,BYTE_BIT
|
||||||
|
por xmm2,xmm3 ; xmm2=OutL=( 0 1 2 ... 13 14 15)
|
||||||
|
por xmm5,xmm6 ; xmm5=OutH=(16 17 18 ... 29 30 31)
|
||||||
|
|
||||||
|
movdqa XMMWORD [edi+0*SIZEOF_XMMWORD], xmm2
|
||||||
|
movdqa XMMWORD [edi+1*SIZEOF_XMMWORD], xmm5
|
||||||
|
|
||||||
|
sub eax, byte SIZEOF_XMMWORD
|
||||||
|
add esi, byte 1*SIZEOF_XMMWORD ; inptr
|
||||||
|
add edi, byte 2*SIZEOF_XMMWORD ; outptr
|
||||||
|
cmp eax, byte SIZEOF_XMMWORD
|
||||||
|
ja near .columnloop
|
||||||
|
test eax,eax
|
||||||
|
jnz near .columnloop_last
|
||||||
|
|
||||||
|
pop esi
|
||||||
|
pop edi
|
||||||
|
pop eax
|
||||||
|
|
||||||
|
add esi, byte SIZEOF_JSAMPROW ; input_data
|
||||||
|
add edi, byte SIZEOF_JSAMPROW ; output_data
|
||||||
|
dec ecx ; rowctr
|
||||||
|
jg near .rowloop
|
||||||
|
|
||||||
|
.return:
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
poppic ebx
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
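As a reading aid for the SSE2 routine above, here is a scalar sketch of what one row of the 2:1 horizontal triangle filter computes. It is not part of the patch. The name `h2v1_fancy_row_ref` is hypothetical, and the sketch assumes 8-bit samples and handles the row edges by replicating the first and last sample, which is how the masked first/last bytes and the dummy sample in the assembly appear to behave.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference for one row of 2:1 horizontal "fancy"
 * upsampling.  in[] holds width samples; out[] receives 2*width samples.
 * Each output is 3/4 of the nearer input plus 1/4 of the further one.
 * The rounding biases 1 and 2 correspond to PW_ONE and PW_TWO. */
static void h2v1_fancy_row_ref(const unsigned char *in, size_t width,
                               unsigned char *out)
{
    size_t i;

    assert(width > 0);
    for (i = 0; i < width; i++) {
        int cur  = in[i];
        int prev = in[i > 0 ? i - 1 : 0];                 /* replicate left edge */
        int next = in[i + 1 < width ? i + 1 : width - 1]; /* replicate right edge */

        out[2 * i]     = (unsigned char) ((3 * cur + prev + 1) >> 2); /* even */
        out[2 * i + 1] = (unsigned char) ((3 * cur + next + 2) >> 2); /* odd  */
    }
}
```

The assembly keeps the even and odd results in separate registers, then shifts the odd words left by `BYTE_BIT` and ORs them into the even words; that interleaving is what the plain `out[2 * i]` / `out[2 * i + 1]` indexing expresses here.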
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
;
|
||||||
|
; Fancy processing for the common case of 2:1 horizontal and 2:1 vertical.
|
||||||
|
; Again a triangle filter; see comments for h2v1 case, above.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_h2v2_fancy_upsample_sse2 (j_decompress_ptr cinfo,
|
||||||
|
; jpeg_component_info * compptr,
|
||||||
|
; JSAMPARRAY input_data,
|
||||||
|
; JSAMPARRAY * output_data_ptr);
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define input_data(b) (b)+16 ; JSAMPARRAY input_data
|
||||||
|
%define output_data_ptr(b) (b)+20 ; JSAMPARRAY * output_data_ptr
|
||||||
|
|
||||||
|
%define original_ebp ebp+0
|
||||||
|
%define wk(i) ebp-(WK_NUM-(i))*SIZEOF_XMMWORD ; xmmword wk[WK_NUM]
|
||||||
|
%define WK_NUM 4
|
||||||
|
%define gotptr wk(0)-SIZEOF_POINTER ; void * gotptr
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_h2v2_fancy_upsample_sse2)
|
||||||
|
|
||||||
|
EXTN(jpeg_h2v2_fancy_upsample_sse2):
|
||||||
|
push ebp
|
||||||
|
mov eax,esp ; eax = original ebp
|
||||||
|
sub esp, byte 4
|
||||||
|
and esp, byte (-SIZEOF_XMMWORD) ; align to 128 bits
|
||||||
|
mov [esp],eax
|
||||||
|
mov ebp,esp ; ebp = aligned ebp
|
||||||
|
lea esp, [wk(0)]
|
||||||
|
pushpic eax ; make room for the GOT address
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
movpic POINTER [gotptr], ebx ; save GOT address
|
||||||
|
|
||||||
|
mov edx,eax ; edx = original ebp
|
||||||
|
mov eax, POINTER [compptr(edx)]
|
||||||
|
mov eax, JDIMENSION [jcompinfo_downsampled_width(eax)] ; colctr
|
||||||
|
test eax,eax
|
||||||
|
jz near .return
|
||||||
|
|
||||||
|
mov ecx, POINTER [cinfo(edx)]
|
||||||
|
mov ecx, INT [jdstruct_max_v_samp_factor(ecx)] ; rowctr
|
||||||
|
test ecx,ecx
|
||||||
|
jz near .return
|
||||||
|
|
||||||
|
mov esi, JSAMPARRAY [input_data(edx)] ; input_data
|
||||||
|
mov edi, POINTER [output_data_ptr(edx)]
|
||||||
|
mov edi, JSAMPARRAY [edi] ; output_data
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
push eax ; colctr
|
||||||
|
push ecx
|
||||||
|
push edi
|
||||||
|
push esi
|
||||||
|
|
||||||
|
mov ecx, JSAMPROW [esi-1*SIZEOF_JSAMPROW] ; inptr1(above)
|
||||||
|
mov ebx, JSAMPROW [esi+0*SIZEOF_JSAMPROW] ; inptr0
|
||||||
|
mov esi, JSAMPROW [esi+1*SIZEOF_JSAMPROW] ; inptr1(below)
|
||||||
|
mov edx, JSAMPROW [edi+0*SIZEOF_JSAMPROW] ; outptr0
|
||||||
|
mov edi, JSAMPROW [edi+1*SIZEOF_JSAMPROW] ; outptr1
|
||||||
|
|
||||||
|
test eax, SIZEOF_XMMWORD-1
|
||||||
|
jz short .skip
|
||||||
|
push edx
|
||||||
|
mov dl, JSAMPLE [ecx+(eax-1)*SIZEOF_JSAMPLE]
|
||||||
|
mov JSAMPLE [ecx+eax*SIZEOF_JSAMPLE], dl
|
||||||
|
mov dl, JSAMPLE [ebx+(eax-1)*SIZEOF_JSAMPLE]
|
||||||
|
mov JSAMPLE [ebx+eax*SIZEOF_JSAMPLE], dl
|
||||||
|
mov dl, JSAMPLE [esi+(eax-1)*SIZEOF_JSAMPLE]
|
||||||
|
mov JSAMPLE [esi+eax*SIZEOF_JSAMPLE], dl ; insert a dummy sample
|
||||||
|
pop edx
|
||||||
|
.skip:
|
||||||
|
; -- process the first column block
|
||||||
|
|
||||||
|
movdqa xmm0, XMMWORD [ebx+0*SIZEOF_XMMWORD] ; xmm0=row[ 0][0]
|
||||||
|
movdqa xmm1, XMMWORD [ecx+0*SIZEOF_XMMWORD] ; xmm1=row[-1][0]
|
||||||
|
movdqa xmm2, XMMWORD [esi+0*SIZEOF_XMMWORD] ; xmm2=row[+1][0]
|
||||||
|
|
||||||
|
pushpic ebx
|
||||||
|
movpic ebx, POINTER [gotptr] ; load GOT address
|
||||||
|
|
||||||
|
pxor xmm3,xmm3 ; xmm3=(all 0's)
|
||||||
|
movdqa xmm4,xmm0
|
||||||
|
punpcklbw xmm0,xmm3 ; xmm0=row[ 0]( 0 1 2 3 4 5 6 7)
|
||||||
|
punpckhbw xmm4,xmm3 ; xmm4=row[ 0]( 8 9 10 11 12 13 14 15)
|
||||||
|
movdqa xmm5,xmm1
|
||||||
|
punpcklbw xmm1,xmm3 ; xmm1=row[-1]( 0 1 2 3 4 5 6 7)
|
||||||
|
punpckhbw xmm5,xmm3 ; xmm5=row[-1]( 8 9 10 11 12 13 14 15)
|
||||||
|
movdqa xmm6,xmm2
|
||||||
|
punpcklbw xmm2,xmm3 ; xmm2=row[+1]( 0 1 2 3 4 5 6 7)
|
||||||
|
punpckhbw xmm6,xmm3 ; xmm6=row[+1]( 8 9 10 11 12 13 14 15)
|
||||||
|
|
||||||
|
pmullw xmm0,[GOTOFF(ebx,PW_THREE)]
|
||||||
|
pmullw xmm4,[GOTOFF(ebx,PW_THREE)]
|
||||||
|
|
||||||
|
pcmpeqb xmm7,xmm7
|
||||||
|
psrldq xmm7,(SIZEOF_XMMWORD-2)
|
||||||
|
|
||||||
|
paddw xmm1,xmm0 ; xmm1=Int0L=( 0 1 2 3 4 5 6 7)
|
||||||
|
paddw xmm5,xmm4 ; xmm5=Int0H=( 8 9 10 11 12 13 14 15)
|
||||||
|
paddw xmm2,xmm0 ; xmm2=Int1L=( 0 1 2 3 4 5 6 7)
|
||||||
|
paddw xmm6,xmm4 ; xmm6=Int1H=( 8 9 10 11 12 13 14 15)
|
||||||
|
|
||||||
|
movdqa XMMWORD [edx+0*SIZEOF_XMMWORD], xmm1 ; temporarily save
|
||||||
|
movdqa XMMWORD [edx+1*SIZEOF_XMMWORD], xmm5 ; the intermediate data
|
||||||
|
movdqa XMMWORD [edi+0*SIZEOF_XMMWORD], xmm2
|
||||||
|
movdqa XMMWORD [edi+1*SIZEOF_XMMWORD], xmm6
|
||||||
|
|
||||||
|
pand xmm1,xmm7 ; xmm1=( 0 -- -- -- -- -- -- --)
|
||||||
|
pand xmm2,xmm7 ; xmm2=( 0 -- -- -- -- -- -- --)
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(0)], xmm1
|
||||||
|
movdqa XMMWORD [wk(1)], xmm2
|
||||||
|
|
||||||
|
poppic ebx
|
||||||
|
|
||||||
|
add eax, byte SIZEOF_XMMWORD-1
|
||||||
|
and eax, byte -SIZEOF_XMMWORD
|
||||||
|
cmp eax, byte SIZEOF_XMMWORD
|
||||||
|
ja short .columnloop
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.columnloop_last:
|
||||||
|
; -- process the last column block
|
||||||
|
|
||||||
|
pushpic ebx
|
||||||
|
movpic ebx, POINTER [gotptr] ; load GOT address
|
||||||
|
|
||||||
|
pcmpeqb xmm1,xmm1
|
||||||
|
pslldq xmm1,(SIZEOF_XMMWORD-2)
|
||||||
|
movdqa xmm2,xmm1
|
||||||
|
|
||||||
|
pand xmm1, XMMWORD [edx+1*SIZEOF_XMMWORD]
|
||||||
|
pand xmm2, XMMWORD [edi+1*SIZEOF_XMMWORD]
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(2)], xmm1 ; xmm1=(-- -- -- -- -- -- -- 15)
|
||||||
|
movdqa XMMWORD [wk(3)], xmm2 ; xmm2=(-- -- -- -- -- -- -- 15)
|
||||||
|
|
||||||
|
jmp near .upsample
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.columnloop:
|
||||||
|
; -- process the next column block
|
||||||
|
|
||||||
|
movdqa xmm0, XMMWORD [ebx+1*SIZEOF_XMMWORD] ; xmm0=row[ 0][1]
|
||||||
|
movdqa xmm1, XMMWORD [ecx+1*SIZEOF_XMMWORD] ; xmm1=row[-1][1]
|
||||||
|
movdqa xmm2, XMMWORD [esi+1*SIZEOF_XMMWORD] ; xmm2=row[+1][1]
|
||||||
|
|
||||||
|
pushpic ebx
|
||||||
|
movpic ebx, POINTER [gotptr] ; load GOT address
|
||||||
|
|
||||||
|
pxor xmm3,xmm3 ; xmm3=(all 0's)
|
||||||
|
movdqa xmm4,xmm0
|
||||||
|
punpcklbw xmm0,xmm3 ; xmm0=row[ 0]( 0 1 2 3 4 5 6 7)
|
||||||
|
punpckhbw xmm4,xmm3 ; xmm4=row[ 0]( 8 9 10 11 12 13 14 15)
|
||||||
|
movdqa xmm5,xmm1
|
||||||
|
punpcklbw xmm1,xmm3 ; xmm1=row[-1]( 0 1 2 3 4 5 6 7)
|
||||||
|
punpckhbw xmm5,xmm3 ; xmm5=row[-1]( 8 9 10 11 12 13 14 15)
|
||||||
|
movdqa xmm6,xmm2
|
||||||
|
punpcklbw xmm2,xmm3 ; xmm2=row[+1]( 0 1 2 3 4 5 6 7)
|
||||||
|
punpckhbw xmm6,xmm3 ; xmm6=row[+1]( 8 9 10 11 12 13 14 15)
|
||||||
|
|
||||||
|
pmullw xmm0,[GOTOFF(ebx,PW_THREE)]
|
||||||
|
pmullw xmm4,[GOTOFF(ebx,PW_THREE)]
|
||||||
|
|
||||||
|
paddw xmm1,xmm0 ; xmm1=Int0L=( 0 1 2 3 4 5 6 7)
|
||||||
|
paddw xmm5,xmm4 ; xmm5=Int0H=( 8 9 10 11 12 13 14 15)
|
||||||
|
paddw xmm2,xmm0 ; xmm2=Int1L=( 0 1 2 3 4 5 6 7)
|
||||||
|
paddw xmm6,xmm4 ; xmm6=Int1H=( 8 9 10 11 12 13 14 15)
|
||||||
|
|
||||||
|
movdqa XMMWORD [edx+2*SIZEOF_XMMWORD], xmm1 ; temporarily save
|
||||||
|
movdqa XMMWORD [edx+3*SIZEOF_XMMWORD], xmm5 ; the intermediate data
|
||||||
|
movdqa XMMWORD [edi+2*SIZEOF_XMMWORD], xmm2
|
||||||
|
movdqa XMMWORD [edi+3*SIZEOF_XMMWORD], xmm6
|
||||||
|
|
||||||
|
pslldq xmm1,(SIZEOF_XMMWORD-2) ; xmm1=(-- -- -- -- -- -- -- 0)
|
||||||
|
pslldq xmm2,(SIZEOF_XMMWORD-2) ; xmm2=(-- -- -- -- -- -- -- 0)
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(2)], xmm1
|
||||||
|
movdqa XMMWORD [wk(3)], xmm2
|
||||||
|
|
||||||
|
.upsample:
|
||||||
|
; -- process the upper row
|
||||||
|
|
||||||
|
movdqa xmm7, XMMWORD [edx+0*SIZEOF_XMMWORD]
|
||||||
|
movdqa xmm3, XMMWORD [edx+1*SIZEOF_XMMWORD]
|
||||||
|
|
||||||
|
movdqa xmm0,xmm7 ; xmm7=Int0L=( 0 1 2 3 4 5 6 7)
|
||||||
|
movdqa xmm4,xmm3 ; xmm3=Int0H=( 8 9 10 11 12 13 14 15)
|
||||||
|
psrldq xmm0,2 ; xmm0=( 1 2 3 4 5 6 7 --)
|
||||||
|
pslldq xmm4,(SIZEOF_XMMWORD-2) ; xmm4=(-- -- -- -- -- -- -- 8)
|
||||||
|
movdqa xmm5,xmm7
|
||||||
|
movdqa xmm6,xmm3
|
||||||
|
psrldq xmm5,(SIZEOF_XMMWORD-2) ; xmm5=( 7 -- -- -- -- -- -- --)
|
||||||
|
pslldq xmm6,2 ; xmm6=(-- 8 9 10 11 12 13 14)
|
||||||
|
|
||||||
|
por xmm0,xmm4 ; xmm0=( 1 2 3 4 5 6 7 8)
|
||||||
|
por xmm5,xmm6 ; xmm5=( 7 8 9 10 11 12 13 14)
|
||||||
|
|
||||||
|
movdqa xmm1,xmm7
|
||||||
|
movdqa xmm2,xmm3
|
||||||
|
pslldq xmm1,2 ; xmm1=(-- 0 1 2 3 4 5 6)
|
||||||
|
psrldq xmm2,2 ; xmm2=( 9 10 11 12 13 14 15 --)
|
||||||
|
movdqa xmm4,xmm3
|
||||||
|
psrldq xmm4,(SIZEOF_XMMWORD-2) ; xmm4=(15 -- -- -- -- -- -- --)
|
||||||
|
|
||||||
|
por xmm1, XMMWORD [wk(0)] ; xmm1=(-1 0 1 2 3 4 5 6)
|
||||||
|
por xmm2, XMMWORD [wk(2)] ; xmm2=( 9 10 11 12 13 14 15 16)
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(0)], xmm4
|
||||||
|
|
||||||
|
pmullw xmm7,[GOTOFF(ebx,PW_THREE)]
|
||||||
|
pmullw xmm3,[GOTOFF(ebx,PW_THREE)]
|
||||||
|
paddw xmm1,[GOTOFF(ebx,PW_EIGHT)]
|
||||||
|
paddw xmm5,[GOTOFF(ebx,PW_EIGHT)]
|
||||||
|
paddw xmm0,[GOTOFF(ebx,PW_SEVEN)]
|
||||||
|
paddw xmm2,[GOTOFF(ebx,PW_SEVEN)]
|
||||||
|
|
||||||
|
paddw xmm1,xmm7
|
||||||
|
paddw xmm5,xmm3
|
||||||
|
psrlw xmm1,4 ; xmm1=Out0LE=( 0 2 4 6 8 10 12 14)
|
||||||
|
psrlw xmm5,4 ; xmm5=Out0HE=(16 18 20 22 24 26 28 30)
|
||||||
|
paddw xmm0,xmm7
|
||||||
|
paddw xmm2,xmm3
|
||||||
|
psrlw xmm0,4 ; xmm0=Out0LO=( 1 3 5 7 9 11 13 15)
|
||||||
|
psrlw xmm2,4 ; xmm2=Out0HO=(17 19 21 23 25 27 29 31)
|
||||||
|
|
||||||
|
psllw xmm0,BYTE_BIT
|
||||||
|
psllw xmm2,BYTE_BIT
|
||||||
|
por xmm1,xmm0 ; xmm1=Out0L=( 0 1 2 ... 13 14 15)
|
||||||
|
por xmm5,xmm2 ; xmm5=Out0H=(16 17 18 ... 29 30 31)
|
||||||
|
|
||||||
|
movdqa XMMWORD [edx+0*SIZEOF_XMMWORD], xmm1
|
||||||
|
movdqa XMMWORD [edx+1*SIZEOF_XMMWORD], xmm5
|
||||||
|
|
||||||
|
; -- process the lower row
|
||||||
|
|
||||||
|
movdqa xmm6, XMMWORD [edi+0*SIZEOF_XMMWORD]
|
||||||
|
movdqa xmm4, XMMWORD [edi+1*SIZEOF_XMMWORD]
|
||||||
|
|
||||||
|
movdqa xmm7,xmm6 ; xmm6=Int1L=( 0 1 2 3 4 5 6 7)
|
||||||
|
movdqa xmm3,xmm4 ; xmm4=Int1H=( 8 9 10 11 12 13 14 15)
|
||||||
|
psrldq xmm7,2 ; xmm7=( 1 2 3 4 5 6 7 --)
|
||||||
|
pslldq xmm3,(SIZEOF_XMMWORD-2) ; xmm3=(-- -- -- -- -- -- -- 8)
|
||||||
|
movdqa xmm0,xmm6
|
||||||
|
movdqa xmm2,xmm4
|
||||||
|
psrldq xmm0,(SIZEOF_XMMWORD-2) ; xmm0=( 7 -- -- -- -- -- -- --)
|
||||||
|
pslldq xmm2,2 ; xmm2=(-- 8 9 10 11 12 13 14)
|
||||||
|
|
||||||
|
por xmm7,xmm3 ; xmm7=( 1 2 3 4 5 6 7 8)
|
||||||
|
por xmm0,xmm2 ; xmm0=( 7 8 9 10 11 12 13 14)
|
||||||
|
|
||||||
|
movdqa xmm1,xmm6
|
||||||
|
movdqa xmm5,xmm4
|
||||||
|
pslldq xmm1,2 ; xmm1=(-- 0 1 2 3 4 5 6)
|
||||||
|
psrldq xmm5,2 ; xmm5=( 9 10 11 12 13 14 15 --)
|
||||||
|
movdqa xmm3,xmm4
|
||||||
|
psrldq xmm3,(SIZEOF_XMMWORD-2) ; xmm3=(15 -- -- -- -- -- -- --)
|
||||||
|
|
||||||
|
por xmm1, XMMWORD [wk(1)] ; xmm1=(-1 0 1 2 3 4 5 6)
|
||||||
|
por xmm5, XMMWORD [wk(3)] ; xmm5=( 9 10 11 12 13 14 15 16)
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(1)], xmm3
|
||||||
|
|
||||||
|
pmullw xmm6,[GOTOFF(ebx,PW_THREE)]
|
||||||
|
pmullw xmm4,[GOTOFF(ebx,PW_THREE)]
|
||||||
|
paddw xmm1,[GOTOFF(ebx,PW_EIGHT)]
|
||||||
|
paddw xmm0,[GOTOFF(ebx,PW_EIGHT)]
|
||||||
|
paddw xmm7,[GOTOFF(ebx,PW_SEVEN)]
|
||||||
|
paddw xmm5,[GOTOFF(ebx,PW_SEVEN)]
|
||||||
|
|
||||||
|
paddw xmm1,xmm6
|
||||||
|
paddw xmm0,xmm4
|
||||||
|
psrlw xmm1,4 ; xmm1=Out1LE=( 0 2 4 6 8 10 12 14)
|
||||||
|
psrlw xmm0,4 ; xmm0=Out1HE=(16 18 20 22 24 26 28 30)
|
||||||
|
paddw xmm7,xmm6
|
||||||
|
paddw xmm5,xmm4
|
||||||
|
psrlw xmm7,4 ; xmm7=Out1LO=( 1 3 5 7 9 11 13 15)
|
||||||
|
psrlw xmm5,4 ; xmm5=Out1HO=(17 19 21 23 25 27 29 31)
|
||||||
|
|
||||||
|
psllw xmm7,BYTE_BIT
|
||||||
|
psllw xmm5,BYTE_BIT
|
||||||
|
por xmm1,xmm7 ; xmm1=Out1L=( 0 1 2 ... 13 14 15)
|
||||||
|
por xmm0,xmm5 ; xmm0=Out1H=(16 17 18 ... 29 30 31)
|
||||||
|
|
||||||
|
movdqa XMMWORD [edi+0*SIZEOF_XMMWORD], xmm1
|
||||||
|
movdqa XMMWORD [edi+1*SIZEOF_XMMWORD], xmm0
|
||||||
|
|
||||||
|
poppic ebx
|
||||||
|
|
||||||
|
sub eax, byte SIZEOF_XMMWORD
|
||||||
|
add ecx, byte 1*SIZEOF_XMMWORD ; inptr1(above)
|
||||||
|
add ebx, byte 1*SIZEOF_XMMWORD ; inptr0
|
||||||
|
add esi, byte 1*SIZEOF_XMMWORD ; inptr1(below)
|
||||||
|
add edx, byte 2*SIZEOF_XMMWORD ; outptr0
|
||||||
|
add edi, byte 2*SIZEOF_XMMWORD ; outptr1
|
||||||
|
cmp eax, byte SIZEOF_XMMWORD
|
||||||
|
ja near .columnloop
|
||||||
|
test eax,eax
|
||||||
|
jnz near .columnloop_last
|
||||||
|
|
||||||
|
pop esi
|
||||||
|
pop edi
|
||||||
|
pop ecx
|
||||||
|
pop eax
|
||||||
|
|
||||||
|
add esi, byte 1*SIZEOF_JSAMPROW ; input_data
|
||||||
|
add edi, byte 2*SIZEOF_JSAMPROW ; output_data
|
||||||
|
sub ecx, byte 2 ; rowctr
|
||||||
|
jg near .rowloop
|
||||||
|
|
||||||
|
.return:
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
mov esp,ebp ; esp <- aligned ebp
|
||||||
|
pop esp ; esp <- original ebp
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
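The 2:1 by 2:1 routine above works in two passes: a vertical pass that forms `3 * nearest + next_nearest` column sums (the "intermediate data" it parks in the output rows), then a horizontal pass over those sums with biases 8 and 7 and a shift by 4. A scalar sketch of one output row follows, under the same assumptions as before: 8-bit samples, edge replication, and a hypothetical function name that does not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference for one output row of 2:1 x 2:1 "fancy"
 * upsampling.  near_row is the nearest input row, far_row the next nearest
 * (the row above for the upper output row, below for the lower one).
 * out[] receives 2*width samples. */
static void h2v2_fancy_row_ref(const unsigned char *near_row,
                               const unsigned char *far_row,
                               size_t width, unsigned char *out)
{
    size_t i;

    assert(width > 0);
    for (i = 0; i < width; i++) {
        size_t p = (i > 0) ? i - 1 : 0;                  /* left neighbour  */
        size_t n = (i + 1 < width) ? i + 1 : width - 1;  /* right neighbour */

        /* Vertical pass: weighted column sums (range 0..1020). */
        int cur  = 3 * near_row[i] + far_row[i];
        int prev = 3 * near_row[p] + far_row[p];
        int next = 3 * near_row[n] + far_row[n];

        /* Horizontal pass: biases match PW_EIGHT and PW_SEVEN. */
        out[2 * i]     = (unsigned char) ((3 * cur + prev + 8) >> 4);
        out[2 * i + 1] = (unsigned char) ((3 * cur + next + 7) >> 4);
    }
}
```

Calling this once with the row above as `far_row` and once with the row below gives the two output rows that the assembly produces per input row.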
|
|
||||||
|
%ifdef UPSAMPLE_H1V2_SUPPORTED
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
;
|
||||||
|
; Fancy processing for the common case of 1:1 horizontal and 2:1 vertical.
|
||||||
|
; Again a triangle filter; see comments for h2v1 case, above.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_h1v2_fancy_upsample_sse2 (j_decompress_ptr cinfo,
|
||||||
|
; jpeg_component_info * compptr,
|
||||||
|
; JSAMPARRAY input_data,
|
||||||
|
; JSAMPARRAY * output_data_ptr);
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define input_data(b) (b)+16 ; JSAMPARRAY input_data
|
||||||
|
%define output_data_ptr(b) (b)+20 ; JSAMPARRAY * output_data_ptr
|
||||||
|
|
||||||
|
%define gotptr ebp-SIZEOF_POINTER ; void * gotptr
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_h1v2_fancy_upsample_sse2)
|
||||||
|
|
||||||
|
EXTN(jpeg_h1v2_fancy_upsample_sse2):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
pushpic eax ; make room for GOT address
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
movpic POINTER [gotptr], ebx ; save GOT address
|
||||||
|
|
||||||
|
mov eax, POINTER [compptr(ebp)]
|
||||||
|
mov eax, JDIMENSION [jcompinfo_downsampled_width(eax)] ; colctr
|
||||||
|
add eax, byte SIZEOF_XMMWORD-1
|
||||||
|
and eax, byte -SIZEOF_XMMWORD
|
||||||
|
jz near .return
|
||||||
|
|
||||||
|
mov ecx, POINTER [cinfo(ebp)]
|
||||||
|
mov ecx, INT [jdstruct_max_v_samp_factor(ecx)] ; rowctr
|
||||||
|
test ecx,ecx
|
||||||
|
jz near .return
|
||||||
|
|
||||||
|
mov esi, JSAMPARRAY [input_data(ebp)] ; input_data
|
||||||
|
mov edi, POINTER [output_data_ptr(ebp)]
|
||||||
|
mov edi, JSAMPARRAY [edi] ; output_data
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
push eax ; colctr
|
||||||
|
push ecx
|
||||||
|
push edi
|
||||||
|
push esi
|
||||||
|
|
||||||
|
mov ecx, JSAMPROW [esi-1*SIZEOF_JSAMPROW] ; inptr1(above)
|
||||||
|
mov ebx, JSAMPROW [esi+0*SIZEOF_JSAMPROW] ; inptr0
|
||||||
|
mov esi, JSAMPROW [esi+1*SIZEOF_JSAMPROW] ; inptr1(below)
|
||||||
|
mov edx, JSAMPROW [edi+0*SIZEOF_JSAMPROW] ; outptr0
|
||||||
|
mov edi, JSAMPROW [edi+1*SIZEOF_JSAMPROW] ; outptr1
|
||||||
|
|
||||||
|
pxor xmm0,xmm0 ; xmm0=(all 0's)
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.columnloop:
|
||||||
|
movdqa xmm1, XMMWORD [ebx] ; xmm1=row[ 0]( 0 1 2 ... 13 14 15)
|
||||||
|
movdqa xmm2, XMMWORD [ecx] ; xmm2=row[-1]( 0 1 2 ... 13 14 15)
|
||||||
|
movdqa xmm3, XMMWORD [esi] ; xmm3=row[+1]( 0 1 2 ... 13 14 15)
|
||||||
|
|
||||||
|
pushpic ebx
|
||||||
|
movpic ebx, POINTER [gotptr] ; load GOT address
|
||||||
|
|
||||||
|
movdqa xmm4,xmm1
|
||||||
|
punpcklbw xmm1,xmm0 ; xmm1=row[ 0]( 0 1 2 3 4 5 6 7)
|
||||||
|
punpckhbw xmm4,xmm0 ; xmm4=row[ 0]( 8 9 10 11 12 13 14 15)
|
||||||
|
movdqa xmm5,xmm2
|
||||||
|
punpcklbw xmm2,xmm0 ; xmm2=row[-1]( 0 1 2 3 4 5 6 7)
|
||||||
|
punpckhbw xmm5,xmm0 ; xmm5=row[-1]( 8 9 10 11 12 13 14 15)
|
||||||
|
movdqa xmm6,xmm3
|
||||||
|
punpcklbw xmm3,xmm0 ; xmm3=row[+1]( 0 1 2 3 4 5 6 7)
|
||||||
|
punpckhbw xmm6,xmm0 ; xmm6=row[+1]( 8 9 10 11 12 13 14 15)
|
||||||
|
|
||||||
|
pmullw xmm1,[GOTOFF(ebx,PW_THREE)]
|
||||||
|
pmullw xmm4,[GOTOFF(ebx,PW_THREE)]
|
||||||
|
paddw xmm2,[GOTOFF(ebx,PW_ONE)]
|
||||||
|
paddw xmm5,[GOTOFF(ebx,PW_ONE)]
|
||||||
|
paddw xmm3,[GOTOFF(ebx,PW_TWO)]
|
||||||
|
paddw xmm6,[GOTOFF(ebx,PW_TWO)]
|
||||||
|
|
||||||
|
paddw xmm2,xmm1
|
||||||
|
paddw xmm5,xmm4
|
||||||
|
psrlw xmm2,2 ; xmm2=Out0L=( 0 1 2 3 4 5 6 7)
|
||||||
|
psrlw xmm5,2 ; xmm5=Out0H=( 8 9 10 11 12 13 14 15)
|
||||||
|
paddw xmm3,xmm1
|
||||||
|
paddw xmm6,xmm4
|
||||||
|
psrlw xmm3,2 ; xmm3=Out1L=( 0 1 2 3 4 5 6 7)
|
||||||
|
psrlw xmm6,2 ; xmm6=Out1H=( 8 9 10 11 12 13 14 15)
|
||||||
|
|
||||||
|
packuswb xmm2,xmm5 ; xmm2=Out0=( 0 1 2 ... 13 14 15)
|
||||||
|
packuswb xmm3,xmm6 ; xmm3=Out1=( 0 1 2 ... 13 14 15)
|
||||||
|
|
||||||
|
movdqa XMMWORD [edx], xmm2
|
||||||
|
movdqa XMMWORD [edi], xmm3
|
||||||
|
|
||||||
|
poppic ebx
|
||||||
|
|
||||||
|
add ecx, byte 1*SIZEOF_XMMWORD ; inptr1(above)
|
||||||
|
add ebx, byte 1*SIZEOF_XMMWORD ; inptr0
|
||||||
|
add esi, byte 1*SIZEOF_XMMWORD ; inptr1(below)
|
||||||
|
add edx, byte 1*SIZEOF_XMMWORD ; outptr0
|
||||||
|
add edi, byte 1*SIZEOF_XMMWORD ; outptr1
|
||||||
|
sub eax, byte SIZEOF_XMMWORD
|
||||||
|
jnz near .columnloop
|
||||||
|
|
||||||
|
pop esi
|
||||||
|
pop edi
|
||||||
|
pop ecx
|
||||||
|
pop eax
|
||||||
|
|
||||||
|
add esi, byte 1*SIZEOF_JSAMPROW ; input_data
|
||||||
|
add edi, byte 2*SIZEOF_JSAMPROW ; output_data
|
||||||
|
sub ecx, byte 2 ; rowctr
|
||||||
|
jg near .rowloop
|
||||||
|
|
||||||
|
.return:
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
poppic eax ; remove gotptr
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; UPSAMPLE_H1V2_SUPPORTED
|
||||||
|
%endif ; JDSAMPLE_FANCY_SSE2_SUPPORTED
|
||||||
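For readers who do not read SSE2 fluently, the arithmetic of the h1v2 "fancy" column loop above can be written in scalar form. This is a minimal sketch inferred from the instruction sequence (multiply the current row by three, add the neighbouring row plus a rounding constant of 1 or 2, shift right by 2). The function name `h1v2_fancy_rows` and its parameter layout are hypothetical and do not appear in the source.

```c
#include <stddef.h>

/*
 * Scalar model of one h1v2 "fancy" step (hypothetical helper, not part of
 * the library): one input row produces two output rows via a 3:1 triangle
 * filter with the row above and the row below.
 */
static void h1v2_fancy_rows(const unsigned char *above,
                            const unsigned char *cur,
                            const unsigned char *below,
                            unsigned char *out0,
                            unsigned char *out1,
                            size_t width)
{
  for (size_t i = 0; i < width; i++) {
    unsigned three_cur = 3u * cur[i];            /* pmullw with PW_THREE */
    /* upper output row: (3*cur + above + 1) >> 2 */
    out0[i] = (unsigned char)((three_cur + above[i] + 1u) >> 2);
    /* lower output row: (3*cur + below + 2) >> 2 */
    out1[i] = (unsigned char)((three_cur + below[i] + 2u) >> 2);
  }
}
```

The two rounding constants differ (1 for the upper row, 2 for the lower row), matching the `PW_ONE` and `PW_TWO` additions in the assembly.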
|
|
||||||
|
%ifdef JDSAMPLE_SIMPLE_SSE2_SUPPORTED
|
||||||
|
|
||||||
|
%ifndef JDSAMPLE_FANCY_SSE2_SUPPORTED
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
%endif
|
||||||
|
;
|
||||||
|
; Fast processing for the common case of 2:1 horizontal and 1:1 vertical.
|
||||||
|
; It's still a box filter.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_h2v1_upsample_sse2 (j_decompress_ptr cinfo,
|
||||||
|
; jpeg_component_info * compptr,
|
||||||
|
; JSAMPARRAY input_data,
|
||||||
|
; JSAMPARRAY * output_data_ptr);
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define input_data(b) (b)+16 ; JSAMPARRAY input_data
|
||||||
|
%define output_data_ptr(b) (b)+20 ; JSAMPARRAY * output_data_ptr
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_h2v1_upsample_sse2)
|
||||||
|
|
||||||
|
EXTN(jpeg_h2v1_upsample_sse2):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
; push ebx ; unused
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
mov edx, POINTER [cinfo(ebp)]
|
||||||
|
mov edx, JDIMENSION [jdstruct_output_width(edx)]
|
||||||
|
add edx, byte (2*SIZEOF_XMMWORD)-1
|
||||||
|
and edx, byte -(2*SIZEOF_XMMWORD)
|
||||||
|
jz short .return
|
||||||
|
|
||||||
|
mov ecx, POINTER [cinfo(ebp)]
|
||||||
|
mov ecx, INT [jdstruct_max_v_samp_factor(ecx)] ; rowctr
|
||||||
|
test ecx,ecx
|
||||||
|
jz short .return
|
||||||
|
|
||||||
|
mov esi, JSAMPARRAY [input_data(ebp)] ; input_data
|
||||||
|
mov edi, POINTER [output_data_ptr(ebp)]
|
||||||
|
mov edi, JSAMPARRAY [edi] ; output_data
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
push edi
|
||||||
|
push esi
|
||||||
|
|
||||||
|
mov esi, JSAMPROW [esi] ; inptr
|
||||||
|
mov edi, JSAMPROW [edi] ; outptr
|
||||||
|
mov eax,edx ; colctr
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
|
||||||
|
movdqa xmm0, XMMWORD [esi+0*SIZEOF_XMMWORD]
|
||||||
|
|
||||||
|
movdqa xmm1,xmm0
|
||||||
|
punpcklbw xmm0,xmm0
|
||||||
|
punpckhbw xmm1,xmm1
|
||||||
|
|
||||||
|
movdqa XMMWORD [edi+0*SIZEOF_XMMWORD], xmm0
|
||||||
|
movdqa XMMWORD [edi+1*SIZEOF_XMMWORD], xmm1
|
||||||
|
|
||||||
|
sub eax, byte 2*SIZEOF_XMMWORD
|
||||||
|
jz short .nextrow
|
||||||
|
|
||||||
|
movdqa xmm2, XMMWORD [esi+1*SIZEOF_XMMWORD]
|
||||||
|
|
||||||
|
movdqa xmm3,xmm2
|
||||||
|
punpcklbw xmm2,xmm2
|
||||||
|
punpckhbw xmm3,xmm3
|
||||||
|
|
||||||
|
movdqa XMMWORD [edi+2*SIZEOF_XMMWORD], xmm2
|
||||||
|
movdqa XMMWORD [edi+3*SIZEOF_XMMWORD], xmm3
|
||||||
|
|
||||||
|
sub eax, byte 2*SIZEOF_XMMWORD
|
||||||
|
jz short .nextrow
|
||||||
|
|
||||||
|
add esi, byte 2*SIZEOF_XMMWORD ; inptr
|
||||||
|
add edi, byte 4*SIZEOF_XMMWORD ; outptr
|
||||||
|
jmp short .columnloop
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.nextrow:
|
||||||
|
pop esi
|
||||||
|
pop edi
|
||||||
|
|
||||||
|
add esi, byte SIZEOF_JSAMPROW ; input_data
|
||||||
|
add edi, byte SIZEOF_JSAMPROW ; output_data
|
||||||
|
dec ecx ; rowctr
|
||||||
|
jg short .rowloop
|
||||||
|
|
||||||
|
.return:
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
; pop ebx ; unused
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
;
|
||||||
|
; Fast processing for the common case of 2:1 horizontal and 2:1 vertical.
|
||||||
|
; It's still a box filter.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_h2v2_upsample_sse2 (j_decompress_ptr cinfo,
|
||||||
|
; jpeg_component_info * compptr,
|
||||||
|
; JSAMPARRAY input_data,
|
||||||
|
; JSAMPARRAY * output_data_ptr);
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define input_data(b) (b)+16 ; JSAMPARRAY input_data
|
||||||
|
%define output_data_ptr(b) (b)+20 ; JSAMPARRAY * output_data_ptr
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_h2v2_upsample_sse2)
|
||||||
|
|
||||||
|
EXTN(jpeg_h2v2_upsample_sse2):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
mov edx, POINTER [cinfo(ebp)]
|
||||||
|
mov edx, JDIMENSION [jdstruct_output_width(edx)]
|
||||||
|
add edx, byte (2*SIZEOF_XMMWORD)-1
|
||||||
|
and edx, byte -(2*SIZEOF_XMMWORD)
|
||||||
|
jz near .return
|
||||||
|
|
||||||
|
mov ecx, POINTER [cinfo(ebp)]
|
||||||
|
mov ecx, INT [jdstruct_max_v_samp_factor(ecx)] ; rowctr
|
||||||
|
test ecx,ecx
|
||||||
|
jz near .return
|
||||||
|
|
||||||
|
mov esi, JSAMPARRAY [input_data(ebp)] ; input_data
|
||||||
|
mov edi, POINTER [output_data_ptr(ebp)]
|
||||||
|
mov edi, JSAMPARRAY [edi] ; output_data
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
push edi
|
||||||
|
push esi
|
||||||
|
|
||||||
|
mov esi, JSAMPROW [esi] ; inptr
|
||||||
|
mov ebx, JSAMPROW [edi+0*SIZEOF_JSAMPROW] ; outptr0
|
||||||
|
mov edi, JSAMPROW [edi+1*SIZEOF_JSAMPROW] ; outptr1
|
||||||
|
mov eax,edx ; colctr
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
|
||||||
|
movdqa xmm0, XMMWORD [esi+0*SIZEOF_XMMWORD]
|
||||||
|
|
||||||
|
movdqa xmm1,xmm0
|
||||||
|
punpcklbw xmm0,xmm0
|
||||||
|
punpckhbw xmm1,xmm1
|
||||||
|
|
||||||
|
movdqa XMMWORD [ebx+0*SIZEOF_XMMWORD], xmm0
|
||||||
|
movdqa XMMWORD [ebx+1*SIZEOF_XMMWORD], xmm1
|
||||||
|
movdqa XMMWORD [edi+0*SIZEOF_XMMWORD], xmm0
|
||||||
|
movdqa XMMWORD [edi+1*SIZEOF_XMMWORD], xmm1
|
||||||
|
|
||||||
|
sub eax, byte 2*SIZEOF_XMMWORD
|
||||||
|
jz short .nextrow
|
||||||
|
|
||||||
|
movdqa xmm2, XMMWORD [esi+1*SIZEOF_XMMWORD]
|
||||||
|
|
||||||
|
movdqa xmm3,xmm2
|
||||||
|
punpcklbw xmm2,xmm2
|
||||||
|
punpckhbw xmm3,xmm3
|
||||||
|
|
||||||
|
movdqa XMMWORD [ebx+2*SIZEOF_XMMWORD], xmm2
|
||||||
|
movdqa XMMWORD [ebx+3*SIZEOF_XMMWORD], xmm3
|
||||||
|
movdqa XMMWORD [edi+2*SIZEOF_XMMWORD], xmm2
|
||||||
|
movdqa XMMWORD [edi+3*SIZEOF_XMMWORD], xmm3
|
||||||
|
|
||||||
|
sub eax, byte 2*SIZEOF_XMMWORD
|
||||||
|
jz short .nextrow
|
||||||
|
|
||||||
|
add esi, byte 2*SIZEOF_XMMWORD ; inptr
|
||||||
|
add ebx, byte 4*SIZEOF_XMMWORD ; outptr0
|
||||||
|
add edi, byte 4*SIZEOF_XMMWORD ; outptr1
|
||||||
|
jmp short .columnloop
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.nextrow:
|
||||||
|
pop esi
|
||||||
|
pop edi
|
||||||
|
|
||||||
|
add esi, byte 1*SIZEOF_JSAMPROW ; input_data
|
||||||
|
add edi, byte 2*SIZEOF_JSAMPROW ; output_data
|
||||||
|
sub ecx, byte 2 ; rowctr
|
||||||
|
jg short .rowloop
|
||||||
|
|
||||||
|
.return:
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; JDSAMPLE_SIMPLE_SSE2_SUPPORTED
|
||||||
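The two "fast" routines above are plain box filters: `punpcklbw`/`punpckhbw` of a register with itself duplicates every byte, and the h2v2 variant stores the same widened data to two output rows. A minimal scalar sketch of that behaviour follows; the helper names `h2v1_box_row` and `h2v2_box_rows` are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical scalar model of h2v1: every input sample is emitted twice. */
static void h2v1_box_row(const unsigned char *in, unsigned char *out,
                         size_t in_width)
{
  for (size_t i = 0; i < in_width; i++) {
    out[2 * i]     = in[i];
    out[2 * i + 1] = in[i];
  }
}

/* Hypothetical scalar model of h2v2: widen once, then repeat the row. */
static void h2v2_box_rows(const unsigned char *in,
                          unsigned char *out0, unsigned char *out1,
                          size_t in_width)
{
  h2v1_box_row(in, out0, in_width);
  memcpy(out1, out0, 2 * in_width);   /* second output row is identical */
}
```

Note that the assembly rounds the output width up to a multiple of `2*SIZEOF_XMMWORD` and uses aligned stores, so it relies on padded, aligned row buffers; the scalar sketch processes exactly `in_width` samples instead.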
27
jdtrans.c
@@ -1,7 +1,7 @@
|
|||||||
/*
|
/*
|
||||||
* jdtrans.c
|
* jdtrans.c
|
||||||
*
|
*
|
||||||
* Copyright (C) 1995-1996, Thomas G. Lane.
|
* Copyright (C) 1995-1997, Thomas G. Lane.
|
||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
@@ -30,6 +30,13 @@ LOCAL(void) transdecode_master_selection JPP((j_decompress_ptr cinfo));
|
|||||||
* To release the memory occupied by the virtual arrays, call
|
* To release the memory occupied by the virtual arrays, call
|
||||||
* jpeg_finish_decompress() when done with the data.
|
* jpeg_finish_decompress() when done with the data.
|
||||||
*
|
*
|
||||||
|
* An alternative usage is to simply obtain access to the coefficient arrays
|
||||||
|
* during a buffered-image-mode decompression operation. This is allowed
|
||||||
|
* after any jpeg_finish_output() call. The arrays can be accessed until
|
||||||
|
* jpeg_finish_decompress() is called. (Note that any call to the library
|
||||||
|
* may reposition the arrays, so don't rely on access_virt_barray() results
|
||||||
|
* to stay valid across library calls.)
|
||||||
|
*
|
||||||
* Returns NULL if suspended. This case need be checked only if
|
* Returns NULL if suspended. This case need be checked only if
|
||||||
* a suspending data source is used.
|
* a suspending data source is used.
|
||||||
*/
|
*/
|
||||||
@@ -41,8 +48,8 @@ jpeg_read_coefficients (j_decompress_ptr cinfo)
|
|||||||
/* First call: initialize active modules */
|
/* First call: initialize active modules */
|
||||||
transdecode_master_selection(cinfo);
|
transdecode_master_selection(cinfo);
|
||||||
cinfo->global_state = DSTATE_RDCOEFS;
|
cinfo->global_state = DSTATE_RDCOEFS;
|
||||||
} else if (cinfo->global_state != DSTATE_RDCOEFS)
|
}
|
||||||
ERREXIT1(cinfo, JERR_BAD_STATE, cinfo->global_state);
|
if (cinfo->global_state == DSTATE_RDCOEFS) {
|
||||||
/* Absorb whole file into the coef buffer */
|
/* Absorb whole file into the coef buffer */
|
||||||
for (;;) {
|
for (;;) {
|
||||||
int retcode;
|
int retcode;
|
||||||
@@ -66,7 +73,18 @@ jpeg_read_coefficients (j_decompress_ptr cinfo)
|
|||||||
}
|
}
|
||||||
/* Set state so that jpeg_finish_decompress does the right thing */
|
/* Set state so that jpeg_finish_decompress does the right thing */
|
||||||
cinfo->global_state = DSTATE_STOPPING;
|
cinfo->global_state = DSTATE_STOPPING;
|
||||||
|
}
|
||||||
|
/* At this point we should be in state DSTATE_STOPPING if being used
|
||||||
|
* standalone, or in state DSTATE_BUFIMAGE if being invoked to get access
|
||||||
|
* to the coefficients during a full buffered-image-mode decompression.
|
||||||
|
*/
|
||||||
|
if ((cinfo->global_state == DSTATE_STOPPING ||
|
||||||
|
cinfo->global_state == DSTATE_BUFIMAGE) && cinfo->buffered_image) {
|
||||||
return cinfo->coef->coef_arrays;
|
return cinfo->coef->coef_arrays;
|
||||||
|
}
|
||||||
|
/* Oops, improper usage */
|
||||||
|
ERREXIT1(cinfo, JERR_BAD_STATE, cinfo->global_state);
|
||||||
|
return NULL; /* keep compiler happy */
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
@@ -78,6 +96,9 @@ jpeg_read_coefficients (j_decompress_ptr cinfo)
|
|||||||
LOCAL(void)
|
LOCAL(void)
|
||||||
transdecode_master_selection (j_decompress_ptr cinfo)
|
transdecode_master_selection (j_decompress_ptr cinfo)
|
||||||
{
|
{
|
||||||
|
/* This is effectively a buffered-image operation. */
|
||||||
|
cinfo->buffered_image = TRUE;
|
||||||
|
|
||||||
/* Entropy decoding: either Huffman or arithmetic coding. */
|
/* Entropy decoding: either Huffman or arithmetic coding. */
|
||||||
if (cinfo->arith_code) {
|
if (cinfo->arith_code) {
|
||||||
ERREXIT(cinfo, JERR_ARITH_NOTIMPL);
|
ERREXIT(cinfo, JERR_ARITH_NOTIMPL);
|
||||||
|
|||||||
26
jerror.c
@@ -1,7 +1,7 @@
|
|||||||
/*
|
/*
|
||||||
* jerror.c
|
* jerror.c
|
||||||
*
|
*
|
||||||
* Copyright (C) 1991-1996, Thomas G. Lane.
|
* Copyright (C) 1991-1998, Thomas G. Lane.
|
||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
@@ -10,6 +10,11 @@
|
|||||||
* stderr is the right thing to do. Many applications will want to replace
|
* stderr is the right thing to do. Many applications will want to replace
|
||||||
* some or all of these routines.
|
* some or all of these routines.
|
||||||
*
|
*
|
||||||
|
* If you define USE_WINDOWS_MESSAGEBOX in jconfig.h or in the makefile,
|
||||||
|
* you get a Windows-specific hack to display error messages in a dialog box.
|
||||||
|
* It ain't much, but it beats dropping error messages into the bit bucket,
|
||||||
|
* which is what happens to output to stderr under most Windows C compilers.
|
||||||
|
*
|
||||||
* These routines are used by both the compression and decompression code.
|
* These routines are used by both the compression and decompression code.
|
||||||
*/
|
*/
|
||||||
|
|
||||||
@@ -19,6 +24,10 @@
|
|||||||
#include "jversion.h"
|
#include "jversion.h"
|
||||||
#include "jerror.h"
|
#include "jerror.h"
|
||||||
|
|
||||||
|
#ifdef USE_WINDOWS_MESSAGEBOX
|
||||||
|
#include <windows.h>
|
||||||
|
#endif
|
||||||
|
|
||||||
#ifndef EXIT_FAILURE /* define exit() codes if not provided */
|
#ifndef EXIT_FAILURE /* define exit() codes if not provided */
|
||||||
#define EXIT_FAILURE 1
|
#define EXIT_FAILURE 1
|
||||||
#endif
|
#endif
|
||||||
@@ -74,6 +83,15 @@ error_exit (j_common_ptr cinfo)
|
|||||||
* Actual output of an error or trace message.
|
* Actual output of an error or trace message.
|
||||||
* Applications may override this method to send JPEG messages somewhere
|
* Applications may override this method to send JPEG messages somewhere
|
||||||
* other than stderr.
|
* other than stderr.
|
||||||
|
*
|
||||||
|
* On Windows, printing to stderr is generally completely useless,
|
||||||
|
* so we provide optional code to produce an error-dialog popup.
|
||||||
|
* Most Windows applications will still prefer to override this routine,
|
||||||
|
* but if they don't, it'll do something at least marginally useful.
|
||||||
|
*
|
||||||
|
* NOTE: to use the library in an environment that doesn't support the
|
||||||
|
* C stdio library, you may have to delete the call to fprintf() entirely,
|
||||||
|
* not just not use this routine.
|
||||||
*/
|
*/
|
||||||
|
|
||||||
METHODDEF(void)
|
METHODDEF(void)
|
||||||
@@ -84,8 +102,14 @@ output_message (j_common_ptr cinfo)
|
|||||||
/* Create the message */
|
/* Create the message */
|
||||||
(*cinfo->err->format_message) (cinfo, buffer);
|
(*cinfo->err->format_message) (cinfo, buffer);
|
||||||
|
|
||||||
|
#ifdef USE_WINDOWS_MESSAGEBOX
|
||||||
|
/* Display it in a message dialog box */
|
||||||
|
MessageBox(GetActiveWindow(), buffer, "JPEG Library Error",
|
||||||
|
MB_OK | MB_ICONERROR);
|
||||||
|
#else
|
||||||
/* Send it to stderr, adding a newline */
|
/* Send it to stderr, adding a newline */
|
||||||
fprintf(stderr, "%s\n", buffer);
|
fprintf(stderr, "%s\n", buffer);
|
||||||
|
#endif
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
24
jerror.h
@@ -1,7 +1,7 @@
|
|||||||
/*
|
/*
|
||||||
* jerror.h
|
* jerror.h
|
||||||
*
|
*
|
||||||
* Copyright (C) 1994-1995, Thomas G. Lane.
|
* Copyright (C) 1994-1997, Thomas G. Lane.
|
||||||
* This file is part of the Independent JPEG Group's software.
|
* This file is part of the Independent JPEG Group's software.
|
||||||
* For conditions of distribution and use, see the accompanying README file.
|
* For conditions of distribution and use, see the accompanying README file.
|
||||||
*
|
*
|
||||||
@@ -45,7 +45,9 @@ JMESSAGE(JERR_BAD_ALIGN_TYPE, "ALIGN_TYPE is wrong, please fix")
|
|||||||
JMESSAGE(JERR_BAD_ALLOC_CHUNK, "MAX_ALLOC_CHUNK is wrong, please fix")
|
JMESSAGE(JERR_BAD_ALLOC_CHUNK, "MAX_ALLOC_CHUNK is wrong, please fix")
|
||||||
JMESSAGE(JERR_BAD_BUFFER_MODE, "Bogus buffer control mode")
|
JMESSAGE(JERR_BAD_BUFFER_MODE, "Bogus buffer control mode")
|
||||||
JMESSAGE(JERR_BAD_COMPONENT_ID, "Invalid component ID %d in SOS")
|
JMESSAGE(JERR_BAD_COMPONENT_ID, "Invalid component ID %d in SOS")
|
||||||
|
JMESSAGE(JERR_BAD_DCT_COEF, "DCT coefficient out of range")
|
||||||
JMESSAGE(JERR_BAD_DCTSIZE, "IDCT output block size %d not supported")
|
JMESSAGE(JERR_BAD_DCTSIZE, "IDCT output block size %d not supported")
|
||||||
|
JMESSAGE(JERR_BAD_HUFF_TABLE, "Bogus Huffman table definition")
|
||||||
JMESSAGE(JERR_BAD_IN_COLORSPACE, "Bogus input colorspace")
|
JMESSAGE(JERR_BAD_IN_COLORSPACE, "Bogus input colorspace")
|
||||||
JMESSAGE(JERR_BAD_J_COLORSPACE, "Bogus JPEG colorspace")
|
JMESSAGE(JERR_BAD_J_COLORSPACE, "Bogus JPEG colorspace")
|
||||||
JMESSAGE(JERR_BAD_LENGTH, "Bogus marker length")
|
JMESSAGE(JERR_BAD_LENGTH, "Bogus marker length")
|
||||||
@@ -71,7 +73,6 @@ JMESSAGE(JERR_COMPONENT_COUNT, "Too many color components: %d, max %d")
|
|||||||
JMESSAGE(JERR_CONVERSION_NOTIMPL, "Unsupported color conversion request")
|
JMESSAGE(JERR_CONVERSION_NOTIMPL, "Unsupported color conversion request")
|
||||||
JMESSAGE(JERR_DAC_INDEX, "Bogus DAC index %d")
|
JMESSAGE(JERR_DAC_INDEX, "Bogus DAC index %d")
|
||||||
JMESSAGE(JERR_DAC_VALUE, "Bogus DAC value 0x%x")
|
JMESSAGE(JERR_DAC_VALUE, "Bogus DAC value 0x%x")
|
||||||
JMESSAGE(JERR_DHT_COUNTS, "Bogus DHT counts")
|
|
||||||
JMESSAGE(JERR_DHT_INDEX, "Bogus DHT index %d")
|
JMESSAGE(JERR_DHT_INDEX, "Bogus DHT index %d")
|
||||||
JMESSAGE(JERR_DQT_INDEX, "Bogus DQT index %d")
|
JMESSAGE(JERR_DQT_INDEX, "Bogus DQT index %d")
|
||||||
JMESSAGE(JERR_EMPTY_IMAGE, "Empty JPEG image (DNL not supported)")
|
JMESSAGE(JERR_EMPTY_IMAGE, "Empty JPEG image (DNL not supported)")
|
||||||
@@ -134,12 +135,13 @@ JMESSAGE(JTRC_EMS_CLOSE, "Freed EMS handle %u")
|
|||||||
JMESSAGE(JTRC_EMS_OPEN, "Obtained EMS handle %u")
|
JMESSAGE(JTRC_EMS_OPEN, "Obtained EMS handle %u")
|
||||||
JMESSAGE(JTRC_EOI, "End Of Image")
|
JMESSAGE(JTRC_EOI, "End Of Image")
|
||||||
JMESSAGE(JTRC_HUFFBITS, " %3d %3d %3d %3d %3d %3d %3d %3d")
|
JMESSAGE(JTRC_HUFFBITS, " %3d %3d %3d %3d %3d %3d %3d %3d")
|
||||||
JMESSAGE(JTRC_JFIF, "JFIF APP0 marker, density %dx%d %d")
|
JMESSAGE(JTRC_JFIF, "JFIF APP0 marker: version %d.%02d, density %dx%d %d")
|
||||||
JMESSAGE(JTRC_JFIF_BADTHUMBNAILSIZE,
|
JMESSAGE(JTRC_JFIF_BADTHUMBNAILSIZE,
|
||||||
"Warning: thumbnail image size does not match data length %u")
|
"Warning: thumbnail image size does not match data length %u")
|
||||||
JMESSAGE(JTRC_JFIF_MINOR, "Unknown JFIF minor revision number %d.%02d")
|
JMESSAGE(JTRC_JFIF_EXTENSION,
|
||||||
|
"JFIF extension marker: type 0x%02x, length %u")
|
||||||
JMESSAGE(JTRC_JFIF_THUMBNAIL, " with %d x %d thumbnail image")
|
JMESSAGE(JTRC_JFIF_THUMBNAIL, " with %d x %d thumbnail image")
|
||||||
JMESSAGE(JTRC_MISC_MARKER, "Skipping marker 0x%02x, length %u")
|
JMESSAGE(JTRC_MISC_MARKER, "Miscellaneous marker 0x%02x, length %u")
|
||||||
JMESSAGE(JTRC_PARMLESS_MARKER, "Unexpected marker 0x%02x")
|
JMESSAGE(JTRC_PARMLESS_MARKER, "Unexpected marker 0x%02x")
|
||||||
JMESSAGE(JTRC_QUANTVALS, " %4u %4u %4u %4u %4u %4u %4u %4u")
|
JMESSAGE(JTRC_QUANTVALS, " %4u %4u %4u %4u %4u %4u %4u %4u")
|
||||||
JMESSAGE(JTRC_QUANT_3_NCOLORS, "Quantizing to %d = %d*%d*%d colors")
|
JMESSAGE(JTRC_QUANT_3_NCOLORS, "Quantizing to %d = %d*%d*%d colors")
|
||||||
@@ -157,6 +159,12 @@ JMESSAGE(JTRC_SOS_COMPONENT, " Component %d: dc=%d ac=%d")
|
|||||||
JMESSAGE(JTRC_SOS_PARAMS, " Ss=%d, Se=%d, Ah=%d, Al=%d")
|
JMESSAGE(JTRC_SOS_PARAMS, " Ss=%d, Se=%d, Ah=%d, Al=%d")
|
||||||
JMESSAGE(JTRC_TFILE_CLOSE, "Closed temporary file %s")
|
JMESSAGE(JTRC_TFILE_CLOSE, "Closed temporary file %s")
|
||||||
JMESSAGE(JTRC_TFILE_OPEN, "Opened temporary file %s")
|
JMESSAGE(JTRC_TFILE_OPEN, "Opened temporary file %s")
|
||||||
|
JMESSAGE(JTRC_THUMB_JPEG,
|
||||||
|
"JFIF extension marker: JPEG-compressed thumbnail image, length %u")
|
||||||
|
JMESSAGE(JTRC_THUMB_PALETTE,
|
||||||
|
"JFIF extension marker: palette thumbnail image, length %u")
|
||||||
|
JMESSAGE(JTRC_THUMB_RGB,
|
||||||
|
"JFIF extension marker: RGB thumbnail image, length %u")
|
||||||
JMESSAGE(JTRC_UNKNOWN_IDS,
|
JMESSAGE(JTRC_UNKNOWN_IDS,
|
||||||
"Unrecognized component IDs %d %d %d, assuming YCbCr")
|
"Unrecognized component IDs %d %d %d, assuming YCbCr")
|
||||||
JMESSAGE(JTRC_XMS_CLOSE, "Freed XMS handle %u")
|
JMESSAGE(JTRC_XMS_CLOSE, "Freed XMS handle %u")
|
||||||
@@ -263,6 +271,12 @@ JMESSAGE(JWRN_TOO_MUCH_DATA, "Application transferred too many scanlines")
|
|||||||
_mp[0] = (p1); _mp[1] = (p2); _mp[2] = (p3); _mp[3] = (p4); \
|
_mp[0] = (p1); _mp[1] = (p2); _mp[2] = (p3); _mp[3] = (p4); \
|
||||||
(cinfo)->err->msg_code = (code); \
|
(cinfo)->err->msg_code = (code); \
|
||||||
(*(cinfo)->err->emit_message) ((j_common_ptr) (cinfo), (lvl)); )
|
(*(cinfo)->err->emit_message) ((j_common_ptr) (cinfo), (lvl)); )
|
||||||
|
#define TRACEMS5(cinfo,lvl,code,p1,p2,p3,p4,p5) \
|
||||||
|
MAKESTMT(int * _mp = (cinfo)->err->msg_parm.i; \
|
||||||
|
_mp[0] = (p1); _mp[1] = (p2); _mp[2] = (p3); _mp[3] = (p4); \
|
||||||
|
_mp[4] = (p5); \
|
||||||
|
(cinfo)->err->msg_code = (code); \
|
||||||
|
(*(cinfo)->err->emit_message) ((j_common_ptr) (cinfo), (lvl)); )
|
||||||
#define TRACEMS8(cinfo,lvl,code,p1,p2,p3,p4,p5,p6,p7,p8) \
|
#define TRACEMS8(cinfo,lvl,code,p1,p2,p3,p4,p5,p6,p7,p8) \
|
||||||
MAKESTMT(int * _mp = (cinfo)->err->msg_parm.i; \
|
MAKESTMT(int * _mp = (cinfo)->err->msg_parm.i; \
|
||||||
_mp[0] = (p1); _mp[1] = (p2); _mp[2] = (p3); _mp[3] = (p4); \
|
_mp[0] = (p1); _mp[1] = (p2); _mp[2] = (p3); _mp[3] = (p4); \
|
||||||
|
|||||||
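The new `TRACEMS5` macro follows the same pattern as the existing `TRACEMSn` macros: copy the integer parameters into the error manager's parameter array, record the message code, then call the emit hook, which formats the text later. The sketch below models that pattern in isolation. The type `toy_err`, the macro `TOY_TRACE5`, and the function `toy_emit` are hypothetical stand-ins, not library names, and the format string is passed directly rather than looked up by code as the library does.

```c
#include <stdio.h>

/* Hypothetical stand-in for the error manager: code, parameters, output. */
struct toy_err {
  int  msg_code;
  int  msg_parm[8];
  char last[128];      /* most recently formatted message */
};

/* Format the stored parameters; arguments beyond the format are ignored. */
static void toy_emit(struct toy_err *err, const char *fmt)
{
  const int *p = err->msg_parm;
  snprintf(err->last, sizeof err->last, fmt,
           p[0], p[1], p[2], p[3], p[4], p[5], p[6], p[7]);
}

/* Five-parameter trace: stash the values, set the code, then emit. */
#define TOY_TRACE5(err, code, fmt, p1, p2, p3, p4, p5) \
  do { int *_mp = (err)->msg_parm; \
       _mp[0] = (p1); _mp[1] = (p2); _mp[2] = (p3); _mp[3] = (p4); \
       _mp[4] = (p5); \
       (err)->msg_code = (code); \
       toy_emit((err), (fmt)); } while (0)
```

A five-parameter form is what the revised `JTRC_JFIF` message needs, since it now reports the version major and minor numbers in addition to the two density values and the units code.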
327
jf3dnflt.asm
Normal file
@@ -0,0 +1,327 @@
|
|||||||
|
;
|
||||||
|
; jf3dnflt.asm - floating-point FDCT (3DNow!)
|
||||||
|
;
|
||||||
|
; x86 SIMD extension for IJG JPEG library
|
||||||
|
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
; For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
;
|
||||||
|
; This file should be assembled with NASM (Netwide Assembler),
|
||||||
|
; can *not* be assembled with Microsoft's MASM or any compatible
|
||||||
|
; assembler (including Borland's Turbo Assembler).
|
||||||
|
; NASM is available from http://nasm.sourceforge.net/ or
|
||||||
|
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||||
|
;
|
||||||
|
; This file contains a floating-point implementation of the forward DCT
|
||||||
|
; (Discrete Cosine Transform). The following code is based directly on
|
||||||
|
; the IJG's original jfdctflt.c; see jfdctflt.c for more details.
|
||||||
|
;
|
||||||
|
; Last Modified : February 4, 2006
|
||||||
|
;
|
||||||
|
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jdct.inc"
|
||||||
|
|
||||||
|
%ifdef DCT_FLOAT_SUPPORTED
|
||||||
|
%ifdef JFDCT_FLT_3DNOW_MMX_SUPPORTED
|
||||||
|
|
||||||
|
; This module is specialized to the case DCTSIZE = 8.
|
||||||
|
;
|
||||||
|
%if DCTSIZE != 8
|
||||||
|
%error "Sorry, this code only copes with 8x8 DCTs."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_CONST
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
global EXTN(jconst_fdct_float_3dnow)
|
||||||
|
|
||||||
|
EXTN(jconst_fdct_float_3dnow):
|
||||||
|
|
||||||
|
PD_0_382 times 2 dd 0.382683432365089771728460
|
||||||
|
PD_0_707 times 2 dd 0.707106781186547524400844
|
||||||
|
PD_0_541 times 2 dd 0.541196100146196984399723
|
||||||
|
PD_1_306 times 2 dd 1.306562964876376527856643
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Perform the forward DCT on one block of samples.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_fdct_float_3dnow (FAST_FLOAT * data)
|
||||||
|
;
|
||||||
|
|
||||||
|
%define data(b) (b)+8 ; FAST_FLOAT * data
|
||||||
|
|
||||||
|
%define original_ebp ebp+0
|
||||||
|
%define wk(i) ebp-(WK_NUM-(i))*SIZEOF_MMWORD ; mmword wk[WK_NUM]
|
||||||
|
%define WK_NUM 2
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_fdct_float_3dnow)
|
||||||
|
|
||||||
|
EXTN(jpeg_fdct_float_3dnow):
|
||||||
|
push ebp
|
||||||
|
mov eax,esp ; eax = original ebp
|
||||||
|
sub esp, byte 4
|
||||||
|
and esp, byte (-SIZEOF_MMWORD) ; align to 64 bits
|
||||||
|
mov [esp],eax
|
||||||
|
mov ebp,esp ; ebp = aligned ebp
|
||||||
|
lea esp, [wk(0)]
|
||||||
|
pushpic ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
; push esi ; unused
|
||||||
|
; push edi ; unused
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
|
||||||
|
; ---- Pass 1: process rows.
|
||||||
|
|
||||||
|
mov edx, POINTER [data(eax)] ; (FAST_FLOAT *)
|
||||||
|
mov ecx, DCTSIZE/2
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
|
||||||
|
movq mm0, MMWORD [MMBLOCK(0,0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(1,0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
movq mm2, MMWORD [MMBLOCK(0,3,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(1,3,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
|
||||||
|
; mm0=(00 01), mm1=(10 11), mm2=(06 07), mm3=(16 17)
|
||||||
|
|
||||||
|
movq mm4,mm0 ; transpose coefficients
|
||||||
|
punpckldq mm0,mm1 ; mm0=(00 10)=data0
|
||||||
|
punpckhdq mm4,mm1 ; mm4=(01 11)=data1
|
||||||
|
movq mm5,mm2 ; transpose coefficients
|
||||||
|
punpckldq mm2,mm3 ; mm2=(06 16)=data6
|
||||||
|
punpckhdq mm5,mm3 ; mm5=(07 17)=data7
|
||||||
|
|
||||||
|
movq mm6,mm4
|
||||||
|
movq mm7,mm0
|
||||||
|
pfsub mm4,mm2 ; mm4=data1-data6=tmp6
|
||||||
|
pfsub mm0,mm5 ; mm0=data0-data7=tmp7
|
||||||
|
pfadd mm6,mm2 ; mm6=data1+data6=tmp1
|
||||||
|
pfadd mm7,mm5 ; mm7=data0+data7=tmp0
|
||||||
|
|
||||||
|
movq mm1, MMWORD [MMBLOCK(0,1,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(1,1,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
movq mm2, MMWORD [MMBLOCK(0,2,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
movq mm5, MMWORD [MMBLOCK(1,2,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
|
||||||
|
; mm1=(02 03), mm3=(12 13), mm2=(04 05), mm5=(14 15)
|
||||||
|
|
||||||
|
movq MMWORD [wk(0)], mm4 ; wk(0)=tmp6
|
||||||
|
movq MMWORD [wk(1)], mm0 ; wk(1)=tmp7
|
||||||
|
|
||||||
|
movq mm4,mm1 ; transpose coefficients
|
||||||
|
punpckldq mm1,mm3 ; mm1=(02 12)=data2
|
||||||
|
punpckhdq mm4,mm3 ; mm4=(03 13)=data3
|
||||||
|
movq mm0,mm2 ; transpose coefficients
|
||||||
|
punpckldq mm2,mm5 ; mm2=(04 14)=data4
|
||||||
|
punpckhdq mm0,mm5 ; mm0=(05 15)=data5
|
||||||
|
|
||||||
|
movq mm3,mm4
|
||||||
|
movq mm5,mm1
|
||||||
|
pfadd mm4,mm2 ; mm4=data3+data4=tmp3
|
||||||
|
pfadd mm1,mm0 ; mm1=data2+data5=tmp2
|
||||||
|
pfsub mm3,mm2 ; mm3=data3-data4=tmp4
|
||||||
|
pfsub mm5,mm0 ; mm5=data2-data5=tmp5
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movq mm2,mm7
|
||||||
|
movq mm0,mm6
|
||||||
|
pfsub mm7,mm4 ; mm7=tmp13
|
||||||
|
pfsub mm6,mm1 ; mm6=tmp12
|
||||||
|
pfadd mm2,mm4 ; mm2=tmp10
|
||||||
|
pfadd mm0,mm1 ; mm0=tmp11
|
||||||
|
|
||||||
|
pfadd mm6,mm7
|
||||||
|
pfmul mm6,[GOTOFF(ebx,PD_0_707)] ; mm6=z1
|
||||||
|
|
||||||
|
movq mm4,mm2
|
||||||
|
movq mm1,mm7
|
||||||
|
pfsub mm2,mm0 ; mm2=data4
|
||||||
|
pfsub mm7,mm6 ; mm7=data6
|
||||||
|
pfadd mm4,mm0 ; mm4=data0
|
||||||
|
pfadd mm1,mm6 ; mm1=data2
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(0,2,edx,SIZEOF_FAST_FLOAT)], mm2
|
||||||
|
movq MMWORD [MMBLOCK(0,3,edx,SIZEOF_FAST_FLOAT)], mm7
|
||||||
|
movq MMWORD [MMBLOCK(0,0,edx,SIZEOF_FAST_FLOAT)], mm4
|
||||||
|
movq MMWORD [MMBLOCK(0,1,edx,SIZEOF_FAST_FLOAT)], mm1
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movq mm0, MMWORD [wk(0)] ; mm0=tmp6
|
||||||
|
movq mm6, MMWORD [wk(1)] ; mm6=tmp7
|
||||||
|
|
||||||
|
pfadd mm3,mm5 ; mm3=tmp10
|
||||||
|
pfadd mm5,mm0 ; mm5=tmp11
|
||||||
|
pfadd mm0,mm6 ; mm0=tmp12, mm6=tmp7
|
||||||
|
|
||||||
|
pfmul mm5,[GOTOFF(ebx,PD_0_707)] ; mm5=z3
|
||||||
|
|
||||||
|
movq mm2,mm3 ; mm2=tmp10
|
||||||
|
pfsub mm3,mm0
|
||||||
|
pfmul mm3,[GOTOFF(ebx,PD_0_382)] ; mm3=z5
|
||||||
|
pfmul mm2,[GOTOFF(ebx,PD_0_541)] ; mm2=MULTIPLY(tmp10,FIX_0_54119610)
|
||||||
|
pfmul mm0,[GOTOFF(ebx,PD_1_306)] ; mm0=MULTIPLY(tmp12,FIX_1_30656296)
|
||||||
|
pfadd mm2,mm3 ; mm2=z2
|
||||||
|
pfadd mm0,mm3 ; mm0=z4
|
||||||
|
|
||||||
|
movq mm7,mm6
|
||||||
|
pfsub mm6,mm5 ; mm6=z13
|
||||||
|
pfadd mm7,mm5 ; mm7=z11
|
||||||
|
|
||||||
|
movq mm4,mm6
|
||||||
|
movq mm1,mm7
|
||||||
|
pfsub mm6,mm2 ; mm6=data3
|
||||||
|
pfsub mm7,mm0 ; mm7=data7
|
||||||
|
pfadd mm4,mm2 ; mm4=data5
|
||||||
|
pfadd mm1,mm0 ; mm1=data1
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(1,1,edx,SIZEOF_FAST_FLOAT)], mm6
|
||||||
|
movq MMWORD [MMBLOCK(1,3,edx,SIZEOF_FAST_FLOAT)], mm7
|
||||||
|
movq MMWORD [MMBLOCK(1,2,edx,SIZEOF_FAST_FLOAT)], mm4
|
||||||
|
movq MMWORD [MMBLOCK(1,0,edx,SIZEOF_FAST_FLOAT)], mm1
|
||||||
|
|
||||||
|
add edx, byte 2*DCTSIZE*SIZEOF_FAST_FLOAT
|
||||||
|
dec ecx
|
||||||
|
jnz near .rowloop
|
||||||
|
|
||||||
|
; ---- Pass 2: process columns.
|
||||||
|
|
||||||
|
mov edx, POINTER [data(eax)] ; (FAST_FLOAT *)
|
||||||
|
mov ecx, DCTSIZE/2
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
|
||||||
|
movq mm0, MMWORD [MMBLOCK(0,0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(1,0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
movq mm2, MMWORD [MMBLOCK(6,0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(7,0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
|
||||||
|
; mm0=(00 10), mm1=(01 11), mm2=(60 70), mm3=(61 71)
|
||||||
|
|
||||||
|
movq mm4,mm0 ; transpose coefficients
|
||||||
|
punpckldq mm0,mm1 ; mm0=(00 01)=data0
|
||||||
|
punpckhdq mm4,mm1 ; mm4=(10 11)=data1
|
||||||
|
movq mm5,mm2 ; transpose coefficients
|
||||||
|
punpckldq mm2,mm3 ; mm2=(60 61)=data6
|
||||||
|
punpckhdq mm5,mm3 ; mm5=(70 71)=data7
|
||||||
|
|
||||||
|
movq mm6,mm4
|
||||||
|
movq mm7,mm0
|
||||||
|
pfsub mm4,mm2 ; mm4=data1-data6=tmp6
|
||||||
|
pfsub mm0,mm5 ; mm0=data0-data7=tmp7
|
||||||
|
pfadd mm6,mm2 ; mm6=data1+data6=tmp1
|
||||||
|
pfadd mm7,mm5 ; mm7=data0+data7=tmp0
|
||||||
|
|
||||||
|
movq mm1, MMWORD [MMBLOCK(2,0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(3,0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
movq mm2, MMWORD [MMBLOCK(4,0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
movq mm5, MMWORD [MMBLOCK(5,0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
|
||||||
|
; mm1=(20 30), mm3=(21 31), mm2=(40 50), mm5=(41 51)
|
||||||
|
|
||||||
|
movq MMWORD [wk(0)], mm4 ; wk(0)=tmp6
|
||||||
|
movq MMWORD [wk(1)], mm0 ; wk(1)=tmp7
|
||||||
|
|
||||||
|
movq mm4,mm1 ; transpose coefficients
|
||||||
|
punpckldq mm1,mm3 ; mm1=(20 21)=data2
|
||||||
|
punpckhdq mm4,mm3 ; mm4=(30 31)=data3
|
||||||
|
movq mm0,mm2 ; transpose coefficients
|
||||||
|
punpckldq mm2,mm5 ; mm2=(40 41)=data4
|
||||||
|
punpckhdq mm0,mm5 ; mm0=(50 51)=data5
|
||||||
|
|
||||||
|
movq mm3,mm4
|
||||||
|
movq mm5,mm1
|
||||||
|
pfadd mm4,mm2 ; mm4=data3+data4=tmp3
|
||||||
|
pfadd mm1,mm0 ; mm1=data2+data5=tmp2
|
||||||
|
pfsub mm3,mm2 ; mm3=data3-data4=tmp4
|
||||||
|
pfsub mm5,mm0 ; mm5=data2-data5=tmp5
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movq mm2,mm7
|
||||||
|
movq mm0,mm6
|
||||||
|
pfsub mm7,mm4 ; mm7=tmp13
|
||||||
|
pfsub mm6,mm1 ; mm6=tmp12
|
||||||
|
pfadd mm2,mm4 ; mm2=tmp10
|
||||||
|
pfadd mm0,mm1 ; mm0=tmp11
|
||||||
|
|
||||||
|
pfadd mm6,mm7
|
||||||
|
pfmul mm6,[GOTOFF(ebx,PD_0_707)] ; mm6=z1
|
||||||
|
|
||||||
|
movq mm4,mm2
|
||||||
|
movq mm1,mm7
|
||||||
|
pfsub mm2,mm0 ; mm2=data4
|
||||||
|
pfsub mm7,mm6 ; mm7=data6
|
||||||
|
pfadd mm4,mm0 ; mm4=data0
|
||||||
|
pfadd mm1,mm6 ; mm1=data2
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(4,0,edx,SIZEOF_FAST_FLOAT)], mm2
|
||||||
|
movq MMWORD [MMBLOCK(6,0,edx,SIZEOF_FAST_FLOAT)], mm7
|
||||||
|
movq MMWORD [MMBLOCK(0,0,edx,SIZEOF_FAST_FLOAT)], mm4
|
||||||
|
movq MMWORD [MMBLOCK(2,0,edx,SIZEOF_FAST_FLOAT)], mm1
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movq mm0, MMWORD [wk(0)] ; mm0=tmp6
|
||||||
|
movq mm6, MMWORD [wk(1)] ; mm6=tmp7
|
||||||
|
|
||||||
|
pfadd mm3,mm5 ; mm3=tmp10
|
||||||
|
pfadd mm5,mm0 ; mm5=tmp11
|
||||||
|
pfadd mm0,mm6 ; mm0=tmp12, mm6=tmp7
|
||||||
|
|
||||||
|
pfmul mm5,[GOTOFF(ebx,PD_0_707)] ; mm5=z3
|
||||||
|
|
||||||
|
movq mm2,mm3 ; mm2=tmp10
|
||||||
|
pfsub mm3,mm0
|
||||||
|
pfmul mm3,[GOTOFF(ebx,PD_0_382)] ; mm3=z5
|
||||||
|
pfmul mm2,[GOTOFF(ebx,PD_0_541)] ; mm2=MULTIPLY(tmp10,FIX_0_54119610)
|
||||||
|
pfmul mm0,[GOTOFF(ebx,PD_1_306)] ; mm0=MULTIPLY(tmp12,FIX_1_30656296)
|
||||||
|
pfadd mm2,mm3 ; mm2=z2
|
||||||
|
pfadd mm0,mm3 ; mm0=z4
|
||||||
|
|
||||||
|
movq mm7,mm6
|
||||||
|
pfsub mm6,mm5 ; mm6=z13
|
||||||
|
pfadd mm7,mm5 ; mm7=z11
|
||||||
|
|
||||||
|
movq mm4,mm6
|
||||||
|
movq mm1,mm7
|
||||||
|
pfsub mm6,mm2 ; mm6=data3
|
||||||
|
pfsub mm7,mm0 ; mm7=data7
|
||||||
|
pfadd mm4,mm2 ; mm4=data5
|
||||||
|
pfadd mm1,mm0 ; mm1=data1
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(3,0,edx,SIZEOF_FAST_FLOAT)], mm6
|
||||||
|
movq MMWORD [MMBLOCK(7,0,edx,SIZEOF_FAST_FLOAT)], mm7
|
||||||
|
movq MMWORD [MMBLOCK(5,0,edx,SIZEOF_FAST_FLOAT)], mm4
|
||||||
|
movq MMWORD [MMBLOCK(1,0,edx,SIZEOF_FAST_FLOAT)], mm1
|
||||||
|
|
||||||
|
add edx, byte 2*SIZEOF_FAST_FLOAT
|
||||||
|
dec ecx
|
||||||
|
jnz near .columnloop
|
||||||
|
|
||||||
|
femms ; empty MMX/3DNow! state
|
||||||
|
|
||||||
|
; pop edi ; unused
|
||||||
|
; pop esi ; unused
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
poppic ebx
|
||||||
|
mov esp,ebp ; esp <- aligned ebp
|
||||||
|
pop esp ; esp <- original ebp
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; JFDCT_FLT_3DNOW_MMX_SUPPORTED
|
||||||
|
%endif ; DCT_FLOAT_SUPPORTED
|
||||||
288
jfdctflt.asm
Normal file
@@ -0,0 +1,288 @@
|
|||||||
|
;
|
||||||
|
; jfdctflt.asm - floating-point FDCT (non-SIMD)
|
||||||
|
;
|
||||||
|
; x86 SIMD extension for IJG JPEG library
|
||||||
|
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
; For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
;
|
||||||
|
; This file should be assembled with NASM (Netwide Assembler),
|
||||||
|
; and can *not* be assembled with Microsoft's MASM or any compatible
|
||||||
|
; assembler (including Borland's Turbo Assembler).
|
||||||
|
; NASM is available from http://nasm.sourceforge.net/ or
|
||||||
|
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||||
|
;
|
||||||
|
; This file contains a floating-point implementation of the forward DCT
|
||||||
|
; (Discrete Cosine Transform). The following code is based directly on
|
||||||
|
; the IJG's original jfdctflt.c; see jfdctflt.c for more details.
|
||||||
|
;
|
||||||
|
; Last Modified : October 17, 2004
|
||||||
|
;
|
||||||
|
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jdct.inc"
|
||||||
|
|
||||||
|
%ifdef DCT_FLOAT_SUPPORTED
|
||||||
|
|
||||||
|
; This module is specialized to the case DCTSIZE = 8.
|
||||||
|
;
|
||||||
|
%if DCTSIZE != 8
|
||||||
|
%error "Sorry, this code only copes with 8x8 DCTs."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_CONST
|
||||||
|
|
||||||
|
%define ROTATOR_TYPE FP32 ; float
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
global EXTN(jconst_fdct_float)
|
||||||
|
|
||||||
|
EXTN(jconst_fdct_float):
|
||||||
|
|
||||||
|
F_0_382 dd 0.382683432365089771728460  ; cos(PI*3/8)
F_0_707 dd 0.707106781186547524400844  ; cos(PI*1/4)
F_0_541 dd 0.541196100146196984399723  ; cos(PI*1/8)-cos(PI*3/8)
F_1_306 dd 1.306562964876376527856643  ; cos(PI*1/8)+cos(PI*3/8)
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Perform the forward DCT on one block of samples.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_fdct_float (FAST_FLOAT * data)
|
||||||
|
;
|
||||||
|
|
||||||
|
%define data(b) (b)+8 ; FAST_FLOAT * data
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_fdct_float)
|
||||||
|
|
||||||
|
EXTN(jpeg_fdct_float):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
pushpic ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
; push esi ; unused
|
||||||
|
; push edi ; unused
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
|
||||||
|
; ---- Pass 1: process rows.
|
||||||
|
|
||||||
|
mov edx, POINTER [data(ebp)] ; (FAST_FLOAT *)
|
||||||
|
mov ecx, DCTSIZE
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
fld FAST_FLOAT [ROW(1,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fadd FAST_FLOAT [ROW(6,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fld FAST_FLOAT [ROW(0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fadd FAST_FLOAT [ROW(7,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fld FAST_FLOAT [ROW(3,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fadd FAST_FLOAT [ROW(4,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fld FAST_FLOAT [ROW(2,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fadd FAST_FLOAT [ROW(5,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
fld st2 ; st2 = st2 + st1, st1 = st2 - st1
|
||||||
|
fsub st0,st2
|
||||||
|
fxch st0,st2
|
||||||
|
faddp st3,st0
|
||||||
|
fld st3 ; st3 = st3 + st0, st0 = st3 - st0
|
||||||
|
fsub st0,st1
|
||||||
|
fxch st0,st1
|
||||||
|
faddp st4,st0
|
||||||
|
|
||||||
|
fadd st0,st1
|
||||||
|
fmul ROTATOR_TYPE [GOTOFF(ebx,F_0_707)]
|
||||||
|
|
||||||
|
fld st2 ; st3 = st2 + st3, st2 = st2 - st3
|
||||||
|
fsub st0,st4
|
||||||
|
fxch st0,st3
|
||||||
|
faddp st4,st0
|
||||||
|
fld st1 ; st0 = st1 + st0, st1 = st1 - st0
|
||||||
|
fsub st0,st1
|
||||||
|
fxch st0,st2
|
||||||
|
faddp st1,st0
|
||||||
|
|
||||||
|
fld FAST_FLOAT [ROW(0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fsub FAST_FLOAT [ROW(7,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fxch st0,st4
|
||||||
|
fld FAST_FLOAT [ROW(3,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fsub FAST_FLOAT [ROW(4,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fxch st0,st4
|
||||||
|
fld FAST_FLOAT [ROW(1,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fsub FAST_FLOAT [ROW(6,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fxch st0,st4
|
||||||
|
fld FAST_FLOAT [ROW(2,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fsub FAST_FLOAT [ROW(5,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fxch st0,st4
|
||||||
|
|
||||||
|
fstp FAST_FLOAT [ROW(2,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fstp FAST_FLOAT [ROW(6,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fstp FAST_FLOAT [ROW(4,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fstp FAST_FLOAT [ROW(0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
fadd st2,st0
|
||||||
|
fadd st0,st1
|
||||||
|
fxch st0,st3
|
||||||
|
fadd st1,st0
|
||||||
|
fxch st0,st3
|
||||||
|
|
||||||
|
fld st2
|
||||||
|
fxch st0,st1
|
||||||
|
fmul ROTATOR_TYPE [GOTOFF(ebx,F_0_707)]
|
||||||
|
fxch st0,st1
|
||||||
|
fsub st0,st2
|
||||||
|
fxch st0,st3
|
||||||
|
fmul ROTATOR_TYPE [GOTOFF(ebx,F_0_541)]
|
||||||
|
fxch st0,st3
|
||||||
|
fmul ROTATOR_TYPE [GOTOFF(ebx,F_0_382)]
|
||||||
|
fxch st0,st2
|
||||||
|
fmul ROTATOR_TYPE [GOTOFF(ebx,F_1_306)]
|
||||||
|
fxch st0,st2
|
||||||
|
fadd st3,st0
|
||||||
|
faddp st2,st0
|
||||||
|
|
||||||
|
fld st3 ; st3 = st3 + st0, st0 = st3 - st0
|
||||||
|
fsub st0,st1
|
||||||
|
fxch st0,st1
|
||||||
|
faddp st4,st0
|
||||||
|
|
||||||
|
fld st2 ; st0 = st0 + st2, st2 = st0 - st2
|
||||||
|
fsubr st0,st1
|
||||||
|
fxch st0,st3
|
||||||
|
faddp st1,st0
|
||||||
|
fld st1 ; st3 = st3 + st1, st1 = st3 - st1
|
||||||
|
fsubr st0,st4
|
||||||
|
fxch st0,st2
|
||||||
|
faddp st4,st0
|
||||||
|
|
||||||
|
fstp FAST_FLOAT [ROW(5,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fstp FAST_FLOAT [ROW(7,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fstp FAST_FLOAT [ROW(3,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fstp FAST_FLOAT [ROW(1,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
|
||||||
|
add edx, byte DCTSIZE*SIZEOF_FAST_FLOAT ; advance pointer to next row
dec ecx
|
||||||
|
jnz near .rowloop
|
||||||
|
|
||||||
|
; ---- Pass 2: process columns.
|
||||||
|
|
||||||
|
mov edx, POINTER [data(ebp)] ; (FAST_FLOAT *)
|
||||||
|
mov ecx, DCTSIZE
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
fld FAST_FLOAT [COL(1,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fadd FAST_FLOAT [COL(6,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fld FAST_FLOAT [COL(0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fadd FAST_FLOAT [COL(7,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fld FAST_FLOAT [COL(3,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fadd FAST_FLOAT [COL(4,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fld FAST_FLOAT [COL(2,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fadd FAST_FLOAT [COL(5,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
fld st2 ; st2 = st2 + st1, st1 = st2 - st1
|
||||||
|
fsub st0,st2
|
||||||
|
fxch st0,st2
|
||||||
|
faddp st3,st0
|
||||||
|
fld st3 ; st3 = st3 + st0, st0 = st3 - st0
|
||||||
|
fsub st0,st1
|
||||||
|
fxch st0,st1
|
||||||
|
faddp st4,st0
|
||||||
|
|
||||||
|
fadd st0,st1
|
||||||
|
fmul ROTATOR_TYPE [GOTOFF(ebx,F_0_707)]
|
||||||
|
|
||||||
|
fld st2 ; st3 = st2 + st3, st2 = st2 - st3
|
||||||
|
fsub st0,st4
|
||||||
|
fxch st0,st3
|
||||||
|
faddp st4,st0
|
||||||
|
fld st1 ; st0 = st1 + st0, st1 = st1 - st0
|
||||||
|
fsub st0,st1
|
||||||
|
fxch st0,st2
|
||||||
|
faddp st1,st0
|
||||||
|
|
||||||
|
fld FAST_FLOAT [COL(0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fsub FAST_FLOAT [COL(7,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fxch st0,st4
|
||||||
|
fld FAST_FLOAT [COL(3,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fsub FAST_FLOAT [COL(4,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fxch st0,st4
|
||||||
|
fld FAST_FLOAT [COL(1,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fsub FAST_FLOAT [COL(6,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fxch st0,st4
|
||||||
|
fld FAST_FLOAT [COL(2,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fsub FAST_FLOAT [COL(5,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fxch st0,st4
|
||||||
|
|
||||||
|
fstp FAST_FLOAT [COL(2,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fstp FAST_FLOAT [COL(6,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fstp FAST_FLOAT [COL(4,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fstp FAST_FLOAT [COL(0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
fadd st2,st0
|
||||||
|
fadd st0,st1
|
||||||
|
fxch st0,st3
|
||||||
|
fadd st1,st0
|
||||||
|
fxch st0,st3
|
||||||
|
|
||||||
|
fld st2
|
||||||
|
fxch st0,st1
|
||||||
|
fmul ROTATOR_TYPE [GOTOFF(ebx,F_0_707)]
|
||||||
|
fxch st0,st1
|
||||||
|
fsub st0,st2
|
||||||
|
fxch st0,st3
|
||||||
|
fmul ROTATOR_TYPE [GOTOFF(ebx,F_0_541)]
|
||||||
|
fxch st0,st3
|
||||||
|
fmul ROTATOR_TYPE [GOTOFF(ebx,F_0_382)]
|
||||||
|
fxch st0,st2
|
||||||
|
fmul ROTATOR_TYPE [GOTOFF(ebx,F_1_306)]
|
||||||
|
fxch st0,st2
|
||||||
|
fadd st3,st0
|
||||||
|
faddp st2,st0
|
||||||
|
|
||||||
|
fld st3 ; st3 = st3 + st0, st0 = st3 - st0
|
||||||
|
fsub st0,st1
|
||||||
|
fxch st0,st1
|
||||||
|
faddp st4,st0
|
||||||
|
|
||||||
|
fld st2 ; st0 = st0 + st2, st2 = st0 - st2
|
||||||
|
fsubr st0,st1
|
||||||
|
fxch st0,st3
|
||||||
|
faddp st1,st0
|
||||||
|
fld st1 ; st3 = st3 + st1, st1 = st3 - st1
|
||||||
|
fsubr st0,st4
|
||||||
|
fxch st0,st2
|
||||||
|
faddp st4,st0
|
||||||
|
|
||||||
|
fstp FAST_FLOAT [COL(5,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fstp FAST_FLOAT [COL(7,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fstp FAST_FLOAT [COL(3,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
fstp FAST_FLOAT [COL(1,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
|
||||||
|
add edx, byte SIZEOF_FAST_FLOAT ; advance pointer to next column
|
||||||
|
dec ecx
|
||||||
|
jnz near .columnloop
|
||||||
|
|
||||||
|
; pop edi ; unused
|
||||||
|
; pop esi ; unused
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
poppic ebx
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; DCT_FLOAT_SUPPORTED
|
||||||
303
jfdctfst.asm
Normal file
@@ -0,0 +1,303 @@
|
|||||||
|
;
|
||||||
|
; jfdctfst.asm - fast integer FDCT (non-SIMD)
|
||||||
|
;
|
||||||
|
; x86 SIMD extension for IJG JPEG library
|
||||||
|
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
; For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
;
|
||||||
|
; This file should be assembled with NASM (Netwide Assembler),
|
||||||
|
; and can *not* be assembled with Microsoft's MASM or any compatible
|
||||||
|
; assembler (including Borland's Turbo Assembler).
|
||||||
|
; NASM is available from http://nasm.sourceforge.net/ or
|
||||||
|
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||||
|
;
|
||||||
|
; This file contains a fast, not so accurate integer implementation of
|
||||||
|
; the forward DCT (Discrete Cosine Transform). The following code is based
|
||||||
|
; directly on the IJG's original jfdctfst.c; see jfdctfst.c for
|
||||||
|
; more details.
|
||||||
|
;
|
||||||
|
; Last Modified : October 17, 2004
|
||||||
|
;
|
||||||
|
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jdct.inc"
|
||||||
|
|
||||||
|
%ifdef DCT_IFAST_SUPPORTED
|
||||||
|
|
||||||
|
; This module is specialized to the case DCTSIZE = 8.
|
||||||
|
;
|
||||||
|
%if DCTSIZE != 8
|
||||||
|
%error "Sorry, this code only copes with 8x8 DCTs."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
|
||||||
|
; We can gain a little more speed, with a further compromise in accuracy,
|
||||||
|
; by omitting the addition in a descaling shift. This yields an
|
||||||
|
; incorrectly rounded result half the time...
|
||||||
|
;
|
||||||
|
%macro descale 2
%ifdef USE_ACCURATE_ROUNDING
%if (%2)<=7
    add %1, byte (1<<((%2)-1))  ; add reg32,imm8
%else
    add %1, (1<<((%2)-1))       ; add reg32,imm32
%endif
%endif
    sar %1,%2
%endmacro
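;
; Worked example (illustrative values, not part of the original source):
; for "descale reg,8" the rounding term is 1<<7 = 128.
;   reg = 1000 (1000/256 = 3.91):
;     with USE_ACCURATE_ROUNDING:  (1000+128) >> 8 = 4
;     without:                      1000 >> 8      = 3
; The imm8 form of ADD is selected only for shifts of 7 or less, because
; a sign-extended 8-bit immediate cannot hold a value of 128 or more.
;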
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
|
||||||
|
%define CONST_BITS 8
|
||||||
|
|
||||||
|
%if CONST_BITS == 8
F_0_382 equ  98  ; FIX(0.382683433)
F_0_541 equ 139  ; FIX(0.541196100)
F_0_707 equ 181  ; FIX(0.707106781)
F_1_306 equ 334  ; FIX(1.306562965)
%else
; NASM cannot do compile-time arithmetic on floating-point constants.
%define DESCALE(x,n)  (((x)+(1<<((n)-1)))>>(n))
F_0_382 equ DESCALE( 410903207,30-CONST_BITS)  ; FIX(0.382683433)
F_0_541 equ DESCALE( 581104887,30-CONST_BITS)  ; FIX(0.541196100)
F_0_707 equ DESCALE( 759250124,30-CONST_BITS)  ; FIX(0.707106781)
F_1_306 equ DESCALE(1402911301,30-CONST_BITS)  ; FIX(1.306562965)
%endif
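;
; Worked example (illustrative; the arithmetic below is not part of the
; original source): FIX(x) is x scaled by 2^CONST_BITS and rounded to
; the nearest integer. With CONST_BITS = 8:
;   0.707106781 * 256 = 181.02  ->  F_0_707 = 181
;   1.306562965 * 256 = 334.48  ->  F_1_306 = 334
; The %else branch reaches the same values from integers pre-scaled by
; 2^30, for example
;   DESCALE(759250124,30-8) = (759250124 + (1<<21)) >> 22 = 181
;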
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Perform the forward DCT on one block of samples.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_fdct_ifast (DCTELEM * data)
|
||||||
|
;
|
||||||
|
|
||||||
|
%define data(b) (b)+8 ; DCTELEM * data
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_fdct_ifast)
|
||||||
|
|
||||||
|
EXTN(jpeg_fdct_ifast):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
; ---- Pass 1: process rows.
|
||||||
|
|
||||||
|
mov ecx, DCTSIZE
|
||||||
|
mov edx, POINTER [data(ebp)] ; (DCTELEM *)
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
push ecx ; ctr
|
||||||
|
push edx ; dataptr
|
||||||
|
|
||||||
|
movsx eax, DCTELEM [ROW(0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movsx edi, DCTELEM [ROW(7,edx,SIZEOF_DCTELEM)]
|
||||||
|
lea esi,[eax+edi] ; esi=tmp0
|
||||||
|
sub eax,edi ; eax=tmp7
|
||||||
|
push eax
|
||||||
|
|
||||||
|
movsx ebx, DCTELEM [ROW(1,edx,SIZEOF_DCTELEM)]
|
||||||
|
movsx ecx, DCTELEM [ROW(6,edx,SIZEOF_DCTELEM)]
|
||||||
|
lea edi,[ebx+ecx] ; edi=tmp1
|
||||||
|
sub ebx,ecx ; ebx=tmp6
|
||||||
|
push ebx
|
||||||
|
|
||||||
|
movsx eax, DCTELEM [ROW(2,edx,SIZEOF_DCTELEM)]
|
||||||
|
movsx ecx, DCTELEM [ROW(5,edx,SIZEOF_DCTELEM)]
|
||||||
|
lea ebx,[eax+ecx] ; ebx=tmp2
|
||||||
|
sub eax,ecx ; eax=tmp5
|
||||||
|
push eax
|
||||||
|
|
||||||
|
movsx ecx, DCTELEM [ROW(3,edx,SIZEOF_DCTELEM)]
|
||||||
|
movsx eax, DCTELEM [ROW(4,edx,SIZEOF_DCTELEM)]
|
||||||
|
lea edx,[ecx+eax] ; edx=tmp3
|
||||||
|
sub ecx,eax ; ecx=tmp4
|
||||||
|
push ecx
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
lea eax,[esi+edx] ; eax=tmp10
|
||||||
|
lea ecx,[edi+ebx] ; ecx=tmp11
|
||||||
|
sub esi,edx ; esi=tmp13
|
||||||
|
sub edi,ebx ; edi=tmp12
|
||||||
|
|
||||||
|
mov edx, POINTER [esp+16] ; dataptr
|
||||||
|
|
||||||
|
add edi,esi
|
||||||
|
imul edi,(F_0_707) ; edi=z1
|
||||||
|
descale edi,CONST_BITS
|
||||||
|
|
||||||
|
lea ebx,[eax+ecx] ; ebx=data0
|
||||||
|
sub eax,ecx ; eax=data4
|
||||||
|
mov DCTELEM [ROW(0,edx,SIZEOF_DCTELEM)], bx
|
||||||
|
mov DCTELEM [ROW(4,edx,SIZEOF_DCTELEM)], ax
|
||||||
|
|
||||||
|
lea ecx,[esi+edi] ; ecx=data2
|
||||||
|
sub esi,edi ; esi=data6
|
||||||
|
mov DCTELEM [ROW(2,edx,SIZEOF_DCTELEM)], cx
|
||||||
|
mov DCTELEM [ROW(6,edx,SIZEOF_DCTELEM)], si
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
pop eax ; eax=tmp4
|
||||||
|
pop edx ; edx=tmp5
|
||||||
|
pop ebx ; ebx=tmp6
|
||||||
|
pop edi ; edi=tmp7
|
||||||
|
|
||||||
|
add eax,edx ; eax=tmp10
|
||||||
|
add edx,ebx ; edx=tmp11
|
||||||
|
add ebx,edi ; ebx=tmp12, edi=tmp7
|
||||||
|
|
||||||
|
imul edx,(F_0_707) ; edx=z3
|
||||||
|
descale edx,CONST_BITS
|
||||||
|
lea esi,[edi+edx] ; esi=z11
|
||||||
|
sub edi,edx ; edi=z13
|
||||||
|
|
||||||
|
mov ecx,eax ; ecx=tmp10
|
||||||
|
sub eax,ebx
|
||||||
|
imul eax,(F_0_382) ; eax=z5
|
||||||
|
imul ecx,(F_0_541) ; ecx=MULTIPLY(tmp10,FIX_0_541196100)
|
||||||
|
imul ebx,(F_1_306) ; ebx=MULTIPLY(tmp12,FIX_1_306562965)
|
||||||
|
descale eax,CONST_BITS
|
||||||
|
descale ecx,CONST_BITS
|
||||||
|
descale ebx,CONST_BITS
|
||||||
|
add ecx,eax ; ecx=z2
|
||||||
|
add ebx,eax ; ebx=z4
|
||||||
|
|
||||||
|
pop edx ; dataptr
|
||||||
|
|
||||||
|
lea eax,[edi+ecx] ; eax=data5
|
||||||
|
sub edi,ecx ; edi=data3
|
||||||
|
mov DCTELEM [ROW(5,edx,SIZEOF_DCTELEM)], ax
|
||||||
|
mov DCTELEM [ROW(3,edx,SIZEOF_DCTELEM)], di
|
||||||
|
|
||||||
|
lea ecx,[esi+ebx] ; ecx=data1
|
||||||
|
sub esi,ebx ; esi=data7
|
||||||
|
mov DCTELEM [ROW(1,edx,SIZEOF_DCTELEM)], cx
|
||||||
|
mov DCTELEM [ROW(7,edx,SIZEOF_DCTELEM)], si
|
||||||
|
|
||||||
|
pop ecx ; ctr
|
||||||
|
|
||||||
|
add edx, byte DCTSIZE*SIZEOF_DCTELEM ; advance pointer to next row
dec ecx
|
||||||
|
jnz near .rowloop
|
||||||
|
|
||||||
|
; ---- Pass 2: process columns.
|
||||||
|
|
||||||
|
mov ecx, DCTSIZE
|
||||||
|
mov edx, POINTER [data(ebp)] ; (DCTELEM *)
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
push ecx ; ctr
|
||||||
|
push edx ; dataptr
|
||||||
|
|
||||||
|
movsx eax, DCTELEM [COL(0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movsx edi, DCTELEM [COL(7,edx,SIZEOF_DCTELEM)]
|
||||||
|
lea esi,[eax+edi] ; esi=tmp0
|
||||||
|
sub eax,edi ; eax=tmp7
|
||||||
|
push eax
|
||||||
|
|
||||||
|
movsx ebx, DCTELEM [COL(1,edx,SIZEOF_DCTELEM)]
|
||||||
|
movsx ecx, DCTELEM [COL(6,edx,SIZEOF_DCTELEM)]
|
||||||
|
lea edi,[ebx+ecx] ; edi=tmp1
|
||||||
|
sub ebx,ecx ; ebx=tmp6
|
||||||
|
push ebx
|
||||||
|
|
||||||
|
movsx eax, DCTELEM [COL(2,edx,SIZEOF_DCTELEM)]
|
||||||
|
movsx ecx, DCTELEM [COL(5,edx,SIZEOF_DCTELEM)]
|
||||||
|
lea ebx,[eax+ecx] ; ebx=tmp2
|
||||||
|
sub eax,ecx ; eax=tmp5
|
||||||
|
push eax
|
||||||
|
|
||||||
|
movsx ecx, DCTELEM [COL(3,edx,SIZEOF_DCTELEM)]
|
||||||
|
movsx eax, DCTELEM [COL(4,edx,SIZEOF_DCTELEM)]
|
||||||
|
lea edx,[ecx+eax] ; edx=tmp3
|
||||||
|
sub ecx,eax ; ecx=tmp4
|
||||||
|
push ecx
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
lea eax,[esi+edx] ; eax=tmp10
|
||||||
|
lea ecx,[edi+ebx] ; ecx=tmp11
|
||||||
|
sub esi,edx ; esi=tmp13
|
||||||
|
sub edi,ebx ; edi=tmp12
|
||||||
|
|
||||||
|
mov edx, POINTER [esp+16] ; dataptr
|
||||||
|
|
||||||
|
add edi,esi
|
||||||
|
imul edi,(F_0_707) ; edi=z1
|
||||||
|
descale edi,CONST_BITS
|
||||||
|
|
||||||
|
lea ebx,[eax+ecx] ; ebx=data0
|
||||||
|
sub eax,ecx ; eax=data4
|
||||||
|
mov DCTELEM [COL(0,edx,SIZEOF_DCTELEM)], bx
|
||||||
|
mov DCTELEM [COL(4,edx,SIZEOF_DCTELEM)], ax
|
||||||
|
|
||||||
|
lea ecx,[esi+edi] ; ecx=data2
|
||||||
|
sub esi,edi ; esi=data6
|
||||||
|
mov DCTELEM [COL(2,edx,SIZEOF_DCTELEM)], cx
|
||||||
|
mov DCTELEM [COL(6,edx,SIZEOF_DCTELEM)], si
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
pop eax ; eax=tmp4
|
||||||
|
pop edx ; edx=tmp5
|
||||||
|
pop ebx ; ebx=tmp6
|
||||||
|
pop edi ; edi=tmp7
|
||||||
|
|
||||||
|
add eax,edx ; eax=tmp10
|
||||||
|
add edx,ebx ; edx=tmp11
|
||||||
|
add ebx,edi ; ebx=tmp12, edi=tmp7
|
||||||
|
|
||||||
|
imul edx,(F_0_707) ; edx=z3
|
||||||
|
descale edx,CONST_BITS
|
||||||
|
lea esi,[edi+edx] ; esi=z11
|
||||||
|
sub edi,edx ; edi=z13
|
||||||
|
|
||||||
|
mov ecx,eax ; ecx=tmp10
|
||||||
|
sub eax,ebx
|
||||||
|
imul eax,(F_0_382) ; eax=z5
|
||||||
|
imul ecx,(F_0_541) ; ecx=MULTIPLY(tmp10,FIX_0_541196100)
|
||||||
|
imul ebx,(F_1_306) ; ebx=MULTIPLY(tmp12,FIX_1_306562965)
|
||||||
|
descale eax,CONST_BITS
|
||||||
|
descale ecx,CONST_BITS
|
||||||
|
descale ebx,CONST_BITS
|
||||||
|
add ecx,eax ; ecx=z2
|
||||||
|
add ebx,eax ; ebx=z4
|
||||||
|
|
||||||
|
pop edx ; dataptr
|
||||||
|
|
||||||
|
lea eax,[edi+ecx] ; eax=data5
|
||||||
|
sub edi,ecx ; edi=data3
|
||||||
|
mov DCTELEM [COL(5,edx,SIZEOF_DCTELEM)], ax
|
||||||
|
mov DCTELEM [COL(3,edx,SIZEOF_DCTELEM)], di
|
||||||
|
|
||||||
|
lea ecx,[esi+ebx] ; ecx=data1
|
||||||
|
sub esi,ebx ; esi=data7
|
||||||
|
mov DCTELEM [COL(1,edx,SIZEOF_DCTELEM)], cx
|
||||||
|
mov DCTELEM [COL(7,edx,SIZEOF_DCTELEM)], si
|
||||||
|
|
||||||
|
pop ecx ; ctr
|
||||||
|
|
||||||
|
add edx, byte SIZEOF_DCTELEM ; advance pointer to next column
|
||||||
|
dec ecx
|
||||||
|
jnz near .columnloop
|
||||||
|
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; DCT_IFAST_SUPPORTED
|
||||||
342
jfdctint.asm
Normal file
@@ -0,0 +1,342 @@
|
|||||||
|
;
|
||||||
|
; jfdctint.asm - accurate integer FDCT (non-SIMD)
|
||||||
|
;
|
||||||
|
; x86 SIMD extension for IJG JPEG library
|
||||||
|
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
; For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
;
|
||||||
|
; This file should be assembled with NASM (Netwide Assembler),
|
||||||
|
; and can *not* be assembled with Microsoft's MASM or any compatible
|
||||||
|
; assembler (including Borland's Turbo Assembler).
|
||||||
|
; NASM is available from http://nasm.sourceforge.net/ or
|
||||||
|
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||||
|
;
|
||||||
|
; This file contains a slow-but-accurate integer implementation of the
|
||||||
|
; forward DCT (Discrete Cosine Transform). The following code is based
|
||||||
|
; directly on the IJG's original jfdctint.c; see jfdctint.c for
|
||||||
|
; more details.
|
||||||
|
;
|
||||||
|
; Last Modified : October 17, 2004
|
||||||
|
;
|
||||||
|
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jdct.inc"
|
||||||
|
|
||||||
|
%ifdef DCT_ISLOW_SUPPORTED
|
||||||
|
|
||||||
|
; This module is specialized to the case DCTSIZE = 8.
|
||||||
|
;
|
||||||
|
%if DCTSIZE != 8
|
||||||
|
%error "Sorry, this code only copes with 8x8 DCTs."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
|
||||||
|
; Descale and correctly round a DWORD value that's scaled by N bits.
|
||||||
|
;
|
||||||
|
%macro descale 2
%if (%2)<=7
    add %1, byte (1<<((%2)-1))  ; add reg32,imm8
%else
    add %1, (1<<((%2)-1))       ; add reg32,imm32
%endif
    sar %1,%2
%endmacro
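;
; Worked example (illustrative values, not part of the original source):
; pass 1 invokes "descale reg,(CONST_BITS-PASS1_BITS)". With the values
; defined below (CONST_BITS = 13, PASS1_BITS = 2) that is a shift by 11
; with a rounding term of 1<<10 = 1024.
;   reg = 3500 (3500/2048 = 1.71):  (3500+1024) >> 11 = 2
;   a plain "sar reg,11" would instead give 3500 >> 11 = 1
;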
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
|
||||||
|
%define CONST_BITS 13
|
||||||
|
%define PASS1_BITS 2
|
||||||
|
|
||||||
|
%if CONST_BITS == 13
F_0_298 equ  2446  ; FIX(0.298631336)
F_0_390 equ  3196  ; FIX(0.390180644)
F_0_541 equ  4433  ; FIX(0.541196100)
F_0_765 equ  6270  ; FIX(0.765366865)
F_0_899 equ  7373  ; FIX(0.899976223)
F_1_175 equ  9633  ; FIX(1.175875602)
F_1_501 equ 12299  ; FIX(1.501321110)
F_1_847 equ 15137  ; FIX(1.847759065)
F_1_961 equ 16069  ; FIX(1.961570560)
F_2_053 equ 16819  ; FIX(2.053119869)
F_2_562 equ 20995  ; FIX(2.562915447)
F_3_072 equ 25172  ; FIX(3.072711026)
%else
; NASM cannot do compile-time arithmetic on floating-point constants.
%define DESCALE(x,n)  (((x)+(1<<((n)-1)))>>(n))
F_0_298 equ DESCALE( 320652955,30-CONST_BITS)  ; FIX(0.298631336)
F_0_390 equ DESCALE( 418953276,30-CONST_BITS)  ; FIX(0.390180644)
F_0_541 equ DESCALE( 581104887,30-CONST_BITS)  ; FIX(0.541196100)
F_0_765 equ DESCALE( 821806413,30-CONST_BITS)  ; FIX(0.765366865)
F_0_899 equ DESCALE( 966342111,30-CONST_BITS)  ; FIX(0.899976223)
F_1_175 equ DESCALE(1262586813,30-CONST_BITS)  ; FIX(1.175875602)
F_1_501 equ DESCALE(1612031267,30-CONST_BITS)  ; FIX(1.501321110)
F_1_847 equ DESCALE(1984016188,30-CONST_BITS)  ; FIX(1.847759065)
F_1_961 equ DESCALE(2106220350,30-CONST_BITS)  ; FIX(1.961570560)
F_2_053 equ DESCALE(2204520673,30-CONST_BITS)  ; FIX(2.053119869)
F_2_562 equ DESCALE(2751909506,30-CONST_BITS)  ; FIX(2.562915447)
F_3_072 equ DESCALE(3299298341,30-CONST_BITS)  ; FIX(3.072711026)
%endif
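;
; Worked example (illustrative; the arithmetic below is not part of the
; original source): with CONST_BITS = 13 each constant is the real value
; scaled by 2^13 = 8192 and rounded to the nearest integer:
;   0.541196100 * 8192 = 4433.48  ->  F_0_541 = 4433
;   1.175875602 * 8192 = 9632.77  ->  F_1_175 = 9633
; The %else branch derives the same values from integers pre-scaled by
; 2^30, for example
;   DESCALE(581104887,30-13) = (581104887 + (1<<16)) >> 17 = 4433
;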
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Perform the forward DCT on one block of samples.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_fdct_islow (DCTELEM * data)
|
||||||
|
;
|
||||||
|
|
||||||
|
%define data(b) (b)+8 ; DCTELEM * data
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_fdct_islow)
|
||||||
|
|
||||||
|
EXTN(jpeg_fdct_islow):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
; ---- Pass 1: process rows.
|
||||||
|
|
||||||
|
mov edx, POINTER [data(ebp)] ; (DCTELEM *)
|
||||||
|
mov ecx, DCTSIZE
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
movsx eax, DCTELEM [ROW(0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movsx edi, DCTELEM [ROW(7,edx,SIZEOF_DCTELEM)]
|
||||||
|
lea esi,[eax+edi] ; esi=tmp0
|
||||||
|
sub eax,edi ; eax=tmp7
|
||||||
|
push ecx ; ctr
|
||||||
|
push eax
|
||||||
|
|
||||||
|
movsx ebx, DCTELEM [ROW(1,edx,SIZEOF_DCTELEM)]
|
||||||
|
movsx ecx, DCTELEM [ROW(6,edx,SIZEOF_DCTELEM)]
|
||||||
|
lea edi,[ebx+ecx] ; edi=tmp1
|
||||||
|
sub ebx,ecx ; ebx=tmp6
|
||||||
|
push ebx
|
||||||
|
|
||||||
|
movsx eax, DCTELEM [ROW(2,edx,SIZEOF_DCTELEM)]
|
||||||
|
movsx ecx, DCTELEM [ROW(5,edx,SIZEOF_DCTELEM)]
|
||||||
|
lea ebx,[eax+ecx] ; ebx=tmp2
|
||||||
|
sub eax,ecx ; eax=tmp5
|
||||||
|
push edx ; dataptr
|
||||||
|
push eax
|
||||||
|
|
||||||
|
movsx ecx, DCTELEM [ROW(3,edx,SIZEOF_DCTELEM)]
|
||||||
|
movsx eax, DCTELEM [ROW(4,edx,SIZEOF_DCTELEM)]
|
||||||
|
lea edx,[ecx+eax] ; edx=tmp3
|
||||||
|
sub ecx,eax ; ecx=tmp4
|
||||||
|
push ecx
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
lea eax,[esi+edx] ; eax=tmp10
|
||||||
|
lea ecx,[edi+ebx] ; ecx=tmp11
|
||||||
|
sub esi,edx ; esi=tmp13
|
||||||
|
sub edi,ebx ; edi=tmp12
|
||||||
|
|
||||||
|
lea ebx,[eax+ecx] ; ebx=data0
|
||||||
|
sub eax,ecx ; eax=data4
|
||||||
|
mov edx, POINTER [esp+8] ; dataptr
|
||||||
|
sal ebx, PASS1_BITS
|
||||||
|
sal eax, PASS1_BITS
|
||||||
|
mov DCTELEM [ROW(0,edx,SIZEOF_DCTELEM)], bx
|
||||||
|
mov DCTELEM [ROW(4,edx,SIZEOF_DCTELEM)], ax
|
||||||
|
|
||||||
|
lea ecx,[edi+esi]
|
||||||
|
imul ecx,(F_0_541) ; ecx=z1
|
||||||
|
imul esi,(F_0_765) ; esi=MULTIPLY(tmp13,FIX_0_765366865)
|
||||||
|
imul edi,(-F_1_847) ; edi=MULTIPLY(tmp12,-FIX_1_847759065)
|
||||||
|
add esi,ecx ; esi=data2
|
||||||
|
add edi,ecx ; edi=data6
|
||||||
|
descale esi,(CONST_BITS-PASS1_BITS)
|
||||||
|
descale edi,(CONST_BITS-PASS1_BITS)
|
||||||
|
mov DCTELEM [ROW(2,edx,SIZEOF_DCTELEM)], si
|
||||||
|
mov DCTELEM [ROW(6,edx,SIZEOF_DCTELEM)], di
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
mov eax, INT32 [esp] ; eax=tmp4
|
||||||
|
mov ebx, INT32 [esp+4] ; ebx=tmp5
|
||||||
|
mov ecx, INT32 [esp+12] ; ecx=tmp6
|
||||||
|
mov esi, INT32 [esp+16] ; esi=tmp7
|
||||||
|
|
||||||
|
lea edx,[eax+ecx] ; edx=z3
|
||||||
|
lea edi,[ebx+esi] ; edi=z4
|
||||||
|
add eax,esi ; eax=z1
|
||||||
|
add ebx,ecx ; ebx=z2
|
||||||
|
|
||||||
|
lea esi,[edx+edi]
|
||||||
|
imul esi,(F_1_175) ; esi=z5
|
||||||
|
|
||||||
|
imul edx,(-F_1_961) ; edx=z3(=MULTIPLY(z3,-FIX_1_961570560))
|
||||||
|
imul edi,(-F_0_390) ; edi=z4(=MULTIPLY(z4,-FIX_0_390180644))
|
||||||
|
imul eax,(-F_0_899) ; eax=z1(=MULTIPLY(z1,-FIX_0_899976223))
|
||||||
|
imul ebx,(-F_2_562) ; ebx=z2(=MULTIPLY(z2,-FIX_2_562915447))
|
||||||
|
|
||||||
|
add edx,esi ; edx=z3(=z3+z5)
|
||||||
|
add edi,esi ; edi=z4(=z4+z5)
|
||||||
|
|
||||||
|
lea ecx,[eax+edx] ; ecx=z1+z3
|
||||||
|
lea esi,[ebx+edi] ; esi=z2+z4
|
||||||
|
add eax,edi ; eax=z1+z4
|
||||||
|
add ebx,edx ; ebx=z2+z3
|
||||||
|
|
||||||
|
pop edx ; edx=tmp4
|
||||||
|
pop edi ; edi=tmp5
|
||||||
|
imul edx,(F_0_298) ; edx=tmp4(=MULTIPLY(tmp4,FIX_0_298631336))
|
||||||
|
imul edi,(F_2_053) ; edi=tmp5(=MULTIPLY(tmp5,FIX_2_053119869))
|
||||||
|
add ecx,edx ; ecx=data7(=tmp4+z1+z3)
|
||||||
|
add esi,edi ; esi=data5(=tmp5+z2+z4)
|
||||||
|
pop edx ; dataptr
|
||||||
|
descale ecx,(CONST_BITS-PASS1_BITS)
|
||||||
|
descale esi,(CONST_BITS-PASS1_BITS)
|
||||||
|
mov DCTELEM [ROW(7,edx,SIZEOF_DCTELEM)], cx
|
||||||
|
mov DCTELEM [ROW(5,edx,SIZEOF_DCTELEM)], si
|
||||||
|
|
||||||
|
pop edi ; edi=tmp6
|
||||||
|
pop ecx ; ecx=tmp7
|
||||||
|
imul edi,(F_3_072) ; edi=tmp6(=MULTIPLY(tmp6,FIX_3_072711026))
|
||||||
|
imul ecx,(F_1_501) ; ecx=tmp7(=MULTIPLY(tmp7,FIX_1_501321110))
|
||||||
|
add ebx,edi ; ebx=data3(=tmp6+z2+z3)
|
||||||
|
add eax,ecx ; eax=data1(=tmp7+z1+z4)
|
||||||
|
pop ecx ; ctr
|
||||||
|
descale ebx,(CONST_BITS-PASS1_BITS)
|
||||||
|
descale eax,(CONST_BITS-PASS1_BITS)
|
||||||
|
mov DCTELEM [ROW(3,edx,SIZEOF_DCTELEM)], bx
|
||||||
|
mov DCTELEM [ROW(1,edx,SIZEOF_DCTELEM)], ax
|
||||||
|
|
||||||
|
add edx, byte DCTSIZE*SIZEOF_DCTELEM ; advance pointer to next row
|
||||||
|
dec ecx
|
||||||
|
jnz near .rowloop
|
||||||
|
|
||||||
|
; ---- Pass 2: process columns.
|
||||||
|
|
||||||
|
mov edx, POINTER [data(ebp)] ; (DCTELEM *)
|
||||||
|
mov ecx, DCTSIZE
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
movsx eax, DCTELEM [COL(0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movsx edi, DCTELEM [COL(7,edx,SIZEOF_DCTELEM)]
|
||||||
|
lea esi,[eax+edi] ; esi=tmp0
|
||||||
|
sub eax,edi ; eax=tmp7
|
||||||
|
push ecx ; ctr
|
||||||
|
push eax
|
||||||
|
|
||||||
|
movsx ebx, DCTELEM [COL(1,edx,SIZEOF_DCTELEM)]
|
||||||
|
movsx ecx, DCTELEM [COL(6,edx,SIZEOF_DCTELEM)]
|
||||||
|
lea edi,[ebx+ecx] ; edi=tmp1
|
||||||
|
sub ebx,ecx ; ebx=tmp6
|
||||||
|
push ebx
|
||||||
|
|
||||||
|
movsx eax, DCTELEM [COL(2,edx,SIZEOF_DCTELEM)]
|
||||||
|
movsx ecx, DCTELEM [COL(5,edx,SIZEOF_DCTELEM)]
|
||||||
|
lea ebx,[eax+ecx] ; ebx=tmp2
|
||||||
|
sub eax,ecx ; eax=tmp5
|
||||||
|
push edx ; dataptr
|
||||||
|
push eax
|
||||||
|
|
||||||
|
movsx ecx, DCTELEM [COL(3,edx,SIZEOF_DCTELEM)]
|
||||||
|
movsx eax, DCTELEM [COL(4,edx,SIZEOF_DCTELEM)]
|
||||||
|
lea edx,[ecx+eax] ; edx=tmp3
|
||||||
|
sub ecx,eax ; ecx=tmp4
|
||||||
|
push ecx
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
lea eax,[esi+edx] ; eax=tmp10
|
||||||
|
lea ecx,[edi+ebx] ; ecx=tmp11
|
||||||
|
sub esi,edx ; esi=tmp13
|
||||||
|
sub edi,ebx ; edi=tmp12
|
||||||
|
|
||||||
|
lea ebx,[eax+ecx] ; ebx=data0
|
||||||
|
sub eax,ecx ; eax=data4
|
||||||
|
mov edx, POINTER [esp+8] ; dataptr
|
||||||
|
descale ebx, PASS1_BITS
|
||||||
|
descale eax, PASS1_BITS
|
||||||
|
mov DCTELEM [COL(0,edx,SIZEOF_DCTELEM)], bx
|
||||||
|
mov DCTELEM [COL(4,edx,SIZEOF_DCTELEM)], ax
|
||||||
|
|
||||||
|
lea ecx,[edi+esi]
|
||||||
|
imul ecx,(F_0_541) ; ecx=z1
|
||||||
|
imul esi,(F_0_765) ; esi=MULTIPLY(tmp13,FIX_0_765366865)
|
||||||
|
imul edi,(-F_1_847) ; edi=MULTIPLY(tmp12,-FIX_1_847759065)
|
||||||
|
add esi,ecx ; esi=data2
|
||||||
|
add edi,ecx ; edi=data6
|
||||||
|
descale esi,(CONST_BITS+PASS1_BITS)
|
||||||
|
descale edi,(CONST_BITS+PASS1_BITS)
|
||||||
|
mov DCTELEM [COL(2,edx,SIZEOF_DCTELEM)], si
|
||||||
|
mov DCTELEM [COL(6,edx,SIZEOF_DCTELEM)], di
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
mov eax, INT32 [esp] ; eax=tmp4
|
||||||
|
mov ebx, INT32 [esp+4] ; ebx=tmp5
|
||||||
|
mov ecx, INT32 [esp+12] ; ecx=tmp6
|
||||||
|
mov esi, INT32 [esp+16] ; esi=tmp7
|
||||||
|
|
||||||
|
lea edx,[eax+ecx] ; edx=z3
|
||||||
|
lea edi,[ebx+esi] ; edi=z4
|
||||||
|
add eax,esi ; eax=z1
|
||||||
|
add ebx,ecx ; ebx=z2
|
||||||
|
|
||||||
|
lea esi,[edx+edi]
|
||||||
|
imul esi,(F_1_175) ; esi=z5
|
||||||
|
|
||||||
|
imul edx,(-F_1_961) ; edx=z3(=MULTIPLY(z3,-FIX_1_961570560))
|
||||||
|
imul edi,(-F_0_390) ; edi=z4(=MULTIPLY(z4,-FIX_0_390180644))
|
||||||
|
imul eax,(-F_0_899) ; eax=z1(=MULTIPLY(z1,-FIX_0_899976223))
|
||||||
|
imul ebx,(-F_2_562) ; ebx=z2(=MULTIPLY(z2,-FIX_2_562915447))
|
||||||
|
|
||||||
|
add edx,esi ; edx=z3(=z3+z5)
|
||||||
|
add edi,esi ; edi=z4(=z4+z5)
|
||||||
|
|
||||||
|
lea ecx,[eax+edx] ; ecx=z1+z3
|
||||||
|
lea esi,[ebx+edi] ; esi=z2+z4
|
||||||
|
add eax,edi ; eax=z1+z4
|
||||||
|
add ebx,edx ; ebx=z2+z3
|
||||||
|
|
||||||
|
pop edx ; edx=tmp4
|
||||||
|
pop edi ; edi=tmp5
|
||||||
|
imul edx,(F_0_298) ; edx=tmp4(=MULTIPLY(tmp4,FIX_0_298631336))
|
||||||
|
imul edi,(F_2_053) ; edi=tmp5(=MULTIPLY(tmp5,FIX_2_053119869))
|
||||||
|
add ecx,edx ; ecx=data7(=tmp4+z1+z3)
|
||||||
|
add esi,edi ; esi=data5(=tmp5+z2+z4)
|
||||||
|
pop edx ; dataptr
|
||||||
|
descale ecx,(CONST_BITS+PASS1_BITS)
|
||||||
|
descale esi,(CONST_BITS+PASS1_BITS)
|
||||||
|
mov DCTELEM [COL(7,edx,SIZEOF_DCTELEM)], cx
|
||||||
|
mov DCTELEM [COL(5,edx,SIZEOF_DCTELEM)], si
|
||||||
|
|
||||||
|
pop edi ; edi=tmp6
|
||||||
|
pop ecx ; ecx=tmp7
|
||||||
|
imul edi,(F_3_072) ; edi=tmp6(=MULTIPLY(tmp6,FIX_3_072711026))
|
||||||
|
imul ecx,(F_1_501) ; ecx=tmp7(=MULTIPLY(tmp7,FIX_1_501321110))
|
||||||
|
add ebx,edi ; ebx=data3(=tmp6+z2+z3)
|
||||||
|
add eax,ecx ; eax=data1(=tmp7+z1+z4)
|
||||||
|
pop ecx ; ctr
|
||||||
|
descale ebx,(CONST_BITS+PASS1_BITS)
|
||||||
|
descale eax,(CONST_BITS+PASS1_BITS)
|
||||||
|
mov DCTELEM [COL(3,edx,SIZEOF_DCTELEM)], bx
|
||||||
|
mov DCTELEM [COL(1,edx,SIZEOF_DCTELEM)], ax
|
||||||
|
|
||||||
|
add edx, byte SIZEOF_DCTELEM ; advance pointer to next column
|
||||||
|
dec ecx
|
||||||
|
jnz near .columnloop
|
||||||
|
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; DCT_ISLOW_SUPPORTED
|
||||||
404
jfmmxfst.asm
Normal file
@@ -0,0 +1,404 @@
|
|||||||
|
;
|
||||||
|
; jfmmxfst.asm - fast integer FDCT (MMX)
|
||||||
|
;
|
||||||
|
; x86 SIMD extension for IJG JPEG library
|
||||||
|
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
; For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
;
|
||||||
|
; This file should be assembled with NASM (Netwide Assembler),
|
||||||
|
; can *not* be assembled with Microsoft's MASM or any compatible
|
||||||
|
; assembler (including Borland's Turbo Assembler).
|
||||||
|
; NASM is available from http://nasm.sourceforge.net/ or
|
||||||
|
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||||
|
;
|
||||||
|
; This file contains a fast, not so accurate integer implementation of
|
||||||
|
; the forward DCT (Discrete Cosine Transform). The following code is
|
||||||
|
; based directly on the IJG's original jfdctfst.c; see the jfdctfst.c
|
||||||
|
; for more details.
|
||||||
|
;
|
||||||
|
; Last Modified : February 4, 2006
|
||||||
|
;
|
||||||
|
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jdct.inc"
|
||||||
|
|
||||||
|
%ifdef DCT_IFAST_SUPPORTED
|
||||||
|
%ifdef JFDCT_INT_MMX_SUPPORTED
|
||||||
|
|
||||||
|
; This module is specialized to the case DCTSIZE = 8.
|
||||||
|
;
|
||||||
|
%if DCTSIZE != 8
|
||||||
|
%error "Sorry, this code only copes with 8x8 DCTs."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
|
||||||
|
%define CONST_BITS 8 ; 14 is also OK.
|
||||||
|
|
||||||
|
%if CONST_BITS == 8
|
||||||
|
F_0_382 equ 98 ; FIX(0.382683433)
|
||||||
|
F_0_541 equ 139 ; FIX(0.541196100)
|
||||||
|
F_0_707 equ 181 ; FIX(0.707106781)
|
||||||
|
F_1_306 equ 334 ; FIX(1.306562965)
|
||||||
|
%else
|
||||||
|
; NASM cannot do compile-time arithmetic on floating-point constants.
|
||||||
|
%define DESCALE(x,n) (((x)+(1<<((n)-1)))>>(n))
|
||||||
|
F_0_382 equ DESCALE( 410903207,30-CONST_BITS) ; FIX(0.382683433)
|
||||||
|
F_0_541 equ DESCALE( 581104887,30-CONST_BITS) ; FIX(0.541196100)
|
||||||
|
F_0_707 equ DESCALE( 759250124,30-CONST_BITS) ; FIX(0.707106781)
|
||||||
|
F_1_306 equ DESCALE(1402911301,30-CONST_BITS) ; FIX(1.306562965)
|
||||||
|
%endif
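; Worked example (a sketch of the arithmetic only): the integer literals
; in the %else branch appear to be the constants scaled by 2^30, so
; DESCALE(x,30-CONST_BITS) rescales them to CONST_BITS fractional bits
; with rounding. For CONST_BITS = 8:
;   DESCALE(410903207,22) = (410903207 + (1<<21)) >> 22
;                         = 413000359 >> 22 = 98
; which agrees with the hard-coded F_0_382 = 98 in the %if branch.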
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_CONST
|
||||||
|
|
||||||
|
; PRE_MULTIPLY_SCALE_BITS <= 2 (to avoid overflow)
|
||||||
|
; CONST_BITS + CONST_SHIFT + PRE_MULTIPLY_SCALE_BITS == 16 (for pmulhw)
|
||||||
|
|
||||||
|
%define PRE_MULTIPLY_SCALE_BITS 2
|
||||||
|
%define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)
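; Worked example of the two constraints above (a sketch, assuming the
; default CONST_BITS = 8): CONST_SHIFT = 16 - 2 - 8 = 6, so the table
; below stores F_0_707 << 6 = 181 * 64 = 11584. pmulhw keeps the high
; 16 bits of each signed 16x16-bit product, hence
;   ((x << 2) * (181 << 6)) >> 16 = (x * 181) >> 8 ~= x * 0.707
; For x = 100: (400 * 11584) >> 16 = 4633600 >> 16 = 70.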
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
global EXTN(jconst_fdct_ifast_mmx)
|
||||||
|
|
||||||
|
EXTN(jconst_fdct_ifast_mmx):
|
||||||
|
|
||||||
|
PW_F0707 times 4 dw F_0_707 << CONST_SHIFT
|
||||||
|
PW_F0382 times 4 dw F_0_382 << CONST_SHIFT
|
||||||
|
PW_F0541 times 4 dw F_0_541 << CONST_SHIFT
|
||||||
|
PW_F1306 times 4 dw F_1_306 << CONST_SHIFT
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Perform the forward DCT on one block of samples.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_fdct_ifast_mmx (DCTELEM * data)
|
||||||
|
;
|
||||||
|
|
||||||
|
%define data(b) (b)+8 ; DCTELEM * data
|
||||||
|
|
||||||
|
%define original_ebp ebp+0
|
||||||
|
%define wk(i) ebp-(WK_NUM-(i))*SIZEOF_MMWORD ; mmword wk[WK_NUM]
|
||||||
|
%define WK_NUM 2
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_fdct_ifast_mmx)
|
||||||
|
|
||||||
|
EXTN(jpeg_fdct_ifast_mmx):
|
||||||
|
push ebp
|
||||||
|
mov eax,esp ; eax = original ebp
|
||||||
|
sub esp, byte 4
|
||||||
|
and esp, byte (-SIZEOF_MMWORD) ; align to 64 bits
|
||||||
|
mov [esp],eax
|
||||||
|
mov ebp,esp ; ebp = aligned ebp
|
||||||
|
lea esp, [wk(0)]
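; Resulting frame (a sketch, assuming SIZEOF_MMWORD = 8 as the
; "align to 64 bits" comment suggests; the value comes from jsimdext.inc):
;   [eax+8]  = data argument (eax still points at the saved ebp)
;   [ebp]    = saved pre-alignment stack pointer, restored by "pop esp"
;   [ebp-8]  = wk(1)
;   [ebp-16] = wk(0), and esp now points here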
|
||||||
|
pushpic ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
; push esi ; unused
|
||||||
|
; push edi ; unused
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
|
||||||
|
; ---- Pass 1: process rows.
|
||||||
|
|
||||||
|
mov edx, POINTER [data(eax)] ; (DCTELEM *)
|
||||||
|
mov ecx, DCTSIZE/4
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
|
||||||
|
movq mm0, MMWORD [MMBLOCK(2,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(3,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movq mm2, MMWORD [MMBLOCK(2,1,edx,SIZEOF_DCTELEM)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(3,1,edx,SIZEOF_DCTELEM)]
|
||||||
|
|
||||||
|
; mm0=(20 21 22 23), mm2=(24 25 26 27)
|
||||||
|
; mm1=(30 31 32 33), mm3=(34 35 36 37)
|
||||||
|
|
||||||
|
movq mm4,mm0 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd mm0,mm1 ; mm0=(20 30 21 31)
|
||||||
|
punpckhwd mm4,mm1 ; mm4=(22 32 23 33)
|
||||||
|
movq mm5,mm2 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd mm2,mm3 ; mm2=(24 34 25 35)
|
||||||
|
punpckhwd mm5,mm3 ; mm5=(26 36 27 37)
|
||||||
|
|
||||||
|
movq mm6, MMWORD [MMBLOCK(0,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movq mm7, MMWORD [MMBLOCK(1,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(0,1,edx,SIZEOF_DCTELEM)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(1,1,edx,SIZEOF_DCTELEM)]
|
||||||
|
|
||||||
|
; mm6=(00 01 02 03), mm1=(04 05 06 07)
|
||||||
|
; mm7=(10 11 12 13), mm3=(14 15 16 17)
|
||||||
|
|
||||||
|
movq MMWORD [wk(0)], mm4 ; wk(0)=(22 32 23 33)
|
||||||
|
movq MMWORD [wk(1)], mm2 ; wk(1)=(24 34 25 35)
|
||||||
|
|
||||||
|
movq mm4,mm6 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd mm6,mm7 ; mm6=(00 10 01 11)
|
||||||
|
punpckhwd mm4,mm7 ; mm4=(02 12 03 13)
|
||||||
|
movq mm2,mm1 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd mm1,mm3 ; mm1=(04 14 05 15)
|
||||||
|
punpckhwd mm2,mm3 ; mm2=(06 16 07 17)
|
||||||
|
|
||||||
|
movq mm7,mm6 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq mm6,mm0 ; mm6=(00 10 20 30)=data0
|
||||||
|
punpckhdq mm7,mm0 ; mm7=(01 11 21 31)=data1
|
||||||
|
movq mm3,mm2 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq mm2,mm5 ; mm2=(06 16 26 36)=data6
|
||||||
|
punpckhdq mm3,mm5 ; mm3=(07 17 27 37)=data7
|
||||||
|
|
||||||
|
movq mm0,mm7
|
||||||
|
movq mm5,mm6
|
||||||
|
psubw mm7,mm2 ; mm7=data1-data6=tmp6
|
||||||
|
psubw mm6,mm3 ; mm6=data0-data7=tmp7
|
||||||
|
paddw mm0,mm2 ; mm0=data1+data6=tmp1
|
||||||
|
paddw mm5,mm3 ; mm5=data0+data7=tmp0
|
||||||
|
|
||||||
|
movq mm2, MMWORD [wk(0)] ; mm2=(22 32 23 33)
|
||||||
|
movq mm3, MMWORD [wk(1)] ; mm3=(24 34 25 35)
|
||||||
|
movq MMWORD [wk(0)], mm7 ; wk(0)=tmp6
|
||||||
|
movq MMWORD [wk(1)], mm6 ; wk(1)=tmp7
|
||||||
|
|
||||||
|
movq mm7,mm4 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq mm4,mm2 ; mm4=(02 12 22 32)=data2
|
||||||
|
punpckhdq mm7,mm2 ; mm7=(03 13 23 33)=data3
|
||||||
|
movq mm6,mm1 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq mm1,mm3 ; mm1=(04 14 24 34)=data4
|
||||||
|
punpckhdq mm6,mm3 ; mm6=(05 15 25 35)=data5
|
||||||
|
|
||||||
|
movq mm2,mm7
|
||||||
|
movq mm3,mm4
|
||||||
|
paddw mm7,mm1 ; mm7=data3+data4=tmp3
|
||||||
|
paddw mm4,mm6 ; mm4=data2+data5=tmp2
|
||||||
|
psubw mm2,mm1 ; mm2=data3-data4=tmp4
|
||||||
|
psubw mm3,mm6 ; mm3=data2-data5=tmp5
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movq mm1,mm5
|
||||||
|
movq mm6,mm0
|
||||||
|
psubw mm5,mm7 ; mm5=tmp13
|
||||||
|
psubw mm0,mm4 ; mm0=tmp12
|
||||||
|
paddw mm1,mm7 ; mm1=tmp10
|
||||||
|
paddw mm6,mm4 ; mm6=tmp11
|
||||||
|
|
||||||
|
paddw mm0,mm5
|
||||||
|
psllw mm0,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
pmulhw mm0,[GOTOFF(ebx,PW_F0707)] ; mm0=z1
|
||||||
|
|
||||||
|
movq mm7,mm1
|
||||||
|
movq mm4,mm5
|
||||||
|
psubw mm1,mm6 ; mm1=data4
|
||||||
|
psubw mm5,mm0 ; mm5=data6
|
||||||
|
paddw mm7,mm6 ; mm7=data0
|
||||||
|
paddw mm4,mm0 ; mm4=data2
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(0,1,edx,SIZEOF_DCTELEM)], mm1
|
||||||
|
movq MMWORD [MMBLOCK(2,1,edx,SIZEOF_DCTELEM)], mm5
|
||||||
|
movq MMWORD [MMBLOCK(0,0,edx,SIZEOF_DCTELEM)], mm7
|
||||||
|
movq MMWORD [MMBLOCK(2,0,edx,SIZEOF_DCTELEM)], mm4
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movq mm6, MMWORD [wk(0)] ; mm6=tmp6
|
||||||
|
movq mm0, MMWORD [wk(1)] ; mm0=tmp7
|
||||||
|
|
||||||
|
paddw mm2,mm3 ; mm2=tmp10
|
||||||
|
paddw mm3,mm6 ; mm3=tmp11
|
||||||
|
paddw mm6,mm0 ; mm6=tmp12, mm0=tmp7
|
||||||
|
|
||||||
|
psllw mm2,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
psllw mm6,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
|
||||||
|
psllw mm3,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
pmulhw mm3,[GOTOFF(ebx,PW_F0707)] ; mm3=z3
|
||||||
|
|
||||||
|
movq mm1,mm2 ; mm1=tmp10
|
||||||
|
psubw mm2,mm6
|
||||||
|
pmulhw mm2,[GOTOFF(ebx,PW_F0382)] ; mm2=z5
|
||||||
|
pmulhw mm1,[GOTOFF(ebx,PW_F0541)] ; mm1=MULTIPLY(tmp10,FIX_0_54119610)
|
||||||
|
pmulhw mm6,[GOTOFF(ebx,PW_F1306)] ; mm6=MULTIPLY(tmp12,FIX_1_30656296)
|
||||||
|
paddw mm1,mm2 ; mm1=z2
|
||||||
|
paddw mm6,mm2 ; mm6=z4
|
||||||
|
|
||||||
|
movq mm5,mm0
|
||||||
|
psubw mm0,mm3 ; mm0=z13
|
||||||
|
paddw mm5,mm3 ; mm5=z11
|
||||||
|
|
||||||
|
movq mm7,mm0
|
||||||
|
movq mm4,mm5
|
||||||
|
psubw mm0,mm1 ; mm0=data3
|
||||||
|
psubw mm5,mm6 ; mm5=data7
|
||||||
|
paddw mm7,mm1 ; mm7=data5
|
||||||
|
paddw mm4,mm6 ; mm4=data1
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(3,0,edx,SIZEOF_DCTELEM)], mm0
|
||||||
|
movq MMWORD [MMBLOCK(3,1,edx,SIZEOF_DCTELEM)], mm5
|
||||||
|
movq MMWORD [MMBLOCK(1,1,edx,SIZEOF_DCTELEM)], mm7
|
||||||
|
movq MMWORD [MMBLOCK(1,0,edx,SIZEOF_DCTELEM)], mm4
|
||||||
|
|
||||||
|
add edx, byte 4*DCTSIZE*SIZEOF_DCTELEM
|
||||||
|
dec ecx
|
||||||
|
jnz near .rowloop
|
||||||
|
|
||||||
|
; ---- Pass 2: process columns.
|
||||||
|
|
||||||
|
mov edx, POINTER [data(eax)] ; (DCTELEM *)
|
||||||
|
mov ecx, DCTSIZE/4
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
|
||||||
|
movq mm0, MMWORD [MMBLOCK(2,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(3,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movq mm2, MMWORD [MMBLOCK(6,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(7,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
|
||||||
|
; mm0=(02 12 22 32), mm2=(42 52 62 72)
|
||||||
|
; mm1=(03 13 23 33), mm3=(43 53 63 73)
|
||||||
|
|
||||||
|
movq mm4,mm0 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd mm0,mm1 ; mm0=(02 03 12 13)
|
||||||
|
punpckhwd mm4,mm1 ; mm4=(22 23 32 33)
|
||||||
|
movq mm5,mm2 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd mm2,mm3 ; mm2=(42 43 52 53)
|
||||||
|
punpckhwd mm5,mm3 ; mm5=(62 63 72 73)
|
||||||
|
|
||||||
|
movq mm6, MMWORD [MMBLOCK(0,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movq mm7, MMWORD [MMBLOCK(1,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(4,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(5,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
|
||||||
|
; mm6=(00 10 20 30), mm1=(40 50 60 70)
|
||||||
|
; mm7=(01 11 21 31), mm3=(41 51 61 71)
|
||||||
|
|
||||||
|
movq MMWORD [wk(0)], mm4 ; wk(0)=(22 23 32 33)
|
||||||
|
movq MMWORD [wk(1)], mm2 ; wk(1)=(42 43 52 53)
|
||||||
|
|
||||||
|
movq mm4,mm6 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd mm6,mm7 ; mm6=(00 01 10 11)
|
||||||
|
punpckhwd mm4,mm7 ; mm4=(20 21 30 31)
|
||||||
|
movq mm2,mm1 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd mm1,mm3 ; mm1=(40 41 50 51)
|
||||||
|
punpckhwd mm2,mm3 ; mm2=(60 61 70 71)
|
||||||
|
|
||||||
|
movq mm7,mm6 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq mm6,mm0 ; mm6=(00 01 02 03)=data0
|
||||||
|
punpckhdq mm7,mm0 ; mm7=(10 11 12 13)=data1
|
||||||
|
movq mm3,mm2 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq mm2,mm5 ; mm2=(60 61 62 63)=data6
|
||||||
|
punpckhdq mm3,mm5 ; mm3=(70 71 72 73)=data7
|
||||||
|
|
||||||
|
movq mm0,mm7
|
||||||
|
movq mm5,mm6
|
||||||
|
psubw mm7,mm2 ; mm7=data1-data6=tmp6
|
||||||
|
psubw mm6,mm3 ; mm6=data0-data7=tmp7
|
||||||
|
paddw mm0,mm2 ; mm0=data1+data6=tmp1
|
||||||
|
paddw mm5,mm3 ; mm5=data0+data7=tmp0
|
||||||
|
|
||||||
|
movq mm2, MMWORD [wk(0)] ; mm2=(22 23 32 33)
|
||||||
|
movq mm3, MMWORD [wk(1)] ; mm3=(42 43 52 53)
|
||||||
|
movq MMWORD [wk(0)], mm7 ; wk(0)=tmp6
|
||||||
|
movq MMWORD [wk(1)], mm6 ; wk(1)=tmp7
|
||||||
|
|
||||||
|
movq mm7,mm4 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq mm4,mm2 ; mm4=(20 21 22 23)=data2
|
||||||
|
punpckhdq mm7,mm2 ; mm7=(30 31 32 33)=data3
|
||||||
|
movq mm6,mm1 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq mm1,mm3 ; mm1=(40 41 42 43)=data4
|
||||||
|
punpckhdq mm6,mm3 ; mm6=(50 51 52 53)=data5
|
||||||
|
|
||||||
|
movq mm2,mm7
|
||||||
|
movq mm3,mm4
|
||||||
|
paddw mm7,mm1 ; mm7=data3+data4=tmp3
|
||||||
|
paddw mm4,mm6 ; mm4=data2+data5=tmp2
|
||||||
|
psubw mm2,mm1 ; mm2=data3-data4=tmp4
|
||||||
|
psubw mm3,mm6 ; mm3=data2-data5=tmp5
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movq mm1,mm5
|
||||||
|
movq mm6,mm0
|
||||||
|
psubw mm5,mm7 ; mm5=tmp13
|
||||||
|
psubw mm0,mm4 ; mm0=tmp12
|
||||||
|
paddw mm1,mm7 ; mm1=tmp10
|
||||||
|
paddw mm6,mm4 ; mm6=tmp11
|
||||||
|
|
||||||
|
paddw mm0,mm5
|
||||||
|
psllw mm0,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
pmulhw mm0,[GOTOFF(ebx,PW_F0707)] ; mm0=z1
|
||||||
|
|
||||||
|
movq mm7,mm1
|
||||||
|
movq mm4,mm5
|
||||||
|
psubw mm1,mm6 ; mm1=data4
|
||||||
|
psubw mm5,mm0 ; mm5=data6
|
||||||
|
paddw mm7,mm6 ; mm7=data0
|
||||||
|
paddw mm4,mm0 ; mm4=data2
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(4,0,edx,SIZEOF_DCTELEM)], mm1
|
||||||
|
movq MMWORD [MMBLOCK(6,0,edx,SIZEOF_DCTELEM)], mm5
|
||||||
|
movq MMWORD [MMBLOCK(0,0,edx,SIZEOF_DCTELEM)], mm7
|
||||||
|
movq MMWORD [MMBLOCK(2,0,edx,SIZEOF_DCTELEM)], mm4
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movq mm6, MMWORD [wk(0)] ; mm6=tmp6
|
||||||
|
movq mm0, MMWORD [wk(1)] ; mm0=tmp7
|
||||||
|
|
||||||
|
paddw mm2,mm3 ; mm2=tmp10
|
||||||
|
paddw mm3,mm6 ; mm3=tmp11
|
||||||
|
paddw mm6,mm0 ; mm6=tmp12, mm0=tmp7
|
||||||
|
|
||||||
|
psllw mm2,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
psllw mm6,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
|
||||||
|
psllw mm3,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
pmulhw mm3,[GOTOFF(ebx,PW_F0707)] ; mm3=z3
|
||||||
|
|
||||||
|
movq mm1,mm2 ; mm1=tmp10
|
||||||
|
psubw mm2,mm6
|
||||||
|
pmulhw mm2,[GOTOFF(ebx,PW_F0382)] ; mm2=z5
|
||||||
|
pmulhw mm1,[GOTOFF(ebx,PW_F0541)] ; mm1=MULTIPLY(tmp10,FIX_0_54119610)
|
||||||
|
pmulhw mm6,[GOTOFF(ebx,PW_F1306)] ; mm6=MULTIPLY(tmp12,FIX_1_30656296)
|
||||||
|
paddw mm1,mm2 ; mm1=z2
|
||||||
|
paddw mm6,mm2 ; mm6=z4
|
||||||
|
|
||||||
|
movq mm5,mm0
|
||||||
|
psubw mm0,mm3 ; mm0=z13
|
||||||
|
paddw mm5,mm3 ; mm5=z11
|
||||||
|
|
||||||
|
movq mm7,mm0
|
||||||
|
movq mm4,mm5
|
||||||
|
psubw mm0,mm1 ; mm0=data3
|
||||||
|
psubw mm5,mm6 ; mm5=data7
|
||||||
|
paddw mm7,mm1 ; mm7=data5
|
||||||
|
paddw mm4,mm6 ; mm4=data1
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(3,0,edx,SIZEOF_DCTELEM)], mm0
|
||||||
|
movq MMWORD [MMBLOCK(7,0,edx,SIZEOF_DCTELEM)], mm5
|
||||||
|
movq MMWORD [MMBLOCK(5,0,edx,SIZEOF_DCTELEM)], mm7
|
||||||
|
movq MMWORD [MMBLOCK(1,0,edx,SIZEOF_DCTELEM)], mm4
|
||||||
|
|
||||||
|
add edx, byte 4*SIZEOF_DCTELEM
|
||||||
|
dec ecx
|
||||||
|
jnz near .columnloop
|
||||||
|
|
||||||
|
emms ; empty MMX state
|
||||||
|
|
||||||
|
; pop edi ; unused
|
||||||
|
; pop esi ; unused
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
poppic ebx
|
||||||
|
mov esp,ebp ; esp <- aligned ebp
|
||||||
|
pop esp ; esp <- original ebp
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; JFDCT_INT_MMX_SUPPORTED
|
||||||
|
%endif ; DCT_IFAST_SUPPORTED
|
||||||
629
jfmmxint.asm
Normal file
@@ -0,0 +1,629 @@
|
|||||||
|
;
|
||||||
|
; jfmmxint.asm - accurate integer FDCT (MMX)
|
||||||
|
;
|
||||||
|
; x86 SIMD extension for IJG JPEG library
|
||||||
|
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
; For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
;
|
||||||
|
; This file should be assembled with NASM (Netwide Assembler),
|
||||||
|
; can *not* be assembled with Microsoft's MASM or any compatible
|
||||||
|
; assembler (including Borland's Turbo Assembler).
|
||||||
|
; NASM is available from http://nasm.sourceforge.net/ or
|
||||||
|
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||||
|
;
|
||||||
|
; This file contains a slow-but-accurate integer implementation of the
|
||||||
|
; forward DCT (Discrete Cosine Transform). The following code is based
|
||||||
|
; directly on the IJG's original jfdctint.c; see the jfdctint.c for
|
||||||
|
; more details.
|
||||||
|
;
|
||||||
|
; Last Modified : February 4, 2006
|
||||||
|
;
|
||||||
|
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jdct.inc"
|
||||||
|
|
||||||
|
%ifdef DCT_ISLOW_SUPPORTED
|
||||||
|
%ifdef JFDCT_INT_MMX_SUPPORTED
|
||||||
|
|
||||||
|
; This module is specialized to the case DCTSIZE = 8.
|
||||||
|
;
|
||||||
|
%if DCTSIZE != 8
|
||||||
|
%error "Sorry, this code only copes with 8x8 DCTs."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
|
||||||
|
%define CONST_BITS 13
|
||||||
|
%define PASS1_BITS 2
|
||||||
|
|
||||||
|
%define DESCALE_P1 (CONST_BITS-PASS1_BITS)
|
||||||
|
%define DESCALE_P2 (CONST_BITS+PASS1_BITS)
|
||||||
|
|
||||||
|
%if CONST_BITS == 13
|
||||||
|
F_0_298 equ 2446 ; FIX(0.298631336)
|
||||||
|
F_0_390 equ 3196 ; FIX(0.390180644)
|
||||||
|
F_0_541 equ 4433 ; FIX(0.541196100)
|
||||||
|
F_0_765 equ 6270 ; FIX(0.765366865)
|
||||||
|
F_0_899 equ 7373 ; FIX(0.899976223)
|
||||||
|
F_1_175 equ 9633 ; FIX(1.175875602)
|
||||||
|
F_1_501 equ 12299 ; FIX(1.501321110)
|
||||||
|
F_1_847 equ 15137 ; FIX(1.847759065)
|
||||||
|
F_1_961 equ 16069 ; FIX(1.961570560)
|
||||||
|
F_2_053 equ 16819 ; FIX(2.053119869)
|
||||||
|
F_2_562 equ 20995 ; FIX(2.562915447)
|
||||||
|
F_3_072 equ 25172 ; FIX(3.072711026)
|
||||||
|
%else
|
||||||
|
; NASM cannot do compile-time arithmetic on floating-point constants.
|
||||||
|
%define DESCALE(x,n) (((x)+(1<<((n)-1)))>>(n))
|
||||||
|
F_0_298 equ DESCALE( 320652955,30-CONST_BITS) ; FIX(0.298631336)
|
||||||
|
F_0_390 equ DESCALE( 418953276,30-CONST_BITS) ; FIX(0.390180644)
|
||||||
|
F_0_541 equ DESCALE( 581104887,30-CONST_BITS) ; FIX(0.541196100)
|
||||||
|
F_0_765 equ DESCALE( 821806413,30-CONST_BITS) ; FIX(0.765366865)
|
||||||
|
F_0_899 equ DESCALE( 966342111,30-CONST_BITS) ; FIX(0.899976223)
|
||||||
|
F_1_175 equ DESCALE(1262586813,30-CONST_BITS) ; FIX(1.175875602)
|
||||||
|
F_1_501 equ DESCALE(1612031267,30-CONST_BITS) ; FIX(1.501321110)
|
||||||
|
F_1_847 equ DESCALE(1984016188,30-CONST_BITS) ; FIX(1.847759065)
|
||||||
|
F_1_961 equ DESCALE(2106220350,30-CONST_BITS) ; FIX(1.961570560)
|
||||||
|
F_2_053 equ DESCALE(2204520673,30-CONST_BITS) ; FIX(2.053119869)
|
||||||
|
F_2_562 equ DESCALE(2751909506,30-CONST_BITS) ; FIX(2.562915447)
|
||||||
|
F_3_072 equ DESCALE(3299298341,30-CONST_BITS) ; FIX(3.072711026)
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_CONST
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
global EXTN(jconst_fdct_islow_mmx)
|
||||||
|
|
||||||
|
EXTN(jconst_fdct_islow_mmx):
|
||||||
|
|
||||||
|
PW_F130_F054 times 2 dw (F_0_541+F_0_765), F_0_541
|
||||||
|
PW_F054_MF130 times 2 dw F_0_541, (F_0_541-F_1_847)
|
||||||
|
PW_MF078_F117 times 2 dw (F_1_175-F_1_961), F_1_175
|
||||||
|
PW_F117_F078 times 2 dw F_1_175, (F_1_175-F_0_390)
|
||||||
|
PW_MF060_MF089 times 2 dw (F_0_298-F_0_899),-F_0_899
|
||||||
|
PW_MF089_F060 times 2 dw -F_0_899, (F_1_501-F_0_899)
|
||||||
|
PW_MF050_MF256 times 2 dw (F_2_053-F_2_562),-F_2_562
|
||||||
|
PW_MF256_F050 times 2 dw -F_2_562, (F_3_072-F_2_562)
|
||||||
|
PD_DESCALE_P1 times 2 dd 1 << (DESCALE_P1-1)
|
||||||
|
PD_DESCALE_P2 times 2 dd 1 << (DESCALE_P2-1)
|
||||||
|
PW_DESCALE_P2X times 4 dw 1 << (PASS1_BITS-1)
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Perform the forward DCT on one block of samples.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_fdct_islow_mmx (DCTELEM * data)
|
||||||
|
;
|
||||||
|
|
||||||
|
%define data(b) (b)+8 ; DCTELEM * data
|
||||||
|
|
||||||
|
%define original_ebp ebp+0
|
||||||
|
%define wk(i) ebp-(WK_NUM-(i))*SIZEOF_MMWORD ; mmword wk[WK_NUM]
|
||||||
|
%define WK_NUM 2
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_fdct_islow_mmx)
|
||||||
|
|
||||||
|
EXTN(jpeg_fdct_islow_mmx):
|
||||||
|
push ebp
|
||||||
|
mov eax,esp ; eax = original ebp
|
||||||
|
sub esp, byte 4
|
||||||
|
and esp, byte (-SIZEOF_MMWORD) ; align to 64 bits
|
||||||
|
mov [esp],eax
|
||||||
|
mov ebp,esp ; ebp = aligned ebp
|
||||||
|
lea esp, [wk(0)]
|
||||||
|
pushpic ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
; push esi ; unused
|
||||||
|
; push edi ; unused
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
|
||||||
|
; ---- Pass 1: process rows.
|
||||||
|
|
||||||
|
mov edx, POINTER [data(eax)] ; (DCTELEM *)
|
||||||
|
mov ecx, DCTSIZE/4
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
|
||||||
|
movq mm0, MMWORD [MMBLOCK(2,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(3,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movq mm2, MMWORD [MMBLOCK(2,1,edx,SIZEOF_DCTELEM)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(3,1,edx,SIZEOF_DCTELEM)]
|
||||||
|
|
||||||
|
; mm0=(20 21 22 23), mm2=(24 25 26 27)
|
||||||
|
; mm1=(30 31 32 33), mm3=(34 35 36 37)
|
||||||
|
|
||||||
|
movq mm4,mm0 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd mm0,mm1 ; mm0=(20 30 21 31)
|
||||||
|
punpckhwd mm4,mm1 ; mm4=(22 32 23 33)
|
||||||
|
movq mm5,mm2 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd mm2,mm3 ; mm2=(24 34 25 35)
|
||||||
|
punpckhwd mm5,mm3 ; mm5=(26 36 27 37)
|
||||||
|
|
||||||
|
movq mm6, MMWORD [MMBLOCK(0,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movq mm7, MMWORD [MMBLOCK(1,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(0,1,edx,SIZEOF_DCTELEM)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(1,1,edx,SIZEOF_DCTELEM)]
|
||||||
|
|
||||||
|
; mm6=(00 01 02 03), mm1=(04 05 06 07)
|
||||||
|
; mm7=(10 11 12 13), mm3=(14 15 16 17)
|
||||||
|
|
||||||
|
movq MMWORD [wk(0)], mm4 ; wk(0)=(22 32 23 33)
|
||||||
|
movq MMWORD [wk(1)], mm2 ; wk(1)=(24 34 25 35)
|
||||||
|
|
||||||
|
movq mm4,mm6 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd mm6,mm7 ; mm6=(00 10 01 11)
|
||||||
|
punpckhwd mm4,mm7 ; mm4=(02 12 03 13)
|
||||||
|
movq mm2,mm1 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd mm1,mm3 ; mm1=(04 14 05 15)
|
||||||
|
punpckhwd mm2,mm3 ; mm2=(06 16 07 17)
|
||||||
|
|
||||||
|
movq mm7,mm6 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq mm6,mm0 ; mm6=(00 10 20 30)=data0
|
||||||
|
punpckhdq mm7,mm0 ; mm7=(01 11 21 31)=data1
|
||||||
|
movq mm3,mm2 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq mm2,mm5 ; mm2=(06 16 26 36)=data6
|
||||||
|
punpckhdq mm3,mm5 ; mm3=(07 17 27 37)=data7
|
||||||
|
|
||||||
|
movq mm0,mm7
|
||||||
|
movq mm5,mm6
|
||||||
|
psubw mm7,mm2 ; mm7=data1-data6=tmp6
|
||||||
|
psubw mm6,mm3 ; mm6=data0-data7=tmp7
|
||||||
|
paddw mm0,mm2 ; mm0=data1+data6=tmp1
|
||||||
|
paddw mm5,mm3 ; mm5=data0+data7=tmp0
|
||||||
|
|
||||||
|
movq mm2, MMWORD [wk(0)] ; mm2=(22 32 23 33)
|
||||||
|
movq mm3, MMWORD [wk(1)] ; mm3=(24 34 25 35)
|
||||||
|
movq MMWORD [wk(0)], mm7 ; wk(0)=tmp6
|
||||||
|
movq MMWORD [wk(1)], mm6 ; wk(1)=tmp7
|
||||||
|
|
||||||
|
movq mm7,mm4 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq mm4,mm2 ; mm4=(02 12 22 32)=data2
|
||||||
|
punpckhdq mm7,mm2 ; mm7=(03 13 23 33)=data3
|
||||||
|
movq mm6,mm1 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq mm1,mm3 ; mm1=(04 14 24 34)=data4
|
||||||
|
punpckhdq mm6,mm3 ; mm6=(05 15 25 35)=data5
|
||||||
|
|
||||||
|
movq mm2,mm7
|
||||||
|
movq mm3,mm4
|
||||||
|
paddw mm7,mm1 ; mm7=data3+data4=tmp3
|
||||||
|
paddw mm4,mm6 ; mm4=data2+data5=tmp2
|
||||||
|
psubw mm2,mm1 ; mm2=data3-data4=tmp4
|
||||||
|
psubw mm3,mm6 ; mm3=data2-data5=tmp5
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movq mm1,mm5
|
||||||
|
movq mm6,mm0
|
||||||
|
paddw mm5,mm7 ; mm5=tmp10
|
||||||
|
paddw mm0,mm4 ; mm0=tmp11
|
||||||
|
psubw mm1,mm7 ; mm1=tmp13
|
||||||
|
psubw mm6,mm4 ; mm6=tmp12
|
||||||
|
|
||||||
|
movq mm7,mm5
|
||||||
|
paddw mm5,mm0 ; mm5=tmp10+tmp11
|
||||||
|
psubw mm7,mm0 ; mm7=tmp10-tmp11
|
||||||
|
|
||||||
|
psllw mm5,PASS1_BITS ; mm5=data0
|
||||||
|
psllw mm7,PASS1_BITS ; mm7=data4
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(0,0,edx,SIZEOF_DCTELEM)], mm5
|
||||||
|
movq MMWORD [MMBLOCK(0,1,edx,SIZEOF_DCTELEM)], mm7
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; z1 = (tmp12 + tmp13) * 0.541196100;
|
||||||
|
; data2 = z1 + tmp13 * 0.765366865;
|
||||||
|
; data6 = z1 + tmp12 * -1.847759065;
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; data2 = tmp13 * (0.541196100 + 0.765366865) + tmp12 * 0.541196100;
|
||||||
|
; data6 = tmp13 * 0.541196100 + tmp12 * (0.541196100 - 1.847759065);
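;
; (Editorial note, not part of the original source)
; punpcklwd/punpckhwd interleave the words into (tmp13,tmp12) pairs, and
; pmaddwd multiplies each pair by a constant pair (c0,c1) and sums:
;   result = tmp13 * c0 + tmp12 * c1
; Folding the shared z1 term into the constants gives
;   0.541196100 + 0.765366865 =  1.306562965
;   0.541196100 - 1.847759065 = -1.306562965
; so, assuming PW_F130_F054 and PW_F054_MF130 (defined outside this
; excerpt) hold the fixed-point pairs (1.306562965, 0.541196100) and
; (0.541196100, -1.306562965), each output needs a single pmaddwd.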
|
||||||
|
|
||||||
|
movq mm4,mm1 ; mm1=tmp13
|
||||||
|
movq mm0,mm1
|
||||||
|
punpcklwd mm4,mm6 ; mm6=tmp12
|
||||||
|
punpckhwd mm0,mm6
|
||||||
|
movq mm1,mm4
|
||||||
|
movq mm6,mm0
|
||||||
|
pmaddwd mm4,[GOTOFF(ebx,PW_F130_F054)] ; mm4=data2L
|
||||||
|
pmaddwd mm0,[GOTOFF(ebx,PW_F130_F054)] ; mm0=data2H
|
||||||
|
pmaddwd mm1,[GOTOFF(ebx,PW_F054_MF130)] ; mm1=data6L
|
||||||
|
pmaddwd mm6,[GOTOFF(ebx,PW_F054_MF130)] ; mm6=data6H
|
||||||
|
|
||||||
|
paddd mm4,[GOTOFF(ebx,PD_DESCALE_P1)]
|
||||||
|
paddd mm0,[GOTOFF(ebx,PD_DESCALE_P1)]
|
||||||
|
psrad mm4,DESCALE_P1
|
||||||
|
psrad mm0,DESCALE_P1
|
||||||
|
paddd mm1,[GOTOFF(ebx,PD_DESCALE_P1)]
|
||||||
|
paddd mm6,[GOTOFF(ebx,PD_DESCALE_P1)]
|
||||||
|
psrad mm1,DESCALE_P1
|
||||||
|
psrad mm6,DESCALE_P1
|
||||||
|
|
||||||
|
packssdw mm4,mm0 ; mm4=data2
|
||||||
|
packssdw mm1,mm6 ; mm1=data6
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(2,0,edx,SIZEOF_DCTELEM)], mm4
|
||||||
|
movq MMWORD [MMBLOCK(2,1,edx,SIZEOF_DCTELEM)], mm1
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movq mm5, MMWORD [wk(0)] ; mm5=tmp6
|
||||||
|
movq mm7, MMWORD [wk(1)] ; mm7=tmp7
|
||||||
|
|
||||||
|
movq mm0,mm2 ; mm2=tmp4
|
||||||
|
movq mm6,mm3 ; mm3=tmp5
|
||||||
|
paddw mm0,mm5 ; mm0=z3
|
||||||
|
paddw mm6,mm7 ; mm6=z4
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; z5 = (z3 + z4) * 1.175875602;
|
||||||
|
; z3 = z3 * -1.961570560; z4 = z4 * -0.390180644;
|
||||||
|
; z3 += z5; z4 += z5;
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; z3 = z3 * (1.175875602 - 1.961570560) + z4 * 1.175875602;
|
||||||
|
; z4 = z3 * 1.175875602 + z4 * (1.175875602 - 0.390180644);
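;
; (Editorial note, not part of the original source)
; The folded coefficients work out to
;   1.175875602 - 1.961570560 = -0.785694958
;   1.175875602 - 0.390180644 =  0.785694958
; which matches the constant names used below, assuming PW_MF078_F117
; and PW_F117_F078 (defined outside this excerpt) hold the pairs
; (-0.785694958, 1.175875602) and (1.175875602, 0.785694958) in
; fixed point. The interleaved input pairs are (z3,z4).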
|
||||||
|
|
||||||
|
movq mm4,mm0
|
||||||
|
movq mm1,mm0
|
||||||
|
punpcklwd mm4,mm6
|
||||||
|
punpckhwd mm1,mm6
|
||||||
|
movq mm0,mm4
|
||||||
|
movq mm6,mm1
|
||||||
|
pmaddwd mm4,[GOTOFF(ebx,PW_MF078_F117)] ; mm4=z3L
|
||||||
|
pmaddwd mm1,[GOTOFF(ebx,PW_MF078_F117)] ; mm1=z3H
|
||||||
|
pmaddwd mm0,[GOTOFF(ebx,PW_F117_F078)] ; mm0=z4L
|
||||||
|
pmaddwd mm6,[GOTOFF(ebx,PW_F117_F078)] ; mm6=z4H
|
||||||
|
|
||||||
|
movq MMWORD [wk(0)], mm4 ; wk(0)=z3L
|
||||||
|
movq MMWORD [wk(1)], mm1 ; wk(1)=z3H
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; z1 = tmp4 + tmp7; z2 = tmp5 + tmp6;
|
||||||
|
; tmp4 = tmp4 * 0.298631336; tmp5 = tmp5 * 2.053119869;
|
||||||
|
; tmp6 = tmp6 * 3.072711026; tmp7 = tmp7 * 1.501321110;
|
||||||
|
; z1 = z1 * -0.899976223; z2 = z2 * -2.562915447;
|
||||||
|
; data7 = tmp4 + z1 + z3; data5 = tmp5 + z2 + z4;
|
||||||
|
; data3 = tmp6 + z2 + z3; data1 = tmp7 + z1 + z4;
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; tmp4 = tmp4 * (0.298631336 - 0.899976223) + tmp7 * -0.899976223;
|
||||||
|
; tmp5 = tmp5 * (2.053119869 - 2.562915447) + tmp6 * -2.562915447;
|
||||||
|
; tmp6 = tmp5 * -2.562915447 + tmp6 * (3.072711026 - 2.562915447);
|
||||||
|
; tmp7 = tmp4 * -0.899976223 + tmp7 * (1.501321110 - 0.899976223);
|
||||||
|
; data7 = tmp4 + z3; data5 = tmp5 + z4;
|
||||||
|
; data3 = tmp6 + z3; data1 = tmp7 + z4;
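;
; (Editorial note, not part of the original source)
; The folded coefficients work out to
;   0.298631336 - 0.899976223 = -0.601344887
;   1.501321110 - 0.899976223 =  0.601344887
;   2.053119869 - 2.562915447 = -0.509795578
;   3.072711026 - 2.562915447 =  0.509795579
; which is where the 060/089 and 050/256 parts of the PW_* constant
; names used below appear to come from (the tables themselves are
; defined outside this excerpt).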
|
||||||
|
|
||||||
|
movq mm4,mm2
|
||||||
|
movq mm1,mm2
|
||||||
|
punpcklwd mm4,mm7
|
||||||
|
punpckhwd mm1,mm7
|
||||||
|
movq mm2,mm4
|
||||||
|
movq mm7,mm1
|
||||||
|
pmaddwd mm4,[GOTOFF(ebx,PW_MF060_MF089)] ; mm4=tmp4L
|
||||||
|
pmaddwd mm1,[GOTOFF(ebx,PW_MF060_MF089)] ; mm1=tmp4H
|
||||||
|
pmaddwd mm2,[GOTOFF(ebx,PW_MF089_F060)] ; mm2=tmp7L
|
||||||
|
pmaddwd mm7,[GOTOFF(ebx,PW_MF089_F060)] ; mm7=tmp7H
|
||||||
|
|
||||||
|
paddd mm4, MMWORD [wk(0)] ; mm4=data7L
|
||||||
|
paddd mm1, MMWORD [wk(1)] ; mm1=data7H
|
||||||
|
paddd mm2,mm0 ; mm2=data1L
|
||||||
|
paddd mm7,mm6 ; mm7=data1H
|
||||||
|
|
||||||
|
paddd mm4,[GOTOFF(ebx,PD_DESCALE_P1)]
|
||||||
|
paddd mm1,[GOTOFF(ebx,PD_DESCALE_P1)]
|
||||||
|
psrad mm4,DESCALE_P1
|
||||||
|
psrad mm1,DESCALE_P1
|
||||||
|
paddd mm2,[GOTOFF(ebx,PD_DESCALE_P1)]
|
||||||
|
paddd mm7,[GOTOFF(ebx,PD_DESCALE_P1)]
|
||||||
|
psrad mm2,DESCALE_P1
|
||||||
|
psrad mm7,DESCALE_P1
|
||||||
|
|
||||||
|
packssdw mm4,mm1 ; mm4=data7
|
||||||
|
packssdw mm2,mm7 ; mm2=data1
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(3,1,edx,SIZEOF_DCTELEM)], mm4
|
||||||
|
movq MMWORD [MMBLOCK(1,0,edx,SIZEOF_DCTELEM)], mm2
|
||||||
|
|
||||||
|
movq mm1,mm3
|
||||||
|
movq mm7,mm3
|
||||||
|
punpcklwd mm1,mm5
|
||||||
|
punpckhwd mm7,mm5
|
||||||
|
movq mm3,mm1
|
||||||
|
movq mm5,mm7
|
||||||
|
pmaddwd mm1,[GOTOFF(ebx,PW_MF050_MF256)] ; mm1=tmp5L
|
||||||
|
pmaddwd mm7,[GOTOFF(ebx,PW_MF050_MF256)] ; mm7=tmp5H
|
||||||
|
pmaddwd mm3,[GOTOFF(ebx,PW_MF256_F050)] ; mm3=tmp6L
|
||||||
|
pmaddwd mm5,[GOTOFF(ebx,PW_MF256_F050)] ; mm5=tmp6H
|
||||||
|
|
||||||
|
paddd mm1,mm0 ; mm1=data5L
|
||||||
|
paddd mm7,mm6 ; mm7=data5H
|
||||||
|
paddd mm3, MMWORD [wk(0)] ; mm3=data3L
|
||||||
|
paddd mm5, MMWORD [wk(1)] ; mm5=data3H
|
||||||
|
|
||||||
|
paddd mm1,[GOTOFF(ebx,PD_DESCALE_P1)]
|
||||||
|
paddd mm7,[GOTOFF(ebx,PD_DESCALE_P1)]
|
||||||
|
psrad mm1,DESCALE_P1
|
||||||
|
psrad mm7,DESCALE_P1
|
||||||
|
paddd mm3,[GOTOFF(ebx,PD_DESCALE_P1)]
|
||||||
|
paddd mm5,[GOTOFF(ebx,PD_DESCALE_P1)]
|
||||||
|
psrad mm3,DESCALE_P1
|
||||||
|
psrad mm5,DESCALE_P1
|
||||||
|
|
||||||
|
packssdw mm1,mm7 ; mm1=data5
|
||||||
|
packssdw mm3,mm5 ; mm3=data3
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(1,1,edx,SIZEOF_DCTELEM)], mm1
|
||||||
|
movq MMWORD [MMBLOCK(3,0,edx,SIZEOF_DCTELEM)], mm3
|
||||||
|
|
||||||
|
add edx, byte 4*DCTSIZE*SIZEOF_DCTELEM
|
||||||
|
dec ecx
|
||||||
|
jnz near .rowloop
|
||||||
|
|
||||||
|
; ---- Pass 2: process columns.
|
||||||
|
|
||||||
|
mov edx, POINTER [data(eax)] ; (DCTELEM *)
|
||||||
|
mov ecx, DCTSIZE/4
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
|
||||||
|
movq mm0, MMWORD [MMBLOCK(2,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(3,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movq mm2, MMWORD [MMBLOCK(6,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(7,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
|
||||||
|
; mm0=(02 12 22 32), mm2=(42 52 62 72)
|
||||||
|
; mm1=(03 13 23 33), mm3=(43 53 63 73)
|
||||||
|
|
||||||
|
movq mm4,mm0 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd mm0,mm1 ; mm0=(02 03 12 13)
|
||||||
|
punpckhwd mm4,mm1 ; mm4=(22 23 32 33)
|
||||||
|
movq mm5,mm2 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd mm2,mm3 ; mm2=(42 43 52 53)
|
||||||
|
punpckhwd mm5,mm3 ; mm5=(62 63 72 73)
|
||||||
|
|
||||||
|
movq mm6, MMWORD [MMBLOCK(0,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movq mm7, MMWORD [MMBLOCK(1,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(4,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(5,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
|
||||||
|
; mm6=(00 10 20 30), mm1=(40 50 60 70)
|
||||||
|
; mm7=(01 11 21 31), mm3=(41 51 61 71)
|
||||||
|
|
||||||
|
movq MMWORD [wk(0)], mm4 ; wk(0)=(22 23 32 33)
|
||||||
|
movq MMWORD [wk(1)], mm2 ; wk(1)=(42 43 52 53)
|
||||||
|
|
||||||
|
movq mm4,mm6 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd mm6,mm7 ; mm6=(00 01 10 11)
|
||||||
|
punpckhwd mm4,mm7 ; mm4=(20 21 30 31)
|
||||||
|
movq mm2,mm1 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd mm1,mm3 ; mm1=(40 41 50 51)
|
||||||
|
punpckhwd mm2,mm3 ; mm2=(60 61 70 71)
|
||||||
|
|
||||||
|
movq mm7,mm6 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq mm6,mm0 ; mm6=(00 01 02 03)=data0
|
||||||
|
punpckhdq mm7,mm0 ; mm7=(10 11 12 13)=data1
|
||||||
|
movq mm3,mm2 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq mm2,mm5 ; mm2=(60 61 62 63)=data6
|
||||||
|
punpckhdq mm3,mm5 ; mm3=(70 71 72 73)=data7
|
||||||
|
|
||||||
|
movq mm0,mm7
|
||||||
|
movq mm5,mm6
|
||||||
|
psubw mm7,mm2 ; mm7=data1-data6=tmp6
|
||||||
|
psubw mm6,mm3 ; mm6=data0-data7=tmp7
|
||||||
|
paddw mm0,mm2 ; mm0=data1+data6=tmp1
|
||||||
|
paddw mm5,mm3 ; mm5=data0+data7=tmp0
|
||||||
|
|
||||||
|
movq mm2, MMWORD [wk(0)] ; mm2=(22 23 32 33)
|
||||||
|
movq mm3, MMWORD [wk(1)] ; mm3=(42 43 52 53)
|
||||||
|
movq MMWORD [wk(0)], mm7 ; wk(0)=tmp6
|
||||||
|
movq MMWORD [wk(1)], mm6 ; wk(1)=tmp7
|
||||||
|
|
||||||
|
movq mm7,mm4 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq mm4,mm2 ; mm4=(20 21 22 23)=data2
|
||||||
|
punpckhdq mm7,mm2 ; mm7=(30 31 32 33)=data3
|
||||||
|
movq mm6,mm1 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq mm1,mm3 ; mm1=(40 41 42 43)=data4
|
||||||
|
punpckhdq mm6,mm3 ; mm6=(50 51 52 53)=data5
|
||||||
|
|
||||||
|
movq mm2,mm7
|
||||||
|
movq mm3,mm4
|
||||||
|
paddw mm7,mm1 ; mm7=data3+data4=tmp3
|
||||||
|
paddw mm4,mm6 ; mm4=data2+data5=tmp2
|
||||||
|
psubw mm2,mm1 ; mm2=data3-data4=tmp4
|
||||||
|
psubw mm3,mm6 ; mm3=data2-data5=tmp5
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movq mm1,mm5
|
||||||
|
movq mm6,mm0
|
||||||
|
paddw mm5,mm7 ; mm5=tmp10
|
||||||
|
paddw mm0,mm4 ; mm0=tmp11
|
||||||
|
psubw mm1,mm7 ; mm1=tmp13
|
||||||
|
psubw mm6,mm4 ; mm6=tmp12
|
||||||
|
|
||||||
|
movq mm7,mm5
|
||||||
|
paddw mm5,mm0 ; mm5=tmp10+tmp11
|
||||||
|
psubw mm7,mm0 ; mm7=tmp10-tmp11
|
||||||
|
|
||||||
|
paddw mm5,[GOTOFF(ebx,PW_DESCALE_P2X)]
|
||||||
|
paddw mm7,[GOTOFF(ebx,PW_DESCALE_P2X)]
|
||||||
|
psraw mm5,PASS1_BITS ; mm5=data0
|
||||||
|
psraw mm7,PASS1_BITS ; mm7=data4
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(0,0,edx,SIZEOF_DCTELEM)], mm5
|
||||||
|
movq MMWORD [MMBLOCK(4,0,edx,SIZEOF_DCTELEM)], mm7
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; z1 = (tmp12 + tmp13) * 0.541196100;
|
||||||
|
; data2 = z1 + tmp13 * 0.765366865;
|
||||||
|
; data6 = z1 + tmp12 * -1.847759065;
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; data2 = tmp13 * (0.541196100 + 0.765366865) + tmp12 * 0.541196100;
|
||||||
|
; data6 = tmp13 * 0.541196100 + tmp12 * (0.541196100 - 1.847759065);
|
||||||
|
|
||||||
|
movq mm4,mm1 ; mm1=tmp13
|
||||||
|
movq mm0,mm1
|
||||||
|
punpcklwd mm4,mm6 ; mm6=tmp12
|
||||||
|
punpckhwd mm0,mm6
|
||||||
|
movq mm1,mm4
|
||||||
|
movq mm6,mm0
|
||||||
|
pmaddwd mm4,[GOTOFF(ebx,PW_F130_F054)] ; mm4=data2L
|
||||||
|
pmaddwd mm0,[GOTOFF(ebx,PW_F130_F054)] ; mm0=data2H
|
||||||
|
pmaddwd mm1,[GOTOFF(ebx,PW_F054_MF130)] ; mm1=data6L
|
||||||
|
pmaddwd mm6,[GOTOFF(ebx,PW_F054_MF130)] ; mm6=data6H
|
||||||
|
|
||||||
|
paddd mm4,[GOTOFF(ebx,PD_DESCALE_P2)]
|
||||||
|
paddd mm0,[GOTOFF(ebx,PD_DESCALE_P2)]
|
||||||
|
psrad mm4,DESCALE_P2
|
||||||
|
psrad mm0,DESCALE_P2
|
||||||
|
paddd mm1,[GOTOFF(ebx,PD_DESCALE_P2)]
|
||||||
|
paddd mm6,[GOTOFF(ebx,PD_DESCALE_P2)]
|
||||||
|
psrad mm1,DESCALE_P2
|
||||||
|
psrad mm6,DESCALE_P2
|
||||||
|
|
||||||
|
packssdw mm4,mm0 ; mm4=data2
|
||||||
|
packssdw mm1,mm6 ; mm1=data6
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(2,0,edx,SIZEOF_DCTELEM)], mm4
|
||||||
|
movq MMWORD [MMBLOCK(6,0,edx,SIZEOF_DCTELEM)], mm1
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movq mm5, MMWORD [wk(0)] ; mm5=tmp6
|
||||||
|
movq mm7, MMWORD [wk(1)] ; mm7=tmp7
|
||||||
|
|
||||||
|
movq mm0,mm2 ; mm2=tmp4
|
||||||
|
movq mm6,mm3 ; mm3=tmp5
|
||||||
|
paddw mm0,mm5 ; mm0=z3
|
||||||
|
paddw mm6,mm7 ; mm6=z4
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; z5 = (z3 + z4) * 1.175875602;
|
||||||
|
; z3 = z3 * -1.961570560; z4 = z4 * -0.390180644;
|
||||||
|
; z3 += z5; z4 += z5;
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; z3 = z3 * (1.175875602 - 1.961570560) + z4 * 1.175875602;
|
||||||
|
; z4 = z3 * 1.175875602 + z4 * (1.175875602 - 0.390180644);
|
||||||
|
|
||||||
|
movq mm4,mm0
|
||||||
|
movq mm1,mm0
|
||||||
|
punpcklwd mm4,mm6
|
||||||
|
punpckhwd mm1,mm6
|
||||||
|
movq mm0,mm4
|
||||||
|
movq mm6,mm1
|
||||||
|
pmaddwd mm4,[GOTOFF(ebx,PW_MF078_F117)] ; mm4=z3L
|
||||||
|
pmaddwd mm1,[GOTOFF(ebx,PW_MF078_F117)] ; mm1=z3H
|
||||||
|
pmaddwd mm0,[GOTOFF(ebx,PW_F117_F078)] ; mm0=z4L
|
||||||
|
pmaddwd mm6,[GOTOFF(ebx,PW_F117_F078)] ; mm6=z4H
|
||||||
|
|
||||||
|
movq MMWORD [wk(0)], mm4 ; wk(0)=z3L
|
||||||
|
movq MMWORD [wk(1)], mm1 ; wk(1)=z3H
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; z1 = tmp4 + tmp7; z2 = tmp5 + tmp6;
|
||||||
|
; tmp4 = tmp4 * 0.298631336; tmp5 = tmp5 * 2.053119869;
|
||||||
|
; tmp6 = tmp6 * 3.072711026; tmp7 = tmp7 * 1.501321110;
|
||||||
|
; z1 = z1 * -0.899976223; z2 = z2 * -2.562915447;
|
||||||
|
; data7 = tmp4 + z1 + z3; data5 = tmp5 + z2 + z4;
|
||||||
|
; data3 = tmp6 + z2 + z3; data1 = tmp7 + z1 + z4;
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; tmp4 = tmp4 * (0.298631336 - 0.899976223) + tmp7 * -0.899976223;
|
||||||
|
; tmp5 = tmp5 * (2.053119869 - 2.562915447) + tmp6 * -2.562915447;
|
||||||
|
; tmp6 = tmp5 * -2.562915447 + tmp6 * (3.072711026 - 2.562915447);
|
||||||
|
; tmp7 = tmp4 * -0.899976223 + tmp7 * (1.501321110 - 0.899976223);
|
||||||
|
; data7 = tmp4 + z3; data5 = tmp5 + z4;
|
||||||
|
; data3 = tmp6 + z3; data1 = tmp7 + z4;
|
||||||
|
|
||||||
|
movq mm4,mm2
|
||||||
|
movq mm1,mm2
|
||||||
|
punpcklwd mm4,mm7
|
||||||
|
punpckhwd mm1,mm7
|
||||||
|
movq mm2,mm4
|
||||||
|
movq mm7,mm1
|
||||||
|
pmaddwd mm4,[GOTOFF(ebx,PW_MF060_MF089)] ; mm4=tmp4L
|
||||||
|
pmaddwd mm1,[GOTOFF(ebx,PW_MF060_MF089)] ; mm1=tmp4H
|
||||||
|
pmaddwd mm2,[GOTOFF(ebx,PW_MF089_F060)] ; mm2=tmp7L
|
||||||
|
pmaddwd mm7,[GOTOFF(ebx,PW_MF089_F060)] ; mm7=tmp7H
|
||||||
|
|
||||||
|
paddd mm4, MMWORD [wk(0)] ; mm4=data7L
|
||||||
|
paddd mm1, MMWORD [wk(1)] ; mm1=data7H
|
||||||
|
paddd mm2,mm0 ; mm2=data1L
|
||||||
|
paddd mm7,mm6 ; mm7=data1H
|
||||||
|
|
||||||
|
paddd mm4,[GOTOFF(ebx,PD_DESCALE_P2)]
|
||||||
|
paddd mm1,[GOTOFF(ebx,PD_DESCALE_P2)]
|
||||||
|
psrad mm4,DESCALE_P2
|
||||||
|
psrad mm1,DESCALE_P2
|
||||||
|
paddd mm2,[GOTOFF(ebx,PD_DESCALE_P2)]
|
||||||
|
paddd mm7,[GOTOFF(ebx,PD_DESCALE_P2)]
|
||||||
|
psrad mm2,DESCALE_P2
|
||||||
|
psrad mm7,DESCALE_P2
|
||||||
|
|
||||||
|
packssdw mm4,mm1 ; mm4=data7
|
||||||
|
packssdw mm2,mm7 ; mm2=data1
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(7,0,edx,SIZEOF_DCTELEM)], mm4
|
||||||
|
movq MMWORD [MMBLOCK(1,0,edx,SIZEOF_DCTELEM)], mm2
|
||||||
|
|
||||||
|
movq mm1,mm3
|
||||||
|
movq mm7,mm3
|
||||||
|
punpcklwd mm1,mm5
|
||||||
|
punpckhwd mm7,mm5
|
||||||
|
movq mm3,mm1
|
||||||
|
movq mm5,mm7
|
||||||
|
pmaddwd mm1,[GOTOFF(ebx,PW_MF050_MF256)] ; mm1=tmp5L
|
||||||
|
pmaddwd mm7,[GOTOFF(ebx,PW_MF050_MF256)] ; mm7=tmp5H
|
||||||
|
pmaddwd mm3,[GOTOFF(ebx,PW_MF256_F050)] ; mm3=tmp6L
|
||||||
|
pmaddwd mm5,[GOTOFF(ebx,PW_MF256_F050)] ; mm5=tmp6H
|
||||||
|
|
||||||
|
paddd mm1,mm0 ; mm1=data5L
|
||||||
|
paddd mm7,mm6 ; mm7=data5H
|
||||||
|
paddd mm3, MMWORD [wk(0)] ; mm3=data3L
|
||||||
|
paddd mm5, MMWORD [wk(1)] ; mm5=data3H
|
||||||
|
|
||||||
|
paddd mm1,[GOTOFF(ebx,PD_DESCALE_P2)]
|
||||||
|
paddd mm7,[GOTOFF(ebx,PD_DESCALE_P2)]
|
||||||
|
psrad mm1,DESCALE_P2
|
||||||
|
psrad mm7,DESCALE_P2
|
||||||
|
paddd mm3,[GOTOFF(ebx,PD_DESCALE_P2)]
|
||||||
|
paddd mm5,[GOTOFF(ebx,PD_DESCALE_P2)]
|
||||||
|
psrad mm3,DESCALE_P2
|
||||||
|
psrad mm5,DESCALE_P2
|
||||||
|
|
||||||
|
packssdw mm1,mm7 ; mm1=data5
|
||||||
|
packssdw mm3,mm5 ; mm3=data3
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(5,0,edx,SIZEOF_DCTELEM)], mm1
|
||||||
|
movq MMWORD [MMBLOCK(3,0,edx,SIZEOF_DCTELEM)], mm3
|
||||||
|
|
||||||
|
add edx, byte 4*SIZEOF_DCTELEM
|
||||||
|
dec ecx
|
||||||
|
jnz near .columnloop
|
||||||
|
|
||||||
|
emms ; empty MMX state
|
||||||
|
|
||||||
|
; pop edi ; unused
|
||||||
|
; pop esi ; unused
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
poppic ebx
|
||||||
|
mov esp,ebp ; esp <- aligned ebp
|
||||||
|
pop esp ; esp <- original ebp
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; JFDCT_INT_MMX_SUPPORTED
|
||||||
|
%endif ; DCT_ISLOW_SUPPORTED
|
||||||
411
jfss2fst.asm
Normal file
@@ -0,0 +1,411 @@
|
|||||||
|
;
|
||||||
|
; jfss2fst.asm - fast integer FDCT (SSE2)
|
||||||
|
;
|
||||||
|
; x86 SIMD extension for IJG JPEG library
|
||||||
|
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
; For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
;
|
||||||
|
; This file should be assembled with NASM (Netwide Assembler),
|
||||||
|
; and can *not* be assembled with Microsoft's MASM or any compatible
|
||||||
|
; assembler (including Borland's Turbo Assembler).
|
||||||
|
; NASM is available from http://nasm.sourceforge.net/ or
|
||||||
|
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||||
|
;
|
||||||
|
; This file contains a fast, not so accurate integer implementation of
|
||||||
|
; the forward DCT (Discrete Cosine Transform). The following code is
|
||||||
|
; based directly on the IJG's original jfdctfst.c; see jfdctfst.c
|
||||||
|
; for more details.
|
||||||
|
;
|
||||||
|
; Last Modified : February 4, 2006
|
||||||
|
;
|
||||||
|
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jdct.inc"
|
||||||
|
|
||||||
|
%ifdef DCT_IFAST_SUPPORTED
|
||||||
|
%ifdef JFDCT_INT_SSE2_SUPPORTED
|
||||||
|
|
||||||
|
; This module is specialized to the case DCTSIZE = 8.
|
||||||
|
;
|
||||||
|
%if DCTSIZE != 8
|
||||||
|
%error "Sorry, this code only copes with 8x8 DCTs."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
|
||||||
|
%define CONST_BITS 8 ; 14 is also OK.
|
||||||
|
|
||||||
|
%if CONST_BITS == 8
|
||||||
|
F_0_382 equ 98 ; FIX(0.382683433)
|
||||||
|
F_0_541 equ 139 ; FIX(0.541196100)
|
||||||
|
F_0_707 equ 181 ; FIX(0.707106781)
|
||||||
|
F_1_306 equ 334 ; FIX(1.306562965)
|
||||||
|
%else
|
||||||
|
; NASM cannot do compile-time arithmetic on floating-point constants.
|
||||||
|
%define DESCALE(x,n) (((x)+(1<<((n)-1)))>>(n))
|
||||||
|
F_0_382 equ DESCALE( 410903207,30-CONST_BITS) ; FIX(0.382683433)
|
||||||
|
F_0_541 equ DESCALE( 581104887,30-CONST_BITS) ; FIX(0.541196100)
|
||||||
|
F_0_707 equ DESCALE( 759250124,30-CONST_BITS) ; FIX(0.707106781)
|
||||||
|
F_1_306 equ DESCALE(1402911301,30-CONST_BITS) ; FIX(1.306562965)
|
||||||
|
%endif
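; (Editorial note, not part of the original source)
; Worked example of DESCALE for the default CONST_BITS = 8, where the
; large integers are the constants scaled by 2^30:
;   DESCALE(410903207, 30-8) = (410903207 + (1<<21)) >> 22 = 98
; which agrees with the hard-coded F_0_382 in the CONST_BITS == 8 branch.
; The other three entries round to 139, 181 and 334 in the same way.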
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_CONST
|
||||||
|
|
||||||
|
; PRE_MULTIPLY_SCALE_BITS <= 2 (to avoid overflow)
|
||||||
|
; CONST_BITS + CONST_SHIFT + PRE_MULTIPLY_SCALE_BITS == 16 (for pmulhw)
|
||||||
|
|
||||||
|
%define PRE_MULTIPLY_SCALE_BITS 2
|
||||||
|
%define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)
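; (Editorial note, not part of the original source)
; pmulhw keeps the high 16 bits of each signed 16x16-bit product, i.e.
; (a * b) >> 16.  With a = x << PRE_MULTIPLY_SCALE_BITS and
; b = F << CONST_SHIFT, the constraint above makes this
;   ((x << 2) * (F << 6)) >> 16 = (x * F) >> CONST_BITS   (CONST_BITS = 8)
; For example, PW_F0707 stores 181 << 6 = 11584, and the largest entry,
; 334 << 6 = 21376, still fits in a signed 16-bit word.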
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
global EXTN(jconst_fdct_ifast_sse2)
|
||||||
|
|
||||||
|
EXTN(jconst_fdct_ifast_sse2):
|
||||||
|
|
||||||
|
PW_F0707 times 8 dw F_0_707 << CONST_SHIFT
|
||||||
|
PW_F0382 times 8 dw F_0_382 << CONST_SHIFT
|
||||||
|
PW_F0541 times 8 dw F_0_541 << CONST_SHIFT
|
||||||
|
PW_F1306 times 8 dw F_1_306 << CONST_SHIFT
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Perform the forward DCT on one block of samples.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_fdct_ifast_sse2 (DCTELEM * data)
|
||||||
|
;
|
||||||
|
|
||||||
|
%define data(b) (b)+8 ; DCTELEM * data
|
||||||
|
|
||||||
|
%define original_ebp ebp+0
|
||||||
|
%define wk(i) ebp-(WK_NUM-(i))*SIZEOF_XMMWORD ; xmmword wk[WK_NUM]
|
||||||
|
%define WK_NUM 2
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_fdct_ifast_sse2)
|
||||||
|
|
||||||
|
EXTN(jpeg_fdct_ifast_sse2):
|
||||||
|
push ebp
|
||||||
|
mov eax,esp ; eax = original ebp
|
||||||
|
sub esp, byte 4
|
||||||
|
and esp, byte (-SIZEOF_XMMWORD) ; align to 128 bits
|
||||||
|
mov [esp],eax
|
||||||
|
mov ebp,esp ; ebp = aligned ebp
|
||||||
|
lea esp, [wk(0)]
|
||||||
|
pushpic ebx
|
||||||
|
; push ecx ; unused
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
; push esi ; unused
|
||||||
|
; push edi ; unused
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
|
||||||
|
; ---- Pass 1: process rows.
|
||||||
|
|
||||||
|
mov edx, POINTER [data(eax)] ; (DCTELEM *)
|
||||||
|
|
||||||
|
movdqa xmm0, XMMWORD [XMMBLOCK(0,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movdqa xmm1, XMMWORD [XMMBLOCK(1,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movdqa xmm2, XMMWORD [XMMBLOCK(2,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movdqa xmm3, XMMWORD [XMMBLOCK(3,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
|
||||||
|
; xmm0=(00 01 02 03 04 05 06 07), xmm2=(20 21 22 23 24 25 26 27)
|
||||||
|
; xmm1=(10 11 12 13 14 15 16 17), xmm3=(30 31 32 33 34 35 36 37)
|
||||||
|
|
||||||
|
movdqa xmm4,xmm0 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd xmm0,xmm1 ; xmm0=(00 10 01 11 02 12 03 13)
|
||||||
|
punpckhwd xmm4,xmm1 ; xmm4=(04 14 05 15 06 16 07 17)
|
||||||
|
movdqa xmm5,xmm2 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd xmm2,xmm3 ; xmm2=(20 30 21 31 22 32 23 33)
|
||||||
|
punpckhwd xmm5,xmm3 ; xmm5=(24 34 25 35 26 36 27 37)
|
||||||
|
|
||||||
|
movdqa xmm6, XMMWORD [XMMBLOCK(4,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movdqa xmm7, XMMWORD [XMMBLOCK(5,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movdqa xmm1, XMMWORD [XMMBLOCK(6,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movdqa xmm3, XMMWORD [XMMBLOCK(7,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
|
||||||
|
; xmm6=(40 41 42 43 44 45 46 47), xmm1=(60 61 62 63 64 65 66 67)
|
||||||
|
; xmm7=(50 51 52 53 54 55 56 57), xmm3=(70 71 72 73 74 75 76 77)
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(0)], xmm2 ; wk(0)=(20 30 21 31 22 32 23 33)
|
||||||
|
movdqa XMMWORD [wk(1)], xmm5 ; wk(1)=(24 34 25 35 26 36 27 37)
|
||||||
|
|
||||||
|
movdqa xmm2,xmm6 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd xmm6,xmm7 ; xmm6=(40 50 41 51 42 52 43 53)
|
||||||
|
punpckhwd xmm2,xmm7 ; xmm2=(44 54 45 55 46 56 47 57)
|
||||||
|
movdqa xmm5,xmm1 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd xmm1,xmm3 ; xmm1=(60 70 61 71 62 72 63 73)
|
||||||
|
punpckhwd xmm5,xmm3 ; xmm5=(64 74 65 75 66 76 67 77)
|
||||||
|
|
||||||
|
movdqa xmm7,xmm6 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq xmm6,xmm1 ; xmm6=(40 50 60 70 41 51 61 71)
|
||||||
|
punpckhdq xmm7,xmm1 ; xmm7=(42 52 62 72 43 53 63 73)
|
||||||
|
movdqa xmm3,xmm2 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq xmm2,xmm5 ; xmm2=(44 54 64 74 45 55 65 75)
|
||||||
|
punpckhdq xmm3,xmm5 ; xmm3=(46 56 66 76 47 57 67 77)
|
||||||
|
|
||||||
|
movdqa xmm1, XMMWORD [wk(0)] ; xmm1=(20 30 21 31 22 32 23 33)
|
||||||
|
movdqa xmm5, XMMWORD [wk(1)] ; xmm5=(24 34 25 35 26 36 27 37)
|
||||||
|
movdqa XMMWORD [wk(0)], xmm7 ; wk(0)=(42 52 62 72 43 53 63 73)
|
||||||
|
movdqa XMMWORD [wk(1)], xmm2 ; wk(1)=(44 54 64 74 45 55 65 75)
|
||||||
|
|
||||||
|
movdqa xmm7,xmm0 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq xmm0,xmm1 ; xmm0=(00 10 20 30 01 11 21 31)
|
||||||
|
punpckhdq xmm7,xmm1 ; xmm7=(02 12 22 32 03 13 23 33)
|
||||||
|
movdqa xmm2,xmm4 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq xmm4,xmm5 ; xmm4=(04 14 24 34 05 15 25 35)
|
||||||
|
punpckhdq xmm2,xmm5 ; xmm2=(06 16 26 36 07 17 27 37)
|
||||||
|
|
||||||
|
movdqa xmm1,xmm0 ; transpose coefficients(phase 3)
|
||||||
|
punpcklqdq xmm0,xmm6 ; xmm0=(00 10 20 30 40 50 60 70)=data0
|
||||||
|
punpckhqdq xmm1,xmm6 ; xmm1=(01 11 21 31 41 51 61 71)=data1
|
||||||
|
movdqa xmm5,xmm2 ; transpose coefficients(phase 3)
|
||||||
|
punpcklqdq xmm2,xmm3 ; xmm2=(06 16 26 36 46 56 66 76)=data6
|
||||||
|
punpckhqdq xmm5,xmm3 ; xmm5=(07 17 27 37 47 57 67 77)=data7
|
||||||
|
|
||||||
|
movdqa xmm6,xmm1
|
||||||
|
movdqa xmm3,xmm0
|
||||||
|
psubw xmm1,xmm2 ; xmm1=data1-data6=tmp6
|
||||||
|
psubw xmm0,xmm5 ; xmm0=data0-data7=tmp7
|
||||||
|
paddw xmm6,xmm2 ; xmm6=data1+data6=tmp1
|
||||||
|
paddw xmm3,xmm5 ; xmm3=data0+data7=tmp0
|
||||||
|
|
||||||
|
movdqa xmm2, XMMWORD [wk(0)] ; xmm2=(42 52 62 72 43 53 63 73)
|
||||||
|
movdqa xmm5, XMMWORD [wk(1)] ; xmm5=(44 54 64 74 45 55 65 75)
|
||||||
|
movdqa XMMWORD [wk(0)], xmm1 ; wk(0)=tmp6
|
||||||
|
movdqa XMMWORD [wk(1)], xmm0 ; wk(1)=tmp7
|
||||||
|
|
||||||
|
movdqa xmm1,xmm7 ; transpose coefficients(phase 3)
|
||||||
|
punpcklqdq xmm7,xmm2 ; xmm7=(02 12 22 32 42 52 62 72)=data2
|
||||||
|
punpckhqdq xmm1,xmm2 ; xmm1=(03 13 23 33 43 53 63 73)=data3
|
||||||
|
movdqa xmm0,xmm4 ; transpose coefficients(phase 3)
|
||||||
|
punpcklqdq xmm4,xmm5 ; xmm4=(04 14 24 34 44 54 64 74)=data4
|
||||||
|
punpckhqdq xmm0,xmm5 ; xmm0=(05 15 25 35 45 55 65 75)=data5
|
||||||
|
|
||||||
|
movdqa xmm2,xmm1
|
||||||
|
movdqa xmm5,xmm7
|
||||||
|
paddw xmm1,xmm4 ; xmm1=data3+data4=tmp3
|
||||||
|
paddw xmm7,xmm0 ; xmm7=data2+data5=tmp2
|
||||||
|
psubw xmm2,xmm4 ; xmm2=data3-data4=tmp4
|
||||||
|
psubw xmm5,xmm0 ; xmm5=data2-data5=tmp5
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movdqa xmm4,xmm3
|
||||||
|
movdqa xmm0,xmm6
|
||||||
|
psubw xmm3,xmm1 ; xmm3=tmp13
|
||||||
|
psubw xmm6,xmm7 ; xmm6=tmp12
|
||||||
|
paddw xmm4,xmm1 ; xmm4=tmp10
|
||||||
|
paddw xmm0,xmm7 ; xmm0=tmp11
|
||||||
|
|
||||||
|
paddw xmm6,xmm3
|
||||||
|
psllw xmm6,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
pmulhw xmm6,[GOTOFF(ebx,PW_F0707)] ; xmm6=z1
|
||||||
|
|
||||||
|
movdqa xmm1,xmm4
|
||||||
|
movdqa xmm7,xmm3
|
||||||
|
psubw xmm4,xmm0 ; xmm4=data4
|
||||||
|
psubw xmm3,xmm6 ; xmm3=data6
|
||||||
|
paddw xmm1,xmm0 ; xmm1=data0
|
||||||
|
paddw xmm7,xmm6 ; xmm7=data2
|
||||||
|
|
||||||
|
movdqa xmm0, XMMWORD [wk(0)] ; xmm0=tmp6
|
||||||
|
movdqa xmm6, XMMWORD [wk(1)] ; xmm6=tmp7
|
||||||
|
movdqa XMMWORD [wk(0)], xmm4 ; wk(0)=data4
|
||||||
|
movdqa XMMWORD [wk(1)], xmm3 ; wk(1)=data6
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
paddw xmm2,xmm5 ; xmm2=tmp10
|
||||||
|
paddw xmm5,xmm0 ; xmm5=tmp11
|
||||||
|
paddw xmm0,xmm6 ; xmm0=tmp12, xmm6=tmp7
|
||||||
|
|
||||||
|
psllw xmm2,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
psllw xmm0,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
|
||||||
|
psllw xmm5,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
pmulhw xmm5,[GOTOFF(ebx,PW_F0707)] ; xmm5=z3
|
||||||
|
|
||||||
|
movdqa xmm4,xmm2 ; xmm4=tmp10
|
||||||
|
psubw xmm2,xmm0
|
||||||
|
pmulhw xmm2,[GOTOFF(ebx,PW_F0382)] ; xmm2=z5
|
||||||
|
pmulhw xmm4,[GOTOFF(ebx,PW_F0541)] ; xmm4=MULTIPLY(tmp10,FIX_0_541196)
|
||||||
|
pmulhw xmm0,[GOTOFF(ebx,PW_F1306)] ; xmm0=MULTIPLY(tmp12,FIX_1_306562)
|
||||||
|
paddw xmm4,xmm2 ; xmm4=z2
|
||||||
|
paddw xmm0,xmm2 ; xmm0=z4
|
||||||
|
|
||||||
|
movdqa xmm3,xmm6
|
||||||
|
psubw xmm6,xmm5 ; xmm6=z13
|
||||||
|
paddw xmm3,xmm5 ; xmm3=z11
|
||||||
|
|
||||||
|
movdqa xmm2,xmm6
|
||||||
|
movdqa xmm5,xmm3
|
||||||
|
psubw xmm6,xmm4 ; xmm6=data3
|
||||||
|
psubw xmm3,xmm0 ; xmm3=data7
|
||||||
|
paddw xmm2,xmm4 ; xmm2=data5
|
||||||
|
paddw xmm5,xmm0 ; xmm5=data1
|
||||||
|
|
||||||
|
; ---- Pass 2: process columns.
|
||||||
|
|
||||||
|
; mov edx, POINTER [data(eax)] ; (DCTELEM *)
|
||||||
|
|
||||||
|
; xmm1=(00 10 20 30 40 50 60 70), xmm7=(02 12 22 32 42 52 62 72)
|
||||||
|
; xmm5=(01 11 21 31 41 51 61 71), xmm6=(03 13 23 33 43 53 63 73)
|
||||||
|
|
||||||
|
movdqa xmm4,xmm1 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd xmm1,xmm5 ; xmm1=(00 01 10 11 20 21 30 31)
|
||||||
|
punpckhwd xmm4,xmm5 ; xmm4=(40 41 50 51 60 61 70 71)
|
||||||
|
movdqa xmm0,xmm7 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd xmm7,xmm6 ; xmm7=(02 03 12 13 22 23 32 33)
|
||||||
|
punpckhwd xmm0,xmm6 ; xmm0=(42 43 52 53 62 63 72 73)
|
||||||
|
|
||||||
|
movdqa xmm5, XMMWORD [wk(0)] ; xmm5=col4
|
||||||
|
movdqa xmm6, XMMWORD [wk(1)] ; xmm6=col6
|
||||||
|
|
||||||
|
; xmm5=(04 14 24 34 44 54 64 74), xmm6=(06 16 26 36 46 56 66 76)
|
||||||
|
; xmm2=(05 15 25 35 45 55 65 75), xmm3=(07 17 27 37 47 57 67 77)
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(0)], xmm7 ; wk(0)=(02 03 12 13 22 23 32 33)
|
||||||
|
movdqa XMMWORD [wk(1)], xmm0 ; wk(1)=(42 43 52 53 62 63 72 73)
|
||||||
|
|
||||||
|
movdqa xmm7,xmm5 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd xmm5,xmm2 ; xmm5=(04 05 14 15 24 25 34 35)
|
||||||
|
punpckhwd xmm7,xmm2 ; xmm7=(44 45 54 55 64 65 74 75)
|
||||||
|
movdqa xmm0,xmm6 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd xmm6,xmm3 ; xmm6=(06 07 16 17 26 27 36 37)
|
||||||
|
punpckhwd xmm0,xmm3 ; xmm0=(46 47 56 57 66 67 76 77)
|
||||||
|
|
||||||
|
movdqa xmm2,xmm5 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq xmm5,xmm6 ; xmm5=(04 05 06 07 14 15 16 17)
|
||||||
|
punpckhdq xmm2,xmm6 ; xmm2=(24 25 26 27 34 35 36 37)
|
||||||
|
movdqa xmm3,xmm7 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq xmm7,xmm0 ; xmm7=(44 45 46 47 54 55 56 57)
|
||||||
|
punpckhdq xmm3,xmm0 ; xmm3=(64 65 66 67 74 75 76 77)
|
||||||
|
|
||||||
|
movdqa xmm6, XMMWORD [wk(0)] ; xmm6=(02 03 12 13 22 23 32 33)
|
||||||
|
movdqa xmm0, XMMWORD [wk(1)] ; xmm0=(42 43 52 53 62 63 72 73)
|
||||||
|
movdqa XMMWORD [wk(0)], xmm2 ; wk(0)=(24 25 26 27 34 35 36 37)
|
||||||
|
movdqa XMMWORD [wk(1)], xmm7 ; wk(1)=(44 45 46 47 54 55 56 57)
|
||||||
|
|
||||||
|
movdqa xmm2,xmm1 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq xmm1,xmm6 ; xmm1=(00 01 02 03 10 11 12 13)
|
||||||
|
punpckhdq xmm2,xmm6 ; xmm2=(20 21 22 23 30 31 32 33)
|
||||||
|
movdqa xmm7,xmm4 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq xmm4,xmm0 ; xmm4=(40 41 42 43 50 51 52 53)
|
||||||
|
punpckhdq xmm7,xmm0 ; xmm7=(60 61 62 63 70 71 72 73)
|
||||||
|
|
||||||
|
movdqa xmm6,xmm1 ; transpose coefficients(phase 3)
|
||||||
|
punpcklqdq xmm1,xmm5 ; xmm1=(00 01 02 03 04 05 06 07)=data0
|
||||||
|
punpckhqdq xmm6,xmm5 ; xmm6=(10 11 12 13 14 15 16 17)=data1
|
||||||
|
movdqa xmm0,xmm7 ; transpose coefficients(phase 3)
|
||||||
|
punpcklqdq xmm7,xmm3 ; xmm7=(60 61 62 63 64 65 66 67)=data6
|
||||||
|
punpckhqdq xmm0,xmm3 ; xmm0=(70 71 72 73 74 75 76 77)=data7
|
||||||
|
|
||||||
|
movdqa xmm5,xmm6
|
||||||
|
movdqa xmm3,xmm1
|
||||||
|
psubw xmm6,xmm7 ; xmm6=data1-data6=tmp6
|
||||||
|
psubw xmm1,xmm0 ; xmm1=data0-data7=tmp7
|
||||||
|
paddw xmm5,xmm7 ; xmm5=data1+data6=tmp1
|
||||||
|
paddw xmm3,xmm0 ; xmm3=data0+data7=tmp0
|
||||||
|
|
||||||
|
movdqa xmm7, XMMWORD [wk(0)] ; xmm7=(24 25 26 27 34 35 36 37)
|
||||||
|
movdqa xmm0, XMMWORD [wk(1)] ; xmm0=(44 45 46 47 54 55 56 57)
|
||||||
|
movdqa XMMWORD [wk(0)], xmm6 ; wk(0)=tmp6
|
||||||
|
movdqa XMMWORD [wk(1)], xmm1 ; wk(1)=tmp7
|
||||||
|
|
||||||
|
movdqa xmm6,xmm2 ; transpose coefficients(phase 3)
|
||||||
|
punpcklqdq xmm2,xmm7 ; xmm2=(20 21 22 23 24 25 26 27)=data2
|
||||||
|
punpckhqdq xmm6,xmm7 ; xmm6=(30 31 32 33 34 35 36 37)=data3
|
||||||
|
movdqa xmm1,xmm4 ; transpose coefficients(phase 3)
|
||||||
|
punpcklqdq xmm4,xmm0 ; xmm4=(40 41 42 43 44 45 46 47)=data4
|
||||||
|
punpckhqdq xmm1,xmm0 ; xmm1=(50 51 52 53 54 55 56 57)=data5
|
||||||
|
|
||||||
|
movdqa xmm7,xmm6
|
||||||
|
movdqa xmm0,xmm2
|
||||||
|
paddw xmm6,xmm4 ; xmm6=data3+data4=tmp3
|
||||||
|
paddw xmm2,xmm1 ; xmm2=data2+data5=tmp2
|
||||||
|
psubw xmm7,xmm4 ; xmm7=data3-data4=tmp4
|
||||||
|
psubw xmm0,xmm1 ; xmm0=data2-data5=tmp5
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movdqa xmm4,xmm3
|
||||||
|
movdqa xmm1,xmm5
|
||||||
|
psubw xmm3,xmm6 ; xmm3=tmp13
|
||||||
|
psubw xmm5,xmm2 ; xmm5=tmp12
|
||||||
|
paddw xmm4,xmm6 ; xmm4=tmp10
|
||||||
|
paddw xmm1,xmm2 ; xmm1=tmp11
|
||||||
|
|
||||||
|
paddw xmm5,xmm3
|
||||||
|
psllw xmm5,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
pmulhw xmm5,[GOTOFF(ebx,PW_F0707)] ; xmm5=z1
|
||||||
|
|
||||||
|
movdqa xmm6,xmm4
|
||||||
|
movdqa xmm2,xmm3
|
||||||
|
psubw xmm4,xmm1 ; xmm4=data4
|
||||||
|
psubw xmm3,xmm5 ; xmm3=data6
|
||||||
|
paddw xmm6,xmm1 ; xmm6=data0
|
||||||
|
paddw xmm2,xmm5 ; xmm2=data2
|
||||||
|
|
||||||
|
movdqa XMMWORD [XMMBLOCK(4,0,edx,SIZEOF_DCTELEM)], xmm4
|
||||||
|
movdqa XMMWORD [XMMBLOCK(6,0,edx,SIZEOF_DCTELEM)], xmm3
|
||||||
|
movdqa XMMWORD [XMMBLOCK(0,0,edx,SIZEOF_DCTELEM)], xmm6
|
||||||
|
movdqa XMMWORD [XMMBLOCK(2,0,edx,SIZEOF_DCTELEM)], xmm2
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movdqa xmm1, XMMWORD [wk(0)] ; xmm1=tmp6
|
||||||
|
movdqa xmm5, XMMWORD [wk(1)] ; xmm5=tmp7
|
||||||
|
|
||||||
|
paddw xmm7,xmm0 ; xmm7=tmp10
|
||||||
|
paddw xmm0,xmm1 ; xmm0=tmp11
|
||||||
|
paddw xmm1,xmm5 ; xmm1=tmp12, xmm5=tmp7
|
||||||
|
|
||||||
|
psllw xmm7,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
psllw xmm1,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
|
||||||
|
psllw xmm0,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
pmulhw xmm0,[GOTOFF(ebx,PW_F0707)] ; xmm0=z3
|
||||||
|
|
||||||
|
movdqa xmm4,xmm7 ; xmm4=tmp10
|
||||||
|
psubw xmm7,xmm1
|
||||||
|
pmulhw xmm7,[GOTOFF(ebx,PW_F0382)] ; xmm7=z5
|
||||||
|
pmulhw xmm4,[GOTOFF(ebx,PW_F0541)] ; xmm4=MULTIPLY(tmp10,FIX_0_541196)
|
||||||
|
pmulhw xmm1,[GOTOFF(ebx,PW_F1306)] ; xmm1=MULTIPLY(tmp12,FIX_1_306562)
|
||||||
|
paddw xmm4,xmm7 ; xmm4=z2
|
||||||
|
paddw xmm1,xmm7 ; xmm1=z4
|
||||||
|
|
||||||
|
movdqa xmm3,xmm5
|
||||||
|
psubw xmm5,xmm0 ; xmm5=z13
|
||||||
|
paddw xmm3,xmm0 ; xmm3=z11
|
||||||
|
|
||||||
|
movdqa xmm6,xmm5
|
||||||
|
movdqa xmm2,xmm3
|
||||||
|
psubw xmm5,xmm4 ; xmm5=data3
|
||||||
|
psubw xmm3,xmm1 ; xmm3=data7
|
||||||
|
paddw xmm6,xmm4 ; xmm6=data5
|
||||||
|
paddw xmm2,xmm1 ; xmm2=data1
|
||||||
|
|
||||||
|
movdqa XMMWORD [XMMBLOCK(3,0,edx,SIZEOF_DCTELEM)], xmm5
|
||||||
|
movdqa XMMWORD [XMMBLOCK(7,0,edx,SIZEOF_DCTELEM)], xmm3
|
||||||
|
movdqa XMMWORD [XMMBLOCK(5,0,edx,SIZEOF_DCTELEM)], xmm6
|
||||||
|
movdqa XMMWORD [XMMBLOCK(1,0,edx,SIZEOF_DCTELEM)], xmm2
|
||||||
|
|
||||||
|
; pop edi ; unused
|
||||||
|
; pop esi ; unused
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; unused
|
||||||
|
poppic ebx
|
||||||
|
mov esp,ebp ; esp <- aligned ebp
|
||||||
|
pop esp ; esp <- original ebp
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; JFDCT_INT_SSE2_SUPPORTED
|
||||||
|
%endif ; DCT_IFAST_SUPPORTED
|
||||||
641
jfss2int.asm
Normal file
@@ -0,0 +1,641 @@
|
|||||||
|
;
|
||||||
|
; jfss2int.asm - accurate integer FDCT (SSE2)
|
||||||
|
;
|
||||||
|
; x86 SIMD extension for IJG JPEG library
|
||||||
|
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
; For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
;
|
||||||
|
; This file should be assembled with NASM (Netwide Assembler); it
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
|
||||||
|
; NASM is available from http://nasm.sourceforge.net/ or
|
||||||
|
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||||
|
;
|
||||||
|
; This file contains a slow-but-accurate integer implementation of the
; forward DCT (Discrete Cosine Transform). The following code is based
; directly on the IJG's original jfdctint.c; see jfdctint.c for
; more details.
|
||||||
|
;
|
||||||
|
; Last Modified : February 4, 2006
|
||||||
|
;
|
||||||
|
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jdct.inc"
|
||||||
|
|
||||||
|
%ifdef DCT_ISLOW_SUPPORTED
|
||||||
|
%ifdef JFDCT_INT_SSE2_SUPPORTED
|
||||||
|
|
||||||
|
; This module is specialized to the case DCTSIZE = 8.
|
||||||
|
;
|
||||||
|
%if DCTSIZE != 8
|
||||||
|
%error "Sorry, this code only copes with 8x8 DCTs."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
|
||||||
|
%define CONST_BITS 13
|
||||||
|
%define PASS1_BITS 2
|
||||||
|
|
||||||
|
%define DESCALE_P1 (CONST_BITS-PASS1_BITS)
|
||||||
|
%define DESCALE_P2 (CONST_BITS+PASS1_BITS)
|
||||||
|
|
||||||
|
%if CONST_BITS == 13
|
||||||
|
F_0_298 equ 2446 ; FIX(0.298631336)
|
||||||
|
F_0_390 equ 3196 ; FIX(0.390180644)
|
||||||
|
F_0_541 equ 4433 ; FIX(0.541196100)
|
||||||
|
F_0_765 equ 6270 ; FIX(0.765366865)
|
||||||
|
F_0_899 equ 7373 ; FIX(0.899976223)
|
||||||
|
F_1_175 equ 9633 ; FIX(1.175875602)
|
||||||
|
F_1_501 equ 12299 ; FIX(1.501321110)
|
||||||
|
F_1_847 equ 15137 ; FIX(1.847759065)
|
||||||
|
F_1_961 equ 16069 ; FIX(1.961570560)
|
||||||
|
F_2_053 equ 16819 ; FIX(2.053119869)
|
||||||
|
F_2_562 equ 20995 ; FIX(2.562915447)
|
||||||
|
F_3_072 equ 25172 ; FIX(3.072711026)
|
||||||
|
%else
|
||||||
|
; NASM cannot do compile-time arithmetic on floating-point constants.
|
||||||
|
%define DESCALE(x,n) (((x)+(1<<((n)-1)))>>(n))
|
||||||
|
F_0_298 equ DESCALE( 320652955,30-CONST_BITS) ; FIX(0.298631336)
|
||||||
|
F_0_390 equ DESCALE( 418953276,30-CONST_BITS) ; FIX(0.390180644)
|
||||||
|
F_0_541 equ DESCALE( 581104887,30-CONST_BITS) ; FIX(0.541196100)
|
||||||
|
F_0_765 equ DESCALE( 821806413,30-CONST_BITS) ; FIX(0.765366865)
|
||||||
|
F_0_899 equ DESCALE( 966342111,30-CONST_BITS) ; FIX(0.899976223)
|
||||||
|
F_1_175 equ DESCALE(1262586813,30-CONST_BITS) ; FIX(1.175875602)
|
||||||
|
F_1_501 equ DESCALE(1612031267,30-CONST_BITS) ; FIX(1.501321110)
|
||||||
|
F_1_847 equ DESCALE(1984016188,30-CONST_BITS) ; FIX(1.847759065)
|
||||||
|
F_1_961 equ DESCALE(2106220350,30-CONST_BITS) ; FIX(1.961570560)
|
||||||
|
F_2_053 equ DESCALE(2204520673,30-CONST_BITS) ; FIX(2.053119869)
|
||||||
|
F_2_562 equ DESCALE(2751909506,30-CONST_BITS) ; FIX(2.562915447)
|
||||||
|
F_3_072 equ DESCALE(3299298341,30-CONST_BITS) ; FIX(3.072711026)
|
||||||
|
%endif
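The F_x_xxx values above are the real DCT constants scaled by 2^CONST_BITS and rounded to the nearest integer. The following is a minimal C sketch of that arithmetic, not code from the library: `fix` and `descale_const` are illustrative names that mirror the `FIX(x)` notation in the comments and the `DESCALE(x,n)` macro in the `%else` branch.

```c
#include <assert.h>
#include <stdint.h>

#define CONST_BITS 13

/* FIX(x): scale a real constant by 2^CONST_BITS and round to nearest.
 * Adding 0.5 before truncation is only correct for x >= 0, which holds
 * for every constant in the table. */
int32_t fix(double x)
{
    assert(x >= 0.0);
    return (int32_t)(x * (double)(1 << CONST_BITS) + 0.5);
}

/* DESCALE(x,n): rounding right shift. The %else branch stores the
 * constants with 30 fractional bits; several of them exceed 2^31,
 * so a 64-bit type is used here. */
int64_t descale_const(int64_t x, int n)
{
    assert(n > 0 && x >= 0);
    return (x + ((int64_t)1 << (n - 1))) >> n;
}
```

With CONST_BITS = 13, `fix(0.541196100)` and `descale_const(581104887, 30 - CONST_BITS)` both evaluate to 4433, which is the value of `F_0_541` in the table.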
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_CONST
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
global EXTN(jconst_fdct_islow_sse2)
|
||||||
|
|
||||||
|
EXTN(jconst_fdct_islow_sse2):
|
||||||
|
|
||||||
|
PW_F130_F054 times 4 dw (F_0_541+F_0_765), F_0_541
|
||||||
|
PW_F054_MF130 times 4 dw F_0_541, (F_0_541-F_1_847)
|
||||||
|
PW_MF078_F117 times 4 dw (F_1_175-F_1_961), F_1_175
|
||||||
|
PW_F117_F078 times 4 dw F_1_175, (F_1_175-F_0_390)
|
||||||
|
PW_MF060_MF089 times 4 dw (F_0_298-F_0_899),-F_0_899
|
||||||
|
PW_MF089_F060 times 4 dw -F_0_899, (F_1_501-F_0_899)
|
||||||
|
PW_MF050_MF256 times 4 dw (F_2_053-F_2_562),-F_2_562
|
||||||
|
PW_MF256_F050 times 4 dw -F_2_562, (F_3_072-F_2_562)
|
||||||
|
PD_DESCALE_P1 times 4 dd 1 << (DESCALE_P1-1)
|
||||||
|
PD_DESCALE_P2 times 4 dd 1 << (DESCALE_P2-1)
|
||||||
|
PW_DESCALE_P2X times 8 dw 1 << (PASS1_BITS-1)
|
||||||
|
|
||||||
|
alignz 16
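The PD_DESCALE_P1 and PD_DESCALE_P2 constants are the rounding terms added (paddd) before the arithmetic right shift (psrad); the 32-bit results are then narrowed with packssdw, which saturates to 16 bits. A hedged C sketch of one lane of that sequence follows. `sar32` and `descale_pack` are hypothetical helper names, and the shift helper is written so that it does not depend on how a C compiler shifts negative values.

```c
#include <stdint.h>

#define CONST_BITS 13
#define PASS1_BITS 2
#define DESCALE_P1 (CONST_BITS - PASS1_BITS)   /* 11: used in pass 1 */
#define DESCALE_P2 (CONST_BITS + PASS1_BITS)   /* 15: used in pass 2 */

/* Arithmetic (sign-propagating) right shift, like psrad. */
int32_t sar32(int32_t x, int n)
{
    return x >= 0 ? (x >> n) : ~(~x >> n);
}

/* One 32-bit lane of:
 *   paddd    x,[PD_DESCALE_Pn]   ; add 1 << (n-1)
 *   psrad    x,n                 ; arithmetic shift right
 *   packssdw ...                 ; signed saturation to 16 bits
 * Assumes x + (1 << (n-1)) does not overflow 32 bits. */
int16_t descale_pack(int32_t x, int n)
{
    int32_t r = sar32(x + ((int32_t)1 << (n - 1)), n);
    if (r > INT16_MAX) r = INT16_MAX;
    if (r < INT16_MIN) r = INT16_MIN;
    return (int16_t)r;
}
```

PW_DESCALE_P2X plays the same rounding role for the 16-bit paddw/psraw pair that removes the PASS1_BITS scaling from data0 and data4 in pass 2.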
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Perform the forward DCT on one block of samples.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_fdct_islow_sse2 (DCTELEM * data)
|
||||||
|
;
|
||||||
|
|
||||||
|
%define data(b) (b)+8 ; DCTELEM * data
|
||||||
|
|
||||||
|
%define original_ebp ebp+0
|
||||||
|
%define wk(i) ebp-(WK_NUM-(i))*SIZEOF_XMMWORD ; xmmword wk[WK_NUM]
|
||||||
|
%define WK_NUM 6
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_fdct_islow_sse2)
|
||||||
|
|
||||||
|
EXTN(jpeg_fdct_islow_sse2):
|
||||||
|
push ebp
|
||||||
|
mov eax,esp ; eax = original ebp
|
||||||
|
sub esp, byte 4
|
||||||
|
and esp, byte (-SIZEOF_XMMWORD) ; align to 128 bits
|
||||||
|
mov [esp],eax
|
||||||
|
mov ebp,esp ; ebp = aligned ebp
|
||||||
|
lea esp, [wk(0)]
|
||||||
|
pushpic ebx
|
||||||
|
; push ecx ; unused
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
; push esi ; unused
|
||||||
|
; push edi ; unused
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
|
||||||
|
; ---- Pass 1: process rows.
|
||||||
|
|
||||||
|
mov edx, POINTER [data(eax)] ; (DCTELEM *)
|
||||||
|
|
||||||
|
movdqa xmm0, XMMWORD [XMMBLOCK(0,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movdqa xmm1, XMMWORD [XMMBLOCK(1,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movdqa xmm2, XMMWORD [XMMBLOCK(2,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movdqa xmm3, XMMWORD [XMMBLOCK(3,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
|
||||||
|
; xmm0=(00 01 02 03 04 05 06 07), xmm2=(20 21 22 23 24 25 26 27)
|
||||||
|
; xmm1=(10 11 12 13 14 15 16 17), xmm3=(30 31 32 33 34 35 36 37)
|
||||||
|
|
||||||
|
movdqa xmm4,xmm0 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd xmm0,xmm1 ; xmm0=(00 10 01 11 02 12 03 13)
|
||||||
|
punpckhwd xmm4,xmm1 ; xmm4=(04 14 05 15 06 16 07 17)
|
||||||
|
movdqa xmm5,xmm2 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd xmm2,xmm3 ; xmm2=(20 30 21 31 22 32 23 33)
|
||||||
|
punpckhwd xmm5,xmm3 ; xmm5=(24 34 25 35 26 36 27 37)
|
||||||
|
|
||||||
|
movdqa xmm6, XMMWORD [XMMBLOCK(4,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movdqa xmm7, XMMWORD [XMMBLOCK(5,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movdqa xmm1, XMMWORD [XMMBLOCK(6,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
movdqa xmm3, XMMWORD [XMMBLOCK(7,0,edx,SIZEOF_DCTELEM)]
|
||||||
|
|
||||||
|
; xmm6=(40 41 42 43 44 45 46 47), xmm1=(60 61 62 63 64 65 66 67)
; xmm7=(50 51 52 53 54 55 56 57), xmm3=(70 71 72 73 74 75 76 77)
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(0)], xmm2 ; wk(0)=(20 30 21 31 22 32 23 33)
|
||||||
|
movdqa XMMWORD [wk(1)], xmm5 ; wk(1)=(24 34 25 35 26 36 27 37)
|
||||||
|
|
||||||
|
movdqa xmm2,xmm6 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd xmm6,xmm7 ; xmm6=(40 50 41 51 42 52 43 53)
|
||||||
|
punpckhwd xmm2,xmm7 ; xmm2=(44 54 45 55 46 56 47 57)
|
||||||
|
movdqa xmm5,xmm1 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd xmm1,xmm3 ; xmm1=(60 70 61 71 62 72 63 73)
|
||||||
|
punpckhwd xmm5,xmm3 ; xmm5=(64 74 65 75 66 76 67 77)
|
||||||
|
|
||||||
|
movdqa xmm7,xmm6 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq xmm6,xmm1 ; xmm6=(40 50 60 70 41 51 61 71)
|
||||||
|
punpckhdq xmm7,xmm1 ; xmm7=(42 52 62 72 43 53 63 73)
|
||||||
|
movdqa xmm3,xmm2 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq xmm2,xmm5 ; xmm2=(44 54 64 74 45 55 65 75)
|
||||||
|
punpckhdq xmm3,xmm5 ; xmm3=(46 56 66 76 47 57 67 77)
|
||||||
|
|
||||||
|
movdqa xmm1, XMMWORD [wk(0)] ; xmm1=(20 30 21 31 22 32 23 33)
|
||||||
|
movdqa xmm5, XMMWORD [wk(1)] ; xmm5=(24 34 25 35 26 36 27 37)
|
||||||
|
movdqa XMMWORD [wk(2)], xmm7 ; wk(2)=(42 52 62 72 43 53 63 73)
|
||||||
|
movdqa XMMWORD [wk(3)], xmm2 ; wk(3)=(44 54 64 74 45 55 65 75)
|
||||||
|
|
||||||
|
movdqa xmm7,xmm0 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq xmm0,xmm1 ; xmm0=(00 10 20 30 01 11 21 31)
|
||||||
|
punpckhdq xmm7,xmm1 ; xmm7=(02 12 22 32 03 13 23 33)
|
||||||
|
movdqa xmm2,xmm4 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq xmm4,xmm5 ; xmm4=(04 14 24 34 05 15 25 35)
|
||||||
|
punpckhdq xmm2,xmm5 ; xmm2=(06 16 26 36 07 17 27 37)
|
||||||
|
|
||||||
|
movdqa xmm1,xmm0 ; transpose coefficients(phase 3)
|
||||||
|
punpcklqdq xmm0,xmm6 ; xmm0=(00 10 20 30 40 50 60 70)=data0
|
||||||
|
punpckhqdq xmm1,xmm6 ; xmm1=(01 11 21 31 41 51 61 71)=data1
|
||||||
|
movdqa xmm5,xmm2 ; transpose coefficients(phase 3)
|
||||||
|
punpcklqdq xmm2,xmm3 ; xmm2=(06 16 26 36 46 56 66 76)=data6
|
||||||
|
punpckhqdq xmm5,xmm3 ; xmm5=(07 17 27 37 47 57 67 77)=data7
|
||||||
|
|
||||||
|
movdqa xmm6,xmm1
|
||||||
|
movdqa xmm3,xmm0
|
||||||
|
psubw xmm1,xmm2 ; xmm1=data1-data6=tmp6
|
||||||
|
psubw xmm0,xmm5 ; xmm0=data0-data7=tmp7
|
||||||
|
paddw xmm6,xmm2 ; xmm6=data1+data6=tmp1
|
||||||
|
paddw xmm3,xmm5 ; xmm3=data0+data7=tmp0
|
||||||
|
|
||||||
|
movdqa xmm2, XMMWORD [wk(2)] ; xmm2=(42 52 62 72 43 53 63 73)
|
||||||
|
movdqa xmm5, XMMWORD [wk(3)] ; xmm5=(44 54 64 74 45 55 65 75)
|
||||||
|
movdqa XMMWORD [wk(0)], xmm1 ; wk(0)=tmp6
|
||||||
|
movdqa XMMWORD [wk(1)], xmm0 ; wk(1)=tmp7
|
||||||
|
|
||||||
|
movdqa xmm1,xmm7 ; transpose coefficients(phase 3)
|
||||||
|
punpcklqdq xmm7,xmm2 ; xmm7=(02 12 22 32 42 52 62 72)=data2
|
||||||
|
punpckhqdq xmm1,xmm2 ; xmm1=(03 13 23 33 43 53 63 73)=data3
|
||||||
|
movdqa xmm0,xmm4 ; transpose coefficients(phase 3)
|
||||||
|
punpcklqdq xmm4,xmm5 ; xmm4=(04 14 24 34 44 54 64 74)=data4
|
||||||
|
punpckhqdq xmm0,xmm5 ; xmm0=(05 15 25 35 45 55 65 75)=data5
|
||||||
|
|
||||||
|
movdqa xmm2,xmm1
|
||||||
|
movdqa xmm5,xmm7
|
||||||
|
paddw xmm1,xmm4 ; xmm1=data3+data4=tmp3
|
||||||
|
paddw xmm7,xmm0 ; xmm7=data2+data5=tmp2
|
||||||
|
psubw xmm2,xmm4 ; xmm2=data3-data4=tmp4
|
||||||
|
psubw xmm5,xmm0 ; xmm5=data2-data5=tmp5
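The three "transpose coefficients" phases above interleave 16-bit words, then 32-bit dwords, then 64-bit qwords, which together transpose the 8x8 block. The sketch below models that idea in plain C under simplifying assumptions: each register is an array of eight `int16_t`, the helper `unpack` stands in for the punpckl/punpckh family, and the register allocation and wk() spills of the real routine are ignored. The function names are illustrative only.

```c
#include <stdint.h>

/* Model of punpck{l,h}{wd,dq,qdq}: interleave groups of w 16-bit words
 * (w = 1, 2 or 4) taken from the low (high = 0) or high (high = 1) half
 * of a and b. out must not alias a or b. */
void unpack(int16_t out[8], const int16_t a[8], const int16_t b[8],
            int w, int high)
{
    int base = high ? 4 : 0;
    for (int g = 0; g < 4 / w; g++) {
        for (int k = 0; k < w; k++) {
            out[(2 * g) * w + k]     = a[base + g * w + k];
            out[(2 * g + 1) * w + k] = b[base + g * w + k];
        }
    }
}

/* In-place 8x8 transpose built from the three unpack phases. */
void transpose8x8(int16_t m[8][8])
{
    int16_t a[8][8], b[8][8];

    /* phase 1: word interleave of row pairs (0,1), (2,3), (4,5), (6,7) */
    for (int i = 0; i < 4; i++) {
        unpack(a[2 * i],     m[2 * i], m[2 * i + 1], 1, 0);
        unpack(a[2 * i + 1], m[2 * i], m[2 * i + 1], 1, 1);
    }
    /* phase 2: dword interleave within rows 0-3 and within rows 4-7 */
    for (int h = 0; h < 2; h++) {
        for (int j = 0; j < 2; j++) {
            unpack(b[4 * h + 2 * j],     a[4 * h + j], a[4 * h + 2 + j], 2, 0);
            unpack(b[4 * h + 2 * j + 1], a[4 * h + j], a[4 * h + 2 + j], 2, 1);
        }
    }
    /* phase 3: qword interleave joins the upper and lower halves */
    for (int j = 0; j < 4; j++) {
        unpack(m[2 * j],     b[j], b[j + 4], 4, 0);
        unpack(m[2 * j + 1], b[j], b[j + 4], 4, 1);
    }
}
```

Filling `m[r][c]` with `10*r + c` reproduces the two-digit labels used in the register comments: after phase 1 the first temporary holds (00 10 01 11 02 12 03 13), and after phase 3 row 0 holds (00 10 20 30 40 50 60 70).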
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movdqa xmm4,xmm3
|
||||||
|
movdqa xmm0,xmm6
|
||||||
|
paddw xmm3,xmm1 ; xmm3=tmp10
|
||||||
|
paddw xmm6,xmm7 ; xmm6=tmp11
|
||||||
|
psubw xmm4,xmm1 ; xmm4=tmp13
|
||||||
|
psubw xmm0,xmm7 ; xmm0=tmp12
|
||||||
|
|
||||||
|
movdqa xmm1,xmm3
|
||||||
|
paddw xmm3,xmm6 ; xmm3=tmp10+tmp11
|
||||||
|
psubw xmm1,xmm6 ; xmm1=tmp10-tmp11
|
||||||
|
|
||||||
|
psllw xmm3,PASS1_BITS ; xmm3=data0
|
||||||
|
psllw xmm1,PASS1_BITS ; xmm1=data4
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(2)], xmm3 ; wk(2)=data0
|
||||||
|
movdqa XMMWORD [wk(3)], xmm1 ; wk(3)=data4
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; z1 = (tmp12 + tmp13) * 0.541196100;
|
||||||
|
; data2 = z1 + tmp13 * 0.765366865;
|
||||||
|
; data6 = z1 + tmp12 * -1.847759065;
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; data2 = tmp13 * (0.541196100 + 0.765366865) + tmp12 * 0.541196100;
|
||||||
|
; data6 = tmp13 * 0.541196100 + tmp12 * (0.541196100 - 1.847759065);
|
||||||
|
|
||||||
|
movdqa xmm7,xmm4 ; xmm4=tmp13
|
||||||
|
movdqa xmm6,xmm4
|
||||||
|
punpcklwd xmm7,xmm0 ; xmm0=tmp12
|
||||||
|
punpckhwd xmm6,xmm0
|
||||||
|
movdqa xmm4,xmm7
|
||||||
|
movdqa xmm0,xmm6
|
||||||
|
pmaddwd xmm7,[GOTOFF(ebx,PW_F130_F054)] ; xmm7=data2L
|
||||||
|
pmaddwd xmm6,[GOTOFF(ebx,PW_F130_F054)] ; xmm6=data2H
|
||||||
|
pmaddwd xmm4,[GOTOFF(ebx,PW_F054_MF130)] ; xmm4=data6L
|
||||||
|
pmaddwd xmm0,[GOTOFF(ebx,PW_F054_MF130)] ; xmm0=data6H
|
||||||
|
|
||||||
|
paddd xmm7,[GOTOFF(ebx,PD_DESCALE_P1)]
|
||||||
|
paddd xmm6,[GOTOFF(ebx,PD_DESCALE_P1)]
|
||||||
|
psrad xmm7,DESCALE_P1
|
||||||
|
psrad xmm6,DESCALE_P1
|
||||||
|
paddd xmm4,[GOTOFF(ebx,PD_DESCALE_P1)]
|
||||||
|
paddd xmm0,[GOTOFF(ebx,PD_DESCALE_P1)]
|
||||||
|
psrad xmm4,DESCALE_P1
|
||||||
|
psrad xmm0,DESCALE_P1
|
||||||
|
|
||||||
|
packssdw xmm7,xmm6 ; xmm7=data2
|
||||||
|
packssdw xmm4,xmm0 ; xmm4=data6
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(4)], xmm7 ; wk(4)=data2
|
||||||
|
movdqa XMMWORD [wk(5)], xmm4 ; wk(5)=data6
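The even part above folds the three multiplications of the original into two pmaddwd operations by pre-combining the constants. The following C sketch checks that the two forms agree on the undescaled 32-bit products; it models a single pmaddwd lane and uses the CONST_BITS = 13 constants from the table. `even_original`, `even_fused` and `pmaddwd_lane` are illustrative names, not library functions.

```c
#include <stdint.h>

enum { F_0_541 = 4433, F_0_765 = 6270, F_1_847 = 15137 };

/* "(Original)" form from the comment: three multiplies sharing z1. */
void even_original(int32_t tmp12, int32_t tmp13, int32_t *d2, int32_t *d6)
{
    int32_t z1 = (tmp12 + tmp13) * F_0_541;
    *d2 = z1 + tmp13 * F_0_765;
    *d6 = z1 - tmp12 * F_1_847;
}

/* One 32-bit lane of pmaddwd: a0*b0 + a1*b1 on signed 16-bit inputs. */
int32_t pmaddwd_lane(int16_t a0, int16_t a1, int16_t b0, int16_t b1)
{
    return (int32_t)a0 * b0 + (int32_t)a1 * b1;
}

/* "(This implementation)" form: the inputs are interleaved as
 * (tmp13, tmp12) by punpcklwd/punpckhwd and multiplied against the
 * constant pairs PW_F130_F054 and PW_F054_MF130. */
void even_fused(int16_t tmp12, int16_t tmp13, int32_t *d2, int32_t *d6)
{
    *d2 = pmaddwd_lane(tmp13, tmp12, F_0_541 + F_0_765, F_0_541);
    *d6 = pmaddwd_lane(tmp13, tmp12, F_0_541, F_0_541 - F_1_847);
}
```

Both combined constants (10703 and -10704) fit in a signed 16-bit word, which is what allows them to be stored as `dw` pairs for pmaddwd.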
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movdqa xmm3, XMMWORD [wk(0)] ; xmm3=tmp6
|
||||||
|
movdqa xmm1, XMMWORD [wk(1)] ; xmm1=tmp7
|
||||||
|
|
||||||
|
movdqa xmm6,xmm2 ; xmm2=tmp4
|
||||||
|
movdqa xmm0,xmm5 ; xmm5=tmp5
|
||||||
|
paddw xmm6,xmm3 ; xmm6=z3
|
||||||
|
paddw xmm0,xmm1 ; xmm0=z4
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; z5 = (z3 + z4) * 1.175875602;
|
||||||
|
; z3 = z3 * -1.961570560; z4 = z4 * -0.390180644;
|
||||||
|
; z3 += z5; z4 += z5;
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; z3 = z3 * (1.175875602 - 1.961570560) + z4 * 1.175875602;
|
||||||
|
; z4 = z3 * 1.175875602 + z4 * (1.175875602 - 0.390180644);
|
||||||
|
|
||||||
|
movdqa xmm7,xmm6
|
||||||
|
movdqa xmm4,xmm6
|
||||||
|
punpcklwd xmm7,xmm0
|
||||||
|
punpckhwd xmm4,xmm0
|
||||||
|
movdqa xmm6,xmm7
|
||||||
|
movdqa xmm0,xmm4
|
||||||
|
pmaddwd xmm7,[GOTOFF(ebx,PW_MF078_F117)] ; xmm7=z3L
|
||||||
|
pmaddwd xmm4,[GOTOFF(ebx,PW_MF078_F117)] ; xmm4=z3H
|
||||||
|
pmaddwd xmm6,[GOTOFF(ebx,PW_F117_F078)] ; xmm6=z4L
|
||||||
|
pmaddwd xmm0,[GOTOFF(ebx,PW_F117_F078)] ; xmm0=z4H
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(0)], xmm7 ; wk(0)=z3L
|
||||||
|
movdqa XMMWORD [wk(1)], xmm4 ; wk(1)=z3H
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; z1 = tmp4 + tmp7; z2 = tmp5 + tmp6;
|
||||||
|
; tmp4 = tmp4 * 0.298631336; tmp5 = tmp5 * 2.053119869;
|
||||||
|
; tmp6 = tmp6 * 3.072711026; tmp7 = tmp7 * 1.501321110;
|
||||||
|
; z1 = z1 * -0.899976223; z2 = z2 * -2.562915447;
|
||||||
|
; data7 = tmp4 + z1 + z3; data5 = tmp5 + z2 + z4;
|
||||||
|
; data3 = tmp6 + z2 + z3; data1 = tmp7 + z1 + z4;
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; tmp4 = tmp4 * (0.298631336 - 0.899976223) + tmp7 * -0.899976223;
|
||||||
|
; tmp5 = tmp5 * (2.053119869 - 2.562915447) + tmp6 * -2.562915447;
|
||||||
|
; tmp6 = tmp5 * -2.562915447 + tmp6 * (3.072711026 - 2.562915447);
|
||||||
|
; tmp7 = tmp4 * -0.899976223 + tmp7 * (1.501321110 - 0.899976223);
|
||||||
|
; data7 = tmp4 + z3; data5 = tmp5 + z4;
|
||||||
|
; data3 = tmp6 + z3; data1 = tmp7 + z4;
|
||||||
|
|
||||||
|
movdqa xmm7,xmm2
|
||||||
|
movdqa xmm4,xmm2
|
||||||
|
punpcklwd xmm7,xmm1
|
||||||
|
punpckhwd xmm4,xmm1
|
||||||
|
movdqa xmm2,xmm7
|
||||||
|
movdqa xmm1,xmm4
|
||||||
|
pmaddwd xmm7,[GOTOFF(ebx,PW_MF060_MF089)] ; xmm7=tmp4L
|
||||||
|
pmaddwd xmm4,[GOTOFF(ebx,PW_MF060_MF089)] ; xmm4=tmp4H
|
||||||
|
pmaddwd xmm2,[GOTOFF(ebx,PW_MF089_F060)] ; xmm2=tmp7L
|
||||||
|
pmaddwd xmm1,[GOTOFF(ebx,PW_MF089_F060)] ; xmm1=tmp7H
|
||||||
|
|
||||||
|
paddd xmm7, XMMWORD [wk(0)] ; xmm7=data7L
|
||||||
|
paddd xmm4, XMMWORD [wk(1)] ; xmm4=data7H
|
||||||
|
paddd xmm2,xmm6 ; xmm2=data1L
|
||||||
|
paddd xmm1,xmm0 ; xmm1=data1H
|
||||||
|
|
||||||
|
paddd xmm7,[GOTOFF(ebx,PD_DESCALE_P1)]
|
||||||
|
paddd xmm4,[GOTOFF(ebx,PD_DESCALE_P1)]
|
||||||
|
psrad xmm7,DESCALE_P1
|
||||||
|
psrad xmm4,DESCALE_P1
|
||||||
|
paddd xmm2,[GOTOFF(ebx,PD_DESCALE_P1)]
|
||||||
|
paddd xmm1,[GOTOFF(ebx,PD_DESCALE_P1)]
|
||||||
|
psrad xmm2,DESCALE_P1
|
||||||
|
psrad xmm1,DESCALE_P1
|
||||||
|
|
||||||
|
packssdw xmm7,xmm4 ; xmm7=data7
|
||||||
|
packssdw xmm2,xmm1 ; xmm2=data1
|
||||||
|
|
||||||
|
movdqa xmm4,xmm5
|
||||||
|
movdqa xmm1,xmm5
|
||||||
|
punpcklwd xmm4,xmm3
|
||||||
|
punpckhwd xmm1,xmm3
|
||||||
|
movdqa xmm5,xmm4
|
||||||
|
movdqa xmm3,xmm1
|
||||||
|
pmaddwd xmm4,[GOTOFF(ebx,PW_MF050_MF256)] ; xmm4=tmp5L
|
||||||
|
pmaddwd xmm1,[GOTOFF(ebx,PW_MF050_MF256)] ; xmm1=tmp5H
|
||||||
|
pmaddwd xmm5,[GOTOFF(ebx,PW_MF256_F050)] ; xmm5=tmp6L
|
||||||
|
pmaddwd xmm3,[GOTOFF(ebx,PW_MF256_F050)] ; xmm3=tmp6H
|
||||||
|
|
||||||
|
paddd xmm4,xmm6 ; xmm4=data5L
|
||||||
|
paddd xmm1,xmm0 ; xmm1=data5H
|
||||||
|
paddd xmm5, XMMWORD [wk(0)] ; xmm5=data3L
|
||||||
|
paddd xmm3, XMMWORD [wk(1)] ; xmm3=data3H
|
||||||
|
|
||||||
|
paddd xmm4,[GOTOFF(ebx,PD_DESCALE_P1)]
|
||||||
|
paddd xmm1,[GOTOFF(ebx,PD_DESCALE_P1)]
|
||||||
|
psrad xmm4,DESCALE_P1
|
||||||
|
psrad xmm1,DESCALE_P1
|
||||||
|
paddd xmm5,[GOTOFF(ebx,PD_DESCALE_P1)]
|
||||||
|
paddd xmm3,[GOTOFF(ebx,PD_DESCALE_P1)]
|
||||||
|
psrad xmm5,DESCALE_P1
|
||||||
|
psrad xmm3,DESCALE_P1
|
||||||
|
|
||||||
|
packssdw xmm4,xmm1 ; xmm4=data5
|
||||||
|
packssdw xmm5,xmm3 ; xmm5=data3
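The odd part uses the same constant-folding idea as the even part. In the "(This implementation)" comments the right-hand sides refer to the values before reassignment (for example, the z3 in the z4 line is the original z3). The C sketch below, with illustrative function names and the CONST_BITS = 13 constants, compares the original formulation with the folded one on the undescaled 32-bit sums; it is a check of the algebra, not a model of the register-level code.

```c
#include <stdint.h>

enum {
    F_0_298 = 2446,  F_0_390 = 3196,  F_0_899 = 7373,
    F_1_175 = 9633,  F_1_501 = 12299, F_1_961 = 16069,
    F_2_053 = 16819, F_2_562 = 20995, F_3_072 = 25172
};

/* "(Original)" form. out[0..3] = data1, data3, data5, data7. */
void odd_original(int32_t t4, int32_t t5, int32_t t6, int32_t t7,
                  int32_t out[4])
{
    int32_t z1 = t4 + t7, z2 = t5 + t6, z3 = t4 + t6, z4 = t5 + t7;
    int32_t z5 = (z3 + z4) * F_1_175;

    t4 *= F_0_298;  t5 *= F_2_053;  t6 *= F_3_072;  t7 *= F_1_501;
    z1 *= -F_0_899; z2 *= -F_2_562;
    z3 = z3 * -F_1_961 + z5;
    z4 = z4 * -F_0_390 + z5;

    out[3] = t4 + z1 + z3;  out[2] = t5 + z2 + z4;
    out[1] = t6 + z2 + z3;  out[0] = t7 + z1 + z4;
}

/* "(This implementation)" form: every output is a sum of two-term
 * products, each of which maps onto one pmaddwd lane. */
void odd_fused(int32_t t4, int32_t t5, int32_t t6, int32_t t7,
               int32_t out[4])
{
    int32_t z3 = t4 + t6, z4 = t5 + t7;
    int32_t z3n = z3 * (F_1_175 - F_1_961) + z4 * F_1_175;
    int32_t z4n = z3 * F_1_175 + z4 * (F_1_175 - F_0_390);
    int32_t n4 = t4 * (F_0_298 - F_0_899) + t7 * -F_0_899;
    int32_t n5 = t5 * (F_2_053 - F_2_562) + t6 * -F_2_562;
    int32_t n6 = t5 * -F_2_562 + t6 * (F_3_072 - F_2_562);
    int32_t n7 = t4 * -F_0_899 + t7 * (F_1_501 - F_0_899);

    out[3] = n4 + z3n;  out[2] = n5 + z4n;
    out[1] = n6 + z3n;  out[0] = n7 + z4n;
}
```

Because the folding only regroups integer products, the two functions return identical values for any inputs whose intermediate results fit in 32 bits.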
|
||||||
|
|
||||||
|
; ---- Pass 2: process columns.
|
||||||
|
|
||||||
|
; mov edx, POINTER [data(eax)] ; (DCTELEM *)
|
||||||
|
|
||||||
|
movdqa xmm6, XMMWORD [wk(2)] ; xmm6=col0
|
||||||
|
movdqa xmm0, XMMWORD [wk(4)] ; xmm0=col2
|
||||||
|
|
||||||
|
; xmm6=(00 10 20 30 40 50 60 70), xmm0=(02 12 22 32 42 52 62 72)
|
||||||
|
; xmm2=(01 11 21 31 41 51 61 71), xmm5=(03 13 23 33 43 53 63 73)
|
||||||
|
|
||||||
|
movdqa xmm1,xmm6 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd xmm6,xmm2 ; xmm6=(00 01 10 11 20 21 30 31)
|
||||||
|
punpckhwd xmm1,xmm2 ; xmm1=(40 41 50 51 60 61 70 71)
|
||||||
|
movdqa xmm3,xmm0 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd xmm0,xmm5 ; xmm0=(02 03 12 13 22 23 32 33)
|
||||||
|
punpckhwd xmm3,xmm5 ; xmm3=(42 43 52 53 62 63 72 73)
|
||||||
|
|
||||||
|
movdqa xmm2, XMMWORD [wk(3)] ; xmm2=col4
|
||||||
|
movdqa xmm5, XMMWORD [wk(5)] ; xmm5=col6
|
||||||
|
|
||||||
|
; xmm2=(04 14 24 34 44 54 64 74), xmm5=(06 16 26 36 46 56 66 76)
|
||||||
|
; xmm4=(05 15 25 35 45 55 65 75), xmm7=(07 17 27 37 47 57 67 77)
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(0)], xmm0 ; wk(0)=(02 03 12 13 22 23 32 33)
|
||||||
|
movdqa XMMWORD [wk(1)], xmm3 ; wk(1)=(42 43 52 53 62 63 72 73)
|
||||||
|
|
||||||
|
movdqa xmm0,xmm2 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd xmm2,xmm4 ; xmm2=(04 05 14 15 24 25 34 35)
|
||||||
|
punpckhwd xmm0,xmm4 ; xmm0=(44 45 54 55 64 65 74 75)
|
||||||
|
movdqa xmm3,xmm5 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd xmm5,xmm7 ; xmm5=(06 07 16 17 26 27 36 37)
|
||||||
|
punpckhwd xmm3,xmm7 ; xmm3=(46 47 56 57 66 67 76 77)
|
||||||
|
|
||||||
|
movdqa xmm4,xmm2 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq xmm2,xmm5 ; xmm2=(04 05 06 07 14 15 16 17)
|
||||||
|
punpckhdq xmm4,xmm5 ; xmm4=(24 25 26 27 34 35 36 37)
|
||||||
|
movdqa xmm7,xmm0 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq xmm0,xmm3 ; xmm0=(44 45 46 47 54 55 56 57)
|
||||||
|
punpckhdq xmm7,xmm3 ; xmm7=(64 65 66 67 74 75 76 77)
|
||||||
|
|
||||||
|
movdqa xmm5, XMMWORD [wk(0)] ; xmm5=(02 03 12 13 22 23 32 33)
|
||||||
|
movdqa xmm3, XMMWORD [wk(1)] ; xmm3=(42 43 52 53 62 63 72 73)
|
||||||
|
movdqa XMMWORD [wk(2)], xmm4 ; wk(2)=(24 25 26 27 34 35 36 37)
|
||||||
|
movdqa XMMWORD [wk(3)], xmm0 ; wk(3)=(44 45 46 47 54 55 56 57)
|
||||||
|
|
||||||
|
movdqa xmm4,xmm6 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq xmm6,xmm5 ; xmm6=(00 01 02 03 10 11 12 13)
|
||||||
|
punpckhdq xmm4,xmm5 ; xmm4=(20 21 22 23 30 31 32 33)
|
||||||
|
movdqa xmm0,xmm1 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq xmm1,xmm3 ; xmm1=(40 41 42 43 50 51 52 53)
|
||||||
|
punpckhdq xmm0,xmm3 ; xmm0=(60 61 62 63 70 71 72 73)
|
||||||
|
|
||||||
|
movdqa xmm5,xmm6 ; transpose coefficients(phase 3)
|
||||||
|
punpcklqdq xmm6,xmm2 ; xmm6=(00 01 02 03 04 05 06 07)=data0
|
||||||
|
punpckhqdq xmm5,xmm2 ; xmm5=(10 11 12 13 14 15 16 17)=data1
|
||||||
|
movdqa xmm3,xmm0 ; transpose coefficients(phase 3)
|
||||||
|
punpcklqdq xmm0,xmm7 ; xmm0=(60 61 62 63 64 65 66 67)=data6
|
||||||
|
punpckhqdq xmm3,xmm7 ; xmm3=(70 71 72 73 74 75 76 77)=data7
|
||||||
|
|
||||||
|
movdqa xmm2,xmm5
|
||||||
|
movdqa xmm7,xmm6
|
||||||
|
psubw xmm5,xmm0 ; xmm5=data1-data6=tmp6
|
||||||
|
psubw xmm6,xmm3 ; xmm6=data0-data7=tmp7
|
||||||
|
paddw xmm2,xmm0 ; xmm2=data1+data6=tmp1
|
||||||
|
paddw xmm7,xmm3 ; xmm7=data0+data7=tmp0
|
||||||
|
|
||||||
|
movdqa xmm0, XMMWORD [wk(2)] ; xmm0=(24 25 26 27 34 35 36 37)
|
||||||
|
movdqa xmm3, XMMWORD [wk(3)] ; xmm3=(44 45 46 47 54 55 56 57)
|
||||||
|
movdqa XMMWORD [wk(0)], xmm5 ; wk(0)=tmp6
|
||||||
|
movdqa XMMWORD [wk(1)], xmm6 ; wk(1)=tmp7
|
||||||
|
|
||||||
|
movdqa xmm5,xmm4 ; transpose coefficients(phase 3)
|
||||||
|
punpcklqdq xmm4,xmm0 ; xmm4=(20 21 22 23 24 25 26 27)=data2
|
||||||
|
punpckhqdq xmm5,xmm0 ; xmm5=(30 31 32 33 34 35 36 37)=data3
|
||||||
|
movdqa xmm6,xmm1 ; transpose coefficients(phase 3)
|
||||||
|
punpcklqdq xmm1,xmm3 ; xmm1=(40 41 42 43 44 45 46 47)=data4
|
||||||
|
punpckhqdq xmm6,xmm3 ; xmm6=(50 51 52 53 54 55 56 57)=data5
|
||||||
|
|
||||||
|
movdqa xmm0,xmm5
|
||||||
|
movdqa xmm3,xmm4
|
||||||
|
paddw xmm5,xmm1 ; xmm5=data3+data4=tmp3
|
||||||
|
paddw xmm4,xmm6 ; xmm4=data2+data5=tmp2
|
||||||
|
psubw xmm0,xmm1 ; xmm0=data3-data4=tmp4
|
||||||
|
psubw xmm3,xmm6 ; xmm3=data2-data5=tmp5
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movdqa xmm1,xmm7
|
||||||
|
movdqa xmm6,xmm2
|
||||||
|
paddw xmm7,xmm5 ; xmm7=tmp10
|
||||||
|
paddw xmm2,xmm4 ; xmm2=tmp11
|
||||||
|
psubw xmm1,xmm5 ; xmm1=tmp13
|
||||||
|
psubw xmm6,xmm4 ; xmm6=tmp12
|
||||||
|
|
||||||
|
movdqa xmm5,xmm7
|
||||||
|
paddw xmm7,xmm2 ; xmm7=tmp10+tmp11
|
||||||
|
psubw xmm5,xmm2 ; xmm5=tmp10-tmp11
|
||||||
|
|
||||||
|
paddw xmm7,[GOTOFF(ebx,PW_DESCALE_P2X)]
|
||||||
|
paddw xmm5,[GOTOFF(ebx,PW_DESCALE_P2X)]
|
||||||
|
psraw xmm7,PASS1_BITS ; xmm7=data0
|
||||||
|
psraw xmm5,PASS1_BITS ; xmm5=data4
|
||||||
|
|
||||||
|
movdqa XMMWORD [XMMBLOCK(0,0,edx,SIZEOF_DCTELEM)], xmm7
|
||||||
|
movdqa XMMWORD [XMMBLOCK(4,0,edx,SIZEOF_DCTELEM)], xmm5
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; z1 = (tmp12 + tmp13) * 0.541196100;
|
||||||
|
; data2 = z1 + tmp13 * 0.765366865;
|
||||||
|
; data6 = z1 + tmp12 * -1.847759065;
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; data2 = tmp13 * (0.541196100 + 0.765366865) + tmp12 * 0.541196100;
|
||||||
|
; data6 = tmp13 * 0.541196100 + tmp12 * (0.541196100 - 1.847759065);
|
||||||
|
|
||||||
|
movdqa xmm4,xmm1 ; xmm1=tmp13
|
||||||
|
movdqa xmm2,xmm1
|
||||||
|
punpcklwd xmm4,xmm6 ; xmm6=tmp12
|
||||||
|
punpckhwd xmm2,xmm6
|
||||||
|
movdqa xmm1,xmm4
|
||||||
|
movdqa xmm6,xmm2
|
||||||
|
pmaddwd xmm4,[GOTOFF(ebx,PW_F130_F054)] ; xmm4=data2L
|
||||||
|
pmaddwd xmm2,[GOTOFF(ebx,PW_F130_F054)] ; xmm2=data2H
|
||||||
|
pmaddwd xmm1,[GOTOFF(ebx,PW_F054_MF130)] ; xmm1=data6L
|
||||||
|
pmaddwd xmm6,[GOTOFF(ebx,PW_F054_MF130)] ; xmm6=data6H
|
||||||
|
|
||||||
|
paddd xmm4,[GOTOFF(ebx,PD_DESCALE_P2)]
|
||||||
|
paddd xmm2,[GOTOFF(ebx,PD_DESCALE_P2)]
|
||||||
|
psrad xmm4,DESCALE_P2
|
||||||
|
psrad xmm2,DESCALE_P2
|
||||||
|
paddd xmm1,[GOTOFF(ebx,PD_DESCALE_P2)]
|
||||||
|
paddd xmm6,[GOTOFF(ebx,PD_DESCALE_P2)]
|
||||||
|
psrad xmm1,DESCALE_P2
|
||||||
|
psrad xmm6,DESCALE_P2
|
||||||
|
|
||||||
|
packssdw xmm4,xmm2 ; xmm4=data2
|
||||||
|
packssdw xmm1,xmm6 ; xmm1=data6
|
||||||
|
|
||||||
|
movdqa XMMWORD [XMMBLOCK(2,0,edx,SIZEOF_DCTELEM)], xmm4
|
||||||
|
movdqa XMMWORD [XMMBLOCK(6,0,edx,SIZEOF_DCTELEM)], xmm1
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movdqa xmm7, XMMWORD [wk(0)] ; xmm7=tmp6
|
||||||
|
movdqa xmm5, XMMWORD [wk(1)] ; xmm5=tmp7
|
||||||
|
|
||||||
|
movdqa xmm2,xmm0 ; xmm0=tmp4
|
||||||
|
movdqa xmm6,xmm3 ; xmm3=tmp5
|
||||||
|
paddw xmm2,xmm7 ; xmm2=z3
|
||||||
|
paddw xmm6,xmm5 ; xmm6=z4
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; z5 = (z3 + z4) * 1.175875602;
|
||||||
|
; z3 = z3 * -1.961570560; z4 = z4 * -0.390180644;
|
||||||
|
; z3 += z5; z4 += z5;
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; z3 = z3 * (1.175875602 - 1.961570560) + z4 * 1.175875602;
|
||||||
|
; z4 = z3 * 1.175875602 + z4 * (1.175875602 - 0.390180644);
|
||||||
|
|
||||||
|
movdqa xmm4,xmm2
|
||||||
|
movdqa xmm1,xmm2
|
||||||
|
punpcklwd xmm4,xmm6
|
||||||
|
punpckhwd xmm1,xmm6
|
||||||
|
movdqa xmm2,xmm4
|
||||||
|
movdqa xmm6,xmm1
|
||||||
|
pmaddwd xmm4,[GOTOFF(ebx,PW_MF078_F117)] ; xmm4=z3L
|
||||||
|
pmaddwd xmm1,[GOTOFF(ebx,PW_MF078_F117)] ; xmm1=z3H
|
||||||
|
pmaddwd xmm2,[GOTOFF(ebx,PW_F117_F078)] ; xmm2=z4L
|
||||||
|
pmaddwd xmm6,[GOTOFF(ebx,PW_F117_F078)] ; xmm6=z4H
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(0)], xmm4 ; wk(0)=z3L
|
||||||
|
movdqa XMMWORD [wk(1)], xmm1 ; wk(1)=z3H
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; z1 = tmp4 + tmp7; z2 = tmp5 + tmp6;
|
||||||
|
; tmp4 = tmp4 * 0.298631336; tmp5 = tmp5 * 2.053119869;
|
||||||
|
; tmp6 = tmp6 * 3.072711026; tmp7 = tmp7 * 1.501321110;
|
||||||
|
; z1 = z1 * -0.899976223; z2 = z2 * -2.562915447;
|
||||||
|
; data7 = tmp4 + z1 + z3; data5 = tmp5 + z2 + z4;
|
||||||
|
; data3 = tmp6 + z2 + z3; data1 = tmp7 + z1 + z4;
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; tmp4 = tmp4 * (0.298631336 - 0.899976223) + tmp7 * -0.899976223;
|
||||||
|
; tmp5 = tmp5 * (2.053119869 - 2.562915447) + tmp6 * -2.562915447;
|
||||||
|
; tmp6 = tmp5 * -2.562915447 + tmp6 * (3.072711026 - 2.562915447);
|
||||||
|
; tmp7 = tmp4 * -0.899976223 + tmp7 * (1.501321110 - 0.899976223);
|
||||||
|
; data7 = tmp4 + z3; data5 = tmp5 + z4;
|
||||||
|
; data3 = tmp6 + z3; data1 = tmp7 + z4;
|
||||||
|
|
||||||
|
movdqa xmm4,xmm0
|
||||||
|
movdqa xmm1,xmm0
|
||||||
|
punpcklwd xmm4,xmm5
|
||||||
|
punpckhwd xmm1,xmm5
|
||||||
|
movdqa xmm0,xmm4
|
||||||
|
movdqa xmm5,xmm1
|
||||||
|
pmaddwd xmm4,[GOTOFF(ebx,PW_MF060_MF089)] ; xmm4=tmp4L
|
||||||
|
pmaddwd xmm1,[GOTOFF(ebx,PW_MF060_MF089)] ; xmm1=tmp4H
|
||||||
|
pmaddwd xmm0,[GOTOFF(ebx,PW_MF089_F060)] ; xmm0=tmp7L
|
||||||
|
pmaddwd xmm5,[GOTOFF(ebx,PW_MF089_F060)] ; xmm5=tmp7H
|
||||||
|
|
||||||
|
paddd xmm4, XMMWORD [wk(0)] ; xmm4=data7L
|
||||||
|
paddd xmm1, XMMWORD [wk(1)] ; xmm1=data7H
|
||||||
|
paddd xmm0,xmm2 ; xmm0=data1L
|
||||||
|
paddd xmm5,xmm6 ; xmm5=data1H
|
||||||
|
|
||||||
|
paddd xmm4,[GOTOFF(ebx,PD_DESCALE_P2)]
|
||||||
|
paddd xmm1,[GOTOFF(ebx,PD_DESCALE_P2)]
|
||||||
|
psrad xmm4,DESCALE_P2
|
||||||
|
psrad xmm1,DESCALE_P2
|
||||||
|
paddd xmm0,[GOTOFF(ebx,PD_DESCALE_P2)]
|
||||||
|
paddd xmm5,[GOTOFF(ebx,PD_DESCALE_P2)]
|
||||||
|
psrad xmm0,DESCALE_P2
|
||||||
|
psrad xmm5,DESCALE_P2
|
||||||
|
|
||||||
|
packssdw xmm4,xmm1 ; xmm4=data7
|
||||||
|
packssdw xmm0,xmm5 ; xmm0=data1
|
||||||
|
|
||||||
|
movdqa XMMWORD [XMMBLOCK(7,0,edx,SIZEOF_DCTELEM)], xmm4
|
||||||
|
movdqa XMMWORD [XMMBLOCK(1,0,edx,SIZEOF_DCTELEM)], xmm0
|
||||||
|
|
||||||
|
movdqa xmm1,xmm3
|
||||||
|
movdqa xmm5,xmm3
|
||||||
|
punpcklwd xmm1,xmm7
|
||||||
|
punpckhwd xmm5,xmm7
|
||||||
|
movdqa xmm3,xmm1
|
||||||
|
movdqa xmm7,xmm5
|
||||||
|
pmaddwd xmm1,[GOTOFF(ebx,PW_MF050_MF256)] ; xmm1=tmp5L
|
||||||
|
pmaddwd xmm5,[GOTOFF(ebx,PW_MF050_MF256)] ; xmm5=tmp5H
|
||||||
|
pmaddwd xmm3,[GOTOFF(ebx,PW_MF256_F050)] ; xmm3=tmp6L
|
||||||
|
pmaddwd xmm7,[GOTOFF(ebx,PW_MF256_F050)] ; xmm7=tmp6H
|
||||||
|
|
||||||
|
paddd xmm1,xmm2 ; xmm1=data5L
|
||||||
|
paddd xmm5,xmm6 ; xmm5=data5H
|
||||||
|
paddd xmm3, XMMWORD [wk(0)] ; xmm3=data3L
|
||||||
|
paddd xmm7, XMMWORD [wk(1)] ; xmm7=data3H
|
||||||
|
|
||||||
|
paddd xmm1,[GOTOFF(ebx,PD_DESCALE_P2)]
|
||||||
|
paddd xmm5,[GOTOFF(ebx,PD_DESCALE_P2)]
|
||||||
|
psrad xmm1,DESCALE_P2
|
||||||
|
psrad xmm5,DESCALE_P2
|
||||||
|
paddd xmm3,[GOTOFF(ebx,PD_DESCALE_P2)]
|
||||||
|
paddd xmm7,[GOTOFF(ebx,PD_DESCALE_P2)]
|
||||||
|
psrad xmm3,DESCALE_P2
|
||||||
|
psrad xmm7,DESCALE_P2
|
||||||
|
|
||||||
|
packssdw xmm1,xmm5 ; xmm1=data5
|
||||||
|
packssdw xmm3,xmm7 ; xmm3=data3
|
||||||
|
|
||||||
|
movdqa XMMWORD [XMMBLOCK(5,0,edx,SIZEOF_DCTELEM)], xmm1
|
||||||
|
movdqa XMMWORD [XMMBLOCK(3,0,edx,SIZEOF_DCTELEM)], xmm3
|
||||||
|
|
||||||
|
; pop edi ; unused
|
||||||
|
; pop esi ; unused
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; unused
|
||||||
|
poppic ebx
|
||||||
|
mov esp,ebp ; esp <- aligned ebp
|
||||||
|
pop esp ; esp <- original ebp
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; JFDCT_INT_SSE2_SUPPORTED
|
||||||
|
%endif ; DCT_ISLOW_SUPPORTED
|
||||||
383
jfsseflt.asm
Normal file
@@ -0,0 +1,383 @@
|
|||||||
|
;
|
||||||
|
; jfsseflt.asm - floating-point FDCT (SSE)
|
||||||
|
;
|
||||||
|
; x86 SIMD extension for IJG JPEG library
|
||||||
|
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
; For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
;
|
||||||
|
; This file should be assembled with NASM (Netwide Assembler); it
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
|
||||||
|
; NASM is available from http://nasm.sourceforge.net/ or
|
||||||
|
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||||
|
;
|
||||||
|
; This file contains a floating-point implementation of the forward DCT
; (Discrete Cosine Transform). The following code is based directly on
; the IJG's original jfdctflt.c; see jfdctflt.c for more details.
|
||||||
|
;
|
||||||
|
; Last Modified : February 4, 2006
|
||||||
|
;
|
||||||
|
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jdct.inc"
|
||||||
|
|
||||||
|
%ifdef DCT_FLOAT_SUPPORTED
|
||||||
|
%ifdef JFDCT_FLT_SSE_MMX_SUPPORTED
|
||||||
|
%define JFDCT_FLT_SSE_SUPPORTED
|
||||||
|
%endif
|
||||||
|
%ifdef JFDCT_FLT_SSE_SSE2_SUPPORTED
|
||||||
|
%define JFDCT_FLT_SSE_SUPPORTED
|
||||||
|
%endif
|
||||||
|
%ifdef JFDCT_FLT_SSE_SUPPORTED
|
||||||
|
|
||||||
|
; This module is specialized to the case DCTSIZE = 8.
|
||||||
|
;
|
||||||
|
%if DCTSIZE != 8
|
||||||
|
%error "Sorry, this code only copes with 8x8 DCTs."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
|
||||||
|
%macro unpcklps2 2 ; %1=(0 1 2 3) / %2=(4 5 6 7) => %1=(0 1 4 5)
|
||||||
|
shufps %1,%2,0x44
|
||||||
|
%endmacro
|
||||||
|
|
||||||
|
%macro unpckhps2 2 ; %1=(0 1 2 3) / %2=(4 5 6 7) => %1=(2 3 6 7)
|
||||||
|
shufps %1,%2,0xEE
|
||||||
|
%endmacro
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_CONST
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
global EXTN(jconst_fdct_float_sse)
|
||||||
|
|
||||||
|
EXTN(jconst_fdct_float_sse):
|
||||||
|
|
||||||
|
PD_0_382 times 4 dd 0.382683432365089771728460
|
||||||
|
PD_0_707 times 4 dd 0.707106781186547524400844
|
||||||
|
PD_0_541 times 4 dd 0.541196100146196984399723
|
||||||
|
PD_1_306 times 4 dd 1.306562964876376527856643
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Perform the forward DCT on one block of samples.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_fdct_float_sse (FAST_FLOAT * data)
|
||||||
|
;
|
||||||
|
|
||||||
|
%define data(b) (b)+8 ; FAST_FLOAT * data
|
||||||
|
|
||||||
|
%define original_ebp ebp+0
|
||||||
|
%define wk(i) ebp-(WK_NUM-(i))*SIZEOF_XMMWORD ; xmmword wk[WK_NUM]
|
||||||
|
%define WK_NUM 2
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_fdct_float_sse)
|
||||||
|
|
||||||
|
EXTN(jpeg_fdct_float_sse):
|
||||||
|
push ebp
|
||||||
|
mov eax,esp ; eax = original ebp
|
||||||
|
sub esp, byte 4
|
||||||
|
and esp, byte (-SIZEOF_XMMWORD) ; align to 128 bits
|
||||||
|
mov [esp],eax
|
||||||
|
mov ebp,esp ; ebp = aligned ebp
|
||||||
|
lea esp, [wk(0)]
|
||||||
|
pushpic ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
; push esi ; unused
|
||||||
|
; push edi ; unused
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
|
||||||
|
; ---- Pass 1: process rows.
|
||||||
|
|
||||||
|
mov edx, POINTER [data(eax)] ; (FAST_FLOAT *)
|
||||||
|
mov ecx, DCTSIZE/4
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
|
||||||
|
movaps xmm0, XMMWORD [XMMBLOCK(2,0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
movaps xmm1, XMMWORD [XMMBLOCK(3,0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
movaps xmm2, XMMWORD [XMMBLOCK(2,1,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
movaps xmm3, XMMWORD [XMMBLOCK(3,1,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
|
||||||
|
; xmm0=(20 21 22 23), xmm2=(24 25 26 27)
|
||||||
|
; xmm1=(30 31 32 33), xmm3=(34 35 36 37)
|
||||||
|
|
||||||
|
movaps xmm4,xmm0 ; transpose coefficients(phase 1)
|
||||||
|
unpcklps xmm0,xmm1 ; xmm0=(20 30 21 31)
|
||||||
|
unpckhps xmm4,xmm1 ; xmm4=(22 32 23 33)
|
||||||
|
movaps xmm5,xmm2 ; transpose coefficients(phase 1)
|
||||||
|
unpcklps xmm2,xmm3 ; xmm2=(24 34 25 35)
|
||||||
|
unpckhps xmm5,xmm3 ; xmm5=(26 36 27 37)
|
||||||
|
|
||||||
|
movaps xmm6, XMMWORD [XMMBLOCK(0,0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
movaps xmm7, XMMWORD [XMMBLOCK(1,0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
movaps xmm1, XMMWORD [XMMBLOCK(0,1,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
movaps xmm3, XMMWORD [XMMBLOCK(1,1,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
|
||||||
|
; xmm6=(00 01 02 03), xmm1=(04 05 06 07)
|
||||||
|
; xmm7=(10 11 12 13), xmm3=(14 15 16 17)
|
||||||
|
|
||||||
|
movaps XMMWORD [wk(0)], xmm4 ; wk(0)=(22 32 23 33)
|
||||||
|
movaps XMMWORD [wk(1)], xmm2 ; wk(1)=(24 34 25 35)
|
||||||
|
|
||||||
|
movaps xmm4,xmm6 ; transpose coefficients(phase 1)
|
||||||
|
unpcklps xmm6,xmm7 ; xmm6=(00 10 01 11)
|
||||||
|
unpckhps xmm4,xmm7 ; xmm4=(02 12 03 13)
|
||||||
|
movaps xmm2,xmm1 ; transpose coefficients(phase 1)
|
||||||
|
unpcklps xmm1,xmm3 ; xmm1=(04 14 05 15)
|
||||||
|
unpckhps xmm2,xmm3 ; xmm2=(06 16 07 17)
|
||||||
|
|
||||||
|
movaps xmm7,xmm6 ; transpose coefficients(phase 2)
|
||||||
|
unpcklps2 xmm6,xmm0 ; xmm6=(00 10 20 30)=data0
|
||||||
|
unpckhps2 xmm7,xmm0 ; xmm7=(01 11 21 31)=data1
|
||||||
|
movaps xmm3,xmm2 ; transpose coefficients(phase 2)
|
||||||
|
unpcklps2 xmm2,xmm5 ; xmm2=(06 16 26 36)=data6
|
||||||
|
unpckhps2 xmm3,xmm5 ; xmm3=(07 17 27 37)=data7
|
||||||
|
|
||||||
|
movaps xmm0,xmm7
|
||||||
|
movaps xmm5,xmm6
|
||||||
|
subps xmm7,xmm2 ; xmm7=data1-data6=tmp6
|
||||||
|
subps xmm6,xmm3 ; xmm6=data0-data7=tmp7
|
||||||
|
addps xmm0,xmm2 ; xmm0=data1+data6=tmp1
|
||||||
|
addps xmm5,xmm3 ; xmm5=data0+data7=tmp0
|
||||||
|
|
||||||
|
movaps xmm2, XMMWORD [wk(0)] ; xmm2=(22 32 23 33)
|
||||||
|
movaps xmm3, XMMWORD [wk(1)] ; xmm3=(24 34 25 35)
|
||||||
|
movaps XMMWORD [wk(0)], xmm7 ; wk(0)=tmp6
|
||||||
|
movaps XMMWORD [wk(1)], xmm6 ; wk(1)=tmp7
|
||||||
|
|
||||||
|
movaps xmm7,xmm4 ; transpose coefficients(phase 2)
|
||||||
|
unpcklps2 xmm4,xmm2 ; xmm4=(02 12 22 32)=data2
|
||||||
|
unpckhps2 xmm7,xmm2 ; xmm7=(03 13 23 33)=data3
|
||||||
|
movaps xmm6,xmm1 ; transpose coefficients(phase 2)
|
||||||
|
unpcklps2 xmm1,xmm3 ; xmm1=(04 14 24 34)=data4
|
||||||
|
unpckhps2 xmm6,xmm3 ; xmm6=(05 15 25 35)=data5
|
||||||
|
|
||||||
|
movaps xmm2,xmm7
|
||||||
|
movaps xmm3,xmm4
|
||||||
|
addps xmm7,xmm1 ; xmm7=data3+data4=tmp3
|
||||||
|
addps xmm4,xmm6 ; xmm4=data2+data5=tmp2
|
||||||
|
subps xmm2,xmm1 ; xmm2=data3-data4=tmp4
|
||||||
|
subps xmm3,xmm6 ; xmm3=data2-data5=tmp5
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movaps xmm1,xmm5
|
||||||
|
movaps xmm6,xmm0
|
||||||
|
subps xmm5,xmm7 ; xmm5=tmp13
|
||||||
|
subps xmm0,xmm4 ; xmm0=tmp12
|
||||||
|
addps xmm1,xmm7 ; xmm1=tmp10
|
||||||
|
addps xmm6,xmm4 ; xmm6=tmp11
|
||||||
|
|
||||||
|
addps xmm0,xmm5
|
||||||
|
mulps xmm0,[GOTOFF(ebx,PD_0_707)] ; xmm0=z1
|
||||||
|
|
||||||
|
movaps xmm7,xmm1
|
||||||
|
movaps xmm4,xmm5
|
||||||
|
subps xmm1,xmm6 ; xmm1=data4
|
||||||
|
subps xmm5,xmm0 ; xmm5=data6
|
||||||
|
addps xmm7,xmm6 ; xmm7=data0
|
||||||
|
addps xmm4,xmm0 ; xmm4=data2
|
||||||
|
|
||||||
|
movaps XMMWORD [XMMBLOCK(0,1,edx,SIZEOF_FAST_FLOAT)], xmm1
|
||||||
|
movaps XMMWORD [XMMBLOCK(2,1,edx,SIZEOF_FAST_FLOAT)], xmm5
|
||||||
|
movaps XMMWORD [XMMBLOCK(0,0,edx,SIZEOF_FAST_FLOAT)], xmm7
|
||||||
|
movaps XMMWORD [XMMBLOCK(2,0,edx,SIZEOF_FAST_FLOAT)], xmm4
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movaps xmm6, XMMWORD [wk(0)] ; xmm6=tmp6
|
||||||
|
movaps xmm0, XMMWORD [wk(1)] ; xmm0=tmp7
|
||||||
|
|
||||||
|
addps xmm2,xmm3 ; xmm2=tmp10
|
||||||
|
addps xmm3,xmm6 ; xmm3=tmp11
|
||||||
|
addps xmm6,xmm0 ; xmm6=tmp12, xmm0=tmp7
|
||||||
|
|
||||||
|
mulps xmm3,[GOTOFF(ebx,PD_0_707)] ; xmm3=z3
|
||||||
|
|
||||||
|
movaps xmm1,xmm2 ; xmm1=tmp10
|
||||||
|
subps xmm2,xmm6
|
||||||
|
mulps xmm2,[GOTOFF(ebx,PD_0_382)] ; xmm2=z5
|
||||||
|
mulps xmm1,[GOTOFF(ebx,PD_0_541)] ; xmm1=MULTIPLY(tmp10,FIX_0_541196)
|
||||||
|
mulps xmm6,[GOTOFF(ebx,PD_1_306)] ; xmm6=MULTIPLY(tmp12,FIX_1_306562)
|
||||||
|
addps xmm1,xmm2 ; xmm1=z2
|
||||||
|
addps xmm6,xmm2 ; xmm6=z4
|
||||||
|
|
||||||
|
movaps xmm5,xmm0
|
||||||
|
subps xmm0,xmm3 ; xmm0=z13
|
||||||
|
addps xmm5,xmm3 ; xmm5=z11
|
||||||
|
|
||||||
|
movaps xmm7,xmm0
|
||||||
|
movaps xmm4,xmm5
|
||||||
|
subps xmm0,xmm1 ; xmm0=data3
|
||||||
|
subps xmm5,xmm6 ; xmm5=data7
|
||||||
|
addps xmm7,xmm1 ; xmm7=data5
|
||||||
|
addps xmm4,xmm6 ; xmm4=data1
|
||||||
|
|
||||||
|
movaps XMMWORD [XMMBLOCK(3,0,edx,SIZEOF_FAST_FLOAT)], xmm0
|
||||||
|
movaps XMMWORD [XMMBLOCK(3,1,edx,SIZEOF_FAST_FLOAT)], xmm5
|
||||||
|
movaps XMMWORD [XMMBLOCK(1,1,edx,SIZEOF_FAST_FLOAT)], xmm7
|
||||||
|
movaps XMMWORD [XMMBLOCK(1,0,edx,SIZEOF_FAST_FLOAT)], xmm4
|
||||||
|
|
||||||
|
add edx, 4*DCTSIZE*SIZEOF_FAST_FLOAT
|
||||||
|
dec ecx
|
||||||
|
jnz near .rowloop
|
||||||
|
|
||||||
|
; ---- Pass 2: process columns.
|
||||||
|
|
||||||
|
mov edx, POINTER [data(eax)] ; (FAST_FLOAT *)
|
||||||
|
mov ecx, DCTSIZE/4
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
|
||||||
|
movaps xmm0, XMMWORD [XMMBLOCK(2,0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
movaps xmm1, XMMWORD [XMMBLOCK(3,0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
movaps xmm2, XMMWORD [XMMBLOCK(6,0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
movaps xmm3, XMMWORD [XMMBLOCK(7,0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
|
||||||
|
; xmm0=(02 12 22 32), xmm2=(42 52 62 72)
|
||||||
|
; xmm1=(03 13 23 33), xmm3=(43 53 63 73)
|
||||||
|
|
||||||
|
movaps xmm4,xmm0 ; transpose coefficients(phase 1)
|
||||||
|
unpcklps xmm0,xmm1 ; xmm0=(02 03 12 13)
|
||||||
|
unpckhps xmm4,xmm1 ; xmm4=(22 23 32 33)
|
||||||
|
movaps xmm5,xmm2 ; transpose coefficients(phase 1)
|
||||||
|
unpcklps xmm2,xmm3 ; xmm2=(42 43 52 53)
|
||||||
|
unpckhps xmm5,xmm3 ; xmm5=(62 63 72 73)
|
||||||
|
|
||||||
|
movaps xmm6, XMMWORD [XMMBLOCK(0,0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
movaps xmm7, XMMWORD [XMMBLOCK(1,0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
movaps xmm1, XMMWORD [XMMBLOCK(4,0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
movaps xmm3, XMMWORD [XMMBLOCK(5,0,edx,SIZEOF_FAST_FLOAT)]
|
||||||
|
|
||||||
|
; xmm6=(00 10 20 30), xmm1=(40 50 60 70)
|
||||||
|
; xmm7=(01 11 21 31), xmm3=(41 51 61 71)
|
||||||
|
|
||||||
|
movaps XMMWORD [wk(0)], xmm4 ; wk(0)=(22 23 32 33)
|
||||||
|
movaps XMMWORD [wk(1)], xmm2 ; wk(1)=(42 43 52 53)
|
||||||
|
|
||||||
|
movaps xmm4,xmm6 ; transpose coefficients(phase 1)
|
||||||
|
unpcklps xmm6,xmm7 ; xmm6=(00 01 10 11)
|
||||||
|
unpckhps xmm4,xmm7 ; xmm4=(20 21 30 31)
|
||||||
|
movaps xmm2,xmm1 ; transpose coefficients(phase 1)
|
||||||
|
unpcklps xmm1,xmm3 ; xmm1=(40 41 50 51)
|
||||||
|
unpckhps xmm2,xmm3 ; xmm2=(60 61 70 71)
|
||||||
|
|
||||||
|
movaps xmm7,xmm6 ; transpose coefficients(phase 2)
|
||||||
|
unpcklps2 xmm6,xmm0 ; xmm6=(00 01 02 03)=data0
|
||||||
|
unpckhps2 xmm7,xmm0 ; xmm7=(10 11 12 13)=data1
|
||||||
|
movaps xmm3,xmm2 ; transpose coefficients(phase 2)
|
||||||
|
unpcklps2 xmm2,xmm5 ; xmm2=(60 61 62 63)=data6
|
||||||
|
unpckhps2 xmm3,xmm5 ; xmm3=(70 71 72 73)=data7
|
||||||
|
|
||||||
|
movaps xmm0,xmm7
|
||||||
|
movaps xmm5,xmm6
|
||||||
|
subps xmm7,xmm2 ; xmm7=data1-data6=tmp6
|
||||||
|
subps xmm6,xmm3 ; xmm6=data0-data7=tmp7
|
||||||
|
addps xmm0,xmm2 ; xmm0=data1+data6=tmp1
|
||||||
|
addps xmm5,xmm3 ; xmm5=data0+data7=tmp0
|
||||||
|
|
||||||
|
movaps xmm2, XMMWORD [wk(0)] ; xmm2=(22 23 32 33)
|
||||||
|
movaps xmm3, XMMWORD [wk(1)] ; xmm3=(42 43 52 53)
|
||||||
|
movaps XMMWORD [wk(0)], xmm7 ; wk(0)=tmp6
|
||||||
|
movaps XMMWORD [wk(1)], xmm6 ; wk(1)=tmp7
|
||||||
|
|
||||||
|
movaps xmm7,xmm4 ; transpose coefficients(phase 2)
|
||||||
|
unpcklps2 xmm4,xmm2 ; xmm4=(20 21 22 23)=data2
|
||||||
|
unpckhps2 xmm7,xmm2 ; xmm7=(30 31 32 33)=data3
|
||||||
|
movaps xmm6,xmm1 ; transpose coefficients(phase 2)
|
||||||
|
unpcklps2 xmm1,xmm3 ; xmm1=(40 41 42 43)=data4
|
||||||
|
unpckhps2 xmm6,xmm3 ; xmm6=(50 51 52 53)=data5
|
||||||
|
|
||||||
|
movaps xmm2,xmm7
|
||||||
|
movaps xmm3,xmm4
|
||||||
|
addps xmm7,xmm1 ; xmm7=data3+data4=tmp3
|
||||||
|
addps xmm4,xmm6 ; xmm4=data2+data5=tmp2
|
||||||
|
subps xmm2,xmm1 ; xmm2=data3-data4=tmp4
|
||||||
|
subps xmm3,xmm6 ; xmm3=data2-data5=tmp5
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movaps xmm1,xmm5
|
||||||
|
movaps xmm6,xmm0
|
||||||
|
subps xmm5,xmm7 ; xmm5=tmp13
|
||||||
|
subps xmm0,xmm4 ; xmm0=tmp12
|
||||||
|
addps xmm1,xmm7 ; xmm1=tmp10
|
||||||
|
addps xmm6,xmm4 ; xmm6=tmp11
|
||||||
|
|
||||||
|
addps xmm0,xmm5
|
||||||
|
mulps xmm0,[GOTOFF(ebx,PD_0_707)] ; xmm0=z1
|
||||||
|
|
||||||
|
movaps xmm7,xmm1
|
||||||
|
movaps xmm4,xmm5
|
||||||
|
subps xmm1,xmm6 ; xmm1=data4
|
||||||
|
subps xmm5,xmm0 ; xmm5=data6
|
||||||
|
addps xmm7,xmm6 ; xmm7=data0
|
||||||
|
addps xmm4,xmm0 ; xmm4=data2
|
||||||
|
|
||||||
|
movaps XMMWORD [XMMBLOCK(4,0,edx,SIZEOF_FAST_FLOAT)], xmm1
|
||||||
|
movaps XMMWORD [XMMBLOCK(6,0,edx,SIZEOF_FAST_FLOAT)], xmm5
|
||||||
|
movaps XMMWORD [XMMBLOCK(0,0,edx,SIZEOF_FAST_FLOAT)], xmm7
|
||||||
|
movaps XMMWORD [XMMBLOCK(2,0,edx,SIZEOF_FAST_FLOAT)], xmm4
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movaps xmm6, XMMWORD [wk(0)] ; xmm6=tmp6
|
||||||
|
movaps xmm0, XMMWORD [wk(1)] ; xmm0=tmp7
|
||||||
|
|
||||||
|
addps xmm2,xmm3 ; xmm2=tmp10
|
||||||
|
addps xmm3,xmm6 ; xmm3=tmp11
|
||||||
|
addps xmm6,xmm0 ; xmm6=tmp12, xmm0=tmp7
|
||||||
|
|
||||||
|
mulps xmm3,[GOTOFF(ebx,PD_0_707)] ; xmm3=z3
|
||||||
|
|
||||||
|
movaps xmm1,xmm2 ; xmm1=tmp10
|
||||||
|
subps xmm2,xmm6
|
||||||
|
mulps xmm2,[GOTOFF(ebx,PD_0_382)] ; xmm2=z5
|
||||||
|
mulps xmm1,[GOTOFF(ebx,PD_0_541)] ; xmm1=MULTIPLY(tmp10,FIX_0_541196)
|
||||||
|
mulps xmm6,[GOTOFF(ebx,PD_1_306)] ; xmm6=MULTIPLY(tmp12,FIX_1_306562)
|
||||||
|
addps xmm1,xmm2 ; xmm1=z2
|
||||||
|
addps xmm6,xmm2 ; xmm6=z4
|
||||||
|
|
||||||
|
movaps xmm5,xmm0
|
||||||
|
subps xmm0,xmm3 ; xmm0=z13
|
||||||
|
addps xmm5,xmm3 ; xmm5=z11
|
||||||
|
|
||||||
|
movaps xmm7,xmm0
|
||||||
|
movaps xmm4,xmm5
|
||||||
|
subps xmm0,xmm1 ; xmm0=data3
|
||||||
|
subps xmm5,xmm6 ; xmm5=data7
|
||||||
|
addps xmm7,xmm1 ; xmm7=data5
|
||||||
|
addps xmm4,xmm6 ; xmm4=data1
|
||||||
|
|
||||||
|
movaps XMMWORD [XMMBLOCK(3,0,edx,SIZEOF_FAST_FLOAT)], xmm0
|
||||||
|
movaps XMMWORD [XMMBLOCK(7,0,edx,SIZEOF_FAST_FLOAT)], xmm5
|
||||||
|
movaps XMMWORD [XMMBLOCK(5,0,edx,SIZEOF_FAST_FLOAT)], xmm7
|
||||||
|
movaps XMMWORD [XMMBLOCK(1,0,edx,SIZEOF_FAST_FLOAT)], xmm4
|
||||||
|
|
||||||
|
add edx, byte 4*SIZEOF_FAST_FLOAT
|
||||||
|
dec ecx
|
||||||
|
jnz near .columnloop
|
||||||
|
|
||||||
|
; pop edi ; unused
|
||||||
|
; pop esi ; unused
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
poppic ebx
|
||||||
|
mov esp,ebp ; esp <- aligned ebp
|
||||||
|
pop esp ; esp <- original ebp
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; JFDCT_FLT_SSE_SUPPORTED
|
||||||
|
%endif ; DCT_FLOAT_SUPPORTED
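
For readers who want a scalar reference for the listing above, here is a hedged C sketch of the same butterfly sequence, written from the register comments in the assembly (tmp0..tmp7, tmp10..tmp13, z1..z5, z11, z13) and the four constants in its SEG_CONST section. The function names `fdct_float_1d` and `fdct_float_8x8` are illustrative and are not taken from the source. The assembly handles four rows or columns per loop iteration by transposing 4x4 tiles, while this sketch simply walks one line at a time with a stride.

```c
/* One 1-D pass over 8 values spaced `s` floats apart (illustrative name). */
static void fdct_float_1d(float *d, int s)
{
    float tmp0 = d[0*s] + d[7*s], tmp7 = d[0*s] - d[7*s];
    float tmp1 = d[1*s] + d[6*s], tmp6 = d[1*s] - d[6*s];
    float tmp2 = d[2*s] + d[5*s], tmp5 = d[2*s] - d[5*s];
    float tmp3 = d[3*s] + d[4*s], tmp4 = d[3*s] - d[4*s];
    float tmp10, tmp11, tmp12, tmp13, z1, z2, z3, z4, z5, z11, z13;

    /* Even part */
    tmp10 = tmp0 + tmp3;  tmp13 = tmp0 - tmp3;
    tmp11 = tmp1 + tmp2;  tmp12 = tmp1 - tmp2;
    d[0*s] = tmp10 + tmp11;
    d[4*s] = tmp10 - tmp11;
    z1 = (tmp12 + tmp13) * 0.707106781f;      /* PD_0_707 */
    d[2*s] = tmp13 + z1;
    d[6*s] = tmp13 - z1;

    /* Odd part */
    tmp10 = tmp4 + tmp5;
    tmp11 = tmp5 + tmp6;
    tmp12 = tmp6 + tmp7;
    z5 = (tmp10 - tmp12) * 0.382683433f;      /* PD_0_382 */
    z2 = 0.541196100f * tmp10 + z5;           /* PD_0_541 */
    z4 = 1.306562965f * tmp12 + z5;           /* PD_1_306 */
    z3 = tmp11 * 0.707106781f;                /* PD_0_707 */
    z11 = tmp7 + z3;
    z13 = tmp7 - z3;
    d[5*s] = z13 + z2;
    d[3*s] = z13 - z2;
    d[1*s] = z11 + z4;
    d[7*s] = z11 - z4;
}

/* Two passes over an 8x8 block stored row-major, in place (illustrative name). */
static void fdct_float_8x8(float *data)
{
    int i;
    for (i = 0; i < 8; i++)
        fdct_float_1d(data + 8 * i, 1);       /* Pass 1: process rows */
    for (i = 0; i < 8; i++)
        fdct_float_1d(data + i, 8);           /* Pass 2: process columns */
}
```

A constant block is a quick sanity check: every output except the DC term should come out as zero, and the DC term equals 64 times the input value because the results are left scaled, as in the assembly.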
|
||||||
462
ji3dnflt.asm
Normal file
@@ -0,0 +1,462 @@
|
|||||||
|
;
|
||||||
|
; ji3dnflt.asm - floating-point IDCT (3DNow! & MMX)
|
||||||
|
;
|
||||||
|
; x86 SIMD extension for IJG JPEG library
|
||||||
|
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
; For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
;
|
||||||
|
; This file should be assembled with NASM (Netwide Assembler),
|
||||||
|
; can *not* be assembled with Microsoft's MASM or any compatible
|
||||||
|
; assembler (including Borland's Turbo Assembler).
|
||||||
|
; NASM is available from http://nasm.sourceforge.net/ or
|
||||||
|
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||||
|
;
|
||||||
|
; This file contains a floating-point implementation of the inverse DCT
|
||||||
|
; (Discrete Cosine Transform). The following code is based directly on
|
||||||
|
; the IJG's original jidctflt.c; see the jidctflt.c for more details.
|
||||||
|
;
|
||||||
|
; Last Modified : February 4, 2006
|
||||||
|
;
|
||||||
|
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jdct.inc"
|
||||||
|
|
||||||
|
%ifdef DCT_FLOAT_SUPPORTED
|
||||||
|
%ifdef JIDCT_FLT_3DNOW_MMX_SUPPORTED
|
||||||
|
|
||||||
|
; This module is specialized to the case DCTSIZE = 8.
|
||||||
|
;
|
||||||
|
%if DCTSIZE != 8
|
||||||
|
%error "Sorry, this code only copes with 8x8 DCTs."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_CONST
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
global EXTN(jconst_idct_float_3dnow)
|
||||||
|
|
||||||
|
EXTN(jconst_idct_float_3dnow):
|
||||||
|
|
||||||
|
PD_1_414 times 2 dd 1.414213562373095048801689
|
||||||
|
PD_1_847 times 2 dd 1.847759065022573512256366
|
||||||
|
PD_1_082 times 2 dd 1.082392200292393968799446
|
||||||
|
PD_2_613 times 2 dd 2.613125929752753055713286
|
||||||
|
PD_RNDINT_MAGIC times 2 dd 100663296.0 ; (float)(0x00C00000 << 3)
|
||||||
|
PB_CENTERJSAMP times 8 db CENTERJSAMPLE
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Perform dequantization and inverse DCT on one block of coefficients.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_idct_float_3dnow (j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
|
; JCOEFPTR coef_block,
|
||||||
|
; JSAMPARRAY output_buf, JDIMENSION output_col)
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define coef_block(b) (b)+16 ; JCOEFPTR coef_block
|
||||||
|
%define output_buf(b) (b)+20 ; JSAMPARRAY output_buf
|
||||||
|
%define output_col(b) (b)+24 ; JDIMENSION output_col
|
||||||
|
|
||||||
|
%define original_ebp ebp+0
|
||||||
|
%define wk(i) ebp-(WK_NUM-(i))*SIZEOF_MMWORD ; mmword wk[WK_NUM]
|
||||||
|
%define WK_NUM 2
|
||||||
|
%define workspace wk(0)-DCTSIZE2*SIZEOF_FAST_FLOAT
|
||||||
|
; FAST_FLOAT workspace[DCTSIZE2]
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_idct_float_3dnow)
|
||||||
|
|
||||||
|
EXTN(jpeg_idct_float_3dnow):
|
||||||
|
push ebp
|
||||||
|
mov eax,esp ; eax = original ebp
|
||||||
|
sub esp, byte 4
|
||||||
|
and esp, byte (-SIZEOF_MMWORD) ; align to 64 bits
|
||||||
|
mov [esp],eax
|
||||||
|
mov ebp,esp ; ebp = aligned ebp
|
||||||
|
lea esp, [workspace]
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
|
||||||
|
; ---- Pass 1: process columns from input, store into work array.
|
||||||
|
|
||||||
|
; mov eax, [original_ebp]
|
||||||
|
mov edx, POINTER [compptr(eax)]
|
||||||
|
mov edx, POINTER [jcompinfo_dct_table(edx)] ; quantptr
|
||||||
|
mov esi, JCOEFPTR [coef_block(eax)] ; inptr
|
||||||
|
lea edi, [workspace] ; FAST_FLOAT * wsptr
|
||||||
|
mov ecx, DCTSIZE/2 ; ctr
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
%ifndef NO_ZERO_COLUMN_TEST_FLOAT_3DNOW
|
||||||
|
mov eax, DWORD [DWBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
or eax, DWORD [DWBLOCK(2,0,esi,SIZEOF_JCOEF)]
|
||||||
|
jnz short .columnDCT
|
||||||
|
|
||||||
|
pushpic ebx ; save GOT address
|
||||||
|
mov ebx, DWORD [DWBLOCK(3,0,esi,SIZEOF_JCOEF)]
|
||||||
|
mov eax, DWORD [DWBLOCK(4,0,esi,SIZEOF_JCOEF)]
|
||||||
|
or ebx, DWORD [DWBLOCK(5,0,esi,SIZEOF_JCOEF)]
|
||||||
|
or eax, DWORD [DWBLOCK(6,0,esi,SIZEOF_JCOEF)]
|
||||||
|
or ebx, DWORD [DWBLOCK(7,0,esi,SIZEOF_JCOEF)]
|
||||||
|
or eax,ebx
|
||||||
|
poppic ebx ; restore GOT address
|
||||||
|
jnz short .columnDCT
|
||||||
|
|
||||||
|
; -- AC terms all zero
|
||||||
|
|
||||||
|
movd mm0, DWORD [DWBLOCK(0,0,esi,SIZEOF_JCOEF)]
|
||||||
|
|
||||||
|
punpcklwd mm0,mm0
|
||||||
|
psrad mm0,(DWORD_BIT-WORD_BIT)
|
||||||
|
pi2fd mm0,mm0
|
||||||
|
|
||||||
|
pfmul mm0, MMWORD [MMBLOCK(0,0,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
|
||||||
|
movq mm1,mm0
|
||||||
|
punpckldq mm0,mm0
|
||||||
|
punpckhdq mm1,mm1
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(0,0,edi,SIZEOF_FAST_FLOAT)], mm0
|
||||||
|
movq MMWORD [MMBLOCK(0,1,edi,SIZEOF_FAST_FLOAT)], mm0
|
||||||
|
movq MMWORD [MMBLOCK(0,2,edi,SIZEOF_FAST_FLOAT)], mm0
|
||||||
|
movq MMWORD [MMBLOCK(0,3,edi,SIZEOF_FAST_FLOAT)], mm0
|
||||||
|
movq MMWORD [MMBLOCK(1,0,edi,SIZEOF_FAST_FLOAT)], mm1
|
||||||
|
movq MMWORD [MMBLOCK(1,1,edi,SIZEOF_FAST_FLOAT)], mm1
|
||||||
|
movq MMWORD [MMBLOCK(1,2,edi,SIZEOF_FAST_FLOAT)], mm1
|
||||||
|
movq MMWORD [MMBLOCK(1,3,edi,SIZEOF_FAST_FLOAT)], mm1
|
||||||
|
jmp near .nextcolumn
|
||||||
|
alignx 16,7
|
||||||
|
%endif
|
||||||
|
.columnDCT:
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movd mm0, DWORD [DWBLOCK(0,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movd mm1, DWORD [DWBLOCK(2,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movd mm2, DWORD [DWBLOCK(4,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movd mm3, DWORD [DWBLOCK(6,0,esi,SIZEOF_JCOEF)]
|
||||||
|
|
||||||
|
punpcklwd mm0,mm0
|
||||||
|
punpcklwd mm1,mm1
|
||||||
|
psrad mm0,(DWORD_BIT-WORD_BIT)
|
||||||
|
psrad mm1,(DWORD_BIT-WORD_BIT)
|
||||||
|
pi2fd mm0,mm0
|
||||||
|
pi2fd mm1,mm1
|
||||||
|
|
||||||
|
pfmul mm0, MMWORD [MMBLOCK(0,0,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
pfmul mm1, MMWORD [MMBLOCK(2,0,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
|
||||||
|
punpcklwd mm2,mm2
|
||||||
|
punpcklwd mm3,mm3
|
||||||
|
psrad mm2,(DWORD_BIT-WORD_BIT)
|
||||||
|
psrad mm3,(DWORD_BIT-WORD_BIT)
|
||||||
|
pi2fd mm2,mm2
|
||||||
|
pi2fd mm3,mm3
|
||||||
|
|
||||||
|
pfmul mm2, MMWORD [MMBLOCK(4,0,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
pfmul mm3, MMWORD [MMBLOCK(6,0,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
|
||||||
|
movq mm4,mm0
|
||||||
|
movq mm5,mm1
|
||||||
|
pfsub mm0,mm2 ; mm0=tmp11
|
||||||
|
pfsub mm1,mm3
|
||||||
|
pfadd mm4,mm2 ; mm4=tmp10
|
||||||
|
pfadd mm5,mm3 ; mm5=tmp13
|
||||||
|
|
||||||
|
pfmul mm1,[GOTOFF(ebx,PD_1_414)]
|
||||||
|
pfsub mm1,mm5 ; mm1=tmp12
|
||||||
|
|
||||||
|
movq mm6,mm4
|
||||||
|
movq mm7,mm0
|
||||||
|
pfsub mm4,mm5 ; mm4=tmp3
|
||||||
|
pfsub mm0,mm1 ; mm0=tmp2
|
||||||
|
pfadd mm6,mm5 ; mm6=tmp0
|
||||||
|
pfadd mm7,mm1 ; mm7=tmp1
|
||||||
|
|
||||||
|
movq MMWORD [wk(1)], mm4 ; tmp3
|
||||||
|
movq MMWORD [wk(0)], mm0 ; tmp2
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movd mm2, DWORD [DWBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movd mm3, DWORD [DWBLOCK(3,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movd mm5, DWORD [DWBLOCK(5,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movd mm1, DWORD [DWBLOCK(7,0,esi,SIZEOF_JCOEF)]
|
||||||
|
|
||||||
|
punpcklwd mm2,mm2
|
||||||
|
punpcklwd mm3,mm3
|
||||||
|
psrad mm2,(DWORD_BIT-WORD_BIT)
|
||||||
|
psrad mm3,(DWORD_BIT-WORD_BIT)
|
||||||
|
pi2fd mm2,mm2
|
||||||
|
pi2fd mm3,mm3
|
||||||
|
|
||||||
|
pfmul mm2, MMWORD [MMBLOCK(1,0,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
pfmul mm3, MMWORD [MMBLOCK(3,0,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
|
||||||
|
punpcklwd mm5,mm5
|
||||||
|
punpcklwd mm1,mm1
|
||||||
|
psrad mm5,(DWORD_BIT-WORD_BIT)
|
||||||
|
psrad mm1,(DWORD_BIT-WORD_BIT)
|
||||||
|
pi2fd mm5,mm5
|
||||||
|
pi2fd mm1,mm1
|
||||||
|
|
||||||
|
pfmul mm5, MMWORD [MMBLOCK(5,0,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
pfmul mm1, MMWORD [MMBLOCK(7,0,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
|
||||||
|
movq mm4,mm2
|
||||||
|
movq mm0,mm5
|
||||||
|
pfadd mm2,mm1 ; mm2=z11
|
||||||
|
pfadd mm5,mm3 ; mm5=z13
|
||||||
|
pfsub mm4,mm1 ; mm4=z12
|
||||||
|
pfsub mm0,mm3 ; mm0=z10
|
||||||
|
|
||||||
|
movq mm1,mm2
|
||||||
|
pfsub mm2,mm5
|
||||||
|
pfadd mm1,mm5 ; mm1=tmp7
|
||||||
|
|
||||||
|
pfmul mm2,[GOTOFF(ebx,PD_1_414)] ; mm2=tmp11
|
||||||
|
|
||||||
|
movq mm3,mm0
|
||||||
|
pfadd mm0,mm4
|
||||||
|
pfmul mm0,[GOTOFF(ebx,PD_1_847)] ; mm0=z5
|
||||||
|
pfmul mm3,[GOTOFF(ebx,PD_2_613)] ; mm3=(z10 * 2.613125930)
|
||||||
|
pfmul mm4,[GOTOFF(ebx,PD_1_082)] ; mm4=(z12 * 1.082392200)
|
||||||
|
pfsubr mm3,mm0 ; mm3=tmp12
|
||||||
|
pfsub mm4,mm0 ; mm4=tmp10
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
pfsub mm3,mm1 ; mm3=tmp6
|
||||||
|
movq mm5,mm6
|
||||||
|
movq mm0,mm7
|
||||||
|
pfadd mm6,mm1 ; mm6=data0=(00 01)
|
||||||
|
pfadd mm7,mm3 ; mm7=data1=(10 11)
|
||||||
|
pfsub mm5,mm1 ; mm5=data7=(70 71)
|
||||||
|
pfsub mm0,mm3 ; mm0=data6=(60 61)
|
||||||
|
pfsub mm2,mm3 ; mm2=tmp5
|
||||||
|
|
||||||
|
movq mm1,mm6 ; transpose coefficients
|
||||||
|
punpckldq mm6,mm7 ; mm6=(00 10)
|
||||||
|
punpckhdq mm1,mm7 ; mm1=(01 11)
|
||||||
|
movq mm3,mm0 ; transpose coefficients
|
||||||
|
punpckldq mm0,mm5 ; mm0=(60 70)
|
||||||
|
punpckhdq mm3,mm5 ; mm3=(61 71)
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(0,0,edi,SIZEOF_FAST_FLOAT)], mm6
|
||||||
|
movq MMWORD [MMBLOCK(1,0,edi,SIZEOF_FAST_FLOAT)], mm1
|
||||||
|
movq MMWORD [MMBLOCK(0,3,edi,SIZEOF_FAST_FLOAT)], mm0
|
||||||
|
movq MMWORD [MMBLOCK(1,3,edi,SIZEOF_FAST_FLOAT)], mm3
|
||||||
|
|
||||||
|
movq mm7, MMWORD [wk(0)] ; mm7=tmp2
|
||||||
|
movq mm5, MMWORD [wk(1)] ; mm5=tmp3
|
||||||
|
|
||||||
|
pfadd mm4,mm2 ; mm4=tmp4
|
||||||
|
movq mm6,mm7
|
||||||
|
movq mm1,mm5
|
||||||
|
pfadd mm7,mm2 ; mm7=data2=(20 21)
|
||||||
|
pfadd mm5,mm4 ; mm5=data4=(40 41)
|
||||||
|
pfsub mm6,mm2 ; mm6=data5=(50 51)
|
||||||
|
pfsub mm1,mm4 ; mm1=data3=(30 31)
|
||||||
|
|
||||||
|
movq mm0,mm7 ; transpose coefficients
|
||||||
|
punpckldq mm7,mm1 ; mm7=(20 30)
|
||||||
|
punpckhdq mm0,mm1 ; mm0=(21 31)
|
||||||
|
movq mm3,mm5 ; transpose coefficients
|
||||||
|
punpckldq mm5,mm6 ; mm5=(40 50)
|
||||||
|
punpckhdq mm3,mm6 ; mm3=(41 51)
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(0,1,edi,SIZEOF_FAST_FLOAT)], mm7
|
||||||
|
movq MMWORD [MMBLOCK(1,1,edi,SIZEOF_FAST_FLOAT)], mm0
|
||||||
|
movq MMWORD [MMBLOCK(0,2,edi,SIZEOF_FAST_FLOAT)], mm5
|
||||||
|
movq MMWORD [MMBLOCK(1,2,edi,SIZEOF_FAST_FLOAT)], mm3
|
||||||
|
|
||||||
|
.nextcolumn:
|
||||||
|
add esi, byte 2*SIZEOF_JCOEF ; coef_block
|
||||||
|
add edx, byte 2*SIZEOF_FLOAT_MULT_TYPE ; quantptr
|
||||||
|
add edi, byte 2*DCTSIZE*SIZEOF_FAST_FLOAT ; wsptr
|
||||||
|
dec ecx ; ctr
|
||||||
|
jnz near .columnloop
|
||||||
|
|
||||||
|
; -- Prefetch the next coefficient block
|
||||||
|
|
||||||
|
prefetch [esi + (DCTSIZE2-8)*SIZEOF_JCOEF + 0*32]
|
||||||
|
prefetch [esi + (DCTSIZE2-8)*SIZEOF_JCOEF + 1*32]
|
||||||
|
prefetch [esi + (DCTSIZE2-8)*SIZEOF_JCOEF + 2*32]
|
||||||
|
prefetch [esi + (DCTSIZE2-8)*SIZEOF_JCOEF + 3*32]
|
||||||
|
|
||||||
|
; ---- Pass 2: process rows from work array, store into output array.
|
||||||
|
|
||||||
|
mov eax, [original_ebp]
|
||||||
|
lea esi, [workspace] ; FAST_FLOAT * wsptr
|
||||||
|
mov edi, JSAMPARRAY [output_buf(eax)] ; (JSAMPROW *)
|
||||||
|
mov eax, JDIMENSION [output_col(eax)]
|
||||||
|
mov ecx, DCTSIZE/2 ; ctr
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movq mm0, MMWORD [MMBLOCK(0,0,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(2,0,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
movq mm2, MMWORD [MMBLOCK(4,0,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(6,0,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
|
||||||
|
movq mm4,mm0
|
||||||
|
movq mm5,mm1
|
||||||
|
pfsub mm0,mm2 ; mm0=tmp11
|
||||||
|
pfsub mm1,mm3
|
||||||
|
pfadd mm4,mm2 ; mm4=tmp10
|
||||||
|
pfadd mm5,mm3 ; mm5=tmp13
|
||||||
|
|
||||||
|
pfmul mm1,[GOTOFF(ebx,PD_1_414)]
|
||||||
|
pfsub mm1,mm5 ; mm1=tmp12
|
||||||
|
|
||||||
|
movq mm6,mm4
|
||||||
|
movq mm7,mm0
|
||||||
|
pfsub mm4,mm5 ; mm4=tmp3
|
||||||
|
pfsub mm0,mm1 ; mm0=tmp2
|
||||||
|
pfadd mm6,mm5 ; mm6=tmp0
|
||||||
|
pfadd mm7,mm1 ; mm7=tmp1
|
||||||
|
|
||||||
|
movq MMWORD [wk(1)], mm4 ; tmp3
|
||||||
|
movq MMWORD [wk(0)], mm0 ; tmp2
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movq mm2, MMWORD [MMBLOCK(1,0,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(3,0,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
movq mm5, MMWORD [MMBLOCK(5,0,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(7,0,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
|
||||||
|
movq mm4,mm2
|
||||||
|
movq mm0,mm5
|
||||||
|
pfadd mm2,mm1 ; mm2=z11
|
||||||
|
pfadd mm5,mm3 ; mm5=z13
|
||||||
|
pfsub mm4,mm1 ; mm4=z12
|
||||||
|
pfsub mm0,mm3 ; mm0=z10
|
||||||
|
|
||||||
|
movq mm1,mm2
|
||||||
|
pfsub mm2,mm5
|
||||||
|
pfadd mm1,mm5 ; mm1=tmp7
|
||||||
|
|
||||||
|
pfmul mm2,[GOTOFF(ebx,PD_1_414)] ; mm2=tmp11
|
||||||
|
|
||||||
|
movq mm3,mm0
|
||||||
|
pfadd mm0,mm4
|
||||||
|
pfmul mm0,[GOTOFF(ebx,PD_1_847)] ; mm0=z5
|
||||||
|
pfmul mm3,[GOTOFF(ebx,PD_2_613)] ; mm3=(z10 * 2.613125930)
|
||||||
|
pfmul mm4,[GOTOFF(ebx,PD_1_082)] ; mm4=(z12 * 1.082392200)
|
||||||
|
pfsubr mm3,mm0 ; mm3=tmp12
|
||||||
|
pfsub mm4,mm0 ; mm4=tmp10
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
pfsub mm3,mm1 ; mm3=tmp6
|
||||||
|
movq mm5,mm6
|
||||||
|
movq mm0,mm7
|
||||||
|
pfadd mm6,mm1 ; mm6=data0=(00 10)
|
||||||
|
pfadd mm7,mm3 ; mm7=data1=(01 11)
|
||||||
|
pfsub mm5,mm1 ; mm5=data7=(07 17)
|
||||||
|
pfsub mm0,mm3 ; mm0=data6=(06 16)
|
||||||
|
pfsub mm2,mm3 ; mm2=tmp5
|
||||||
|
|
||||||
|
movq mm1,[GOTOFF(ebx,PD_RNDINT_MAGIC)] ; mm1=[PD_RNDINT_MAGIC]
|
||||||
|
pcmpeqd mm3,mm3
|
||||||
|
psrld mm3,WORD_BIT ; mm3={0xFFFF 0x0000 0xFFFF 0x0000}
|
||||||
|
|
||||||
|
pfadd mm6,mm1 ; mm6=roundint(data0/8)=(00 ** 10 **)
|
||||||
|
pfadd mm7,mm1 ; mm7=roundint(data1/8)=(01 ** 11 **)
|
||||||
|
pfadd mm0,mm1 ; mm0=roundint(data6/8)=(06 ** 16 **)
|
||||||
|
pfadd mm5,mm1 ; mm5=roundint(data7/8)=(07 ** 17 **)
|
||||||
|
|
||||||
|
pand mm6,mm3 ; mm6=(00 -- 10 --)
|
||||||
|
pslld mm7,WORD_BIT ; mm7=(-- 01 -- 11)
|
||||||
|
pand mm0,mm3 ; mm0=(06 -- 16 --)
|
||||||
|
pslld mm5,WORD_BIT ; mm5=(-- 07 -- 17)
|
||||||
|
por mm6,mm7 ; mm6=(00 01 10 11)
|
||||||
|
por mm0,mm5 ; mm0=(06 07 16 17)
|
||||||
|
|
||||||
|
movq mm1, MMWORD [wk(0)] ; mm1=tmp2
|
||||||
|
movq mm3, MMWORD [wk(1)] ; mm3=tmp3
|
||||||
|
|
||||||
|
pfadd mm4,mm2 ; mm4=tmp4
|
||||||
|
movq mm7,mm1
|
||||||
|
movq mm5,mm3
|
||||||
|
pfadd mm1,mm2 ; mm1=data2=(02 12)
|
||||||
|
pfadd mm3,mm4 ; mm3=data4=(04 14)
|
||||||
|
pfsub mm7,mm2 ; mm7=data5=(05 15)
|
||||||
|
pfsub mm5,mm4 ; mm5=data3=(03 13)
|
||||||
|
|
||||||
|
movq mm2,[GOTOFF(ebx,PD_RNDINT_MAGIC)] ; mm2=[PD_RNDINT_MAGIC]
|
||||||
|
pcmpeqd mm4,mm4
|
||||||
|
psrld mm4,WORD_BIT ; mm4={0xFFFF 0x0000 0xFFFF 0x0000}
|
||||||
|
|
||||||
|
pfadd mm3,mm2 ; mm3=roundint(data4/8)=(04 ** 14 **)
|
||||||
|
pfadd mm7,mm2 ; mm7=roundint(data5/8)=(05 ** 15 **)
|
||||||
|
pfadd mm1,mm2 ; mm1=roundint(data2/8)=(02 ** 12 **)
|
||||||
|
pfadd mm5,mm2 ; mm5=roundint(data3/8)=(03 ** 13 **)
|
||||||
|
|
||||||
|
pand mm3,mm4 ; mm3=(04 -- 14 --)
|
||||||
|
pslld mm7,WORD_BIT ; mm7=(-- 05 -- 15)
|
||||||
|
pand mm1,mm4 ; mm1=(02 -- 12 --)
|
||||||
|
pslld mm5,WORD_BIT ; mm5=(-- 03 -- 13)
|
||||||
|
por mm3,mm7 ; mm3=(04 05 14 15)
|
||||||
|
por mm1,mm5 ; mm1=(02 03 12 13)
|
||||||
|
|
||||||
|
movq mm2,[GOTOFF(ebx,PB_CENTERJSAMP)] ; mm2=[PB_CENTERJSAMP]
|
||||||
|
|
||||||
|
packsswb mm6,mm3 ; mm6=(00 01 10 11 04 05 14 15)
|
||||||
|
packsswb mm1,mm0 ; mm1=(02 03 12 13 06 07 16 17)
|
||||||
|
paddb mm6,mm2
|
||||||
|
paddb mm1,mm2
|
||||||
|
|
||||||
|
movq mm4,mm6 ; transpose coefficients(phase 2)
|
||||||
|
punpcklwd mm6,mm1 ; mm6=(00 01 02 03 10 11 12 13)
|
||||||
|
punpckhwd mm4,mm1 ; mm4=(04 05 06 07 14 15 16 17)
|
||||||
|
|
||||||
|
movq mm7,mm6 ; transpose coefficients(phase 3)
|
||||||
|
punpckldq mm6,mm4 ; mm6=(00 01 02 03 04 05 06 07)
|
||||||
|
punpckhdq mm7,mm4 ; mm7=(10 11 12 13 14 15 16 17)
|
||||||
|
|
||||||
|
pushpic ebx ; save GOT address
|
||||||
|
|
||||||
|
mov edx, JSAMPROW [edi+0*SIZEOF_JSAMPROW]
|
||||||
|
mov ebx, JSAMPROW [edi+1*SIZEOF_JSAMPROW]
|
||||||
|
movq MMWORD [edx+eax*SIZEOF_JSAMPLE], mm6
|
||||||
|
movq MMWORD [ebx+eax*SIZEOF_JSAMPLE], mm7
|
||||||
|
|
||||||
|
poppic ebx ; restore GOT address
|
||||||
|
|
||||||
|
add esi, byte 2*SIZEOF_FAST_FLOAT ; wsptr
|
||||||
|
add edi, byte 2*SIZEOF_JSAMPROW
|
||||||
|
dec ecx ; ctr
|
||||||
|
jnz near .rowloop
|
||||||
|
|
||||||
|
femms ; empty MMX/3DNow! state
|
||||||
|
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
mov esp,ebp ; esp <- aligned ebp
|
||||||
|
pop esp ; esp <- original ebp
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; JIDCT_FLT_3DNOW_MMX_SUPPORTED
|
||||||
|
%endif ; DCT_FLOAT_SUPPORTED
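
The row pass above converts its float results to integers by adding PD_RNDINT_MAGIC (100663296.0, i.e. 1.5 * 2^26) and keeping the low 16 bits of each dword, which the comments label roundint(data/8). The sketch below reproduces that trick in scalar C under stated assumptions: IEEE-754 binary32 floats, the default round-to-nearest mode, and |x/8| below 32768. The function name `round_div8_magic` is hypothetical. Near the magic value adjacent floats are 8 apart, so the addition itself rounds x to a multiple of 8 and leaves x/8 in the low mantissa bits.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not in the source): returns round(x / 8) using the
 * magic-number addition, assuming IEEE-754 binary32 and round-to-nearest. */
static int round_div8_magic(float x)
{
    /* volatile forces the sum to be rounded to single precision even on
     * compilers that evaluate float expressions in a wider format. */
    volatile float biased = x + 100663296.0f;   /* 1.5 * 2^26 */
    float f = biased;
    uint32_t bits;
    int low16;

    memcpy(&bits, &f, sizeof bits);             /* reinterpret the float bits */
    low16 = (int)(bits & 0xFFFFu);              /* same mask as pand with 0x0000FFFF */
    if (low16 >= 0x8000)
        low16 -= 0x10000;                       /* read the 16 bits as signed */
    return low16;
}
```

In the assembly the masked 16-bit results are then interleaved and packed to bytes before CENTERJSAMPLE is added; the sketch covers only the rounding step. Exact halves (for example x = 20) follow the hardware's round-half-to-even rule.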
|
||||||
473
jidctflt.asm
Normal file
@@ -0,0 +1,473 @@
;
; jidctflt.asm - floating-point IDCT (non-SIMD)
;
; x86 SIMD extension for IJG JPEG library
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler); it
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
;
; This file contains a floating-point implementation of the inverse DCT
; (Discrete Cosine Transform). The following code is based directly on
; the IJG's original jidctflt.c; see jidctflt.c for more details.
;
; Last Modified : October 17, 2004
;
; [TAB8]
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jdct.inc"
|
||||||
|
|
||||||
|
%ifdef DCT_FLOAT_SUPPORTED
|
||||||
|
|
||||||
|
; This module is specialized to the case DCTSIZE = 8.
|
||||||
|
;
|
||||||
|
%if DCTSIZE != 8
|
||||||
|
%error "Sorry, this code only copes with 8x8 DCTs."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_CONST
|
||||||
|
|
||||||
|
%define ROTATOR_TYPE FP32 ; float
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
global EXTN(jconst_idct_float)
|
||||||
|
|
||||||
|
EXTN(jconst_idct_float):
|
||||||
|
|
||||||
|
F_1_414 dd 1.414213562373095048801689 ; 2*cos(PI*1/4)
|
||||||
|
F_1_847 dd 1.847759065022573512256366 ; 2*cos(PI*1/8)
|
||||||
|
F_1_082 dd 1.082392200292393968799446 ; 2*(cos(PI*1/8)-cos(PI*3/8))
|
||||||
|
F_2_613 dd 2.613125929752753055713286 ; 2*(cos(PI*1/8)+cos(PI*3/8))
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Perform dequantization and inverse DCT on one block of coefficients.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_idct_float (j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
|
; JCOEFPTR coef_block,
|
||||||
|
; JSAMPARRAY output_buf, JDIMENSION output_col)
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define coef_block(b) (b)+16 ; JCOEFPTR coef_block
|
||||||
|
%define output_buf(b) (b)+20 ; JSAMPARRAY output_buf
|
||||||
|
%define output_col(b) (b)+24 ; JDIMENSION output_col
|
||||||
|
|
||||||
|
%define tmp ebp-SIZEOF_FP64 ; double tmp
|
||||||
|
%define workspace tmp-DCTSIZE2*SIZEOF_FAST_FLOAT
|
||||||
|
; FAST_FLOAT workspace[DCTSIZE2]
|
||||||
|
%define rndint_magic workspace-SIZEOF_FP32
|
||||||
|
; float rndint_magic = 100663296.0F
|
||||||
|
%define gotptr rndint_magic-SIZEOF_POINTER ; void * gotptr
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_idct_float)
|
||||||
|
|
||||||
|
EXTN(jpeg_idct_float):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
lea esp, [workspace]
|
||||||
|
push FP32 0x4CC00000 ; (float)(0x00C00000 << 3)
|
||||||
|
pushpic eax ; make room for the GOT address
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
movpic POINTER [gotptr], ebx ; save GOT address
|
||||||
|
|
||||||
|
; ---- Pass 1: process columns from input, store into work array.
|
||||||
|
|
||||||
|
mov edx, POINTER [compptr(ebp)]
|
||||||
|
mov edx, POINTER [jcompinfo_dct_table(edx)] ; quantptr
|
||||||
|
mov esi, JCOEFPTR [coef_block(ebp)] ; inptr
|
||||||
|
lea edi, [workspace] ; FAST_FLOAT * wsptr
|
||||||
|
mov ecx, DCTSIZE ; ctr
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
mov ax, JCOEF [COL(1,esi,SIZEOF_JCOEF)]
|
||||||
|
or ax, JCOEF [COL(2,esi,SIZEOF_JCOEF)]
|
||||||
|
jnz short .columnDCT
|
||||||
|
|
||||||
|
mov bx, JCOEF [COL(3,esi,SIZEOF_JCOEF)]
|
||||||
|
mov ax, JCOEF [COL(4,esi,SIZEOF_JCOEF)]
|
||||||
|
or bx, JCOEF [COL(5,esi,SIZEOF_JCOEF)]
|
||||||
|
or ax, JCOEF [COL(6,esi,SIZEOF_JCOEF)]
|
||||||
|
or bx, JCOEF [COL(7,esi,SIZEOF_JCOEF)]
|
||||||
|
or ax,bx
|
||||||
|
jnz short .columnDCT
|
||||||
|
|
||||||
|
; -- AC terms all zero
|
||||||
|
|
||||||
|
fild JCOEF [COL(0,esi,SIZEOF_JCOEF)]
|
||||||
|
fmul FLOAT_MULT_TYPE [COL(0,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
|
||||||
|
fst FAST_FLOAT [COL(0,edi,SIZEOF_FAST_FLOAT)]
|
||||||
|
fst FAST_FLOAT [COL(1,edi,SIZEOF_FAST_FLOAT)]
|
||||||
|
fst FAST_FLOAT [COL(2,edi,SIZEOF_FAST_FLOAT)]
|
||||||
|
fst FAST_FLOAT [COL(3,edi,SIZEOF_FAST_FLOAT)]
|
||||||
|
fst FAST_FLOAT [COL(4,edi,SIZEOF_FAST_FLOAT)]
|
||||||
|
fst FAST_FLOAT [COL(5,edi,SIZEOF_FAST_FLOAT)]
|
||||||
|
fst FAST_FLOAT [COL(6,edi,SIZEOF_FAST_FLOAT)]
|
||||||
|
fstp FAST_FLOAT [COL(7,edi,SIZEOF_FAST_FLOAT)]
|
||||||
|
jmp near .nextcolumn
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.columnDCT:
|
||||||
|
movpic ebx, POINTER [gotptr] ; load GOT address
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
fild JCOEF [COL(2,esi,SIZEOF_JCOEF)]
|
||||||
|
fild JCOEF [COL(6,esi,SIZEOF_JCOEF)]
|
||||||
|
fild JCOEF [COL(4,esi,SIZEOF_JCOEF)]
|
||||||
|
fild JCOEF [COL(0,esi,SIZEOF_JCOEF)]
|
||||||
|
|
||||||
|
fxch st0,st3
|
||||||
|
|
||||||
|
fmul FLOAT_MULT_TYPE [COL(2,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
fxch st0,st2
|
||||||
|
fmul FLOAT_MULT_TYPE [COL(6,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
fxch st0,st1
|
||||||
|
fmul FLOAT_MULT_TYPE [COL(4,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
fxch st0,st3
|
||||||
|
fmul FLOAT_MULT_TYPE [COL(0,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
fxch st0,st1
|
||||||
|
|
||||||
|
fld st2 ; st2 = st2 + st0, st0 = st2 - st0
|
||||||
|
fsub st0,st1
|
||||||
|
fxch st0,st1
|
||||||
|
faddp st3,st0
|
||||||
|
|
||||||
|
fmul ROTATOR_TYPE [GOTOFF(ebx,F_1_414)]
|
||||||
|
|
||||||
|
fld st3 ; st1 = st1 + st3, st3 = st1 - st3
|
||||||
|
fsubr st0,st2
|
||||||
|
fxch st0,st4
|
||||||
|
faddp st2,st0
|
||||||
|
|
||||||
|
fsub st0,st2
|
||||||
|
|
||||||
|
fld st1 ; st2 = st1 + st2, st1 = st1 - st2
|
||||||
|
fsub st0,st3
|
||||||
|
fxch st0,st2
|
||||||
|
faddp st3,st0
|
||||||
|
fld st3 ; st0 = st3 + st0, st3 = st3 - st0
|
||||||
|
fsub st0,st1
|
||||||
|
fxch st0,st4
|
||||||
|
faddp st1,st0
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
fild JCOEF [COL(1,esi,SIZEOF_JCOEF)]
|
||||||
|
fild JCOEF [COL(7,esi,SIZEOF_JCOEF)]
|
||||||
|
fild JCOEF [COL(3,esi,SIZEOF_JCOEF)]
|
||||||
|
fild JCOEF [COL(5,esi,SIZEOF_JCOEF)]
|
||||||
|
|
||||||
|
fxch st0,st3
|
||||||
|
|
||||||
|
fmul FLOAT_MULT_TYPE [COL(1,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
fxch st0,st2
|
||||||
|
fmul FLOAT_MULT_TYPE [COL(7,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
fxch st0,st1
|
||||||
|
fmul FLOAT_MULT_TYPE [COL(3,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
fxch st0,st6
|
||||||
|
fxch st3,st0
|
||||||
|
fmul FLOAT_MULT_TYPE [COL(5,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
fxch st0,st5
|
||||||
|
fstp FP64 [tmp]
|
||||||
|
|
||||||
|
fld st1 ; st1 = st1 + st0, st0 = st1 - st0
|
||||||
|
fsub st0,st1
|
||||||
|
fxch st0,st1
|
||||||
|
faddp st2,st0
|
||||||
|
fld st5 ; st4 = st4 + st5, st5 = st4 - st5
|
||||||
|
fsubr st0,st5
|
||||||
|
fxch st0,st6
|
||||||
|
faddp st5,st0
|
||||||
|
|
||||||
|
fld st1 ; st1 = st1 + st4, st4 = st1 - st4
|
||||||
|
fsub st0,st5
|
||||||
|
fxch st0,st5
|
||||||
|
faddp st2,st0
|
||||||
|
|
||||||
|
fld st5
|
||||||
|
fadd st0,st1
|
||||||
|
fxch st0,st5
|
||||||
|
fmul ROTATOR_TYPE [GOTOFF(ebx,F_1_414)]
|
||||||
|
fxch st0,st5
|
||||||
|
fmul ROTATOR_TYPE [GOTOFF(ebx,F_1_847)]
|
||||||
|
fxch st0,st6
|
||||||
|
fmul ROTATOR_TYPE [GOTOFF(ebx,F_2_613)]
|
||||||
|
fxch st0,st1
|
||||||
|
fmul ROTATOR_TYPE [GOTOFF(ebx,F_1_082)]
|
||||||
|
fxch st0,st6
|
||||||
|
fsubr st1,st0
|
||||||
|
fsubp st6,st0
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
fsub st0,st1
|
||||||
|
fld st2 ; st1 = st2 + st1, st2 = st2 - st1
|
||||||
|
fsub st0,st2
|
||||||
|
fxch st0,st3
|
||||||
|
faddp st2,st0
|
||||||
|
fsub st4,st0
|
||||||
|
fld st3 ; st0 = st3 + st0, st3 = st3 - st0
|
||||||
|
fsub st0,st1
|
||||||
|
fxch st0,st4
|
||||||
|
faddp st1,st0
|
||||||
|
|
||||||
|
fxch st0,st2
|
||||||
|
|
||||||
|
fstp FAST_FLOAT [COL(7,edi,SIZEOF_FAST_FLOAT)]
|
||||||
|
fstp FAST_FLOAT [COL(0,edi,SIZEOF_FAST_FLOAT)]
|
||||||
|
fstp FAST_FLOAT [COL(1,edi,SIZEOF_FAST_FLOAT)]
|
||||||
|
fstp FAST_FLOAT [COL(6,edi,SIZEOF_FAST_FLOAT)]
|
||||||
|
|
||||||
|
fadd st1,st0
|
||||||
|
fld FP64 [tmp]
|
||||||
|
fld st1 ; st3 = st3 + st1, st1 = st3 - st1
|
||||||
|
fsubr st0,st4
|
||||||
|
fxch st0,st2
|
||||||
|
faddp st4,st0
|
||||||
|
fld st0 ; st0 = st0 + st2, st2 = st0 - st2
|
||||||
|
fsub st0,st3
|
||||||
|
fxch st0,st3
|
||||||
|
faddp st1,st0
|
||||||
|
|
||||||
|
fxch st0,st3
|
||||||
|
|
||||||
|
fstp FAST_FLOAT [COL(2,edi,SIZEOF_FAST_FLOAT)]
|
||||||
|
fstp FAST_FLOAT [COL(5,edi,SIZEOF_FAST_FLOAT)]
|
||||||
|
fstp FAST_FLOAT [COL(3,edi,SIZEOF_FAST_FLOAT)]
|
||||||
|
fstp FAST_FLOAT [COL(4,edi,SIZEOF_FAST_FLOAT)]
|
||||||
|
|
||||||
|
.nextcolumn:
|
||||||
|
add esi, byte SIZEOF_JCOEF ; advance pointers to next column
|
||||||
|
add edx, byte SIZEOF_FLOAT_MULT_TYPE
|
||||||
|
add edi, byte SIZEOF_FAST_FLOAT
|
||||||
|
dec ecx
|
||||||
|
jnz near .columnloop
|
||||||
|
|
||||||
|
; ---- Pass 2: process rows from work array, store into output array.
|
||||||
|
|
||||||
|
mov edx, POINTER [cinfo(ebp)]
|
||||||
|
mov edx, POINTER [jdstruct_sample_range_limit(edx)]
|
||||||
|
sub edx, byte -CENTERJSAMPLE*SIZEOF_JSAMPLE ; JSAMPLE * range_limit
|
||||||
|
|
||||||
|
lea esi, [workspace] ; FAST_FLOAT * wsptr
|
||||||
|
mov edi, JSAMPARRAY [output_buf(ebp)] ; (JSAMPROW *)
|
||||||
|
mov ecx, DCTSIZE ; ctr
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
push edi
|
||||||
|
mov edi, JSAMPROW [edi] ; (JSAMPLE *)
|
||||||
|
add edi, JDIMENSION [output_col(ebp)] ; edi=outptr
|
||||||
|
|
||||||
|
%ifndef NO_ZERO_ROW_TEST_FLOAT
|
||||||
|
mov eax, FAST_FLOAT [ROW(1,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
add eax,eax ; shl eax,1 (shift out the sign bit)
|
||||||
|
jnz short .rowDCT
|
||||||
|
|
||||||
|
mov eax, FAST_FLOAT [ROW(2,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
mov ebx, FAST_FLOAT [ROW(3,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
or eax, FAST_FLOAT [ROW(4,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
or ebx, FAST_FLOAT [ROW(5,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
or eax, FAST_FLOAT [ROW(6,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
or ebx, FAST_FLOAT [ROW(7,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
or eax,ebx
|
||||||
|
add eax,eax ; shl eax,1 (shift out the sign bit)
|
||||||
|
jnz short .rowDCT
|
||||||
|
|
||||||
|
; -- AC terms all zero
|
||||||
|
|
||||||
|
push eax
|
||||||
|
|
||||||
|
fld FAST_FLOAT [ROW(0,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
fadd FP32 [rndint_magic]
|
||||||
|
fstp FP32 [esp]
|
||||||
|
|
||||||
|
pop eax
|
||||||
|
and eax,RANGE_MASK
|
||||||
|
mov al, JSAMPLE [edx+eax*SIZEOF_JSAMPLE]
|
||||||
|
mov JSAMPLE [edi+0*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+1*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+2*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+3*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+4*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+5*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+6*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+7*SIZEOF_JSAMPLE], al
|
||||||
|
jmp near .nextrow
|
||||||
|
alignx 16,7
|
||||||
|
%endif
|
||||||
|
.rowDCT:
|
||||||
|
movpic ebx, POINTER [gotptr] ; load GOT address
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
fld FAST_FLOAT [ROW(4,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
fld FAST_FLOAT [ROW(2,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
fld FAST_FLOAT [ROW(0,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
fld FAST_FLOAT [ROW(6,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
|
||||||
|
fld st2 ; st2 = st2 + st0, st0 = st2 - st0
|
||||||
|
fsub st0,st1
|
||||||
|
fxch st0,st1
|
||||||
|
faddp st3,st0
|
||||||
|
|
||||||
|
fmul ROTATOR_TYPE [GOTOFF(ebx,F_1_414)]
|
||||||
|
|
||||||
|
fld st3 ; st1 = st1 + st3, st3 = st1 - st3
|
||||||
|
fsubr st0,st2
|
||||||
|
fxch st0,st4
|
||||||
|
faddp st2,st0
|
||||||
|
|
||||||
|
fsub st0,st2
|
||||||
|
|
||||||
|
fld st1 ; st2 = st1 + st2, st1 = st1 - st2
|
||||||
|
fsub st0,st3
|
||||||
|
fxch st0,st2
|
||||||
|
faddp st3,st0
|
||||||
|
fld st3 ; st0 = st3 + st0, st3 = st3 - st0
|
||||||
|
fsub st0,st1
|
||||||
|
fxch st0,st4
|
||||||
|
faddp st1,st0
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
fld FAST_FLOAT [ROW(3,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
fxch st0,st3
|
||||||
|
fld FAST_FLOAT [ROW(1,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
fld FAST_FLOAT [ROW(7,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
fld FAST_FLOAT [ROW(5,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
fxch st0,st5
|
||||||
|
fstp FP64 [tmp]
|
||||||
|
|
||||||
|
fld st1 ; st1 = st1 + st0, st0 = st1 - st0
|
||||||
|
fsub st0,st1
|
||||||
|
fxch st0,st1
|
||||||
|
faddp st2,st0
|
||||||
|
fld st5 ; st4 = st4 + st5, st5 = st4 - st5
|
||||||
|
fsubr st0,st5
|
||||||
|
fxch st0,st6
|
||||||
|
faddp st5,st0
|
||||||
|
|
||||||
|
fld st1 ; st1 = st1 + st4, st4 = st1 - st4
|
||||||
|
fsub st0,st5
|
||||||
|
fxch st0,st5
|
||||||
|
faddp st2,st0
|
||||||
|
|
||||||
|
fld st5
|
||||||
|
fadd st0,st1
|
||||||
|
fxch st0,st5
|
||||||
|
fmul ROTATOR_TYPE [GOTOFF(ebx,F_1_414)]
|
||||||
|
fxch st0,st5
|
||||||
|
fmul ROTATOR_TYPE [GOTOFF(ebx,F_1_847)]
|
||||||
|
fxch st0,st6
|
||||||
|
fmul ROTATOR_TYPE [GOTOFF(ebx,F_2_613)]
|
||||||
|
fxch st0,st1
|
||||||
|
fmul ROTATOR_TYPE [GOTOFF(ebx,F_1_082)]
|
||||||
|
fxch st0,st6
|
||||||
|
fsubr st1,st0
|
||||||
|
fsubp st6,st0
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
sub esp, byte DCTSIZE*SIZEOF_FP32
|
||||||
|
|
||||||
|
fsub st0,st1
|
||||||
|
fld st2 ; st1 = st2 + st1, st2 = st2 - st1
|
||||||
|
fsub st0,st2
|
||||||
|
fxch st0,st3
|
||||||
|
faddp st2,st0
|
||||||
|
fsub st4,st0
|
||||||
|
fld st3 ; st0 = st3 + st0, st3 = st3 - st0
|
||||||
|
fsub st0,st1
|
||||||
|
fxch st0,st4
|
||||||
|
faddp st1,st0
|
||||||
|
|
||||||
|
fld FP32 [rndint_magic]
|
||||||
|
|
||||||
|
fadd st4,st0
|
||||||
|
fadd st1,st0
|
||||||
|
fadd st2,st0
|
||||||
|
fadd st3,st0
|
||||||
|
|
||||||
|
fxch st0,st4
|
||||||
|
|
||||||
|
fstp FP32 [esp+6*SIZEOF_FP32]
|
||||||
|
fstp FP32 [esp+1*SIZEOF_FP32]
|
||||||
|
fstp FP32 [esp+0*SIZEOF_FP32]
|
||||||
|
fstp FP32 [esp+7*SIZEOF_FP32]
|
||||||
|
|
||||||
|
fxch st0,st1
|
||||||
|
|
||||||
|
fadd st2,st0
|
||||||
|
fld FP64 [tmp]
|
||||||
|
fld st1 ; st4 = st4 + st1, st1 = st4 - st1
|
||||||
|
fsubr st0,st5
|
||||||
|
fxch st0,st2
|
||||||
|
faddp st5,st0
|
||||||
|
fld st0 ; st0 = st0 + st3, st3 = st0 - st3
|
||||||
|
fsub st0,st4
|
||||||
|
fxch st0,st4
|
||||||
|
faddp st1,st0
|
||||||
|
|
||||||
|
fxch st0,st2
|
||||||
|
|
||||||
|
fadd st1,st0
|
||||||
|
fadd st2,st0
|
||||||
|
fadd st3,st0
|
||||||
|
faddp st4,st0
|
||||||
|
|
||||||
|
fstp FP32 [esp+5*SIZEOF_FP32]
|
||||||
|
fstp FP32 [esp+4*SIZEOF_FP32]
|
||||||
|
fstp FP32 [esp+3*SIZEOF_FP32]
|
||||||
|
fstp FP32 [esp+2*SIZEOF_FP32]
|
||||||
|
|
||||||
|
%assign i 0 ; i=0;
|
||||||
|
%rep 4 ; -- repeat 4 times ---
|
||||||
|
pop eax
|
||||||
|
pop ebx
|
||||||
|
and eax,RANGE_MASK
|
||||||
|
and ebx,RANGE_MASK
|
||||||
|
mov al, JSAMPLE [edx+eax*SIZEOF_JSAMPLE]
|
||||||
|
mov bl, JSAMPLE [edx+ebx*SIZEOF_JSAMPLE]
|
||||||
|
mov JSAMPLE [edi+(i+0)*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+(i+1)*SIZEOF_JSAMPLE], bl
|
||||||
|
%assign i i+2 ; i+=2;
|
||||||
|
%endrep ; -- repeat end ---
|
||||||
|
|
||||||
|
.nextrow:
|
||||||
|
pop edi
|
||||||
|
add esi, byte DCTSIZE*SIZEOF_FAST_FLOAT
|
||||||
|
add edi, byte SIZEOF_JSAMPROW ; advance pointer to next row
|
||||||
|
dec ecx
|
||||||
|
jnz near .rowloop
|
||||||
|
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
mov esp,ebp
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; DCT_FLOAT_SUPPORTED
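
The zero-row path and the final output stage of jpeg_idct_float above convert each result to an integer by adding rndint_magic (100663296.0F, that is 1.5 * 2^26), storing the sum as a 32-bit float, and masking the raw bits with RANGE_MASK. At that magnitude one float ulp equals 8, so the store itself rounds the value to a multiple of 8 and the low mantissa bits end up holding round(x/8). The following is only a C sketch of that idea, not code from this commit: the function name is hypothetical, and RANGE_MASK = 1023 is an assumption (the real value comes from jdct.inc).

```c
#include <stdint.h>
#include <string.h>

/* Assumed value for 8-bit samples; the real definition lives in jdct.inc. */
#define RANGE_MASK 1023

/* Hypothetical helper: returns round(x / 8) masked with RANGE_MASK,
 * using the "add a magic constant and read the mantissa" trick. */
static int descale_by_magic(float x)
{
    /* 100663296.0f == 1.5 * 2^26; floats in [2^26, 2^27) are spaced 8 apart. */
    volatile float biased = x + 100663296.0f; /* volatile forces rounding to float */
    float stored = biased;
    uint32_t bits;

    memcpy(&bits, &stored, sizeof bits);      /* reinterpret the IEEE-754 bit pattern */
    return (int)(bits & RANGE_MASK);          /* low mantissa bits hold round(x / 8) */
}
```

For example, descale_by_magic(800.0f) yields 100, and descale_by_magic(-8.0f) yields 1023, which is -1 reduced by the mask. The sketch only holds while the biased sum stays within [2^26, 2^27).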
|
||||||
464
jidctfst.asm
Normal file
@@ -0,0 +1,464 @@
;
; jidctfst.asm - fast integer IDCT (non-SIMD)
;
; x86 SIMD extension for IJG JPEG library
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler); it
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
;
; This file contains a fast, not so accurate integer implementation of
; the inverse DCT (Discrete Cosine Transform). The following code is
; based directly on the IJG's original jidctfst.c; see jidctfst.c
; for more details.
;
; Last Modified : October 17, 2004
;
; [TAB8]
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jdct.inc"
|
||||||
|
|
||||||
|
%ifdef DCT_IFAST_SUPPORTED
|
||||||
|
|
||||||
|
; This module is specialized to the case DCTSIZE = 8.
|
||||||
|
;
|
||||||
|
%if DCTSIZE != 8
|
||||||
|
%error "Sorry, this code only copes with 8x8 DCTs."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
|
||||||
|
; We can gain a little more speed, with a further compromise in accuracy,
|
||||||
|
; by omitting the addition in a descaling shift. This yields an
|
||||||
|
; incorrectly rounded result half the time...
|
||||||
|
;
|
||||||
|
%macro	descale 2
%ifdef USE_ACCURATE_ROUNDING
%if (%2)<=7
	add	%1, byte (1<<((%2)-1))	; add reg32,imm8
%else
	add	%1, (1<<((%2)-1))	; add reg32,imm32
%endif
%endif
	sar	%1,%2
%endmacro
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
|
||||||
|
%define CONST_BITS 8
|
||||||
|
%define PASS1_BITS 2
|
||||||
|
|
||||||
|
%if IFAST_SCALE_BITS != PASS1_BITS
|
||||||
|
%error "'IFAST_SCALE_BITS' must be equal to 'PASS1_BITS'."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
%if CONST_BITS == 8
|
||||||
|
F_1_082 equ 277 ; FIX(1.082392200)
|
||||||
|
F_1_414 equ 362 ; FIX(1.414213562)
|
||||||
|
F_1_847 equ 473 ; FIX(1.847759065)
|
||||||
|
F_2_613 equ 669 ; FIX(2.613125930)
|
||||||
|
%else
|
||||||
|
; NASM cannot do compile-time arithmetic on floating-point constants.
|
||||||
|
%define DESCALE(x,n) (((x)+(1<<((n)-1)))>>(n))
|
||||||
|
F_1_082 equ DESCALE(1162209775,30-CONST_BITS) ; FIX(1.082392200)
|
||||||
|
F_1_414 equ DESCALE(1518500249,30-CONST_BITS) ; FIX(1.414213562)
|
||||||
|
F_1_847 equ DESCALE(1984016188,30-CONST_BITS) ; FIX(1.847759065)
|
||||||
|
F_2_613 equ DESCALE(2805822602,30-CONST_BITS) ; FIX(2.613125930)
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Perform dequantization and inverse DCT on one block of coefficients.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_idct_ifast (j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
|
; JCOEFPTR coef_block,
|
||||||
|
; JSAMPARRAY output_buf, JDIMENSION output_col)
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define coef_block(b) (b)+16 ; JCOEFPTR coef_block
|
||||||
|
%define output_buf(b) (b)+20 ; JSAMPARRAY output_buf
|
||||||
|
%define output_col(b) (b)+24 ; JDIMENSION output_col
|
||||||
|
|
||||||
|
%define range_limit ebp-SIZEOF_POINTER ; JSAMPLE * range_limit
|
||||||
|
%define ptr range_limit-SIZEOF_POINTER ; void * ptr
|
||||||
|
%define workspace ptr-DCTSIZE2*SIZEOF_INT
|
||||||
|
; int workspace[DCTSIZE2]
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_idct_ifast)
|
||||||
|
|
||||||
|
EXTN(jpeg_idct_ifast):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
lea esp, [workspace]
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
; ---- Pass 1: process columns from input, store into work array.
|
||||||
|
|
||||||
|
mov edx, POINTER [compptr(ebp)]
|
||||||
|
mov edx, POINTER [jcompinfo_dct_table(edx)] ; quantptr
|
||||||
|
mov esi, JCOEFPTR [coef_block(ebp)] ; inptr
|
||||||
|
lea edi, [workspace] ; int * wsptr
|
||||||
|
mov ecx, DCTSIZE ; ctr
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
mov ax, JCOEF [COL(1,esi,SIZEOF_JCOEF)]
|
||||||
|
or ax, JCOEF [COL(2,esi,SIZEOF_JCOEF)]
|
||||||
|
jnz short .columnDCT
|
||||||
|
|
||||||
|
mov bx, JCOEF [COL(3,esi,SIZEOF_JCOEF)]
|
||||||
|
mov ax, JCOEF [COL(4,esi,SIZEOF_JCOEF)]
|
||||||
|
or bx, JCOEF [COL(5,esi,SIZEOF_JCOEF)]
|
||||||
|
or ax, JCOEF [COL(6,esi,SIZEOF_JCOEF)]
|
||||||
|
or bx, JCOEF [COL(7,esi,SIZEOF_JCOEF)]
|
||||||
|
or ax,bx
|
||||||
|
jnz short .columnDCT
|
||||||
|
|
||||||
|
; -- AC terms all zero
|
||||||
|
|
||||||
|
mov ax, JCOEF [COL(0,esi,SIZEOF_JCOEF)]
|
||||||
|
imul ax, IFAST_MULT_TYPE [COL(0,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
cwde
|
||||||
|
|
||||||
|
mov INT [COL(0,edi,SIZEOF_INT)], eax
|
||||||
|
mov INT [COL(1,edi,SIZEOF_INT)], eax
|
||||||
|
mov INT [COL(2,edi,SIZEOF_INT)], eax
|
||||||
|
mov INT [COL(3,edi,SIZEOF_INT)], eax
|
||||||
|
mov INT [COL(4,edi,SIZEOF_INT)], eax
|
||||||
|
mov INT [COL(5,edi,SIZEOF_INT)], eax
|
||||||
|
mov INT [COL(6,edi,SIZEOF_INT)], eax
|
||||||
|
mov INT [COL(7,edi,SIZEOF_INT)], eax
|
||||||
|
jmp near .nextcolumn
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.columnDCT:
|
||||||
|
push ecx ; ctr
|
||||||
|
push esi ; coef_block
|
||||||
|
push edx ; quantptr
|
||||||
|
|
||||||
|
mov POINTER [ptr], edi ; wsptr
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movsx eax, JCOEF [COL(0,esi,SIZEOF_JCOEF)]
|
||||||
|
movsx ecx, JCOEF [COL(4,esi,SIZEOF_JCOEF)]
|
||||||
|
imul ax, IFAST_MULT_TYPE [COL(0,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
imul cx, IFAST_MULT_TYPE [COL(4,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
movsx ebx, JCOEF [COL(2,esi,SIZEOF_JCOEF)]
|
||||||
|
movsx edi, JCOEF [COL(6,esi,SIZEOF_JCOEF)]
|
||||||
|
imul bx, IFAST_MULT_TYPE [COL(2,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
imul di, IFAST_MULT_TYPE [COL(6,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
|
||||||
|
lea edx,[eax+ecx] ; edx=tmp10
|
||||||
|
sub eax,ecx ; eax=tmp11
|
||||||
|
|
||||||
|
lea ecx,[ebx+edi] ; ecx=tmp13
|
||||||
|
sub ebx,edi
|
||||||
|
imul ebx,(F_1_414)
|
||||||
|
descale ebx,CONST_BITS
|
||||||
|
sub ebx,ecx ; ebx=tmp12
|
||||||
|
|
||||||
|
lea edi,[edx+ecx] ; edi=tmp0
|
||||||
|
sub edx,ecx ; edx=tmp3
|
||||||
|
lea ecx,[eax+ebx] ; ecx=tmp1
|
||||||
|
sub eax,ebx ; eax=tmp2
|
||||||
|
|
||||||
|
push edx ; tmp3
|
||||||
|
push eax ; tmp2
|
||||||
|
push ecx ; tmp1
|
||||||
|
push edi ; tmp0
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
mov edx, POINTER [esp+16] ; quantptr
|
||||||
|
|
||||||
|
movsx eax, JCOEF [COL(1,esi,SIZEOF_JCOEF)]
|
||||||
|
movsx ebx, JCOEF [COL(7,esi,SIZEOF_JCOEF)]
|
||||||
|
imul ax, IFAST_MULT_TYPE [COL(1,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
imul bx, IFAST_MULT_TYPE [COL(7,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
movsx edi, JCOEF [COL(5,esi,SIZEOF_JCOEF)]
|
||||||
|
movsx ecx, JCOEF [COL(3,esi,SIZEOF_JCOEF)]
|
||||||
|
imul di, IFAST_MULT_TYPE [COL(5,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
imul cx, IFAST_MULT_TYPE [COL(3,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
|
||||||
|
lea esi,[eax+ebx] ; esi=z11
|
||||||
|
sub eax,ebx ; eax=z12
|
||||||
|
lea edx,[edi+ecx] ; edx=z13
|
||||||
|
sub edi,ecx ; edi=z10
|
||||||
|
|
||||||
|
lea ebx,[esi+edx] ; ebx=tmp7
|
||||||
|
sub esi,edx
|
||||||
|
imul esi,(F_1_414) ; esi=tmp11
|
||||||
|
descale esi,CONST_BITS
|
||||||
|
|
||||||
|
lea ecx,[edi+eax]
|
||||||
|
imul ecx,(F_1_847) ; ecx=z5
|
||||||
|
imul edi,(-F_2_613) ; edi=MULTIPLY(z10,-FIX_2_613125930)
|
||||||
|
imul eax,(F_1_082) ; eax=MULTIPLY(z12,FIX_1_082392200)
|
||||||
|
descale ecx,CONST_BITS
|
||||||
|
descale edi,CONST_BITS
|
||||||
|
descale eax,CONST_BITS
|
||||||
|
add edi,ecx ; edi=tmp12
|
||||||
|
sub eax,ecx ; eax=tmp10
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
sub edi,ebx ; edi=tmp6
|
||||||
|
pop edx ; edx=tmp0
|
||||||
|
sub esi,edi ; esi=tmp5
|
||||||
|
pop ecx ; ecx=tmp1
|
||||||
|
add eax,esi ; eax=tmp4
|
||||||
|
push esi ; tmp5
|
||||||
|
push eax ; tmp4
|
||||||
|
|
||||||
|
lea eax,[edx+ebx] ; eax=data0(=tmp0+tmp7)
|
||||||
|
sub edx,ebx ; edx=data7(=tmp0-tmp7)
|
||||||
|
lea ebx,[ecx+edi] ; ebx=data1(=tmp1+tmp6)
|
||||||
|
sub ecx,edi ; ecx=data6(=tmp1-tmp6)
|
||||||
|
|
||||||
|
mov edi, POINTER [ptr] ; edi=wsptr
|
||||||
|
|
||||||
|
mov INT [COL(0,edi,SIZEOF_INT)], eax
|
||||||
|
mov INT [COL(7,edi,SIZEOF_INT)], edx
|
||||||
|
mov INT [COL(1,edi,SIZEOF_INT)], ebx
|
||||||
|
mov INT [COL(6,edi,SIZEOF_INT)], ecx
|
||||||
|
|
||||||
|
pop esi ; esi=tmp4
|
||||||
|
pop eax ; eax=tmp5
|
||||||
|
pop edx ; edx=tmp2
|
||||||
|
pop ecx ; ecx=tmp3
|
||||||
|
|
||||||
|
lea ebx,[edx+eax] ; ebx=data2(=tmp2+tmp5)
|
||||||
|
sub edx,eax ; edx=data5(=tmp2-tmp5)
|
||||||
|
lea eax,[ecx+esi] ; eax=data4(=tmp3+tmp4)
|
||||||
|
sub ecx,esi ; ecx=data3(=tmp3-tmp4)
|
||||||
|
|
||||||
|
mov INT [COL(2,edi,SIZEOF_INT)], ebx
|
||||||
|
mov INT [COL(5,edi,SIZEOF_INT)], edx
|
||||||
|
mov INT [COL(4,edi,SIZEOF_INT)], eax
|
||||||
|
mov INT [COL(3,edi,SIZEOF_INT)], ecx
|
||||||
|
|
||||||
|
pop edx ; quantptr
|
||||||
|
pop esi ; coef_block
|
||||||
|
pop ecx ; ctr
|
||||||
|
|
||||||
|
.nextcolumn:
|
||||||
|
add esi, byte SIZEOF_JCOEF ; advance pointers to next column
|
||||||
|
add edx, byte SIZEOF_IFAST_MULT_TYPE
|
||||||
|
add edi, byte SIZEOF_INT
|
||||||
|
dec ecx
|
||||||
|
jnz near .columnloop
|
||||||
|
|
||||||
|
; ---- Pass 2: process rows from work array, store into output array.
|
||||||
|
|
||||||
|
mov eax, POINTER [cinfo(ebp)]
|
||||||
|
mov eax, POINTER [jdstruct_sample_range_limit(eax)]
|
||||||
|
sub eax, byte -CENTERJSAMPLE*SIZEOF_JSAMPLE ; JSAMPLE * range_limit
|
||||||
|
mov POINTER [range_limit], eax
|
||||||
|
|
||||||
|
lea esi, [workspace] ; int * wsptr
|
||||||
|
mov edi, JSAMPARRAY [output_buf(ebp)] ; (JSAMPROW *)
|
||||||
|
mov ecx, DCTSIZE ; ctr
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
push edi
|
||||||
|
mov edi, JSAMPROW [edi] ; (JSAMPLE *)
|
||||||
|
add edi, JDIMENSION [output_col(ebp)] ; edi=outptr
|
||||||
|
|
||||||
|
%ifndef NO_ZERO_ROW_TEST
|
||||||
|
mov eax, INT [ROW(1,esi,SIZEOF_INT)]
|
||||||
|
or eax, INT [ROW(2,esi,SIZEOF_INT)]
|
||||||
|
jnz short .rowDCT
|
||||||
|
|
||||||
|
mov ebx, INT [ROW(3,esi,SIZEOF_INT)]
|
||||||
|
mov eax, INT [ROW(4,esi,SIZEOF_INT)]
|
||||||
|
or ebx, INT [ROW(5,esi,SIZEOF_INT)]
|
||||||
|
or eax, INT [ROW(6,esi,SIZEOF_INT)]
|
||||||
|
or ebx, INT [ROW(7,esi,SIZEOF_INT)]
|
||||||
|
or eax,ebx
|
||||||
|
jnz short .rowDCT
|
||||||
|
|
||||||
|
; -- AC terms all zero
|
||||||
|
|
||||||
|
mov eax, INT [ROW(0,esi,SIZEOF_INT)]
|
||||||
|
|
||||||
|
mov edx, POINTER [range_limit] ; (JSAMPLE *)
|
||||||
|
|
||||||
|
descale eax,(PASS1_BITS+3)
|
||||||
|
and eax,RANGE_MASK
|
||||||
|
mov al, JSAMPLE [edx+eax*SIZEOF_JSAMPLE]
|
||||||
|
mov JSAMPLE [edi+0*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+1*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+2*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+3*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+4*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+5*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+6*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+7*SIZEOF_JSAMPLE], al
|
||||||
|
jmp near .nextrow
|
||||||
|
alignx 16,7
|
||||||
|
%endif
|
||||||
|
.rowDCT:
|
||||||
|
push esi ; wsptr
|
||||||
|
push ecx ; ctr
|
||||||
|
|
||||||
|
mov POINTER [ptr], edi ; outptr
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
mov eax, INT [ROW(0,esi,SIZEOF_INT)]
|
||||||
|
mov ebx, INT [ROW(2,esi,SIZEOF_INT)]
|
||||||
|
mov ecx, INT [ROW(4,esi,SIZEOF_INT)]
|
||||||
|
mov edi, INT [ROW(6,esi,SIZEOF_INT)]
|
||||||
|
|
||||||
|
lea edx,[eax+ecx] ; edx=tmp10
|
||||||
|
sub eax,ecx ; eax=tmp11
|
||||||
|
|
||||||
|
lea ecx,[ebx+edi] ; ecx=tmp13
|
||||||
|
sub ebx,edi
|
||||||
|
imul ebx,(F_1_414)
|
||||||
|
descale ebx,CONST_BITS
|
||||||
|
sub ebx,ecx ; ebx=tmp12
|
||||||
|
|
||||||
|
lea edi,[edx+ecx] ; edi=tmp0
|
||||||
|
sub edx,ecx ; edx=tmp3
|
||||||
|
lea ecx,[eax+ebx] ; ecx=tmp1
|
||||||
|
sub eax,ebx ; eax=tmp2
|
||||||
|
|
||||||
|
push edx ; tmp3
|
||||||
|
push eax ; tmp2
|
||||||
|
push ecx ; tmp1
|
||||||
|
push edi ; tmp0
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
mov eax, INT [ROW(1,esi,SIZEOF_INT)]
|
||||||
|
mov ecx, INT [ROW(3,esi,SIZEOF_INT)]
|
||||||
|
mov edi, INT [ROW(5,esi,SIZEOF_INT)]
|
||||||
|
mov ebx, INT [ROW(7,esi,SIZEOF_INT)]
|
||||||
|
|
||||||
|
lea esi,[eax+ebx] ; esi=z11
|
||||||
|
sub eax,ebx ; eax=z12
|
||||||
|
lea edx,[edi+ecx] ; edx=z13
|
||||||
|
sub edi,ecx ; edi=z10
|
||||||
|
|
||||||
|
lea ebx,[esi+edx] ; ebx=tmp7
|
||||||
|
sub esi,edx
|
||||||
|
imul esi,(F_1_414) ; esi=tmp11
|
||||||
|
descale esi,CONST_BITS
|
||||||
|
|
||||||
|
lea ecx,[edi+eax]
|
||||||
|
imul ecx,(F_1_847) ; ecx=z5
|
||||||
|
imul edi,(-F_2_613) ; edi=MULTIPLY(z10,-FIX_2_613125930)
|
||||||
|
imul eax,(F_1_082) ; eax=MULTIPLY(z12,FIX_1_082392200)
|
||||||
|
descale ecx,CONST_BITS
|
||||||
|
descale edi,CONST_BITS
|
||||||
|
descale eax,CONST_BITS
|
||||||
|
add edi,ecx ; edi=tmp12
|
||||||
|
sub eax,ecx ; eax=tmp10
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
sub edi,ebx ; edi=tmp6
|
||||||
|
pop edx ; edx=tmp0
|
||||||
|
sub esi,edi ; esi=tmp5
|
||||||
|
pop ecx ; ecx=tmp1
|
||||||
|
add eax,esi ; eax=tmp4
|
||||||
|
push esi ; tmp5
|
||||||
|
push eax ; tmp4
|
||||||
|
|
||||||
|
lea eax,[edx+ebx] ; eax=data0(=tmp0+tmp7)
|
||||||
|
sub edx,ebx ; edx=data7(=tmp0-tmp7)
|
||||||
|
lea ebx,[ecx+edi] ; ebx=data1(=tmp1+tmp6)
|
||||||
|
sub ecx,edi ; ecx=data6(=tmp1-tmp6)
|
||||||
|
|
||||||
|
mov esi, POINTER [range_limit] ; (JSAMPLE *)
|
||||||
|
|
||||||
|
descale eax,(PASS1_BITS+3)
|
||||||
|
descale edx,(PASS1_BITS+3)
|
||||||
|
descale ebx,(PASS1_BITS+3)
|
||||||
|
descale ecx,(PASS1_BITS+3)
|
||||||
|
|
||||||
|
mov edi, POINTER [ptr] ; edi=outptr
|
||||||
|
|
||||||
|
and eax,RANGE_MASK
|
||||||
|
and edx,RANGE_MASK
|
||||||
|
and ebx,RANGE_MASK
|
||||||
|
and ecx,RANGE_MASK
|
||||||
|
|
||||||
|
mov al, JSAMPLE [esi+eax*SIZEOF_JSAMPLE]
|
||||||
|
mov dl, JSAMPLE [esi+edx*SIZEOF_JSAMPLE]
|
||||||
|
mov bl, JSAMPLE [esi+ebx*SIZEOF_JSAMPLE]
|
||||||
|
mov cl, JSAMPLE [esi+ecx*SIZEOF_JSAMPLE]
|
||||||
|
|
||||||
|
mov JSAMPLE [edi+0*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+7*SIZEOF_JSAMPLE], dl
|
||||||
|
mov JSAMPLE [edi+1*SIZEOF_JSAMPLE], bl
|
||||||
|
mov JSAMPLE [edi+6*SIZEOF_JSAMPLE], cl
|
||||||
|
|
||||||
|
pop esi ; esi=tmp4
|
||||||
|
pop eax ; eax=tmp5
|
||||||
|
pop edx ; edx=tmp2
|
||||||
|
pop ecx ; ecx=tmp3
|
||||||
|
|
||||||
|
lea ebx,[edx+eax] ; ebx=data2(=tmp2+tmp5)
|
||||||
|
sub edx,eax ; edx=data5(=tmp2-tmp5)
|
||||||
|
lea eax,[ecx+esi] ; eax=data4(=tmp3+tmp4)
|
||||||
|
sub ecx,esi ; ecx=data3(=tmp3-tmp4)
|
||||||
|
|
||||||
|
mov esi, POINTER [range_limit] ; (JSAMPLE *)
|
||||||
|
|
||||||
|
descale ebx,(PASS1_BITS+3)
|
||||||
|
descale edx,(PASS1_BITS+3)
|
||||||
|
descale eax,(PASS1_BITS+3)
|
||||||
|
descale ecx,(PASS1_BITS+3)
|
||||||
|
|
||||||
|
and ebx,RANGE_MASK
|
||||||
|
and edx,RANGE_MASK
|
||||||
|
and eax,RANGE_MASK
|
||||||
|
and ecx,RANGE_MASK
|
||||||
|
|
||||||
|
mov bl, JSAMPLE [esi+ebx*SIZEOF_JSAMPLE]
|
||||||
|
mov dl, JSAMPLE [esi+edx*SIZEOF_JSAMPLE]
|
||||||
|
mov al, JSAMPLE [esi+eax*SIZEOF_JSAMPLE]
|
||||||
|
mov cl, JSAMPLE [esi+ecx*SIZEOF_JSAMPLE]
|
||||||
|
|
||||||
|
mov JSAMPLE [edi+2*SIZEOF_JSAMPLE], bl
|
||||||
|
mov JSAMPLE [edi+5*SIZEOF_JSAMPLE], dl
|
||||||
|
mov JSAMPLE [edi+4*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+3*SIZEOF_JSAMPLE], cl
|
||||||
|
|
||||||
|
pop ecx ; ctr
|
||||||
|
pop esi ; wsptr
|
||||||
|
|
||||||
|
.nextrow:
|
||||||
|
pop edi
|
||||||
|
add esi, byte DCTSIZE*SIZEOF_INT ; advance pointer to next row
|
||||||
|
add edi, byte SIZEOF_JSAMPROW
|
||||||
|
dec ecx
|
||||||
|
jnz near .rowloop
|
||||||
|
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
mov esp,ebp
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; DCT_IFAST_SUPPORTED
|
||||||
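The output stage that closes each row in the routine above does three things: it descales a result, masks it with `RANGE_MASK`, and uses it as an index into the range-limit table. The following is a minimal Python sketch of that arithmetic, not the library's implementation. It assumes 8-bit samples (`CENTERJSAMPLE` = 128, `MAXJSAMPLE` = 255, `RANGE_MASK` = 1023) and `PASS1_BITS` = 2, none of which are defined in the lines shown here. It also models the table lookup as an equivalent clamp rather than building the real table. The function names are hypothetical.

```python
PASS1_BITS = 2                     # assumption: same value as in jidctint.asm
CENTERJSAMPLE = 128                # assumption: 8-bit samples
MAXJSAMPLE = 255                   # assumption: 8-bit samples
RANGE_MASK = MAXJSAMPLE * 4 + 3    # assumption: 10-bit index mask (1023)


def descale(x, n):
    """Round-to-nearest right shift, mirroring the `descale` macro (add, then sar)."""
    return (x + (1 << (n - 1))) >> n


def range_limit(x):
    """Model of `range_limit[x & RANGE_MASK]` as a clamp.

    The masked index is read as a signed 10-bit value, re-centred by
    CENTERJSAMPLE, and clamped to the valid sample range.
    """
    m = x & RANGE_MASK
    signed = m - (RANGE_MASK + 1) if m > RANGE_MASK // 2 else m
    return max(0, min(MAXJSAMPLE, signed + CENTERJSAMPLE))


def output_sample(value):
    """Descale a pass-2 result by PASS1_BITS+3 bits and range-limit it."""
    return range_limit(descale(value, PASS1_BITS + 3))
```

Masking before the lookup is what lets the assembly skip explicit compare-and-branch clamping: any 32-bit intermediate, positive or negative, is folded into a valid table index.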
524
jidctint.asm
Normal file
@@ -0,0 +1,524 @@
|
|||||||
;
; jidctint.asm - accurate integer IDCT (non-SIMD)
;
; x86 SIMD extension for IJG JPEG library
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler) and
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
;
; This file contains a slow-but-accurate integer implementation of the
; inverse DCT (Discrete Cosine Transform). The following code is based
; directly on the IJG's original jidctint.c; see jidctint.c for
; more details.
;
; Last Modified : October 17, 2004
;
; [TAB8]
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jdct.inc"
|
||||||
|
|
||||||
|
%ifdef DCT_ISLOW_SUPPORTED
|
||||||
|
|
||||||
|
; This module is specialized to the case DCTSIZE = 8.
|
||||||
|
;
|
||||||
|
%if DCTSIZE != 8
|
||||||
|
%error "Sorry, this code only copes with 8x8 DCTs."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
|
||||||
; Descale and correctly round a DWORD value that's scaled by N bits.
;
%macro  descale 2
%if (%2)<=7
        add     %1, byte (1<<((%2)-1))          ; add reg32,imm8
%else
        add     %1, (1<<((%2)-1))               ; add reg32,imm32
%endif
        sar     %1,%2
%endmacro
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
|
||||||
%define CONST_BITS      13
%define PASS1_BITS      2

%if CONST_BITS == 13
F_0_298 equ      2446           ; FIX(0.298631336)
F_0_390 equ      3196           ; FIX(0.390180644)
F_0_541 equ      4433           ; FIX(0.541196100)
F_0_765 equ      6270           ; FIX(0.765366865)
F_0_899 equ      7373           ; FIX(0.899976223)
F_1_175 equ      9633           ; FIX(1.175875602)
F_1_501 equ     12299           ; FIX(1.501321110)
F_1_847 equ     15137           ; FIX(1.847759065)
F_1_961 equ     16069           ; FIX(1.961570560)
F_2_053 equ     16819           ; FIX(2.053119869)
F_2_562 equ     20995           ; FIX(2.562915447)
F_3_072 equ     25172           ; FIX(3.072711026)
%else
; NASM cannot do compile-time arithmetic on floating-point constants.
%define DESCALE(x,n)  (((x)+(1<<((n)-1)))>>(n))
F_0_298 equ     DESCALE( 320652955,30-CONST_BITS)       ; FIX(0.298631336)
F_0_390 equ     DESCALE( 418953276,30-CONST_BITS)       ; FIX(0.390180644)
F_0_541 equ     DESCALE( 581104887,30-CONST_BITS)       ; FIX(0.541196100)
F_0_765 equ     DESCALE( 821806413,30-CONST_BITS)       ; FIX(0.765366865)
F_0_899 equ     DESCALE( 966342111,30-CONST_BITS)       ; FIX(0.899976223)
F_1_175 equ     DESCALE(1262586813,30-CONST_BITS)       ; FIX(1.175875602)
F_1_501 equ     DESCALE(1612031267,30-CONST_BITS)       ; FIX(1.501321110)
F_1_847 equ     DESCALE(1984016188,30-CONST_BITS)       ; FIX(1.847759065)
F_1_961 equ     DESCALE(2106220350,30-CONST_BITS)       ; FIX(1.961570560)
F_2_053 equ     DESCALE(2204520673,30-CONST_BITS)       ; FIX(2.053119869)
F_2_562 equ     DESCALE(2751909506,30-CONST_BITS)       ; FIX(2.562915447)
F_3_072 equ     DESCALE(3299298341,30-CONST_BITS)       ; FIX(3.072711026)
%endif
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Perform dequantization and inverse DCT on one block of coefficients.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_idct_islow (j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
|
; JCOEFPTR coef_block,
|
||||||
|
; JSAMPARRAY output_buf, JDIMENSION output_col)
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define coef_block(b) (b)+16 ; JCOEFPTR coef_block
|
||||||
|
%define output_buf(b) (b)+20 ; JSAMPARRAY output_buf
|
||||||
|
%define output_col(b) (b)+24 ; JDIMENSION output_col
|
||||||
|
|
||||||
|
%define range_limit ebp-SIZEOF_POINTER ; JSAMPLE * range_limit
|
||||||
|
%define ptr range_limit-SIZEOF_POINTER ; void * ptr
|
||||||
|
%define workspace ptr-DCTSIZE2*SIZEOF_INT
|
||||||
|
; int workspace[DCTSIZE2]
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_idct_islow)
|
||||||
|
|
||||||
|
EXTN(jpeg_idct_islow):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
lea esp, [workspace]
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
; ---- Pass 1: process columns from input, store into work array.
|
||||||
|
|
||||||
|
mov edx, POINTER [compptr(ebp)]
|
||||||
|
mov edx, POINTER [jcompinfo_dct_table(edx)] ; quantptr
|
||||||
|
mov esi, JCOEFPTR [coef_block(ebp)] ; inptr
|
||||||
|
lea edi, [workspace] ; int * wsptr
|
||||||
|
mov ecx, DCTSIZE ; ctr
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
mov ax, JCOEF [COL(1,esi,SIZEOF_JCOEF)]
|
||||||
|
or ax, JCOEF [COL(2,esi,SIZEOF_JCOEF)]
|
||||||
|
jnz short .columnDCT
|
||||||
|
|
||||||
|
mov bx, JCOEF [COL(3,esi,SIZEOF_JCOEF)]
|
||||||
|
mov ax, JCOEF [COL(4,esi,SIZEOF_JCOEF)]
|
||||||
|
or bx, JCOEF [COL(5,esi,SIZEOF_JCOEF)]
|
||||||
|
or ax, JCOEF [COL(6,esi,SIZEOF_JCOEF)]
|
||||||
|
or bx, JCOEF [COL(7,esi,SIZEOF_JCOEF)]
|
||||||
|
or ax,bx
|
||||||
|
jnz short .columnDCT
|
||||||
|
|
||||||
|
; -- AC terms all zero
|
||||||
|
|
||||||
|
mov ax, JCOEF [COL(0,esi,SIZEOF_JCOEF)]
|
||||||
|
imul ax, ISLOW_MULT_TYPE [COL(0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
cwde
|
||||||
|
|
||||||
|
sal eax,PASS1_BITS
|
||||||
|
|
||||||
|
mov INT [COL(0,edi,SIZEOF_INT)], eax
|
||||||
|
mov INT [COL(1,edi,SIZEOF_INT)], eax
|
||||||
|
mov INT [COL(2,edi,SIZEOF_INT)], eax
|
||||||
|
mov INT [COL(3,edi,SIZEOF_INT)], eax
|
||||||
|
mov INT [COL(4,edi,SIZEOF_INT)], eax
|
||||||
|
mov INT [COL(5,edi,SIZEOF_INT)], eax
|
||||||
|
mov INT [COL(6,edi,SIZEOF_INT)], eax
|
||||||
|
mov INT [COL(7,edi,SIZEOF_INT)], eax
|
||||||
|
jmp near .nextcolumn
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.columnDCT:
|
||||||
|
push ecx ; ctr
|
||||||
|
push esi ; coef_block
|
||||||
|
push edx ; quantptr
|
||||||
|
|
||||||
|
mov POINTER [ptr], edi ; wsptr
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movsx eax, JCOEF [COL(0,esi,SIZEOF_JCOEF)]
|
||||||
|
movsx ecx, JCOEF [COL(4,esi,SIZEOF_JCOEF)]
|
||||||
|
imul ax, ISLOW_MULT_TYPE [COL(0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
imul cx, ISLOW_MULT_TYPE [COL(4,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
movsx ebx, JCOEF [COL(2,esi,SIZEOF_JCOEF)]
|
||||||
|
movsx edi, JCOEF [COL(6,esi,SIZEOF_JCOEF)]
|
||||||
|
imul bx, ISLOW_MULT_TYPE [COL(2,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
imul di, ISLOW_MULT_TYPE [COL(6,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
|
||||||
|
lea edx,[eax+ecx]
|
||||||
|
sub eax,ecx
|
||||||
|
sal edx,CONST_BITS ; edx=tmp0
|
||||||
|
sal eax,CONST_BITS ; eax=tmp1
|
||||||
|
|
||||||
|
lea ecx,[ebx+edi]
|
||||||
|
imul ecx,(F_0_541) ; ecx=z1
|
||||||
|
imul ebx,(F_0_765) ; ebx=MULTIPLY(z2,FIX_0_765366865)
|
||||||
|
imul edi,(-F_1_847) ; edi=MULTIPLY(z3,-FIX_1_847759065)
|
||||||
|
add ebx,ecx ; ebx=tmp3
|
||||||
|
add edi,ecx ; edi=tmp2
|
||||||
|
|
||||||
|
lea ecx,[edx+ebx] ; ecx=tmp10
|
||||||
|
sub edx,ebx ; edx=tmp13
|
||||||
|
lea ebx,[eax+edi] ; ebx=tmp11
|
||||||
|
sub eax,edi ; eax=tmp12
|
||||||
|
|
||||||
|
push edx ; tmp13
|
||||||
|
push eax ; tmp12
|
||||||
|
push ebx ; tmp11
|
||||||
|
push ecx ; tmp10
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
mov edx, POINTER [esp+16] ; quantptr
|
||||||
|
|
||||||
|
movsx eax, JCOEF [COL(1,esi,SIZEOF_JCOEF)]
|
||||||
|
movsx edi, JCOEF [COL(3,esi,SIZEOF_JCOEF)]
|
||||||
|
imul ax, ISLOW_MULT_TYPE [COL(1,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
imul di, ISLOW_MULT_TYPE [COL(3,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
movsx ecx, JCOEF [COL(5,esi,SIZEOF_JCOEF)]
|
||||||
|
movsx ebx, JCOEF [COL(7,esi,SIZEOF_JCOEF)]
|
||||||
|
imul cx, ISLOW_MULT_TYPE [COL(5,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
imul bx, ISLOW_MULT_TYPE [COL(7,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
|
||||||
|
push eax ; eax=tmp3
|
||||||
|
push edi ; edi=tmp2
|
||||||
|
push ecx ; ecx=tmp1
|
||||||
|
push ebx ; ebx=tmp0
|
||||||
|
|
||||||
|
lea esi,[ebx+edi] ; esi=z3
|
||||||
|
lea edx,[ecx+eax] ; edx=z4
|
||||||
|
add ebx,eax ; ebx=z1
|
||||||
|
add ecx,edi ; ecx=z2
|
||||||
|
|
||||||
|
lea eax,[esi+edx]
|
||||||
|
imul eax,(F_1_175) ; eax=z5
|
||||||
|
|
||||||
|
imul esi,(-F_1_961) ; esi=z3(=MULTIPLY(z3,-FIX_1_961570560))
|
||||||
|
imul edx,(-F_0_390) ; edx=z4(=MULTIPLY(z4,-FIX_0_390180644))
|
||||||
|
imul ebx,(-F_0_899) ; ebx=z1(=MULTIPLY(z1,-FIX_0_899976223))
|
||||||
|
imul ecx,(-F_2_562) ; ecx=z2(=MULTIPLY(z2,-FIX_2_562915447))
|
||||||
|
|
||||||
|
add esi,eax ; esi=z3(=z3+z5)
|
||||||
|
add edx,eax ; edx=z4(=z4+z5)
|
||||||
|
|
||||||
|
lea edi,[esi+ebx] ; edi=z1+z3
|
||||||
|
lea eax,[edx+ecx] ; eax=z2+z4
|
||||||
|
add esi,ecx ; esi=z2+z3
|
||||||
|
add edx,ebx ; edx=z1+z4
|
||||||
|
|
||||||
|
pop ecx ; ecx=tmp0
|
||||||
|
pop ebx ; ebx=tmp1
|
||||||
|
imul ecx,(F_0_298) ; ecx=tmp0(=MULTIPLY(tmp0,FIX_0_298631336))
|
||||||
|
imul ebx,(F_2_053) ; ebx=tmp1(=MULTIPLY(tmp1,FIX_2_053119869))
|
||||||
|
add edi,ecx ; edi=tmp0(=tmp0+z1+z3)
|
||||||
|
add eax,ebx ; eax=tmp1(=tmp1+z2+z4)
|
||||||
|
|
||||||
|
pop ecx ; ecx=tmp2
|
||||||
|
pop ebx ; ebx=tmp3
|
||||||
|
imul ecx,(F_3_072) ; ecx=tmp2(=MULTIPLY(tmp2,FIX_3_072711026))
|
||||||
|
imul ebx,(F_1_501) ; ebx=tmp3(=MULTIPLY(tmp3,FIX_1_501321110))
|
||||||
|
add esi,ecx ; esi=tmp2(=tmp2+z2+z3)
|
||||||
|
add edx,ebx ; edx=tmp3(=tmp3+z1+z4)
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
pop ecx ; ecx=tmp10
|
||||||
|
pop ebx ; ebx=tmp11
|
||||||
|
push eax ; tmp1
|
||||||
|
push edi ; tmp0
|
||||||
|
|
||||||
|
lea eax,[ecx+edx] ; eax=data0(=tmp10+tmp3)
|
||||||
|
sub ecx,edx ; ecx=data7(=tmp10-tmp3)
|
||||||
|
lea edx,[ebx+esi] ; edx=data1(=tmp11+tmp2)
|
||||||
|
sub ebx,esi ; ebx=data6(=tmp11-tmp2)
|
||||||
|
|
||||||
|
mov edi, POINTER [ptr] ; edi=wsptr
|
||||||
|
|
||||||
|
descale eax,(CONST_BITS-PASS1_BITS)
|
||||||
|
descale ecx,(CONST_BITS-PASS1_BITS)
|
||||||
|
descale edx,(CONST_BITS-PASS1_BITS)
|
||||||
|
descale ebx,(CONST_BITS-PASS1_BITS)
|
||||||
|
|
||||||
|
mov INT [COL(0,edi,SIZEOF_INT)], eax
|
||||||
|
mov INT [COL(7,edi,SIZEOF_INT)], ecx
|
||||||
|
mov INT [COL(1,edi,SIZEOF_INT)], edx
|
||||||
|
mov INT [COL(6,edi,SIZEOF_INT)], ebx
|
||||||
|
|
||||||
|
pop esi ; esi=tmp0
|
||||||
|
pop eax ; eax=tmp1
|
||||||
|
pop ecx ; ecx=tmp12
|
||||||
|
pop edx ; edx=tmp13
|
||||||
|
|
||||||
|
lea ebx,[ecx+eax] ; ebx=data2(=tmp12+tmp1)
|
||||||
|
sub ecx,eax ; ecx=data5(=tmp12-tmp1)
|
||||||
|
lea eax,[edx+esi] ; eax=data3(=tmp13+tmp0)
|
||||||
|
sub edx,esi ; edx=data4(=tmp13-tmp0)
|
||||||
|
|
||||||
|
descale ebx,(CONST_BITS-PASS1_BITS)
|
||||||
|
descale ecx,(CONST_BITS-PASS1_BITS)
|
||||||
|
descale eax,(CONST_BITS-PASS1_BITS)
|
||||||
|
descale edx,(CONST_BITS-PASS1_BITS)
|
||||||
|
|
||||||
|
mov INT [COL(2,edi,SIZEOF_INT)], ebx
|
||||||
|
mov INT [COL(5,edi,SIZEOF_INT)], ecx
|
||||||
|
mov INT [COL(3,edi,SIZEOF_INT)], eax
|
||||||
|
mov INT [COL(4,edi,SIZEOF_INT)], edx
|
||||||
|
|
||||||
|
pop edx ; quantptr
|
||||||
|
pop esi ; coef_block
|
||||||
|
pop ecx ; ctr
|
||||||
|
|
||||||
|
.nextcolumn:
|
||||||
|
add esi, byte SIZEOF_JCOEF ; advance pointers to next column
|
||||||
|
add edx, byte SIZEOF_ISLOW_MULT_TYPE
|
||||||
|
add edi, byte SIZEOF_INT
|
||||||
|
dec ecx
|
||||||
|
jnz near .columnloop
|
||||||
|
|
||||||
|
; ---- Pass 2: process rows from work array, store into output array.
|
||||||
|
|
||||||
|
mov eax, POINTER [cinfo(ebp)]
|
||||||
|
mov eax, POINTER [jdstruct_sample_range_limit(eax)]
|
||||||
|
sub eax, byte -CENTERJSAMPLE*SIZEOF_JSAMPLE ; JSAMPLE * range_limit
|
||||||
|
mov POINTER [range_limit], eax
|
||||||
|
|
||||||
|
lea esi, [workspace] ; int * wsptr
|
||||||
|
mov edi, JSAMPARRAY [output_buf(ebp)] ; (JSAMPROW *)
|
||||||
|
mov ecx, DCTSIZE ; ctr
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
push edi
|
||||||
|
mov edi, JSAMPROW [edi] ; (JSAMPLE *)
|
||||||
|
add edi, JDIMENSION [output_col(ebp)] ; edi=outptr
|
||||||
|
|
||||||
|
%ifndef NO_ZERO_ROW_TEST
|
||||||
|
mov eax, INT [ROW(1,esi,SIZEOF_INT)]
|
||||||
|
or eax, INT [ROW(2,esi,SIZEOF_INT)]
|
||||||
|
jnz short .rowDCT
|
||||||
|
|
||||||
|
mov ebx, INT [ROW(3,esi,SIZEOF_INT)]
|
||||||
|
mov eax, INT [ROW(4,esi,SIZEOF_INT)]
|
||||||
|
or ebx, INT [ROW(5,esi,SIZEOF_INT)]
|
||||||
|
or eax, INT [ROW(6,esi,SIZEOF_INT)]
|
||||||
|
or ebx, INT [ROW(7,esi,SIZEOF_INT)]
|
||||||
|
or eax,ebx
|
||||||
|
jnz short .rowDCT
|
||||||
|
|
||||||
|
; -- AC terms all zero
|
||||||
|
|
||||||
|
mov eax, INT [ROW(0,esi,SIZEOF_INT)]
|
||||||
|
|
||||||
|
mov edx, POINTER [range_limit] ; (JSAMPLE *)
|
||||||
|
|
||||||
|
descale eax,(PASS1_BITS+3)
|
||||||
|
and eax,RANGE_MASK
|
||||||
|
mov al, JSAMPLE [edx+eax*SIZEOF_JSAMPLE]
|
||||||
|
mov JSAMPLE [edi+0*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+1*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+2*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+3*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+4*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+5*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+6*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+7*SIZEOF_JSAMPLE], al
|
||||||
|
jmp near .nextrow
|
||||||
|
alignx 16,7
|
||||||
|
%endif
|
||||||
|
.rowDCT:
|
||||||
|
push esi ; wsptr
|
||||||
|
push ecx ; ctr
|
||||||
|
|
||||||
|
mov POINTER [ptr], edi ; outptr
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
mov eax, INT [ROW(0,esi,SIZEOF_INT)]
|
||||||
|
mov ebx, INT [ROW(2,esi,SIZEOF_INT)]
|
||||||
|
mov ecx, INT [ROW(4,esi,SIZEOF_INT)]
|
||||||
|
mov edi, INT [ROW(6,esi,SIZEOF_INT)]
|
||||||
|
|
||||||
|
lea edx,[eax+ecx]
|
||||||
|
sub eax,ecx
|
||||||
|
sal edx,CONST_BITS ; edx=tmp0
|
||||||
|
sal eax,CONST_BITS ; eax=tmp1
|
||||||
|
|
||||||
|
lea ecx,[ebx+edi]
|
||||||
|
imul ecx,(F_0_541) ; ecx=z1
|
||||||
|
imul ebx,(F_0_765) ; ebx=MULTIPLY(z2,FIX_0_765366865)
|
||||||
|
imul edi,(-F_1_847) ; edi=MULTIPLY(z3,-FIX_1_847759065)
|
||||||
|
add ebx,ecx ; ebx=tmp3
|
||||||
|
add edi,ecx ; edi=tmp2
|
||||||
|
|
||||||
|
lea ecx,[edx+ebx] ; ecx=tmp10
|
||||||
|
sub edx,ebx ; edx=tmp13
|
||||||
|
lea ebx,[eax+edi] ; ebx=tmp11
|
||||||
|
sub eax,edi ; eax=tmp12
|
||||||
|
|
||||||
|
push edx ; tmp13
|
||||||
|
push eax ; tmp12
|
||||||
|
push ebx ; tmp11
|
||||||
|
push ecx ; tmp10
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
mov eax, INT [ROW(1,esi,SIZEOF_INT)]
|
||||||
|
mov edi, INT [ROW(3,esi,SIZEOF_INT)]
|
||||||
|
mov ecx, INT [ROW(5,esi,SIZEOF_INT)]
|
||||||
|
mov ebx, INT [ROW(7,esi,SIZEOF_INT)]
|
||||||
|
|
||||||
|
push eax ; eax=tmp3
|
||||||
|
push edi ; edi=tmp2
|
||||||
|
push ecx ; ecx=tmp1
|
||||||
|
push ebx ; ebx=tmp0
|
||||||
|
|
||||||
|
lea esi,[ebx+edi] ; esi=z3
|
||||||
|
lea edx,[ecx+eax] ; edx=z4
|
||||||
|
add ebx,eax ; ebx=z1
|
||||||
|
add ecx,edi ; ecx=z2
|
||||||
|
|
||||||
|
lea eax,[esi+edx]
|
||||||
|
imul eax,(F_1_175) ; eax=z5
|
||||||
|
|
||||||
|
imul esi,(-F_1_961) ; esi=z3(=MULTIPLY(z3,-FIX_1_961570560))
|
||||||
|
imul edx,(-F_0_390) ; edx=z4(=MULTIPLY(z4,-FIX_0_390180644))
|
||||||
|
imul ebx,(-F_0_899) ; ebx=z1(=MULTIPLY(z1,-FIX_0_899976223))
|
||||||
|
imul ecx,(-F_2_562) ; ecx=z2(=MULTIPLY(z2,-FIX_2_562915447))
|
||||||
|
|
||||||
|
add esi,eax ; esi=z3(=z3+z5)
|
||||||
|
add edx,eax ; edx=z4(=z4+z5)
|
||||||
|
|
||||||
|
lea edi,[esi+ebx] ; edi=z1+z3
|
||||||
|
lea eax,[edx+ecx] ; eax=z2+z4
|
||||||
|
add esi,ecx ; esi=z2+z3
|
||||||
|
add edx,ebx ; edx=z1+z4
|
||||||
|
|
||||||
|
pop ecx ; ecx=tmp0
|
||||||
|
pop ebx ; ebx=tmp1
|
||||||
|
imul ecx,(F_0_298) ; ecx=tmp0(=MULTIPLY(tmp0,FIX_0_298631336))
|
||||||
|
imul ebx,(F_2_053) ; ebx=tmp1(=MULTIPLY(tmp1,FIX_2_053119869))
|
||||||
|
add edi,ecx ; edi=tmp0(=tmp0+z1+z3)
|
||||||
|
add eax,ebx ; eax=tmp1(=tmp1+z2+z4)
|
||||||
|
|
||||||
|
pop ecx ; ecx=tmp2
|
||||||
|
pop ebx ; ebx=tmp3
|
||||||
|
imul ecx,(F_3_072) ; ecx=tmp2(=MULTIPLY(tmp2,FIX_3_072711026))
|
||||||
|
imul ebx,(F_1_501) ; ebx=tmp3(=MULTIPLY(tmp3,FIX_1_501321110))
|
||||||
|
add esi,ecx ; esi=tmp2(=tmp2+z2+z3)
|
||||||
|
add edx,ebx ; edx=tmp3(=tmp3+z1+z4)
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
pop ecx ; ecx=tmp10
|
||||||
|
pop ebx ; ebx=tmp11
|
||||||
|
push eax ; tmp1
|
||||||
|
push edi ; tmp0
|
||||||
|
|
||||||
|
lea eax,[ecx+edx] ; eax=data0(=tmp10+tmp3)
|
||||||
|
sub ecx,edx ; ecx=data7(=tmp10-tmp3)
|
||||||
|
lea edx,[ebx+esi] ; edx=data1(=tmp11+tmp2)
|
||||||
|
sub ebx,esi ; ebx=data6(=tmp11-tmp2)
|
||||||
|
|
||||||
|
mov esi, POINTER [range_limit] ; (JSAMPLE *)
|
||||||
|
|
||||||
|
descale eax,(CONST_BITS+PASS1_BITS+3)
|
||||||
|
descale ecx,(CONST_BITS+PASS1_BITS+3)
|
||||||
|
descale edx,(CONST_BITS+PASS1_BITS+3)
|
||||||
|
descale ebx,(CONST_BITS+PASS1_BITS+3)
|
||||||
|
|
||||||
|
mov edi, POINTER [ptr] ; edi=outptr
|
||||||
|
|
||||||
|
and eax,RANGE_MASK
|
||||||
|
and ecx,RANGE_MASK
|
||||||
|
and edx,RANGE_MASK
|
||||||
|
and ebx,RANGE_MASK
|
||||||
|
|
||||||
|
mov al, JSAMPLE [esi+eax*SIZEOF_JSAMPLE]
|
||||||
|
mov cl, JSAMPLE [esi+ecx*SIZEOF_JSAMPLE]
|
||||||
|
mov dl, JSAMPLE [esi+edx*SIZEOF_JSAMPLE]
|
||||||
|
mov bl, JSAMPLE [esi+ebx*SIZEOF_JSAMPLE]
|
||||||
|
|
||||||
|
mov JSAMPLE [edi+0*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+7*SIZEOF_JSAMPLE], cl
|
||||||
|
mov JSAMPLE [edi+1*SIZEOF_JSAMPLE], dl
|
||||||
|
mov JSAMPLE [edi+6*SIZEOF_JSAMPLE], bl
|
||||||
|
|
||||||
|
pop esi ; esi=tmp0
|
||||||
|
pop eax ; eax=tmp1
|
||||||
|
pop ecx ; ecx=tmp12
|
||||||
|
pop edx ; edx=tmp13
|
||||||
|
|
||||||
|
lea ebx,[ecx+eax] ; ebx=data2(=tmp12+tmp1)
|
||||||
|
sub ecx,eax ; ecx=data5(=tmp12-tmp1)
|
||||||
|
lea eax,[edx+esi] ; eax=data3(=tmp13+tmp0)
|
||||||
|
sub edx,esi ; edx=data4(=tmp13-tmp0)
|
||||||
|
|
||||||
|
mov esi, POINTER [range_limit] ; (JSAMPLE *)
|
||||||
|
|
||||||
|
descale ebx,(CONST_BITS+PASS1_BITS+3)
|
||||||
|
descale ecx,(CONST_BITS+PASS1_BITS+3)
|
||||||
|
descale eax,(CONST_BITS+PASS1_BITS+3)
|
||||||
|
descale edx,(CONST_BITS+PASS1_BITS+3)
|
||||||
|
|
||||||
|
and ebx,RANGE_MASK
|
||||||
|
and ecx,RANGE_MASK
|
||||||
|
and eax,RANGE_MASK
|
||||||
|
and edx,RANGE_MASK
|
||||||
|
|
||||||
|
mov bl, JSAMPLE [esi+ebx*SIZEOF_JSAMPLE]
|
||||||
|
mov cl, JSAMPLE [esi+ecx*SIZEOF_JSAMPLE]
|
||||||
|
mov al, JSAMPLE [esi+eax*SIZEOF_JSAMPLE]
|
||||||
|
mov dl, JSAMPLE [esi+edx*SIZEOF_JSAMPLE]
|
||||||
|
|
||||||
|
mov JSAMPLE [edi+2*SIZEOF_JSAMPLE], bl
|
||||||
|
mov JSAMPLE [edi+5*SIZEOF_JSAMPLE], cl
|
||||||
|
mov JSAMPLE [edi+3*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+4*SIZEOF_JSAMPLE], dl
|
||||||
|
|
||||||
|
pop ecx ; ctr
|
||||||
|
pop esi ; wsptr
|
||||||
|
|
||||||
|
.nextrow:
|
||||||
|
pop edi
|
||||||
|
add esi, byte DCTSIZE*SIZEOF_INT ; advance pointer to next row
|
||||||
|
add edi, byte SIZEOF_JSAMPROW
|
||||||
|
dec ecx
|
||||||
|
jnz near .rowloop
|
||||||
|
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
mov esp,ebp
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; DCT_ISLOW_SUPPORTED
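The `F_x_xxx` constants in jidctint.asm are fixed-point encodings of real multipliers, and the `%else` branch derives them from values pre-scaled by 2^30 because NASM cannot evaluate floating-point expressions. The Python sketch below reproduces both derivations so the table can be checked by hand. The helper names `fix` and `from_30bit` are hypothetical and do not appear in the source.

```python
CONST_BITS = 13  # matches the %define in jidctint.asm


def fix(x, bits=CONST_BITS):
    """FIX(x): scale a real constant by 2**bits and round to nearest."""
    return int(x * (1 << bits) + 0.5)


def descale(x, n):
    """Round-to-nearest right shift, as in the DESCALE(x,n) preprocessor macro."""
    return (x + (1 << (n - 1))) >> n


def from_30bit(c30, bits=CONST_BITS):
    """The %else branch: reduce a constant pre-scaled by 2**30 to `bits` bits."""
    return descale(c30, 30 - bits)


# Both routes should agree with the literal table for CONST_BITS == 13.
F_0_541 = fix(0.541196100)           # direct rounding of the real value
F_0_541_alt = from_30bit(581104887)  # integer-only route used by the %else branch
```

Keeping the 30-bit integers in the source means a different `CONST_BITS` can be selected without anyone recomputing the table by hand.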
|
||||||
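Each pass of `jpeg_idct_islow` above applies the same one-dimensional transform: an even part, an odd part, and a final butterfly, followed by a descale. As a reading aid, here is a minimal Python sketch of one such 1-D pass. It follows the register comments in the listing (`tmp0`..`tmp3`, `z1`..`z5`, `tmp10`..`tmp13`), takes already dequantized coefficients, and uses unbounded Python integers, so it does not model 32-bit overflow or the 16-bit dequantizing multiply. The function name is hypothetical.

```python
CONST_BITS, PASS1_BITS = 13, 2
F_0_298, F_0_390, F_0_541, F_0_765 = 2446, 3196, 4433, 6270
F_0_899, F_1_175, F_1_501, F_1_847 = 7373, 9633, 12299, 15137
F_1_961, F_2_053, F_2_562, F_3_072 = 16069, 16819, 20995, 25172


def descale(x, n):
    """Round-to-nearest right shift, mirroring the `descale` macro."""
    return (x + (1 << (n - 1))) >> n


def idct_islow_1d(c, shift):
    """One 1-D pass over eight dequantized coefficients c[0..7]."""
    # Even part
    z1 = (c[2] + c[6]) * F_0_541
    tmp2 = z1 + c[6] * (-F_1_847)
    tmp3 = z1 + c[2] * F_0_765
    tmp0 = (c[0] + c[4]) << CONST_BITS
    tmp1 = (c[0] - c[4]) << CONST_BITS
    tmp10, tmp13 = tmp0 + tmp3, tmp0 - tmp3
    tmp11, tmp12 = tmp1 + tmp2, tmp1 - tmp2

    # Odd part
    tmp0, tmp1, tmp2, tmp3 = c[7], c[5], c[3], c[1]
    z1, z2 = tmp0 + tmp3, tmp1 + tmp2
    z3, z4 = tmp0 + tmp2, tmp1 + tmp3
    z5 = (z3 + z4) * F_1_175
    z1, z2 = z1 * -F_0_899, z2 * -F_2_562
    z3, z4 = z3 * -F_1_961 + z5, z4 * -F_0_390 + z5
    tmp0 = tmp0 * F_0_298 + z1 + z3
    tmp1 = tmp1 * F_2_053 + z2 + z4
    tmp2 = tmp2 * F_3_072 + z2 + z3
    tmp3 = tmp3 * F_1_501 + z1 + z4

    # Final output stage
    out = [tmp10 + tmp3, tmp11 + tmp2, tmp12 + tmp1, tmp13 + tmp0,
           tmp13 - tmp0, tmp12 - tmp1, tmp11 - tmp2, tmp10 - tmp3]
    return [descale(v, shift) for v in out]
```

For a column whose AC terms are all zero, a pass-1 call with `shift = CONST_BITS - PASS1_BITS` returns the DC value shifted left by `PASS1_BITS` in every position, which is the result the "AC terms all zero" shortcut stores directly.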
688
jidctred.asm
Normal file
@@ -0,0 +1,688 @@
|
|||||||
;
; jidctred.asm - reduced-size IDCT (non-SIMD)
;
; x86 SIMD extension for IJG JPEG library
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler) and
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
;
; This file contains inverse-DCT routines that produce reduced-size output:
; either 4x4, 2x2, or 1x1 pixels from an 8x8 DCT block.
; The following code is based directly on the IJG's original jidctred.c;
; see jidctred.c for more details.
;
; Last Modified : October 17, 2004
;
; [TAB8]
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jdct.inc"
|
||||||
|
|
||||||
|
%ifdef IDCT_SCALING_SUPPORTED
|
||||||
|
|
||||||
|
; This module is specialized to the case DCTSIZE = 8.
|
||||||
|
;
|
||||||
|
%if DCTSIZE != 8
|
||||||
|
%error "Sorry, this code only copes with 8x8 DCTs."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
|
||||||
; Descale and correctly round a DWORD value that's scaled by N bits.
;
%macro  descale 2
%if (%2)<=7
        add     %1, byte (1<<((%2)-1))          ; add reg32,imm8
%else
        add     %1, (1<<((%2)-1))               ; add reg32,imm32
%endif
        sar     %1,%2
%endmacro
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
|
||||||
%define CONST_BITS      13
%define PASS1_BITS      2

%if CONST_BITS == 13
F_0_211 equ      1730           ; FIX(0.211164243)
F_0_509 equ      4176           ; FIX(0.509795579)
F_0_601 equ      4926           ; FIX(0.601344887)
F_0_720 equ      5906           ; FIX(0.720959822)
F_0_765 equ      6270           ; FIX(0.765366865)
F_0_850 equ      6967           ; FIX(0.850430095)
F_0_899 equ      7373           ; FIX(0.899976223)
F_1_061 equ      8697           ; FIX(1.061594337)
F_1_272 equ     10426           ; FIX(1.272758580)
F_1_451 equ     11893           ; FIX(1.451774981)
F_1_847 equ     15137           ; FIX(1.847759065)
F_2_172 equ     17799           ; FIX(2.172734803)
F_2_562 equ     20995           ; FIX(2.562915447)
F_3_624 equ     29692           ; FIX(3.624509785)
%else
; NASM cannot do compile-time arithmetic on floating-point constants.
%define DESCALE(x,n)  (((x)+(1<<((n)-1)))>>(n))
F_0_211 equ     DESCALE( 226735879,30-CONST_BITS)       ; FIX(0.211164243)
F_0_509 equ     DESCALE( 547388834,30-CONST_BITS)       ; FIX(0.509795579)
F_0_601 equ     DESCALE( 645689155,30-CONST_BITS)       ; FIX(0.601344887)
F_0_720 equ     DESCALE( 774124714,30-CONST_BITS)       ; FIX(0.720959822)
F_0_765 equ     DESCALE( 821806413,30-CONST_BITS)       ; FIX(0.765366865)
F_0_850 equ     DESCALE( 913142361,30-CONST_BITS)       ; FIX(0.850430095)
F_0_899 equ     DESCALE( 966342111,30-CONST_BITS)       ; FIX(0.899976223)
F_1_061 equ     DESCALE(1139878239,30-CONST_BITS)       ; FIX(1.061594337)
F_1_272 equ     DESCALE(1366614119,30-CONST_BITS)       ; FIX(1.272758580)
F_1_451 equ     DESCALE(1558831516,30-CONST_BITS)       ; FIX(1.451774981)
F_1_847 equ     DESCALE(1984016188,30-CONST_BITS)       ; FIX(1.847759065)
F_2_172 equ     DESCALE(2332956230,30-CONST_BITS)       ; FIX(2.172734803)
F_2_562 equ     DESCALE(2751909506,30-CONST_BITS)       ; FIX(2.562915447)
F_3_624 equ     DESCALE(3891787747,30-CONST_BITS)       ; FIX(3.624509785)
%endif
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Perform dequantization and inverse DCT on one block of coefficients,
|
||||||
|
; producing a reduced-size 4x4 output block.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_idct_4x4 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
|
; JCOEFPTR coef_block,
|
||||||
|
; JSAMPARRAY output_buf, JDIMENSION output_col)
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define coef_block(b) (b)+16 ; JCOEFPTR coef_block
|
||||||
|
%define output_buf(b) (b)+20 ; JSAMPARRAY output_buf
|
||||||
|
%define output_col(b) (b)+24 ; JDIMENSION output_col
|
||||||
|
|
||||||
|
%define range_limit ebp-SIZEOF_POINTER ; JSAMPLE * range_limit
|
||||||
|
%define workspace range_limit-(DCTSIZE*4)*SIZEOF_INT
|
||||||
|
; int workspace[DCTSIZE*4]
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_idct_4x4)
|
||||||
|
|
||||||
|
EXTN(jpeg_idct_4x4):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
lea esp, [workspace]
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
; ---- Pass 1: process columns from input, store into work array.
|
||||||
|
|
||||||
|
mov edx, POINTER [compptr(ebp)]
|
||||||
|
mov edx, POINTER [jcompinfo_dct_table(edx)] ; quantptr
|
||||||
|
mov esi, JCOEFPTR [coef_block(ebp)] ; inptr
|
||||||
|
lea edi, [workspace] ; int * wsptr
|
||||||
|
mov ecx, DCTSIZE ; ctr
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
; Don't bother to process column 4, because second pass won't use it
|
||||||
|
cmp ecx, byte DCTSIZE-4
|
||||||
|
je near .nextcolumn
|
||||||
|
|
||||||
|
mov ax, JCOEF [COL(1,esi,SIZEOF_JCOEF)]
|
||||||
|
or ax, JCOEF [COL(2,esi,SIZEOF_JCOEF)]
|
||||||
|
jnz short .columnDCT
|
||||||
|
|
||||||
|
mov ax, JCOEF [COL(3,esi,SIZEOF_JCOEF)]
|
||||||
|
mov bx, JCOEF [COL(5,esi,SIZEOF_JCOEF)]
|
||||||
|
or ax, JCOEF [COL(6,esi,SIZEOF_JCOEF)]
|
||||||
|
or bx, JCOEF [COL(7,esi,SIZEOF_JCOEF)]
|
||||||
|
or ax,bx
|
||||||
|
jnz short .columnDCT
|
||||||
|
|
||||||
|
; -- AC terms all zero; we need not examine term 4 for 4x4 output
|
||||||
|
|
||||||
|
mov ax, JCOEF [COL(0,esi,SIZEOF_JCOEF)]
|
||||||
|
imul ax, ISLOW_MULT_TYPE [COL(0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
cwde
|
||||||
|
|
||||||
|
sal eax, PASS1_BITS
|
||||||
|
|
||||||
|
mov INT [COL(0,edi,SIZEOF_INT)], eax
|
||||||
|
mov INT [COL(1,edi,SIZEOF_INT)], eax
|
||||||
|
mov INT [COL(2,edi,SIZEOF_INT)], eax
|
||||||
|
mov INT [COL(3,edi,SIZEOF_INT)], eax
|
||||||
|
jmp near .nextcolumn
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.columnDCT:
|
||||||
|
push ecx ; ctr
|
||||||
|
push esi ; coef_block
|
||||||
|
push edx ; quantptr
|
||||||
|
push edi ; wsptr
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movsx ebx, JCOEF [COL(2,esi,SIZEOF_JCOEF)]
|
||||||
|
movsx ecx, JCOEF [COL(6,esi,SIZEOF_JCOEF)]
|
||||||
|
movsx eax, JCOEF [COL(0,esi,SIZEOF_JCOEF)]
|
||||||
|
imul bx, ISLOW_MULT_TYPE [COL(2,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
imul cx, ISLOW_MULT_TYPE [COL(6,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
imul ax, ISLOW_MULT_TYPE [COL(0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
|
||||||
|
imul ebx,(F_1_847) ; ebx=MULTIPLY(z2,FIX_1_847759065)
|
||||||
|
imul ecx,(-F_0_765) ; ecx=MULTIPLY(z3,-FIX_0_765366865)
|
||||||
|
sal eax,(CONST_BITS+1) ; eax=tmp0
|
||||||
|
add ecx,ebx ; ecx=tmp2
|
||||||
|
|
||||||
|
lea edi,[eax+ecx] ; edi=tmp10
|
||||||
|
sub eax,ecx ; eax=tmp12
|
||||||
|
|
||||||
|
push eax ; tmp12
|
||||||
|
push edi ; tmp10
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movsx edi, JCOEF [COL(7,esi,SIZEOF_JCOEF)]
|
||||||
|
movsx ecx, JCOEF [COL(5,esi,SIZEOF_JCOEF)]
|
||||||
|
imul di, ISLOW_MULT_TYPE [COL(7,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
imul cx, ISLOW_MULT_TYPE [COL(5,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
movsx ebx, JCOEF [COL(3,esi,SIZEOF_JCOEF)]
|
||||||
|
movsx eax, JCOEF [COL(1,esi,SIZEOF_JCOEF)]
|
||||||
|
imul bx, ISLOW_MULT_TYPE [COL(3,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
imul ax, ISLOW_MULT_TYPE [COL(1,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
|
||||||
|
mov esi,edi ; esi=edi=z1
|
||||||
|
mov edx,ecx ; edx=ecx=z2
|
||||||
|
imul edi,(-F_0_211) ; edi=MULTIPLY(z1,-FIX_0_211164243)
|
||||||
|
imul ecx,(F_1_451) ; ecx=MULTIPLY(z2,FIX_1_451774981)
|
||||||
|
imul esi,(-F_0_509) ; esi=MULTIPLY(z1,-FIX_0_509795579)
|
||||||
|
imul edx,(-F_0_601) ; edx=MULTIPLY(z2,-FIX_0_601344887)
|
||||||
|
|
||||||
|
add edi,ecx ; edi=(tmp0)
|
||||||
|
add esi,edx ; esi=(tmp2)
|
||||||
|
|
||||||
|
mov ecx,ebx ; ecx=ebx=z3
|
||||||
|
mov edx,eax ; edx=eax=z4
|
||||||
|
imul ebx,(-F_2_172) ; ebx=MULTIPLY(z3,-FIX_2_172734803)
|
||||||
|
imul eax,(F_1_061) ; eax=MULTIPLY(z4,FIX_1_061594337)
|
||||||
|
imul ecx,(F_0_899) ; ecx=MULTIPLY(z3,FIX_0_899976223)
|
||||||
|
imul edx,(F_2_562) ; edx=MULTIPLY(z4,FIX_2_562915447)
|
||||||
|
|
||||||
|
add edi,ebx
|
||||||
|
add esi,ecx
|
||||||
|
add edi,eax ; edi=tmp0
|
||||||
|
add esi,edx ; esi=tmp2
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
pop ebx ; ebx=tmp10
|
||||||
|
pop ecx ; ecx=tmp12
|
||||||
|
|
||||||
|
lea eax,[ebx+esi] ; eax=data0(=tmp10+tmp2)
|
||||||
|
sub ebx,esi ; ebx=data3(=tmp10-tmp2)
|
||||||
|
lea edx,[ecx+edi] ; edx=data1(=tmp12+tmp0)
|
||||||
|
sub ecx,edi ; ecx=data2(=tmp12-tmp0)
|
||||||
|
|
||||||
|
pop edi ; wsptr
|
||||||
|
|
||||||
|
descale eax,(CONST_BITS-PASS1_BITS+1)
|
||||||
|
descale ebx,(CONST_BITS-PASS1_BITS+1)
|
||||||
|
descale edx,(CONST_BITS-PASS1_BITS+1)
|
||||||
|
descale ecx,(CONST_BITS-PASS1_BITS+1)
|
||||||
|
|
||||||
|
mov INT [COL(0,edi,SIZEOF_INT)], eax
|
||||||
|
mov INT [COL(3,edi,SIZEOF_INT)], ebx
|
||||||
|
mov INT [COL(1,edi,SIZEOF_INT)], edx
|
||||||
|
mov INT [COL(2,edi,SIZEOF_INT)], ecx
|
||||||
|
|
||||||
|
pop edx ; quantptr
|
||||||
|
pop esi ; coef_block
|
||||||
|
pop ecx ; ctr
|
||||||
|
|
||||||
|
.nextcolumn:
|
||||||
|
add esi, byte SIZEOF_JCOEF ; advance pointers to next column
|
||||||
|
add edx, byte SIZEOF_ISLOW_MULT_TYPE
|
||||||
|
add edi, byte SIZEOF_INT
|
||||||
|
dec ecx
|
||||||
|
jnz near .columnloop
|
||||||
|
|
||||||
|
; ---- Pass 2: process 4 rows from work array, store into output array.
|
||||||
|
|
||||||
|
mov eax, POINTER [cinfo(ebp)]
|
||||||
|
mov eax, POINTER [jdstruct_sample_range_limit(eax)]
|
||||||
|
sub eax, byte -CENTERJSAMPLE*SIZEOF_JSAMPLE ; JSAMPLE * range_limit
|
||||||
|
mov POINTER [range_limit], eax
|
||||||
|
|
||||||
|
lea esi, [workspace] ; int * wsptr
|
||||||
|
mov edi, JSAMPARRAY [output_buf(ebp)] ; (JSAMPROW *)
|
||||||
|
mov ecx, DCTSIZE/2 ; ctr
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
push edi
|
||||||
|
mov edi, JSAMPROW [edi] ; (JSAMPLE *)
|
||||||
|
add edi, JDIMENSION [output_col(ebp)] ; edi=outptr
|
||||||
|
|
||||||
|
%ifndef NO_ZERO_ROW_TEST
|
||||||
|
mov eax, INT [ROW(1,esi,SIZEOF_INT)]
|
||||||
|
or eax, INT [ROW(2,esi,SIZEOF_INT)]
|
||||||
|
jnz short .rowDCT
|
||||||
|
|
||||||
|
mov eax, INT [ROW(3,esi,SIZEOF_INT)]
|
||||||
|
mov ebx, INT [ROW(5,esi,SIZEOF_INT)]
|
||||||
|
or eax, INT [ROW(6,esi,SIZEOF_INT)]
|
||||||
|
or ebx, INT [ROW(7,esi,SIZEOF_INT)]
|
||||||
|
or eax,ebx
|
||||||
|
jnz short .rowDCT
|
||||||
|
|
||||||
|
; -- AC terms all zero
|
||||||
|
|
||||||
|
mov eax, INT [ROW(0,esi,SIZEOF_INT)]
|
||||||
|
|
||||||
|
mov edx, POINTER [range_limit] ; (JSAMPLE *)
|
||||||
|
|
||||||
|
descale eax,(PASS1_BITS+3)
|
||||||
|
and eax,RANGE_MASK
|
||||||
|
mov al, JSAMPLE [edx+eax*SIZEOF_JSAMPLE]
|
||||||
|
mov JSAMPLE [edi+0*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+1*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+2*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+3*SIZEOF_JSAMPLE], al
|
||||||
|
jmp near .nextrow
|
||||||
|
alignx 16,7
|
||||||
|
%endif
|
||||||
|
.rowDCT:
|
||||||
|
push esi ; wsptr
|
||||||
|
push ecx ; ctr
|
||||||
|
push edi ; outptr
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
mov eax, INT [ROW(0,esi,SIZEOF_INT)]
|
||||||
|
mov ebx, INT [ROW(2,esi,SIZEOF_INT)]
|
||||||
|
mov ecx, INT [ROW(6,esi,SIZEOF_INT)]
|
||||||
|
|
||||||
|
imul ebx,(F_1_847) ; ebx=MULTIPLY(z2,FIX_1_847759065)
|
||||||
|
imul ecx,(-F_0_765) ; ecx=MULTIPLY(z3,-FIX_0_765366865)
|
||||||
|
sal eax,(CONST_BITS+1) ; eax=tmp0
|
||||||
|
add ecx,ebx ; ecx=tmp2
|
||||||
|
|
||||||
|
lea edi,[eax+ecx] ; edi=tmp10
|
||||||
|
sub eax,ecx ; eax=tmp12
|
||||||
|
|
||||||
|
push eax ; tmp12
|
||||||
|
push edi ; tmp10
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
mov eax, INT [ROW(1,esi,SIZEOF_INT)]
|
||||||
|
mov ebx, INT [ROW(3,esi,SIZEOF_INT)]
|
||||||
|
mov ecx, INT [ROW(5,esi,SIZEOF_INT)]
|
||||||
|
mov edi, INT [ROW(7,esi,SIZEOF_INT)]
|
||||||
|
|
||||||
|
mov esi,edi ; esi=edi=z1
|
||||||
|
mov edx,ecx ; edx=ecx=z2
|
||||||
|
imul edi,(-F_0_211) ; edi=MULTIPLY(z1,-FIX_0_211164243)
|
||||||
|
imul ecx,(F_1_451) ; ecx=MULTIPLY(z2,FIX_1_451774981)
|
||||||
|
imul esi,(-F_0_509) ; esi=MULTIPLY(z1,-FIX_0_509795579)
|
||||||
|
imul edx,(-F_0_601) ; edx=MULTIPLY(z2,-FIX_0_601344887)
|
||||||
|
|
||||||
|
add edi,ecx ; edi=(tmp0)
|
||||||
|
add esi,edx ; esi=(tmp2)
|
||||||
|
|
||||||
|
mov ecx,ebx ; ecx=ebx=z3
|
||||||
|
mov edx,eax ; edx=eax=z4
|
||||||
|
imul ebx,(-F_2_172) ; ebx=MULTIPLY(z3,-FIX_2_172734803)
|
||||||
|
imul eax,(F_1_061) ; eax=MULTIPLY(z4,FIX_1_061594337)
|
||||||
|
imul ecx,(F_0_899) ; ecx=MULTIPLY(z3,FIX_0_899976223)
|
||||||
|
imul edx,(F_2_562) ; edx=MULTIPLY(z4,FIX_2_562915447)
|
||||||
|
|
||||||
|
add edi,ebx
|
||||||
|
add esi,ecx
|
||||||
|
add edi,eax ; edi=tmp0
|
||||||
|
add esi,edx ; esi=tmp2
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
pop ebx ; ebx=tmp10
|
||||||
|
pop ecx ; ecx=tmp12
|
||||||
|
|
||||||
|
lea eax,[ebx+esi] ; eax=data0(=tmp10+tmp2)
|
||||||
|
sub ebx,esi ; ebx=data3(=tmp10-tmp2)
|
||||||
|
lea edx,[ecx+edi] ; edx=data1(=tmp12+tmp0)
|
||||||
|
sub ecx,edi ; ecx=data2(=tmp12-tmp0)
|
||||||
|
|
||||||
|
mov esi, POINTER [range_limit] ; (JSAMPLE *)
|
||||||
|
|
||||||
|
descale eax,(CONST_BITS+PASS1_BITS+3+1)
|
||||||
|
descale ebx,(CONST_BITS+PASS1_BITS+3+1)
|
||||||
|
descale edx,(CONST_BITS+PASS1_BITS+3+1)
|
||||||
|
descale ecx,(CONST_BITS+PASS1_BITS+3+1)
|
||||||
|
|
||||||
|
pop edi ; outptr
|
||||||
|
|
||||||
|
and eax,RANGE_MASK
|
||||||
|
and ebx,RANGE_MASK
|
||||||
|
and edx,RANGE_MASK
|
||||||
|
and ecx,RANGE_MASK
|
||||||
|
|
||||||
|
mov al, JSAMPLE [esi+eax*SIZEOF_JSAMPLE]
|
||||||
|
mov bl, JSAMPLE [esi+ebx*SIZEOF_JSAMPLE]
|
||||||
|
mov dl, JSAMPLE [esi+edx*SIZEOF_JSAMPLE]
|
||||||
|
mov cl, JSAMPLE [esi+ecx*SIZEOF_JSAMPLE]
|
||||||
|
|
||||||
|
mov JSAMPLE [edi+0*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+3*SIZEOF_JSAMPLE], bl
|
||||||
|
mov JSAMPLE [edi+1*SIZEOF_JSAMPLE], dl
|
||||||
|
mov JSAMPLE [edi+2*SIZEOF_JSAMPLE], cl
|
||||||
|
|
||||||
|
pop ecx ; ctr
|
||||||
|
pop esi ; wsptr
|
||||||
|
|
||||||
|
.nextrow:
|
||||||
|
pop edi
|
||||||
|
add esi, byte DCTSIZE*SIZEOF_INT ; advance pointer to next row
|
||||||
|
add edi, byte SIZEOF_JSAMPROW
|
||||||
|
dec ecx
|
||||||
|
jnz near .rowloop
|
||||||
|
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
mov esp,ebp
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
;
|
||||||
|
; Perform dequantization and inverse DCT on one block of coefficients,
|
||||||
|
; producing a reduced-size 2x2 output block.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_idct_2x2 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
|
; JCOEFPTR coef_block,
|
||||||
|
; JSAMPARRAY output_buf, JDIMENSION output_col)
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define coef_block(b) (b)+16 ; JCOEFPTR coef_block
|
||||||
|
%define output_buf(b) (b)+20 ; JSAMPARRAY output_buf
|
||||||
|
%define output_col(b) (b)+24 ; JDIMENSION output_col
|
||||||
|
|
||||||
|
%define range_limit ebp-SIZEOF_POINTER ; JSAMPLE * range_limit
|
||||||
|
%define workspace range_limit-(DCTSIZE*2)*SIZEOF_INT
|
||||||
|
; int workspace[DCTSIZE*2]
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_idct_2x2)
|
||||||
|
|
||||||
|
EXTN(jpeg_idct_2x2):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
lea esp, [workspace]
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
; ---- Pass 1: process columns from input, store into work array.
|
||||||
|
|
||||||
|
mov edx, POINTER [compptr(ebp)]
|
||||||
|
mov edx, POINTER [jcompinfo_dct_table(edx)] ; quantptr
|
||||||
|
mov esi, JCOEFPTR [coef_block(ebp)] ; inptr
|
||||||
|
lea edi, [workspace] ; int * wsptr
|
||||||
|
mov ecx, DCTSIZE ; ctr
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
; Don't bother to process columns 2,4,6
|
||||||
|
test ecx, 0x09 ; zero iff ctr is 2, 4 or 6 (bits 0 and 3 both clear)
|
||||||
|
jz near .nextcolumn
|
||||||
|
|
||||||
|
mov ax, JCOEF [COL(1,esi,SIZEOF_JCOEF)]
|
||||||
|
or ax, JCOEF [COL(3,esi,SIZEOF_JCOEF)]
|
||||||
|
jnz short .columnDCT
|
||||||
|
|
||||||
|
mov ax, JCOEF [COL(5,esi,SIZEOF_JCOEF)]
|
||||||
|
or ax, JCOEF [COL(7,esi,SIZEOF_JCOEF)]
|
||||||
|
jnz short .columnDCT
|
||||||
|
|
||||||
|
; -- AC terms all zero; we need not examine terms 2,4,6 for 2x2 output
|
||||||
|
|
||||||
|
mov ax, JCOEF [COL(0,esi,SIZEOF_JCOEF)]
|
||||||
|
imul ax, ISLOW_MULT_TYPE [COL(0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
cwde
|
||||||
|
|
||||||
|
sal eax, PASS1_BITS
|
||||||
|
|
||||||
|
mov INT [COL(0,edi,SIZEOF_INT)], eax
|
||||||
|
mov INT [COL(1,edi,SIZEOF_INT)], eax
|
||||||
|
jmp short .nextcolumn
|
||||||
|
alignx 16,7
|
||||||
|
|
||||||
|
.columnDCT:
|
||||||
|
push ecx ; ctr
|
||||||
|
push edi ; wsptr
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movsx eax, JCOEF [COL(1,esi,SIZEOF_JCOEF)]
|
||||||
|
movsx ebx, JCOEF [COL(3,esi,SIZEOF_JCOEF)]
|
||||||
|
imul ax, ISLOW_MULT_TYPE [COL(1,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
imul bx, ISLOW_MULT_TYPE [COL(3,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
movsx ecx, JCOEF [COL(5,esi,SIZEOF_JCOEF)]
|
||||||
|
movsx edi, JCOEF [COL(7,esi,SIZEOF_JCOEF)]
|
||||||
|
imul cx, ISLOW_MULT_TYPE [COL(5,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
imul di, ISLOW_MULT_TYPE [COL(7,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
|
||||||
|
imul eax,(F_3_624) ; eax=MULTIPLY(data1,FIX_3_624509785)
|
||||||
|
imul ebx,(-F_1_272) ; ebx=MULTIPLY(data3,-FIX_1_272758580)
|
||||||
|
imul ecx,(F_0_850) ; ecx=MULTIPLY(data5,FIX_0_850430095)
|
||||||
|
imul edi,(-F_0_720) ; edi=MULTIPLY(data7,-FIX_0_720959822)
|
||||||
|
|
||||||
|
add eax,ebx
|
||||||
|
add ecx,edi
|
||||||
|
add ecx,eax ; ecx=tmp0
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
mov ax, JCOEF [COL(0,esi,SIZEOF_JCOEF)]
|
||||||
|
imul ax, ISLOW_MULT_TYPE [COL(0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
cwde
|
||||||
|
|
||||||
|
sal eax,(CONST_BITS+2) ; eax=tmp10
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
pop edi ; wsptr
|
||||||
|
|
||||||
|
lea ebx,[eax+ecx] ; ebx=data0(=tmp10+tmp0)
|
||||||
|
sub eax,ecx ; eax=data1(=tmp10-tmp0)
|
||||||
|
|
||||||
|
pop ecx ; ctr
|
||||||
|
|
||||||
|
descale ebx,(CONST_BITS-PASS1_BITS+2)
|
||||||
|
descale eax,(CONST_BITS-PASS1_BITS+2)
|
||||||
|
|
||||||
|
mov INT [COL(0,edi,SIZEOF_INT)], ebx
|
||||||
|
mov INT [COL(1,edi,SIZEOF_INT)], eax
|
||||||
|
|
||||||
|
.nextcolumn:
|
||||||
|
add esi, byte SIZEOF_JCOEF ; advance pointers to next column
|
||||||
|
add edx, byte SIZEOF_ISLOW_MULT_TYPE
|
||||||
|
add edi, byte SIZEOF_INT
|
||||||
|
dec ecx
|
||||||
|
jnz near .columnloop
|
||||||
|
|
||||||
|
; ---- Pass 2: process 2 rows from work array, store into output array.
|
||||||
|
|
||||||
|
mov eax, POINTER [cinfo(ebp)]
|
||||||
|
mov eax, POINTER [jdstruct_sample_range_limit(eax)]
|
||||||
|
sub eax, byte -CENTERJSAMPLE*SIZEOF_JSAMPLE ; JSAMPLE * range_limit
|
||||||
|
mov POINTER [range_limit], eax
|
||||||
|
|
||||||
|
lea esi, [workspace] ; int * wsptr
|
||||||
|
mov edi, JSAMPARRAY [output_buf(ebp)] ; (JSAMPROW *)
|
||||||
|
mov ecx, DCTSIZE/4 ; ctr
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
push edi
|
||||||
|
mov edi, JSAMPROW [edi] ; (JSAMPLE *)
|
||||||
|
add edi, JDIMENSION [output_col(ebp)] ; edi=outptr
|
||||||
|
|
||||||
|
%ifndef NO_ZERO_ROW_TEST
|
||||||
|
mov eax, INT [ROW(1,esi,SIZEOF_INT)]
|
||||||
|
or eax, INT [ROW(3,esi,SIZEOF_INT)]
|
||||||
|
jnz short .rowDCT
|
||||||
|
|
||||||
|
mov eax, INT [ROW(5,esi,SIZEOF_INT)]
|
||||||
|
or eax, INT [ROW(7,esi,SIZEOF_INT)]
|
||||||
|
jnz short .rowDCT
|
||||||
|
|
||||||
|
; -- AC terms all zero
|
||||||
|
|
||||||
|
mov eax, INT [ROW(0,esi,SIZEOF_INT)]
|
||||||
|
|
||||||
|
mov edx, POINTER [range_limit] ; (JSAMPLE *)
|
||||||
|
|
||||||
|
descale eax,(PASS1_BITS+3)
|
||||||
|
and eax,RANGE_MASK
|
||||||
|
mov al, JSAMPLE [edx+eax*SIZEOF_JSAMPLE]
|
||||||
|
mov JSAMPLE [edi+0*SIZEOF_JSAMPLE], al
|
||||||
|
mov JSAMPLE [edi+1*SIZEOF_JSAMPLE], al
|
||||||
|
jmp short .nextrow
|
||||||
|
alignx 16,7
|
||||||
|
%endif
|
||||||
|
.rowDCT:
|
||||||
|
push ecx ; ctr
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
mov eax, INT [ROW(1,esi,SIZEOF_INT)]
|
||||||
|
mov ebx, INT [ROW(3,esi,SIZEOF_INT)]
|
||||||
|
mov ecx, INT [ROW(5,esi,SIZEOF_INT)]
|
||||||
|
mov edx, INT [ROW(7,esi,SIZEOF_INT)]
|
||||||
|
|
||||||
|
imul eax,(F_3_624) ; eax=MULTIPLY(data1,FIX_3_624509785)
|
||||||
|
imul ebx,(-F_1_272) ; ebx=MULTIPLY(data3,-FIX_1_272758580)
|
||||||
|
imul ecx,(F_0_850) ; ecx=MULTIPLY(data5,FIX_0_850430095)
|
||||||
|
imul edx,(-F_0_720) ; edx=MULTIPLY(data7,-FIX_0_720959822)
|
||||||
|
|
||||||
|
add eax,ebx
|
||||||
|
add ecx,edx
|
||||||
|
add ecx,eax ; ecx=tmp0
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
mov eax, INT [ROW(0,esi,SIZEOF_INT)]
|
||||||
|
|
||||||
|
sal eax,(CONST_BITS+2) ; eax=tmp10
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
mov edx, POINTER [range_limit] ; (JSAMPLE *)
|
||||||
|
|
||||||
|
lea ebx,[eax+ecx] ; ebx=data0(=tmp10+tmp0)
|
||||||
|
sub eax,ecx ; eax=data1(=tmp10-tmp0)
|
||||||
|
|
||||||
|
pop ecx ; ctr
|
||||||
|
|
||||||
|
descale ebx,(CONST_BITS+PASS1_BITS+3+2)
|
||||||
|
descale eax,(CONST_BITS+PASS1_BITS+3+2)
|
||||||
|
|
||||||
|
and ebx,RANGE_MASK
|
||||||
|
and eax,RANGE_MASK
|
||||||
|
mov bl, JSAMPLE [edx+ebx*SIZEOF_JSAMPLE]
|
||||||
|
mov al, JSAMPLE [edx+eax*SIZEOF_JSAMPLE]
|
||||||
|
mov JSAMPLE [edi+0*SIZEOF_JSAMPLE], bl
|
||||||
|
mov JSAMPLE [edi+1*SIZEOF_JSAMPLE], al
|
||||||
|
|
||||||
|
.nextrow:
|
||||||
|
pop edi
|
||||||
|
add esi, byte DCTSIZE*SIZEOF_INT ; advance pointer to next row
|
||||||
|
add edi, byte SIZEOF_JSAMPROW
|
||||||
|
dec ecx
|
||||||
|
jnz near .rowloop
|
||||||
|
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
mov esp,ebp
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
;
|
||||||
|
; Perform dequantization and inverse DCT on one block of coefficients,
|
||||||
|
; producing a reduced-size 1x1 output block.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_idct_1x1 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
|
; JCOEFPTR coef_block,
|
||||||
|
; JSAMPARRAY output_buf, JDIMENSION output_col)
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define coef_block(b) (b)+16 ; JCOEFPTR coef_block
|
||||||
|
%define output_buf(b) (b)+20 ; JSAMPARRAY output_buf
|
||||||
|
%define output_col(b) (b)+24 ; JDIMENSION output_col
|
||||||
|
|
||||||
|
%define ebp esp-4 ; use esp instead of ebp
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_idct_1x1)
|
||||||
|
|
||||||
|
EXTN(jpeg_idct_1x1):
|
||||||
|
; push ebp
|
||||||
|
; mov ebp,esp
|
||||||
|
; push ebx ; unused
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
; push esi ; unused
|
||||||
|
; push edi ; unused
|
||||||
|
|
||||||
|
; We hardly need an inverse DCT routine for this: just take the
|
||||||
|
; average pixel value, which is one-eighth of the DC coefficient.
|
||||||
|
|
||||||
|
mov edx, POINTER [compptr(ebp)]
|
||||||
|
mov ecx, JCOEFPTR [coef_block(ebp)] ; inptr
|
||||||
|
mov edx, POINTER [jcompinfo_dct_table(edx)] ; quantptr
|
||||||
|
|
||||||
|
mov ax, JCOEF [COL(0,ecx,SIZEOF_JCOEF)]
|
||||||
|
imul ax, ISLOW_MULT_TYPE [COL(0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
|
||||||
|
mov ecx, JSAMPARRAY [output_buf(ebp)] ; (JSAMPROW *)
|
||||||
|
mov edx, JDIMENSION [output_col(ebp)]
|
||||||
|
mov ecx, JSAMPROW [ecx] ; (JSAMPLE *)
|
||||||
|
|
||||||
|
add ax, (1 << (3-1)) + (CENTERJSAMPLE << 3)
|
||||||
|
sar ax,3 ; descale
|
||||||
|
|
||||||
|
test ah,ah ; unsigned saturation
|
||||||
|
jz short .output
|
||||||
|
not ax ; invert so the sign bit is set iff the value was > MAXJSAMPLE
|
||||||
|
sar ax,15 ; al = 0xFF if value was too large, 0x00 if it was negative
|
||||||
|
alignx 16,3
|
||||||
|
.output:
|
||||||
|
mov JSAMPLE [ecx+edx*SIZEOF_JSAMPLE], al
|
||||||
|
|
||||||
|
; pop edi ; unused
|
||||||
|
; pop esi ; unused
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
; pop ebx ; unused
|
||||||
|
; pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; IDCT_SCALING_SUPPORTED
|
||||||
510
jimmxfst.asm
Normal file
@@ -0,0 +1,510 @@
|
|||||||
|
;
|
||||||
|
; jimmxfst.asm - fast integer IDCT (MMX)
|
||||||
|
;
|
||||||
|
; x86 SIMD extension for IJG JPEG library
|
||||||
|
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
; For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
;
|
||||||
|
; This file should be assembled with NASM (Netwide Assembler),
|
||||||
|
; and can *not* be assembled with Microsoft's MASM or any compatible
|
||||||
|
; assembler (including Borland's Turbo Assembler).
|
||||||
|
; NASM is available from http://nasm.sourceforge.net/ or
|
||||||
|
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||||
|
;
|
||||||
|
; This file contains a fast, not so accurate integer implementation of
|
||||||
|
; the inverse DCT (Discrete Cosine Transform). The following code is
|
||||||
|
; based directly on the IJG's original jidctfst.c; see jidctfst.c
|
||||||
|
; for more details.
|
||||||
|
;
|
||||||
|
; Last Modified : February 4, 2006
|
||||||
|
;
|
||||||
|
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jdct.inc"
|
||||||
|
|
||||||
|
%ifdef DCT_IFAST_SUPPORTED
|
||||||
|
%ifdef JIDCT_INT_MMX_SUPPORTED
|
||||||
|
|
||||||
|
; This module is specialized to the case DCTSIZE = 8.
|
||||||
|
;
|
||||||
|
%if DCTSIZE != 8
|
||||||
|
%error "Sorry, this code only copes with 8x8 DCTs."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
|
||||||
|
%define CONST_BITS 8 ; 14 is also OK.
|
||||||
|
%define PASS1_BITS 2
|
||||||
|
|
||||||
|
%if IFAST_SCALE_BITS != PASS1_BITS
|
||||||
|
%error "'IFAST_SCALE_BITS' must be equal to 'PASS1_BITS'."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
%if CONST_BITS == 8
|
||||||
|
F_1_082 equ 277 ; FIX(1.082392200)
|
||||||
|
F_1_414 equ 362 ; FIX(1.414213562)
|
||||||
|
F_1_847 equ 473 ; FIX(1.847759065)
|
||||||
|
F_2_613 equ 669 ; FIX(2.613125930)
|
||||||
|
F_1_613 equ (F_2_613 - 256) ; FIX(2.613125930) - FIX(1)
|
||||||
|
%else
|
||||||
|
; NASM cannot do compile-time arithmetic on floating-point constants.
|
||||||
|
%define DESCALE(x,n) (((x)+(1<<((n)-1)))>>(n))
|
||||||
|
F_1_082 equ DESCALE(1162209775,30-CONST_BITS) ; FIX(1.082392200)
|
||||||
|
F_1_414 equ DESCALE(1518500249,30-CONST_BITS) ; FIX(1.414213562)
|
||||||
|
F_1_847 equ DESCALE(1984016188,30-CONST_BITS) ; FIX(1.847759065)
|
||||||
|
F_2_613 equ DESCALE(2805822602,30-CONST_BITS) ; FIX(2.613125930)
|
||||||
|
F_1_613 equ (F_2_613 - (1 << CONST_BITS)) ; FIX(2.613125930) - FIX(1)
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_CONST
|
||||||
|
|
||||||
|
; PRE_MULTIPLY_SCALE_BITS <= 2 (to avoid overflow)
|
||||||
|
; CONST_BITS + CONST_SHIFT + PRE_MULTIPLY_SCALE_BITS == 16 (for pmulhw)
|
||||||
|
|
||||||
|
%define PRE_MULTIPLY_SCALE_BITS 2
|
||||||
|
%define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
global EXTN(jconst_idct_ifast_mmx)
|
||||||
|
|
||||||
|
EXTN(jconst_idct_ifast_mmx):
|
||||||
|
|
||||||
|
PW_F1414 times 4 dw F_1_414 << CONST_SHIFT
|
||||||
|
PW_F1847 times 4 dw F_1_847 << CONST_SHIFT
|
||||||
|
PW_MF1613 times 4 dw -F_1_613 << CONST_SHIFT
|
||||||
|
PW_F1082 times 4 dw F_1_082 << CONST_SHIFT
|
||||||
|
PB_CENTERJSAMP times 8 db CENTERJSAMPLE
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Perform dequantization and inverse DCT on one block of coefficients.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_idct_ifast_mmx (j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
|
; JCOEFPTR coef_block,
|
||||||
|
; JSAMPARRAY output_buf, JDIMENSION output_col)
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define coef_block(b) (b)+16 ; JCOEFPTR coef_block
|
||||||
|
%define output_buf(b) (b)+20 ; JSAMPARRAY output_buf
|
||||||
|
%define output_col(b) (b)+24 ; JDIMENSION output_col
|
||||||
|
|
||||||
|
%define original_ebp ebp+0
|
||||||
|
%define wk(i) ebp-(WK_NUM-(i))*SIZEOF_MMWORD ; mmword wk[WK_NUM]
|
||||||
|
%define WK_NUM 2
|
||||||
|
%define workspace wk(0)-DCTSIZE2*SIZEOF_JCOEF
|
||||||
|
; JCOEF workspace[DCTSIZE2]
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_idct_ifast_mmx)
|
||||||
|
|
||||||
|
EXTN(jpeg_idct_ifast_mmx):
|
||||||
|
push ebp
|
||||||
|
mov eax,esp ; eax = original ebp
|
||||||
|
sub esp, byte 4
|
||||||
|
and esp, byte (-SIZEOF_MMWORD) ; align to 64 bits
|
||||||
|
mov [esp],eax
|
||||||
|
mov ebp,esp ; ebp = aligned ebp
|
||||||
|
lea esp, [workspace]
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
|
||||||
|
; ---- Pass 1: process columns from input, store into work array.
|
||||||
|
|
||||||
|
; mov eax, [original_ebp]
|
||||||
|
mov edx, POINTER [compptr(eax)]
|
||||||
|
mov edx, POINTER [jcompinfo_dct_table(edx)] ; quantptr
|
||||||
|
mov esi, JCOEFPTR [coef_block(eax)] ; inptr
|
||||||
|
lea edi, [workspace] ; JCOEF * wsptr
|
||||||
|
mov ecx, DCTSIZE/4 ; ctr
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
%ifndef NO_ZERO_COLUMN_TEST_IFAST_MMX
|
||||||
|
mov eax, DWORD [DWBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
or eax, DWORD [DWBLOCK(2,0,esi,SIZEOF_JCOEF)]
|
||||||
|
jnz short .columnDCT
|
||||||
|
|
||||||
|
movq mm0, MMWORD [MMBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(2,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por mm0, MMWORD [MMBLOCK(3,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por mm1, MMWORD [MMBLOCK(4,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por mm0, MMWORD [MMBLOCK(5,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por mm1, MMWORD [MMBLOCK(6,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por mm0, MMWORD [MMBLOCK(7,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por mm1,mm0
|
||||||
|
packsswb mm1,mm1 ; saturating pack: any nonzero word stays a nonzero byte
|
||||||
|
movd eax,mm1
|
||||||
|
test eax,eax
|
||||||
|
jnz short .columnDCT
|
||||||
|
|
||||||
|
; -- AC terms all zero
|
||||||
|
|
||||||
|
movq mm0, MMWORD [MMBLOCK(0,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw mm0, MMWORD [MMBLOCK(0,0,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
|
||||||
|
movq mm2,mm0 ; mm0=in0=(00 01 02 03)
|
||||||
|
punpcklwd mm0,mm0 ; mm0=(00 00 01 01)
|
||||||
|
punpckhwd mm2,mm2 ; mm2=(02 02 03 03)
|
||||||
|
|
||||||
|
movq mm1,mm0
|
||||||
|
punpckldq mm0,mm0 ; mm0=(00 00 00 00)
|
||||||
|
punpckhdq mm1,mm1 ; mm1=(01 01 01 01)
|
||||||
|
movq mm3,mm2
|
||||||
|
punpckldq mm2,mm2 ; mm2=(02 02 02 02)
|
||||||
|
punpckhdq mm3,mm3 ; mm3=(03 03 03 03)
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(0,0,edi,SIZEOF_JCOEF)], mm0
|
||||||
|
movq MMWORD [MMBLOCK(0,1,edi,SIZEOF_JCOEF)], mm0
|
||||||
|
movq MMWORD [MMBLOCK(1,0,edi,SIZEOF_JCOEF)], mm1
|
||||||
|
movq MMWORD [MMBLOCK(1,1,edi,SIZEOF_JCOEF)], mm1
|
||||||
|
movq MMWORD [MMBLOCK(2,0,edi,SIZEOF_JCOEF)], mm2
|
||||||
|
movq MMWORD [MMBLOCK(2,1,edi,SIZEOF_JCOEF)], mm2
|
||||||
|
movq MMWORD [MMBLOCK(3,0,edi,SIZEOF_JCOEF)], mm3
|
||||||
|
movq MMWORD [MMBLOCK(3,1,edi,SIZEOF_JCOEF)], mm3
|
||||||
|
jmp near .nextcolumn
|
||||||
|
alignx 16,7
|
||||||
|
%endif
|
||||||
|
.columnDCT:
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movq mm0, MMWORD [MMBLOCK(0,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(2,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw mm0, MMWORD [MMBLOCK(0,0,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
pmullw mm1, MMWORD [MMBLOCK(2,0,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
movq mm2, MMWORD [MMBLOCK(4,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(6,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw mm2, MMWORD [MMBLOCK(4,0,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
pmullw mm3, MMWORD [MMBLOCK(6,0,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
|
||||||
|
movq mm4,mm0
|
||||||
|
movq mm5,mm1
|
||||||
|
psubw mm0,mm2 ; mm0=tmp11
|
||||||
|
psubw mm1,mm3
|
||||||
|
paddw mm4,mm2 ; mm4=tmp10
|
||||||
|
paddw mm5,mm3 ; mm5=tmp13
|
||||||
|
|
||||||
|
psllw mm1,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
pmulhw mm1,[GOTOFF(ebx,PW_F1414)]
|
||||||
|
psubw mm1,mm5 ; mm1=tmp12
|
||||||
|
|
||||||
|
movq mm6,mm4
|
||||||
|
movq mm7,mm0
|
||||||
|
psubw mm4,mm5 ; mm4=tmp3
|
||||||
|
psubw mm0,mm1 ; mm0=tmp2
|
||||||
|
paddw mm6,mm5 ; mm6=tmp0
|
||||||
|
paddw mm7,mm1 ; mm7=tmp1
|
||||||
|
|
||||||
|
movq MMWORD [wk(1)], mm4 ; wk(1)=tmp3
|
||||||
|
movq MMWORD [wk(0)], mm0 ; wk(0)=tmp2
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movq mm2, MMWORD [MMBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(3,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw mm2, MMWORD [MMBLOCK(1,0,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
pmullw mm3, MMWORD [MMBLOCK(3,0,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
movq mm5, MMWORD [MMBLOCK(5,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(7,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw mm5, MMWORD [MMBLOCK(5,0,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
pmullw mm1, MMWORD [MMBLOCK(7,0,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
|
||||||
|
movq mm4,mm2
|
||||||
|
movq mm0,mm5
|
||||||
|
psubw mm2,mm1 ; mm2=z12
|
||||||
|
psubw mm5,mm3 ; mm5=z10
|
||||||
|
paddw mm4,mm1 ; mm4=z11
|
||||||
|
paddw mm0,mm3 ; mm0=z13
|
||||||
|
|
||||||
|
movq mm1,mm5 ; mm1=z10(unscaled)
|
||||||
|
psllw mm2,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
psllw mm5,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
|
||||||
|
movq mm3,mm4
|
||||||
|
psubw mm4,mm0
|
||||||
|
paddw mm3,mm0 ; mm3=tmp7
|
||||||
|
|
||||||
|
psllw mm4,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
pmulhw mm4,[GOTOFF(ebx,PW_F1414)] ; mm4=tmp11
|
||||||
|
|
||||||
|
; To avoid overflow...
|
||||||
|
;
|
||||||
|
; (Original)
|
||||||
|
; tmp12 = -2.613125930 * z10 + z5;
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; tmp12 = (-1.613125930 - 1) * z10 + z5;
|
||||||
|
; = -1.613125930 * z10 - z10 + z5;
|
||||||
|
|
||||||
|
movq mm0,mm5
|
||||||
|
paddw mm5,mm2
|
||||||
|
pmulhw mm5,[GOTOFF(ebx,PW_F1847)] ; mm5=z5
|
||||||
|
pmulhw mm0,[GOTOFF(ebx,PW_MF1613)]
|
||||||
|
pmulhw mm2,[GOTOFF(ebx,PW_F1082)]
|
||||||
|
psubw mm0,mm1
|
||||||
|
psubw mm2,mm5 ; mm2=tmp10
|
||||||
|
paddw mm0,mm5 ; mm0=tmp12
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
psubw mm0,mm3 ; mm0=tmp6
|
||||||
|
movq mm1,mm6
|
||||||
|
movq mm5,mm7
|
||||||
|
paddw mm6,mm3 ; mm6=data0=(00 01 02 03)
|
||||||
|
paddw mm7,mm0 ; mm7=data1=(10 11 12 13)
|
||||||
|
psubw mm1,mm3 ; mm1=data7=(70 71 72 73)
|
||||||
|
psubw mm5,mm0 ; mm5=data6=(60 61 62 63)
|
||||||
|
psubw mm4,mm0 ; mm4=tmp5
|
||||||
|
|
||||||
|
movq mm3,mm6 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd mm6,mm7 ; mm6=(00 10 01 11)
|
||||||
|
punpckhwd mm3,mm7 ; mm3=(02 12 03 13)
|
||||||
|
movq mm0,mm5 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd mm5,mm1 ; mm5=(60 70 61 71)
|
||||||
|
punpckhwd mm0,mm1 ; mm0=(62 72 63 73)
|
||||||
|
|
||||||
|
movq mm7, MMWORD [wk(0)] ; mm7=tmp2
|
||||||
|
movq mm1, MMWORD [wk(1)] ; mm1=tmp3
|
||||||
|
|
||||||
|
movq MMWORD [wk(0)], mm5 ; wk(0)=(60 70 61 71)
|
||||||
|
movq MMWORD [wk(1)], mm0 ; wk(1)=(62 72 63 73)
|
||||||
|
|
||||||
|
paddw mm2,mm4 ; mm2=tmp4
|
||||||
|
movq mm5,mm7
|
||||||
|
movq mm0,mm1
|
||||||
|
paddw mm7,mm4 ; mm7=data2=(20 21 22 23)
|
||||||
|
paddw mm1,mm2 ; mm1=data4=(40 41 42 43)
|
||||||
|
psubw mm5,mm4 ; mm5=data5=(50 51 52 53)
|
||||||
|
psubw mm0,mm2 ; mm0=data3=(30 31 32 33)
|
||||||
|
|
||||||
|
movq mm4,mm7 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd mm7,mm0 ; mm7=(20 30 21 31)
|
||||||
|
punpckhwd mm4,mm0 ; mm4=(22 32 23 33)
|
||||||
|
movq mm2,mm1 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd mm1,mm5 ; mm1=(40 50 41 51)
|
||||||
|
punpckhwd mm2,mm5 ; mm2=(42 52 43 53)
|
||||||
|
|
||||||
|
movq mm0,mm6 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq mm6,mm7 ; mm6=(00 10 20 30)
|
||||||
|
punpckhdq mm0,mm7 ; mm0=(01 11 21 31)
|
||||||
|
movq mm5,mm3 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq mm3,mm4 ; mm3=(02 12 22 32)
|
||||||
|
punpckhdq mm5,mm4 ; mm5=(03 13 23 33)
|
||||||
|
|
||||||
|
movq mm7, MMWORD [wk(0)] ; mm7=(60 70 61 71)
|
||||||
|
movq mm4, MMWORD [wk(1)] ; mm4=(62 72 63 73)
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(0,0,edi,SIZEOF_JCOEF)], mm6
|
||||||
|
movq MMWORD [MMBLOCK(1,0,edi,SIZEOF_JCOEF)], mm0
|
||||||
|
movq MMWORD [MMBLOCK(2,0,edi,SIZEOF_JCOEF)], mm3
|
||||||
|
movq MMWORD [MMBLOCK(3,0,edi,SIZEOF_JCOEF)], mm5
|
||||||
|
|
||||||
|
movq mm6,mm1 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq mm1,mm7 ; mm1=(40 50 60 70)
|
||||||
|
punpckhdq mm6,mm7 ; mm6=(41 51 61 71)
|
||||||
|
movq mm0,mm2 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq mm2,mm4 ; mm2=(42 52 62 72)
|
||||||
|
punpckhdq mm0,mm4 ; mm0=(43 53 63 73)
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(0,1,edi,SIZEOF_JCOEF)], mm1
|
||||||
|
movq MMWORD [MMBLOCK(1,1,edi,SIZEOF_JCOEF)], mm6
|
||||||
|
movq MMWORD [MMBLOCK(2,1,edi,SIZEOF_JCOEF)], mm2
|
||||||
|
movq MMWORD [MMBLOCK(3,1,edi,SIZEOF_JCOEF)], mm0
|
||||||
|
|
||||||
|
.nextcolumn:
|
||||||
|
add esi, byte 4*SIZEOF_JCOEF ; coef_block
|
||||||
|
add edx, byte 4*SIZEOF_IFAST_MULT_TYPE ; quantptr
|
||||||
|
add edi, byte 4*DCTSIZE*SIZEOF_JCOEF ; wsptr
|
||||||
|
dec ecx ; ctr
|
||||||
|
jnz near .columnloop
|
||||||
|
|
||||||
|
; ---- Pass 2: process rows from work array, store into output array.
|
||||||
|
|
||||||
|
mov eax, [original_ebp]
|
||||||
|
lea esi, [workspace] ; JCOEF * wsptr
|
||||||
|
mov edi, JSAMPARRAY [output_buf(eax)] ; (JSAMPROW *)
|
||||||
|
mov eax, JDIMENSION [output_col(eax)]
|
||||||
|
mov ecx, DCTSIZE/4 ; ctr
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movq mm0, MMWORD [MMBLOCK(0,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(2,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm2, MMWORD [MMBLOCK(4,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(6,0,esi,SIZEOF_JCOEF)]
|
||||||
|
|
||||||
|
movq mm4,mm0
|
||||||
|
movq mm5,mm1
|
||||||
|
psubw mm0,mm2 ; mm0=tmp11
|
||||||
|
psubw mm1,mm3
|
||||||
|
paddw mm4,mm2 ; mm4=tmp10
|
||||||
|
paddw mm5,mm3 ; mm5=tmp13
|
||||||
|
|
||||||
|
psllw mm1,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
pmulhw mm1,[GOTOFF(ebx,PW_F1414)]
|
||||||
|
psubw mm1,mm5 ; mm1=tmp12
|
||||||
|
|
||||||
|
movq mm6,mm4
|
||||||
|
movq mm7,mm0
|
||||||
|
psubw mm4,mm5 ; mm4=tmp3
|
||||||
|
psubw mm0,mm1 ; mm0=tmp2
|
||||||
|
paddw mm6,mm5 ; mm6=tmp0
|
||||||
|
paddw mm7,mm1 ; mm7=tmp1
|
||||||
|
|
||||||
|
movq MMWORD [wk(1)], mm4 ; wk(1)=tmp3
|
||||||
|
movq MMWORD [wk(0)], mm0 ; wk(0)=tmp2
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movq mm2, MMWORD [MMBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(3,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm5, MMWORD [MMBLOCK(5,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(7,0,esi,SIZEOF_JCOEF)]
|
||||||
|
|
||||||
|
movq mm4,mm2
|
||||||
|
movq mm0,mm5
|
||||||
|
psubw mm2,mm1 ; mm2=z12
|
||||||
|
psubw mm5,mm3 ; mm5=z10
|
||||||
|
paddw mm4,mm1 ; mm4=z11
|
||||||
|
paddw mm0,mm3 ; mm0=z13
|
||||||
|
|
||||||
|
movq mm1,mm5 ; mm1=z10(unscaled)
|
||||||
|
psllw mm2,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
psllw mm5,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
|
||||||
|
movq mm3,mm4
|
||||||
|
psubw mm4,mm0
|
||||||
|
paddw mm3,mm0 ; mm3=tmp7
|
||||||
|
|
||||||
|
psllw mm4,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
pmulhw mm4,[GOTOFF(ebx,PW_F1414)] ; mm4=tmp11
|
||||||
|
|
||||||
|
; To avoid overflow...
|
||||||
|
;
|
||||||
|
; (Original)
|
||||||
|
; tmp12 = -2.613125930 * z10 + z5;
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; tmp12 = (-1.613125930 - 1) * z10 + z5;
|
||||||
|
; = -1.613125930 * z10 - z10 + z5;
|
||||||
|
|
||||||
|
movq mm0,mm5
|
||||||
|
paddw mm5,mm2
|
||||||
|
pmulhw mm5,[GOTOFF(ebx,PW_F1847)] ; mm5=z5
|
||||||
|
pmulhw mm0,[GOTOFF(ebx,PW_MF1613)]
|
||||||
|
pmulhw mm2,[GOTOFF(ebx,PW_F1082)]
|
||||||
|
psubw mm0,mm1
|
||||||
|
psubw mm2,mm5 ; mm2=tmp10
|
||||||
|
paddw mm0,mm5 ; mm0=tmp12
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
psubw mm0,mm3 ; mm0=tmp6
|
||||||
|
movq mm1,mm6
|
||||||
|
movq mm5,mm7
|
||||||
|
paddw mm6,mm3 ; mm6=data0=(00 10 20 30)
|
||||||
|
paddw mm7,mm0 ; mm7=data1=(01 11 21 31)
|
||||||
|
psraw mm6,(PASS1_BITS+3) ; descale
|
||||||
|
psraw mm7,(PASS1_BITS+3) ; descale
|
||||||
|
psubw mm1,mm3 ; mm1=data7=(07 17 27 37)
|
||||||
|
psubw mm5,mm0 ; mm5=data6=(06 16 26 36)
|
||||||
|
psraw mm1,(PASS1_BITS+3) ; descale
|
||||||
|
psraw mm5,(PASS1_BITS+3) ; descale
|
||||||
|
psubw mm4,mm0 ; mm4=tmp5
|
||||||
|
|
||||||
|
packsswb mm6,mm5 ; mm6=(00 10 20 30 06 16 26 36)
|
||||||
|
packsswb mm7,mm1 ; mm7=(01 11 21 31 07 17 27 37)
|
||||||
|
|
||||||
|
movq mm3, MMWORD [wk(0)] ; mm3=tmp2
|
||||||
|
movq mm0, MMWORD [wk(1)] ; mm0=tmp3
|
||||||
|
|
||||||
|
paddw mm2,mm4 ; mm2=tmp4
|
||||||
|
movq mm5,mm3
|
||||||
|
movq mm1,mm0
|
||||||
|
paddw mm3,mm4 ; mm3=data2=(02 12 22 32)
|
||||||
|
paddw mm0,mm2 ; mm0=data4=(04 14 24 34)
|
||||||
|
psraw mm3,(PASS1_BITS+3) ; descale
|
||||||
|
psraw mm0,(PASS1_BITS+3) ; descale
|
||||||
|
psubw mm5,mm4 ; mm5=data5=(05 15 25 35)
|
||||||
|
psubw mm1,mm2 ; mm1=data3=(03 13 23 33)
|
||||||
|
psraw mm5,(PASS1_BITS+3) ; descale
|
||||||
|
psraw mm1,(PASS1_BITS+3) ; descale
|
||||||
|
|
||||||
|
movq mm4,[GOTOFF(ebx,PB_CENTERJSAMP)] ; mm4=[PB_CENTERJSAMP]
|
||||||
|
|
||||||
|
packsswb mm3,mm0 ; mm3=(02 12 22 32 04 14 24 34)
|
||||||
|
packsswb mm1,mm5 ; mm1=(03 13 23 33 05 15 25 35)
|
||||||
|
|
||||||
|
paddb mm6,mm4
|
||||||
|
paddb mm7,mm4
|
||||||
|
paddb mm3,mm4
|
||||||
|
paddb mm1,mm4
|
||||||
|
|
||||||
|
movq mm2,mm6 ; transpose coefficients(phase 1)
|
||||||
|
punpcklbw mm6,mm7 ; mm6=(00 01 10 11 20 21 30 31)
|
||||||
|
punpckhbw mm2,mm7 ; mm2=(06 07 16 17 26 27 36 37)
|
||||||
|
movq mm0,mm3 ; transpose coefficients(phase 1)
|
||||||
|
punpcklbw mm3,mm1 ; mm3=(02 03 12 13 22 23 32 33)
|
||||||
|
punpckhbw mm0,mm1 ; mm0=(04 05 14 15 24 25 34 35)
|
||||||
|
|
||||||
|
movq mm5,mm6 ; transpose coefficients(phase 2)
|
||||||
|
punpcklwd mm6,mm3 ; mm6=(00 01 02 03 10 11 12 13)
|
||||||
|
punpckhwd mm5,mm3 ; mm5=(20 21 22 23 30 31 32 33)
|
||||||
|
movq mm4,mm0 ; transpose coefficients(phase 2)
|
||||||
|
punpcklwd mm0,mm2 ; mm0=(04 05 06 07 14 15 16 17)
|
||||||
|
punpckhwd mm4,mm2 ; mm4=(24 25 26 27 34 35 36 37)
|
||||||
|
|
||||||
|
movq mm7,mm6 ; transpose coefficients(phase 3)
|
||||||
|
punpckldq mm6,mm0 ; mm6=(00 01 02 03 04 05 06 07)
|
||||||
|
punpckhdq mm7,mm0 ; mm7=(10 11 12 13 14 15 16 17)
|
||||||
|
movq mm1,mm5 ; transpose coefficients(phase 3)
|
||||||
|
punpckldq mm5,mm4 ; mm5=(20 21 22 23 24 25 26 27)
|
||||||
|
punpckhdq mm1,mm4 ; mm1=(30 31 32 33 34 35 36 37)
|
||||||
|
|
||||||
|
pushpic ebx ; save GOT address
|
||||||
|
|
||||||
|
mov edx, JSAMPROW [edi+0*SIZEOF_JSAMPROW]
|
||||||
|
mov ebx, JSAMPROW [edi+1*SIZEOF_JSAMPROW]
|
||||||
|
movq MMWORD [edx+eax*SIZEOF_JSAMPLE], mm6
|
||||||
|
movq MMWORD [ebx+eax*SIZEOF_JSAMPLE], mm7
|
||||||
|
mov edx, JSAMPROW [edi+2*SIZEOF_JSAMPROW]
|
||||||
|
mov ebx, JSAMPROW [edi+3*SIZEOF_JSAMPROW]
|
||||||
|
movq MMWORD [edx+eax*SIZEOF_JSAMPLE], mm5
|
||||||
|
movq MMWORD [ebx+eax*SIZEOF_JSAMPLE], mm1
|
||||||
|
|
||||||
|
poppic ebx ; restore GOT address
|
||||||
|
|
||||||
|
add esi, byte 4*SIZEOF_JCOEF ; wsptr
|
||||||
|
add edi, byte 4*SIZEOF_JSAMPROW
|
||||||
|
dec ecx ; ctr
|
||||||
|
jnz near .rowloop
|
||||||
|
|
||||||
|
emms ; empty MMX state
|
||||||
|
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
mov esp,ebp ; esp <- aligned ebp
|
||||||
|
pop esp ; esp <- original ebp
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; JIDCT_INT_MMX_SUPPORTED
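The overflow note in the odd part above splits the multiplier 2.613125930 into 1.613125930 + 1, presumably so that the scaled constant fits in the signed 16-bit operand of pmulhw. The C sketch below emulates that arithmetic on a single lane. The scale factors (CONST_BITS = 8, PRE_MULTIPLY_SCALE_BITS = 2) and all helper names are assumptions made for illustration; the real definitions lie outside this excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed scale factors; the real definitions are outside this excerpt. */
enum { CONST_BITS = 8, PRE_MULTIPLY_SCALE_BITS = 2 };
enum { CONST_SHIFT = 16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS };

/* Scalar stand-in for one lane of pmulhw: the high 16 bits of a signed
 * 16x16 -> 32-bit product, i.e. floor(a * b / 65536). */
static int16_t pmulhw_lane(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;
    int32_t q = p / 65536;
    if (p % 65536 < 0)
        q--;                        /* round toward minus infinity */
    return (int16_t)q;
}

/* Round a non-negative real constant to CONST_BITS fractional bits. */
static int32_t fix(double x)
{
    assert(x >= 0.0);
    return (int32_t)(x * (1 << CONST_BITS) + 0.5);
}

/* Does a scaled constant fit in a signed 16-bit pmulhw operand? */
static int fits_in_word(int32_t scaled)
{
    return scaled >= INT16_MIN && scaled <= INT16_MAX;
}

/* -2.613125930 * z10, computed as -1.613125930 * z10 - z10. */
static int16_t mul_m2613(int16_t z10)
{
    int16_t mf1613 = (int16_t)(-(fix(1.613125930) << CONST_SHIFT));
    int16_t z10_scaled = (int16_t)(z10 * (1 << PRE_MULTIPLY_SCALE_BITS));
    return (int16_t)(pmulhw_lane(z10_scaled, mf1613) - z10);
}
```

Under these assumed scales the full constant 2.613125930 does not fit in a signed word after scaling, while 1.613125930 does; the extra "- z10" restores the missing unit multiple.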
|
||||||
|
%endif ; DCT_IFAST_SUPPORTED
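The final output stage above descales each result, packs words to bytes with signed saturation (packsswb), and then adds PB_CENTERJSAMP with a wrapping byte add (paddb). Taken together, these two steps appear to clamp each sample to the 0..255 range without a lookup table. A minimal C sketch of one lane follows, assuming CENTERJSAMPLE is 128 (8-bit samples); the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define CENTERJSAMPLE 128   /* assumed: 8-bit samples */

/* One lane of packsswb: clamp a signed word to the signed byte range. */
static int8_t packsswb_lane(int16_t v)
{
    if (v < INT8_MIN) return INT8_MIN;
    if (v > INT8_MAX) return INT8_MAX;
    return (int8_t)v;
}

/* One lane of paddb: byte addition that wraps modulo 256. */
static uint8_t paddb_lane(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Level-shifted IDCT result -> unsigned output sample. */
static uint8_t to_sample(int16_t descaled)
{
    int8_t s = packsswb_lane(descaled);
    return paddb_lane((uint8_t)s, CENTERJSAMPLE);
}
```

Saturating first to -128..127 and then adding 128 modulo 256 maps the two ends of the signed range to 0 and 255, so out-of-range values land on the nearest valid sample.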
|
||||||
862
jimmxint.asm
Normal file
@@ -0,0 +1,862 @@
;
; jimmxint.asm - accurate integer IDCT (MMX)
;
; x86 SIMD extension for IJG JPEG library
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler); it can
; *not* be assembled with Microsoft's MASM or any compatible assembler
; (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
;
; This file contains a slow-but-accurate integer implementation of the
; inverse DCT (Discrete Cosine Transform). The following code is based
; directly on the IJG's original jidctint.c; see jidctint.c for more
; details.
;
; Last Modified : February 4, 2006
;
; [TAB8]

%include "jsimdext.inc"
%include "jdct.inc"

%ifdef DCT_ISLOW_SUPPORTED
%ifdef JIDCT_INT_MMX_SUPPORTED

; This module is specialized to the case DCTSIZE = 8.
;
%if DCTSIZE != 8
%error "Sorry, this code only copes with 8x8 DCTs."
%endif

; --------------------------------------------------------------------------

%define CONST_BITS 13
%define PASS1_BITS 2

%define DESCALE_P1 (CONST_BITS-PASS1_BITS)
%define DESCALE_P2 (CONST_BITS+PASS1_BITS+3)

%if CONST_BITS == 13
F_0_298 equ  2446 ; FIX(0.298631336)
F_0_390 equ  3196 ; FIX(0.390180644)
F_0_541 equ  4433 ; FIX(0.541196100)
F_0_765 equ  6270 ; FIX(0.765366865)
F_0_899 equ  7373 ; FIX(0.899976223)
F_1_175 equ  9633 ; FIX(1.175875602)
F_1_501 equ 12299 ; FIX(1.501321110)
F_1_847 equ 15137 ; FIX(1.847759065)
F_1_961 equ 16069 ; FIX(1.961570560)
F_2_053 equ 16819 ; FIX(2.053119869)
F_2_562 equ 20995 ; FIX(2.562915447)
F_3_072 equ 25172 ; FIX(3.072711026)
%else
; NASM cannot do compile-time arithmetic on floating-point constants.
%define DESCALE(x,n) (((x)+(1<<((n)-1)))>>(n))
F_0_298 equ DESCALE( 320652955,30-CONST_BITS) ; FIX(0.298631336)
F_0_390 equ DESCALE( 418953276,30-CONST_BITS) ; FIX(0.390180644)
F_0_541 equ DESCALE( 581104887,30-CONST_BITS) ; FIX(0.541196100)
F_0_765 equ DESCALE( 821806413,30-CONST_BITS) ; FIX(0.765366865)
F_0_899 equ DESCALE( 966342111,30-CONST_BITS) ; FIX(0.899976223)
F_1_175 equ DESCALE(1262586813,30-CONST_BITS) ; FIX(1.175875602)
F_1_501 equ DESCALE(1612031267,30-CONST_BITS) ; FIX(1.501321110)
F_1_847 equ DESCALE(1984016188,30-CONST_BITS) ; FIX(1.847759065)
F_1_961 equ DESCALE(2106220350,30-CONST_BITS) ; FIX(1.961570560)
F_2_053 equ DESCALE(2204520673,30-CONST_BITS) ; FIX(2.053119869)
F_2_562 equ DESCALE(2751909506,30-CONST_BITS) ; FIX(2.562915447)
F_3_072 equ DESCALE(3299298341,30-CONST_BITS) ; FIX(3.072711026)
%endif
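The CONST_BITS == 13 table above hard-codes the values of FIX(x), that is, x scaled by 2^13 and rounded to the nearest integer, while the %else branch derives the same values from 30-bit-scaled integers with DESCALE. As a cross-check, the C sketch below reproduces a few entries both ways. The helper names (fix, descale, fix_from_q30) are illustrative and not part of the library.

```c
#include <assert.h>
#include <stdint.h>

#define CONST_BITS 13

/* FIX(x): a non-negative real constant scaled by 2^CONST_BITS,
 * rounded to the nearest integer. */
static int32_t fix(double x)
{
    assert(x >= 0.0);
    return (int32_t)(x * (1 << CONST_BITS) + 0.5);
}

/* DESCALE(x,n): divide by 2^n with rounding to nearest, mirroring the NASM
 * macro. A 64-bit type is used because some of the 30-bit-scaled constants
 * exceed INT32_MAX. Only non-negative x is handled here. */
static int64_t descale(int64_t x, int n)
{
    assert(x >= 0 && n >= 1);
    return (x + ((int64_t)1 << (n - 1))) >> n;
}

/* The same constant, derived from its 30-bit fixed-point form. */
static int32_t fix_from_q30(int64_t q30)
{
    return (int32_t)descale(q30, 30 - CONST_BITS);
}
```

For example, fix(0.541196100) and fix_from_q30(581104887) should both give the table entry F_0_541.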

; --------------------------------------------------------------------------
SECTION SEG_CONST

alignz 16
global EXTN(jconst_idct_islow_mmx)

EXTN(jconst_idct_islow_mmx):

PW_F130_F054   times 2 dw  (F_0_541+F_0_765), F_0_541
PW_F054_MF130  times 2 dw  F_0_541, (F_0_541-F_1_847)
PW_MF078_F117  times 2 dw  (F_1_175-F_1_961), F_1_175
PW_F117_F078   times 2 dw  F_1_175, (F_1_175-F_0_390)
PW_MF060_MF089 times 2 dw  (F_0_298-F_0_899),-F_0_899
PW_MF089_F060  times 2 dw  -F_0_899, (F_1_501-F_0_899)
PW_MF050_MF256 times 2 dw  (F_2_053-F_2_562),-F_2_562
PW_MF256_F050  times 2 dw  -F_2_562, (F_3_072-F_2_562)
PD_DESCALE_P1  times 2 dd  1 << (DESCALE_P1-1)
PD_DESCALE_P2  times 2 dd  1 << (DESCALE_P2-1)
PB_CENTERJSAMP times 8 db  CENTERJSAMPLE

alignz 16
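Each PW_* entry above packs a pair of word constants (a, b), repeated twice, which matches the operand layout of pmaddwd: with the inputs interleaved as (z2, z3), one instruction yields z2*a + z3*b in each 32-bit lane. The C sketch below checks, for the even-part rotation, that this two-product form agrees with the three-multiply form quoted from jidctint.c in the comments further down. The constants are the CONST_BITS == 13 values from this file; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

enum { F_0_541 = 4433, F_0_765 = 6270, F_1_847 = 15137 };

/* One 32-bit lane of pmaddwd: x0*c0 + x1*c1 on signed 16-bit inputs. */
static int32_t pmaddwd_lane(int16_t x0, int16_t x1, int16_t c0, int16_t c1)
{
    return (int32_t)x0 * c0 + (int32_t)x1 * c1;
}

/* Three-multiply form, as in the "(Original)" comment. */
static void rotate_original(int16_t z2, int16_t z3,
                            int32_t *tmp2, int32_t *tmp3)
{
    int32_t z1 = ((int32_t)z2 + z3) * F_0_541;
    *tmp2 = z1 + (int32_t)z3 * -F_1_847;
    *tmp3 = z1 + (int32_t)z2 * F_0_765;
}

/* Two pmaddwd lanes, using the PW_F054_MF130 and PW_F130_F054 pairs. */
static void rotate_pmaddwd(int16_t z2, int16_t z3,
                           int32_t *tmp2, int32_t *tmp3)
{
    *tmp2 = pmaddwd_lane(z2, z3, F_0_541, F_0_541 - F_1_847);
    *tmp3 = pmaddwd_lane(z2, z3, F_0_541 + F_0_765, F_0_541);
}
```

Because the two forms differ only by distributing the shared factor, they agree exactly in integer arithmetic as long as nothing overflows 32 bits.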

; --------------------------------------------------------------------------
SECTION SEG_TEXT
BITS 32
;
; Perform dequantization and inverse DCT on one block of coefficients.
;
; GLOBAL(void)
; jpeg_idct_islow_mmx (j_decompress_ptr cinfo, jpeg_component_info * compptr,
;                      JCOEFPTR coef_block,
;                      JSAMPARRAY output_buf, JDIMENSION output_col)
;

%define cinfo(b)      (b)+8   ; j_decompress_ptr cinfo
%define compptr(b)    (b)+12  ; jpeg_component_info * compptr
%define coef_block(b) (b)+16  ; JCOEFPTR coef_block
%define output_buf(b) (b)+20  ; JSAMPARRAY output_buf
%define output_col(b) (b)+24  ; JDIMENSION output_col

%define original_ebp ebp+0
%define wk(i)        ebp-(WK_NUM-(i))*SIZEOF_MMWORD ; mmword wk[WK_NUM]
%define WK_NUM       12
%define workspace    wk(0)-DCTSIZE2*SIZEOF_JCOEF
                     ; JCOEF workspace[DCTSIZE2]
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_idct_islow_mmx)
|
||||||
|
|
||||||
|
EXTN(jpeg_idct_islow_mmx):
|
||||||
|
push ebp
|
||||||
|
mov eax,esp ; eax = original ebp
|
||||||
|
sub esp, byte 4
|
||||||
|
and esp, byte (-SIZEOF_MMWORD) ; align to 64 bits
|
||||||
|
mov [esp],eax
|
||||||
|
mov ebp,esp ; ebp = aligned ebp
|
||||||
|
lea esp, [workspace]
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
|
||||||
|
; ---- Pass 1: process columns from input, store into work array.
|
||||||
|
|
||||||
|
; mov eax, [original_ebp]
|
||||||
|
mov edx, POINTER [compptr(eax)]
|
||||||
|
mov edx, POINTER [jcompinfo_dct_table(edx)] ; quantptr
|
||||||
|
mov esi, JCOEFPTR [coef_block(eax)] ; inptr
|
||||||
|
lea edi, [workspace] ; JCOEF * wsptr
|
||||||
|
mov ecx, DCTSIZE/4 ; ctr
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
%ifndef NO_ZERO_COLUMN_TEST_ISLOW_MMX
|
||||||
|
mov eax, DWORD [DWBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
or eax, DWORD [DWBLOCK(2,0,esi,SIZEOF_JCOEF)]
|
||||||
|
jnz short .columnDCT
|
||||||
|
|
||||||
|
movq mm0, MMWORD [MMBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(2,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por mm0, MMWORD [MMBLOCK(3,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por mm1, MMWORD [MMBLOCK(4,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por mm0, MMWORD [MMBLOCK(5,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por mm1, MMWORD [MMBLOCK(6,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por mm0, MMWORD [MMBLOCK(7,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por mm1,mm0
|
||||||
|
packsswb mm1,mm1
|
||||||
|
movd eax,mm1
|
||||||
|
test eax,eax
|
||||||
|
jnz short .columnDCT
|
||||||
|
|
||||||
|
; -- AC terms all zero
|
||||||
|
|
||||||
|
movq mm0, MMWORD [MMBLOCK(0,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw mm0, MMWORD [MMBLOCK(0,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
|
||||||
|
psllw mm0,PASS1_BITS
|
||||||
|
|
||||||
|
movq mm2,mm0 ; mm0=in0=(00 01 02 03)
|
||||||
|
punpcklwd mm0,mm0 ; mm0=(00 00 01 01)
|
||||||
|
punpckhwd mm2,mm2 ; mm2=(02 02 03 03)
|
||||||
|
|
||||||
|
movq mm1,mm0
|
||||||
|
punpckldq mm0,mm0 ; mm0=(00 00 00 00)
|
||||||
|
punpckhdq mm1,mm1 ; mm1=(01 01 01 01)
|
||||||
|
movq mm3,mm2
|
||||||
|
punpckldq mm2,mm2 ; mm2=(02 02 02 02)
|
||||||
|
punpckhdq mm3,mm3 ; mm3=(03 03 03 03)
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(0,0,edi,SIZEOF_JCOEF)], mm0
|
||||||
|
movq MMWORD [MMBLOCK(0,1,edi,SIZEOF_JCOEF)], mm0
|
||||||
|
movq MMWORD [MMBLOCK(1,0,edi,SIZEOF_JCOEF)], mm1
|
||||||
|
movq MMWORD [MMBLOCK(1,1,edi,SIZEOF_JCOEF)], mm1
|
||||||
|
movq MMWORD [MMBLOCK(2,0,edi,SIZEOF_JCOEF)], mm2
|
||||||
|
movq MMWORD [MMBLOCK(2,1,edi,SIZEOF_JCOEF)], mm2
|
||||||
|
movq MMWORD [MMBLOCK(3,0,edi,SIZEOF_JCOEF)], mm3
|
||||||
|
movq MMWORD [MMBLOCK(3,1,edi,SIZEOF_JCOEF)], mm3
|
||||||
|
jmp near .nextcolumn
|
||||||
|
alignx 16,7
|
||||||
|
%endif
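The block guarded by NO_ZERO_COLUMN_TEST_ISLOW_MMX above is a shortcut: when every AC coefficient in a column is zero, the column IDCT reduces to the dequantized DC term, scaled by PASS1_BITS and replicated to all eight outputs. A scalar C sketch of that case follows. It is modeled on the description here rather than copied from the library: the types and function name are illustrative, it handles one column at a time instead of four, it uses 32-bit arithmetic where the MMX code works in 16-bit lanes, and it indexes the workspace as plain row-major without the transposed layout of the MMX code.

```c
#include <assert.h>
#include <stdint.h>

#define DCTSIZE    8
#define PASS1_BITS 2

/* Process one column of an 8x8 block stored row-major.
 * Returns 1 if the all-AC-zero shortcut applied, 0 if a full column
 * IDCT is still needed (in which case wsptr is left untouched). */
static int column_dc_only(const int16_t *inptr, const int16_t *quantptr,
                          int32_t *wsptr)
{
    int row;
    int32_t dcval;

    assert(inptr && quantptr && wsptr);

    for (row = 1; row < DCTSIZE; row++) {
        if (inptr[DCTSIZE * row] != 0)
            return 0;               /* some AC term is nonzero */
    }

    /* Dequantize the DC term and apply the pass-1 scaling. */
    dcval = (int32_t)inptr[0] * quantptr[0] * (1 << PASS1_BITS);

    for (row = 0; row < DCTSIZE; row++)
        wsptr[DCTSIZE * row] = dcval;
    return 1;
}
```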
|
||||||
|
.columnDCT:
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movq mm0, MMWORD [MMBLOCK(0,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(2,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw mm0, MMWORD [MMBLOCK(0,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
pmullw mm1, MMWORD [MMBLOCK(2,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
movq mm2, MMWORD [MMBLOCK(4,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(6,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw mm2, MMWORD [MMBLOCK(4,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
pmullw mm3, MMWORD [MMBLOCK(6,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; z1 = (z2 + z3) * 0.541196100;
|
||||||
|
; tmp2 = z1 + z3 * -1.847759065;
|
||||||
|
; tmp3 = z1 + z2 * 0.765366865;
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; tmp2 = z2 * 0.541196100 + z3 * (0.541196100 - 1.847759065);
|
||||||
|
; tmp3 = z2 * (0.541196100 + 0.765366865) + z3 * 0.541196100;
|
||||||
|
|
||||||
|
movq mm4,mm1 ; mm1=in2=z2
|
||||||
|
movq mm5,mm1
|
||||||
|
punpcklwd mm4,mm3 ; mm3=in6=z3
|
||||||
|
punpckhwd mm5,mm3
|
||||||
|
movq mm1,mm4
|
||||||
|
movq mm3,mm5
|
||||||
|
pmaddwd mm4,[GOTOFF(ebx,PW_F130_F054)] ; mm4=tmp3L
|
||||||
|
pmaddwd mm5,[GOTOFF(ebx,PW_F130_F054)] ; mm5=tmp3H
|
||||||
|
pmaddwd mm1,[GOTOFF(ebx,PW_F054_MF130)] ; mm1=tmp2L
|
||||||
|
pmaddwd mm3,[GOTOFF(ebx,PW_F054_MF130)] ; mm3=tmp2H
|
||||||
|
|
||||||
|
movq mm6,mm0
|
||||||
|
paddw mm0,mm2 ; mm0=in0+in4
|
||||||
|
psubw mm6,mm2 ; mm6=in0-in4
|
||||||
|
|
||||||
|
pxor mm7,mm7
|
||||||
|
pxor mm2,mm2
|
||||||
|
punpcklwd mm7,mm0 ; mm7=tmp0L
|
||||||
|
punpckhwd mm2,mm0 ; mm2=tmp0H
|
||||||
|
psrad mm7,(16-CONST_BITS) ; psrad mm7,16 & pslld mm7,CONST_BITS
|
||||||
|
psrad mm2,(16-CONST_BITS) ; psrad mm2,16 & pslld mm2,CONST_BITS
|
||||||
|
|
||||||
|
movq mm0,mm7
|
||||||
|
paddd mm7,mm4 ; mm7=tmp10L
|
||||||
|
psubd mm0,mm4 ; mm0=tmp13L
|
||||||
|
movq mm4,mm2
|
||||||
|
paddd mm2,mm5 ; mm2=tmp10H
|
||||||
|
psubd mm4,mm5 ; mm4=tmp13H
|
||||||
|
|
||||||
|
movq MMWORD [wk(0)], mm7 ; wk(0)=tmp10L
|
||||||
|
movq MMWORD [wk(1)], mm2 ; wk(1)=tmp10H
|
||||||
|
movq MMWORD [wk(2)], mm0 ; wk(2)=tmp13L
|
||||||
|
movq MMWORD [wk(3)], mm4 ; wk(3)=tmp13H
|
||||||
|
|
||||||
|
pxor mm5,mm5
|
||||||
|
pxor mm7,mm7
|
||||||
|
punpcklwd mm5,mm6 ; mm5=tmp1L
|
||||||
|
punpckhwd mm7,mm6 ; mm7=tmp1H
|
||||||
|
psrad mm5,(16-CONST_BITS) ; psrad mm5,16 & pslld mm5,CONST_BITS
|
||||||
|
psrad mm7,(16-CONST_BITS) ; psrad mm7,16 & pslld mm7,CONST_BITS
|
||||||
|
|
||||||
|
movq mm2,mm5
|
||||||
|
paddd mm5,mm1 ; mm5=tmp11L
|
||||||
|
psubd mm2,mm1 ; mm2=tmp12L
|
||||||
|
movq mm0,mm7
|
||||||
|
paddd mm7,mm3 ; mm7=tmp11H
|
||||||
|
psubd mm0,mm3 ; mm0=tmp12H
|
||||||
|
|
||||||
|
movq MMWORD [wk(4)], mm5 ; wk(4)=tmp11L
|
||||||
|
movq MMWORD [wk(5)], mm7 ; wk(5)=tmp11H
|
||||||
|
movq MMWORD [wk(6)], mm2 ; wk(6)=tmp12L
|
||||||
|
movq MMWORD [wk(7)], mm0 ; wk(7)=tmp12H
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movq mm4, MMWORD [MMBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm6, MMWORD [MMBLOCK(3,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw mm4, MMWORD [MMBLOCK(1,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
pmullw mm6, MMWORD [MMBLOCK(3,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(5,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(7,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw mm1, MMWORD [MMBLOCK(5,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
pmullw mm3, MMWORD [MMBLOCK(7,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
|
||||||
|
movq mm5,mm6
|
||||||
|
movq mm7,mm4
|
||||||
|
paddw mm5,mm3 ; mm5=z3
|
||||||
|
paddw mm7,mm1 ; mm7=z4
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; z5 = (z3 + z4) * 1.175875602;
|
||||||
|
; z3 = z3 * -1.961570560; z4 = z4 * -0.390180644;
|
||||||
|
; z3 += z5; z4 += z5;
|
||||||
|
;
|
||||||
|
; (This implementation; each right-hand side uses the original z3 and z4)
|
||||||
|
; z3 = z3 * (1.175875602 - 1.961570560) + z4 * 1.175875602;
|
||||||
|
; z4 = z3 * 1.175875602 + z4 * (1.175875602 - 0.390180644);
|
||||||
|
|
||||||
|
movq mm2,mm5
|
||||||
|
movq mm0,mm5
|
||||||
|
punpcklwd mm2,mm7
|
||||||
|
punpckhwd mm0,mm7
|
||||||
|
movq mm5,mm2
|
||||||
|
movq mm7,mm0
|
||||||
|
pmaddwd mm2,[GOTOFF(ebx,PW_MF078_F117)] ; mm2=z3L
|
||||||
|
pmaddwd mm0,[GOTOFF(ebx,PW_MF078_F117)] ; mm0=z3H
|
||||||
|
pmaddwd mm5,[GOTOFF(ebx,PW_F117_F078)] ; mm5=z4L
|
||||||
|
pmaddwd mm7,[GOTOFF(ebx,PW_F117_F078)] ; mm7=z4H
|
||||||
|
|
||||||
|
movq MMWORD [wk(10)], mm2 ; wk(10)=z3L
|
||||||
|
movq MMWORD [wk(11)], mm0 ; wk(11)=z3H
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; z1 = tmp0 + tmp3; z2 = tmp1 + tmp2;
|
||||||
|
; tmp0 = tmp0 * 0.298631336; tmp1 = tmp1 * 2.053119869;
|
||||||
|
; tmp2 = tmp2 * 3.072711026; tmp3 = tmp3 * 1.501321110;
|
||||||
|
; z1 = z1 * -0.899976223; z2 = z2 * -2.562915447;
|
||||||
|
; tmp0 += z1 + z3; tmp1 += z2 + z4;
|
||||||
|
; tmp2 += z2 + z3; tmp3 += z1 + z4;
|
||||||
|
;
|
||||||
|
; (This implementation; each right-hand side uses the original tmp0..tmp3)
|
||||||
|
; tmp0 = tmp0 * (0.298631336 - 0.899976223) + tmp3 * -0.899976223;
|
||||||
|
; tmp1 = tmp1 * (2.053119869 - 2.562915447) + tmp2 * -2.562915447;
|
||||||
|
; tmp2 = tmp1 * -2.562915447 + tmp2 * (3.072711026 - 2.562915447);
|
||||||
|
; tmp3 = tmp0 * -0.899976223 + tmp3 * (1.501321110 - 0.899976223);
|
||||||
|
; tmp0 += z3; tmp1 += z4;
|
||||||
|
; tmp2 += z3; tmp3 += z4;
|
||||||
|
|
||||||
|
movq mm2,mm3
|
||||||
|
movq mm0,mm3
|
||||||
|
punpcklwd mm2,mm4
|
||||||
|
punpckhwd mm0,mm4
|
||||||
|
movq mm3,mm2
|
||||||
|
movq mm4,mm0
|
||||||
|
pmaddwd mm2,[GOTOFF(ebx,PW_MF060_MF089)] ; mm2=tmp0L
|
||||||
|
pmaddwd mm0,[GOTOFF(ebx,PW_MF060_MF089)] ; mm0=tmp0H
|
||||||
|
pmaddwd mm3,[GOTOFF(ebx,PW_MF089_F060)] ; mm3=tmp3L
|
||||||
|
pmaddwd mm4,[GOTOFF(ebx,PW_MF089_F060)] ; mm4=tmp3H
|
||||||
|
|
||||||
|
paddd mm2, MMWORD [wk(10)] ; mm2=tmp0L
|
||||||
|
paddd mm0, MMWORD [wk(11)] ; mm0=tmp0H
|
||||||
|
paddd mm3,mm5 ; mm3=tmp3L
|
||||||
|
paddd mm4,mm7 ; mm4=tmp3H
|
||||||
|
|
||||||
|
movq MMWORD [wk(8)], mm2 ; wk(8)=tmp0L
|
||||||
|
movq MMWORD [wk(9)], mm0 ; wk(9)=tmp0H
|
||||||
|
|
||||||
|
movq mm2,mm1
|
||||||
|
movq mm0,mm1
|
||||||
|
punpcklwd mm2,mm6
|
||||||
|
punpckhwd mm0,mm6
|
||||||
|
movq mm1,mm2
|
||||||
|
movq mm6,mm0
|
||||||
|
pmaddwd mm2,[GOTOFF(ebx,PW_MF050_MF256)] ; mm2=tmp1L
|
||||||
|
pmaddwd mm0,[GOTOFF(ebx,PW_MF050_MF256)] ; mm0=tmp1H
|
||||||
|
pmaddwd mm1,[GOTOFF(ebx,PW_MF256_F050)] ; mm1=tmp2L
|
||||||
|
pmaddwd mm6,[GOTOFF(ebx,PW_MF256_F050)] ; mm6=tmp2H
|
||||||
|
|
||||||
|
paddd mm2,mm5 ; mm2=tmp1L
|
||||||
|
paddd mm0,mm7 ; mm0=tmp1H
|
||||||
|
paddd mm1, MMWORD [wk(10)] ; mm1=tmp2L
|
||||||
|
paddd mm6, MMWORD [wk(11)] ; mm6=tmp2H
|
||||||
|
|
||||||
|
movq MMWORD [wk(10)], mm2 ; wk(10)=tmp1L
|
||||||
|
movq MMWORD [wk(11)], mm0 ; wk(11)=tmp1H
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
movq mm5, MMWORD [wk(0)] ; mm5=tmp10L
|
||||||
|
movq mm7, MMWORD [wk(1)] ; mm7=tmp10H
|
||||||
|
|
||||||
|
movq mm2,mm5
|
||||||
|
movq mm0,mm7
|
||||||
|
paddd mm5,mm3 ; mm5=data0L
|
||||||
|
paddd mm7,mm4 ; mm7=data0H
|
||||||
|
psubd mm2,mm3 ; mm2=data7L
|
||||||
|
psubd mm0,mm4 ; mm0=data7H
|
||||||
|
|
||||||
|
movq mm3,[GOTOFF(ebx,PD_DESCALE_P1)] ; mm3=[PD_DESCALE_P1]
|
||||||
|
|
||||||
|
paddd mm5,mm3
|
||||||
|
paddd mm7,mm3
|
||||||
|
psrad mm5,DESCALE_P1
|
||||||
|
psrad mm7,DESCALE_P1
|
||||||
|
paddd mm2,mm3
|
||||||
|
paddd mm0,mm3
|
||||||
|
psrad mm2,DESCALE_P1
|
||||||
|
psrad mm0,DESCALE_P1
|
||||||
|
|
||||||
|
packssdw mm5,mm7 ; mm5=data0=(00 01 02 03)
|
||||||
|
packssdw mm2,mm0 ; mm2=data7=(70 71 72 73)
|
||||||
|
|
||||||
|
movq mm4, MMWORD [wk(4)] ; mm4=tmp11L
|
||||||
|
movq mm3, MMWORD [wk(5)] ; mm3=tmp11H
|
||||||
|
|
||||||
|
movq mm7,mm4
|
||||||
|
movq mm0,mm3
|
||||||
|
paddd mm4,mm1 ; mm4=data1L
|
||||||
|
paddd mm3,mm6 ; mm3=data1H
|
||||||
|
psubd mm7,mm1 ; mm7=data6L
|
||||||
|
psubd mm0,mm6 ; mm0=data6H
|
||||||
|
|
||||||
|
movq mm1,[GOTOFF(ebx,PD_DESCALE_P1)] ; mm1=[PD_DESCALE_P1]
|
||||||
|
|
||||||
|
paddd mm4,mm1
|
||||||
|
paddd mm3,mm1
|
||||||
|
psrad mm4,DESCALE_P1
|
||||||
|
psrad mm3,DESCALE_P1
|
||||||
|
paddd mm7,mm1
|
||||||
|
paddd mm0,mm1
|
||||||
|
psrad mm7,DESCALE_P1
|
||||||
|
psrad mm0,DESCALE_P1
|
||||||
|
|
||||||
|
packssdw mm4,mm3 ; mm4=data1=(10 11 12 13)
|
||||||
|
packssdw mm7,mm0 ; mm7=data6=(60 61 62 63)
|
||||||
|
|
||||||
|
movq mm6,mm5 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd mm5,mm4 ; mm5=(00 10 01 11)
|
||||||
|
punpckhwd mm6,mm4 ; mm6=(02 12 03 13)
|
||||||
|
movq mm1,mm7 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd mm7,mm2 ; mm7=(60 70 61 71)
|
||||||
|
punpckhwd mm1,mm2 ; mm1=(62 72 63 73)
|
||||||
|
|
||||||
|
movq mm3, MMWORD [wk(6)] ; mm3=tmp12L
|
||||||
|
movq mm0, MMWORD [wk(7)] ; mm0=tmp12H
|
||||||
|
movq mm4, MMWORD [wk(10)] ; mm4=tmp1L
|
||||||
|
movq mm2, MMWORD [wk(11)] ; mm2=tmp1H
|
||||||
|
|
||||||
|
movq MMWORD [wk(0)], mm5 ; wk(0)=(00 10 01 11)
|
||||||
|
movq MMWORD [wk(1)], mm6 ; wk(1)=(02 12 03 13)
|
||||||
|
movq MMWORD [wk(4)], mm7 ; wk(4)=(60 70 61 71)
|
||||||
|
movq MMWORD [wk(5)], mm1 ; wk(5)=(62 72 63 73)
|
||||||
|
|
||||||
|
movq mm5,mm3
|
||||||
|
movq mm6,mm0
|
||||||
|
paddd mm3,mm4 ; mm3=data2L
|
||||||
|
paddd mm0,mm2 ; mm0=data2H
|
||||||
|
psubd mm5,mm4 ; mm5=data5L
|
||||||
|
psubd mm6,mm2 ; mm6=data5H
|
||||||
|
|
||||||
|
movq mm7,[GOTOFF(ebx,PD_DESCALE_P1)] ; mm7=[PD_DESCALE_P1]
|
||||||
|
|
||||||
|
paddd mm3,mm7
|
||||||
|
paddd mm0,mm7
|
||||||
|
psrad mm3,DESCALE_P1
|
||||||
|
psrad mm0,DESCALE_P1
|
||||||
|
paddd mm5,mm7
|
||||||
|
paddd mm6,mm7
|
||||||
|
psrad mm5,DESCALE_P1
|
||||||
|
psrad mm6,DESCALE_P1
|
||||||
|
|
||||||
|
packssdw mm3,mm0 ; mm3=data2=(20 21 22 23)
|
||||||
|
packssdw mm5,mm6 ; mm5=data5=(50 51 52 53)
|
||||||
|
|
||||||
|
movq mm1, MMWORD [wk(2)] ; mm1=tmp13L
|
||||||
|
movq mm4, MMWORD [wk(3)] ; mm4=tmp13H
|
||||||
|
movq mm2, MMWORD [wk(8)] ; mm2=tmp0L
|
||||||
|
movq mm7, MMWORD [wk(9)] ; mm7=tmp0H
|
||||||
|
|
||||||
|
movq mm0,mm1
|
||||||
|
movq mm6,mm4
|
||||||
|
paddd mm1,mm2 ; mm1=data3L
|
||||||
|
paddd mm4,mm7 ; mm4=data3H
|
||||||
|
psubd mm0,mm2 ; mm0=data4L
|
||||||
|
psubd mm6,mm7 ; mm6=data4H
|
||||||
|
|
||||||
|
movq mm2,[GOTOFF(ebx,PD_DESCALE_P1)] ; mm2=[PD_DESCALE_P1]
|
||||||
|
|
||||||
|
paddd mm1,mm2
|
||||||
|
paddd mm4,mm2
|
||||||
|
psrad mm1,DESCALE_P1
|
||||||
|
psrad mm4,DESCALE_P1
|
||||||
|
paddd mm0,mm2
|
||||||
|
paddd mm6,mm2
|
||||||
|
psrad mm0,DESCALE_P1
|
||||||
|
psrad mm6,DESCALE_P1
|
||||||
|
|
||||||
|
packssdw mm1,mm4 ; mm1=data3=(30 31 32 33)
|
||||||
|
packssdw mm0,mm6 ; mm0=data4=(40 41 42 43)
|
||||||
|
|
||||||
|
movq mm7, MMWORD [wk(0)] ; mm7=(00 10 01 11)
|
||||||
|
movq mm2, MMWORD [wk(1)] ; mm2=(02 12 03 13)
|
||||||
|
|
||||||
|
movq mm4,mm3 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd mm3,mm1 ; mm3=(20 30 21 31)
|
||||||
|
punpckhwd mm4,mm1 ; mm4=(22 32 23 33)
|
||||||
|
movq mm6,mm0 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd mm0,mm5 ; mm0=(40 50 41 51)
|
||||||
|
punpckhwd mm6,mm5 ; mm6=(42 52 43 53)
|
||||||
|
|
||||||
|
movq mm1,mm7 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq mm7,mm3 ; mm7=(00 10 20 30)
|
||||||
|
punpckhdq mm1,mm3 ; mm1=(01 11 21 31)
|
||||||
|
movq mm5,mm2 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq mm2,mm4 ; mm2=(02 12 22 32)
|
||||||
|
punpckhdq mm5,mm4 ; mm5=(03 13 23 33)
|
||||||
|
|
||||||
|
movq mm3, MMWORD [wk(4)] ; mm3=(60 70 61 71)
|
||||||
|
movq mm4, MMWORD [wk(5)] ; mm4=(62 72 63 73)
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(0,0,edi,SIZEOF_JCOEF)], mm7
|
||||||
|
movq MMWORD [MMBLOCK(1,0,edi,SIZEOF_JCOEF)], mm1
|
||||||
|
movq MMWORD [MMBLOCK(2,0,edi,SIZEOF_JCOEF)], mm2
|
||||||
|
movq MMWORD [MMBLOCK(3,0,edi,SIZEOF_JCOEF)], mm5
|
||||||
|
|
||||||
|
movq mm7,mm0 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq mm0,mm3 ; mm0=(40 50 60 70)
|
||||||
|
punpckhdq mm7,mm3 ; mm7=(41 51 61 71)
|
||||||
|
movq mm1,mm6 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq mm6,mm4 ; mm6=(42 52 62 72)
|
||||||
|
punpckhdq mm1,mm4 ; mm1=(43 53 63 73)
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(0,1,edi,SIZEOF_JCOEF)], mm0
|
||||||
|
movq MMWORD [MMBLOCK(1,1,edi,SIZEOF_JCOEF)], mm7
|
||||||
|
movq MMWORD [MMBLOCK(2,1,edi,SIZEOF_JCOEF)], mm6
|
||||||
|
movq MMWORD [MMBLOCK(3,1,edi,SIZEOF_JCOEF)], mm1
|
||||||
|
|
||||||
|
.nextcolumn:
|
||||||
|
add esi, byte 4*SIZEOF_JCOEF ; coef_block
|
||||||
|
add edx, byte 4*SIZEOF_ISLOW_MULT_TYPE ; quantptr
|
||||||
|
add edi, byte 4*DCTSIZE*SIZEOF_JCOEF ; wsptr
|
||||||
|
dec ecx ; ctr
|
||||||
|
jnz near .columnloop
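Pass 1 above ends by transposing each 4x4 tile of words with two rounds of unpack instructions (punpcklwd/punpckhwd, then punpckldq/punpckhdq) before storing it to the workspace. The C sketch below models an MMX register as four 16-bit lanes and replays those two phases. The type and function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit MMX register viewed as four 16-bit words, lowest lane first. */
typedef struct { int16_t w[4]; } mmx_t;

/* Interleave the low two words of a and b: (a0 b0 a1 b1). */
static mmx_t punpcklwd(mmx_t a, mmx_t b)
{
    return (mmx_t){{ a.w[0], b.w[0], a.w[1], b.w[1] }};
}

/* Interleave the high two words of a and b: (a2 b2 a3 b3). */
static mmx_t punpckhwd(mmx_t a, mmx_t b)
{
    return (mmx_t){{ a.w[2], b.w[2], a.w[3], b.w[3] }};
}

/* Low doubleword of a followed by low doubleword of b. */
static mmx_t punpckldq(mmx_t a, mmx_t b)
{
    return (mmx_t){{ a.w[0], a.w[1], b.w[0], b.w[1] }};
}

/* High doubleword of a followed by high doubleword of b. */
static mmx_t punpckhdq(mmx_t a, mmx_t b)
{
    return (mmx_t){{ a.w[2], a.w[3], b.w[2], b.w[3] }};
}

/* Transpose a 4x4 tile held in four registers, in place. */
static void transpose4x4(mmx_t r[4])
{
    /* Phase 1: interleave words of adjacent rows. */
    mmx_t lo01 = punpcklwd(r[0], r[1]);   /* (00 10 01 11) */
    mmx_t hi01 = punpckhwd(r[0], r[1]);   /* (02 12 03 13) */
    mmx_t lo23 = punpcklwd(r[2], r[3]);   /* (20 30 21 31) */
    mmx_t hi23 = punpckhwd(r[2], r[3]);   /* (22 32 23 33) */

    /* Phase 2: interleave doublewords to finish the transpose. */
    r[0] = punpckldq(lo01, lo23);         /* (00 10 20 30) */
    r[1] = punpckhdq(lo01, lo23);         /* (01 11 21 31) */
    r[2] = punpckldq(hi01, hi23);         /* (02 12 22 32) */
    r[3] = punpckhdq(hi01, hi23);         /* (03 13 23 33) */
}
```

The lane comments use the same (row column) notation as the assembly comments, so each intermediate can be compared against the corresponding register annotation.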
|
||||||
|
|
||||||
|
; ---- Pass 2: process rows from work array, store into output array.
|
||||||
|
|
||||||
|
mov eax, [original_ebp]
|
||||||
|
lea esi, [workspace] ; JCOEF * wsptr
|
||||||
|
mov edi, JSAMPARRAY [output_buf(eax)] ; (JSAMPROW *)
|
||||||
|
mov eax, JDIMENSION [output_col(eax)]
|
||||||
|
mov ecx, DCTSIZE/4 ; ctr
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movq mm0, MMWORD [MMBLOCK(0,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(2,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm2, MMWORD [MMBLOCK(4,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(6,0,esi,SIZEOF_JCOEF)]
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; z1 = (z2 + z3) * 0.541196100;
|
||||||
|
; tmp2 = z1 + z3 * -1.847759065;
|
||||||
|
; tmp3 = z1 + z2 * 0.765366865;
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; tmp2 = z2 * 0.541196100 + z3 * (0.541196100 - 1.847759065);
|
||||||
|
; tmp3 = z2 * (0.541196100 + 0.765366865) + z3 * 0.541196100;
|
||||||
|
|
||||||
|
movq mm4,mm1 ; mm1=in2=z2
|
||||||
|
movq mm5,mm1
|
||||||
|
punpcklwd mm4,mm3 ; mm3=in6=z3
|
||||||
|
punpckhwd mm5,mm3
|
||||||
|
movq mm1,mm4
|
||||||
|
movq mm3,mm5
|
||||||
|
pmaddwd mm4,[GOTOFF(ebx,PW_F130_F054)] ; mm4=tmp3L
|
||||||
|
pmaddwd mm5,[GOTOFF(ebx,PW_F130_F054)] ; mm5=tmp3H
|
||||||
|
pmaddwd mm1,[GOTOFF(ebx,PW_F054_MF130)] ; mm1=tmp2L
|
||||||
|
pmaddwd mm3,[GOTOFF(ebx,PW_F054_MF130)] ; mm3=tmp2H
|
||||||
|
|
||||||
|
movq mm6,mm0
|
||||||
|
paddw mm0,mm2 ; mm0=in0+in4
|
||||||
|
psubw mm6,mm2 ; mm6=in0-in4
|
||||||
|
|
||||||
|
pxor mm7,mm7
|
||||||
|
pxor mm2,mm2
|
||||||
|
punpcklwd mm7,mm0 ; mm7=tmp0L
|
||||||
|
punpckhwd mm2,mm0 ; mm2=tmp0H
|
||||||
|
psrad mm7,(16-CONST_BITS) ; psrad mm7,16 & pslld mm7,CONST_BITS
|
||||||
|
psrad mm2,(16-CONST_BITS) ; psrad mm2,16 & pslld mm2,CONST_BITS
|
||||||
|
|
||||||
|
movq mm0,mm7
|
||||||
|
paddd mm7,mm4 ; mm7=tmp10L
|
||||||
|
psubd mm0,mm4 ; mm0=tmp13L
|
||||||
|
movq mm4,mm2
|
||||||
|
paddd mm2,mm5 ; mm2=tmp10H
|
||||||
|
psubd mm4,mm5 ; mm4=tmp13H
|
||||||
|
|
||||||
|
movq MMWORD [wk(0)], mm7 ; wk(0)=tmp10L
|
||||||
|
movq MMWORD [wk(1)], mm2 ; wk(1)=tmp10H
|
||||||
|
movq MMWORD [wk(2)], mm0 ; wk(2)=tmp13L
|
||||||
|
movq MMWORD [wk(3)], mm4 ; wk(3)=tmp13H
|
||||||
|
|
||||||
|
pxor mm5,mm5
|
||||||
|
pxor mm7,mm7
|
||||||
|
punpcklwd mm5,mm6 ; mm5=tmp1L
|
||||||
|
punpckhwd mm7,mm6 ; mm7=tmp1H
|
||||||
|
psrad mm5,(16-CONST_BITS) ; psrad mm5,16 & pslld mm5,CONST_BITS
|
||||||
|
psrad mm7,(16-CONST_BITS) ; psrad mm7,16 & pslld mm7,CONST_BITS
|
||||||
|
|
||||||
|
movq mm2,mm5
|
||||||
|
paddd mm5,mm1 ; mm5=tmp11L
|
||||||
|
psubd mm2,mm1 ; mm2=tmp12L
|
||||||
|
movq mm0,mm7
|
||||||
|
paddd mm7,mm3 ; mm7=tmp11H
|
||||||
|
psubd mm0,mm3 ; mm0=tmp12H
|
||||||
|
|
||||||
|
movq MMWORD [wk(4)], mm5 ; wk(4)=tmp11L
|
||||||
|
movq MMWORD [wk(5)], mm7 ; wk(5)=tmp11H
|
||||||
|
movq MMWORD [wk(6)], mm2 ; wk(6)=tmp12L
|
||||||
|
movq MMWORD [wk(7)], mm0 ; wk(7)=tmp12H
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movq mm4, MMWORD [MMBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm6, MMWORD [MMBLOCK(3,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(5,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(7,0,esi,SIZEOF_JCOEF)]
|
||||||
|
|
||||||
|
movq mm5,mm6
|
||||||
|
movq mm7,mm4
|
||||||
|
paddw mm5,mm3 ; mm5=z3
|
||||||
|
paddw mm7,mm1 ; mm7=z4
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; z5 = (z3 + z4) * 1.175875602;
|
||||||
|
; z3 = z3 * -1.961570560; z4 = z4 * -0.390180644;
|
||||||
|
; z3 += z5; z4 += z5;
|
||||||
|
;
|
||||||
|
; (This implementation; each right-hand side uses the original z3 and z4)
|
||||||
|
; z3 = z3 * (1.175875602 - 1.961570560) + z4 * 1.175875602;
|
||||||
|
; z4 = z3 * 1.175875602 + z4 * (1.175875602 - 0.390180644);
|
||||||
|
|
||||||
|
movq mm2,mm5
|
||||||
|
movq mm0,mm5
|
||||||
|
punpcklwd mm2,mm7
|
||||||
|
punpckhwd mm0,mm7
|
||||||
|
movq mm5,mm2
|
||||||
|
movq mm7,mm0
|
||||||
|
pmaddwd mm2,[GOTOFF(ebx,PW_MF078_F117)] ; mm2=z3L
|
||||||
|
pmaddwd mm0,[GOTOFF(ebx,PW_MF078_F117)] ; mm0=z3H
|
||||||
|
pmaddwd mm5,[GOTOFF(ebx,PW_F117_F078)] ; mm5=z4L
|
||||||
|
pmaddwd mm7,[GOTOFF(ebx,PW_F117_F078)] ; mm7=z4H
|
||||||
|
|
||||||
|
movq MMWORD [wk(10)], mm2 ; wk(10)=z3L
|
||||||
|
movq MMWORD [wk(11)], mm0 ; wk(11)=z3H
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; z1 = tmp0 + tmp3; z2 = tmp1 + tmp2;
|
||||||
|
; tmp0 = tmp0 * 0.298631336; tmp1 = tmp1 * 2.053119869;
|
||||||
|
; tmp2 = tmp2 * 3.072711026; tmp3 = tmp3 * 1.501321110;
|
||||||
|
; z1 = z1 * -0.899976223; z2 = z2 * -2.562915447;
|
||||||
|
; tmp0 += z1 + z3; tmp1 += z2 + z4;
|
||||||
|
; tmp2 += z2 + z3; tmp3 += z1 + z4;
|
||||||
|
;
|
||||||
|
; (This implementation; each right-hand side uses the original tmp0..tmp3)
|
||||||
|
; tmp0 = tmp0 * (0.298631336 - 0.899976223) + tmp3 * -0.899976223;
|
||||||
|
; tmp1 = tmp1 * (2.053119869 - 2.562915447) + tmp2 * -2.562915447;
|
||||||
|
; tmp2 = tmp1 * -2.562915447 + tmp2 * (3.072711026 - 2.562915447);
|
||||||
|
; tmp3 = tmp0 * -0.899976223 + tmp3 * (1.501321110 - 0.899976223);
|
||||||
|
; tmp0 += z3; tmp1 += z4;
|
||||||
|
; tmp2 += z3; tmp3 += z4;
|
||||||
|
|
||||||
|
movq mm2,mm3
|
||||||
|
movq mm0,mm3
|
||||||
|
punpcklwd mm2,mm4
|
||||||
|
punpckhwd mm0,mm4
|
||||||
|
movq mm3,mm2
|
||||||
|
movq mm4,mm0
|
||||||
|
pmaddwd mm2,[GOTOFF(ebx,PW_MF060_MF089)] ; mm2=tmp0L
|
||||||
|
pmaddwd mm0,[GOTOFF(ebx,PW_MF060_MF089)] ; mm0=tmp0H
|
||||||
|
pmaddwd mm3,[GOTOFF(ebx,PW_MF089_F060)] ; mm3=tmp3L
|
||||||
|
pmaddwd mm4,[GOTOFF(ebx,PW_MF089_F060)] ; mm4=tmp3H
|
||||||
|
|
||||||
|
paddd mm2, MMWORD [wk(10)] ; mm2=tmp0L
|
||||||
|
paddd mm0, MMWORD [wk(11)] ; mm0=tmp0H
|
||||||
|
paddd mm3,mm5 ; mm3=tmp3L
|
||||||
|
paddd mm4,mm7 ; mm4=tmp3H
|
||||||
|
|
||||||
|
movq MMWORD [wk(8)], mm2 ; wk(8)=tmp0L
|
||||||
|
movq MMWORD [wk(9)], mm0 ; wk(9)=tmp0H
|
||||||
|
|
||||||
|
movq mm2,mm1
|
||||||
|
movq mm0,mm1
|
||||||
|
punpcklwd mm2,mm6
|
||||||
|
punpckhwd mm0,mm6
|
||||||
|
movq mm1,mm2
|
||||||
|
movq mm6,mm0
|
||||||
|
pmaddwd mm2,[GOTOFF(ebx,PW_MF050_MF256)] ; mm2=tmp1L
|
||||||
|
pmaddwd mm0,[GOTOFF(ebx,PW_MF050_MF256)] ; mm0=tmp1H
|
||||||
|
pmaddwd mm1,[GOTOFF(ebx,PW_MF256_F050)] ; mm1=tmp2L
|
||||||
|
pmaddwd mm6,[GOTOFF(ebx,PW_MF256_F050)] ; mm6=tmp2H
|
||||||
|
|
||||||
|
paddd mm2,mm5 ; mm2=tmp1L
|
||||||
|
paddd mm0,mm7 ; mm0=tmp1H
|
||||||
|
paddd mm1, MMWORD [wk(10)] ; mm1=tmp2L
|
||||||
|
paddd mm6, MMWORD [wk(11)] ; mm6=tmp2H
|
||||||
|
|
||||||
|
movq MMWORD [wk(10)], mm2 ; wk(10)=tmp1L
|
||||||
|
movq MMWORD [wk(11)], mm0 ; wk(11)=tmp1H
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
movq mm5, MMWORD [wk(0)] ; mm5=tmp10L
|
||||||
|
movq mm7, MMWORD [wk(1)] ; mm7=tmp10H
|
||||||
|
|
||||||
|
movq mm2,mm5
|
||||||
|
movq mm0,mm7
|
||||||
|
paddd mm5,mm3 ; mm5=data0L
|
||||||
|
paddd mm7,mm4 ; mm7=data0H
|
||||||
|
psubd mm2,mm3 ; mm2=data7L
|
||||||
|
psubd mm0,mm4 ; mm0=data7H
|
||||||
|
|
||||||
|
movq mm3,[GOTOFF(ebx,PD_DESCALE_P2)] ; mm3=[PD_DESCALE_P2]
|
||||||
|
|
||||||
|
paddd mm5,mm3
|
||||||
|
paddd mm7,mm3
|
||||||
|
psrad mm5,DESCALE_P2
|
||||||
|
psrad mm7,DESCALE_P2
|
||||||
|
paddd mm2,mm3
|
||||||
|
paddd mm0,mm3
|
||||||
|
psrad mm2,DESCALE_P2
|
||||||
|
psrad mm0,DESCALE_P2
|
||||||
|
|
||||||
|
packssdw mm5,mm7 ; mm5=data0=(00 10 20 30)
|
||||||
|
packssdw mm2,mm0 ; mm2=data7=(07 17 27 37)
|
||||||
|
|
||||||
|
movq mm4, MMWORD [wk(4)] ; mm4=tmp11L
|
||||||
|
movq mm3, MMWORD [wk(5)] ; mm3=tmp11H
|
||||||
|
|
||||||
|
movq mm7,mm4
|
||||||
|
movq mm0,mm3
|
||||||
|
paddd mm4,mm1 ; mm4=data1L
|
||||||
|
paddd mm3,mm6 ; mm3=data1H
|
||||||
|
psubd mm7,mm1 ; mm7=data6L
|
||||||
|
psubd mm0,mm6 ; mm0=data6H
|
||||||
|
|
||||||
|
movq mm1,[GOTOFF(ebx,PD_DESCALE_P2)] ; mm1=[PD_DESCALE_P2]
|
||||||
|
|
||||||
|
paddd mm4,mm1
|
||||||
|
paddd mm3,mm1
|
||||||
|
psrad mm4,DESCALE_P2
|
||||||
|
psrad mm3,DESCALE_P2
|
||||||
|
paddd mm7,mm1
|
||||||
|
paddd mm0,mm1
|
||||||
|
psrad mm7,DESCALE_P2
|
||||||
|
psrad mm0,DESCALE_P2
|
||||||
|
|
||||||
|
packssdw mm4,mm3 ; mm4=data1=(01 11 21 31)
|
||||||
|
packssdw mm7,mm0 ; mm7=data6=(06 16 26 36)
|
||||||
|
|
||||||
|
packsswb mm5,mm7 ; mm5=(00 10 20 30 06 16 26 36)
|
||||||
|
packsswb mm4,mm2 ; mm4=(01 11 21 31 07 17 27 37)
|
||||||
|
|
||||||
|
movq mm6, MMWORD [wk(6)] ; mm6=tmp12L
|
||||||
|
movq mm1, MMWORD [wk(7)] ; mm1=tmp12H
|
||||||
|
movq mm3, MMWORD [wk(10)] ; mm3=tmp1L
|
||||||
|
movq mm0, MMWORD [wk(11)] ; mm0=tmp1H
|
||||||
|
|
||||||
|
movq MMWORD [wk(0)], mm5 ; wk(0)=(00 10 20 30 06 16 26 36)
|
||||||
|
movq MMWORD [wk(1)], mm4 ; wk(1)=(01 11 21 31 07 17 27 37)
|
||||||
|
|
||||||
|
movq mm7,mm6
|
||||||
|
movq mm2,mm1
|
||||||
|
paddd mm6,mm3 ; mm6=data2L
|
||||||
|
paddd mm1,mm0 ; mm1=data2H
|
||||||
|
psubd mm7,mm3 ; mm7=data5L
|
||||||
|
psubd mm2,mm0 ; mm2=data5H
|
||||||
|
|
||||||
|
movq mm5,[GOTOFF(ebx,PD_DESCALE_P2)] ; mm5=[PD_DESCALE_P2]
|
||||||
|
|
||||||
|
paddd mm6,mm5
|
||||||
|
paddd mm1,mm5
|
||||||
|
psrad mm6,DESCALE_P2
|
||||||
|
psrad mm1,DESCALE_P2
|
||||||
|
paddd mm7,mm5
|
||||||
|
paddd mm2,mm5
|
||||||
|
psrad mm7,DESCALE_P2
|
||||||
|
psrad mm2,DESCALE_P2
|
||||||
|
|
||||||
|
packssdw mm6,mm1 ; mm6=data2=(02 12 22 32)
|
||||||
|
packssdw mm7,mm2 ; mm7=data5=(05 15 25 35)
|
||||||
|
|
||||||
|
movq mm4, MMWORD [wk(2)] ; mm4=tmp13L
|
||||||
|
movq mm3, MMWORD [wk(3)] ; mm3=tmp13H
|
||||||
|
movq mm0, MMWORD [wk(8)] ; mm0=tmp0L
|
||||||
|
movq mm5, MMWORD [wk(9)] ; mm5=tmp0H
|
||||||
|
|
||||||
|
movq mm1,mm4
|
||||||
|
movq mm2,mm3
|
||||||
|
paddd mm4,mm0 ; mm4=data3L
|
||||||
|
paddd mm3,mm5 ; mm3=data3H
|
||||||
|
psubd mm1,mm0 ; mm1=data4L
|
||||||
|
psubd mm2,mm5 ; mm2=data4H
|
||||||
|
|
||||||
|
movq mm0,[GOTOFF(ebx,PD_DESCALE_P2)] ; mm0=[PD_DESCALE_P2]
|
||||||
|
|
||||||
|
paddd mm4,mm0
|
||||||
|
paddd mm3,mm0
|
||||||
|
psrad mm4,DESCALE_P2
|
||||||
|
psrad mm3,DESCALE_P2
|
||||||
|
paddd mm1,mm0
|
||||||
|
paddd mm2,mm0
|
||||||
|
psrad mm1,DESCALE_P2
|
||||||
|
psrad mm2,DESCALE_P2
|
||||||
|
|
||||||
|
movq mm5,[GOTOFF(ebx,PB_CENTERJSAMP)] ; mm5=[PB_CENTERJSAMP]
|
||||||
|
|
||||||
|
packssdw mm4,mm3 ; mm4=data3=(03 13 23 33)
|
||||||
|
packssdw mm1,mm2 ; mm1=data4=(04 14 24 34)
|
||||||
|
|
||||||
|
movq mm0, MMWORD [wk(0)] ; mm0=(00 10 20 30 06 16 26 36)
|
||||||
|
movq mm3, MMWORD [wk(1)] ; mm3=(01 11 21 31 07 17 27 37)
|
||||||
|
|
||||||
|
packsswb mm6,mm1 ; mm6=(02 12 22 32 04 14 24 34)
|
||||||
|
packsswb mm4,mm7 ; mm4=(03 13 23 33 05 15 25 35)
|
||||||
|
|
||||||
|
paddb mm0,mm5
|
||||||
|
paddb mm3,mm5
|
||||||
|
paddb mm6,mm5
|
||||||
|
paddb mm4,mm5
|
||||||
|
|
||||||
|
movq mm2,mm0 ; transpose coefficients(phase 1)
|
||||||
|
punpcklbw mm0,mm3 ; mm0=(00 01 10 11 20 21 30 31)
|
||||||
|
punpckhbw mm2,mm3 ; mm2=(06 07 16 17 26 27 36 37)
|
||||||
|
movq mm1,mm6 ; transpose coefficients(phase 1)
|
||||||
|
punpcklbw mm6,mm4 ; mm6=(02 03 12 13 22 23 32 33)
|
||||||
|
punpckhbw mm1,mm4 ; mm1=(04 05 14 15 24 25 34 35)
|
||||||
|
|
||||||
|
movq mm7,mm0 ; transpose coefficients(phase 2)
|
||||||
|
punpcklwd mm0,mm6 ; mm0=(00 01 02 03 10 11 12 13)
|
||||||
|
punpckhwd mm7,mm6 ; mm7=(20 21 22 23 30 31 32 33)
|
||||||
|
movq mm5,mm1 ; transpose coefficients(phase 2)
|
||||||
|
punpcklwd mm1,mm2 ; mm1=(04 05 06 07 14 15 16 17)
|
||||||
|
punpckhwd mm5,mm2 ; mm5=(24 25 26 27 34 35 36 37)
|
||||||
|
|
||||||
|
movq mm3,mm0 ; transpose coefficients(phase 3)
|
||||||
|
punpckldq mm0,mm1 ; mm0=(00 01 02 03 04 05 06 07)
|
||||||
|
punpckhdq mm3,mm1 ; mm3=(10 11 12 13 14 15 16 17)
|
||||||
|
movq mm4,mm7 ; transpose coefficients(phase 3)
|
||||||
|
punpckldq mm7,mm5 ; mm7=(20 21 22 23 24 25 26 27)
|
||||||
|
punpckhdq mm4,mm5 ; mm4=(30 31 32 33 34 35 36 37)
|
||||||
|
|
||||||
|
pushpic ebx ; save GOT address
|
||||||
|
|
||||||
|
mov edx, JSAMPROW [edi+0*SIZEOF_JSAMPROW]
|
||||||
|
mov ebx, JSAMPROW [edi+1*SIZEOF_JSAMPROW]
|
||||||
|
movq MMWORD [edx+eax*SIZEOF_JSAMPLE], mm0
|
||||||
|
movq MMWORD [ebx+eax*SIZEOF_JSAMPLE], mm3
|
||||||
|
mov edx, JSAMPROW [edi+2*SIZEOF_JSAMPROW]
|
||||||
|
mov ebx, JSAMPROW [edi+3*SIZEOF_JSAMPROW]
|
||||||
|
movq MMWORD [edx+eax*SIZEOF_JSAMPLE], mm7
|
||||||
|
movq MMWORD [ebx+eax*SIZEOF_JSAMPLE], mm4
|
||||||
|
|
||||||
|
poppic ebx ; restore GOT address
|
||||||
|
|
||||||
|
add esi, byte 4*SIZEOF_JCOEF ; wsptr
|
||||||
|
add edi, byte 4*SIZEOF_JSAMPROW
|
||||||
|
dec ecx ; ctr
|
||||||
|
jnz near .rowloop
|
||||||
|
|
||||||
|
emms ; empty MMX state
|
||||||
|
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
mov esp,ebp ; esp <- aligned ebp
|
||||||
|
pop esp ; esp <- original ebp
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; JIDCT_INT_MMX_SUPPORTED
|
||||||
|
%endif ; DCT_ISLOW_SUPPORTED
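The final output stage above narrows each result with `packsswb` and then adds `PB_CENTERJSAMP` with `paddb`. The following C sketch emulates a single byte lane of that pair to show why it acts as a range limit to 0..255. It assumes 8-bit samples (`CENTERJSAMPLE` = 128); the helper names `sat_s8` and `range_limit_lane` are hypothetical and do not appear in the source.

```c
#include <stdint.h>

#define CENTERJSAMPLE 128   /* assumption: 8-bit JSAMPLE build */

/* Saturate a 16-bit value to the signed 8-bit range, as packsswb does per lane. */
static int8_t sat_s8(int16_t v)
{
    if (v > 127)  return 127;
    if (v < -128) return -128;
    return (int8_t)v;
}

/* Emulate one lane of "packsswb" followed by "paddb reg,[PB_CENTERJSAMP]". */
static uint8_t range_limit_lane(int16_t idct_out)
{
    uint8_t packed = (uint8_t)sat_s8(idct_out);   /* two's-complement byte */
    return (uint8_t)(packed + CENTERJSAMPLE);     /* wraps modulo 256, like paddb */
}
```

Because the signed saturation already clamps to -128..127, the wrapping byte add maps that interval onto 0..255 without a separate lookup table.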
|
||||||
719
jimmxred.asm
Normal file
@@ -0,0 +1,719 @@
|
|||||||
|
;
|
||||||
|
; jimmxred.asm - reduced-size IDCT (MMX)
|
||||||
|
;
|
||||||
|
; x86 SIMD extension for IJG JPEG library
|
||||||
|
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
; For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
;
|
||||||
|
; This file should be assembled with NASM (Netwide Assembler),
|
||||||
|
; can *not* be assembled with Microsoft's MASM or any compatible
|
||||||
|
; assembler (including Borland's Turbo Assembler).
|
||||||
|
; NASM is available from http://nasm.sourceforge.net/ or
|
||||||
|
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||||
|
;
|
||||||
|
; This file contains inverse-DCT routines that produce reduced-size
|
||||||
|
; output: either 4x4 or 2x2 pixels from an 8x8 DCT block.
|
||||||
|
; The following code is based directly on the IJG's original jidctred.c;
|
||||||
|
; see the jidctred.c for more details.
|
||||||
|
;
|
||||||
|
; Last Modified : February 4, 2006
|
||||||
|
;
|
||||||
|
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jdct.inc"
|
||||||
|
|
||||||
|
%ifdef IDCT_SCALING_SUPPORTED
|
||||||
|
%ifdef JIDCT_INT_MMX_SUPPORTED
|
||||||
|
|
||||||
|
; This module is specialized to the case DCTSIZE = 8.
|
||||||
|
;
|
||||||
|
%if DCTSIZE != 8
|
||||||
|
%error "Sorry, this code only copes with 8x8 DCTs."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
|
||||||
|
%define CONST_BITS 13
|
||||||
|
%define PASS1_BITS 2
|
||||||
|
|
||||||
|
%define DESCALE_P1_4 (CONST_BITS-PASS1_BITS+1)
|
||||||
|
%define DESCALE_P2_4 (CONST_BITS+PASS1_BITS+3+1)
|
||||||
|
%define DESCALE_P1_2 (CONST_BITS-PASS1_BITS+2)
|
||||||
|
%define DESCALE_P2_2 (CONST_BITS+PASS1_BITS+3+2)
|
||||||
|
|
||||||
|
%if CONST_BITS == 13
|
||||||
|
F_0_211 equ 1730 ; FIX(0.211164243)
|
||||||
|
F_0_509 equ 4176 ; FIX(0.509795579)
|
||||||
|
F_0_601 equ 4926 ; FIX(0.601344887)
|
||||||
|
F_0_720 equ 5906 ; FIX(0.720959822)
|
||||||
|
F_0_765 equ 6270 ; FIX(0.765366865)
|
||||||
|
F_0_850 equ 6967 ; FIX(0.850430095)
|
||||||
|
F_0_899 equ 7373 ; FIX(0.899976223)
|
||||||
|
F_1_061 equ 8697 ; FIX(1.061594337)
|
||||||
|
F_1_272 equ 10426 ; FIX(1.272758580)
|
||||||
|
F_1_451 equ 11893 ; FIX(1.451774981)
|
||||||
|
F_1_847 equ 15137 ; FIX(1.847759065)
|
||||||
|
F_2_172 equ 17799 ; FIX(2.172734803)
|
||||||
|
F_2_562 equ 20995 ; FIX(2.562915447)
|
||||||
|
F_3_624 equ 29692 ; FIX(3.624509785)
|
||||||
|
%else
|
||||||
|
; NASM cannot do compile-time arithmetic on floating-point constants.
|
||||||
|
%define DESCALE(x,n) (((x)+(1<<((n)-1)))>>(n))
|
||||||
|
F_0_211 equ DESCALE( 226735879,30-CONST_BITS) ; FIX(0.211164243)
|
||||||
|
F_0_509 equ DESCALE( 547388834,30-CONST_BITS) ; FIX(0.509795579)
|
||||||
|
F_0_601 equ DESCALE( 645689155,30-CONST_BITS) ; FIX(0.601344887)
|
||||||
|
F_0_720 equ DESCALE( 774124714,30-CONST_BITS) ; FIX(0.720959822)
|
||||||
|
F_0_765 equ DESCALE( 821806413,30-CONST_BITS) ; FIX(0.765366865)
|
||||||
|
F_0_850 equ DESCALE( 913142361,30-CONST_BITS) ; FIX(0.850430095)
|
||||||
|
F_0_899 equ DESCALE( 966342111,30-CONST_BITS) ; FIX(0.899976223)
|
||||||
|
F_1_061 equ DESCALE(1139878239,30-CONST_BITS) ; FIX(1.061594337)
|
||||||
|
F_1_272 equ DESCALE(1366614119,30-CONST_BITS) ; FIX(1.272758580)
|
||||||
|
F_1_451 equ DESCALE(1558831516,30-CONST_BITS) ; FIX(1.451774981)
|
||||||
|
F_1_847 equ DESCALE(1984016188,30-CONST_BITS) ; FIX(1.847759065)
|
||||||
|
F_2_172 equ DESCALE(2332956230,30-CONST_BITS) ; FIX(2.172734803)
|
||||||
|
F_2_562 equ DESCALE(2751909506,30-CONST_BITS) ; FIX(2.562915447)
|
||||||
|
F_3_624 equ DESCALE(3891787747,30-CONST_BITS) ; FIX(3.624509785)
|
||||||
|
%endif
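The two branches of the constant table above should agree when `CONST_BITS` is 13: each literal is the 30-bit fixed-point value from the `%else` branch, rounded down to 13 fractional bits by `DESCALE`. A minimal C sketch of that relationship follows; the function names `descale` and `fix_from_q30` are hypothetical, and only non-negative inputs are considered.

```c
#include <stdint.h>

/* Rounding right shift, mirroring the DESCALE(x,n) macro in the %else branch. */
static int64_t descale(int64_t x, int n)
{
    return (x + ((int64_t)1 << (n - 1))) >> n;
}

/* Derive a constant with const_bits fractional bits from its 30-bit form. */
static int32_t fix_from_q30(int64_t q30, int const_bits)
{
    return (int32_t)descale(q30, 30 - const_bits);
}
```

For example, `fix_from_q30(226735879, 13)` reproduces the literal used for `F_0_211`.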
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_CONST
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
global EXTN(jconst_idct_red_mmx)
|
||||||
|
|
||||||
|
EXTN(jconst_idct_red_mmx):
|
||||||
|
|
||||||
|
PW_F184_MF076 times 2 dw F_1_847,-F_0_765
|
||||||
|
PW_F256_F089 times 2 dw F_2_562, F_0_899
|
||||||
|
PW_F106_MF217 times 2 dw F_1_061,-F_2_172
|
||||||
|
PW_MF060_MF050 times 2 dw -F_0_601,-F_0_509
|
||||||
|
PW_F145_MF021 times 2 dw F_1_451,-F_0_211
|
||||||
|
PW_F362_MF127 times 2 dw F_3_624,-F_1_272
|
||||||
|
PW_F085_MF072 times 2 dw F_0_850,-F_0_720
|
||||||
|
PD_DESCALE_P1_4 times 2 dd 1 << (DESCALE_P1_4-1)
|
||||||
|
PD_DESCALE_P2_4 times 2 dd 1 << (DESCALE_P2_4-1)
|
||||||
|
PD_DESCALE_P1_2 times 2 dd 1 << (DESCALE_P1_2-1)
|
||||||
|
PD_DESCALE_P2_2 times 2 dd 1 << (DESCALE_P2_2-1)
|
||||||
|
PB_CENTERJSAMP times 8 db CENTERJSAMPLE
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Perform dequantization and inverse DCT on one block of coefficients,
|
||||||
|
; producing a reduced-size 4x4 output block.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_idct_4x4_mmx (j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
|
; JCOEFPTR coef_block,
|
||||||
|
; JSAMPARRAY output_buf, JDIMENSION output_col)
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define coef_block(b) (b)+16 ; JCOEFPTR coef_block
|
||||||
|
%define output_buf(b) (b)+20 ; JSAMPARRAY output_buf
|
||||||
|
%define output_col(b) (b)+24 ; JDIMENSION output_col
|
||||||
|
|
||||||
|
%define original_ebp ebp+0
|
||||||
|
%define wk(i) ebp-(WK_NUM-(i))*SIZEOF_MMWORD ; mmword wk[WK_NUM]
|
||||||
|
%define WK_NUM 2
|
||||||
|
%define workspace wk(0)-DCTSIZE2*SIZEOF_JCOEF
|
||||||
|
; JCOEF workspace[DCTSIZE2]
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_idct_4x4_mmx)
|
||||||
|
|
||||||
|
EXTN(jpeg_idct_4x4_mmx):
|
||||||
|
push ebp
|
||||||
|
mov eax,esp ; eax = original ebp
|
||||||
|
sub esp, byte 4
|
||||||
|
and esp, byte (-SIZEOF_MMWORD) ; align to 64 bits
|
||||||
|
mov [esp],eax
|
||||||
|
mov ebp,esp ; ebp = aligned ebp
|
||||||
|
lea esp, [workspace]
|
||||||
|
pushpic ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
|
||||||
|
; ---- Pass 1: process columns from input, store into work array.
|
||||||
|
|
||||||
|
; mov eax, [original_ebp]
|
||||||
|
mov edx, POINTER [compptr(eax)]
|
||||||
|
mov edx, POINTER [jcompinfo_dct_table(edx)] ; quantptr
|
||||||
|
mov esi, JCOEFPTR [coef_block(eax)] ; inptr
|
||||||
|
lea edi, [workspace] ; JCOEF * wsptr
|
||||||
|
mov ecx, DCTSIZE/4 ; ctr
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
%ifndef NO_ZERO_COLUMN_TEST_4X4_MMX
|
||||||
|
mov eax, DWORD [DWBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
or eax, DWORD [DWBLOCK(2,0,esi,SIZEOF_JCOEF)]
|
||||||
|
jnz short .columnDCT
|
||||||
|
|
||||||
|
movq mm0, MMWORD [MMBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(2,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por mm0, MMWORD [MMBLOCK(3,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por mm1, MMWORD [MMBLOCK(5,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por mm0, MMWORD [MMBLOCK(6,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por mm1, MMWORD [MMBLOCK(7,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por mm0,mm1
|
||||||
|
packsswb mm0,mm0
|
||||||
|
movd eax,mm0
|
||||||
|
test eax,eax
|
||||||
|
jnz short .columnDCT
|
||||||
|
|
||||||
|
; -- AC terms all zero
|
||||||
|
|
||||||
|
movq mm0, MMWORD [MMBLOCK(0,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw mm0, MMWORD [MMBLOCK(0,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
|
||||||
|
psllw mm0,PASS1_BITS
|
||||||
|
|
||||||
|
movq mm2,mm0 ; mm0=in0=(00 01 02 03)
|
||||||
|
punpcklwd mm0,mm0 ; mm0=(00 00 01 01)
|
||||||
|
punpckhwd mm2,mm2 ; mm2=(02 02 03 03)
|
||||||
|
|
||||||
|
movq mm1,mm0
|
||||||
|
punpckldq mm0,mm0 ; mm0=(00 00 00 00)
|
||||||
|
punpckhdq mm1,mm1 ; mm1=(01 01 01 01)
|
||||||
|
movq mm3,mm2
|
||||||
|
punpckldq mm2,mm2 ; mm2=(02 02 02 02)
|
||||||
|
punpckhdq mm3,mm3 ; mm3=(03 03 03 03)
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(0,0,edi,SIZEOF_JCOEF)], mm0
|
||||||
|
movq MMWORD [MMBLOCK(1,0,edi,SIZEOF_JCOEF)], mm1
|
||||||
|
movq MMWORD [MMBLOCK(2,0,edi,SIZEOF_JCOEF)], mm2
|
||||||
|
movq MMWORD [MMBLOCK(3,0,edi,SIZEOF_JCOEF)], mm3
|
||||||
|
jmp near .nextcolumn
|
||||||
|
alignx 16,7
|
||||||
|
%endif
|
||||||
|
.columnDCT:
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movq mm0, MMWORD [MMBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(3,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw mm0, MMWORD [MMBLOCK(1,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
pmullw mm1, MMWORD [MMBLOCK(3,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
movq mm2, MMWORD [MMBLOCK(5,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(7,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw mm2, MMWORD [MMBLOCK(5,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
pmullw mm3, MMWORD [MMBLOCK(7,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
|
||||||
|
movq mm4,mm0
|
||||||
|
movq mm5,mm0
|
||||||
|
punpcklwd mm4,mm1
|
||||||
|
punpckhwd mm5,mm1
|
||||||
|
movq mm0,mm4
|
||||||
|
movq mm1,mm5
|
||||||
|
pmaddwd mm4,[GOTOFF(ebx,PW_F256_F089)] ; mm4=(tmp2L)
|
||||||
|
pmaddwd mm5,[GOTOFF(ebx,PW_F256_F089)] ; mm5=(tmp2H)
|
||||||
|
pmaddwd mm0,[GOTOFF(ebx,PW_F106_MF217)] ; mm0=(tmp0L)
|
||||||
|
pmaddwd mm1,[GOTOFF(ebx,PW_F106_MF217)] ; mm1=(tmp0H)
|
||||||
|
|
||||||
|
movq mm6,mm2
|
||||||
|
movq mm7,mm2
|
||||||
|
punpcklwd mm6,mm3
|
||||||
|
punpckhwd mm7,mm3
|
||||||
|
movq mm2,mm6
|
||||||
|
movq mm3,mm7
|
||||||
|
pmaddwd mm6,[GOTOFF(ebx,PW_MF060_MF050)] ; mm6=(tmp2L)
|
||||||
|
pmaddwd mm7,[GOTOFF(ebx,PW_MF060_MF050)] ; mm7=(tmp2H)
|
||||||
|
pmaddwd mm2,[GOTOFF(ebx,PW_F145_MF021)] ; mm2=(tmp0L)
|
||||||
|
pmaddwd mm3,[GOTOFF(ebx,PW_F145_MF021)] ; mm3=(tmp0H)
|
||||||
|
|
||||||
|
paddd mm6,mm4 ; mm6=tmp2L
|
||||||
|
paddd mm7,mm5 ; mm7=tmp2H
|
||||||
|
paddd mm2,mm0 ; mm2=tmp0L
|
||||||
|
paddd mm3,mm1 ; mm3=tmp0H
|
||||||
|
|
||||||
|
movq MMWORD [wk(0)], mm2 ; wk(0)=tmp0L
|
||||||
|
movq MMWORD [wk(1)], mm3 ; wk(1)=tmp0H
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movq mm4, MMWORD [MMBLOCK(0,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm5, MMWORD [MMBLOCK(2,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm0, MMWORD [MMBLOCK(6,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw mm4, MMWORD [MMBLOCK(0,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
pmullw mm5, MMWORD [MMBLOCK(2,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
pmullw mm0, MMWORD [MMBLOCK(6,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
|
||||||
|
pxor mm1,mm1
|
||||||
|
pxor mm2,mm2
|
||||||
|
punpcklwd mm1,mm4 ; mm1=tmp0L
|
||||||
|
punpckhwd mm2,mm4 ; mm2=tmp0H
|
||||||
|
psrad mm1,(16-CONST_BITS-1) ; psrad mm1,16 & pslld mm1,CONST_BITS+1
|
||||||
|
psrad mm2,(16-CONST_BITS-1) ; psrad mm2,16 & pslld mm2,CONST_BITS+1
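The `punpcklwd`/`psrad` pair above sign-extends a 16-bit coefficient and scales it by 2^(CONST_BITS+1) in one shift: the word is placed in the upper half of a zeroed dword, then shifted right arithmetically by `16-CONST_BITS-1`. A C sketch of one lane, under the assumption that the compiler converts out-of-range unsigned values to `int32_t` by wrapping and implements `>>` on negative values as an arithmetic shift (both implementation-defined in older C standards, but typical); `widen_and_scale` is a hypothetical name.

```c
#include <stdint.h>

#define CONST_BITS 13

/* Emulate one lane of: punpcklwd zero,coef ; psrad lane,(16-CONST_BITS-1) */
static int32_t widen_and_scale(int16_t coef)
{
    /* punpcklwd into a zeroed register leaves coef in the high 16 bits */
    uint32_t lane = (uint32_t)(uint16_t)coef << 16;
    /* psrad: arithmetic right shift (assumed behaviour of >> on int32_t) */
    return (int32_t)lane >> (16 - CONST_BITS - 1);
}
```

The result equals `coef * 2^(CONST_BITS+1)`, which is what the comment "psrad mm1,16 & pslld mm1,CONST_BITS+1" describes.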
|
||||||
|
|
||||||
|
movq mm3,mm5 ; mm5=in2=z2
|
||||||
|
punpcklwd mm5,mm0 ; mm0=in6=z3
|
||||||
|
punpckhwd mm3,mm0
|
||||||
|
pmaddwd mm5,[GOTOFF(ebx,PW_F184_MF076)] ; mm5=tmp2L
|
||||||
|
pmaddwd mm3,[GOTOFF(ebx,PW_F184_MF076)] ; mm3=tmp2H
|
||||||
|
|
||||||
|
movq mm4,mm1
|
||||||
|
movq mm0,mm2
|
||||||
|
paddd mm1,mm5 ; mm1=tmp10L
|
||||||
|
paddd mm2,mm3 ; mm2=tmp10H
|
||||||
|
psubd mm4,mm5 ; mm4=tmp12L
|
||||||
|
psubd mm0,mm3 ; mm0=tmp12H
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
movq mm5,mm1
|
||||||
|
movq mm3,mm2
|
||||||
|
paddd mm1,mm6 ; mm1=data0L
|
||||||
|
paddd mm2,mm7 ; mm2=data0H
|
||||||
|
psubd mm5,mm6 ; mm5=data3L
|
||||||
|
psubd mm3,mm7 ; mm3=data3H
|
||||||
|
|
||||||
|
movq mm6,[GOTOFF(ebx,PD_DESCALE_P1_4)] ; mm6=[PD_DESCALE_P1_4]
|
||||||
|
|
||||||
|
paddd mm1,mm6
|
||||||
|
paddd mm2,mm6
|
||||||
|
psrad mm1,DESCALE_P1_4
|
||||||
|
psrad mm2,DESCALE_P1_4
|
||||||
|
paddd mm5,mm6
|
||||||
|
paddd mm3,mm6
|
||||||
|
psrad mm5,DESCALE_P1_4
|
||||||
|
psrad mm3,DESCALE_P1_4
|
||||||
|
|
||||||
|
packssdw mm1,mm2 ; mm1=data0=(00 01 02 03)
|
||||||
|
packssdw mm5,mm3 ; mm5=data3=(30 31 32 33)
|
||||||
|
|
||||||
|
movq mm7, MMWORD [wk(0)] ; mm7=tmp0L
|
||||||
|
movq mm6, MMWORD [wk(1)] ; mm6=tmp0H
|
||||||
|
|
||||||
|
movq mm2,mm4
|
||||||
|
movq mm3,mm0
|
||||||
|
paddd mm4,mm7 ; mm4=data1L
|
||||||
|
paddd mm0,mm6 ; mm0=data1H
|
||||||
|
psubd mm2,mm7 ; mm2=data2L
|
||||||
|
psubd mm3,mm6 ; mm3=data2H
|
||||||
|
|
||||||
|
movq mm7,[GOTOFF(ebx,PD_DESCALE_P1_4)] ; mm7=[PD_DESCALE_P1_4]
|
||||||
|
|
||||||
|
paddd mm4,mm7
|
||||||
|
paddd mm0,mm7
|
||||||
|
psrad mm4,DESCALE_P1_4
|
||||||
|
psrad mm0,DESCALE_P1_4
|
||||||
|
paddd mm2,mm7
|
||||||
|
paddd mm3,mm7
|
||||||
|
psrad mm2,DESCALE_P1_4
|
||||||
|
psrad mm3,DESCALE_P1_4
|
||||||
|
|
||||||
|
packssdw mm4,mm0 ; mm4=data1=(10 11 12 13)
|
||||||
|
packssdw mm2,mm3 ; mm2=data2=(20 21 22 23)
|
||||||
|
|
||||||
|
movq mm6,mm1 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd mm1,mm4 ; mm1=(00 10 01 11)
|
||||||
|
punpckhwd mm6,mm4 ; mm6=(02 12 03 13)
|
||||||
|
movq mm7,mm2 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd mm2,mm5 ; mm2=(20 30 21 31)
|
||||||
|
punpckhwd mm7,mm5 ; mm7=(22 32 23 33)
|
||||||
|
|
||||||
|
movq mm0,mm1 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq mm1,mm2 ; mm1=(00 10 20 30)
|
||||||
|
punpckhdq mm0,mm2 ; mm0=(01 11 21 31)
|
||||||
|
movq mm3,mm6 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq mm6,mm7 ; mm6=(02 12 22 32)
|
||||||
|
punpckhdq mm3,mm7 ; mm3=(03 13 23 33)
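The two transpose phases above can be followed lane by lane in C. This sketch models each MMX register as four 16-bit lanes and uses hypothetical helpers `unpack_wd` and `unpack_dq` standing in for `punpcklwd`/`punpckhwd` and `punpckldq`/`punpckhdq`; it illustrates the data movement only and is not the author's code.

```c
#include <stdint.h>

/* punpcklwd (hi=0) / punpckhwd (hi=1): interleave 16-bit lanes of a and b. */
static void unpack_wd(const int16_t a[4], const int16_t b[4], int hi,
                      int16_t out[4])
{
    int base = hi ? 2 : 0;
    out[0] = a[base];     out[1] = b[base];
    out[2] = a[base + 1]; out[3] = b[base + 1];
}

/* punpckldq (hi=0) / punpckhdq (hi=1): interleave 32-bit lanes of a and b,
   each 32-bit lane being a pair of adjacent 16-bit lanes. */
static void unpack_dq(const int16_t a[4], const int16_t b[4], int hi,
                      int16_t out[4])
{
    int base = hi ? 2 : 0;
    out[0] = a[base]; out[1] = a[base + 1];
    out[2] = b[base]; out[3] = b[base + 1];
}

/* Transpose a 4x4 block of words in two phases, as in the listing. */
static void transpose4x4(const int16_t in[4][4], int16_t out[4][4])
{
    int16_t p0[4], p1[4], p2[4], p3[4];
    unpack_wd(in[0], in[1], 0, p0);   /* phase 1: (00 10 01 11) */
    unpack_wd(in[0], in[1], 1, p1);   /*          (02 12 03 13) */
    unpack_wd(in[2], in[3], 0, p2);   /*          (20 30 21 31) */
    unpack_wd(in[2], in[3], 1, p3);   /*          (22 32 23 33) */
    unpack_dq(p0, p2, 0, out[0]);     /* phase 2: (00 10 20 30) */
    unpack_dq(p0, p2, 1, out[1]);     /*          (01 11 21 31) */
    unpack_dq(p1, p3, 0, out[2]);     /*          (02 12 22 32) */
    unpack_dq(p1, p3, 1, out[3]);     /*          (03 13 23 33) */
}
```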
|
||||||
|
|
||||||
|
movq MMWORD [MMBLOCK(0,0,edi,SIZEOF_JCOEF)], mm1
|
||||||
|
movq MMWORD [MMBLOCK(1,0,edi,SIZEOF_JCOEF)], mm0
|
||||||
|
movq MMWORD [MMBLOCK(2,0,edi,SIZEOF_JCOEF)], mm6
|
||||||
|
movq MMWORD [MMBLOCK(3,0,edi,SIZEOF_JCOEF)], mm3
|
||||||
|
|
||||||
|
.nextcolumn:
|
||||||
|
add esi, byte 4*SIZEOF_JCOEF ; coef_block
|
||||||
|
add edx, byte 4*SIZEOF_ISLOW_MULT_TYPE ; quantptr
|
||||||
|
add edi, byte 4*DCTSIZE*SIZEOF_JCOEF ; wsptr
|
||||||
|
dec ecx ; ctr
|
||||||
|
jnz near .columnloop
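For comparison with the column loop above, here is a scalar C sketch of what pass 1 computes for a single column, reconstructed from the `pmaddwd` constant pairs in the listing (the assembly handles four columns per iteration). It assumes the inputs are already dequantized and held in 32-bit integers, whereas the MMX code multiplies in 16 bits with `pmullw`; the function name `idct4_column` is hypothetical, and negative intermediates rely on `>>` being an arithmetic shift.

```c
#include <stdint.h>

#define CONST_BITS 13
#define PASS1_BITS 2
#define DESCALE(x, n) (((x) + (1 << ((n) - 1))) >> (n))

/* Constants copied from the F_* table for CONST_BITS == 13. */
enum {
    F_0_211 = 1730,  F_0_509 = 4176,  F_0_601 = 4926,  F_0_765 = 6270,
    F_0_899 = 7373,  F_1_061 = 8697,  F_1_451 = 11893, F_1_847 = 15137,
    F_2_172 = 17799, F_2_562 = 20995
};

/* in[k]: dequantized coefficient in row k of one column (row 4 is unused).
   out[0..3]: the four workspace values produced for that column. */
static void idct4_column(const int32_t in[8], int32_t out[4])
{
    /* Even part */
    int32_t tmp0  = in[0] * (1 << (CONST_BITS + 1));
    int32_t tmp2  = in[2] * F_1_847 - in[6] * F_0_765;
    int32_t tmp10 = tmp0 + tmp2;
    int32_t tmp12 = tmp0 - tmp2;

    /* Odd part: each sum combines two pmaddwd results in the listing */
    int32_t odd0 = in[1] * F_1_061 - in[3] * F_2_172
                 + in[5] * F_1_451 - in[7] * F_0_211;
    int32_t odd2 = in[1] * F_2_562 + in[3] * F_0_899
                 - in[5] * F_0_601 - in[7] * F_0_509;

    /* Final output stage */
    out[0] = DESCALE(tmp10 + odd2, CONST_BITS - PASS1_BITS + 1);
    out[3] = DESCALE(tmp10 - odd2, CONST_BITS - PASS1_BITS + 1);
    out[1] = DESCALE(tmp12 + odd0, CONST_BITS - PASS1_BITS + 1);
    out[2] = DESCALE(tmp12 - odd0, CONST_BITS - PASS1_BITS + 1);
}
```

With all AC terms zero, every output reduces to the DC term shifted left by `PASS1_BITS`, which is the value the zero-column shortcut stores directly.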
|
||||||
|
|
||||||
|
; ---- Pass 2: process rows from work array, store into output array.
|
||||||
|
|
||||||
|
mov eax, [original_ebp]
|
||||||
|
lea esi, [workspace] ; JCOEF * wsptr
|
||||||
|
mov edi, JSAMPARRAY [output_buf(eax)] ; (JSAMPROW *)
|
||||||
|
mov eax, JDIMENSION [output_col(eax)]
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movq mm0, MMWORD [MMBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(3,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm2, MMWORD [MMBLOCK(5,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(7,0,esi,SIZEOF_JCOEF)]
|
||||||
|
|
||||||
|
movq mm4,mm0
|
||||||
|
movq mm5,mm0
|
||||||
|
punpcklwd mm4,mm1
|
||||||
|
punpckhwd mm5,mm1
|
||||||
|
movq mm0,mm4
|
||||||
|
movq mm1,mm5
|
||||||
|
pmaddwd mm4,[GOTOFF(ebx,PW_F256_F089)] ; mm4=(tmp2L)
|
||||||
|
pmaddwd mm5,[GOTOFF(ebx,PW_F256_F089)] ; mm5=(tmp2H)
|
||||||
|
pmaddwd mm0,[GOTOFF(ebx,PW_F106_MF217)] ; mm0=(tmp0L)
|
||||||
|
pmaddwd mm1,[GOTOFF(ebx,PW_F106_MF217)] ; mm1=(tmp0H)
|
||||||
|
|
||||||
|
movq mm6,mm2
|
||||||
|
movq mm7,mm2
|
||||||
|
punpcklwd mm6,mm3
|
||||||
|
punpckhwd mm7,mm3
|
||||||
|
movq mm2,mm6
|
||||||
|
movq mm3,mm7
|
||||||
|
pmaddwd mm6,[GOTOFF(ebx,PW_MF060_MF050)] ; mm6=(tmp2L)
|
||||||
|
pmaddwd mm7,[GOTOFF(ebx,PW_MF060_MF050)] ; mm7=(tmp2H)
|
||||||
|
pmaddwd mm2,[GOTOFF(ebx,PW_F145_MF021)] ; mm2=(tmp0L)
|
||||||
|
pmaddwd mm3,[GOTOFF(ebx,PW_F145_MF021)] ; mm3=(tmp0H)
|
||||||
|
|
||||||
|
paddd mm6,mm4 ; mm6=tmp2L
|
||||||
|
paddd mm7,mm5 ; mm7=tmp2H
|
||||||
|
paddd mm2,mm0 ; mm2=tmp0L
|
||||||
|
paddd mm3,mm1 ; mm3=tmp0H
|
||||||
|
|
||||||
|
movq MMWORD [wk(0)], mm2 ; wk(0)=tmp0L
|
||||||
|
movq MMWORD [wk(1)], mm3 ; wk(1)=tmp0H
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movq mm4, MMWORD [MMBLOCK(0,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm5, MMWORD [MMBLOCK(2,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm0, MMWORD [MMBLOCK(6,0,esi,SIZEOF_JCOEF)]
|
||||||
|
|
||||||
|
pxor mm1,mm1
|
||||||
|
pxor mm2,mm2
|
||||||
|
punpcklwd mm1,mm4 ; mm1=tmp0L
|
||||||
|
punpckhwd mm2,mm4 ; mm2=tmp0H
|
||||||
|
psrad mm1,(16-CONST_BITS-1) ; psrad mm1,16 & pslld mm1,CONST_BITS+1
|
||||||
|
psrad mm2,(16-CONST_BITS-1) ; psrad mm2,16 & pslld mm2,CONST_BITS+1
|
||||||
|
|
||||||
|
movq mm3,mm5 ; mm5=in2=z2
|
||||||
|
punpcklwd mm5,mm0 ; mm0=in6=z3
|
||||||
|
punpckhwd mm3,mm0
|
||||||
|
pmaddwd mm5,[GOTOFF(ebx,PW_F184_MF076)] ; mm5=tmp2L
|
||||||
|
pmaddwd mm3,[GOTOFF(ebx,PW_F184_MF076)] ; mm3=tmp2H
|
||||||
|
|
||||||
|
movq mm4,mm1
|
||||||
|
movq mm0,mm2
|
||||||
|
paddd mm1,mm5 ; mm1=tmp10L
|
||||||
|
paddd mm2,mm3 ; mm2=tmp10H
|
||||||
|
psubd mm4,mm5 ; mm4=tmp12L
|
||||||
|
psubd mm0,mm3 ; mm0=tmp12H
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
movq mm5,mm1
|
||||||
|
movq mm3,mm2
|
||||||
|
paddd mm1,mm6 ; mm1=data0L
|
||||||
|
paddd mm2,mm7 ; mm2=data0H
|
||||||
|
psubd mm5,mm6 ; mm5=data3L
|
||||||
|
psubd mm3,mm7 ; mm3=data3H
|
||||||
|
|
||||||
|
movq mm6,[GOTOFF(ebx,PD_DESCALE_P2_4)] ; mm6=[PD_DESCALE_P2_4]
|
||||||
|
|
||||||
|
paddd mm1,mm6
|
||||||
|
paddd mm2,mm6
|
||||||
|
psrad mm1,DESCALE_P2_4
|
||||||
|
psrad mm2,DESCALE_P2_4
|
||||||
|
paddd mm5,mm6
|
||||||
|
paddd mm3,mm6
|
||||||
|
psrad mm5,DESCALE_P2_4
|
||||||
|
psrad mm3,DESCALE_P2_4
|
||||||
|
|
||||||
|
packssdw mm1,mm2 ; mm1=data0=(00 10 20 30)
|
||||||
|
packssdw mm5,mm3 ; mm5=data3=(03 13 23 33)
|
||||||
|
|
||||||
|
movq mm7, MMWORD [wk(0)] ; mm7=tmp0L
|
||||||
|
movq mm6, MMWORD [wk(1)] ; mm6=tmp0H
|
||||||
|
|
||||||
|
movq mm2,mm4
|
||||||
|
movq mm3,mm0
|
||||||
|
paddd mm4,mm7 ; mm4=data1L
|
||||||
|
paddd mm0,mm6 ; mm0=data1H
|
||||||
|
psubd mm2,mm7 ; mm2=data2L
|
||||||
|
psubd mm3,mm6 ; mm3=data2H
|
||||||
|
|
||||||
|
movq mm7,[GOTOFF(ebx,PD_DESCALE_P2_4)] ; mm7=[PD_DESCALE_P2_4]
|
||||||
|
|
||||||
|
paddd mm4,mm7
|
||||||
|
paddd mm0,mm7
|
||||||
|
psrad mm4,DESCALE_P2_4
|
||||||
|
psrad mm0,DESCALE_P2_4
|
||||||
|
paddd mm2,mm7
|
||||||
|
paddd mm3,mm7
|
||||||
|
psrad mm2,DESCALE_P2_4
|
||||||
|
psrad mm3,DESCALE_P2_4
|
||||||
|
|
||||||
|
packssdw mm4,mm0 ; mm4=data1=(01 11 21 31)
|
||||||
|
packssdw mm2,mm3 ; mm2=data2=(02 12 22 32)
|
||||||
|
|
||||||
|
movq mm6,[GOTOFF(ebx,PB_CENTERJSAMP)] ; mm6=[PB_CENTERJSAMP]
|
||||||
|
|
||||||
|
packsswb mm1,mm2 ; mm1=(00 10 20 30 02 12 22 32)
|
||||||
|
packsswb mm4,mm5 ; mm4=(01 11 21 31 03 13 23 33)
|
||||||
|
paddb mm1,mm6
|
||||||
|
paddb mm4,mm6
|
||||||
|
|
||||||
|
movq mm7,mm1 ; transpose coefficients(phase 1)
|
||||||
|
punpcklbw mm1,mm4 ; mm1=(00 01 10 11 20 21 30 31)
|
||||||
|
punpckhbw mm7,mm4 ; mm7=(02 03 12 13 22 23 32 33)
|
||||||
|
|
||||||
|
movq mm0,mm1 ; transpose coefficients(phase 2)
|
||||||
|
punpcklwd mm1,mm7 ; mm1=(00 01 02 03 10 11 12 13)
|
||||||
|
punpckhwd mm0,mm7 ; mm0=(20 21 22 23 30 31 32 33)
|
||||||
|
|
||||||
|
mov edx, JSAMPROW [edi+0*SIZEOF_JSAMPROW]
|
||||||
|
mov esi, JSAMPROW [edi+2*SIZEOF_JSAMPROW]
|
||||||
|
movd DWORD [edx+eax*SIZEOF_JSAMPLE], mm1
|
||||||
|
movd DWORD [esi+eax*SIZEOF_JSAMPLE], mm0
|
||||||
|
|
||||||
|
psrlq mm1,4*BYTE_BIT
|
||||||
|
psrlq mm0,4*BYTE_BIT
|
||||||
|
|
||||||
|
mov edx, JSAMPROW [edi+1*SIZEOF_JSAMPROW]
|
||||||
|
mov esi, JSAMPROW [edi+3*SIZEOF_JSAMPROW]
|
||||||
|
movd DWORD [edx+eax*SIZEOF_JSAMPLE], mm1
|
||||||
|
movd DWORD [esi+eax*SIZEOF_JSAMPLE], mm0
|
||||||
|
|
||||||
|
emms ; empty MMX state
|
||||||
|
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
poppic ebx
|
||||||
|
mov esp,ebp ; esp <- aligned ebp
|
||||||
|
pop esp ; esp <- original ebp
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
;
|
||||||
|
; Perform dequantization and inverse DCT on one block of coefficients,
|
||||||
|
; producing a reduced-size 2x2 output block.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_idct_2x2_mmx (j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
|
; JCOEFPTR coef_block,
|
||||||
|
; JSAMPARRAY output_buf, JDIMENSION output_col)
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define coef_block(b) (b)+16 ; JCOEFPTR coef_block
|
||||||
|
%define output_buf(b) (b)+20 ; JSAMPARRAY output_buf
|
||||||
|
%define output_col(b) (b)+24 ; JDIMENSION output_col
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_idct_2x2_mmx)
|
||||||
|
|
||||||
|
EXTN(jpeg_idct_2x2_mmx):
|
||||||
|
push ebp
|
||||||
|
mov ebp,esp
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
|
||||||
|
; ---- Pass 1: process columns from input.
|
||||||
|
|
||||||
|
mov edx, POINTER [compptr(ebp)]
|
||||||
|
mov edx, POINTER [jcompinfo_dct_table(edx)] ; quantptr
|
||||||
|
mov esi, JCOEFPTR [coef_block(ebp)] ; inptr
|
||||||
|
|
||||||
|
; | input: | result: |
|
||||||
|
; | 00 01 ** 03 ** 05 ** 07 | |
|
||||||
|
; | 10 11 ** 13 ** 15 ** 17 | |
|
||||||
|
; | ** ** ** ** ** ** ** ** | |
|
||||||
|
; | 30 31 ** 33 ** 35 ** 37 | A0 A1 A3 A5 A7 |
|
||||||
|
; | ** ** ** ** ** ** ** ** | B0 B1 B3 B5 B7 |
|
||||||
|
; | 50 51 ** 53 ** 55 ** 57 | |
|
||||||
|
; | ** ** ** ** ** ** ** ** | |
|
||||||
|
; | 70 71 ** 73 ** 75 ** 77 | |
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movq mm0, MMWORD [MMBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(3,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw mm0, MMWORD [MMBLOCK(1,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
pmullw mm1, MMWORD [MMBLOCK(3,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
movq mm2, MMWORD [MMBLOCK(5,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(7,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw mm2, MMWORD [MMBLOCK(5,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
pmullw mm3, MMWORD [MMBLOCK(7,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
|
||||||
|
; mm0=(10 11 ** 13), mm1=(30 31 ** 33)
|
||||||
|
; mm2=(50 51 ** 53), mm3=(70 71 ** 73)
|
||||||
|
|
||||||
|
pcmpeqd mm7,mm7
|
||||||
|
pslld mm7,WORD_BIT ; mm7={0x0000 0xFFFF 0x0000 0xFFFF}
|
||||||
|
|
||||||
|
movq mm4,mm0 ; mm4=(10 11 ** 13)
|
||||||
|
movq mm5,mm2 ; mm5=(50 51 ** 53)
|
||||||
|
punpcklwd mm4,mm1 ; mm4=(10 30 11 31)
|
||||||
|
punpcklwd mm5,mm3 ; mm5=(50 70 51 71)
|
||||||
|
pmaddwd mm4,[GOTOFF(ebx,PW_F362_MF127)]
|
||||||
|
pmaddwd mm5,[GOTOFF(ebx,PW_F085_MF072)]
|
||||||
|
|
||||||
|
psrld mm0,WORD_BIT ; mm0=(11 -- 13 --)
|
||||||
|
pand mm1,mm7 ; mm1=(-- 31 -- 33)
|
||||||
|
psrld mm2,WORD_BIT ; mm2=(51 -- 53 --)
|
||||||
|
pand mm3,mm7 ; mm3=(-- 71 -- 73)
|
||||||
|
por mm0,mm1 ; mm0=(11 31 13 33)
|
||||||
|
por mm2,mm3 ; mm2=(51 71 53 73)
|
||||||
|
pmaddwd mm0,[GOTOFF(ebx,PW_F362_MF127)]
|
||||||
|
pmaddwd mm2,[GOTOFF(ebx,PW_F085_MF072)]
|
||||||
|
|
||||||
|
paddd mm4,mm5 ; mm4=tmp0[col0 col1]
|
||||||
|
|
||||||
|
movq mm6, MMWORD [MMBLOCK(1,1,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm1, MMWORD [MMBLOCK(3,1,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw mm6, MMWORD [MMBLOCK(1,1,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
pmullw mm1, MMWORD [MMBLOCK(3,1,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
movq mm3, MMWORD [MMBLOCK(5,1,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm5, MMWORD [MMBLOCK(7,1,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw mm3, MMWORD [MMBLOCK(5,1,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
pmullw mm5, MMWORD [MMBLOCK(7,1,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
|
||||||
|
; mm6=(** 15 ** 17), mm1=(** 35 ** 37)
|
||||||
|
; mm3=(** 55 ** 57), mm5=(** 75 ** 77)
|
||||||
|
|
||||||
|
psrld mm6,WORD_BIT ; mm6=(15 -- 17 --)
|
||||||
|
pand mm1,mm7 ; mm1=(-- 35 -- 37)
|
||||||
|
psrld mm3,WORD_BIT ; mm3=(55 -- 57 --)
|
||||||
|
pand mm5,mm7 ; mm5=(-- 75 -- 77)
|
||||||
|
por mm6,mm1 ; mm6=(15 35 17 37)
|
||||||
|
por mm3,mm5 ; mm3=(55 75 57 77)
|
||||||
|
pmaddwd mm6,[GOTOFF(ebx,PW_F362_MF127)]
|
||||||
|
pmaddwd mm3,[GOTOFF(ebx,PW_F085_MF072)]
|
||||||
|
|
||||||
|
paddd mm0,mm2 ; mm0=tmp0[col1 col3]
|
||||||
|
paddd mm6,mm3 ; mm6=tmp0[col5 col7]
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movq mm1, MMWORD [MMBLOCK(0,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq mm5, MMWORD [MMBLOCK(0,1,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw mm1, MMWORD [MMBLOCK(0,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
pmullw mm5, MMWORD [MMBLOCK(0,1,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
|
||||||
|
; mm1=(00 01 ** 03), mm5=(** 05 ** 07)
|
||||||
|
|
||||||
|
movq mm2,mm1 ; mm2=(00 01 ** 03)
|
||||||
|
pslld mm1,WORD_BIT ; mm1=(-- 00 -- **)
|
||||||
|
psrad mm1,(WORD_BIT-CONST_BITS-2) ; mm1=tmp10[col0 ****]
|
||||||
|
|
||||||
|
pand mm2,mm7 ; mm2=(-- 01 -- 03)
|
||||||
|
pand mm5,mm7 ; mm5=(-- 05 -- 07)
|
||||||
|
psrad mm2,(WORD_BIT-CONST_BITS-2) ; mm2=tmp10[col1 col3]
|
||||||
|
psrad mm5,(WORD_BIT-CONST_BITS-2) ; mm5=tmp10[col5 col7]
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
movq mm3,mm1
|
||||||
|
paddd mm1,mm4 ; mm1=data0[col0 ****]=(A0 **)
|
||||||
|
psubd mm3,mm4 ; mm3=data1[col0 ****]=(B0 **)
|
||||||
|
punpckldq mm1,mm3 ; mm1=(A0 B0)
|
||||||
|
|
||||||
|
movq mm7,[GOTOFF(ebx,PD_DESCALE_P1_2)] ; mm7=[PD_DESCALE_P1_2]
|
||||||
|
|
||||||
|
movq mm4,mm2
|
||||||
|
movq mm3,mm5
|
||||||
|
paddd mm2,mm0 ; mm2=data0[col1 col3]=(A1 A3)
|
||||||
|
paddd mm5,mm6 ; mm5=data0[col5 col7]=(A5 A7)
|
||||||
|
psubd mm4,mm0 ; mm4=data1[col1 col3]=(B1 B3)
|
||||||
|
psubd mm3,mm6 ; mm3=data1[col5 col7]=(B5 B7)
|
||||||
|
|
||||||
|
paddd mm1,mm7
|
||||||
|
psrad mm1,DESCALE_P1_2
|
||||||
|
|
||||||
|
paddd mm2,mm7
|
||||||
|
paddd mm5,mm7
|
||||||
|
psrad mm2,DESCALE_P1_2
|
||||||
|
psrad mm5,DESCALE_P1_2
|
||||||
|
paddd mm4,mm7
|
||||||
|
paddd mm3,mm7
|
||||||
|
psrad mm4,DESCALE_P1_2
|
||||||
|
psrad mm3,DESCALE_P1_2
|
||||||
|
|
||||||
|
; ---- Pass 2: process rows, store into output array.
|
||||||
|
|
||||||
|
mov edi, JSAMPARRAY [output_buf(ebp)] ; (JSAMPROW *)
|
||||||
|
mov eax, JDIMENSION [output_col(ebp)]
|
||||||
|
|
||||||
|
; | input:| result:|
|
||||||
|
; | A0 B0 | |
|
||||||
|
; | A1 B1 | C0 C1 |
|
||||||
|
; | A3 B3 | D0 D1 |
|
||||||
|
; | A5 B5 | |
|
||||||
|
; | A7 B7 | |
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
packssdw mm2,mm4 ; mm2=(A1 A3 B1 B3)
|
||||||
|
packssdw mm5,mm3 ; mm5=(A5 A7 B5 B7)
|
||||||
|
pmaddwd mm2,[GOTOFF(ebx,PW_F362_MF127)]
|
||||||
|
pmaddwd mm5,[GOTOFF(ebx,PW_F085_MF072)]
|
||||||
|
|
||||||
|
paddd mm2,mm5 ; mm2=tmp0[row0 row1]
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
pslld mm1,(CONST_BITS+2) ; mm1=tmp10[row0 row1]
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
movq mm0,[GOTOFF(ebx,PD_DESCALE_P2_2)] ; mm0=[PD_DESCALE_P2_2]
|
||||||
|
|
||||||
|
movq mm6,mm1
|
||||||
|
paddd mm1,mm2 ; mm1=data0[row0 row1]=(C0 C1)
|
||||||
|
psubd mm6,mm2 ; mm6=data1[row0 row1]=(D0 D1)
|
||||||
|
|
||||||
|
paddd mm1,mm0
|
||||||
|
paddd mm6,mm0
|
||||||
|
psrad mm1,DESCALE_P2_2
|
||||||
|
psrad mm6,DESCALE_P2_2
|
||||||
|
|
||||||
|
movq mm7,mm1 ; transpose coefficients
|
||||||
|
punpckldq mm1,mm6 ; mm1=(C0 D0)
|
||||||
|
punpckhdq mm7,mm6 ; mm7=(C1 D1)
|
||||||
|
|
||||||
|
packssdw mm1,mm7 ; mm1=(C0 D0 C1 D1)
|
||||||
|
packsswb mm1,mm1 ; mm1=(C0 D0 C1 D1 C0 D0 C1 D1)
|
||||||
|
paddb mm1,[GOTOFF(ebx,PB_CENTERJSAMP)]
|
||||||
|
|
||||||
|
movd ecx,mm1
|
||||||
|
movd ebx,mm1 ; ebx=(C0 D0 C1 D1)
|
||||||
|
shr ecx,2*BYTE_BIT ; ecx=(C1 D1 -- --)
|
||||||
|
|
||||||
|
mov edx, JSAMPROW [edi+0*SIZEOF_JSAMPROW]
|
||||||
|
mov esi, JSAMPROW [edi+1*SIZEOF_JSAMPROW]
|
||||||
|
mov WORD [edx+eax*SIZEOF_JSAMPLE], bx
|
||||||
|
mov WORD [esi+eax*SIZEOF_JSAMPLE], cx
|
||||||
|
|
||||||
|
emms ; empty MMX state
|
||||||
|
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; JIDCT_INT_MMX_SUPPORTED
|
||||||
|
%endif ; IDCT_SCALING_SUPPORTED
|
||||||
508
jiss2flt.asm
Normal file
@@ -0,0 +1,508 @@
|
|||||||
|
;
; jiss2flt.asm - floating-point IDCT (SSE & SSE2)
;
; x86 SIMD extension for IJG JPEG library
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler); it
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
;
; This file contains a floating-point implementation of the inverse DCT
; (Discrete Cosine Transform). The following code is based directly on
; the IJG's original jidctflt.c; see jidctflt.c for more details.
;
; Last Modified : February 4, 2006
;
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jdct.inc"
|
||||||
|
|
||||||
|
%ifdef DCT_FLOAT_SUPPORTED
|
||||||
|
%ifdef JIDCT_FLT_SSE_SSE2_SUPPORTED
|
||||||
|
|
||||||
|
; This module is specialized to the case DCTSIZE = 8.
|
||||||
|
;
|
||||||
|
%if DCTSIZE != 8
|
||||||
|
%error "Sorry, this code only copes with 8x8 DCTs."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
|
||||||
|
%macro unpcklps2 2 ; %1=(0 1 2 3) / %2=(4 5 6 7) => %1=(0 1 4 5)
|
||||||
|
shufps %1,%2,0x44
|
||||||
|
%endmacro
|
||||||
|
|
||||||
|
%macro unpckhps2 2 ; %1=(0 1 2 3) / %2=(4 5 6 7) => %1=(2 3 6 7)
|
||||||
|
shufps %1,%2,0xEE
|
||||||
|
%endmacro
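The two macros above rely on how `shufps` picks lanes from its immediate byte. The following C sketch is a scalar model of that selection, written only to check the `0x44` and `0xEE` encodings against the lane comments. The helper name `shufps_model` is hypothetical and is not part of the library.

```c
#include <stdint.h>

/* Hypothetical scalar model of `shufps dst,src,imm8`.
 * Result lanes 0-1 are taken from dst and lanes 2-3 from src;
 * each lane is chosen by one 2-bit field of imm8. */
static void shufps_model(float dst[4], const float src[4], uint8_t imm8)
{
    float out[4];
    int i;

    out[0] = dst[(imm8 >> 0) & 3];
    out[1] = dst[(imm8 >> 2) & 3];
    out[2] = src[(imm8 >> 4) & 3];
    out[3] = src[(imm8 >> 6) & 3];
    for (i = 0; i < 4; i++)
        dst[i] = out[i];
}
```

With `dst = (0 1 2 3)` and `src = (4 5 6 7)`, an immediate of `0x44` should leave `(0 1 4 5)` in `dst` and `0xEE` should leave `(2 3 6 7)`, matching the macro comments.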
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_CONST
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
global EXTN(jconst_idct_float_sse2)
|
||||||
|
|
||||||
|
EXTN(jconst_idct_float_sse2):
|
||||||
|
|
||||||
|
PD_1_414 times 4 dd 1.414213562373095048801689
|
||||||
|
PD_1_847 times 4 dd 1.847759065022573512256366
|
||||||
|
PD_1_082 times 4 dd 1.082392200292393968799446
|
||||||
|
PD_M2_613 times 4 dd -2.613125929752753055713286
|
||||||
|
PD_RNDINT_MAGIC times 4 dd 100663296.0 ; (float)(0x00C00000 << 3)
|
||||||
|
PB_CENTERJSAMP times 16 db CENTERJSAMPLE
|
||||||
|
|
||||||
|
alignz 16
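`PD_RNDINT_MAGIC` appears to implement the usual "add a large constant" rounding trick: for a float in the range [2^26, 2^27) one unit in the last place is 8, so adding 1.5 * 2^26 rounds the value to a multiple of 8 and leaves `round(x/8)` in the low 16 bits of the bit pattern. The C sketch below models one lane, assuming IEEE-754 single precision and round-to-nearest. The function name `roundint_div8` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of the PD_RNDINT_MAGIC rounding.
 * Returns round(x / 8), valid while the result fits in 16 bits. */
static int32_t roundint_div8(float x)
{
    const float magic = 100663296.0f;      /* (float)(0x00C00000 << 3) = 1.5 * 2^26 */
    volatile float sum = x + magic;        /* volatile: force single-precision rounding */
    float rounded = sum;
    uint32_t bits;
    int32_t low16;

    memcpy(&bits, &rounded, sizeof bits);  /* reinterpret the float bit pattern */
    low16 = (int32_t)(bits & 0xFFFFu);     /* same effect as the 0x0000FFFF mask */
    if (low16 >= 0x8000)                   /* read the word as signed 16-bit */
        low16 -= 0x10000;
    return low16;
}
```

The division by 8 comes for free from the float spacing, which is presumably why the row pass needs no separate descale multiply.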
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
; Perform dequantization and inverse DCT on one block of coefficients.
;
; GLOBAL(void)
; jpeg_idct_float_sse2 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
;                       JCOEFPTR coef_block,
;                       JSAMPARRAY output_buf, JDIMENSION output_col)
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define coef_block(b) (b)+16 ; JCOEFPTR coef_block
|
||||||
|
%define output_buf(b) (b)+20 ; JSAMPARRAY output_buf
|
||||||
|
%define output_col(b) (b)+24 ; JDIMENSION output_col
|
||||||
|
|
||||||
|
%define original_ebp ebp+0
|
||||||
|
%define wk(i) ebp-(WK_NUM-(i))*SIZEOF_XMMWORD ; xmmword wk[WK_NUM]
|
||||||
|
%define WK_NUM 2
|
||||||
|
%define workspace wk(0)-DCTSIZE2*SIZEOF_FAST_FLOAT
|
||||||
|
; FAST_FLOAT workspace[DCTSIZE2]
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_idct_float_sse2)
|
||||||
|
|
||||||
|
EXTN(jpeg_idct_float_sse2):
|
||||||
|
push ebp
|
||||||
|
mov eax,esp ; eax = original ebp
|
||||||
|
sub esp, byte 4
|
||||||
|
and esp, byte (-SIZEOF_XMMWORD) ; align to 128 bits
|
||||||
|
mov [esp],eax
|
||||||
|
mov ebp,esp ; ebp = aligned ebp
|
||||||
|
lea esp, [workspace]
|
||||||
|
push ebx
|
||||||
|
; push ecx ; need not be preserved
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
|
||||||
|
; ---- Pass 1: process columns from input, store into work array.
|
||||||
|
|
||||||
|
; mov eax, [original_ebp]
|
||||||
|
mov edx, POINTER [compptr(eax)]
|
||||||
|
mov edx, POINTER [jcompinfo_dct_table(edx)] ; quantptr
|
||||||
|
mov esi, JCOEFPTR [coef_block(eax)] ; inptr
|
||||||
|
lea edi, [workspace] ; FAST_FLOAT * wsptr
|
||||||
|
mov ecx, DCTSIZE/4 ; ctr
|
||||||
|
alignx 16,7
|
||||||
|
.columnloop:
|
||||||
|
%ifndef NO_ZERO_COLUMN_TEST_FLOAT_SSE
|
||||||
|
mov eax, DWORD [DWBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
or eax, DWORD [DWBLOCK(2,0,esi,SIZEOF_JCOEF)]
|
||||||
|
jnz near .columnDCT
|
||||||
|
|
||||||
|
movq xmm1, _MMWORD [MMBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq xmm2, _MMWORD [MMBLOCK(2,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq xmm3, _MMWORD [MMBLOCK(3,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq xmm4, _MMWORD [MMBLOCK(4,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq xmm5, _MMWORD [MMBLOCK(5,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq xmm6, _MMWORD [MMBLOCK(6,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq xmm7, _MMWORD [MMBLOCK(7,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por xmm1,xmm2
|
||||||
|
por xmm3,xmm4
|
||||||
|
por xmm5,xmm6
|
||||||
|
por xmm1,xmm3
|
||||||
|
por xmm5,xmm7
|
||||||
|
por xmm1,xmm5
|
||||||
|
packsswb xmm1,xmm1
|
||||||
|
movd eax,xmm1
|
||||||
|
test eax,eax
|
||||||
|
jnz short .columnDCT
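The test above ORs the AC rows of four columns together and branches to the full DCT if anything is nonzero (the first two `DWORD` loads are only an early exit for the common case). A scalar C equivalent might look like the sketch below. The name `group_has_ac` and the row-major `coef` layout are assumptions made for illustration.

```c
#include <stdint.h>

enum { DCTSIZE = 8 };

/* Hypothetical scalar equivalent of the zero-column test: returns 1 if any
 * AC coefficient (rows 1..7) in columns first_col..first_col+3 is nonzero. */
static int group_has_ac(const int16_t coef[DCTSIZE * DCTSIZE], int first_col)
{
    int16_t acc = 0;
    int row, col;

    for (row = 1; row < DCTSIZE; row++)
        for (col = first_col; col < first_col + 4; col++)
            acc |= coef[row * DCTSIZE + col];   /* mirrors the por chain */
    return acc != 0;
}
```

When the function returns 0, only the DC row contributes, so the column output is the dequantized DC value replicated down each column, which is what the "AC terms all zero" path stores.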
|
||||||
|
|
||||||
|
; -- AC terms all zero
|
||||||
|
|
||||||
|
movq xmm0, _MMWORD [MMBLOCK(0,0,esi,SIZEOF_JCOEF)]
|
||||||
|
|
||||||
|
punpcklwd xmm0,xmm0 ; xmm0=(00 00 01 01 02 02 03 03)
|
||||||
|
psrad xmm0,(DWORD_BIT-WORD_BIT) ; xmm0=in0=(00 01 02 03)
|
||||||
|
cvtdq2ps xmm0,xmm0 ; xmm0=in0=(00 01 02 03)
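The `punpcklwd` / `psrad` / `cvtdq2ps` sequence above widens signed 16-bit coefficients to floats: duplicating each word into both halves of a dword and then shifting right arithmetically by 16 sign-extends it. A one-lane C model, with hypothetical helper names, could be:

```c
#include <stdint.h>

/* Hypothetical one-lane model of punpcklwd x,x followed by psrad x,16. */
static int32_t sign_extend_word(int16_t w)
{
    uint32_t u = (uint16_t)w;
    uint32_t dup = (u << 16) | u;        /* word copied into both halves */
    int32_t hi = (int32_t)(dup >> 16);   /* upper word, not yet sign-filled */

    if (hi >= 0x8000)                    /* psrad fills with the sign bit */
        hi -= 0x10000;
    return hi;
}

/* cvtdq2ps then converts the signed dword to single precision. */
static float coef_to_float(int16_t w)
{
    return (float)sign_extend_word(w);
}
```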
|
||||||
|
|
||||||
|
mulps xmm0, XMMWORD [XMMBLOCK(0,0,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
|
||||||
|
movaps xmm1,xmm0
|
||||||
|
movaps xmm2,xmm0
|
||||||
|
movaps xmm3,xmm0
|
||||||
|
|
||||||
|
shufps xmm0,xmm0,0x00 ; xmm0=(00 00 00 00)
|
||||||
|
shufps xmm1,xmm1,0x55 ; xmm1=(01 01 01 01)
|
||||||
|
shufps xmm2,xmm2,0xAA ; xmm2=(02 02 02 02)
|
||||||
|
shufps xmm3,xmm3,0xFF ; xmm3=(03 03 03 03)
|
||||||
|
|
||||||
|
movaps XMMWORD [XMMBLOCK(0,0,edi,SIZEOF_FAST_FLOAT)], xmm0
|
||||||
|
movaps XMMWORD [XMMBLOCK(0,1,edi,SIZEOF_FAST_FLOAT)], xmm0
|
||||||
|
movaps XMMWORD [XMMBLOCK(1,0,edi,SIZEOF_FAST_FLOAT)], xmm1
|
||||||
|
movaps XMMWORD [XMMBLOCK(1,1,edi,SIZEOF_FAST_FLOAT)], xmm1
|
||||||
|
movaps XMMWORD [XMMBLOCK(2,0,edi,SIZEOF_FAST_FLOAT)], xmm2
|
||||||
|
movaps XMMWORD [XMMBLOCK(2,1,edi,SIZEOF_FAST_FLOAT)], xmm2
|
||||||
|
movaps XMMWORD [XMMBLOCK(3,0,edi,SIZEOF_FAST_FLOAT)], xmm3
|
||||||
|
movaps XMMWORD [XMMBLOCK(3,1,edi,SIZEOF_FAST_FLOAT)], xmm3
|
||||||
|
jmp near .nextcolumn
|
||||||
|
alignx 16,7
|
||||||
|
%endif
|
||||||
|
.columnDCT:
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movq xmm0, _MMWORD [MMBLOCK(0,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq xmm1, _MMWORD [MMBLOCK(2,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq xmm2, _MMWORD [MMBLOCK(4,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq xmm3, _MMWORD [MMBLOCK(6,0,esi,SIZEOF_JCOEF)]
|
||||||
|
|
||||||
|
punpcklwd xmm0,xmm0 ; xmm0=(00 00 01 01 02 02 03 03)
|
||||||
|
punpcklwd xmm1,xmm1 ; xmm1=(20 20 21 21 22 22 23 23)
|
||||||
|
psrad xmm0,(DWORD_BIT-WORD_BIT) ; xmm0=in0=(00 01 02 03)
|
||||||
|
psrad xmm1,(DWORD_BIT-WORD_BIT) ; xmm1=in2=(20 21 22 23)
|
||||||
|
cvtdq2ps xmm0,xmm0 ; xmm0=in0=(00 01 02 03)
|
||||||
|
cvtdq2ps xmm1,xmm1 ; xmm1=in2=(20 21 22 23)
|
||||||
|
|
||||||
|
punpcklwd xmm2,xmm2 ; xmm2=(40 40 41 41 42 42 43 43)
|
||||||
|
punpcklwd xmm3,xmm3 ; xmm3=(60 60 61 61 62 62 63 63)
|
||||||
|
psrad xmm2,(DWORD_BIT-WORD_BIT) ; xmm2=in4=(40 41 42 43)
|
||||||
|
psrad xmm3,(DWORD_BIT-WORD_BIT) ; xmm3=in6=(60 61 62 63)
|
||||||
|
cvtdq2ps xmm2,xmm2 ; xmm2=in4=(40 41 42 43)
|
||||||
|
cvtdq2ps xmm3,xmm3 ; xmm3=in6=(60 61 62 63)
|
||||||
|
|
||||||
|
mulps xmm0, XMMWORD [XMMBLOCK(0,0,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
mulps xmm1, XMMWORD [XMMBLOCK(2,0,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
mulps xmm2, XMMWORD [XMMBLOCK(4,0,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
mulps xmm3, XMMWORD [XMMBLOCK(6,0,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
|
||||||
|
movaps xmm4,xmm0
|
||||||
|
movaps xmm5,xmm1
|
||||||
|
subps xmm0,xmm2 ; xmm0=tmp11
|
||||||
|
subps xmm1,xmm3
|
||||||
|
addps xmm4,xmm2 ; xmm4=tmp10
|
||||||
|
addps xmm5,xmm3 ; xmm5=tmp13
|
||||||
|
|
||||||
|
mulps xmm1,[GOTOFF(ebx,PD_1_414)]
|
||||||
|
subps xmm1,xmm5 ; xmm1=tmp12
|
||||||
|
|
||||||
|
movaps xmm6,xmm4
|
||||||
|
movaps xmm7,xmm0
|
||||||
|
subps xmm4,xmm5 ; xmm4=tmp3
|
||||||
|
subps xmm0,xmm1 ; xmm0=tmp2
|
||||||
|
addps xmm6,xmm5 ; xmm6=tmp0
|
||||||
|
addps xmm7,xmm1 ; xmm7=tmp1
|
||||||
|
|
||||||
|
movaps XMMWORD [wk(1)], xmm4 ; tmp3
|
||||||
|
movaps XMMWORD [wk(0)], xmm0 ; tmp2
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movq xmm2, _MMWORD [MMBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq xmm3, _MMWORD [MMBLOCK(3,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq xmm5, _MMWORD [MMBLOCK(5,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movq xmm1, _MMWORD [MMBLOCK(7,0,esi,SIZEOF_JCOEF)]
|
||||||
|
|
||||||
|
punpcklwd xmm2,xmm2 ; xmm2=(10 10 11 11 12 12 13 13)
|
||||||
|
punpcklwd xmm3,xmm3 ; xmm3=(30 30 31 31 32 32 33 33)
|
||||||
|
psrad xmm2,(DWORD_BIT-WORD_BIT) ; xmm2=in1=(10 11 12 13)
|
||||||
|
psrad xmm3,(DWORD_BIT-WORD_BIT) ; xmm3=in3=(30 31 32 33)
|
||||||
|
cvtdq2ps xmm2,xmm2 ; xmm2=in1=(10 11 12 13)
|
||||||
|
cvtdq2ps xmm3,xmm3 ; xmm3=in3=(30 31 32 33)
|
||||||
|
|
||||||
|
punpcklwd xmm5,xmm5 ; xmm5=(50 50 51 51 52 52 53 53)
|
||||||
|
punpcklwd xmm1,xmm1 ; xmm1=(70 70 71 71 72 72 73 73)
|
||||||
|
psrad xmm5,(DWORD_BIT-WORD_BIT) ; xmm5=in5=(50 51 52 53)
|
||||||
|
psrad xmm1,(DWORD_BIT-WORD_BIT) ; xmm1=in7=(70 71 72 73)
|
||||||
|
cvtdq2ps xmm5,xmm5 ; xmm5=in5=(50 51 52 53)
|
||||||
|
cvtdq2ps xmm1,xmm1 ; xmm1=in7=(70 71 72 73)
|
||||||
|
|
||||||
|
mulps xmm2, XMMWORD [XMMBLOCK(1,0,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
mulps xmm3, XMMWORD [XMMBLOCK(3,0,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
mulps xmm5, XMMWORD [XMMBLOCK(5,0,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
mulps xmm1, XMMWORD [XMMBLOCK(7,0,edx,SIZEOF_FLOAT_MULT_TYPE)]
|
||||||
|
|
||||||
|
movaps xmm4,xmm2
|
||||||
|
movaps xmm0,xmm5
|
||||||
|
addps xmm2,xmm1 ; xmm2=z11
|
||||||
|
addps xmm5,xmm3 ; xmm5=z13
|
||||||
|
subps xmm4,xmm1 ; xmm4=z12
|
||||||
|
subps xmm0,xmm3 ; xmm0=z10
|
||||||
|
|
||||||
|
movaps xmm1,xmm2
|
||||||
|
subps xmm2,xmm5
|
||||||
|
addps xmm1,xmm5 ; xmm1=tmp7
|
||||||
|
|
||||||
|
mulps xmm2,[GOTOFF(ebx,PD_1_414)] ; xmm2=tmp11
|
||||||
|
|
||||||
|
movaps xmm3,xmm0
|
||||||
|
addps xmm0,xmm4
|
||||||
|
mulps xmm0,[GOTOFF(ebx,PD_1_847)] ; xmm0=z5
|
||||||
|
mulps xmm3,[GOTOFF(ebx,PD_M2_613)] ; xmm3=(z10 * -2.613125930)
|
||||||
|
mulps xmm4,[GOTOFF(ebx,PD_1_082)] ; xmm4=(z12 * 1.082392200)
|
||||||
|
addps xmm3,xmm0 ; xmm3=tmp12
|
||||||
|
subps xmm4,xmm0 ; xmm4=tmp10
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
subps xmm3,xmm1 ; xmm3=tmp6
|
||||||
|
movaps xmm5,xmm6
|
||||||
|
movaps xmm0,xmm7
|
||||||
|
addps xmm6,xmm1 ; xmm6=data0=(00 01 02 03)
|
||||||
|
addps xmm7,xmm3 ; xmm7=data1=(10 11 12 13)
|
||||||
|
subps xmm5,xmm1 ; xmm5=data7=(70 71 72 73)
|
||||||
|
subps xmm0,xmm3 ; xmm0=data6=(60 61 62 63)
|
||||||
|
subps xmm2,xmm3 ; xmm2=tmp5
|
||||||
|
|
||||||
|
movaps xmm1,xmm6 ; transpose coefficients(phase 1)
|
||||||
|
unpcklps xmm6,xmm7 ; xmm6=(00 10 01 11)
|
||||||
|
unpckhps xmm1,xmm7 ; xmm1=(02 12 03 13)
|
||||||
|
movaps xmm3,xmm0 ; transpose coefficients(phase 1)
|
||||||
|
unpcklps xmm0,xmm5 ; xmm0=(60 70 61 71)
|
||||||
|
unpckhps xmm3,xmm5 ; xmm3=(62 72 63 73)
|
||||||
|
|
||||||
|
movaps xmm7, XMMWORD [wk(0)] ; xmm7=tmp2
|
||||||
|
movaps xmm5, XMMWORD [wk(1)] ; xmm5=tmp3
|
||||||
|
|
||||||
|
movaps XMMWORD [wk(0)], xmm0 ; wk(0)=(60 70 61 71)
|
||||||
|
movaps XMMWORD [wk(1)], xmm3 ; wk(1)=(62 72 63 73)
|
||||||
|
|
||||||
|
addps xmm4,xmm2 ; xmm4=tmp4
|
||||||
|
movaps xmm0,xmm7
|
||||||
|
movaps xmm3,xmm5
|
||||||
|
addps xmm7,xmm2 ; xmm7=data2=(20 21 22 23)
|
||||||
|
addps xmm5,xmm4 ; xmm5=data4=(40 41 42 43)
|
||||||
|
subps xmm0,xmm2 ; xmm0=data5=(50 51 52 53)
|
||||||
|
subps xmm3,xmm4 ; xmm3=data3=(30 31 32 33)
|
||||||
|
|
||||||
|
movaps xmm2,xmm7 ; transpose coefficients(phase 1)
|
||||||
|
unpcklps xmm7,xmm3 ; xmm7=(20 30 21 31)
|
||||||
|
unpckhps xmm2,xmm3 ; xmm2=(22 32 23 33)
|
||||||
|
movaps xmm4,xmm5 ; transpose coefficients(phase 1)
|
||||||
|
unpcklps xmm5,xmm0 ; xmm5=(40 50 41 51)
|
||||||
|
unpckhps xmm4,xmm0 ; xmm4=(42 52 43 53)
|
||||||
|
|
||||||
|
movaps xmm3,xmm6 ; transpose coefficients(phase 2)
|
||||||
|
unpcklps2 xmm6,xmm7 ; xmm6=(00 10 20 30)
|
||||||
|
unpckhps2 xmm3,xmm7 ; xmm3=(01 11 21 31)
|
||||||
|
movaps xmm0,xmm1 ; transpose coefficients(phase 2)
|
||||||
|
unpcklps2 xmm1,xmm2 ; xmm1=(02 12 22 32)
|
||||||
|
unpckhps2 xmm0,xmm2 ; xmm0=(03 13 23 33)
|
||||||
|
|
||||||
|
movaps xmm7, XMMWORD [wk(0)] ; xmm7=(60 70 61 71)
|
||||||
|
movaps xmm2, XMMWORD [wk(1)] ; xmm2=(62 72 63 73)
|
||||||
|
|
||||||
|
movaps XMMWORD [XMMBLOCK(0,0,edi,SIZEOF_FAST_FLOAT)], xmm6
|
||||||
|
movaps XMMWORD [XMMBLOCK(1,0,edi,SIZEOF_FAST_FLOAT)], xmm3
|
||||||
|
movaps XMMWORD [XMMBLOCK(2,0,edi,SIZEOF_FAST_FLOAT)], xmm1
|
||||||
|
movaps XMMWORD [XMMBLOCK(3,0,edi,SIZEOF_FAST_FLOAT)], xmm0
|
||||||
|
|
||||||
|
movaps xmm6,xmm5 ; transpose coefficients(phase 2)
|
||||||
|
unpcklps2 xmm5,xmm7 ; xmm5=(40 50 60 70)
|
||||||
|
unpckhps2 xmm6,xmm7 ; xmm6=(41 51 61 71)
|
||||||
|
movaps xmm3,xmm4 ; transpose coefficients(phase 2)
|
||||||
|
unpcklps2 xmm4,xmm2 ; xmm4=(42 52 62 72)
|
||||||
|
unpckhps2 xmm3,xmm2 ; xmm3=(43 53 63 73)
|
||||||
|
|
||||||
|
movaps XMMWORD [XMMBLOCK(0,1,edi,SIZEOF_FAST_FLOAT)], xmm5
|
||||||
|
movaps XMMWORD [XMMBLOCK(1,1,edi,SIZEOF_FAST_FLOAT)], xmm6
|
||||||
|
movaps XMMWORD [XMMBLOCK(2,1,edi,SIZEOF_FAST_FLOAT)], xmm4
|
||||||
|
movaps XMMWORD [XMMBLOCK(3,1,edi,SIZEOF_FAST_FLOAT)], xmm3
|
||||||
|
|
||||||
|
.nextcolumn:
|
||||||
|
add esi, byte 4*SIZEOF_JCOEF ; coef_block
|
||||||
|
add edx, byte 4*SIZEOF_FLOAT_MULT_TYPE ; quantptr
|
||||||
|
add edi, 4*DCTSIZE*SIZEOF_FAST_FLOAT ; wsptr
|
||||||
|
dec ecx ; ctr
|
||||||
|
jnz near .columnloop
|
||||||
|
|
||||||
|
; -- Prefetch the next coefficient block
|
||||||
|
|
||||||
|
prefetchnta [esi + (DCTSIZE2-8)*SIZEOF_JCOEF + 0*32]
|
||||||
|
prefetchnta [esi + (DCTSIZE2-8)*SIZEOF_JCOEF + 1*32]
|
||||||
|
prefetchnta [esi + (DCTSIZE2-8)*SIZEOF_JCOEF + 2*32]
|
||||||
|
prefetchnta [esi + (DCTSIZE2-8)*SIZEOF_JCOEF + 3*32]
|
||||||
|
|
||||||
|
; ---- Pass 2: process rows from work array, store into output array.
|
||||||
|
|
||||||
|
mov eax, [original_ebp]
|
||||||
|
lea esi, [workspace] ; FAST_FLOAT * wsptr
|
||||||
|
mov edi, JSAMPARRAY [output_buf(eax)] ; (JSAMPROW *)
|
||||||
|
mov eax, JDIMENSION [output_col(eax)]
|
||||||
|
mov ecx, DCTSIZE/4 ; ctr
|
||||||
|
alignx 16,7
|
||||||
|
.rowloop:
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movaps xmm0, XMMWORD [XMMBLOCK(0,0,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
movaps xmm1, XMMWORD [XMMBLOCK(2,0,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
movaps xmm2, XMMWORD [XMMBLOCK(4,0,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
movaps xmm3, XMMWORD [XMMBLOCK(6,0,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
|
||||||
|
movaps xmm4,xmm0
|
||||||
|
movaps xmm5,xmm1
|
||||||
|
subps xmm0,xmm2 ; xmm0=tmp11
|
||||||
|
subps xmm1,xmm3
|
||||||
|
addps xmm4,xmm2 ; xmm4=tmp10
|
||||||
|
addps xmm5,xmm3 ; xmm5=tmp13
|
||||||
|
|
||||||
|
mulps xmm1,[GOTOFF(ebx,PD_1_414)]
|
||||||
|
subps xmm1,xmm5 ; xmm1=tmp12
|
||||||
|
|
||||||
|
movaps xmm6,xmm4
|
||||||
|
movaps xmm7,xmm0
|
||||||
|
subps xmm4,xmm5 ; xmm4=tmp3
|
||||||
|
subps xmm0,xmm1 ; xmm0=tmp2
|
||||||
|
addps xmm6,xmm5 ; xmm6=tmp0
|
||||||
|
addps xmm7,xmm1 ; xmm7=tmp1
|
||||||
|
|
||||||
|
movaps XMMWORD [wk(1)], xmm4 ; tmp3
|
||||||
|
movaps XMMWORD [wk(0)], xmm0 ; tmp2
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movaps xmm2, XMMWORD [XMMBLOCK(1,0,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
movaps xmm3, XMMWORD [XMMBLOCK(3,0,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
movaps xmm5, XMMWORD [XMMBLOCK(5,0,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
movaps xmm1, XMMWORD [XMMBLOCK(7,0,esi,SIZEOF_FAST_FLOAT)]
|
||||||
|
|
||||||
|
movaps xmm4,xmm2
|
||||||
|
movaps xmm0,xmm5
|
||||||
|
addps xmm2,xmm1 ; xmm2=z11
|
||||||
|
addps xmm5,xmm3 ; xmm5=z13
|
||||||
|
subps xmm4,xmm1 ; xmm4=z12
|
||||||
|
subps xmm0,xmm3 ; xmm0=z10
|
||||||
|
|
||||||
|
movaps xmm1,xmm2
|
||||||
|
subps xmm2,xmm5
|
||||||
|
addps xmm1,xmm5 ; xmm1=tmp7
|
||||||
|
|
||||||
|
mulps xmm2,[GOTOFF(ebx,PD_1_414)] ; xmm2=tmp11
|
||||||
|
|
||||||
|
movaps xmm3,xmm0
|
||||||
|
addps xmm0,xmm4
|
||||||
|
mulps xmm0,[GOTOFF(ebx,PD_1_847)] ; xmm0=z5
|
||||||
|
mulps xmm3,[GOTOFF(ebx,PD_M2_613)] ; xmm3=(z10 * -2.613125930)
|
||||||
|
mulps xmm4,[GOTOFF(ebx,PD_1_082)] ; xmm4=(z12 * 1.082392200)
|
||||||
|
addps xmm3,xmm0 ; xmm3=tmp12
|
||||||
|
subps xmm4,xmm0 ; xmm4=tmp10
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
subps xmm3,xmm1 ; xmm3=tmp6
|
||||||
|
movaps xmm5,xmm6
|
||||||
|
movaps xmm0,xmm7
|
||||||
|
addps xmm6,xmm1 ; xmm6=data0=(00 10 20 30)
|
||||||
|
addps xmm7,xmm3 ; xmm7=data1=(01 11 21 31)
|
||||||
|
subps xmm5,xmm1 ; xmm5=data7=(07 17 27 37)
|
||||||
|
subps xmm0,xmm3 ; xmm0=data6=(06 16 26 36)
|
||||||
|
subps xmm2,xmm3 ; xmm2=tmp5
|
||||||
|
|
||||||
|
movaps xmm1,[GOTOFF(ebx,PD_RNDINT_MAGIC)] ; xmm1=[PD_RNDINT_MAGIC]
|
||||||
|
pcmpeqd xmm3,xmm3
|
||||||
|
psrld xmm3,WORD_BIT ; xmm3={0xFFFF 0x0000 0xFFFF 0x0000 ..}
|
||||||
|
|
||||||
|
addps xmm6,xmm1 ; xmm6=roundint(data0/8)=(00 ** 10 ** 20 ** 30 **)
|
||||||
|
addps xmm7,xmm1 ; xmm7=roundint(data1/8)=(01 ** 11 ** 21 ** 31 **)
|
||||||
|
addps xmm0,xmm1 ; xmm0=roundint(data6/8)=(06 ** 16 ** 26 ** 36 **)
|
||||||
|
addps xmm5,xmm1 ; xmm5=roundint(data7/8)=(07 ** 17 ** 27 ** 37 **)
|
||||||
|
|
||||||
|
pand xmm6,xmm3 ; xmm6=(00 -- 10 -- 20 -- 30 --)
|
||||||
|
pslld xmm7,WORD_BIT ; xmm7=(-- 01 -- 11 -- 21 -- 31)
|
||||||
|
pand xmm0,xmm3 ; xmm0=(06 -- 16 -- 26 -- 36 --)
|
||||||
|
pslld xmm5,WORD_BIT ; xmm5=(-- 07 -- 17 -- 27 -- 37)
|
||||||
|
por xmm6,xmm7 ; xmm6=(00 01 10 11 20 21 30 31)
|
||||||
|
por xmm0,xmm5 ; xmm0=(06 07 16 17 26 27 36 37)
|
||||||
|
|
||||||
|
movaps xmm1, XMMWORD [wk(0)] ; xmm1=tmp2
|
||||||
|
movaps xmm3, XMMWORD [wk(1)] ; xmm3=tmp3
|
||||||
|
|
||||||
|
addps xmm4,xmm2 ; xmm4=tmp4
|
||||||
|
movaps xmm7,xmm1
|
||||||
|
movaps xmm5,xmm3
|
||||||
|
addps xmm1,xmm2 ; xmm1=data2=(02 12 22 32)
|
||||||
|
addps xmm3,xmm4 ; xmm3=data4=(04 14 24 34)
|
||||||
|
subps xmm7,xmm2 ; xmm7=data5=(05 15 25 35)
|
||||||
|
subps xmm5,xmm4 ; xmm5=data3=(03 13 23 33)
|
||||||
|
|
||||||
|
movaps xmm2,[GOTOFF(ebx,PD_RNDINT_MAGIC)] ; xmm2=[PD_RNDINT_MAGIC]
|
||||||
|
pcmpeqd xmm4,xmm4
|
||||||
|
psrld xmm4,WORD_BIT ; xmm4={0xFFFF 0x0000 0xFFFF 0x0000 ..}
|
||||||
|
|
||||||
|
addps xmm3,xmm2 ; xmm3=roundint(data4/8)=(04 ** 14 ** 24 ** 34 **)
|
||||||
|
addps xmm7,xmm2 ; xmm7=roundint(data5/8)=(05 ** 15 ** 25 ** 35 **)
|
||||||
|
addps xmm1,xmm2 ; xmm1=roundint(data2/8)=(02 ** 12 ** 22 ** 32 **)
|
||||||
|
addps xmm5,xmm2 ; xmm5=roundint(data3/8)=(03 ** 13 ** 23 ** 33 **)
|
||||||
|
|
||||||
|
pand xmm3,xmm4 ; xmm3=(04 -- 14 -- 24 -- 34 --)
|
||||||
|
pslld xmm7,WORD_BIT ; xmm7=(-- 05 -- 15 -- 25 -- 35)
|
||||||
|
pand xmm1,xmm4 ; xmm1=(02 -- 12 -- 22 -- 32 --)
|
||||||
|
pslld xmm5,WORD_BIT ; xmm5=(-- 03 -- 13 -- 23 -- 33)
|
||||||
|
por xmm3,xmm7 ; xmm3=(04 05 14 15 24 25 34 35)
|
||||||
|
por xmm1,xmm5 ; xmm1=(02 03 12 13 22 23 32 33)
|
||||||
|
|
||||||
|
movdqa xmm2,[GOTOFF(ebx,PB_CENTERJSAMP)] ; xmm2=[PB_CENTERJSAMP]
|
||||||
|
|
||||||
|
packsswb xmm6,xmm3 ; xmm6=(00 01 10 11 20 21 30 31 04 05 14 15 24 25 34 35)
|
||||||
|
packsswb xmm1,xmm0 ; xmm1=(02 03 12 13 22 23 32 33 06 07 16 17 26 27 36 37)
|
||||||
|
paddb xmm6,xmm2
|
||||||
|
paddb xmm1,xmm2
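The `packsswb` / `paddb` pair above doubles as the range limiter: signed saturation clamps each value to [-128, 127], and adding `CENTERJSAMPLE` as a wrapping byte add maps that to [0, 255]. The sketch below models one sample, assuming 8-bit samples with `CENTERJSAMPLE` equal to 128. The name `clamp_and_center` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical one-sample model of packsswb followed by paddb CENTERJSAMPLE.
 * Assumes 8-bit samples, i.e. CENTERJSAMPLE == 128. */
static uint8_t clamp_and_center(int16_t v)
{
    int sat = v > 127 ? 127 : (v < -128 ? -128 : v);   /* signed saturation */
    return (uint8_t)(sat + 128);                        /* level shift to 0..255 */
}
```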
|
||||||
|
|
||||||
|
movdqa xmm4,xmm6 ; transpose coefficients(phase 2)
|
||||||
|
punpcklwd xmm6,xmm1 ; xmm6=(00 01 02 03 10 11 12 13 20 21 22 23 30 31 32 33)
|
||||||
|
punpckhwd xmm4,xmm1 ; xmm4=(04 05 06 07 14 15 16 17 24 25 26 27 34 35 36 37)
|
||||||
|
|
||||||
|
movdqa xmm7,xmm6 ; transpose coefficients(phase 3)
|
||||||
|
punpckldq xmm6,xmm4 ; xmm6=(00 01 02 03 04 05 06 07 10 11 12 13 14 15 16 17)
|
||||||
|
punpckhdq xmm7,xmm4 ; xmm7=(20 21 22 23 24 25 26 27 30 31 32 33 34 35 36 37)
|
||||||
|
|
||||||
|
pshufd xmm5,xmm6,0x4E ; xmm5=(10 11 12 13 14 15 16 17 00 01 02 03 04 05 06 07)
|
||||||
|
pshufd xmm3,xmm7,0x4E ; xmm3=(30 31 32 33 34 35 36 37 20 21 22 23 24 25 26 27)
|
||||||
|
|
||||||
|
pushpic ebx ; save GOT address
|
||||||
|
|
||||||
|
mov edx, JSAMPROW [edi+0*SIZEOF_JSAMPROW]
|
||||||
|
mov ebx, JSAMPROW [edi+2*SIZEOF_JSAMPROW]
|
||||||
|
movq _MMWORD [edx+eax*SIZEOF_JSAMPLE], xmm6
|
||||||
|
movq _MMWORD [ebx+eax*SIZEOF_JSAMPLE], xmm7
|
||||||
|
mov edx, JSAMPROW [edi+1*SIZEOF_JSAMPROW]
|
||||||
|
mov ebx, JSAMPROW [edi+3*SIZEOF_JSAMPROW]
|
||||||
|
movq _MMWORD [edx+eax*SIZEOF_JSAMPLE], xmm5
|
||||||
|
movq _MMWORD [ebx+eax*SIZEOF_JSAMPLE], xmm3
|
||||||
|
|
||||||
|
poppic ebx ; restore GOT address
|
||||||
|
|
||||||
|
add esi, byte 4*SIZEOF_FAST_FLOAT ; wsptr
|
||||||
|
add edi, byte 4*SIZEOF_JSAMPROW
|
||||||
|
dec ecx ; ctr
|
||||||
|
jnz near .rowloop
|
||||||
|
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; need not be preserved
|
||||||
|
pop ebx
|
||||||
|
mov esp,ebp ; esp <- aligned ebp
|
||||||
|
pop esp ; esp <- original ebp
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; JIDCT_FLT_SSE_SSE2_SUPPORTED
|
||||||
|
%endif ; DCT_FLOAT_SUPPORTED
|
||||||
512
jiss2fst.asm
Normal file
@@ -0,0 +1,512 @@
|
|||||||
|
;
; jiss2fst.asm - fast integer IDCT (SSE2)
;
; x86 SIMD extension for IJG JPEG library
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler); it
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
;
; This file contains a fast, less accurate integer implementation of
; the inverse DCT (Discrete Cosine Transform). The following code is
; based directly on the IJG's original jidctfst.c; see jidctfst.c
; for more details.
;
; Last Modified : February 4, 2006
;
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jdct.inc"
|
||||||
|
|
||||||
|
%ifdef DCT_IFAST_SUPPORTED
|
||||||
|
%ifdef JIDCT_INT_SSE2_SUPPORTED
|
||||||
|
|
||||||
|
; This module is specialized to the case DCTSIZE = 8.
|
||||||
|
;
|
||||||
|
%if DCTSIZE != 8
|
||||||
|
%error "Sorry, this code only copes with 8x8 DCTs."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
|
||||||
|
%define CONST_BITS 8 ; 14 is also OK.
|
||||||
|
%define PASS1_BITS 2
|
||||||
|
|
||||||
|
%if IFAST_SCALE_BITS != PASS1_BITS
|
||||||
|
%error "'IFAST_SCALE_BITS' must be equal to 'PASS1_BITS'."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
%if CONST_BITS == 8
|
||||||
|
F_1_082 equ 277 ; FIX(1.082392200)
|
||||||
|
F_1_414 equ 362 ; FIX(1.414213562)
|
||||||
|
F_1_847 equ 473 ; FIX(1.847759065)
|
||||||
|
F_2_613 equ 669 ; FIX(2.613125930)
|
||||||
|
F_1_613 equ (F_2_613 - 256) ; FIX(2.613125930) - FIX(1)
|
||||||
|
%else
|
||||||
|
; NASM cannot do compile-time arithmetic on floating-point constants.
|
||||||
|
%define DESCALE(x,n) (((x)+(1<<((n)-1)))>>(n))
|
||||||
|
F_1_082 equ DESCALE(1162209775,30-CONST_BITS) ; FIX(1.082392200)
|
||||||
|
F_1_414 equ DESCALE(1518500249,30-CONST_BITS) ; FIX(1.414213562)
|
||||||
|
F_1_847 equ DESCALE(1984016188,30-CONST_BITS) ; FIX(1.847759065)
|
||||||
|
F_2_613 equ DESCALE(2805822602,30-CONST_BITS) ; FIX(2.613125930)
|
||||||
|
F_1_613 equ (F_2_613 - (1 << CONST_BITS)) ; FIX(2.613125930) - FIX(1)
|
||||||
|
%endif
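The `%else` branch derives each constant from a 30-bit fixed-point integer because NASM cannot fold floating-point expressions at assembly time. The C sketch below repeats the same `DESCALE` arithmetic so the hard-coded `CONST_BITS == 8` values can be cross-checked. The function name `fix_from_q30` is hypothetical.

```c
#include <stdint.h>

/* Same rounding right shift as the NASM DESCALE macro, done in 64 bits
 * because 2805822602 does not fit in a signed 32-bit integer. */
#define DESCALE(x, n) (((x) + ((int64_t)1 << ((n) - 1))) >> (n))

/* Hypothetical helper: convert a constant scaled by 2^30 to CONST_BITS scale. */
static int32_t fix_from_q30(int64_t q30, int const_bits)
{
    return (int32_t)DESCALE(q30, 30 - const_bits);
}
```

For example, `fix_from_q30(1518500249, 8)` should reproduce `F_1_414 = 362`, and the other three inputs should give 277, 473 and 669.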
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_CONST
|
||||||
|
|
||||||
|
; PRE_MULTIPLY_SCALE_BITS <= 2 (to avoid overflow)
|
||||||
|
; CONST_BITS + CONST_SHIFT + PRE_MULTIPLY_SCALE_BITS == 16 (for pmulhw)
|
||||||
|
|
||||||
|
%define PRE_MULTIPLY_SCALE_BITS 2
|
||||||
|
%define CONST_SHIFT (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)
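The constraint `CONST_BITS + CONST_SHIFT + PRE_MULTIPLY_SCALE_BITS == 16` exists because `pmulhw` keeps only the high 16 bits of each product, which is a built-in right shift by 16. The C sketch below models one lane under that assumption. The names `pmulhw_lane` and `mul_fix` are hypothetical, and the caller is assumed to keep `x << 2` within 16 bits.

```c
#include <stdint.h>

enum {
    CONST_BITS = 8,
    PRE_MULTIPLY_SCALE_BITS = 2,
    CONST_SHIFT = 16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS   /* = 6 */
};

/* One pmulhw lane: floor((a * b) / 2^16) of the signed 32-bit product. */
static int32_t pmulhw_lane(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;
    return p >= 0 ? p / 65536 : -((-p + 65535) / 65536);
}

/* Hypothetical helper: x * fix / 2^CONST_BITS, rounded toward minus infinity. */
static int32_t mul_fix(int16_t x, int32_t fix)
{
    int16_t pre = (int16_t)(x * (1 << PRE_MULTIPLY_SCALE_BITS));   /* psllw */
    int16_t c   = (int16_t)(fix * (1 << CONST_SHIFT));             /* table entry */
    return pmulhw_lane(pre, c);
}
```

The two left shifts add up to `16 - CONST_BITS` bits, so the implicit shift by 16 leaves exactly the `CONST_BITS` descale that the C reference performs explicitly.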
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
global EXTN(jconst_idct_ifast_sse2)
|
||||||
|
|
||||||
|
EXTN(jconst_idct_ifast_sse2):
|
||||||
|
|
||||||
|
PW_F1414 times 8 dw F_1_414 << CONST_SHIFT
|
||||||
|
PW_F1847 times 8 dw F_1_847 << CONST_SHIFT
|
||||||
|
PW_MF1613 times 8 dw -F_1_613 << CONST_SHIFT
|
||||||
|
PW_F1082 times 8 dw F_1_082 << CONST_SHIFT
|
||||||
|
PB_CENTERJSAMP times 16 db CENTERJSAMPLE
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
; Perform dequantization and inverse DCT on one block of coefficients.
;
; GLOBAL(void)
; jpeg_idct_ifast_sse2 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
;                       JCOEFPTR coef_block,
;                       JSAMPARRAY output_buf, JDIMENSION output_col)
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define coef_block(b) (b)+16 ; JCOEFPTR coef_block
|
||||||
|
%define output_buf(b) (b)+20 ; JSAMPARRAY output_buf
|
||||||
|
%define output_col(b) (b)+24 ; JDIMENSION output_col
|
||||||
|
|
||||||
|
%define original_ebp ebp+0
|
||||||
|
%define wk(i) ebp-(WK_NUM-(i))*SIZEOF_XMMWORD ; xmmword wk[WK_NUM]
|
||||||
|
%define WK_NUM 2
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_idct_ifast_sse2)
|
||||||
|
|
||||||
|
EXTN(jpeg_idct_ifast_sse2):
|
||||||
|
push ebp
|
||||||
|
mov eax,esp ; eax = original ebp
|
||||||
|
sub esp, byte 4
|
||||||
|
and esp, byte (-SIZEOF_XMMWORD) ; align to 128 bits
|
||||||
|
mov [esp],eax
|
||||||
|
mov ebp,esp ; ebp = aligned ebp
|
||||||
|
lea esp, [wk(0)]
|
||||||
|
pushpic ebx
|
||||||
|
; push ecx ; unused
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
|
||||||
|
; ---- Pass 1: process columns from input.
|
||||||
|
|
||||||
|
; mov eax, [original_ebp]
|
||||||
|
mov edx, POINTER [compptr(eax)]
|
||||||
|
mov edx, POINTER [jcompinfo_dct_table(edx)] ; quantptr
|
||||||
|
mov esi, JCOEFPTR [coef_block(eax)] ; inptr
|
||||||
|
|
||||||
|
%ifndef NO_ZERO_COLUMN_TEST_IFAST_SSE2
|
||||||
|
mov eax, DWORD [DWBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
or eax, DWORD [DWBLOCK(2,0,esi,SIZEOF_JCOEF)]
|
||||||
|
jnz near .columnDCT
|
||||||
|
|
||||||
|
movdqa xmm0, XMMWORD [XMMBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movdqa xmm1, XMMWORD [XMMBLOCK(2,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por xmm0, XMMWORD [XMMBLOCK(3,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por xmm1, XMMWORD [XMMBLOCK(4,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por xmm0, XMMWORD [XMMBLOCK(5,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por xmm1, XMMWORD [XMMBLOCK(6,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por xmm0, XMMWORD [XMMBLOCK(7,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por xmm1,xmm0
|
||||||
|
packsswb xmm1,xmm1
|
||||||
|
packsswb xmm1,xmm1
|
||||||
|
movd eax,xmm1
|
||||||
|
test eax,eax
|
||||||
|
jnz short .columnDCT
|
||||||
|
|
||||||
|
; -- AC terms all zero
|
||||||
|
|
||||||
|
movdqa xmm0, XMMWORD [XMMBLOCK(0,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw xmm0, XMMWORD [XMMBLOCK(0,0,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
|
||||||
|
movdqa xmm7,xmm0 ; xmm0=in0=(00 01 02 03 04 05 06 07)
|
||||||
|
punpcklwd xmm0,xmm0 ; xmm0=(00 00 01 01 02 02 03 03)
|
||||||
|
punpckhwd xmm7,xmm7 ; xmm7=(04 04 05 05 06 06 07 07)
|
||||||
|
|
||||||
|
pshufd xmm6,xmm0,0x00 ; xmm6=col0=(00 00 00 00 00 00 00 00)
|
||||||
|
pshufd xmm2,xmm0,0x55 ; xmm2=col1=(01 01 01 01 01 01 01 01)
|
||||||
|
pshufd xmm5,xmm0,0xAA ; xmm5=col2=(02 02 02 02 02 02 02 02)
|
||||||
|
pshufd xmm0,xmm0,0xFF ; xmm0=col3=(03 03 03 03 03 03 03 03)
|
||||||
|
pshufd xmm1,xmm7,0x00 ; xmm1=col4=(04 04 04 04 04 04 04 04)
|
||||||
|
pshufd xmm4,xmm7,0x55 ; xmm4=col5=(05 05 05 05 05 05 05 05)
|
||||||
|
pshufd xmm3,xmm7,0xAA ; xmm3=col6=(06 06 06 06 06 06 06 06)
|
||||||
|
pshufd xmm7,xmm7,0xFF ; xmm7=col7=(07 07 07 07 07 07 07 07)
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(0)], xmm2 ; wk(0)=col1
|
||||||
|
movdqa XMMWORD [wk(1)], xmm0 ; wk(1)=col3
|
||||||
|
jmp near .column_end
|
||||||
|
alignx 16,7
|
||||||
|
%endif
|
||||||
|
.columnDCT:
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movdqa xmm0, XMMWORD [XMMBLOCK(0,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movdqa xmm1, XMMWORD [XMMBLOCK(2,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw xmm0, XMMWORD [XMMBLOCK(0,0,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
pmullw xmm1, XMMWORD [XMMBLOCK(2,0,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
movdqa xmm2, XMMWORD [XMMBLOCK(4,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movdqa xmm3, XMMWORD [XMMBLOCK(6,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw xmm2, XMMWORD [XMMBLOCK(4,0,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
pmullw xmm3, XMMWORD [XMMBLOCK(6,0,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
|
||||||
|
movdqa xmm4,xmm0
|
||||||
|
movdqa xmm5,xmm1
|
||||||
|
psubw xmm0,xmm2 ; xmm0=tmp11
|
||||||
|
psubw xmm1,xmm3
|
||||||
|
paddw xmm4,xmm2 ; xmm4=tmp10
|
||||||
|
paddw xmm5,xmm3 ; xmm5=tmp13
|
||||||
|
|
||||||
|
psllw xmm1,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
pmulhw xmm1,[GOTOFF(ebx,PW_F1414)]
|
||||||
|
psubw xmm1,xmm5 ; xmm1=tmp12
|
||||||
|
|
||||||
|
movdqa xmm6,xmm4
|
||||||
|
movdqa xmm7,xmm0
|
||||||
|
psubw xmm4,xmm5 ; xmm4=tmp3
|
||||||
|
psubw xmm0,xmm1 ; xmm0=tmp2
|
||||||
|
paddw xmm6,xmm5 ; xmm6=tmp0
|
||||||
|
paddw xmm7,xmm1 ; xmm7=tmp1
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(1)], xmm4 ; wk(1)=tmp3
|
||||||
|
movdqa XMMWORD [wk(0)], xmm0 ; wk(0)=tmp2
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movdqa xmm2, XMMWORD [XMMBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movdqa xmm3, XMMWORD [XMMBLOCK(3,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw xmm2, XMMWORD [XMMBLOCK(1,0,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
pmullw xmm3, XMMWORD [XMMBLOCK(3,0,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
movdqa xmm5, XMMWORD [XMMBLOCK(5,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movdqa xmm1, XMMWORD [XMMBLOCK(7,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw xmm5, XMMWORD [XMMBLOCK(5,0,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
pmullw xmm1, XMMWORD [XMMBLOCK(7,0,edx,SIZEOF_IFAST_MULT_TYPE)]
|
||||||
|
|
||||||
|
movdqa xmm4,xmm2
|
||||||
|
movdqa xmm0,xmm5
|
||||||
|
psubw xmm2,xmm1 ; xmm2=z12
|
||||||
|
psubw xmm5,xmm3 ; xmm5=z10
|
||||||
|
paddw xmm4,xmm1 ; xmm4=z11
|
||||||
|
paddw xmm0,xmm3 ; xmm0=z13
|
||||||
|
|
||||||
|
movdqa xmm1,xmm5 ; xmm1=z10(unscaled)
|
||||||
|
psllw xmm2,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
psllw xmm5,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
|
||||||
|
movdqa xmm3,xmm4
|
||||||
|
psubw xmm4,xmm0
|
||||||
|
paddw xmm3,xmm0 ; xmm3=tmp7
|
||||||
|
|
||||||
|
psllw xmm4,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
pmulhw xmm4,[GOTOFF(ebx,PW_F1414)] ; xmm4=tmp11
|
||||||
|
|
||||||
|
; To avoid overflow...
|
||||||
|
;
|
||||||
|
; (Original)
|
||||||
|
; tmp12 = -2.613125930 * z10 + z5;
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; tmp12 = (-1.613125930 - 1) * z10 + z5;
|
||||||
|
; = -1.613125930 * z10 - z10 + z5;
|
||||||
|
|
||||||
|
movdqa xmm0,xmm5
|
||||||
|
paddw xmm5,xmm2
|
||||||
|
pmulhw xmm5,[GOTOFF(ebx,PW_F1847)] ; xmm5=z5
|
||||||
|
pmulhw xmm0,[GOTOFF(ebx,PW_MF1613)]
|
||||||
|
pmulhw xmm2,[GOTOFF(ebx,PW_F1082)]
|
||||||
|
psubw xmm0,xmm1
|
||||||
|
psubw xmm2,xmm5 ; xmm2=tmp10
|
||||||
|
paddw xmm0,xmm5 ; xmm0=tmp12
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
psubw xmm0,xmm3 ; xmm0=tmp6
|
||||||
|
movdqa xmm1,xmm6
|
||||||
|
movdqa xmm5,xmm7
|
||||||
|
paddw xmm6,xmm3 ; xmm6=data0=(00 01 02 03 04 05 06 07)
|
||||||
|
paddw xmm7,xmm0 ; xmm7=data1=(10 11 12 13 14 15 16 17)
|
||||||
|
psubw xmm1,xmm3 ; xmm1=data7=(70 71 72 73 74 75 76 77)
|
||||||
|
psubw xmm5,xmm0 ; xmm5=data6=(60 61 62 63 64 65 66 67)
|
||||||
|
psubw xmm4,xmm0 ; xmm4=tmp5
|
||||||
|
|
||||||
|
movdqa xmm3,xmm6 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd xmm6,xmm7 ; xmm6=(00 10 01 11 02 12 03 13)
|
||||||
|
punpckhwd xmm3,xmm7 ; xmm3=(04 14 05 15 06 16 07 17)
|
||||||
|
movdqa xmm0,xmm5 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd xmm5,xmm1 ; xmm5=(60 70 61 71 62 72 63 73)
|
||||||
|
punpckhwd xmm0,xmm1 ; xmm0=(64 74 65 75 66 76 67 77)
|
||||||
|
|
||||||
|
movdqa xmm7, XMMWORD [wk(0)] ; xmm7=tmp2
|
||||||
|
movdqa xmm1, XMMWORD [wk(1)] ; xmm1=tmp3
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(0)], xmm5 ; wk(0)=(60 70 61 71 62 72 63 73)
|
||||||
|
movdqa XMMWORD [wk(1)], xmm0 ; wk(1)=(64 74 65 75 66 76 67 77)
|
||||||
|
|
||||||
|
paddw xmm2,xmm4 ; xmm2=tmp4
|
||||||
|
movdqa xmm5,xmm7
|
||||||
|
movdqa xmm0,xmm1
|
||||||
|
paddw xmm7,xmm4 ; xmm7=data2=(20 21 22 23 24 25 26 27)
|
||||||
|
paddw xmm1,xmm2 ; xmm1=data4=(40 41 42 43 44 45 46 47)
|
||||||
|
psubw xmm5,xmm4 ; xmm5=data5=(50 51 52 53 54 55 56 57)
|
||||||
|
psubw xmm0,xmm2 ; xmm0=data3=(30 31 32 33 34 35 36 37)
|
||||||
|
|
||||||
|
movdqa xmm4,xmm7 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd xmm7,xmm0 ; xmm7=(20 30 21 31 22 32 23 33)
|
||||||
|
punpckhwd xmm4,xmm0 ; xmm4=(24 34 25 35 26 36 27 37)
|
||||||
|
movdqa xmm2,xmm1 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd xmm1,xmm5 ; xmm1=(40 50 41 51 42 52 43 53)
|
||||||
|
punpckhwd xmm2,xmm5 ; xmm2=(44 54 45 55 46 56 47 57)
|
||||||
|
|
||||||
|
movdqa xmm0,xmm3 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq xmm3,xmm4 ; xmm3=(04 14 24 34 05 15 25 35)
|
||||||
|
punpckhdq xmm0,xmm4 ; xmm0=(06 16 26 36 07 17 27 37)
|
||||||
|
movdqa xmm5,xmm6 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq xmm6,xmm7 ; xmm6=(00 10 20 30 01 11 21 31)
|
||||||
|
punpckhdq xmm5,xmm7 ; xmm5=(02 12 22 32 03 13 23 33)
|
||||||
|
|
||||||
|
movdqa xmm4, XMMWORD [wk(0)] ; xmm4=(60 70 61 71 62 72 63 73)
|
||||||
|
movdqa xmm7, XMMWORD [wk(1)] ; xmm7=(64 74 65 75 66 76 67 77)
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(0)], xmm3 ; wk(0)=(04 14 24 34 05 15 25 35)
|
||||||
|
movdqa XMMWORD [wk(1)], xmm0 ; wk(1)=(06 16 26 36 07 17 27 37)
|
||||||
|
|
||||||
|
movdqa xmm3,xmm1 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq xmm1,xmm4 ; xmm1=(40 50 60 70 41 51 61 71)
|
||||||
|
punpckhdq xmm3,xmm4 ; xmm3=(42 52 62 72 43 53 63 73)
|
||||||
|
movdqa xmm0,xmm2 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq xmm2,xmm7 ; xmm2=(44 54 64 74 45 55 65 75)
|
||||||
|
punpckhdq xmm0,xmm7 ; xmm0=(46 56 66 76 47 57 67 77)
|
||||||
|
|
||||||
|
movdqa xmm4,xmm6 ; transpose coefficients(phase 3)
|
||||||
|
punpcklqdq xmm6,xmm1 ; xmm6=col0=(00 10 20 30 40 50 60 70)
|
||||||
|
punpckhqdq xmm4,xmm1 ; xmm4=col1=(01 11 21 31 41 51 61 71)
|
||||||
|
movdqa xmm7,xmm5 ; transpose coefficients(phase 3)
|
||||||
|
punpcklqdq xmm5,xmm3 ; xmm5=col2=(02 12 22 32 42 52 62 72)
|
||||||
|
punpckhqdq xmm7,xmm3 ; xmm7=col3=(03 13 23 33 43 53 63 73)
|
||||||
|
|
||||||
|
movdqa xmm1, XMMWORD [wk(0)] ; xmm1=(04 14 24 34 05 15 25 35)
|
||||||
|
movdqa xmm3, XMMWORD [wk(1)] ; xmm3=(06 16 26 36 07 17 27 37)
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(0)], xmm4 ; wk(0)=col1
|
||||||
|
movdqa XMMWORD [wk(1)], xmm7 ; wk(1)=col3
|
||||||
|
|
||||||
|
movdqa xmm4,xmm1 ; transpose coefficients(phase 3)
|
||||||
|
punpcklqdq xmm1,xmm2 ; xmm1=col4=(04 14 24 34 44 54 64 74)
|
||||||
|
punpckhqdq xmm4,xmm2 ; xmm4=col5=(05 15 25 35 45 55 65 75)
|
||||||
|
movdqa xmm7,xmm3 ; transpose coefficients(phase 3)
|
||||||
|
punpcklqdq xmm3,xmm0 ; xmm3=col6=(06 16 26 36 46 56 66 76)
|
||||||
|
punpckhqdq xmm7,xmm0 ; xmm7=col7=(07 17 27 37 47 57 67 77)
|
||||||
|
.column_end:
|
||||||
|
|
||||||
|
; -- Prefetch the next coefficient block
|
||||||
|
|
||||||
|
prefetchnta [esi + DCTSIZE2*SIZEOF_JCOEF + 0*32]
|
||||||
|
prefetchnta [esi + DCTSIZE2*SIZEOF_JCOEF + 1*32]
|
||||||
|
prefetchnta [esi + DCTSIZE2*SIZEOF_JCOEF + 2*32]
|
||||||
|
prefetchnta [esi + DCTSIZE2*SIZEOF_JCOEF + 3*32]
|
||||||
|
|
||||||
|
; ---- Pass 2: process rows from work array, store into output array.
|
||||||
|
|
||||||
|
mov eax, [original_ebp]
|
||||||
|
mov edi, JSAMPARRAY [output_buf(eax)] ; (JSAMPROW *)
|
||||||
|
mov eax, JDIMENSION [output_col(eax)]
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
; xmm6=col0, xmm5=col2, xmm1=col4, xmm3=col6
|
||||||
|
|
||||||
|
movdqa xmm2,xmm6
|
||||||
|
movdqa xmm0,xmm5
|
||||||
|
psubw xmm6,xmm1 ; xmm6=tmp11
|
||||||
|
psubw xmm5,xmm3
|
||||||
|
paddw xmm2,xmm1 ; xmm2=tmp10
|
||||||
|
paddw xmm0,xmm3 ; xmm0=tmp13
|
||||||
|
|
||||||
|
psllw xmm5,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
pmulhw xmm5,[GOTOFF(ebx,PW_F1414)]
|
||||||
|
psubw xmm5,xmm0 ; xmm5=tmp12
|
||||||
|
|
||||||
|
movdqa xmm1,xmm2
|
||||||
|
movdqa xmm3,xmm6
|
||||||
|
psubw xmm2,xmm0 ; xmm2=tmp3
|
||||||
|
psubw xmm6,xmm5 ; xmm6=tmp2
|
||||||
|
paddw xmm1,xmm0 ; xmm1=tmp0
|
||||||
|
paddw xmm3,xmm5 ; xmm3=tmp1
|
||||||
|
|
||||||
|
movdqa xmm0, XMMWORD [wk(0)] ; xmm0=col1
|
||||||
|
movdqa xmm5, XMMWORD [wk(1)] ; xmm5=col3
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(0)], xmm2 ; wk(0)=tmp3
|
||||||
|
movdqa XMMWORD [wk(1)], xmm6 ; wk(1)=tmp2
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
; xmm0=col1, xmm5=col3, xmm4=col5, xmm7=col7
|
||||||
|
|
||||||
|
movdqa xmm2,xmm0
|
||||||
|
movdqa xmm6,xmm4
|
||||||
|
psubw xmm0,xmm7 ; xmm0=z12
|
||||||
|
psubw xmm4,xmm5 ; xmm4=z10
|
||||||
|
paddw xmm2,xmm7 ; xmm2=z11
|
||||||
|
paddw xmm6,xmm5 ; xmm6=z13
|
||||||
|
|
||||||
|
movdqa xmm7,xmm4 ; xmm7=z10(unscaled)
|
||||||
|
psllw xmm0,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
psllw xmm4,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
|
||||||
|
movdqa xmm5,xmm2
|
||||||
|
psubw xmm2,xmm6
|
||||||
|
paddw xmm5,xmm6 ; xmm5=tmp7
|
||||||
|
|
||||||
|
psllw xmm2,PRE_MULTIPLY_SCALE_BITS
|
||||||
|
pmulhw xmm2,[GOTOFF(ebx,PW_F1414)] ; xmm2=tmp11
|
||||||
|
|
||||||
|
; To avoid overflow...
|
||||||
|
;
|
||||||
|
; (Original)
|
||||||
|
; tmp12 = -2.613125930 * z10 + z5;
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; tmp12 = (-1.613125930 - 1) * z10 + z5;
|
||||||
|
; = -1.613125930 * z10 - z10 + z5;
|
||||||
|
|
||||||
|
movdqa xmm6,xmm4
|
||||||
|
paddw xmm4,xmm0
|
||||||
|
pmulhw xmm4,[GOTOFF(ebx,PW_F1847)] ; xmm4=z5
|
||||||
|
pmulhw xmm6,[GOTOFF(ebx,PW_MF1613)]
|
||||||
|
pmulhw xmm0,[GOTOFF(ebx,PW_F1082)]
|
||||||
|
psubw xmm6,xmm7
|
||||||
|
psubw xmm0,xmm4 ; xmm0=tmp10
|
||||||
|
paddw xmm6,xmm4 ; xmm6=tmp12
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
psubw xmm6,xmm5 ; xmm6=tmp6
|
||||||
|
movdqa xmm7,xmm1
|
||||||
|
movdqa xmm4,xmm3
|
||||||
|
paddw xmm1,xmm5 ; xmm1=data0=(00 10 20 30 40 50 60 70)
|
||||||
|
paddw xmm3,xmm6 ; xmm3=data1=(01 11 21 31 41 51 61 71)
|
||||||
|
psraw xmm1,(PASS1_BITS+3) ; descale
|
||||||
|
psraw xmm3,(PASS1_BITS+3) ; descale
|
||||||
|
psubw xmm7,xmm5 ; xmm7=data7=(07 17 27 37 47 57 67 77)
|
||||||
|
psubw xmm4,xmm6 ; xmm4=data6=(06 16 26 36 46 56 66 76)
|
||||||
|
psraw xmm7,(PASS1_BITS+3) ; descale
|
||||||
|
psraw xmm4,(PASS1_BITS+3) ; descale
|
||||||
|
psubw xmm2,xmm6 ; xmm2=tmp5
|
||||||
|
|
||||||
|
packsswb xmm1,xmm4 ; xmm1=(00 10 20 30 40 50 60 70 06 16 26 36 46 56 66 76)
|
||||||
|
packsswb xmm3,xmm7 ; xmm3=(01 11 21 31 41 51 61 71 07 17 27 37 47 57 67 77)
|
||||||
|
|
||||||
|
movdqa xmm5, XMMWORD [wk(1)] ; xmm5=tmp2
|
||||||
|
movdqa xmm6, XMMWORD [wk(0)] ; xmm6=tmp3
|
||||||
|
|
||||||
|
paddw xmm0,xmm2 ; xmm0=tmp4
|
||||||
|
movdqa xmm4,xmm5
|
||||||
|
movdqa xmm7,xmm6
|
||||||
|
paddw xmm5,xmm2 ; xmm5=data2=(02 12 22 32 42 52 62 72)
|
||||||
|
paddw xmm6,xmm0 ; xmm6=data4=(04 14 24 34 44 54 64 74)
|
||||||
|
psraw xmm5,(PASS1_BITS+3) ; descale
|
||||||
|
psraw xmm6,(PASS1_BITS+3) ; descale
|
||||||
|
psubw xmm4,xmm2 ; xmm4=data5=(05 15 25 35 45 55 65 75)
|
||||||
|
psubw xmm7,xmm0 ; xmm7=data3=(03 13 23 33 43 53 63 73)
|
||||||
|
psraw xmm4,(PASS1_BITS+3) ; descale
|
||||||
|
psraw xmm7,(PASS1_BITS+3) ; descale
|
||||||
|
|
||||||
|
movdqa xmm2,[GOTOFF(ebx,PB_CENTERJSAMP)] ; xmm2=[PB_CENTERJSAMP]
|
||||||
|
|
||||||
|
packsswb xmm5,xmm6 ; xmm5=(02 12 22 32 42 52 62 72 04 14 24 34 44 54 64 74)
|
||||||
|
packsswb xmm7,xmm4 ; xmm7=(03 13 23 33 43 53 63 73 05 15 25 35 45 55 65 75)
|
||||||
|
|
||||||
|
paddb xmm1,xmm2
|
||||||
|
paddb xmm3,xmm2
|
||||||
|
paddb xmm5,xmm2
|
||||||
|
paddb xmm7,xmm2
|
||||||
|
|
||||||
|
movdqa xmm0,xmm1 ; transpose coefficients(phase 1)
|
||||||
|
punpcklbw xmm1,xmm3 ; xmm1=(00 01 10 11 20 21 30 31 40 41 50 51 60 61 70 71)
|
||||||
|
punpckhbw xmm0,xmm3 ; xmm0=(06 07 16 17 26 27 36 37 46 47 56 57 66 67 76 77)
|
||||||
|
movdqa xmm6,xmm5 ; transpose coefficients(phase 1)
|
||||||
|
punpcklbw xmm5,xmm7 ; xmm5=(02 03 12 13 22 23 32 33 42 43 52 53 62 63 72 73)
|
||||||
|
punpckhbw xmm6,xmm7 ; xmm6=(04 05 14 15 24 25 34 35 44 45 54 55 64 65 74 75)
|
||||||
|
|
||||||
|
movdqa xmm4,xmm1 ; transpose coefficients(phase 2)
|
||||||
|
punpcklwd xmm1,xmm5 ; xmm1=(00 01 02 03 10 11 12 13 20 21 22 23 30 31 32 33)
|
||||||
|
punpckhwd xmm4,xmm5 ; xmm4=(40 41 42 43 50 51 52 53 60 61 62 63 70 71 72 73)
|
||||||
|
movdqa xmm2,xmm6 ; transpose coefficients(phase 2)
|
||||||
|
punpcklwd xmm6,xmm0 ; xmm6=(04 05 06 07 14 15 16 17 24 25 26 27 34 35 36 37)
|
||||||
|
punpckhwd xmm2,xmm0 ; xmm2=(44 45 46 47 54 55 56 57 64 65 66 67 74 75 76 77)
|
||||||
|
|
||||||
|
movdqa xmm3,xmm1 ; transpose coefficients(phase 3)
|
||||||
|
punpckldq xmm1,xmm6 ; xmm1=(00 01 02 03 04 05 06 07 10 11 12 13 14 15 16 17)
|
||||||
|
punpckhdq xmm3,xmm6 ; xmm3=(20 21 22 23 24 25 26 27 30 31 32 33 34 35 36 37)
|
||||||
|
movdqa xmm7,xmm4 ; transpose coefficients(phase 3)
|
||||||
|
punpckldq xmm4,xmm2 ; xmm4=(40 41 42 43 44 45 46 47 50 51 52 53 54 55 56 57)
|
||||||
|
punpckhdq xmm7,xmm2 ; xmm7=(60 61 62 63 64 65 66 67 70 71 72 73 74 75 76 77)
|
||||||
|
|
||||||
|
pshufd xmm5,xmm1,0x4E ; xmm5=(10 11 12 13 14 15 16 17 00 01 02 03 04 05 06 07)
|
||||||
|
pshufd xmm0,xmm3,0x4E ; xmm0=(30 31 32 33 34 35 36 37 20 21 22 23 24 25 26 27)
|
||||||
|
pshufd xmm6,xmm4,0x4E ; xmm6=(50 51 52 53 54 55 56 57 40 41 42 43 44 45 46 47)
|
||||||
|
pshufd xmm2,xmm7,0x4E ; xmm2=(70 71 72 73 74 75 76 77 60 61 62 63 64 65 66 67)
|
||||||
|
|
||||||
|
mov edx, JSAMPROW [edi+0*SIZEOF_JSAMPROW]
|
||||||
|
mov esi, JSAMPROW [edi+2*SIZEOF_JSAMPROW]
|
||||||
|
movq _MMWORD [edx+eax*SIZEOF_JSAMPLE], xmm1
|
||||||
|
movq _MMWORD [esi+eax*SIZEOF_JSAMPLE], xmm3
|
||||||
|
mov edx, JSAMPROW [edi+4*SIZEOF_JSAMPROW]
|
||||||
|
mov esi, JSAMPROW [edi+6*SIZEOF_JSAMPROW]
|
||||||
|
movq _MMWORD [edx+eax*SIZEOF_JSAMPLE], xmm4
|
||||||
|
movq _MMWORD [esi+eax*SIZEOF_JSAMPLE], xmm7
|
||||||
|
|
||||||
|
mov edx, JSAMPROW [edi+1*SIZEOF_JSAMPROW]
|
||||||
|
mov esi, JSAMPROW [edi+3*SIZEOF_JSAMPROW]
|
||||||
|
movq _MMWORD [edx+eax*SIZEOF_JSAMPLE], xmm5
|
||||||
|
movq _MMWORD [esi+eax*SIZEOF_JSAMPLE], xmm0
|
||||||
|
mov edx, JSAMPROW [edi+5*SIZEOF_JSAMPROW]
|
||||||
|
mov esi, JSAMPROW [edi+7*SIZEOF_JSAMPROW]
|
||||||
|
movq _MMWORD [edx+eax*SIZEOF_JSAMPLE], xmm6
|
||||||
|
movq _MMWORD [esi+eax*SIZEOF_JSAMPLE], xmm2
|
||||||
|
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; unused
|
||||||
|
poppic ebx
|
||||||
|
mov esp,ebp ; esp <- aligned ebp
|
||||||
|
pop esp ; esp <- original ebp
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; JIDCT_INT_SSE2_SUPPORTED
|
||||||
|
%endif ; DCT_IFAST_SUPPORTED
|
||||||
869
jiss2int.asm
Normal file
@@ -0,0 +1,869 @@
|
|||||||
|
;
|
||||||
|
; jiss2int.asm - accurate integer IDCT (SSE2)
|
||||||
|
;
|
||||||
|
; x86 SIMD extension for IJG JPEG library
|
||||||
|
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
; For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
;
|
||||||
|
; This file should be assembled with NASM (Netwide Assembler),
|
||||||
|
; and can *not* be assembled with Microsoft's MASM or any compatible
|
||||||
|
; assembler (including Borland's Turbo Assembler).
|
||||||
|
; NASM is available from http://nasm.sourceforge.net/ or
|
||||||
|
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||||
|
;
|
||||||
|
; This file contains a slow-but-accurate integer implementation of the
|
||||||
|
; inverse DCT (Discrete Cosine Transform). The following code is based
|
||||||
|
; directly on the IJG's original jidctint.c; see jidctint.c for
|
||||||
|
; more details.
|
||||||
|
;
|
||||||
|
; Last Modified : February 4, 2006
|
||||||
|
;
|
||||||
|
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jdct.inc"
|
||||||
|
|
||||||
|
%ifdef DCT_ISLOW_SUPPORTED
|
||||||
|
%ifdef JIDCT_INT_SSE2_SUPPORTED
|
||||||
|
|
||||||
|
; This module is specialized to the case DCTSIZE = 8.
|
||||||
|
;
|
||||||
|
%if DCTSIZE != 8
|
||||||
|
%error "Sorry, this code only copes with 8x8 DCTs."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
|
||||||
|
%define CONST_BITS 13
|
||||||
|
%define PASS1_BITS 2
|
||||||
|
|
||||||
|
%define DESCALE_P1 (CONST_BITS-PASS1_BITS)
|
||||||
|
%define DESCALE_P2 (CONST_BITS+PASS1_BITS+3)
|
||||||
|
|
||||||
|
%if CONST_BITS == 13
|
||||||
|
F_0_298 equ 2446 ; FIX(0.298631336)
|
||||||
|
F_0_390 equ 3196 ; FIX(0.390180644)
|
||||||
|
F_0_541 equ 4433 ; FIX(0.541196100)
|
||||||
|
F_0_765 equ 6270 ; FIX(0.765366865)
|
||||||
|
F_0_899 equ 7373 ; FIX(0.899976223)
|
||||||
|
F_1_175 equ 9633 ; FIX(1.175875602)
|
||||||
|
F_1_501 equ 12299 ; FIX(1.501321110)
|
||||||
|
F_1_847 equ 15137 ; FIX(1.847759065)
|
||||||
|
F_1_961 equ 16069 ; FIX(1.961570560)
|
||||||
|
F_2_053 equ 16819 ; FIX(2.053119869)
|
||||||
|
F_2_562 equ 20995 ; FIX(2.562915447)
|
||||||
|
F_3_072 equ 25172 ; FIX(3.072711026)
|
||||||
|
%else
|
||||||
|
; NASM cannot do compile-time arithmetic on floating-point constants.
|
||||||
|
%define DESCALE(x,n) (((x)+(1<<((n)-1)))>>(n))
|
||||||
|
F_0_298 equ DESCALE( 320652955,30-CONST_BITS) ; FIX(0.298631336)
|
||||||
|
F_0_390 equ DESCALE( 418953276,30-CONST_BITS) ; FIX(0.390180644)
|
||||||
|
F_0_541 equ DESCALE( 581104887,30-CONST_BITS) ; FIX(0.541196100)
|
||||||
|
F_0_765 equ DESCALE( 821806413,30-CONST_BITS) ; FIX(0.765366865)
|
||||||
|
F_0_899 equ DESCALE( 966342111,30-CONST_BITS) ; FIX(0.899976223)
|
||||||
|
F_1_175 equ DESCALE(1262586813,30-CONST_BITS) ; FIX(1.175875602)
|
||||||
|
F_1_501 equ DESCALE(1612031267,30-CONST_BITS) ; FIX(1.501321110)
|
||||||
|
F_1_847 equ DESCALE(1984016188,30-CONST_BITS) ; FIX(1.847759065)
|
||||||
|
F_1_961 equ DESCALE(2106220350,30-CONST_BITS) ; FIX(1.961570560)
|
||||||
|
F_2_053 equ DESCALE(2204520673,30-CONST_BITS) ; FIX(2.053119869)
|
||||||
|
F_2_562 equ DESCALE(2751909506,30-CONST_BITS) ; FIX(2.562915447)
|
||||||
|
F_3_072 equ DESCALE(3299298341,30-CONST_BITS) ; FIX(3.072711026)
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_CONST
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
global EXTN(jconst_idct_islow_sse2)
|
||||||
|
|
||||||
|
EXTN(jconst_idct_islow_sse2):
|
||||||
|
|
||||||
|
PW_F130_F054 times 4 dw (F_0_541+F_0_765), F_0_541
|
||||||
|
PW_F054_MF130 times 4 dw F_0_541, (F_0_541-F_1_847)
|
||||||
|
PW_MF078_F117 times 4 dw (F_1_175-F_1_961), F_1_175
|
||||||
|
PW_F117_F078 times 4 dw F_1_175, (F_1_175-F_0_390)
|
||||||
|
PW_MF060_MF089 times 4 dw (F_0_298-F_0_899),-F_0_899
|
||||||
|
PW_MF089_F060 times 4 dw -F_0_899, (F_1_501-F_0_899)
|
||||||
|
PW_MF050_MF256 times 4 dw (F_2_053-F_2_562),-F_2_562
|
||||||
|
PW_MF256_F050 times 4 dw -F_2_562, (F_3_072-F_2_562)
|
||||||
|
PD_DESCALE_P1 times 4 dd 1 << (DESCALE_P1-1)
|
||||||
|
PD_DESCALE_P2 times 4 dd 1 << (DESCALE_P2-1)
|
||||||
|
PB_CENTERJSAMP times 16 db CENTERJSAMPLE
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Perform dequantization and inverse DCT on one block of coefficients.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_idct_islow_sse2 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
|
; JCOEFPTR coef_block,
|
||||||
|
; JSAMPARRAY output_buf, JDIMENSION output_col)
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define coef_block(b) (b)+16 ; JCOEFPTR coef_block
|
||||||
|
%define output_buf(b) (b)+20 ; JSAMPARRAY output_buf
|
||||||
|
%define output_col(b) (b)+24 ; JDIMENSION output_col
|
||||||
|
|
||||||
|
%define original_ebp ebp+0
|
||||||
|
%define wk(i) ebp-(WK_NUM-(i))*SIZEOF_XMMWORD ; xmmword wk[WK_NUM]
|
||||||
|
%define WK_NUM 12
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_idct_islow_sse2)
|
||||||
|
|
||||||
|
EXTN(jpeg_idct_islow_sse2):
|
||||||
|
push ebp
|
||||||
|
mov eax,esp ; eax = original ebp
|
||||||
|
sub esp, byte 4
|
||||||
|
and esp, byte (-SIZEOF_XMMWORD) ; align to 128 bits
|
||||||
|
mov [esp],eax
|
||||||
|
mov ebp,esp ; ebp = aligned ebp
|
||||||
|
lea esp, [wk(0)]
|
||||||
|
pushpic ebx
|
||||||
|
; push ecx ; unused
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
|
||||||
|
; ---- Pass 1: process columns from input.
|
||||||
|
|
||||||
|
; mov eax, [original_ebp]
|
||||||
|
mov edx, POINTER [compptr(eax)]
|
||||||
|
mov edx, POINTER [jcompinfo_dct_table(edx)] ; quantptr
|
||||||
|
mov esi, JCOEFPTR [coef_block(eax)] ; inptr
|
||||||
|
|
||||||
|
%ifndef NO_ZERO_COLUMN_TEST_ISLOW_SSE2
|
||||||
|
mov eax, DWORD [DWBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
or eax, DWORD [DWBLOCK(2,0,esi,SIZEOF_JCOEF)]
|
||||||
|
jnz near .columnDCT
|
||||||
|
|
||||||
|
movdqa xmm0, XMMWORD [XMMBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movdqa xmm1, XMMWORD [XMMBLOCK(2,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por xmm0, XMMWORD [XMMBLOCK(3,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por xmm1, XMMWORD [XMMBLOCK(4,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por xmm0, XMMWORD [XMMBLOCK(5,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por xmm1, XMMWORD [XMMBLOCK(6,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por xmm0, XMMWORD [XMMBLOCK(7,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por xmm1,xmm0
|
||||||
|
packsswb xmm1,xmm1
|
||||||
|
packsswb xmm1,xmm1
|
||||||
|
movd eax,xmm1
|
||||||
|
test eax,eax
|
||||||
|
jnz short .columnDCT
|
||||||
|
|
||||||
|
; -- AC terms all zero
|
||||||
|
|
||||||
|
movdqa xmm5, XMMWORD [XMMBLOCK(0,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw xmm5, XMMWORD [XMMBLOCK(0,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
|
||||||
|
psllw xmm5,PASS1_BITS
|
||||||
|
|
||||||
|
movdqa xmm4,xmm5 ; xmm5=in0=(00 01 02 03 04 05 06 07)
|
||||||
|
punpcklwd xmm5,xmm5 ; xmm5=(00 00 01 01 02 02 03 03)
|
||||||
|
punpckhwd xmm4,xmm4 ; xmm4=(04 04 05 05 06 06 07 07)
|
||||||
|
|
||||||
|
pshufd xmm7,xmm5,0x00 ; xmm7=col0=(00 00 00 00 00 00 00 00)
|
||||||
|
pshufd xmm6,xmm5,0x55 ; xmm6=col1=(01 01 01 01 01 01 01 01)
|
||||||
|
pshufd xmm1,xmm5,0xAA ; xmm1=col2=(02 02 02 02 02 02 02 02)
|
||||||
|
pshufd xmm5,xmm5,0xFF ; xmm5=col3=(03 03 03 03 03 03 03 03)
|
||||||
|
pshufd xmm0,xmm4,0x00 ; xmm0=col4=(04 04 04 04 04 04 04 04)
|
||||||
|
pshufd xmm3,xmm4,0x55 ; xmm3=col5=(05 05 05 05 05 05 05 05)
|
||||||
|
pshufd xmm2,xmm4,0xAA ; xmm2=col6=(06 06 06 06 06 06 06 06)
|
||||||
|
pshufd xmm4,xmm4,0xFF ; xmm4=col7=(07 07 07 07 07 07 07 07)
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(8)], xmm6 ; wk(8)=col1
|
||||||
|
movdqa XMMWORD [wk(9)], xmm5 ; wk(9)=col3
|
||||||
|
movdqa XMMWORD [wk(10)], xmm3 ; wk(10)=col5
|
||||||
|
movdqa XMMWORD [wk(11)], xmm4 ; wk(11)=col7
|
||||||
|
jmp near .column_end
|
||||||
|
alignx 16,7
|
||||||
|
%endif
|
||||||
|
.columnDCT:
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movdqa xmm0, XMMWORD [XMMBLOCK(0,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movdqa xmm1, XMMWORD [XMMBLOCK(2,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw xmm0, XMMWORD [XMMBLOCK(0,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
pmullw xmm1, XMMWORD [XMMBLOCK(2,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
movdqa xmm2, XMMWORD [XMMBLOCK(4,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movdqa xmm3, XMMWORD [XMMBLOCK(6,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw xmm2, XMMWORD [XMMBLOCK(4,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
pmullw xmm3, XMMWORD [XMMBLOCK(6,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; z1 = (z2 + z3) * 0.541196100;
|
||||||
|
; tmp2 = z1 + z3 * -1.847759065;
|
||||||
|
; tmp3 = z1 + z2 * 0.765366865;
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; tmp2 = z2 * 0.541196100 + z3 * (0.541196100 - 1.847759065);
|
||||||
|
; tmp3 = z2 * (0.541196100 + 0.765366865) + z3 * 0.541196100;
|
||||||
|
|
||||||
|
movdqa xmm4,xmm1 ; xmm1=in2=z2
|
||||||
|
movdqa xmm5,xmm1
|
||||||
|
punpcklwd xmm4,xmm3 ; xmm3=in6=z3
|
||||||
|
punpckhwd xmm5,xmm3
|
||||||
|
movdqa xmm1,xmm4
|
||||||
|
movdqa xmm3,xmm5
|
||||||
|
pmaddwd xmm4,[GOTOFF(ebx,PW_F130_F054)] ; xmm4=tmp3L
|
||||||
|
pmaddwd xmm5,[GOTOFF(ebx,PW_F130_F054)] ; xmm5=tmp3H
|
||||||
|
pmaddwd xmm1,[GOTOFF(ebx,PW_F054_MF130)] ; xmm1=tmp2L
|
||||||
|
pmaddwd xmm3,[GOTOFF(ebx,PW_F054_MF130)] ; xmm3=tmp2H
|
||||||
|
|
||||||
|
movdqa xmm6,xmm0
|
||||||
|
paddw xmm0,xmm2 ; xmm0=in0+in4
|
||||||
|
psubw xmm6,xmm2 ; xmm6=in0-in4
|
||||||
|
|
||||||
|
pxor xmm7,xmm7
|
||||||
|
pxor xmm2,xmm2
|
||||||
|
punpcklwd xmm7,xmm0 ; xmm7=tmp0L
|
||||||
|
punpckhwd xmm2,xmm0 ; xmm2=tmp0H
|
||||||
|
psrad xmm7,(16-CONST_BITS) ; psrad xmm7,16 & pslld xmm7,CONST_BITS
|
||||||
|
psrad xmm2,(16-CONST_BITS) ; psrad xmm2,16 & pslld xmm2,CONST_BITS
|
||||||
|
|
||||||
|
movdqa xmm0,xmm7
|
||||||
|
paddd xmm7,xmm4 ; xmm7=tmp10L
|
||||||
|
psubd xmm0,xmm4 ; xmm0=tmp13L
|
||||||
|
movdqa xmm4,xmm2
|
||||||
|
paddd xmm2,xmm5 ; xmm2=tmp10H
|
||||||
|
psubd xmm4,xmm5 ; xmm4=tmp13H
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(0)], xmm7 ; wk(0)=tmp10L
|
||||||
|
movdqa XMMWORD [wk(1)], xmm2 ; wk(1)=tmp10H
|
||||||
|
movdqa XMMWORD [wk(2)], xmm0 ; wk(2)=tmp13L
|
||||||
|
movdqa XMMWORD [wk(3)], xmm4 ; wk(3)=tmp13H
|
||||||
|
|
||||||
|
pxor xmm5,xmm5
|
||||||
|
pxor xmm7,xmm7
|
||||||
|
punpcklwd xmm5,xmm6 ; xmm5=tmp1L
|
||||||
|
punpckhwd xmm7,xmm6 ; xmm7=tmp1H
|
||||||
|
psrad xmm5,(16-CONST_BITS) ; psrad xmm5,16 & pslld xmm5,CONST_BITS
|
||||||
|
psrad xmm7,(16-CONST_BITS) ; psrad xmm7,16 & pslld xmm7,CONST_BITS
|
||||||
|
|
||||||
|
movdqa xmm2,xmm5
|
||||||
|
paddd xmm5,xmm1 ; xmm5=tmp11L
|
||||||
|
psubd xmm2,xmm1 ; xmm2=tmp12L
|
||||||
|
movdqa xmm0,xmm7
|
||||||
|
paddd xmm7,xmm3 ; xmm7=tmp11H
|
||||||
|
psubd xmm0,xmm3 ; xmm0=tmp12H
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(4)], xmm5 ; wk(4)=tmp11L
|
||||||
|
movdqa XMMWORD [wk(5)], xmm7 ; wk(5)=tmp11H
|
||||||
|
movdqa XMMWORD [wk(6)], xmm2 ; wk(6)=tmp12L
|
||||||
|
movdqa XMMWORD [wk(7)], xmm0 ; wk(7)=tmp12H
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movdqa xmm4, XMMWORD [XMMBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movdqa xmm6, XMMWORD [XMMBLOCK(3,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw xmm4, XMMWORD [XMMBLOCK(1,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
pmullw xmm6, XMMWORD [XMMBLOCK(3,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
movdqa xmm1, XMMWORD [XMMBLOCK(5,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movdqa xmm3, XMMWORD [XMMBLOCK(7,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw xmm1, XMMWORD [XMMBLOCK(5,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
pmullw xmm3, XMMWORD [XMMBLOCK(7,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
|
||||||
|
movdqa xmm5,xmm6
|
||||||
|
movdqa xmm7,xmm4
|
||||||
|
paddw xmm5,xmm3 ; xmm5=z3
|
||||||
|
paddw xmm7,xmm1 ; xmm7=z4
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; z5 = (z3 + z4) * 1.175875602;
|
||||||
|
; z3 = z3 * -1.961570560; z4 = z4 * -0.390180644;
|
||||||
|
; z3 += z5; z4 += z5;
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; z3 = z3 * (1.175875602 - 1.961570560) + z4 * 1.175875602;
|
||||||
|
; z4 = z3 * 1.175875602 + z4 * (1.175875602 - 0.390180644);
|
||||||
|
|
||||||
|
movdqa xmm2,xmm5
|
||||||
|
movdqa xmm0,xmm5
|
||||||
|
punpcklwd xmm2,xmm7
|
||||||
|
punpckhwd xmm0,xmm7
|
||||||
|
movdqa xmm5,xmm2
|
||||||
|
movdqa xmm7,xmm0
|
||||||
|
pmaddwd xmm2,[GOTOFF(ebx,PW_MF078_F117)] ; xmm2=z3L
|
||||||
|
pmaddwd xmm0,[GOTOFF(ebx,PW_MF078_F117)] ; xmm0=z3H
|
||||||
|
pmaddwd xmm5,[GOTOFF(ebx,PW_F117_F078)] ; xmm5=z4L
|
||||||
|
pmaddwd xmm7,[GOTOFF(ebx,PW_F117_F078)] ; xmm7=z4H
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(10)], xmm2 ; wk(10)=z3L
|
||||||
|
movdqa XMMWORD [wk(11)], xmm0 ; wk(11)=z3H
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; z1 = tmp0 + tmp3; z2 = tmp1 + tmp2;
|
||||||
|
; tmp0 = tmp0 * 0.298631336; tmp1 = tmp1 * 2.053119869;
|
||||||
|
; tmp2 = tmp2 * 3.072711026; tmp3 = tmp3 * 1.501321110;
|
||||||
|
; z1 = z1 * -0.899976223; z2 = z2 * -2.562915447;
|
||||||
|
; tmp0 += z1 + z3; tmp1 += z2 + z4;
|
||||||
|
; tmp2 += z2 + z3; tmp3 += z1 + z4;
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; tmp0 = tmp0 * (0.298631336 - 0.899976223) + tmp3 * -0.899976223;
|
||||||
|
; tmp1 = tmp1 * (2.053119869 - 2.562915447) + tmp2 * -2.562915447;
|
||||||
|
; tmp2 = tmp1 * -2.562915447 + tmp2 * (3.072711026 - 2.562915447);
|
||||||
|
; tmp3 = tmp0 * -0.899976223 + tmp3 * (1.501321110 - 0.899976223);
|
||||||
|
; tmp0 += z3; tmp1 += z4;
|
||||||
|
; tmp2 += z3; tmp3 += z4;
|
||||||
|
|
||||||
|
movdqa xmm2,xmm3
|
||||||
|
movdqa xmm0,xmm3
|
||||||
|
punpcklwd xmm2,xmm4
|
||||||
|
punpckhwd xmm0,xmm4
|
||||||
|
movdqa xmm3,xmm2
|
||||||
|
movdqa xmm4,xmm0
|
||||||
|
pmaddwd xmm2,[GOTOFF(ebx,PW_MF060_MF089)] ; xmm2=tmp0L
|
||||||
|
pmaddwd xmm0,[GOTOFF(ebx,PW_MF060_MF089)] ; xmm0=tmp0H
|
||||||
|
pmaddwd xmm3,[GOTOFF(ebx,PW_MF089_F060)] ; xmm3=tmp3L
|
||||||
|
pmaddwd xmm4,[GOTOFF(ebx,PW_MF089_F060)] ; xmm4=tmp3H
|
||||||
|
|
||||||
|
paddd xmm2, XMMWORD [wk(10)] ; xmm2=tmp0L
|
||||||
|
paddd xmm0, XMMWORD [wk(11)] ; xmm0=tmp0H
|
||||||
|
paddd xmm3,xmm5 ; xmm3=tmp3L
|
||||||
|
paddd xmm4,xmm7 ; xmm4=tmp3H
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(8)], xmm2 ; wk(8)=tmp0L
|
||||||
|
movdqa XMMWORD [wk(9)], xmm0 ; wk(9)=tmp0H
|
||||||
|
|
||||||
|
movdqa xmm2,xmm1
|
||||||
|
movdqa xmm0,xmm1
|
||||||
|
punpcklwd xmm2,xmm6
|
||||||
|
punpckhwd xmm0,xmm6
|
||||||
|
movdqa xmm1,xmm2
|
||||||
|
movdqa xmm6,xmm0
|
||||||
|
pmaddwd xmm2,[GOTOFF(ebx,PW_MF050_MF256)] ; xmm2=tmp1L
|
||||||
|
pmaddwd xmm0,[GOTOFF(ebx,PW_MF050_MF256)] ; xmm0=tmp1H
|
||||||
|
pmaddwd xmm1,[GOTOFF(ebx,PW_MF256_F050)] ; xmm1=tmp2L
|
||||||
|
pmaddwd xmm6,[GOTOFF(ebx,PW_MF256_F050)] ; xmm6=tmp2H
|
||||||
|
|
||||||
|
paddd xmm2,xmm5 ; xmm2=tmp1L
|
||||||
|
paddd xmm0,xmm7 ; xmm0=tmp1H
|
||||||
|
paddd xmm1, XMMWORD [wk(10)] ; xmm1=tmp2L
|
||||||
|
paddd xmm6, XMMWORD [wk(11)] ; xmm6=tmp2H
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(10)], xmm2 ; wk(10)=tmp1L
|
||||||
|
movdqa XMMWORD [wk(11)], xmm0 ; wk(11)=tmp1H
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
movdqa xmm5, XMMWORD [wk(0)] ; xmm5=tmp10L
|
||||||
|
movdqa xmm7, XMMWORD [wk(1)] ; xmm7=tmp10H
|
||||||
|
|
||||||
|
movdqa xmm2,xmm5
|
||||||
|
movdqa xmm0,xmm7
|
||||||
|
paddd xmm5,xmm3 ; xmm5=data0L
|
||||||
|
paddd xmm7,xmm4 ; xmm7=data0H
|
||||||
|
psubd xmm2,xmm3 ; xmm2=data7L
|
||||||
|
psubd xmm0,xmm4 ; xmm0=data7H
|
||||||
|
|
||||||
|
movdqa xmm3,[GOTOFF(ebx,PD_DESCALE_P1)] ; xmm3=[PD_DESCALE_P1]
|
||||||
|
|
||||||
|
paddd xmm5,xmm3
|
||||||
|
paddd xmm7,xmm3
|
||||||
|
psrad xmm5,DESCALE_P1
|
||||||
|
psrad xmm7,DESCALE_P1
|
||||||
|
paddd xmm2,xmm3
|
||||||
|
paddd xmm0,xmm3
|
||||||
|
psrad xmm2,DESCALE_P1
|
||||||
|
psrad xmm0,DESCALE_P1
|
||||||
|
|
||||||
|
packssdw xmm5,xmm7 ; xmm5=data0=(00 01 02 03 04 05 06 07)
|
||||||
|
packssdw xmm2,xmm0 ; xmm2=data7=(70 71 72 73 74 75 76 77)
|
||||||
|
|
||||||
|
movdqa xmm4, XMMWORD [wk(4)] ; xmm4=tmp11L
|
||||||
|
movdqa xmm3, XMMWORD [wk(5)] ; xmm3=tmp11H
|
||||||
|
|
||||||
|
movdqa xmm7,xmm4
|
||||||
|
movdqa xmm0,xmm3
|
||||||
|
paddd xmm4,xmm1 ; xmm4=data1L
|
||||||
|
paddd xmm3,xmm6 ; xmm3=data1H
|
||||||
|
psubd xmm7,xmm1 ; xmm7=data6L
|
||||||
|
psubd xmm0,xmm6 ; xmm0=data6H
|
||||||
|
|
||||||
|
movdqa xmm1,[GOTOFF(ebx,PD_DESCALE_P1)] ; xmm1=[PD_DESCALE_P1]
|
||||||
|
|
||||||
|
paddd xmm4,xmm1
|
||||||
|
paddd xmm3,xmm1
|
||||||
|
psrad xmm4,DESCALE_P1
|
||||||
|
psrad xmm3,DESCALE_P1
|
||||||
|
paddd xmm7,xmm1
|
||||||
|
paddd xmm0,xmm1
|
||||||
|
psrad xmm7,DESCALE_P1
|
||||||
|
psrad xmm0,DESCALE_P1
|
||||||
|
|
||||||
|
packssdw xmm4,xmm3 ; xmm4=data1=(10 11 12 13 14 15 16 17)
|
||||||
|
packssdw xmm7,xmm0 ; xmm7=data6=(60 61 62 63 64 65 66 67)
|
||||||
|
|
||||||
|
movdqa xmm6,xmm5 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd xmm5,xmm4 ; xmm5=(00 10 01 11 02 12 03 13)
|
||||||
|
punpckhwd xmm6,xmm4 ; xmm6=(04 14 05 15 06 16 07 17)
|
||||||
|
movdqa xmm1,xmm7 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd xmm7,xmm2 ; xmm7=(60 70 61 71 62 72 63 73)
|
||||||
|
punpckhwd xmm1,xmm2 ; xmm1=(64 74 65 75 66 76 67 77)
|
||||||
|
|
||||||
|
movdqa xmm3, XMMWORD [wk(6)] ; xmm3=tmp12L
|
||||||
|
movdqa xmm0, XMMWORD [wk(7)] ; xmm0=tmp12H
|
||||||
|
movdqa xmm4, XMMWORD [wk(10)] ; xmm4=tmp1L
|
||||||
|
movdqa xmm2, XMMWORD [wk(11)] ; xmm2=tmp1H
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(0)], xmm5 ; wk(0)=(00 10 01 11 02 12 03 13)
|
||||||
|
movdqa XMMWORD [wk(1)], xmm6 ; wk(1)=(04 14 05 15 06 16 07 17)
|
||||||
|
movdqa XMMWORD [wk(4)], xmm7 ; wk(4)=(60 70 61 71 62 72 63 73)
|
||||||
|
movdqa XMMWORD [wk(5)], xmm1 ; wk(5)=(64 74 65 75 66 76 67 77)
|
||||||
|
|
||||||
|
movdqa xmm5,xmm3
|
||||||
|
movdqa xmm6,xmm0
|
||||||
|
paddd xmm3,xmm4 ; xmm3=data2L
|
||||||
|
paddd xmm0,xmm2 ; xmm0=data2H
|
||||||
|
psubd xmm5,xmm4 ; xmm5=data5L
|
||||||
|
psubd xmm6,xmm2 ; xmm6=data5H
|
||||||
|
|
||||||
|
movdqa xmm7,[GOTOFF(ebx,PD_DESCALE_P1)] ; xmm7=[PD_DESCALE_P1]
|
||||||
|
|
||||||
|
paddd xmm3,xmm7
|
||||||
|
paddd xmm0,xmm7
|
||||||
|
psrad xmm3,DESCALE_P1
|
||||||
|
psrad xmm0,DESCALE_P1
|
||||||
|
paddd xmm5,xmm7
|
||||||
|
paddd xmm6,xmm7
|
||||||
|
psrad xmm5,DESCALE_P1
|
||||||
|
psrad xmm6,DESCALE_P1
|
||||||
|
|
||||||
|
packssdw xmm3,xmm0 ; xmm3=data2=(20 21 22 23 24 25 26 27)
|
||||||
|
packssdw xmm5,xmm6 ; xmm5=data5=(50 51 52 53 54 55 56 57)
|
||||||
|
|
||||||
|
movdqa xmm1, XMMWORD [wk(2)] ; xmm1=tmp13L
|
||||||
|
movdqa xmm4, XMMWORD [wk(3)] ; xmm4=tmp13H
|
||||||
|
movdqa xmm2, XMMWORD [wk(8)] ; xmm2=tmp0L
|
||||||
|
movdqa xmm7, XMMWORD [wk(9)] ; xmm7=tmp0H
|
||||||
|
|
||||||
|
movdqa xmm0,xmm1
|
||||||
|
movdqa xmm6,xmm4
|
||||||
|
paddd xmm1,xmm2 ; xmm1=data3L
|
||||||
|
paddd xmm4,xmm7 ; xmm4=data3H
|
||||||
|
psubd xmm0,xmm2 ; xmm0=data4L
|
||||||
|
psubd xmm6,xmm7 ; xmm6=data4H
|
||||||
|
|
||||||
|
movdqa xmm2,[GOTOFF(ebx,PD_DESCALE_P1)] ; xmm2=[PD_DESCALE_P1]
|
||||||
|
|
||||||
|
paddd xmm1,xmm2
|
||||||
|
paddd xmm4,xmm2
|
||||||
|
psrad xmm1,DESCALE_P1
|
||||||
|
psrad xmm4,DESCALE_P1
|
||||||
|
paddd xmm0,xmm2
|
||||||
|
paddd xmm6,xmm2
|
||||||
|
psrad xmm0,DESCALE_P1
|
||||||
|
psrad xmm6,DESCALE_P1
|
||||||
|
|
||||||
|
packssdw xmm1,xmm4 ; xmm1=data3=(30 31 32 33 34 35 36 37)
|
||||||
|
packssdw xmm0,xmm6 ; xmm0=data4=(40 41 42 43 44 45 46 47)
|
||||||
|
|
||||||
|
movdqa xmm7, XMMWORD [wk(0)] ; xmm7=(00 10 01 11 02 12 03 13)
|
||||||
|
movdqa xmm2, XMMWORD [wk(1)] ; xmm2=(04 14 05 15 06 16 07 17)
|
||||||
|
|
||||||
|
movdqa xmm4,xmm3 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd xmm3,xmm1 ; xmm3=(20 30 21 31 22 32 23 33)
|
||||||
|
punpckhwd xmm4,xmm1 ; xmm4=(24 34 25 35 26 36 27 37)
|
||||||
|
movdqa xmm6,xmm0 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd xmm0,xmm5 ; xmm0=(40 50 41 51 42 52 43 53)
|
||||||
|
punpckhwd xmm6,xmm5 ; xmm6=(44 54 45 55 46 56 47 57)
|
||||||
|
|
||||||
|
movdqa xmm1,xmm7 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq xmm7,xmm3 ; xmm7=(00 10 20 30 01 11 21 31)
|
||||||
|
punpckhdq xmm1,xmm3 ; xmm1=(02 12 22 32 03 13 23 33)
|
||||||
|
movdqa xmm5,xmm2 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq xmm2,xmm4 ; xmm2=(04 14 24 34 05 15 25 35)
|
||||||
|
punpckhdq xmm5,xmm4 ; xmm5=(06 16 26 36 07 17 27 37)
|
||||||
|
|
||||||
|
movdqa xmm3, XMMWORD [wk(4)] ; xmm3=(60 70 61 71 62 72 63 73)
|
||||||
|
movdqa xmm4, XMMWORD [wk(5)] ; xmm4=(64 74 65 75 66 76 67 77)
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(6)], xmm2 ; wk(6)=(04 14 24 34 05 15 25 35)
|
||||||
|
movdqa XMMWORD [wk(7)], xmm5 ; wk(7)=(06 16 26 36 07 17 27 37)
|
||||||
|
|
||||||
|
movdqa xmm2,xmm0 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq xmm0,xmm3 ; xmm0=(40 50 60 70 41 51 61 71)
|
||||||
|
punpckhdq xmm2,xmm3 ; xmm2=(42 52 62 72 43 53 63 73)
|
||||||
|
movdqa xmm5,xmm6 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq xmm6,xmm4 ; xmm6=(44 54 64 74 45 55 65 75)
|
||||||
|
punpckhdq xmm5,xmm4 ; xmm5=(46 56 66 76 47 57 67 77)
|
||||||
|
|
||||||
|
movdqa xmm3,xmm7 ; transpose coefficients(phase 3)
|
||||||
|
punpcklqdq xmm7,xmm0 ; xmm7=col0=(00 10 20 30 40 50 60 70)
|
||||||
|
punpckhqdq xmm3,xmm0 ; xmm3=col1=(01 11 21 31 41 51 61 71)
|
||||||
|
movdqa xmm4,xmm1 ; transpose coefficients(phase 3)
|
||||||
|
punpcklqdq xmm1,xmm2 ; xmm1=col2=(02 12 22 32 42 52 62 72)
|
||||||
|
punpckhqdq xmm4,xmm2 ; xmm4=col3=(03 13 23 33 43 53 63 73)
|
||||||
|
|
||||||
|
movdqa xmm0, XMMWORD [wk(6)] ; xmm0=(04 14 24 34 05 15 25 35)
|
||||||
|
movdqa xmm2, XMMWORD [wk(7)] ; xmm2=(06 16 26 36 07 17 27 37)
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(8)], xmm3 ; wk(8)=col1
|
||||||
|
movdqa XMMWORD [wk(9)], xmm4 ; wk(9)=col3
|
||||||
|
|
||||||
|
movdqa xmm3,xmm0 ; transpose coefficients(phase 3)
|
||||||
|
punpcklqdq xmm0,xmm6 ; xmm0=col4=(04 14 24 34 44 54 64 74)
|
||||||
|
punpckhqdq xmm3,xmm6 ; xmm3=col5=(05 15 25 35 45 55 65 75)
|
||||||
|
movdqa xmm4,xmm2 ; transpose coefficients(phase 3)
|
||||||
|
punpcklqdq xmm2,xmm5 ; xmm2=col6=(06 16 26 36 46 56 66 76)
|
||||||
|
punpckhqdq xmm4,xmm5 ; xmm4=col7=(07 17 27 37 47 57 67 77)
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(10)], xmm3 ; wk(10)=col5
|
||||||
|
movdqa XMMWORD [wk(11)], xmm4 ; wk(11)=col7
|
||||||
|
.column_end:
|
||||||
|
|
||||||
|
; -- Prefetch the next coefficient block
|
||||||
|
|
||||||
|
prefetchnta [esi + DCTSIZE2*SIZEOF_JCOEF + 0*32]
|
||||||
|
prefetchnta [esi + DCTSIZE2*SIZEOF_JCOEF + 1*32]
|
||||||
|
prefetchnta [esi + DCTSIZE2*SIZEOF_JCOEF + 2*32]
|
||||||
|
prefetchnta [esi + DCTSIZE2*SIZEOF_JCOEF + 3*32]
|
||||||
|
|
||||||
|
; ---- Pass 2: process rows from work array, store into output array.
|
||||||
|
|
||||||
|
mov eax, [original_ebp]
|
||||||
|
mov edi, JSAMPARRAY [output_buf(eax)] ; (JSAMPROW *)
|
||||||
|
mov eax, JDIMENSION [output_col(eax)]
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
; xmm7=col0, xmm1=col2, xmm0=col4, xmm2=col6
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; z1 = (z2 + z3) * 0.541196100;
|
||||||
|
; tmp2 = z1 + z3 * -1.847759065;
|
||||||
|
; tmp3 = z1 + z2 * 0.765366865;
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; tmp2 = z2 * 0.541196100 + z3 * (0.541196100 - 1.847759065);
|
||||||
|
; tmp3 = z2 * (0.541196100 + 0.765366865) + z3 * 0.541196100;
|
||||||
|
|
||||||
|
movdqa xmm6,xmm1 ; xmm1=in2=z2
|
||||||
|
movdqa xmm5,xmm1
|
||||||
|
punpcklwd xmm6,xmm2 ; xmm2=in6=z3
|
||||||
|
punpckhwd xmm5,xmm2
|
||||||
|
movdqa xmm1,xmm6
|
||||||
|
movdqa xmm2,xmm5
|
||||||
|
pmaddwd xmm6,[GOTOFF(ebx,PW_F130_F054)] ; xmm6=tmp3L
|
||||||
|
pmaddwd xmm5,[GOTOFF(ebx,PW_F130_F054)] ; xmm5=tmp3H
|
||||||
|
pmaddwd xmm1,[GOTOFF(ebx,PW_F054_MF130)] ; xmm1=tmp2L
|
||||||
|
pmaddwd xmm2,[GOTOFF(ebx,PW_F054_MF130)] ; xmm2=tmp2H
|
||||||
|
|
||||||
|
movdqa xmm3,xmm7
|
||||||
|
paddw xmm7,xmm0 ; xmm7=in0+in4
|
||||||
|
psubw xmm3,xmm0 ; xmm3=in0-in4
|
||||||
|
|
||||||
|
pxor xmm4,xmm4
|
||||||
|
pxor xmm0,xmm0
|
||||||
|
punpcklwd xmm4,xmm7 ; xmm4=tmp0L
|
||||||
|
punpckhwd xmm0,xmm7 ; xmm0=tmp0H
|
||||||
|
psrad xmm4,(16-CONST_BITS) ; psrad xmm4,16 & pslld xmm4,CONST_BITS
|
||||||
|
psrad xmm0,(16-CONST_BITS) ; psrad xmm0,16 & pslld xmm0,CONST_BITS
|
||||||
|
|
||||||
|
movdqa xmm7,xmm4
|
||||||
|
paddd xmm4,xmm6 ; xmm4=tmp10L
|
||||||
|
psubd xmm7,xmm6 ; xmm7=tmp13L
|
||||||
|
movdqa xmm6,xmm0
|
||||||
|
paddd xmm0,xmm5 ; xmm0=tmp10H
|
||||||
|
psubd xmm6,xmm5 ; xmm6=tmp13H
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(0)], xmm4 ; wk(0)=tmp10L
|
||||||
|
movdqa XMMWORD [wk(1)], xmm0 ; wk(1)=tmp10H
|
||||||
|
movdqa XMMWORD [wk(2)], xmm7 ; wk(2)=tmp13L
|
||||||
|
movdqa XMMWORD [wk(3)], xmm6 ; wk(3)=tmp13H
|
||||||
|
|
||||||
|
pxor xmm5,xmm5
|
||||||
|
pxor xmm4,xmm4
|
||||||
|
punpcklwd xmm5,xmm3 ; xmm5=tmp1L
|
||||||
|
punpckhwd xmm4,xmm3 ; xmm4=tmp1H
|
||||||
|
psrad xmm5,(16-CONST_BITS) ; psrad xmm5,16 & pslld xmm5,CONST_BITS
|
||||||
|
psrad xmm4,(16-CONST_BITS) ; psrad xmm4,16 & pslld xmm4,CONST_BITS
|
||||||
|
|
||||||
|
movdqa xmm0,xmm5
|
||||||
|
paddd xmm5,xmm1 ; xmm5=tmp11L
|
||||||
|
psubd xmm0,xmm1 ; xmm0=tmp12L
|
||||||
|
movdqa xmm7,xmm4
|
||||||
|
paddd xmm4,xmm2 ; xmm4=tmp11H
|
||||||
|
psubd xmm7,xmm2 ; xmm7=tmp12H
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(4)], xmm5 ; wk(4)=tmp11L
|
||||||
|
movdqa XMMWORD [wk(5)], xmm4 ; wk(5)=tmp11H
|
||||||
|
movdqa XMMWORD [wk(6)], xmm0 ; wk(6)=tmp12L
|
||||||
|
movdqa XMMWORD [wk(7)], xmm7 ; wk(7)=tmp12H
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movdqa xmm6, XMMWORD [wk(9)] ; xmm6=col3
|
||||||
|
movdqa xmm3, XMMWORD [wk(8)] ; xmm3=col1
|
||||||
|
movdqa xmm1, XMMWORD [wk(11)] ; xmm1=col7
|
||||||
|
movdqa xmm2, XMMWORD [wk(10)] ; xmm2=col5
|
||||||
|
|
||||||
|
movdqa xmm5,xmm6
|
||||||
|
movdqa xmm4,xmm3
|
||||||
|
paddw xmm5,xmm1 ; xmm5=z3
|
||||||
|
paddw xmm4,xmm2 ; xmm4=z4
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; z5 = (z3 + z4) * 1.175875602;
|
||||||
|
; z3 = z3 * -1.961570560; z4 = z4 * -0.390180644;
|
||||||
|
; z3 += z5; z4 += z5;
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; z3 = z3 * (1.175875602 - 1.961570560) + z4 * 1.175875602;
|
||||||
|
; z4 = z3 * 1.175875602 + z4 * (1.175875602 - 0.390180644);
|
||||||
|
|
||||||
|
movdqa xmm0,xmm5
|
||||||
|
movdqa xmm7,xmm5
|
||||||
|
punpcklwd xmm0,xmm4
|
||||||
|
punpckhwd xmm7,xmm4
|
||||||
|
movdqa xmm5,xmm0
|
||||||
|
movdqa xmm4,xmm7
|
||||||
|
pmaddwd xmm0,[GOTOFF(ebx,PW_MF078_F117)] ; xmm0=z3L
|
||||||
|
pmaddwd xmm7,[GOTOFF(ebx,PW_MF078_F117)] ; xmm7=z3H
|
||||||
|
pmaddwd xmm5,[GOTOFF(ebx,PW_F117_F078)] ; xmm5=z4L
|
||||||
|
pmaddwd xmm4,[GOTOFF(ebx,PW_F117_F078)] ; xmm4=z4H
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(10)], xmm0 ; wk(10)=z3L
|
||||||
|
movdqa XMMWORD [wk(11)], xmm7 ; wk(11)=z3H
|
||||||
|
|
||||||
|
; (Original)
|
||||||
|
; z1 = tmp0 + tmp3; z2 = tmp1 + tmp2;
|
||||||
|
; tmp0 = tmp0 * 0.298631336; tmp1 = tmp1 * 2.053119869;
|
||||||
|
; tmp2 = tmp2 * 3.072711026; tmp3 = tmp3 * 1.501321110;
|
||||||
|
; z1 = z1 * -0.899976223; z2 = z2 * -2.562915447;
|
||||||
|
; tmp0 += z1 + z3; tmp1 += z2 + z4;
|
||||||
|
; tmp2 += z2 + z3; tmp3 += z1 + z4;
|
||||||
|
;
|
||||||
|
; (This implementation)
|
||||||
|
; tmp0 = tmp0 * (0.298631336 - 0.899976223) + tmp3 * -0.899976223;
|
||||||
|
; tmp1 = tmp1 * (2.053119869 - 2.562915447) + tmp2 * -2.562915447;
|
||||||
|
; tmp2 = tmp1 * -2.562915447 + tmp2 * (3.072711026 - 2.562915447);
|
||||||
|
; tmp3 = tmp0 * -0.899976223 + tmp3 * (1.501321110 - 0.899976223);
|
||||||
|
; tmp0 += z3; tmp1 += z4;
|
||||||
|
; tmp2 += z3; tmp3 += z4;
|
||||||
|
|
||||||
|
movdqa xmm0,xmm1
|
||||||
|
movdqa xmm7,xmm1
|
||||||
|
punpcklwd xmm0,xmm3
|
||||||
|
punpckhwd xmm7,xmm3
|
||||||
|
movdqa xmm1,xmm0
|
||||||
|
movdqa xmm3,xmm7
|
||||||
|
pmaddwd xmm0,[GOTOFF(ebx,PW_MF060_MF089)] ; xmm0=tmp0L
|
||||||
|
pmaddwd xmm7,[GOTOFF(ebx,PW_MF060_MF089)] ; xmm7=tmp0H
|
||||||
|
pmaddwd xmm1,[GOTOFF(ebx,PW_MF089_F060)] ; xmm1=tmp3L
|
||||||
|
pmaddwd xmm3,[GOTOFF(ebx,PW_MF089_F060)] ; xmm3=tmp3H
|
||||||
|
|
||||||
|
paddd xmm0, XMMWORD [wk(10)] ; xmm0=tmp0L
|
||||||
|
paddd xmm7, XMMWORD [wk(11)] ; xmm7=tmp0H
|
||||||
|
paddd xmm1,xmm5 ; xmm1=tmp3L
|
||||||
|
paddd xmm3,xmm4 ; xmm3=tmp3H
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(8)], xmm0 ; wk(8)=tmp0L
|
||||||
|
movdqa XMMWORD [wk(9)], xmm7 ; wk(9)=tmp0H
|
||||||
|
|
||||||
|
movdqa xmm0,xmm2
|
||||||
|
movdqa xmm7,xmm2
|
||||||
|
punpcklwd xmm0,xmm6
|
||||||
|
punpckhwd xmm7,xmm6
|
||||||
|
movdqa xmm2,xmm0
|
||||||
|
movdqa xmm6,xmm7
|
||||||
|
pmaddwd xmm0,[GOTOFF(ebx,PW_MF050_MF256)] ; xmm0=tmp1L
|
||||||
|
pmaddwd xmm7,[GOTOFF(ebx,PW_MF050_MF256)] ; xmm7=tmp1H
|
||||||
|
pmaddwd xmm2,[GOTOFF(ebx,PW_MF256_F050)] ; xmm2=tmp2L
|
||||||
|
pmaddwd xmm6,[GOTOFF(ebx,PW_MF256_F050)] ; xmm6=tmp2H
|
||||||
|
|
||||||
|
paddd xmm0,xmm5 ; xmm0=tmp1L
|
||||||
|
paddd xmm7,xmm4 ; xmm7=tmp1H
|
||||||
|
paddd xmm2, XMMWORD [wk(10)] ; xmm2=tmp2L
|
||||||
|
paddd xmm6, XMMWORD [wk(11)] ; xmm6=tmp2H
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(10)], xmm0 ; wk(10)=tmp1L
|
||||||
|
movdqa XMMWORD [wk(11)], xmm7 ; wk(11)=tmp1H
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
movdqa xmm5, XMMWORD [wk(0)] ; xmm5=tmp10L
|
||||||
|
movdqa xmm4, XMMWORD [wk(1)] ; xmm4=tmp10H
|
||||||
|
|
||||||
|
movdqa xmm0,xmm5
|
||||||
|
movdqa xmm7,xmm4
|
||||||
|
paddd xmm5,xmm1 ; xmm5=data0L
|
||||||
|
paddd xmm4,xmm3 ; xmm4=data0H
|
||||||
|
psubd xmm0,xmm1 ; xmm0=data7L
|
||||||
|
psubd xmm7,xmm3 ; xmm7=data7H
|
||||||
|
|
||||||
|
movdqa xmm1,[GOTOFF(ebx,PD_DESCALE_P2)] ; xmm1=[PD_DESCALE_P2]
|
||||||
|
|
||||||
|
paddd xmm5,xmm1
|
||||||
|
paddd xmm4,xmm1
|
||||||
|
psrad xmm5,DESCALE_P2
|
||||||
|
psrad xmm4,DESCALE_P2
|
||||||
|
paddd xmm0,xmm1
|
||||||
|
paddd xmm7,xmm1
|
||||||
|
psrad xmm0,DESCALE_P2
|
||||||
|
psrad xmm7,DESCALE_P2
|
||||||
|
|
||||||
|
packssdw xmm5,xmm4 ; xmm5=data0=(00 10 20 30 40 50 60 70)
|
||||||
|
packssdw xmm0,xmm7 ; xmm0=data7=(07 17 27 37 47 57 67 77)
|
||||||
|
|
||||||
|
movdqa xmm3, XMMWORD [wk(4)] ; xmm3=tmp11L
|
||||||
|
movdqa xmm1, XMMWORD [wk(5)] ; xmm1=tmp11H
|
||||||
|
|
||||||
|
movdqa xmm4,xmm3
|
||||||
|
movdqa xmm7,xmm1
|
||||||
|
paddd xmm3,xmm2 ; xmm3=data1L
|
||||||
|
paddd xmm1,xmm6 ; xmm1=data1H
|
||||||
|
psubd xmm4,xmm2 ; xmm4=data6L
|
||||||
|
psubd xmm7,xmm6 ; xmm7=data6H
|
||||||
|
|
||||||
|
movdqa xmm2,[GOTOFF(ebx,PD_DESCALE_P2)] ; xmm2=[PD_DESCALE_P2]
|
||||||
|
|
||||||
|
paddd xmm3,xmm2
|
||||||
|
paddd xmm1,xmm2
|
||||||
|
psrad xmm3,DESCALE_P2
|
||||||
|
psrad xmm1,DESCALE_P2
|
||||||
|
paddd xmm4,xmm2
|
||||||
|
paddd xmm7,xmm2
|
||||||
|
psrad xmm4,DESCALE_P2
|
||||||
|
psrad xmm7,DESCALE_P2
|
||||||
|
|
||||||
|
packssdw xmm3,xmm1 ; xmm3=data1=(01 11 21 31 41 51 61 71)
|
||||||
|
packssdw xmm4,xmm7 ; xmm4=data6=(06 16 26 36 46 56 66 76)
|
||||||
|
|
||||||
|
packsswb xmm5,xmm4 ; xmm5=(00 10 20 30 40 50 60 70 06 16 26 36 46 56 66 76)
|
||||||
|
packsswb xmm3,xmm0 ; xmm3=(01 11 21 31 41 51 61 71 07 17 27 37 47 57 67 77)
|
||||||
|
|
||||||
|
movdqa xmm6, XMMWORD [wk(6)] ; xmm6=tmp12L
|
||||||
|
movdqa xmm2, XMMWORD [wk(7)] ; xmm2=tmp12H
|
||||||
|
movdqa xmm1, XMMWORD [wk(10)] ; xmm1=tmp1L
|
||||||
|
movdqa xmm7, XMMWORD [wk(11)] ; xmm7=tmp1H
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(0)], xmm5 ; wk(0)=(00 10 20 30 40 50 60 70 06 16 26 36 46 56 66 76)
|
||||||
|
movdqa XMMWORD [wk(1)], xmm3 ; wk(1)=(01 11 21 31 41 51 61 71 07 17 27 37 47 57 67 77)
|
||||||
|
|
||||||
|
movdqa xmm4,xmm6
|
||||||
|
movdqa xmm0,xmm2
|
||||||
|
paddd xmm6,xmm1 ; xmm6=data2L
|
||||||
|
paddd xmm2,xmm7 ; xmm2=data2H
|
||||||
|
psubd xmm4,xmm1 ; xmm4=data5L
|
||||||
|
psubd xmm0,xmm7 ; xmm0=data5H
|
||||||
|
|
||||||
|
movdqa xmm5,[GOTOFF(ebx,PD_DESCALE_P2)] ; xmm5=[PD_DESCALE_P2]
|
||||||
|
|
||||||
|
paddd xmm6,xmm5
|
||||||
|
paddd xmm2,xmm5
|
||||||
|
psrad xmm6,DESCALE_P2
|
||||||
|
psrad xmm2,DESCALE_P2
|
||||||
|
paddd xmm4,xmm5
|
||||||
|
paddd xmm0,xmm5
|
||||||
|
psrad xmm4,DESCALE_P2
|
||||||
|
psrad xmm0,DESCALE_P2
|
||||||
|
|
||||||
|
packssdw xmm6,xmm2 ; xmm6=data2=(02 12 22 32 42 52 62 72)
|
||||||
|
packssdw xmm4,xmm0 ; xmm4=data5=(05 15 25 35 45 55 65 75)
|
||||||
|
|
||||||
|
movdqa xmm3, XMMWORD [wk(2)] ; xmm3=tmp13L
|
||||||
|
movdqa xmm1, XMMWORD [wk(3)] ; xmm1=tmp13H
|
||||||
|
movdqa xmm7, XMMWORD [wk(8)] ; xmm7=tmp0L
|
||||||
|
movdqa xmm5, XMMWORD [wk(9)] ; xmm5=tmp0H
|
||||||
|
|
||||||
|
movdqa xmm2,xmm3
|
||||||
|
movdqa xmm0,xmm1
|
||||||
|
paddd xmm3,xmm7 ; xmm3=data3L
|
||||||
|
paddd xmm1,xmm5 ; xmm1=data3H
|
||||||
|
psubd xmm2,xmm7 ; xmm2=data4L
|
||||||
|
psubd xmm0,xmm5 ; xmm0=data4H
|
||||||
|
|
||||||
|
movdqa xmm7,[GOTOFF(ebx,PD_DESCALE_P2)] ; xmm7=[PD_DESCALE_P2]
|
||||||
|
|
||||||
|
paddd xmm3,xmm7
|
||||||
|
paddd xmm1,xmm7
|
||||||
|
psrad xmm3,DESCALE_P2
|
||||||
|
psrad xmm1,DESCALE_P2
|
||||||
|
paddd xmm2,xmm7
|
||||||
|
paddd xmm0,xmm7
|
||||||
|
psrad xmm2,DESCALE_P2
|
||||||
|
psrad xmm0,DESCALE_P2
|
||||||
|
|
||||||
|
movdqa xmm5,[GOTOFF(ebx,PB_CENTERJSAMP)] ; xmm5=[PB_CENTERJSAMP]
|
||||||
|
|
||||||
|
packssdw xmm3,xmm1 ; xmm3=data3=(03 13 23 33 43 53 63 73)
|
||||||
|
packssdw xmm2,xmm0 ; xmm2=data4=(04 14 24 34 44 54 64 74)
|
||||||
|
|
||||||
|
movdqa xmm7, XMMWORD [wk(0)] ; xmm7=(00 10 20 30 40 50 60 70 06 16 26 36 46 56 66 76)
|
||||||
|
movdqa xmm1, XMMWORD [wk(1)] ; xmm1=(01 11 21 31 41 51 61 71 07 17 27 37 47 57 67 77)
|
||||||
|
|
||||||
|
packsswb xmm6,xmm2 ; xmm6=(02 12 22 32 42 52 62 72 04 14 24 34 44 54 64 74)
|
||||||
|
packsswb xmm3,xmm4 ; xmm3=(03 13 23 33 43 53 63 73 05 15 25 35 45 55 65 75)
|
||||||
|
|
||||||
|
paddb xmm7,xmm5
|
||||||
|
paddb xmm1,xmm5
|
||||||
|
paddb xmm6,xmm5
|
||||||
|
paddb xmm3,xmm5
|
||||||
|
|
||||||
|
movdqa xmm0,xmm7 ; transpose coefficients(phase 1)
|
||||||
|
punpcklbw xmm7,xmm1 ; xmm7=(00 01 10 11 20 21 30 31 40 41 50 51 60 61 70 71)
|
||||||
|
punpckhbw xmm0,xmm1 ; xmm0=(06 07 16 17 26 27 36 37 46 47 56 57 66 67 76 77)
|
||||||
|
movdqa xmm2,xmm6 ; transpose coefficients(phase 1)
|
||||||
|
punpcklbw xmm6,xmm3 ; xmm6=(02 03 12 13 22 23 32 33 42 43 52 53 62 63 72 73)
|
||||||
|
punpckhbw xmm2,xmm3 ; xmm2=(04 05 14 15 24 25 34 35 44 45 54 55 64 65 74 75)
|
||||||
|
|
||||||
|
movdqa xmm4,xmm7 ; transpose coefficients(phase 2)
|
||||||
|
punpcklwd xmm7,xmm6 ; xmm7=(00 01 02 03 10 11 12 13 20 21 22 23 30 31 32 33)
|
||||||
|
punpckhwd xmm4,xmm6 ; xmm4=(40 41 42 43 50 51 52 53 60 61 62 63 70 71 72 73)
|
||||||
|
movdqa xmm5,xmm2 ; transpose coefficients(phase 2)
|
||||||
|
punpcklwd xmm2,xmm0 ; xmm2=(04 05 06 07 14 15 16 17 24 25 26 27 34 35 36 37)
|
||||||
|
punpckhwd xmm5,xmm0 ; xmm5=(44 45 46 47 54 55 56 57 64 65 66 67 74 75 76 77)
|
||||||
|
|
||||||
|
movdqa xmm1,xmm7 ; transpose coefficients(phase 3)
|
||||||
|
punpckldq xmm7,xmm2 ; xmm7=(00 01 02 03 04 05 06 07 10 11 12 13 14 15 16 17)
|
||||||
|
punpckhdq xmm1,xmm2 ; xmm1=(20 21 22 23 24 25 26 27 30 31 32 33 34 35 36 37)
|
||||||
|
movdqa xmm3,xmm4 ; transpose coefficients(phase 3)
|
||||||
|
punpckldq xmm4,xmm5 ; xmm4=(40 41 42 43 44 45 46 47 50 51 52 53 54 55 56 57)
|
||||||
|
punpckhdq xmm3,xmm5 ; xmm3=(60 61 62 63 64 65 66 67 70 71 72 73 74 75 76 77)
|
||||||
|
|
||||||
|
pshufd xmm6,xmm7,0x4E ; xmm6=(10 11 12 13 14 15 16 17 00 01 02 03 04 05 06 07)
|
||||||
|
pshufd xmm0,xmm1,0x4E ; xmm0=(30 31 32 33 34 35 36 37 20 21 22 23 24 25 26 27)
|
||||||
|
pshufd xmm2,xmm4,0x4E ; xmm2=(50 51 52 53 54 55 56 57 40 41 42 43 44 45 46 47)
|
||||||
|
pshufd xmm5,xmm3,0x4E ; xmm5=(70 71 72 73 74 75 76 77 60 61 62 63 64 65 66 67)
|
||||||
|
|
||||||
|
mov edx, JSAMPROW [edi+0*SIZEOF_JSAMPROW]
|
||||||
|
mov esi, JSAMPROW [edi+2*SIZEOF_JSAMPROW]
|
||||||
|
movq _MMWORD [edx+eax*SIZEOF_JSAMPLE], xmm7
|
||||||
|
movq _MMWORD [esi+eax*SIZEOF_JSAMPLE], xmm1
|
||||||
|
mov edx, JSAMPROW [edi+4*SIZEOF_JSAMPROW]
|
||||||
|
mov esi, JSAMPROW [edi+6*SIZEOF_JSAMPROW]
|
||||||
|
movq _MMWORD [edx+eax*SIZEOF_JSAMPLE], xmm4
|
||||||
|
movq _MMWORD [esi+eax*SIZEOF_JSAMPLE], xmm3
|
||||||
|
|
||||||
|
mov edx, JSAMPROW [edi+1*SIZEOF_JSAMPROW]
|
||||||
|
mov esi, JSAMPROW [edi+3*SIZEOF_JSAMPROW]
|
||||||
|
movq _MMWORD [edx+eax*SIZEOF_JSAMPLE], xmm6
|
||||||
|
movq _MMWORD [esi+eax*SIZEOF_JSAMPLE], xmm0
|
||||||
|
mov edx, JSAMPROW [edi+5*SIZEOF_JSAMPROW]
|
||||||
|
mov esi, JSAMPROW [edi+7*SIZEOF_JSAMPROW]
|
||||||
|
movq _MMWORD [edx+eax*SIZEOF_JSAMPLE], xmm2
|
||||||
|
movq _MMWORD [esi+eax*SIZEOF_JSAMPLE], xmm5
|
||||||
|
|
||||||
|
pop edi
|
||||||
|
pop esi
|
||||||
|
; pop edx ; need not be preserved
|
||||||
|
; pop ecx ; unused
|
||||||
|
poppic ebx
|
||||||
|
mov esp,ebp ; esp <- aligned ebp
|
||||||
|
pop esp ; esp <- original ebp
|
||||||
|
pop ebp
|
||||||
|
ret
|
||||||
|
|
||||||
|
%endif ; JIDCT_INT_SSE2_SUPPORTED
|
||||||
|
%endif ; DCT_ISLOW_SUPPORTED
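Note on the pass 2 even part above: the "(Original)" / "(This implementation)" comments describe folding the shared z1 term into the multiplier constants, so that each output becomes a single two-term multiply-add, which is the shape `pmaddwd` computes per 32-bit lane. The following is a minimal Python sketch of that equivalence, not the library's code. It assumes CONST_BITS = 13 and FIX()-style rounding as in the IJG jidctint.c. The helper names (`fix`, `pmaddwd_lane`, `even_part_original`, `even_part_folded`) are hypothetical. The word layout of PW_F130_F054 and PW_F054_MF130 is inferred from their names and the comments, since their definitions are not shown in this part of the diff.

```python
CONST_BITS = 13  # assumption: the usual setting for the slow-integer IDCT


def fix(x):
    """Scale a real constant to fixed point with CONST_BITS fractional bits."""
    return int(x * (1 << CONST_BITS) + 0.5)


F_0_541 = fix(0.541196100)
F_0_765 = fix(0.765366865)
F_1_847 = fix(1.847759065)


def even_part_original(z2, z3):
    """Three multiplies, following the '(Original)' comment."""
    z1 = (z2 + z3) * F_0_541
    tmp2 = z1 + z3 * -F_1_847
    tmp3 = z1 + z2 * F_0_765
    return tmp2, tmp3


def pmaddwd_lane(a, b, c0, c1):
    """Model one 32-bit lane of pmaddwd on word pairs (a, b) and (c0, c1)."""
    return a * c0 + b * c1


def even_part_folded(z2, z3):
    """Two multiply-add pairs, following the '(This implementation)' comment."""
    tmp3 = pmaddwd_lane(z2, z3, F_0_541 + F_0_765, F_0_541)
    tmp2 = pmaddwd_lane(z2, z3, F_0_541, F_0_541 - F_1_847)
    return tmp2, tmp3


if __name__ == "__main__":
    # Both forms agree exactly, because only integer distributivity is used.
    for z2, z3 in [(100, -37), (-512, 511), (0, 1)]:
        assert even_part_original(z2, z3) == even_part_folded(z2, z3)
    print("folded constants:", F_0_541 + F_0_765, F_0_541 - F_1_847)
```

Under these assumptions the folded constants stay within the signed 16-bit range that `pmaddwd` needs for its word operands, which is presumably why the constants can be pre-added at assembly time.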
|
||||||
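Note on the final output stage above: each 32-bit result is rounded by adding PD_DESCALE_P2 and shifting right with `psrad`, narrowed with the saturating packs `packssdw` and `packsswb`, and then offset bytewise with `paddb` against PB_CENTERJSAMP. The sketch below models a single sample in Python to illustrate why saturating first and then adding modulo 256 behaves like a clamp to 0..255. It is an illustration under stated assumptions, not the library's code: 8-bit samples with CENTERJSAMPLE = 128 are assumed, the shift count 18 in the demo is only an example value, and all function names are hypothetical.

```python
CENTERJSAMPLE = 128  # assumption: 8-bit samples


def descale(x, n):
    """Round-to-nearest right shift: add half, then arithmetic shift (paddd + psrad)."""
    return (x + (1 << (n - 1))) >> n


def saturate(v, bits):
    """Signed saturation, as packssdw (bits=16) and packsswb (bits=8) perform."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, v))


def to_sample(x, n):
    """Descale, narrow with saturation, then add the centre value modulo 256 (paddb)."""
    v = saturate(saturate(descale(x, n), 16), 8)
    return (v + CENTERJSAMPLE) & 0xFF


def to_sample_reference(x, n):
    """Scalar reference: descale, recentre, clamp to the 0..255 sample range."""
    return max(0, min(255, descale(x, n) + CENTERJSAMPLE))


if __name__ == "__main__":
    n = 18  # example shift count only
    for x in range(-(1 << 27), 1 << 27, 99991):
        assert to_sample(x, n) == to_sample_reference(x, n)
    print("saturate-then-offset matches the clamp for all tested inputs")
```

The same `descale` formula matches the DESCALE(x,n) macro defined in jiss2red.asm for the non-default CONST_BITS case, so it can also be used to cross-check the precomputed constants in that file.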
607
jiss2red.asm
Normal file
@@ -0,0 +1,607 @@
|
|||||||
|
;
|
||||||
|
; jiss2red.asm - reduced-size IDCT (SSE2)
|
||||||
|
;
|
||||||
|
; x86 SIMD extension for IJG JPEG library
|
||||||
|
; Copyright (C) 1999-2006, MIYASAKA Masaru.
|
||||||
|
; For conditions of distribution and use, see copyright notice in jsimdext.inc
|
||||||
|
;
|
||||||
|
; This file should be assembled with NASM (Netwide Assembler),
|
||||||
|
; can *not* be assembled with Microsoft's MASM or any compatible
|
||||||
|
; assembler (including Borland's Turbo Assembler).
|
||||||
|
; NASM is available from http://nasm.sourceforge.net/ or
|
||||||
|
; http://sourceforge.net/project/showfiles.php?group_id=6208
|
||||||
|
;
|
||||||
|
; This file contains inverse-DCT routines that produce reduced-size
|
||||||
|
; output: either 4x4 or 2x2 pixels from an 8x8 DCT block.
|
||||||
|
; The following code is based directly on the IJG's original jidctred.c;
|
||||||
|
; see the jidctred.c for more details.
|
||||||
|
;
|
||||||
|
; Last Modified : February 4, 2006
|
||||||
|
;
|
||||||
|
; [TAB8]
|
||||||
|
|
||||||
|
%include "jsimdext.inc"
|
||||||
|
%include "jdct.inc"
|
||||||
|
|
||||||
|
%ifdef IDCT_SCALING_SUPPORTED
|
||||||
|
%ifdef JIDCT_INT_SSE2_SUPPORTED
|
||||||
|
|
||||||
|
; This module is specialized to the case DCTSIZE = 8.
|
||||||
|
;
|
||||||
|
%if DCTSIZE != 8
|
||||||
|
%error "Sorry, this code only copes with 8x8 DCTs."
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
|
||||||
|
%define CONST_BITS 13
|
||||||
|
%define PASS1_BITS 2
|
||||||
|
|
||||||
|
%define DESCALE_P1_4 (CONST_BITS-PASS1_BITS+1)
|
||||||
|
%define DESCALE_P2_4 (CONST_BITS+PASS1_BITS+3+1)
|
||||||
|
%define DESCALE_P1_2 (CONST_BITS-PASS1_BITS+2)
|
||||||
|
%define DESCALE_P2_2 (CONST_BITS+PASS1_BITS+3+2)
|
||||||
|
|
||||||
|
%if CONST_BITS == 13
|
||||||
|
F_0_211 equ 1730 ; FIX(0.211164243)
|
||||||
|
F_0_509 equ 4176 ; FIX(0.509795579)
|
||||||
|
F_0_601 equ 4926 ; FIX(0.601344887)
|
||||||
|
F_0_720 equ 5906 ; FIX(0.720959822)
|
||||||
|
F_0_765 equ 6270 ; FIX(0.765366865)
|
||||||
|
F_0_850 equ 6967 ; FIX(0.850430095)
|
||||||
|
F_0_899 equ 7373 ; FIX(0.899976223)
|
||||||
|
F_1_061 equ 8697 ; FIX(1.061594337)
|
||||||
|
F_1_272 equ 10426 ; FIX(1.272758580)
|
||||||
|
F_1_451 equ 11893 ; FIX(1.451774981)
|
||||||
|
F_1_847 equ 15137 ; FIX(1.847759065)
|
||||||
|
F_2_172 equ 17799 ; FIX(2.172734803)
|
||||||
|
F_2_562 equ 20995 ; FIX(2.562915447)
|
||||||
|
F_3_624 equ 29692 ; FIX(3.624509785)
|
||||||
|
%else
|
||||||
|
; NASM cannot do compile-time arithmetic on floating-point constants.
|
||||||
|
%define DESCALE(x,n) (((x)+(1<<((n)-1)))>>(n))
|
||||||
|
F_0_211 equ DESCALE( 226735879,30-CONST_BITS) ; FIX(0.211164243)
|
||||||
|
F_0_509 equ DESCALE( 547388834,30-CONST_BITS) ; FIX(0.509795579)
|
||||||
|
F_0_601 equ DESCALE( 645689155,30-CONST_BITS) ; FIX(0.601344887)
|
||||||
|
F_0_720 equ DESCALE( 774124714,30-CONST_BITS) ; FIX(0.720959822)
|
||||||
|
F_0_765 equ DESCALE( 821806413,30-CONST_BITS) ; FIX(0.765366865)
|
||||||
|
F_0_850 equ DESCALE( 913142361,30-CONST_BITS) ; FIX(0.850430095)
|
||||||
|
F_0_899 equ DESCALE( 966342111,30-CONST_BITS) ; FIX(0.899976223)
|
||||||
|
F_1_061 equ DESCALE(1139878239,30-CONST_BITS) ; FIX(1.061594337)
|
||||||
|
F_1_272 equ DESCALE(1366614119,30-CONST_BITS) ; FIX(1.272758580)
|
||||||
|
F_1_451 equ DESCALE(1558831516,30-CONST_BITS) ; FIX(1.451774981)
|
||||||
|
F_1_847 equ DESCALE(1984016188,30-CONST_BITS) ; FIX(1.847759065)
|
||||||
|
F_2_172 equ DESCALE(2332956230,30-CONST_BITS) ; FIX(2.172734803)
|
||||||
|
F_2_562 equ DESCALE(2751909506,30-CONST_BITS) ; FIX(2.562915447)
|
||||||
|
F_3_624 equ DESCALE(3891787747,30-CONST_BITS) ; FIX(3.624509785)
|
||||||
|
%endif
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_CONST
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
global EXTN(jconst_idct_red_sse2)
|
||||||
|
|
||||||
|
EXTN(jconst_idct_red_sse2):
|
||||||
|
|
||||||
|
PW_F184_MF076 times 4 dw F_1_847,-F_0_765
|
||||||
|
PW_F256_F089 times 4 dw F_2_562, F_0_899
|
||||||
|
PW_F106_MF217 times 4 dw F_1_061,-F_2_172
|
||||||
|
PW_MF060_MF050 times 4 dw -F_0_601,-F_0_509
|
||||||
|
PW_F145_MF021 times 4 dw F_1_451,-F_0_211
|
||||||
|
PW_F362_MF127 times 4 dw F_3_624,-F_1_272
|
||||||
|
PW_F085_MF072 times 4 dw F_0_850,-F_0_720
|
||||||
|
PD_DESCALE_P1_4 times 4 dd 1 << (DESCALE_P1_4-1)
|
||||||
|
PD_DESCALE_P2_4 times 4 dd 1 << (DESCALE_P2_4-1)
|
||||||
|
PD_DESCALE_P1_2 times 4 dd 1 << (DESCALE_P1_2-1)
|
||||||
|
PD_DESCALE_P2_2 times 4 dd 1 << (DESCALE_P2_2-1)
|
||||||
|
PB_CENTERJSAMP times 16 db CENTERJSAMPLE
|
||||||
|
|
||||||
|
alignz 16
|
||||||
|
|
||||||
|
; --------------------------------------------------------------------------
|
||||||
|
SECTION SEG_TEXT
|
||||||
|
BITS 32
|
||||||
|
;
|
||||||
|
; Perform dequantization and inverse DCT on one block of coefficients,
|
||||||
|
; producing a reduced-size 4x4 output block.
|
||||||
|
;
|
||||||
|
; GLOBAL(void)
|
||||||
|
; jpeg_idct_4x4_sse2 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
|
||||||
|
; JCOEFPTR coef_block,
|
||||||
|
; JSAMPARRAY output_buf, JDIMENSION output_col)
|
||||||
|
;
|
||||||
|
|
||||||
|
%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
|
||||||
|
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
|
||||||
|
%define coef_block(b) (b)+16 ; JCOEFPTR coef_block
|
||||||
|
%define output_buf(b) (b)+20 ; JSAMPARRAY output_buf
|
||||||
|
%define output_col(b) (b)+24 ; JDIMENSION output_col
|
||||||
|
|
||||||
|
%define original_ebp ebp+0
|
||||||
|
%define wk(i) ebp-(WK_NUM-(i))*SIZEOF_XMMWORD ; xmmword wk[WK_NUM]
|
||||||
|
%define WK_NUM 2
|
||||||
|
|
||||||
|
align 16
|
||||||
|
global EXTN(jpeg_idct_4x4_sse2)
|
||||||
|
|
||||||
|
EXTN(jpeg_idct_4x4_sse2):
|
||||||
|
push ebp
|
||||||
|
mov eax,esp ; eax = original ebp
|
||||||
|
sub esp, byte 4
|
||||||
|
and esp, byte (-SIZEOF_XMMWORD) ; align to 128 bits
|
||||||
|
mov [esp],eax
|
||||||
|
mov ebp,esp ; ebp = aligned ebp
|
||||||
|
lea esp, [wk(0)]
|
||||||
|
pushpic ebx
|
||||||
|
; push ecx ; unused
|
||||||
|
; push edx ; need not be preserved
|
||||||
|
push esi
|
||||||
|
push edi
|
||||||
|
|
||||||
|
get_GOT ebx ; get GOT address
|
||||||
|
|
||||||
|
; ---- Pass 1: process columns from input.
|
||||||
|
|
||||||
|
; mov eax, [original_ebp]
|
||||||
|
mov edx, POINTER [compptr(eax)]
|
||||||
|
mov edx, POINTER [jcompinfo_dct_table(edx)] ; quantptr
|
||||||
|
mov esi, JCOEFPTR [coef_block(eax)] ; inptr
|
||||||
|
|
||||||
|
%ifndef NO_ZERO_COLUMN_TEST_4X4_SSE2
|
||||||
|
mov eax, DWORD [DWBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
or eax, DWORD [DWBLOCK(2,0,esi,SIZEOF_JCOEF)]
|
||||||
|
jnz short .columnDCT
|
||||||
|
|
||||||
|
movdqa xmm0, XMMWORD [XMMBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movdqa xmm1, XMMWORD [XMMBLOCK(2,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por xmm0, XMMWORD [XMMBLOCK(3,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por xmm1, XMMWORD [XMMBLOCK(5,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por xmm0, XMMWORD [XMMBLOCK(6,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por xmm1, XMMWORD [XMMBLOCK(7,0,esi,SIZEOF_JCOEF)]
|
||||||
|
por xmm0,xmm1
|
||||||
|
packsswb xmm0,xmm0
|
||||||
|
packsswb xmm0,xmm0
|
||||||
|
movd eax,xmm0
|
||||||
|
test eax,eax
|
||||||
|
jnz short .columnDCT
|
||||||
|
|
||||||
|
; -- AC terms all zero
|
||||||
|
|
||||||
|
movdqa xmm0, XMMWORD [XMMBLOCK(0,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw xmm0, XMMWORD [XMMBLOCK(0,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
|
||||||
|
psllw xmm0,PASS1_BITS
|
||||||
|
|
||||||
|
movdqa xmm3,xmm0 ; xmm0=in0=(00 01 02 03 04 05 06 07)
|
||||||
|
punpcklwd xmm0,xmm0 ; xmm0=(00 00 01 01 02 02 03 03)
|
||||||
|
punpckhwd xmm3,xmm3 ; xmm3=(04 04 05 05 06 06 07 07)
|
||||||
|
|
||||||
|
pshufd xmm1,xmm0,0x50 ; xmm1=[col0 col1]=(00 00 00 00 01 01 01 01)
|
||||||
|
pshufd xmm0,xmm0,0xFA ; xmm0=[col2 col3]=(02 02 02 02 03 03 03 03)
|
||||||
|
pshufd xmm6,xmm3,0x50 ; xmm6=[col4 col5]=(04 04 04 04 05 05 05 05)
|
||||||
|
pshufd xmm3,xmm3,0xFA ; xmm3=[col6 col7]=(06 06 06 06 07 07 07 07)
|
||||||
|
|
||||||
|
jmp near .column_end
|
||||||
|
alignx 16,7
|
||||||
|
%endif
|
||||||
|
.columnDCT:
|
||||||
|
|
||||||
|
; -- Odd part
|
||||||
|
|
||||||
|
movdqa xmm0, XMMWORD [XMMBLOCK(1,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movdqa xmm1, XMMWORD [XMMBLOCK(3,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw xmm0, XMMWORD [XMMBLOCK(1,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
pmullw xmm1, XMMWORD [XMMBLOCK(3,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
movdqa xmm2, XMMWORD [XMMBLOCK(5,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movdqa xmm3, XMMWORD [XMMBLOCK(7,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw xmm2, XMMWORD [XMMBLOCK(5,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
pmullw xmm3, XMMWORD [XMMBLOCK(7,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
|
||||||
|
movdqa xmm4,xmm0
|
||||||
|
movdqa xmm5,xmm0
|
||||||
|
punpcklwd xmm4,xmm1
|
||||||
|
punpckhwd xmm5,xmm1
|
||||||
|
movdqa xmm0,xmm4
|
||||||
|
movdqa xmm1,xmm5
|
||||||
|
pmaddwd xmm4,[GOTOFF(ebx,PW_F256_F089)] ; xmm4=(tmp2L)
|
||||||
|
pmaddwd xmm5,[GOTOFF(ebx,PW_F256_F089)] ; xmm5=(tmp2H)
|
||||||
|
pmaddwd xmm0,[GOTOFF(ebx,PW_F106_MF217)] ; xmm0=(tmp0L)
|
||||||
|
pmaddwd xmm1,[GOTOFF(ebx,PW_F106_MF217)] ; xmm1=(tmp0H)
|
||||||
|
|
||||||
|
movdqa xmm6,xmm2
|
||||||
|
movdqa xmm7,xmm2
|
||||||
|
punpcklwd xmm6,xmm3
|
||||||
|
punpckhwd xmm7,xmm3
|
||||||
|
movdqa xmm2,xmm6
|
||||||
|
movdqa xmm3,xmm7
|
||||||
|
pmaddwd xmm6,[GOTOFF(ebx,PW_MF060_MF050)] ; xmm6=(tmp2L)
|
||||||
|
pmaddwd xmm7,[GOTOFF(ebx,PW_MF060_MF050)] ; xmm7=(tmp2H)
|
||||||
|
pmaddwd xmm2,[GOTOFF(ebx,PW_F145_MF021)] ; xmm2=(tmp0L)
|
||||||
|
pmaddwd xmm3,[GOTOFF(ebx,PW_F145_MF021)] ; xmm3=(tmp0H)
|
||||||
|
|
||||||
|
paddd xmm6,xmm4 ; xmm6=tmp2L
|
||||||
|
paddd xmm7,xmm5 ; xmm7=tmp2H
|
||||||
|
paddd xmm2,xmm0 ; xmm2=tmp0L
|
||||||
|
paddd xmm3,xmm1 ; xmm3=tmp0H
|
||||||
|
|
||||||
|
movdqa XMMWORD [wk(0)], xmm2 ; wk(0)=tmp0L
|
||||||
|
movdqa XMMWORD [wk(1)], xmm3 ; wk(1)=tmp0H
|
||||||
|
|
||||||
|
; -- Even part
|
||||||
|
|
||||||
|
movdqa xmm4, XMMWORD [XMMBLOCK(0,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movdqa xmm5, XMMWORD [XMMBLOCK(2,0,esi,SIZEOF_JCOEF)]
|
||||||
|
movdqa xmm0, XMMWORD [XMMBLOCK(6,0,esi,SIZEOF_JCOEF)]
|
||||||
|
pmullw xmm4, XMMWORD [XMMBLOCK(0,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
pmullw xmm5, XMMWORD [XMMBLOCK(2,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
pmullw xmm0, XMMWORD [XMMBLOCK(6,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
|
||||||
|
|
||||||
|
pxor xmm1,xmm1
|
||||||
|
pxor xmm2,xmm2
|
||||||
|
punpcklwd xmm1,xmm4 ; xmm1=tmp0L
|
||||||
|
punpckhwd xmm2,xmm4 ; xmm2=tmp0H
|
||||||
|
psrad xmm1,(16-CONST_BITS-1) ; psrad xmm1,16 & pslld xmm1,CONST_BITS+1
|
||||||
|
psrad xmm2,(16-CONST_BITS-1) ; psrad xmm2,16 & pslld xmm2,CONST_BITS+1
|
||||||
|
|
||||||
|
movdqa xmm3,xmm5 ; xmm5=in2=z2
|
||||||
|
punpcklwd xmm5,xmm0 ; xmm0=in6=z3
|
||||||
|
punpckhwd xmm3,xmm0
|
||||||
|
pmaddwd xmm5,[GOTOFF(ebx,PW_F184_MF076)] ; xmm5=tmp2L
|
||||||
|
pmaddwd xmm3,[GOTOFF(ebx,PW_F184_MF076)] ; xmm3=tmp2H
|
||||||
|
|
||||||
|
movdqa xmm4,xmm1
|
||||||
|
movdqa xmm0,xmm2
|
||||||
|
paddd xmm1,xmm5 ; xmm1=tmp10L
|
||||||
|
paddd xmm2,xmm3 ; xmm2=tmp10H
|
||||||
|
psubd xmm4,xmm5 ; xmm4=tmp12L
|
||||||
|
psubd xmm0,xmm3 ; xmm0=tmp12H
|
||||||
|
|
||||||
|
; -- Final output stage
|
||||||
|
|
||||||
|
movdqa xmm5,xmm1
|
||||||
|
movdqa xmm3,xmm2
|
||||||
|
paddd xmm1,xmm6 ; xmm1=data0L
|
||||||
|
paddd xmm2,xmm7 ; xmm2=data0H
|
||||||
|
psubd xmm5,xmm6 ; xmm5=data3L
|
||||||
|
psubd xmm3,xmm7 ; xmm3=data3H
|
||||||
|
|
||||||
|
movdqa xmm6,[GOTOFF(ebx,PD_DESCALE_P1_4)] ; xmm6=[PD_DESCALE_P1_4]
|
||||||
|
|
||||||
|
paddd xmm1,xmm6
|
||||||
|
paddd xmm2,xmm6
|
||||||
|
psrad xmm1,DESCALE_P1_4
|
||||||
|
psrad xmm2,DESCALE_P1_4
|
||||||
|
paddd xmm5,xmm6
|
||||||
|
paddd xmm3,xmm6
|
||||||
|
psrad xmm5,DESCALE_P1_4
|
||||||
|
psrad xmm3,DESCALE_P1_4
|
||||||
|
|
||||||
|
packssdw xmm1,xmm2 ; xmm1=data0=(00 01 02 03 04 05 06 07)
|
||||||
|
packssdw xmm5,xmm3 ; xmm5=data3=(30 31 32 33 34 35 36 37)
|
||||||
|
|
||||||
|
movdqa xmm7, XMMWORD [wk(0)] ; xmm7=tmp0L
|
||||||
|
movdqa xmm6, XMMWORD [wk(1)] ; xmm6=tmp0H
|
||||||
|
|
||||||
|
movdqa xmm2,xmm4
|
||||||
|
movdqa xmm3,xmm0
|
||||||
|
paddd xmm4,xmm7 ; xmm4=data1L
|
||||||
|
paddd xmm0,xmm6 ; xmm0=data1H
|
||||||
|
psubd xmm2,xmm7 ; xmm2=data2L
|
||||||
|
psubd xmm3,xmm6 ; xmm3=data2H
|
||||||
|
|
||||||
|
movdqa xmm7,[GOTOFF(ebx,PD_DESCALE_P1_4)] ; xmm7=[PD_DESCALE_P1_4]
|
||||||
|
|
||||||
|
paddd xmm4,xmm7
|
||||||
|
paddd xmm0,xmm7
|
||||||
|
psrad xmm4,DESCALE_P1_4
|
||||||
|
psrad xmm0,DESCALE_P1_4
|
||||||
|
paddd xmm2,xmm7
|
||||||
|
paddd xmm3,xmm7
|
||||||
|
psrad xmm2,DESCALE_P1_4
|
||||||
|
psrad xmm3,DESCALE_P1_4
|
||||||
|
|
||||||
|
packssdw xmm4,xmm0 ; xmm4=data1=(10 11 12 13 14 15 16 17)
|
||||||
|
packssdw xmm2,xmm3 ; xmm2=data2=(20 21 22 23 24 25 26 27)
|
||||||
|
|
||||||
|
movdqa xmm6,xmm1 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd xmm1,xmm4 ; xmm1=(00 10 01 11 02 12 03 13)
|
||||||
|
punpckhwd xmm6,xmm4 ; xmm6=(04 14 05 15 06 16 07 17)
|
||||||
|
movdqa xmm7,xmm2 ; transpose coefficients(phase 1)
|
||||||
|
punpcklwd xmm2,xmm5 ; xmm2=(20 30 21 31 22 32 23 33)
|
||||||
|
punpckhwd xmm7,xmm5 ; xmm7=(24 34 25 35 26 36 27 37)
|
||||||
|
|
||||||
|
movdqa xmm0,xmm1 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq xmm1,xmm2 ; xmm1=[col0 col1]=(00 10 20 30 01 11 21 31)
|
||||||
|
punpckhdq xmm0,xmm2 ; xmm0=[col2 col3]=(02 12 22 32 03 13 23 33)
|
||||||
|
movdqa xmm3,xmm6 ; transpose coefficients(phase 2)
|
||||||
|
punpckldq xmm6,xmm7 ; xmm6=[col4 col5]=(04 14 24 34 05 15 25 35)
|
||||||
|
punpckhdq xmm3,xmm7 ; xmm3=[col6 col7]=(06 16 26 36 07 17 27 37)
|
||||||
|
.column_end:
|
||||||
|
|
||||||
|
        ; -- Prefetch the next coefficient block

        prefetchnta [esi + DCTSIZE2*SIZEOF_JCOEF + 0*32]
        prefetchnta [esi + DCTSIZE2*SIZEOF_JCOEF + 1*32]
        prefetchnta [esi + DCTSIZE2*SIZEOF_JCOEF + 2*32]
        prefetchnta [esi + DCTSIZE2*SIZEOF_JCOEF + 3*32]

        ; ---- Pass 2: process rows, store into output array.

        mov eax, [original_ebp]
        mov edi, JSAMPARRAY [output_buf(eax)] ; (JSAMPROW *)
        mov eax, JDIMENSION [output_col(eax)]

        ; -- Even part

        pxor xmm4,xmm4
        punpcklwd xmm4,xmm1 ; xmm4=tmp0
        psrad xmm4,(16-CONST_BITS-1) ; psrad xmm4,16 & pslld xmm4,CONST_BITS+1

        ; -- Odd part

        punpckhwd xmm1,xmm0
        punpckhwd xmm6,xmm3
        movdqa xmm5,xmm1
        movdqa xmm2,xmm6
        pmaddwd xmm1,[GOTOFF(ebx,PW_F256_F089)] ; xmm1=(tmp2)
        pmaddwd xmm6,[GOTOFF(ebx,PW_MF060_MF050)] ; xmm6=(tmp2)
        pmaddwd xmm5,[GOTOFF(ebx,PW_F106_MF217)] ; xmm5=(tmp0)
        pmaddwd xmm2,[GOTOFF(ebx,PW_F145_MF021)] ; xmm2=(tmp0)

        paddd xmm6,xmm1 ; xmm6=tmp2
        paddd xmm2,xmm5 ; xmm2=tmp0

        ; -- Even part

        punpcklwd xmm0,xmm3
        pmaddwd xmm0,[GOTOFF(ebx,PW_F184_MF076)] ; xmm0=tmp2

        movdqa xmm7,xmm4
        paddd xmm4,xmm0 ; xmm4=tmp10
        psubd xmm7,xmm0 ; xmm7=tmp12

        ; -- Final output stage

        movdqa xmm1,[GOTOFF(ebx,PD_DESCALE_P2_4)] ; xmm1=[PD_DESCALE_P2_4]

        movdqa xmm5,xmm4
        movdqa xmm3,xmm7
        paddd xmm4,xmm6 ; xmm4=data0=(00 10 20 30)
        paddd xmm7,xmm2 ; xmm7=data1=(01 11 21 31)
        psubd xmm5,xmm6 ; xmm5=data3=(03 13 23 33)
        psubd xmm3,xmm2 ; xmm3=data2=(02 12 22 32)

        paddd xmm4,xmm1
        paddd xmm7,xmm1
        psrad xmm4,DESCALE_P2_4
        psrad xmm7,DESCALE_P2_4
        paddd xmm5,xmm1
        paddd xmm3,xmm1
        psrad xmm5,DESCALE_P2_4
        psrad xmm3,DESCALE_P2_4

        packssdw xmm4,xmm3 ; xmm4=(00 10 20 30 02 12 22 32)
        packssdw xmm7,xmm5 ; xmm7=(01 11 21 31 03 13 23 33)

        movdqa xmm0,xmm4 ; transpose coefficients(phase 1)
        punpcklwd xmm4,xmm7 ; xmm4=(00 01 10 11 20 21 30 31)
        punpckhwd xmm0,xmm7 ; xmm0=(02 03 12 13 22 23 32 33)

        movdqa xmm6,xmm4 ; transpose coefficients(phase 2)
        punpckldq xmm4,xmm0 ; xmm4=(00 01 02 03 10 11 12 13)
        punpckhdq xmm6,xmm0 ; xmm6=(20 21 22 23 30 31 32 33)

        packsswb xmm4,xmm6 ; xmm4=(00 01 02 03 10 11 12 13 20 ..)
        paddb xmm4,[GOTOFF(ebx,PB_CENTERJSAMP)]

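        ; Editor's note (a hedged sketch, not part of the original source):
        ; packsswb saturates each signed word to [-128, 127], and paddb then
        ; adds the bias with 8-bit wraparound. Assuming PB_CENTERJSAMP holds
        ; the sample centre value 128 in every byte (the constant table is
        ; not shown in this excerpt), the net effect per sample v is:
        ;
        ;   out = clamp(v, -128, 127) + 128     ; result lies in 0..255
        ;
        ; In other words, the saturating pack doubles as the range limiter
        ; for the output samples.
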
        pshufd xmm2,xmm4,0x39 ; xmm2=(10 11 12 13 20 21 22 23 30 ..)
        pshufd xmm1,xmm4,0x4E ; xmm1=(20 21 22 23 30 31 32 33 00 ..)
        pshufd xmm3,xmm4,0x93 ; xmm3=(30 31 32 33 00 01 02 03 10 ..)

        mov edx, JSAMPROW [edi+0*SIZEOF_JSAMPROW]
        mov esi, JSAMPROW [edi+1*SIZEOF_JSAMPROW]
        movd _DWORD [edx+eax*SIZEOF_JSAMPLE], xmm4
        movd _DWORD [esi+eax*SIZEOF_JSAMPLE], xmm2
        mov edx, JSAMPROW [edi+2*SIZEOF_JSAMPROW]
        mov esi, JSAMPROW [edi+3*SIZEOF_JSAMPROW]
        movd _DWORD [edx+eax*SIZEOF_JSAMPLE], xmm1
        movd _DWORD [esi+eax*SIZEOF_JSAMPLE], xmm3

        pop edi
        pop esi
;       pop edx ; need not be preserved
;       pop ecx ; unused
        poppic ebx
        mov esp,ebp ; esp <- aligned ebp
        pop esp ; esp <- original ebp
        pop ebp
        ret


; --------------------------------------------------------------------------
;
; Perform dequantization and inverse DCT on one block of coefficients,
; producing a reduced-size 2x2 output block.
;
; GLOBAL(void)
; jpeg_idct_2x2_sse2 (j_decompress_ptr cinfo, jpeg_component_info * compptr,
;                     JCOEFPTR coef_block,
;                     JSAMPARRAY output_buf, JDIMENSION output_col)
;

%define cinfo(b) (b)+8 ; j_decompress_ptr cinfo
%define compptr(b) (b)+12 ; jpeg_component_info * compptr
%define coef_block(b) (b)+16 ; JCOEFPTR coef_block
%define output_buf(b) (b)+20 ; JSAMPARRAY output_buf
%define output_col(b) (b)+24 ; JDIMENSION output_col

        align 16
        global EXTN(jpeg_idct_2x2_sse2)

EXTN(jpeg_idct_2x2_sse2):
        push ebp
        mov ebp,esp
        push ebx
;       push ecx ; need not be preserved
;       push edx ; need not be preserved
        push esi
        push edi

        get_GOT ebx ; get GOT address

        ; ---- Pass 1: process columns from input.

        mov edx, POINTER [compptr(ebp)]
        mov edx, POINTER [jcompinfo_dct_table(edx)] ; quantptr
        mov esi, JCOEFPTR [coef_block(ebp)] ; inptr

        ; | input:                  | result:        |
        ; | 00 01 ** 03 ** 05 ** 07 |                |
        ; | 10 11 ** 13 ** 15 ** 17 |                |
        ; | ** ** ** ** ** ** ** ** |                |
        ; | 30 31 ** 33 ** 35 ** 37 | A0 A1 A3 A5 A7 |
        ; | ** ** ** ** ** ** ** ** | B0 B1 B3 B5 B7 |
        ; | 50 51 ** 53 ** 55 ** 57 |                |
        ; | ** ** ** ** ** ** ** ** |                |
        ; | 70 71 ** 73 ** 75 ** 77 |                |

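        ; Editor's note (a hedged sketch, not part of the original source):
        ; in scalar terms, pass 1 computes roughly the following for each
        ; contributing column c (0, 1, 3, 5 and 7). The names in[r][c], A, B
        ; and F() are illustrative only: in[r][c] is the dequantized
        ; coefficient at row r, and F(x) is x scaled by 2^CONST_BITS. The
        ; multipliers are read off the PW_F362_MF127 and PW_F085_MF072 table
        ; names, so they are approximate.
        ;
        ;   tmp10 = in[0][c] << (CONST_BITS+2)
        ;   tmp0  = in[1][c]*F(3.62) - in[3][c]*F(1.27)
        ;         + in[5][c]*F(0.85) - in[7][c]*F(0.72)
        ;   A[c]  = (tmp10 + tmp0), rounded and shifted right by DESCALE_P1_2
        ;   B[c]  = (tmp10 - tmp0), rounded and shifted right by DESCALE_P1_2
        ;
        ; Rows and columns marked ** in the table above do not contribute.
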
        ; -- Odd part

        movdqa xmm0, XMMWORD [XMMBLOCK(1,0,esi,SIZEOF_JCOEF)]
        movdqa xmm1, XMMWORD [XMMBLOCK(3,0,esi,SIZEOF_JCOEF)]
        pmullw xmm0, XMMWORD [XMMBLOCK(1,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
        pmullw xmm1, XMMWORD [XMMBLOCK(3,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
        movdqa xmm2, XMMWORD [XMMBLOCK(5,0,esi,SIZEOF_JCOEF)]
        movdqa xmm3, XMMWORD [XMMBLOCK(7,0,esi,SIZEOF_JCOEF)]
        pmullw xmm2, XMMWORD [XMMBLOCK(5,0,edx,SIZEOF_ISLOW_MULT_TYPE)]
        pmullw xmm3, XMMWORD [XMMBLOCK(7,0,edx,SIZEOF_ISLOW_MULT_TYPE)]

        ; xmm0=(10 11 ** 13 ** 15 ** 17), xmm1=(30 31 ** 33 ** 35 ** 37)
        ; xmm2=(50 51 ** 53 ** 55 ** 57), xmm3=(70 71 ** 73 ** 75 ** 77)

        pcmpeqd xmm7,xmm7
        pslld xmm7,WORD_BIT ; xmm7={0x0000 0xFFFF 0x0000 0xFFFF ..}

        movdqa xmm4,xmm0 ; xmm4=(10 11 ** 13 ** 15 ** 17)
        movdqa xmm5,xmm2 ; xmm5=(50 51 ** 53 ** 55 ** 57)
        punpcklwd xmm4,xmm1 ; xmm4=(10 30 11 31 ** ** 13 33)
        punpcklwd xmm5,xmm3 ; xmm5=(50 70 51 71 ** ** 53 73)
        pmaddwd xmm4,[GOTOFF(ebx,PW_F362_MF127)]
        pmaddwd xmm5,[GOTOFF(ebx,PW_F085_MF072)]

        psrld xmm0,WORD_BIT ; xmm0=(11 -- 13 -- 15 -- 17 --)
        pand xmm1,xmm7 ; xmm1=(-- 31 -- 33 -- 35 -- 37)
        psrld xmm2,WORD_BIT ; xmm2=(51 -- 53 -- 55 -- 57 --)
        pand xmm3,xmm7 ; xmm3=(-- 71 -- 73 -- 75 -- 77)
        por xmm0,xmm1 ; xmm0=(11 31 13 33 15 35 17 37)
        por xmm2,xmm3 ; xmm2=(51 71 53 73 55 75 57 77)
        pmaddwd xmm0,[GOTOFF(ebx,PW_F362_MF127)]
        pmaddwd xmm2,[GOTOFF(ebx,PW_F085_MF072)]

        paddd xmm4,xmm5 ; xmm4=tmp0[col0 col1 **** col3]
        paddd xmm0,xmm2 ; xmm0=tmp0[col1 col3 col5 col7]

        ; -- Even part

        movdqa xmm6, XMMWORD [XMMBLOCK(0,0,esi,SIZEOF_JCOEF)]
        pmullw xmm6, XMMWORD [XMMBLOCK(0,0,edx,SIZEOF_ISLOW_MULT_TYPE)]

        ; xmm6=(00 01 ** 03 ** 05 ** 07)

        movdqa xmm1,xmm6 ; xmm1=(00 01 ** 03 ** 05 ** 07)
        pslld xmm6,WORD_BIT ; xmm6=(-- 00 -- ** -- ** -- **)
        pand xmm1,xmm7 ; xmm1=(-- 01 -- 03 -- 05 -- 07)
        psrad xmm6,(WORD_BIT-CONST_BITS-2) ; xmm6=tmp10[col0 **** **** ****]
        psrad xmm1,(WORD_BIT-CONST_BITS-2) ; xmm1=tmp10[col1 col3 col5 col7]

        ; -- Final output stage

        movdqa xmm3,xmm6
        movdqa xmm5,xmm1
        paddd xmm6,xmm4 ; xmm6=data0[col0 **** **** ****]=(A0 ** ** **)
        paddd xmm1,xmm0 ; xmm1=data0[col1 col3 col5 col7]=(A1 A3 A5 A7)
        psubd xmm3,xmm4 ; xmm3=data1[col0 **** **** ****]=(B0 ** ** **)
        psubd xmm5,xmm0 ; xmm5=data1[col1 col3 col5 col7]=(B1 B3 B5 B7)

        movdqa xmm2,[GOTOFF(ebx,PD_DESCALE_P1_2)] ; xmm2=[PD_DESCALE_P1_2]

        punpckldq xmm6,xmm3 ; xmm6=(A0 B0 ** **)

        movdqa xmm7,xmm1
        punpcklqdq xmm1,xmm5 ; xmm1=(A1 A3 B1 B3)
        punpckhqdq xmm7,xmm5 ; xmm7=(A5 A7 B5 B7)

        paddd xmm6,xmm2
        psrad xmm6,DESCALE_P1_2

        paddd xmm1,xmm2
        paddd xmm7,xmm2
        psrad xmm1,DESCALE_P1_2
        psrad xmm7,DESCALE_P1_2

        ; -- Prefetch the next coefficient block

        prefetchnta [esi + DCTSIZE2*SIZEOF_JCOEF + 0*32]
        prefetchnta [esi + DCTSIZE2*SIZEOF_JCOEF + 1*32]
        prefetchnta [esi + DCTSIZE2*SIZEOF_JCOEF + 2*32]
        prefetchnta [esi + DCTSIZE2*SIZEOF_JCOEF + 3*32]

        ; ---- Pass 2: process rows, store into output array.

        mov edi, JSAMPARRAY [output_buf(ebp)] ; (JSAMPROW *)
        mov eax, JDIMENSION [output_col(ebp)]

        ; | input:| result:|
        ; | A0 B0 |        |
        ; | A1 B1 | C0 C1  |
        ; | A3 B3 | D0 D1  |
        ; | A5 B5 |        |
        ; | A7 B7 |        |

        ; -- Odd part

        packssdw xmm1,xmm1 ; xmm1=(A1 A3 B1 B3 A1 A3 B1 B3)
        packssdw xmm7,xmm7 ; xmm7=(A5 A7 B5 B7 A5 A7 B5 B7)
        pmaddwd xmm1,[GOTOFF(ebx,PW_F362_MF127)]
        pmaddwd xmm7,[GOTOFF(ebx,PW_F085_MF072)]

        paddd xmm1,xmm7 ; xmm1=tmp0[row0 row1 row0 row1]

        ; -- Even part

        pslld xmm6,(CONST_BITS+2) ; xmm6=tmp10[row0 row1 **** ****]

        ; -- Final output stage

        movdqa xmm4,xmm6
        paddd xmm6,xmm1 ; xmm6=data0[row0 row1 **** ****]=(C0 C1 ** **)
        psubd xmm4,xmm1 ; xmm4=data1[row0 row1 **** ****]=(D0 D1 ** **)

        punpckldq xmm6,xmm4 ; xmm6=(C0 D0 C1 D1)

        paddd xmm6,[GOTOFF(ebx,PD_DESCALE_P2_2)]
        psrad xmm6,DESCALE_P2_2

        packssdw xmm6,xmm6 ; xmm6=(C0 D0 C1 D1 C0 D0 C1 D1)
        packsswb xmm6,xmm6 ; xmm6=(C0 D0 C1 D1 C0 D0 C1 D1 ..)
        paddb xmm6,[GOTOFF(ebx,PB_CENTERJSAMP)]

        pextrw ebx,xmm6,0x00 ; ebx=(C0 D0 -- --)
        pextrw ecx,xmm6,0x01 ; ecx=(C1 D1 -- --)

        mov edx, JSAMPROW [edi+0*SIZEOF_JSAMPROW]
        mov esi, JSAMPROW [edi+1*SIZEOF_JSAMPROW]
        mov WORD [edx+eax*SIZEOF_JSAMPLE], bx
        mov WORD [esi+eax*SIZEOF_JSAMPLE], cx

        pop edi
        pop esi
;       pop edx ; need not be preserved
;       pop ecx ; need not be preserved
        pop ebx
        pop ebp
        ret

%endif ; JIDCT_INT_SSE2_SUPPORTED
%endif ; IDCT_SCALING_SUPPORTED