"ARM"="Arm", "NEON"="Neon"

Refer to:
https://www.arm.com/company/policies/trademarks/arm-trademark-list/arm-trademark
https://www.arm.com/company/policies/trademarks/arm-trademark-list/neon-trademark

NOTE: These changes are only applied to change log entries for 2.0.x and later, since the change log is a historical record and Arm's new trademark policy did not go into effect until late 2017.
2020-10-15 17:47:31 -05:00
parent b5a1472781
commit 1ed312eab6
15 changed files with 76 additions and 76 deletions
--- a/BUILDING.md
+++ b/BUILDING.md
@@ -398,8 +398,8 @@ located (usually **/usr/bin**.)  Next, execute the following commands:
 Building libjpeg-turbo for iOS
 ------------------------------

-iOS platforms, such as the iPhone and iPad, use ARM processors, and all
-currently supported models include NEON instructions.  Thus, they can take
+iOS platforms, such as the iPhone and iPad, use Arm processors, and all
+currently supported models include Neon instructions.  Thus, they can take
 advantage of libjpeg-turbo's SIMD extensions to significantly accelerate JPEG
 compression/decompression.  This section describes how to build libjpeg-turbo
 for these platforms.
@@ -412,7 +412,7 @@ for these platforms.
  it should be installed in your `PATH`.


-### ARMv7 (32-bit)
+### Armv7 (32-bit)

 **gas-preprocessor.pl required**

@@ -465,7 +465,7 @@ Same as above, but replace the first line with:
    make


-### ARMv7s (32-bit)
+### Armv7s (32-bit)

 **gas-preprocessor.pl required**

@@ -493,13 +493,13 @@ iPhone 5/iPad 4th Generation and newer:

 #### Xcode 5 and later (Clang)

-Same as the ARMv7 build procedure for Xcode 5 and later, except replace the
+Same as the Armv7 build procedure for Xcode 5 and later, except replace the
 compiler flags as follows:

    export CFLAGS="-Wall -mfloat-abi=softfp -arch armv7s -miphoneos-version-min=6.0"


-### ARMv8 (64-bit)
+### Armv8 (64-bit)

 **gas-preprocessor.pl required if using Xcode < 6**

@@ -523,7 +523,7 @@ iPhone 5S/iPad Mini 2/iPad Air and newer.
      [additional CMake flags] {source_directory}
    make

-Once built, lipo can be used to combine the ARMv7, v7s, and/or v8 variants into
+Once built, lipo can be used to combine the Armv7, v7s, and/or v8 variants into
 a universal library.


@@ -534,7 +534,7 @@ Building libjpeg-turbo for Android platforms requires v13b or later of the
 [Android NDK](https://developer.android.com/tools/sdk/ndk).


-### ARMv7 (32-bit)
+### Armv7 (32-bit)

 The following is a general recipe script that can be modified for your specific
 needs.
@@ -559,7 +559,7 @@ needs.
    make


-### ARMv8 (64-bit)
+### Armv8 (64-bit)

 The following is a general recipe script that can be modified for your specific
 needs.
@@ -742,21 +742,21 @@ must be built on OS X 10.6 or later.

    make udmg

-This creates a Mac package/disk image that contains universal x86-64/i386/ARM
+This creates a Mac package/disk image that contains universal x86-64/i386/Arm
 binaries.  The following CMake variables control which architectures are
 included in the universal binaries.  Setting any of these variables to an empty
 string excludes that architecture from the package.

 * `OSX_32BIT_BUILD`: Directory containing an i386 (32-bit) Mac build of
  libjpeg-turbo (default: *{source_directory}*/osxx86)
-* `IOS_ARMV7_BUILD`: Directory containing an ARMv7 (32-bit) iOS build of
+* `IOS_ARMV7_BUILD`: Directory containing an Armv7 (32-bit) iOS build of
  libjpeg-turbo (default: *{source_directory}*/iosarmv7)
-* `IOS_ARMV7S_BUILD`: Directory containing an ARMv7s (32-bit) iOS build of
+* `IOS_ARMV7S_BUILD`: Directory containing an Armv7s (32-bit) iOS build of
  libjpeg-turbo (default: *{source_directory}*/iosarmv7s)
-* `IOS_ARMV8_BUILD`: Directory containing an ARMv8 (64-bit) iOS build of
+* `IOS_ARMV8_BUILD`: Directory containing an Armv8 (64-bit) iOS build of
  libjpeg-turbo (default: *{source_directory}*/iosarmv8)

-You should first use CMake to configure i386, ARMv7, ARMv7s, and/or ARMv8
+You should first use CMake to configure i386, Armv7, Armv7s, and/or Armv8
 sub-builds of libjpeg-turbo (see "Build Recipes" and "Building libjpeg-turbo
 for iOS" above) in build directories that match those specified in the
 aforementioned CMake variables.  Next, configure the primary build of
--- a/ChangeLog.md
+++ b/ChangeLog.md
@@ -20,8 +20,8 @@ with `jpeg_skip_scanlines()`, and the issues could not readily be fixed.
     - Fixed an issue whereby `jpeg_skip_scanlines()` always returned 0 when
 skipping past the end of an image.

-3. The ARM 64-bit (ARMv8) NEON SIMD extensions can now be built using MinGW
-toolchains targetting ARM64 (AArch64) Windows binaries.
+3. The Arm 64-bit (Armv8) Neon SIMD extensions can now be built using MinGW
+toolchains targetting Arm64 (AArch64) Windows binaries.

 4. Fixed unexpected visual artifacts that occurred when using
 `jpeg_crop_scanline()` and interblock smoothing while decompressing only the DC
@@ -94,7 +94,7 @@ other user-visible errant behavior, and given that the lossless transformer
 (unlike the decompressor) is not generally exposed to arbitrary data exploits,
 this issue did not likely pose a security risk.

-6. The ARM 64-bit (ARMv8) NEON SIMD assembly code now stores constants in a
+6. The Arm 64-bit (Armv8) Neon SIMD assembly code now stores constants in a
 separate read-only data section rather than in the text section, to support
 execute-only memory layouts.

@@ -380,7 +380,7 @@ algorithm that caused incorrect dithering in the output image.  This algorithm
 now produces bitwise-identical results to the unmerged algorithms.

 12. The SIMD function symbols for x86[-64]/ELF, MIPS/ELF, macOS/x86[-64] (if
-libjpeg-turbo is built with YASM), and iOS/ARM[64] builds are now private.
+libjpeg-turbo is built with YASM), and iOS/Arm[64] builds are now private.
 This prevents those symbols from being exposed in applications or shared
 libraries that link statically with libjpeg-turbo.

--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@ Background
 ==========

 libjpeg-turbo is a JPEG image codec that uses SIMD instructions to accelerate
-baseline JPEG compression and decompression on x86, x86-64, ARM, PowerPC, and
+baseline JPEG compression and decompression on x86, x86-64, Arm, PowerPC, and
 MIPS systems, as well as progressive JPEG compression on x86 and x86-64
 systems.  On such systems, libjpeg-turbo is generally 2-6x as fast as libjpeg,
 all else being equal.  On other types of systems, libjpeg-turbo can still
--- a/cmakescripts/BuildPackages.cmake
+++ b/cmakescripts/BuildPackages.cmake
@@ -137,13 +137,13 @@ set(OSX_32BIT_BUILD ${DEFAULT_OSX_32BIT_BUILD} CACHE PATH
  "Directory containing 32-bit (i386) Mac build to include in universal binaries (default: ${DEFAULT_OSX_32BIT_BUILD})")
 set(DEFAULT_IOS_ARMV7_BUILD ${CMAKE_SOURCE_DIR}/iosarmv7)
 set(IOS_ARMV7_BUILD ${DEFAULT_IOS_ARMV7_BUILD} CACHE PATH
-  "Directory containing ARMv7 iOS build to include in universal binaries (default: ${DEFAULT_IOS_ARMV7_BUILD})")
+  "Directory containing Armv7 iOS build to include in universal binaries (default: ${DEFAULT_IOS_ARMV7_BUILD})")
 set(DEFAULT_IOS_ARMV7S_BUILD ${CMAKE_SOURCE_DIR}/iosarmv7s)
 set(IOS_ARMV7S_BUILD ${DEFAULT_IOS_ARMV7S_BUILD} CACHE PATH
-  "Directory containing ARMv7s iOS build to include in universal binaries (default: ${DEFAULT_IOS_ARMV7S_BUILD})")
+  "Directory containing Armv7s iOS build to include in universal binaries (default: ${DEFAULT_IOS_ARMV7S_BUILD})")
 set(DEFAULT_IOS_ARMV8_BUILD ${CMAKE_SOURCE_DIR}/iosarmv8)
 set(IOS_ARMV8_BUILD ${DEFAULT_IOS_ARMV8_BUILD} CACHE PATH
-  "Directory containing ARMv8 iOS build to include in universal binaries (default: ${DEFAULT_IOS_ARMV8_BUILD})")
+  "Directory containing Armv8 iOS build to include in universal binaries (default: ${DEFAULT_IOS_ARMV8_BUILD})")

 set(OSX_APP_CERT_NAME "" CACHE STRING
  "Name of the Developer ID Application certificate (in the macOS keychain) that should be used to sign the libjpeg-turbo DMG.  Leave this blank to generate an unsigned DMG.")
--- a/jchuff.c
+++ b/jchuff.c
@@ -34,10 +34,10 @@
 * memory footprint by 64k, which is important for some mobile applications
 * that create many isolated instances of libjpeg-turbo (web browsers, for
 * instance.)  This may improve performance on some mobile platforms as well.
- * This feature is enabled by default only on ARM processors, because some x86
+ * This feature is enabled by default only on Arm processors, because some x86
 * chips have a slow implementation of bsr, and the use of clz/bsr cannot be
 * shown to have a significant performance impact even on the x86 chips that
- * have a fast implementation of it.  When building for ARMv6, you can
+ * have a fast implementation of it.  When building for Armv6, you can
 * explicitly disable the use of clz/bsr by adding -mthumb to the compiler
 * flags (this defines __thumb__).
 */
--- a/jcphuff.c
+++ b/jcphuff.c
@@ -43,10 +43,10 @@
 * memory footprint by 64k, which is important for some mobile applications
 * that create many isolated instances of libjpeg-turbo (web browsers, for
 * instance.)  This may improve performance on some mobile platforms as well.
- * This feature is enabled by default only on ARM processors, because some x86
+ * This feature is enabled by default only on Arm processors, because some x86
 * chips have a slow implementation of bsr, and the use of clz/bsr cannot be
 * shown to have a significant performance impact even on the x86 chips that
- * have a fast implementation of it.  When building for ARMv6, you can
+ * have a fast implementation of it.  When building for Armv6, you can
 * explicitly disable the use of clz/bsr by adding -mthumb to the compiler
 * flags (this defines __thumb__).
 */
--- a/release/ReadMe.txt
+++ b/release/ReadMe.txt
@@ -1,4 +1,4 @@
-libjpeg-turbo is a JPEG image codec that uses SIMD instructions to accelerate baseline JPEG compression and decompression on x86, x86-64, ARM, PowerPC, and MIPS systems, as well as progressive JPEG compression on x86 and x86-64 systems.  On such systems, libjpeg-turbo is generally 2-6x as fast as libjpeg, all else being equal.  On other types of systems, libjpeg-turbo can still outperform libjpeg by a significant amount, by virtue of its highly-optimized Huffman coding routines.  In many cases, the performance of libjpeg-turbo rivals that of proprietary high-speed JPEG codecs.
+libjpeg-turbo is a JPEG image codec that uses SIMD instructions to accelerate baseline JPEG compression and decompression on x86, x86-64, Arm, PowerPC, and MIPS systems, as well as progressive JPEG compression on x86 and x86-64 systems.  On such systems, libjpeg-turbo is generally 2-6x as fast as libjpeg, all else being equal.  On other types of systems, libjpeg-turbo can still outperform libjpeg by a significant amount, by virtue of its highly-optimized Huffman coding routines.  In many cases, the performance of libjpeg-turbo rivals that of proprietary high-speed JPEG codecs.

 libjpeg-turbo implements both the traditional libjpeg API as well as the less powerful but more straightforward TurboJPEG API.  libjpeg-turbo also features colorspace extensions that allow it to compress from/decompress to 32-bit and big-endian pixel buffers (RGBX, XBGR, etc.), as well as a full-featured Java interface.

--- a/release/deb-control.in
+++ b/release/deb-control.in
@@ -9,7 +9,7 @@ Homepage: @PKGURL@
 Installed-Size: {__SIZE}
 Description: A SIMD-accelerated JPEG codec that provides both the libjpeg and TurboJPEG APIs
 libjpeg-turbo is a JPEG image codec that uses SIMD instructions to accelerate
- baseline JPEG compression and decompression on x86, x86-64, ARM, PowerPC, and
+ baseline JPEG compression and decompression on x86, x86-64, Arm, PowerPC, and
 MIPS systems, as well as progressive JPEG compression on x86 and x86-64
 systems.  On such systems, libjpeg-turbo is generally 2-6x as fast as libjpeg,
 all else being equal.  On other types of systems, libjpeg-turbo can still
--- a/release/makemacpkg.in
+++ b/release/makemacpkg.in
@@ -223,15 +223,15 @@ install_ios()
 }

 if [ $UNIVERSAL = 1 -a "$BUILDDIRARMV7" != "" ]; then
-	install_ios $BUILDDIRARMV7 ARMv7 armv7 arm
+	install_ios $BUILDDIRARMV7 Armv7 armv7 arm
 fi

 if [ $UNIVERSAL = 1 -a "$BUILDDIRARMV7S" != "" ]; then
-	install_ios $BUILDDIRARMV7S ARMv7s armv7s arm
+	install_ios $BUILDDIRARMV7S Armv7s armv7s arm
 fi

 if [ $UNIVERSAL = 1 -a "$BUILDDIRARMV8" != "" ]; then
-	install_ios $BUILDDIRARMV8 ARMv8 armv8 arm64
+	install_ios $BUILDDIRARMV8 Armv8 armv8 arm64
 fi

 install_name_tool -id $LIBDIR/$LIBJPEG_DSO_NAME $PKGROOT/$LIBDIR/$LIBJPEG_DSO_NAME
--- a/release/rpm.spec.in
+++ b/release/rpm.spec.in
@@ -52,7 +52,7 @@ Provides: %{name} = %{version}-%{release}, @CMAKE_PROJECT_NAME@ = %{version}-%{r

 %description
 libjpeg-turbo is a JPEG image codec that uses SIMD instructions to accelerate
-baseline JPEG compression and decompression on x86, x86-64, ARM, PowerPC, and
+baseline JPEG compression and decompression on x86, x86-64, Arm, PowerPC, and
 MIPS systems, as well as progressive JPEG compression on x86 and x86-64
 systems.  On such systems, libjpeg-turbo is generally 2-6x as fast as libjpeg,
 all else being equal.  On other types of systems, libjpeg-turbo can still
--- a/simd/CMakeLists.txt
+++ b/simd/CMakeLists.txt
@@ -205,7 +205,7 @@ endif()


 ###############################################################################
-# ARM (GAS)
+# Arm (GAS)
 ###############################################################################

 elseif(CPU_TYPE STREQUAL "arm64" OR CPU_TYPE STREQUAL "arm")
--- a/simd/arm/jsimd.c
+++ b/simd/arm/jsimd.c
@@ -13,7 +13,7 @@
 *
 * This file contains the interface between the "normal" portions
 * of the library and the SIMD implementations when running on a
- * 32-bit ARM architecture.
+ * 32-bit Arm architecture.
 */

 #define JPEG_INTERNALS
@@ -118,7 +118,7 @@ init_simd(void)
 #if defined(__ARM_NEON__)
  simd_support |= JSIMD_NEON;
 #elif defined(__linux__) || defined(ANDROID) || defined(__ANDROID__)
-  /* We still have a chance to use NEON regardless of globally used
+  /* We still have a chance to use Neon regardless of globally used
   * -mcpu/-mfpu options passed to gcc by performing runtime detection via
   * /proc/cpuinfo parsing on linux/android */
  while (!parse_proc_cpuinfo(bufsize)) {
--- a/simd/arm/jsimd_neon.S
+++ b/simd/arm/jsimd_neon.S
@@ -1,5 +1,5 @@
 /*
- * ARMv7 NEON optimizations for libjpeg-turbo
+ * Armv7 Neon optimizations for libjpeg-turbo
 *
 * Copyright (C) 2009-2011, Nokia Corporation and/or its subsidiary(-ies).
 *                          All Rights Reserved.
@@ -229,7 +229,7 @@ asm_function jsimd_idct_islow_neon
    ROW7L           .req d30
    ROW7R           .req d31

-    /* Load and dequantize coefficients into NEON registers
+    /* Load and dequantize coefficients into Neon registers
     * with the following allocation:
     *       0 1 2 3 | 4 5 6 7
     *      ---------+--------
@@ -261,7 +261,7 @@ asm_function jsimd_idct_islow_neon
    vld1.16         {d0, d1, d2, d3}, [ip, :128]  /* load constants */
    add             ip, ip, #16
    vmul.s16        q15, q15, q3
-    vpush           {d8-d15}                      /* save NEON registers */
+    vpush           {d8-d15}                      /* save Neon registers */
    /* 1-D IDCT, pass 1, left 4x8 half */
    vadd.s16        d4, ROW7L, ROW3L
    vadd.s16        d5, ROW5L, ROW1L
@@ -507,7 +507,7 @@ asm_function jsimd_idct_islow_neon
    vqrshrn.s16     d17, q9, #2
    vqrshrn.s16     d18, q10, #2
    vqrshrn.s16     d19, q11, #2
-    vpop            {d8-d15}                      /* restore NEON registers */
+    vpop            {d8-d15}                      /* restore Neon registers */
    vqrshrn.s16     d20, q12, #2
      /* Transpose the final 8-bit samples and do signed->unsigned conversion */
      vtrn.16         q8, q9
@@ -688,7 +688,7 @@ asm_function jsimd_idct_islow_neon
 * function from jidctfst.c
 *
 * Normally 1-D AAN DCT needs 5 multiplications and 29 additions.
- * But in ARM NEON case some extra additions are required because VQDMULH
+ * But in Arm Neon case some extra additions are required because VQDMULH
 * instruction can't handle the constants larger than 1. So the expressions
 * like "x * 1.082392200" have to be converted to "x * 0.082392200 + x",
 * which introduces an extra addition. Overall, there are 6 extra additions
@@ -718,7 +718,7 @@ asm_function jsimd_idct_ifast_neon
    TMP3            .req r2
    TMP4            .req ip

-    /* Load and dequantize coefficients into NEON registers
+    /* Load and dequantize coefficients into Neon registers
     * with the following allocation:
     *       0 1 2 3 | 4 5 6 7
     *      ---------+--------
@@ -749,7 +749,7 @@ asm_function jsimd_idct_ifast_neon
    vmul.s16        q13, q13, q1
    vld1.16         {d0}, [ip, :64]  /* load constants */
    vmul.s16        q15, q15, q3
-    vpush           {d8-d13}         /* save NEON registers */
+    vpush           {d8-d13}         /* save Neon registers */
    /* 1-D IDCT, pass 1 */
    vsub.s16        q2, q10, q14
    vadd.s16        q14, q10, q14
@@ -842,7 +842,7 @@ asm_function jsimd_idct_ifast_neon
    vadd.s16        q14, q5, q3
    vsub.s16        q9, q5, q3
    vsub.s16        q13, q10, q2
-    vpop            {d8-d13}      /* restore NEON registers */
+    vpop            {d8-d13}      /* restore Neon registers */
    vadd.s16        q10, q10, q2
    vsub.s16        q11, q12, q1
    vadd.s16        q12, q12, q1
@@ -913,7 +913,7 @@ asm_function jsimd_idct_ifast_neon
 *
 * NOTE: jpeg-8 has an improved implementation of 4x4 inverse-DCT, which
 *       requires much less arithmetic operations and hence should be faster.
- *       The primary purpose of this particular NEON optimized function is
+ *       The primary purpose of this particular Neon optimized function is
 *       bit exact compatibility with jpeg-6b.
 *
 * TODO: a bit better instructions scheduling can be achieved by expanding
@@ -1016,7 +1016,7 @@ asm_function jsimd_idct_4x4_neon
    adr             TMP4, jsimd_idct_4x4_neon_consts
    vld1.16         {d0, d1, d2, d3}, [TMP4, :128]

-    /* Load all COEF_BLOCK into NEON registers with the following allocation:
+    /* Load all COEF_BLOCK into Neon registers with the following allocation:
     *       0 1 2 3 | 4 5 6 7
     *      ---------+--------
     *   0 | d4      | d5
@@ -1126,7 +1126,7 @@ asm_function jsimd_idct_4x4_neon
 *
 * NOTE: jpeg-8 has an improved implementation of 2x2 inverse-DCT, which
 *       requires much less arithmetic operations and hence should be faster.
- *       The primary purpose of this particular NEON optimized function is
+ *       The primary purpose of this particular Neon optimized function is
 *       bit exact compatibility with jpeg-6b.
 */

@@ -1173,7 +1173,7 @@ asm_function jsimd_idct_2x2_neon
    adr             TMP2, jsimd_idct_2x2_neon_consts
    vld1.16         {d0}, [TMP2, :64]

-    /* Load all COEF_BLOCK into NEON registers with the following allocation:
+    /* Load all COEF_BLOCK into Neon registers with the following allocation:
     *       0 1 2 3 | 4 5 6 7
     *      ---------+--------
     *   0 | d4      | d5
@@ -1499,7 +1499,7 @@ asm_function jsimd_ycc_\colorid\()_convert_neon
    adr             ip, jsimd_ycc_\colorid\()_neon_consts
    vld1.16         {d0, d1, d2, d3}, [ip, :128]

-    /* Save ARM registers and handle input arguments */
+    /* Save Arm registers and handle input arguments */
    push            {r4, r5, r6, r7, r8, r9, r10, lr}
    ldr             NUM_ROWS, [sp, #(4 * 8)]
    ldr             INPUT_BUF0, [INPUT_BUF]
@@ -1507,7 +1507,7 @@ asm_function jsimd_ycc_\colorid\()_convert_neon
    ldr             INPUT_BUF2, [INPUT_BUF, #8]
    .unreq          INPUT_BUF

-    /* Save NEON registers */
+    /* Save Neon registers */
    vpush           {d8-d15}

    /* Initially set d10, d11, d12, d13 to 0xFF */
@@ -1814,7 +1814,7 @@ asm_function jsimd_\colorid\()_ycc_convert_neon
    adr             ip, jsimd_\colorid\()_ycc_neon_consts
    vld1.16         {d0, d1, d2, d3}, [ip, :128]

-    /* Save ARM registers and handle input arguments */
+    /* Save Arm registers and handle input arguments */
    push            {r4, r5, r6, r7, r8, r9, r10, lr}
    ldr             NUM_ROWS, [sp, #(4 * 8)]
    ldr             OUTPUT_BUF0, [OUTPUT_BUF]
@@ -1822,7 +1822,7 @@ asm_function jsimd_\colorid\()_ycc_convert_neon
    ldr             OUTPUT_BUF2, [OUTPUT_BUF, #8]
    .unreq          OUTPUT_BUF

-    /* Save NEON registers */
+    /* Save Neon registers */
    vpush           {d8-d15}

    /* Outer loop over scanlines */
@@ -2017,7 +2017,7 @@ asm_function jsimd_fdct_ifast_neon
    adr             TMP, jsimd_fdct_ifast_neon_consts
    vld1.16         {d0}, [TMP, :64]

-    /* Load all DATA into NEON registers with the following allocation:
+    /* Load all DATA into Neon registers with the following allocation:
     *       0 1 2 3 | 4 5 6 7
     *      ---------+--------
     *   0 | d16     | d17    | q8
@@ -2112,8 +2112,8 @@ asm_function jsimd_fdct_ifast_neon
 *
 * Note: the code uses 2 stage pipelining in order to improve instructions
 *       scheduling and eliminate stalls (this provides ~15% better
- *       performance for this function on both ARM Cortex-A8 and
- *       ARM Cortex-A9 when compared to the non-pipelined variant).
+ *       performance for this function on both Arm Cortex-A8 and
+ *       Arm Cortex-A9 when compared to the non-pipelined variant).
 *       The instructions which belong to the second stage use different
 *       indentation for better readiability.
 */
--- a/simd/arm64/jsimd.c
+++ b/simd/arm64/jsimd.c
@@ -12,7 +12,7 @@
 *
 * This file contains the interface between the "normal" portions
 * of the library and the SIMD implementations when running on a
- * 64-bit ARM architecture.
+ * 64-bit Arm architecture.
 */

 #define JPEG_INTERNALS
@@ -114,8 +114,8 @@ parse_proc_cpuinfo(int bufsize)
 */

 /*
- * ARMv8 architectures support NEON extensions by default.
- * It is no longer optional as it was with ARMv7.
+ * Armv8 architectures support Neon extensions by default.
+ * It is no longer optional as it was with Armv7.
 */


--- a/simd/arm64/jsimd_neon.S
+++ b/simd/arm64/jsimd_neon.S
@@ -1,5 +1,5 @@
 /*
- * ARMv8 NEON optimizations for libjpeg-turbo
+ * Armv8 Neon optimizations for libjpeg-turbo
 *
 * Copyright (C) 2009-2011, Nokia Corporation and/or its subsidiary(-ies).
 *                          All Rights Reserved.
@@ -611,7 +611,7 @@ asm_function jsimd_idct_islow_neon
    shrn2           v5.8h, v15.4s, #16  /* wsptr[DCTSIZE*3] = (int)DESCALE(tmp13 + tmp0, CONST_BITS+PASS1_BITS+3) */
    shrn2           v6.8h, v17.4s, #16  /* wsptr[DCTSIZE*4] = (int)DESCALE(tmp13 - tmp0, CONST_BITS+PASS1_BITS+3) */
    movi            v0.16b, #(CENTERJSAMPLE)
-    /* Prepare pointers (dual-issue with NEON instructions) */
+    /* Prepare pointers (dual-issue with Neon instructions) */
      ldp             TMP1, TMP2, [OUTPUT_BUF], 16
    sqrshrn         v28.8b, v2.8h, #(CONST_BITS+PASS1_BITS+3-16)
      ldp             TMP3, TMP4, [OUTPUT_BUF], 16
@@ -992,7 +992,7 @@ asm_function jsimd_idct_islow_neon
 * function from jidctfst.c
 *
 * Normally 1-D AAN DCT needs 5 multiplications and 29 additions.
- * But in ARM NEON case some extra additions are required because VQDMULH
+ * But in Arm Neon case some extra additions are required because VQDMULH
 * instruction can't handle the constants larger than 1. So the expressions
 * like "x * 1.082392200" have to be converted to "x * 0.082392200 + x",
 * which introduces an extra addition. Overall, there are 6 extra additions
@@ -1024,7 +1024,7 @@ asm_function jsimd_idct_ifast_neon
       instruction ensures that those bits are set to zero. */
    uxtw x3, w3

-    /* Load and dequantize coefficients into NEON registers
+    /* Load and dequantize coefficients into Neon registers
     * with the following allocation:
     *       0 1 2 3 | 4 5 6 7
     *      ---------+--------
@@ -1037,7 +1037,7 @@ asm_function jsimd_idct_ifast_neon
     *   6 | d28     | d29     ( v22.8h )
     *   7 | d30     | d31     ( v23.8h )
     */
-    /* Save NEON registers used in fast IDCT */
+    /* Save Neon registers used in fast IDCT */
    get_symbol_loc  TMP5, Ljsimd_idct_ifast_neon_consts
    ld1             {v16.8h, v17.8h}, [COEF_BLOCK], 32
    ld1             {v0.8h, v1.8h}, [DCT_TABLE], 32
@@ -1142,7 +1142,7 @@ asm_function jsimd_idct_ifast_neon
    add             v20.8h, v20.8h, v1.8h
    /* Descale to 8-bit and range limit */
    movi            v0.16b, #0x80
-      /* Prepare pointers (dual-issue with NEON instructions) */
+      /* Prepare pointers (dual-issue with Neon instructions) */
      ldp             TMP1, TMP2, [OUTPUT_BUF], 16
    sqshrn          v28.8b, v16.8h, #5
      ldp             TMP3, TMP4, [OUTPUT_BUF], 16
@@ -1221,7 +1221,7 @@ asm_function jsimd_idct_ifast_neon
 *
 * NOTE: jpeg-8 has an improved implementation of 4x4 inverse-DCT, which
 *       requires much less arithmetic operations and hence should be faster.
- *       The primary purpose of this particular NEON optimized function is
+ *       The primary purpose of this particular Neon optimized function is
 *       bit exact compatibility with jpeg-6b.
 *
 * TODO: a bit better instructions scheduling can be achieved by expanding
@@ -1291,7 +1291,7 @@ asm_function jsimd_idct_4x4_neon
       instruction ensures that those bits are set to zero. */
    uxtw x3, w3

-    /* Save all used NEON registers */
+    /* Save all used Neon registers */
    sub             sp, sp, 64
    mov             x9, sp
    /* Load constants (v3.4h is just used for padding) */
@@ -1300,7 +1300,7 @@ asm_function jsimd_idct_4x4_neon
    st1             {v12.8b, v13.8b, v14.8b, v15.8b}, [x9], 32
    ld1             {v0.4h, v1.4h, v2.4h, v3.4h}, [TMP4]

-    /* Load all COEF_BLOCK into NEON registers with the following allocation:
+    /* Load all COEF_BLOCK into Neon registers with the following allocation:
     *       0 1 2 3 | 4 5 6 7
     *      ---------+--------
     *   0 | v4.4h   | v5.4h
@@ -1434,7 +1434,7 @@ asm_function jsimd_idct_4x4_neon
 *
 * NOTE: jpeg-8 has an improved implementation of 2x2 inverse-DCT, which
 *       requires much less arithmetic operations and hence should be faster.
- *       The primary purpose of this particular NEON optimized function is
+ *       The primary purpose of this particular Neon optimized function is
 *       bit exact compatibility with jpeg-6b.
 */

@@ -1483,7 +1483,7 @@ asm_function jsimd_idct_2x2_neon
    st1             {v12.8b, v13.8b, v14.8b, v15.8b}, [x9], 32
    ld1             {v14.4h}, [TMP2]

-    /* Load all COEF_BLOCK into NEON registers with the following allocation:
+    /* Load all COEF_BLOCK into Neon registers with the following allocation:
     *       0 1 2 3 | 4 5 6 7
     *      ---------+--------
     *   0 | v4.4h   | v5.4h
@@ -1857,7 +1857,7 @@ asm_function jsimd_ycc_\colorid\()_convert_neon_slowst3
    /* Load constants to d1, d2, d3 (v0.4h is just used for padding) */
    get_symbol_loc  x15, Ljsimd_ycc_rgb_neon_consts

-    /* Save NEON registers */
+    /* Save Neon registers */
    st1             {v8.8b, v9.8b, v10.8b, v11.8b}, [x9], 32
    st1             {v12.8b, v13.8b, v14.8b, v15.8b}, [x9], 32
    ld1             {v0.4h, v1.4h}, [x15], 16
@@ -2142,7 +2142,7 @@ generate_jsimd_ycc_rgb_convert_neon extbgr,  24, 2, .4h,  1, .4h,  0, .4h,  .8b,
 .endm

 /* TODO: expand macros and interleave instructions if some in-order
- *       ARM64 processor actually can dual-issue LOAD/STORE with ALU */
+ *       AArch64 processor actually can dual-issue LOAD/STORE with ALU */
 .macro do_rgb_to_yuv_stage2_store_load_stage1 fast_ld3
    do_rgb_to_yuv_stage2
    do_load         \bpp, 8, \fast_ld3
@@ -2182,7 +2182,7 @@ asm_function jsimd_\colorid\()_ycc_convert_neon_slowld3
    ldr             OUTPUT_BUF2, [OUTPUT_BUF, #16]
    .unreq          OUTPUT_BUF

-    /* Save NEON registers */
+    /* Save Neon registers */
    sub             sp, sp, #64
    mov             x9, sp
    st1             {v8.8b, v9.8b, v10.8b, v11.8b}, [x9], 32
@@ -2396,13 +2396,13 @@ asm_function jsimd_fdct_islow_neon
    get_symbol_loc  TMP, Ljsimd_fdct_islow_neon_consts
    ld1             {v0.8h, v1.8h}, [TMP]

-    /* Save NEON registers */
+    /* Save Neon registers */
    sub             sp, sp, #64
    mov             x10, sp
    st1             {v8.8b, v9.8b, v10.8b, v11.8b}, [x10], 32
    st1             {v12.8b, v13.8b, v14.8b, v15.8b}, [x10], 32

-    /* Load all DATA into NEON registers with the following allocation:
+    /* Load all DATA into Neon registers with the following allocation:
     *       0 1 2 3 | 4 5 6 7
     *      ---------+--------
     *   0 | d16     | d17    | v16.8h
@@ -2629,7 +2629,7 @@ asm_function jsimd_fdct_islow_neon
    st1             {v16.8h, v17.8h, v18.8h, v19.8h}, [DATA], 64
    st1             {v20.8h, v21.8h, v22.8h, v23.8h}, [DATA]

-    /* Restore NEON registers */
+    /* Restore Neon registers */
    ld1             {v8.8b, v9.8b, v10.8b, v11.8b}, [sp], 32
    ld1             {v12.8b, v13.8b, v14.8b, v15.8b}, [sp], 32

@@ -2681,7 +2681,7 @@ asm_function jsimd_fdct_ifast_neon
    get_symbol_loc  TMP, Ljsimd_fdct_ifast_neon_consts
    ld1             {v0.4h}, [TMP]

-    /* Load all DATA into NEON registers with the following allocation:
+    /* Load all DATA into Neon registers with the following allocation:
     *       0 1 2 3 | 4 5 6 7
     *      ---------+--------
     *   0 | d16     | d17    | v0.8h
@@ -3066,7 +3066,7 @@ asm_function jsimd_huff_encode_one_block_neon_slowtbl
 .endif
    sub             sp, sp, 272
    sub             BUFFER, BUFFER, #0x1    /* BUFFER=buffer-- */
-    /* Save ARM registers */
+    /* Save Arm registers */
    stp             x19, x20, [sp]
    get_symbol_loc  x15, Ljsimd_huff_encode_one_block_neon_consts
    ldr             PUT_BUFFER, [x0, #0x10]