
SSE3


Latest revision as of 22:08, 7 June 2024

SSE3 (Streaming SIMD Extensions 3), also known by its Intel code name Prescott New Instructions (PNI),^[1] is the third iteration of the SSE instruction set for the IA-32 (x86) architecture. Intel introduced SSE3 in early 2004 with the Prescott revision of its Pentium 4 CPU.^[1] In April 2005, AMD introduced a subset of SSE3 in revision E (Venice and San Diego) of its Athlon 64 CPUs.^[2] The earlier SIMD instruction sets on the x86 platform, from oldest to newest, are MMX, 3DNow! (developed by AMD, no longer supported on newer CPUs), SSE, and SSE2.

SSE3 contains 13 new instructions over SSE2.^[3]

Changes

The most notable change is the capability to work horizontally in a register, as opposed to the more or less strictly vertical operation of all previous SSE instructions. More specifically, SSE3 adds instructions that add and subtract the multiple values stored within a single register.^[4] These instructions can be used to speed up the implementation of a number of DSP and 3D operations. There is also a new x87 instruction, FISTTP, that converts floating-point values to integers without having to change the global rounding mode, thus avoiding costly pipeline stalls. Finally, the extension adds LDDQU, an alternative misaligned integer vector load that performs better on NetBurst-based platforms for loads that cross cache-line boundaries.^[5]
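The float-to-integer point can be illustrated with a small model. The sketch below is plain Python, not x87 code, and the function names `convert_with_rounding_mode` and `convert_truncating` are hypothetical. It assumes the default x87 rounding mode (round to nearest, ties to even) for the first function and contrasts it with truncation toward zero, which is what a C cast from floating point to integer requires.

```python
import math


def convert_with_rounding_mode(x: float) -> int:
    """Model a conversion that honours the default rounding mode.

    Assumption: the mode is round-to-nearest, ties-to-even, which is
    also what Python's built-in round() implements.
    """
    return round(x)


def convert_truncating(x: float) -> int:
    """Model a conversion that always chops toward zero."""
    return math.trunc(x)


if __name__ == "__main__":
    # Print both conversions side by side for a few sample values.
    for value in (2.7, -1.7, 2.5):
        print(value, convert_with_rounding_mode(value), convert_truncating(value))
```

The two functions disagree whenever the fractional part is at least one half, which is why code that needs truncation must otherwise switch the rounding mode before converting and restore it afterwards.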

CPUs with SSE3

AMD:
- Opteron (since Stepping E4^[6])
- Sempron (since Palermo, Stepping E3)
- Athlon 64 (since Venice Stepping E3 and San Diego Stepping E4)
- Athlon 64 FX (since San Diego Stepping E4)
- Athlon 64 X2
- Phenom 64 X2
- Turion family
- K10 family
- APU family (including without GPU)
- FX Series
- Zen family
Intel:
- Celeron D
- Celeron (starting with Core microarchitecture)
- Pentium 4 (since Prescott)
- Pentium D
- Pentium Extreme Edition (but NOT Pentium 4 Extreme Edition)
- Pentium Dual-Core
- Pentium (starting with Core microarchitecture)
- Core
- Xeon (since Nocona^[7])
- Atom
VIA/Centaur:
- C7
- Nano
Transmeta:
- Efficeon TM88xx (NOT Model Numbers TM86xx)
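Rather than consulting a model list, software can check for SSE3 at run time. On Linux, `/proc/cpuinfo` reports SSE3 under the flag name `pni`, matching the Prescott New Instructions code name. The following is a minimal sketch assuming a Linux x86 system; the helper name `has_sse3` is hypothetical, and other operating systems expose this information differently.

```python
def has_sse3(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' line in cpuinfo-style text lists 'pni'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only inspect the 'flags' field, and match 'pni' as a whole word.
        if sep and key.strip() == "flags" and "pni" in value.split():
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("SSE3 reported:", has_sse3(f.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```

Matching on whole whitespace-separated tokens avoids false positives from fields such as the model name.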

New instructions

Common instructions

Arithmetic

ADDSUBPD

Add-Subtract-Packed-Double^[8]

Input: { A0, A1 }, { B0, B1 }
Output: { A0 − B0, A1 + B1 }

ADDSUBPS

Add-Subtract-Packed-Single^[8]

Input: { A0, A1, A2, A3 }, { B0, B1, B2, B3 }
Output: { A0 − B0, A1 + B1, A2 − B2, A3 + B3 }
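The alternating pattern of ADDSUBPD and ADDSUBPS can be written as a short lane-by-lane model. This is a sketch in plain Python rather than SIMD code; the function name `addsub` is hypothetical, lists stand in for registers with element 0 as the lowest lane, and Python floats do not reproduce single-precision rounding.

```python
def addsub(a, b):
    """Model ADDSUBPD/ADDSUBPS: subtract in even lanes, add in odd lanes."""
    if len(a) != len(b):
        raise ValueError("operands must have the same number of lanes")
    return [x - y if i % 2 == 0 else x + y
            for i, (x, y) in enumerate(zip(a, b))]


if __name__ == "__main__":
    print(addsub([1.0, 2.0], [10.0, 20.0]))                      # two double lanes
    print(addsub([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]))  # four single lanes
```

The same function covers both instructions because only the lane count differs: two lanes for the packed-double form and four for the packed-single form.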

AOS (Array of Structures)

HADDPD

Horizontal-Add-Packed-Double^[8]

Input: { A0, A1 }, { B0, B1 }
Output: { A0 + A1, B0 + B1 }

HADDPS

Horizontal-Add-Packed-Single^[8]

Input: { A0, A1, A2, A3 }, { B0, B1, B2, B3 }
Output: { A0 + A1, A2 + A3, B0 + B1, B2 + B3 }
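A common use of the horizontal add is reducing a register to the sum of its elements. The sketch below models HADDPD/HADDPS in plain Python (the names `hadd` and `horizontal_sum4` are hypothetical) and shows that applying the four-lane form twice leaves the total in lane 0.

```python
def _pairwise_sums(v):
    """Add each adjacent pair of lanes: [v0+v1, v2+v3, ...]."""
    return [v[i] + v[i + 1] for i in range(0, len(v), 2)]


def hadd(a, b):
    """Model HADDPD/HADDPS: pair sums of a in the low half, of b in the high half."""
    if len(a) != len(b) or len(a) % 2:
        raise ValueError("operands need the same, even number of lanes")
    return _pairwise_sums(a) + _pairwise_sums(b)


def horizontal_sum4(v):
    """Sum four lanes with two horizontal adds."""
    step = hadd(v, v)           # [v0+v1, v2+v3, v0+v1, v2+v3]
    return hadd(step, step)[0]  # lane 0 holds (v0+v1) + (v2+v3)


if __name__ == "__main__":
    print(hadd([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]))
    print(horizontal_sum4([1.0, 2.0, 3.0, 4.0]))
```

Passing the same register as both operands is what makes the second step work: each half of the intermediate result already holds both partial sums.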

HSUBPD

Horizontal-Subtract-Packed-Double^[8]

Input: { A0, A1 }, { B0, B1 }
Output: { A0 − A1, B0 − B1 }

HSUBPS

Horizontal-Subtract-Packed-Single^[8]

Input: { A0, A1, A2, A3 }, { B0, B1, B2, B3 }
Output: { A0 − A1, A2 − A3, B0 − B1, B2 − B3 }
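Because subtraction is not commutative, the operand order inside each pair matters for HSUBPD and HSUBPS: per the outputs listed above, the higher-indexed element is subtracted from the lower-indexed one. A minimal plain-Python model, with the hypothetical name `hsub`, makes this explicit.

```python
def _pairwise_differences(v):
    """Subtract within each adjacent pair of lanes: [v0-v1, v2-v3, ...]."""
    return [v[i] - v[i + 1] for i in range(0, len(v), 2)]


def hsub(a, b):
    """Model HSUBPD/HSUBPS: pair differences of a in the low half, of b in the high half."""
    if len(a) != len(b) or len(a) % 2:
        raise ValueError("operands need the same, even number of lanes")
    return _pairwise_differences(a) + _pairwise_differences(b)


if __name__ == "__main__":
    print(hsub([5.0, 3.0], [10.0, 4.0]))
    print(hsub([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]))
```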

LDDQU: As stated above, this is an alternative misaligned integer vector load.^[8] It can be helpful for video compression tasks.
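MOVDDUP, MOVSHDUP, MOVSLDUP^[4]: These duplicate selected elements within a register. MOVDDUP copies the low double-precision value into both lanes, MOVSHDUP copies each odd-indexed single-precision value into the even lane below it, and MOVSLDUP copies each even-indexed single-precision value into the odd lane above it. They are useful for complex-number arithmetic and for waveform calculations such as audio processing.
FISTTP: Like the older x87 FISTP instruction, but ignores the rounding mode set in the floating-point control register and always uses "chop" (truncate) mode.^[4] This avoids the expensive loading and reloading of the control register in languages such as C, where the standard requires float-to-int conversion to truncate.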
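The complex-number use of MOVSLDUP and MOVSHDUP mentioned above usually pairs them with ADDSUBPS. The following plain-Python sketch models that well-known pattern for two interleaved complex numbers per register, laid out as [re0, im0, re1, im1]. All function names are hypothetical; the element swap and the lane-wise multiplies stand in for ordinary SSE shuffle and multiply instructions, which are not part of SSE3.

```python
def movsldup(v):
    """Model MOVSLDUP: duplicate the even-indexed lanes."""
    return [v[0], v[0], v[2], v[2]]


def movshdup(v):
    """Model MOVSHDUP: duplicate the odd-indexed lanes."""
    return [v[1], v[1], v[3], v[3]]


def movddup(v):
    """Model MOVDDUP: duplicate the low double into both lanes."""
    return [v[0], v[0]]


def addsub(a, b):
    """Model ADDSUBPS: subtract in even lanes, add in odd lanes."""
    return [x - y if i % 2 == 0 else x + y
            for i, (x, y) in enumerate(zip(a, b))]


def complex_mul2(a, b):
    """Multiply two pairs of complex numbers stored as [re0, im0, re1, im1]."""
    b_re = movsldup(b)                     # [b0.re, b0.re, b1.re, b1.re]
    b_im = movshdup(b)                     # [b0.im, b0.im, b1.im, b1.im]
    a_swapped = [a[1], a[0], a[3], a[2]]   # stand-in for an SSE shuffle
    t1 = [x * y for x, y in zip(a, b_re)]          # [a.re*b.re, a.im*b.re, ...]
    t2 = [x * y for x, y in zip(a_swapped, b_im)]  # [a.im*b.im, a.re*b.im, ...]
    return addsub(t1, t2)  # [re*re - im*im, im*re + re*im, ...]


if __name__ == "__main__":
    # (1+2j)*(3+4j) and (0+1j)*(0+1j), packed into one four-lane register each.
    print(complex_mul2([1.0, 2.0, 0.0, 1.0], [3.0, 4.0, 0.0, 1.0]))
```

The subtract-then-add lane pattern of ADDSUBPS matches the signs in the real and imaginary parts of a complex product, so no separate sign-flip step is needed.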

Other instructions

MONITOR, MWAIT: The MONITOR instruction is used to specify a memory address for monitoring, while the MWAIT instruction puts the processor into a low-power state and waits for a write event to the monitored address.^[4]

References

1. Shimpi, Anand Lal; Wilson, Derek. "Intel's Pentium 4 E: Prescott Arrives with Luggage". www.anandtech.com. Retrieved 2023-04-10.

2. Shimpi, Anand Lal. "Industry Update - Q4-2004: AMD adds SSE3 Support, Intel's 925/915 not selling and more". www.anandtech.com. Retrieved 2023-04-10.

3. "Intel Instruction Set Extensions Technology". Intel. Retrieved 2023-04-10.

4. Wright, Christopher. "SSE3 Instruction Set". softpixel.com. Retrieved 2023-04-10.

5. "LDDQU — Load Unaligned Integer 128 Bits". www.felixcloutier.com. Retrieved 2023-04-10.

6. Wilson, Derek. "AMD K8 E4 Stepping: SSE3 Performance". www.anandtech.com. Retrieved 2023-04-10.

7. "Intel Xeon 3.4GHz ['Nocona' core]". HEXUS. 2004-08-18. Retrieved 2023-04-10.

8. "SSE3 Instructions - x86 Assembly Language Reference Manual". docs.oracle.com. Retrieved 2023-04-10.

External links

X-bit Labs

