Clarified "pni" flag
|
Adding local short description: "CPU instruction set", overriding Wikidata description "instruction set"
|
||
(47 intermediate revisions by 35 users not shown) | |||
Line 1: | Line 1: | ||
{{Short description|CPU instruction set}} |
|||
{{Distinguish|SSSE3}} |
{{Distinguish|SSSE3}} |
||
'''SSE3''', '''Streaming SIMD Extensions 3''', also known by its [[Intel]] code name '''Prescott New Instructions''' ('''PNI'''),<ref name=":1">{{Cite web |last=Wilson |first=Anand Lal Shimpi & Derek |title=Intel's Pentium 4 E: Prescott Arrives with Luggage |url=https://www.anandtech.com/show/1230 |access-date=2023-04-10 |website=www.anandtech.com}}</ref> is the third iteration of the [[Streaming SIMD Extensions|SSE]] instruction set for the [[IA-32]] (x86) architecture. Intel introduced SSE3 in early 2004 with the [[Pentium 4#Prescott|Prescott]] revision of their [[Pentium 4]] CPU.<ref name=":1" /> In April 2005, [[AMD]] introduced a subset of SSE3 in revision E (Venice and San Diego) of their [[Athlon 64]] CPUs.<ref>{{Cite web |last=Shimpi |first=Anand Lal |title=Industry Update - Q4-2004: AMD adds SSE3 Support, Intel's 925/915 not selling and more |url=https://www.anandtech.com/show/1532 |access-date=2023-04-10 |website=www.anandtech.com}}</ref> The earlier [[SIMD]] instruction sets on the [[x86]] platform, from oldest to newest, are [[MMX (instruction set)|MMX]], [[3DNow!]] (developed by AMD, no longer supported on newer CPUs), [[Streaming SIMD Extensions|SSE]], and [[SSE2]]. |
|||
SSE3 contains 13 new instructions over [[SSE2]].<ref>{{Cite web |title=Intel Instruction Set Extensions Technology |url=https://www.intel.com/content/www/us/en/support/articles/000005779/processors.html |access-date=2023-04-10 |website=Intel |language=en}}</ref> |
|||
'''SSE3''', '''Streaming SIMD Extensions 3''', also known by its [[Intel]] code name '''Prescott New Instructions (PNI)''', is the third iteration of the [[Streaming SIMD Extensions|SSE]] instruction set for the [[IA-32]] (x86) architecture. Intel introduced SSE3 in early 2004 with the Prescott revision of their [[Pentium 4]] CPU. In April 2005, [[AMD]] introduced a subset of SSE3 in revision E (Venice and San Diego) of their [[Athlon 64]] CPUs. The earlier [[SIMD]] instruction sets on the [[x86]] platform, from oldest to newest, are [[MMX (instruction set)|MMX]], [[3DNow!]] (developed by AMD), [[Streaming SIMD Extensions|SSE]] and [[SSE2]]. |
|||
SSE3 contains 13 new instructions over [[SSE2]]. On [[Linux]] systems, SSE3 support is indicated by the flag <code>pni</code> in <code>/proc/cpuinfo</code>. |
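The <code>pni</code> flag check can be sketched in a few lines of Python. This is a minimal sketch, not a definitive detection routine: the helper names <code>has_cpu_flag</code> and <code>host_has_sse3</code> are hypothetical, and the sketch assumes the usual x86 Linux layout in which each processor entry has a <code>flags</code> line listing space-separated feature names.

```python
import os

def has_cpu_flag(cpuinfo_text, flag="pni"):
    """Return True if `flag` is listed on any 'flags' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Compare whole tokens so that "pni" is not matched inside another name.
        if sep and key.strip() == "flags" and flag in value.split():
            return True
    return False

def host_has_sse3(path="/proc/cpuinfo"):
    """Return True/False where `path` exists, or None if it is absent."""
    if not os.path.exists(path):
        return None
    with open(path, encoding="ascii", errors="replace") as f:
        return has_cpu_flag(f.read())
```

Matching whole tokens matters here because the later SSSE3 extension is reported under the separate flag <code>ssse3</code>, which must not be mistaken for SSE3 support.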
|||
==Changes== |
==Changes== |
||
The most notable change is the capability to work horizontally in a register, as opposed to the more or less strictly vertical operation of all previous SSE instructions. More specifically, instructions to add and subtract the multiple values stored within a single register have been added. These instructions |
The most notable change is the capability to work horizontally in a register, as opposed to the more or less strictly vertical operation of all previous SSE instructions. More specifically, SSE3 adds instructions that add and subtract the multiple values stored within a single register.<ref name=":2">{{Cite web |last=Wright |first=Christopher |title=SSE3 Instruction Set |url=https://softpixel.com/~cwright/programming/simd/sse3.php |access-date=2023-04-10 |website=softpixel.com |language=en}}</ref> These instructions can be used to speed up the implementation of a number of [[Digital signal processing|DSP]] and [[3D computer graphics|3D]] operations. There is also a new instruction to convert floating-point values to integers without having to change the global rounding mode, thus avoiding costly [[Instruction pipeline|pipeline]] stalls. Finally, the extension adds <code>LDDQU</code>, an alternative misaligned integer vector load that has better performance on [[NetBurst]]-based platforms for loads that cross cache-line boundaries.<ref>{{Cite web |title=LDDQU — Load Unaligned Integer 128 Bits |url=https://www.felixcloutier.com/x86/lddqu |access-date=2023-04-10 |website=www.felixcloutier.com}}</ref> |
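The difference between vertical and horizontal operation can be illustrated with a small scalar model in Python. This is only a sketch of the lane arithmetic: registers are modelled as four-element lists, the function names are hypothetical, and single-precision rounding is ignored. The <code>haddps</code> model follows the input/output layout given for <code>HADDPS</code> in this article.

```python
def vertical_add(a, b):
    """Vertical (lane-wise) add, as in earlier SSE: lane i of a plus lane i of b."""
    return [x + y for x, y in zip(a, b)]

def haddps(a, b):
    """Horizontal add: adjacent lanes within each register are summed."""
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]

def dot4(u, v):
    """Four-element dot product: one vertical multiply, then two horizontal adds."""
    prod = [x * y for x, y in zip(u, v)]  # vertical multiply of matching lanes
    partial = haddps(prod, prod)          # [p0+p1, p2+p3, p0+p1, p2+p3]
    total = haddps(partial, partial)      # every lane now holds the full sum
    return total[0]
```

A dot product of this kind is a typical DSP or 3D building block: the lane products are formed vertically, and the horizontal adds then reduce them within the register.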
||
==CPUs with SSE3== |
==CPUs with SSE3== |
||
*[[AMD]]: |
*[[AMD]]: |
||
**[[Opteron]] (since Stepping E4<ref>{{Cite web |last=Wilson |first=Derek |title=AMD K8 E4 Stepping: SSE3 Performance |url=https://www.anandtech.com/show/1618 |access-date=2023-04-10 |website=www.anandtech.com}}</ref>) |
|||
**[[Sempron]] (since Palermo Stepping E3) |
|||
**[[Athlon 64]] (since Venice Stepping E3 and San Diego Stepping E4) |
**[[Athlon 64]] (since Venice Stepping E3 and San Diego Stepping E4) |
||
**[[Athlon 64 X2]] |
|||
**[[Athlon 64|Athlon 64 FX]] (since San Diego Stepping E4) |
**[[Athlon 64|Athlon 64 FX]] (since San Diego Stepping E4) |
||
**[[Athlon 64 X2]] |
|||
**[[Opteron]] (since Stepping E4) |
|||
**[[Phenom 64 X2]] |
|||
**[[Sempron]] (since Palermo Stepping E3) |
|||
**[[AMD Turion|Turion]] family |
|||
**[[Phenom]] |
|||
**[[AMD 10h|K10]] family |
||
**[[AMD Accelerated Processing Unit|APU]] family (including without GPU) |
|||
**[[Athlon II]] |
|||
**[[AMD FX|FX Series]] |
||
** [[Zen (microarchitecture)|Zen]] family |
|||
**[[Turion 64 X2]] |
|||
*[[Intel]]: |
*[[Intel]]: |
||
**[[Celeron D]] |
**[[Celeron D]] |
||
**[[Celeron]] |
**[[Celeron]] (starting with Core microarchitecture) |
||
**[[Celeron Dual Core]] |
|||
**[[Pentium 4]] (since Prescott) |
**[[Pentium 4]] (since Prescott) |
||
**[[Pentium D]] |
**[[Pentium D]] |
||
**[[Pentium Dual-Core]] |
|||
**[[Pentium Extreme Edition]] (but NOT Pentium 4 Extreme Edition) |
**[[Pentium Extreme Edition]] (but NOT Pentium 4 Extreme Edition) |
||
**[[Pentium Dual-Core]] |
||
**[[Pentium]] (starting with Core microarchitecture) |
|||
**[[Intel Core Solo]] |
|||
**[[Intel Core |
**[[Intel Core|Core]] |
||
**[[Xeon]] (since Nocona<ref>{{Cite web |date=2004-08-18 |title=Intel Xeon 3.4GHz ['Nocona' core] |url=https://hexus.net/business/reviews/enterprise/822-intel-xeon-34ghz-nocona-core/ |access-date=2023-04-10 |website=HEXUS}}</ref>) |
|||
**[[Intel Core 2 Duo]] |
|||
**[[Intel Core 2 Extreme]] |
|||
**[[Intel Core 2 Quad]] |
|||
**[[Xeon]] (since Nocona) |
|||
**[[Intel Atom|Atom]] |
**[[Intel Atom|Atom]] |
||
**[[Intel Core i3]] |
|||
**[[Intel Core i5]] |
|||
**[[Intel Core i7]] |
|||
*[[VIA Technologies|VIA]]/[[Centaur Technology|Centaur]]: |
*[[VIA Technologies|VIA]]/[[Centaur Technology|Centaur]]: |
||
**[[VIA C7|C7]] |
**[[VIA C7|C7]] |
||
**[[VIA Nano|Nano]] |
**[[VIA Nano|Nano]] |
||
*[[Transmeta]] |
*[[Transmeta Efficeon]] TM88xx (NOT Model Numbers TM86xx) |
||
**[[Efficeon]] TM88xx (NOT Model Numbers TM86xx) |
|||
==New instructions== |
==New instructions== |
||
===Common instructions=== |
===Common instructions=== |
||
|
====Arithmetic==== |
||
;<code>ADDSUBPD</code> |
|||
* ADDSUBPD — (''Add-Subtract-Packed-Double'') |
|||
:''Add-Subtract-Packed-Double''<ref name=":0">{{Cite web |title=SSE3 Instructions - x86 Assembly Language Reference Manual |url=https://docs.oracle.com/cd/E53394_01/html/E54851/gntby.html |access-date=2023-04-10 |website=docs.oracle.com}}</ref> |
|||
** Input: { A0, A1 }, { B0, B1 } |
|||
* |
:*Input: { A0, A1 }, { B0, B1 } |
||
:*Output: { A0 − B0, A1 + B1 } |
|||
* ADDSUBPS — (''Add-Subtract-Packed-Single'') |
|||
;<code>ADDSUBPS</code> |
|||
** Input: { A0, A1, A2, A3 }, { B0, B1, B2, B3 } |
|||
:''Add-Subtract-Packed-Single''<ref name=":0" /> |
|||
** Output: { A0 − B0, A1 + B1, A2 − B2, A3 + B3 } |
|||
:* Input: { A0, A1, A2, A3 }, { B0, B1, B2, B3 } |
|||
'''AOS ( Array Of Structures )''' |
|||
:* Output: { A0 − B0, A1 + B1, A2 − B2, A3 + B3 } |
|||
* HADDPD — (''Horizontal-Add-Packed-Double'') |
|||
** Input: { A0, A1 }, { B0, B1 } |
|||
** Output: { A0 + A1, B0 + B1 } |
|||
* HADDPS (''Horizontal-Add-Packed-Single'') |
|||
** Input: { A0, A1, A2, A3 }, { B0, B1, B2, B3 } |
|||
** Output: { A0 + A1, A2 + A3, B0 + B1, B2 + B3 } |
|||
* HSUBPD — (''Horizontal-Subtract-Packed-Double'') |
|||
** Input: { A0, A1 }, { B0, B1 } |
|||
** Output: { A0 − A1, B0 − B1 } |
|||
* HSUBPS — (''Horizontal-Subtract-Packed-Single'') |
|||
** Input: { A0, A1, A2, A3 }, { B0, B1, B2, B3 } |
|||
** Output: { A0 − A1, A2 − A3, B0 − B1, B2 − B3 } |
|||
* LDDQU — As stated above, this is an alternative misaligned integer vector load. It can be helpful for video compression tasks. |
|||
* MOVDDUP, MOVSHDUP, MOVSLDUP — These are also used for complex numbers, and can be helpful for wave calculation like sound. |
|||
* FISTTP — Like the older x87 FISTP instruction, but ignores the floating point control register's rounding mode settings and uses the "chop" (truncate) mode instead. Allows omission of the expensive loading and re-loading of the control register in languages such as C where float-to-int conversion requires truncate behaviour by standard. |
|||
====AOS ( Array Of Structures )==== |
|||
===Intel instructions=== |
|||
;<code>HADDPD</code> |
|||
* MONITOR, MWAIT - These optimize multi-threaded applications, giving processors with [[Hyper-Threading]] better performance. |
|||
:''Horizontal-Add-Packed-Double''<ref name=":0" /> |
|||
:* Input: { A0, A1 }, { B0, B1 } |
|||
:* Output: { A0 + A1, B0 + B1 } |
|||
;<code>HADDPS</code> |
|||
:''Horizontal-Add-Packed-Single''<ref name=":0" /> |
|||
:* Input: { A0, A1, A2, A3 }, { B0, B1, B2, B3 } |
|||
:* Output: { A0 + A1, A2 + A3, B0 + B1, B2 + B3 } |
|||
;<code>HSUBPD</code> |
|||
:''Horizontal-Subtract-Packed-Double''<ref name=":0" /> |
|||
:* Input: { A0, A1 }, { B0, B1 } |
|||
:* Output: { A0 − A1, B0 − B1 } |
|||
;<code>HSUBPS</code> |
|||
:''Horizontal-Subtract-Packed-Single''<ref name=":0" /> |
|||
:* Input: { A0, A1, A2, A3 }, { B0, B1, B2, B3 } |
|||
:* Output: { A0 − A1, A2 − A3, B0 − B1, B2 − B3 } |
|||
;<code>LDDQU</code> |
|||
:As stated above, this is an alternative misaligned integer vector load.<ref name=":0" /> It can be helpful for video compression tasks. |
|||
;<code>[[MOVDDUP]]</code>, <code>MOVSHDUP</code>, <code>MOVSLDUP</code><ref name=":2" /> |
|||
:These instructions duplicate selected elements within a register. They are useful for complex-number arithmetic and for waveform calculations such as audio processing. |
|||
;<code>FISTTP</code> |
|||
:Like the older x87 <code>FISTP</code> instruction, but ignores the floating-point control register's rounding mode settings and uses the "chop" (truncate) mode instead.<ref name=":2" /> This allows compilers to omit the expensive loading and reloading of the control register in languages such as C, where the standard requires float-to-int conversion to truncate. |
|||
===Other instructions=== |
|||
;<code>MONITOR</code>, <code>MWAIT</code> |
|||
:The <code>MONITOR</code> instruction is used to specify a memory address for monitoring, while the <code>MWAIT</code> instruction puts the processor into a low-power state and waits for a write event to the monitored address.<ref name=":2" /> |
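The use of <code>MOVSLDUP</code>, <code>MOVSHDUP</code> and <code>ADDSUBPS</code> for complex numbers can be sketched with a scalar model in Python. This is an illustration under stated assumptions rather than a reference implementation: each register is a four-element list holding two complex numbers as { re0, im0, re1, im1 }, all function names are hypothetical, the pair swap stands in for an ordinary SSE shuffle (it is not an SSE3 instruction), and single-precision rounding is ignored.

```python
def movsldup(v):
    """Duplicate the even-indexed (low) element of each pair."""
    return [v[0], v[0], v[2], v[2]]

def movshdup(v):
    """Duplicate the odd-indexed (high) element of each pair."""
    return [v[1], v[1], v[3], v[3]]

def addsubps(a, b):
    """Subtract in even lanes, add in odd lanes."""
    return [a[0] - b[0], a[1] + b[1], a[2] - b[2], a[3] + b[3]]

def swap_pairs(v):
    """Swap the elements of each pair (stand-in for an SSE shuffle)."""
    return [v[1], v[0], v[3], v[2]]

def mulps(a, b):
    """Vertical (lane-wise) multiply."""
    return [x * y for x, y in zip(a, b)]

def complex_mul_packed(a, b):
    """Multiply two packed complex numbers from a with two from b."""
    t_re = mulps(movsldup(a), b)              # [ar*br, ar*bi, ...]
    t_im = mulps(movshdup(a), swap_pairs(b))  # [ai*bi, ai*br, ...]
    return addsubps(t_re, t_im)               # [ar*br - ai*bi, ar*bi + ai*br, ...]
```

The alternating subtract/add pattern of <code>ADDSUBPS</code> matches the signs in the real and imaginary parts of a complex product, which is why the sequence needs no separate sign-flipping step.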
|||
==References== |
|||
{{reflist}} |
|||
==External links== |
==External links== |
||
*[http://www.xbitlabs.com/articles/cpu/display/prescott_10.html X-bit Labs] |
*[https://web.archive.org/web/20060531094837/http://www.xbitlabs.com/articles/cpu/display/prescott_10.html X-bit Labs] |
||
{{Multimedia extensions}} |
{{Multimedia extensions}} |
||
{{DEFAULTSORT:Sse3}} |
{{DEFAULTSORT:Sse3}} |
||
[[Category:Parallel computing]] |
|||
[[Category:X86 instructions]] |
[[Category:X86 instructions]] |
||
[[Category:SIMD computing]] |
[[Category:SIMD computing]] |
||
SSE3, Streaming SIMD Extensions 3, also known by its Intel code name Prescott New Instructions (PNI),[1] is the third iteration of the SSE instruction set for the IA-32 (x86) architecture. Intel introduced SSE3 in early 2004 with the Prescott revision of their Pentium 4 CPU.[1] In April 2005, AMD introduced a subset of SSE3 in revision E (Venice and San Diego) of their Athlon 64 CPUs.[2] The earlier SIMD instruction sets on the x86 platform, from oldest to newest, are MMX, 3DNow! (developed by AMD, no longer supported on newer CPUs), SSE, and SSE2.
SSE3 contains 13 new instructions over SSE2.[3]
The most notable change is the capability to work horizontally in a register, as opposed to the more or less strictly vertical operation of all previous SSE instructions. More specifically, SSE3 adds instructions that add and subtract the multiple values stored within a single register.[4] These instructions can be used to speed up the implementation of a number of DSP and 3D operations. There is also a new instruction to convert floating-point values to integers without having to change the global rounding mode, thus avoiding costly pipeline stalls. Finally, the extension adds LDDQU, an alternative misaligned integer vector load that has better performance on NetBurst-based platforms for loads that cross cache-line boundaries.[5]
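ADDSUBPD
Add-Subtract-Packed-Double
Input: { A0, A1 }, { B0, B1 }
Output: { A0 − B0, A1 + B1 }
ADDSUBPS
Add-Subtract-Packed-Single
Input: { A0, A1, A2, A3 }, { B0, B1, B2, B3 }
Output: { A0 − B0, A1 + B1, A2 − B2, A3 + B3 }
HADDPD
Horizontal-Add-Packed-Double
Input: { A0, A1 }, { B0, B1 }
Output: { A0 + A1, B0 + B1 }
HADDPS
Horizontal-Add-Packed-Single
Input: { A0, A1, A2, A3 }, { B0, B1, B2, B3 }
Output: { A0 + A1, A2 + A3, B0 + B1, B2 + B3 }
HSUBPD
Horizontal-Subtract-Packed-Double
Input: { A0, A1 }, { B0, B1 }
Output: { A0 − A1, B0 − B1 }
HSUBPS
Horizontal-Subtract-Packed-Single
Input: { A0, A1, A2, A3 }, { B0, B1, B2, B3 }
Output: { A0 − A1, A2 − A3, B0 − B1, B2 − B3 }
LDDQU
As stated above, this is an alternative misaligned integer vector load. It can be helpful for video compression tasks.
MOVDDUP, MOVSHDUP, MOVSLDUP[4]
These are useful for complex-number arithmetic and for waveform calculations such as audio processing.
FISTTP
Like the older x87 FISTP instruction, but ignores the floating-point control register's rounding mode settings and uses the "chop" (truncate) mode instead.[4] This allows compilers to omit the expensive loading and reloading of the control register in languages such as C, where the standard requires float-to-int conversion to truncate.
MONITOR, MWAIT
The MONITOR instruction is used to specify a memory address for monitoring, while the MWAIT instruction puts the processor into a low-power state and waits for a write event to the monitored address.[4]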
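The difference between the truncating conversion of FISTTP and a conversion that follows the rounding mode can be sketched in Python. This is a rough model only: the function names are hypothetical, the FISTP model assumes the control register is left at its default round-to-nearest-even setting, and out-of-range values and NaNs are not handled.

```python
import math

def fisttp_model(x):
    """Truncate toward zero, regardless of any rounding mode ("chop")."""
    return math.trunc(x)

def fistp_default_model(x):
    """Round to nearest, ties to even (assumed default rounding mode)."""
    return round(x)

# A C cast such as (int)2.7 must yield 2, which is the truncating result;
# the round-to-nearest result would be 3.
print(fisttp_model(2.7), fistp_default_model(2.7))
print(fisttp_model(-2.7), fistp_default_model(-2.7))
```

Because the two results differ for most non-integer inputs, code relying on FISTP for C-style conversion has to switch the rounding mode to truncation and back, which is the control-register traffic that FISTTP avoids.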