FMA instruction set






 


The FMA instruction set is an extension to the 128- and 256-bit Streaming SIMD Extensions instructions in the x86 microprocessor instruction set to perform fused multiply–add (FMA) operations.[1] There are two variants, FMA3 and FMA4.

Instructions

FMA3 and FMA4 instructions have almost identical functionality, but are not compatible. Both contain fused multiply–add (FMA) instructions for floating-point scalar and SIMD operations, but FMA3 instructions have three operands, while FMA4 instructions have four. The FMA operation has the form d = round(a · b + c), where the round function rounds the result so that it fits within the destination register when the exact value has too many significant bits.
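The effect of rounding only once, after the multiply and the add, can be illustrated with a short Python sketch. It does not execute any FMA instruction: the helper `fused_multiply_add` is a hypothetical name that emulates d = round(a · b + c) by evaluating the expression exactly with `fractions.Fraction` and rounding once on conversion to a double. The input values are assumptions chosen for illustration.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate d = round(a * b + c) with a single rounding step.

    Hypothetical helper: the product and sum are computed exactly as
    rationals, and only the final conversion to float rounds.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded before the add."""
    return a * b + c


# Illustrative inputs: the exact product is 1 - 2**-60, which does not fit
# in a double and rounds to 1.0 when the multiply is a separate step.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

fused = fused_multiply_add(a, b, c)      # keeps the -2**-60 term
unfused = unfused_multiply_add(a, b, c)  # loses it and yields 0.0
```

In this sketch the fused result retains a small non-zero term that the two-step computation rounds away, which is the practical difference between a fused operation and a multiply followed by an add.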

The four-operand form (FMA4) allows a, b, c and d to be four different registers, while the three-operand form (FMA3) requires that d be the same register as a, b or c. The three-operand form makes the code shorter and the hardware implementation slightly simpler, while the four-operand form provides more programming flexibility.

See XOP instruction set for more discussion of compatibility issues between Intel and AMD.

FMA3 instruction set

CPUs with FMA3

Excerpt from FMA3

Supported instructions include:

| Mnemonic | Operation | Mnemonic | Operation |
|---|---|---|---|
| VFMADD | result = + a · b + c | VFMADDSUB | result = a · b + c for i = 1, 3, ...; result = a · b − c for i = 0, 2, ... |
| VFNMADD | result = − a · b + c | | |
| VFMSUB | result = + a · b − c | VFMSUBADD | result = a · b − c for i = 1, 3, ...; result = a · b + c for i = 0, 2, ... |
| VFNMSUB | result = − a · b − c | | |
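The alternating behaviour of VFMADDSUB and VFMSUBADD across element positions i can be sketched in Python as follows. The function names are hypothetical, plain lists stand in for packed registers, and ordinary Python arithmetic is used, so the sketch models only the per-lane add/subtract pattern from the table and not the fused rounding.

```python
from typing import List, Sequence


def vfmaddsub(a: Sequence[float], b: Sequence[float],
              c: Sequence[float]) -> List[float]:
    """Odd lanes (i = 1, 3, ...) compute a*b + c; even lanes compute a*b - c."""
    return [x * y + z if i % 2 == 1 else x * y - z
            for i, (x, y, z) in enumerate(zip(a, b, c))]


def vfmsubadd(a: Sequence[float], b: Sequence[float],
              c: Sequence[float]) -> List[float]:
    """Odd lanes (i = 1, 3, ...) compute a*b - c; even lanes compute a*b + c."""
    return [x * y - z if i % 2 == 1 else x * y + z
            for i, (x, y, z) in enumerate(zip(a, b, c))]


# Illustrative four-lane inputs (assumed values).
a = [1.0, 2.0, 3.0, 4.0]
b = [1.0, 1.0, 1.0, 1.0]
c = [10.0, 10.0, 10.0, 10.0]

addsub = vfmaddsub(a, b, c)
subadd = vfmsubadd(a, b, c)
```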
Note: the explicit order of operands is encoded in the mnemonic using the numbers "132", "213", and "231":

| Postfix 1 | Operation | Possible memory operand | Overwrites |
|---|---|---|---|
| 132 | a = a · c + b | c (factor) | a (other factor) |
| 213 | a = b · a + c | c (summand) | a (factor) |
| 231 | a = b · c + a | c (factor) | a (summand) |

as well as operand format (packed or scalar) and size (single or double).

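| Postfix 2 | Precision | Size | Postfix 2 | Precision | Size |
|---|---|---|---|---|---|
| SS | Single | 32 bit | SD | Double | 64 bit |
| PSx | Single | 4× 32 bit | PDx | Double | 2× 64 bit |
| PSy | Single | 8× 32 bit | PDy | Double | 4× 64 bit |
| PSz | Single | 16× 32 bit | PDz | Double | 8× 64 bit |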
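The meaning of the "132", "213", and "231" postfixes can be sketched in Python as follows. In the operand-order table, a, b and c are the first, second and third operands, and the three digits name, in order, the two factors and the summand. The function `fma3` is a hypothetical helper that returns the value written back to the first operand; integers are used so that the arithmetic is exact.

```python
def fma3(order: str, a, b, c):
    """Return the new value of operand 1 for the given operand-order postfix.

    The digits of `order` select, in turn, the first factor, the second
    factor, and the summand from operands 1 (a), 2 (b) and 3 (c).
    """
    if order not in ("132", "213", "231"):
        raise ValueError(f"unsupported operand order: {order!r}")
    operands = {"1": a, "2": b, "3": c}
    factor1, factor2, summand = (operands[digit] for digit in order)
    return factor1 * factor2 + summand


# Illustrative operand values (assumed): a = 2, b = 3, c = 5.
r132 = fma3("132", 2, 3, 5)  # a = a * c + b
r213 = fma3("213", 2, 3, 5)  # a = b * a + c
r231 = fma3("231", 2, 3, 5)  # a = b * c + a
```

Having all three orders available lets a compiler choose which input is overwritten and which one may come from memory, since only the third operand can be a memory operand.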

Combining these parts results in, for example, the following VFMADD encodings:

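| Encoding | Mnemonic | Operands | Operation |
|---|---|---|---|
| VEX.256.66.0F38.W1 98 /r | VFMADD132PDy | ymm, ymm, ymm/m256 | a = a · c + b |
| VEX.256.66.0F38.W0 98 /r | VFMADD132PSy | ymm, ymm, ymm/m256 | a = a · c + b |
| VEX.128.66.0F38.W1 98 /r | VFMADD132PDx | xmm, xmm, xmm/m128 | a = a · c + b |
| VEX.128.66.0F38.W0 98 /r | VFMADD132PSx | xmm, xmm, xmm/m128 | a = a · c + b |
| VEX.LIG.66.0F38.W1 99 /r | VFMADD132SD | xmm, xmm, xmm/m64 | a = a · c + b |
| VEX.LIG.66.0F38.W0 99 /r | VFMADD132SS | xmm, xmm, xmm/m32 | a = a · c + b |
| VEX.256.66.0F38.W1 A8 /r | VFMADD213PDy | ymm, ymm, ymm/m256 | a = b · a + c |
| VEX.256.66.0F38.W0 A8 /r | VFMADD213PSy | ymm, ymm, ymm/m256 | a = b · a + c |
| VEX.128.66.0F38.W1 A8 /r | VFMADD213PDx | xmm, xmm, xmm/m128 | a = b · a + c |
| VEX.128.66.0F38.W0 A8 /r | VFMADD213PSx | xmm, xmm, xmm/m128 | a = b · a + c |
| VEX.LIG.66.0F38.W1 A9 /r | VFMADD213SD | xmm, xmm, xmm/m64 | a = b · a + c |
| VEX.LIG.66.0F38.W0 A9 /r | VFMADD213SS | xmm, xmm, xmm/m32 | a = b · a + c |
| VEX.256.66.0F38.W1 B8 /r | VFMADD231PDy | ymm, ymm, ymm/m256 | a = b · c + a |
| VEX.256.66.0F38.W0 B8 /r | VFMADD231PSy | ymm, ymm, ymm/m256 | a = b · c + a |
| VEX.128.66.0F38.W1 B8 /r | VFMADD231PDx | xmm, xmm, xmm/m128 | a = b · c + a |
| VEX.128.66.0F38.W0 B8 /r | VFMADD231PSx | xmm, xmm, xmm/m128 | a = b · c + a |
| VEX.LIG.66.0F38.W1 B9 /r | VFMADD231SD | xmm, xmm, xmm/m64 | a = b · c + a |
| VEX.LIG.66.0F38.W0 B9 /r | VFMADD231SS | xmm, xmm, xmm/m32 | a = b · c + a |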
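How the mnemonics in the table are composed from a base operation, an operand-order postfix, and a format postfix can be sketched in Python as follows. The names `mnemonics` and `lanes` are hypothetical, and the trailing x/y/z letters follow the notation of the tables above (128-, 256- and 512-bit registers) rather than any particular assembler's syntax.

```python
from itertools import product
from typing import List

ORDERS = ("132", "213", "231")
SUFFIXES = ("PDy", "PSy", "PDx", "PSx", "SD", "SS")
REGISTER_BITS = {"x": 128, "y": 256, "z": 512}  # width letter -> register size
ELEMENT_BITS = {"S": 32, "D": 64}               # single / double precision


def mnemonics(base: str = "VFMADD") -> List[str]:
    """List base + order + format postfix in the same order as the table."""
    return [f"{base}{order}{suffix}"
            for order, suffix in product(ORDERS, SUFFIXES)]


def lanes(suffix: str) -> int:
    """Number of elements processed for a format postfix such as 'PDy'."""
    if suffix in ("SS", "SD"):
        return 1  # scalar forms operate on a single element
    precision, width = suffix[1], suffix[2]
    return REGISTER_BITS[width] // ELEMENT_BITS[precision]


names = mnemonics()
```

For example, `lanes("PDy")` divides a 256-bit register into 64-bit elements, matching the "4× 64 bit" entry in the format table.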

FMA4 instruction set

CPUs with FMA4

Excerpt from FMA4

| Mnemonic (AT&T) | Operands | Operation |
|---|---|---|
| VFMADDPDx | xmm, xmm, xmm/m128, xmm/m128 | a = b·c + d |
| VFMADDPDy | ymm, ymm, ymm/m256, ymm/m256 | a = b·c + d |
| VFMADDPSx | xmm, xmm, xmm/m128, xmm/m128 | a = b·c + d |
| VFMADDPSy | ymm, ymm, ymm/m256, ymm/m256 | a = b·c + d |
| VFMADDSD | xmm, xmm, xmm/m64, xmm/m64 | a = b·c + d |
| VFMADDSS | xmm, xmm, xmm/m32, xmm/m32 | a = b·c + d |
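The practical difference between the four-operand form a = b·c + d and a three-operand form can be sketched with a toy register file in Python. The dictionary-based registers and the helper names `fma4` and `fma3_231` are assumptions made for illustration; scalar integers stand in for register contents.

```python
from typing import Dict


def fma4(regs: Dict[str, int], dst: str, src1: str, src2: str, src3: str) -> None:
    """Four-operand form: dst = src1 * src2 + src3, no source is overwritten
    unless the caller names it as the destination."""
    regs[dst] = regs[src1] * regs[src2] + regs[src3]


def fma3_231(regs: Dict[str, int], dst: str, src2: str, src3: str) -> None:
    """Three-operand '231' form: dst = src2 * src3 + dst, so the summand
    held in dst is replaced by the result."""
    regs[dst] = regs[src2] * regs[src3] + regs[dst]


# FMA4 style: all three inputs survive the operation.
regs4 = {"xmm0": 0, "xmm1": 2, "xmm2": 3, "xmm3": 5}
fma4(regs4, "xmm0", "xmm1", "xmm2", "xmm3")

# FMA3 style: the summand in xmm3 is overwritten by the result.
regs3 = {"xmm1": 2, "xmm2": 3, "xmm3": 5}
fma3_231(regs3, "xmm3", "xmm1", "xmm2")
```

With the three-operand form, code that still needs the overwritten input has to copy it to another register first, which is the flexibility trade-off described in the Instructions section.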

History

The incompatibility between Intel's FMA3 and AMD's FMA4 is due to both companies changing plans without coordinating coding details with each other. AMD changed its plans from FMA3 to FMA4 while Intel changed its plans from FMA4 to FMA3 at almost the same time. The history can be summarized as follows:

Compiler and assembler support

Different compilers provide different levels of support for FMA:

References

  1. "FMA3 and FMA4 are not instruction sets, they are individual instructions -- fused multiply add. They could be quite useful depending on how Intel and AMD implement them" Woltmann, George (Prime95). "Intel AVX and GIMPS". mersenneforum.org/index.php. Great Internet Mersenne Prime Search (GIMPS) project. Retrieved 27 July 2011.
  2. "The microarchitecture of Intel, AMD and VIA CPUs An optimization guide for assembly programmers and compiler makers" (PDF). Retrieved 2017-05-02.
  3. Maffeo, Robin (March 1, 2012). "AMD and the Visual Studio 11 Beta". AMD. Archived from the original on November 9, 2013. Retrieved 2018-11-07.
  4. "CPU-Z - ID : y5z6gq". Retrieved 2022-05-01.
  5. "CPU-Z - ID : kr2mlx". Retrieved 2022-05-01.
  6. "AMD64 Architecture Programmer's Manual Volume 6: 128-Bit and 256-Bit XOP, FMA4 and CVT16 Instructions" (PDF). AMD. May 1, 2009.
  7. "New "Bulldozer" and "Piledriver" Instructions A step forward for high performance software development" (PDF). AMD. October 2012.
  8. "Agner's CPU blog - Test results for AMD Ryzen". 2017-05-02.
  9. "Discussion – Ryzen has undocumented support for FMA4". Retrieved 2017-05-10.
  10. "www.amd.com, FMA4 support model list".
  11. "www.amd.com, FMA4 support model list".
  12. "www.amd.com, FMA4 support model list".
  13. "128-Bit SSE5 Instruction Set". AMD Developer Central. Archived from the original on 2008-01-15. Retrieved 2008-01-28.
  14. "Intel Advanced Vector Extensions Programming Reference" (PDF). Intel. Retrieved 2008-04-05. [permanent dead link]
  15. "Intel Advanced Vector Extensions Programming Reference". Intel. Retrieved 2009-05-06.
  16. "Striking a balance". Dave Christie, AMD Developer blogs. May 6, 2009. Archived from the original on July 8, 2012. Retrieved 2018-11-07.
  17. "New Bulldozer and Piledriver Instructions" (PDF). AMD. Retrieved 25 July 2013.
  18. "Software Optimization Guide for AMD Family 15h Processors" (PDF). AMD. Retrieved 19 April 2012.
  19. "Intel Architecture Instruction Set Extensions Programming Reference" (PDF). Intel. Retrieved 25 July 2013.
  20. Gopalasubramanian, Ganesh (2015-03-10). "[PATCH] add znver1 processor". Retrieved 2022-05-01.
  21. Pawar, Amit (2015-08-07). "[PATCH] Remove CpuFMA4 from Znver1 CPU Flags". Retrieved 2022-05-01.
  22. "Stack Overflow comment by Mysticial". 2019-07-16. Archived from the original on 2019-08-22. Retrieved 2023-09-01.
  23. "AMD Ryzen Machine Crashes to a Sequence of FMA3 Instructions". 16 March 2017. Retrieved 2017-09-10.
  24. "Stack Overflow comment by Mysticial". 2019-07-16. Retrieved 2023-09-01.
  25. Latif, Lawrence (Nov 14, 2011). "AMD Bulldozer only FMA4 and XOP instructions are supported by GCC Intel still mute". The Inquirer. Archived from the original on November 17, 2011.
  26. "FMA4 Intrinsics Added for Visual Studio 2010 SP1". 4 February 2013.
  27. "EKOPath man doc". Archived from the original on 2016-06-23. Retrieved 2013-07-24.
  28. "LLVM 3.1 Release Notes".
  29. "Enable detection of AVX and AVX2 support through CPUID". LLVM. 2012-04-26.

    This page was last edited on 28 March 2024, at 06:43 (UTC).
