Jump to content
 







Main menu
   


Navigation  



Main page
Contents
Current events
Random article
About Wikipedia
Contact us
Donate
 




Contribute  



Help
Learn to edit
Community portal
Recent changes
Upload file
 








Search  

































Create account

Log in
 









Create account
 Log in
 




Pages for logged out editors learn more  



Contributions
Talk
 



















Contents

   



(Top)
 


1 Triggering  





2 Pre-processing instructions  





3 Exiting  





4 Register file checkpoint options  





5 Optimizations  





6 Side effects  





7 See also  





8 References  














Runahead






Türkçe
 

Edit links
 









Article
Talk
 

















Read
Edit
View history
 








Tools
   


Actions  



Read
Edit
View history
 




General  



What links here
Related changes
Upload file
Special pages
Permanent link
Page information
Cite this page
Get shortened URL
Download QR code
Wikidata item
 




Print/export  



Download as PDF
Printable version
 
















Appearance
   

 






From Wikipedia, the free encyclopedia
 


Runahead is a technique that allows a computer processortospeculatively pre-process instructions during cache miss cycles. The pre-processed instructions are used to generate instruction and data stream prefetches by executing instructions leading to cache misses (typically called long latency loads) before they would normally occur, effectively hiding memory latency. In runahead, the processor uses the idle execution resources to calculate instruction and data stream addresses using the available information that is independent of a cache miss. Once the processor has resolved the initial cache miss, all runahead results are discarded, and the processor resumes execution as normal. The primary use case of the technique is to mitigate the effects of the memory wall. The technique may also be used for other purposes, such as pre-computing branch outcomes to achieve highly accurate branch prediction.[1]

The principal hardware cost is a means of checkpointing the register file state. Typically, runahead processors will also contain a small additional cache, which allows runahead store operations to execute without modifying actual memory. Certain implementations also use dedicated hardware acceleration units to execute specific slices of pre-processed instructions.[2][3]

Runahead was initially investigated in the context of an in-order microprocessor;[4] however, this technique has been extended for use with out-of-order microprocessors.[5]

Triggering[edit]

In principle, any event can trigger runahead, though typically the entry condition is a last level data cache miss that makes it to the head of the re-order buffer.[5] In a normal out-of-order processor, such long latency load instructions block retirement of all younger instructions until the miss is serviced and the load is retired.

When a processor enters runahead mode, it checkpoints all architectural registers and records the address of the load instruction that caused entry into runahead. All instructions in the pipeline are then marked as runahead. Because the value returned from a cache miss cannot be known ahead of time, it is possible for pre-processed instructions to be dependent upon unknown or invalid data. Registers containing such data, or data dependent on it, are denoted by adding an "invalid" or INV bit to every register in the register file. Instructions that use or write such invalid data are also marked with an INV bit. If the instruction that initiated runahead was a load, it is issued a bogus result and marked as INV, allowing it to mark its destination register as INV and drain out of the pipeline.

Pre-processing instructions[edit]

In runahead mode, the processor continues to execute instructions after the instruction that initiated runahead. However, runahead is considered a speculative state in which the processor only attempts to generate additional data and instruction cache misses which are effectively prefetches. The designer can opt to allow runahead to skip instructions that are not present in the instruction cache with the understanding that the quality of any prefetches generated will be reduced since the effect of the missing instructions is unknown.

Registers that are the target of an instruction that has one or more source registers marked INV are marked INV. This allows the processor to know which register values can (probably) be trusted during runahead mode. Branch instructions that cannot be resolved due to INV source registers are simply assumed to have been predicted correctly. In case the branch was mispredicted, the processor continues executing wrong-path instructions until it reaches a branch independent point, potentially executing wrong-path loads that pollute cache with useless data entries. Valid branch instruction outcomes can be saved for later use as highly accurate predictions during normal operation.

Since runahead is a speculative state, store instructions cannot be allowed to modify memory. In order to communicate store results to dependent loads, a very small cache only accessed by runahead loads and misses, called a runahead cache, can be used.[5] This cache is functionally similar to a normal cache, but contains INV bits to track which data is invalid. INV stores set the INV bit of their corresponding target cache line, while valid stores reset the INV bit of the cache line. Any runahead load instruction must check both real and runahead cache. If the load hits in runahead cache, it will discard the real cache result and use the runahead cache data, potentially becoming invalid if the cache line was marked with a INV bit. Because the runahead cache is separate from the memory hierarchy, there is no place to evict old data to. Therefore, in case of a cache conflict, the old data is simply dropped from the cache. Note that because of the limited size of the runahead cache, it is not possible to perfectly track INV data during runahead mode (as INV data may be overwritten by valid data in a cache conflict). In practice, this is not crucial since all results computed during runahead mode are discarded.

Exiting[edit]

As with entering runahead, any event can in principle be cause for exiting runahead. Though in the case of a runahead period initiated by a cache miss, it is typically exited once the cache miss has been serviced.

When the processor exits runahead, all instructions younger than and including the instruction that initiated runahead are squashed and drained out of the pipeline. The architectural register file is then restored from the checkpoint. A predetermined register aliasing table (RAT) is then copied into both the front- and backend RAT. Finally, the processor is redirected to the address of the instruction that initiated runahead. The processor then resumes execution in normal mode.

Register file checkpoint options[edit]

The simplest method of checkpointing the architectural register file (ARF) is to simply perform a full copy of the entire physical register file (PRF) (because the PRF is a superset of the ARF) to a checkpoint register file (CRF) when the processor enters runahead mode. When runahead is exited, the processor can then perform a full copy from the CRF to the PRF. However, there are more efficient options available.

One way to eliminate the copy operations is to write to both the PRF and CRF during normal operation, but only to the PRF in runahead mode. This approach can eliminate the checkpointing overhead that would otherwise be incurred on initiating runahead if the CRF and PRF are written to in parallel, but still requires the processor to restore the PRF when runahead is exited.

Because the only registers that need to be checkpointed are the architectural registers, the CRF only needs to contain as many registers as there are architectural registers, as defined by the instruction set architecture. Since processors typically contain far more physical registers than architectural registers, this significantly shrinks the size of the CRF.

An even more aggressive approach is to rely only upon the operand forwarding paths of the microarchitecture to provide modified values during runahead mode.[citation needed] The register file is then "checkpointed" by disabling writes to the register file during runahead.

Optimizations[edit]

While runahead is intended to increase processor performance, pre-processing instructions when the processor would otherwise have been idle decreases the processor's energy efficiency due to an increase in dynamic power draw. Additionally, entering and exiting runahead incurs a performance overhead, as register checkpointing and particularly flushing the pipeline may take many cycles to complete. Therefore, it is not wise to initiate runahead at every opportunity.

Some optimizations that improve the energy efficiency of runahead are:

Side effects[edit]

Runahead has been found to improve soft error rates in processors as a side effect. While a processor is waiting for a cache miss, the entire state of the processor is vulnerable to soft errors while the cache miss is outstanding. By continuing execution, runahead unintentionally reduces the amount of time the processor state is vulnerable to soft errors, thereby reducing soft error rates.[10]

See also[edit]

References[edit]

  1. ^ Pruett, Stephen; Patt, Yale (October 2021). "Branch Runahead: An Alternative to Branch Prediction for Impossible to Predict Branches". MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO '21. New York, NY, USA: Association for Computing Machinery. pp. 804–815. doi:10.1145/3466752.3480053. ISBN 978-1-4503-8557-2. S2CID 239011545.
  • ^ Hashemi, Milad; Mutlu, Onur; Patt, Yale N. (October 2016). "Continuous runahead: Transparent hardware acceleration for memory intensive workloads". 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). pp. 1–12. doi:10.1109/MICRO.2016.7783764. ISBN 978-1-5090-3508-3. S2CID 439575.
  • ^ Pruett, Stephen; Patt, Yale (October 2021). "Branch Runahead: An Alternative to Branch Prediction for Impossible to Predict Branches". MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO '21. New York, NY, USA: Association for Computing Machinery. pp. 804–815. doi:10.1145/3466752.3480053. ISBN 978-1-4503-8557-2. S2CID 239011545.
  • ^ Dundas, James D. and Mudge, Trevor N. (September 1996). "Using stall cycles to improve microprocessor performance". Technical report. Department of Electrical Engineering and Computer Science, University of Michigan.
  • ^ a b c Mutlu, O.; Stark, J.; Wilkerson, C.; Patt, Y.N. (February 2003). "Runahead execution: An alternative to very large instruction windows for out-of-order processors". The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings. pp. 129–140. doi:10.1109/HPCA.2003.1183532. ISBN 0-7695-1871-0. S2CID 9016814.
  • ^ Van Craeynest, Kenzo; Eyerman, Stijn; Eeckhout, Lieven (2009), Seznec, André; Emer, Joel; O’Boyle, Michael; Martonosi, Margaret (eds.), "MLP-Aware Runahead Threads in a Simultaneous Multithreading Processor", High Performance Embedded Architectures and Compilers, vol. 5409, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 110–124, doi:10.1007/978-3-540-92990-1_10, ISBN 978-3-540-92989-5, retrieved 2023-06-02
  • ^ Van Craeynest, Kenzo; Eyerman, Stijn; Eeckhout, Lieven (2009). "MLP-Aware Runahead Threads in a Simultaneous Multithreading Processor". In Seznec, André; Emer, Joel; O’Boyle, Michael; Martonosi, Margaret; Ungerer, Theo (eds.). High Performance Embedded Architectures and Compilers. Lecture Notes in Computer Science. Vol. 5409. Berlin, Heidelberg: Springer. pp. 110–124. doi:10.1007/978-3-540-92990-1_10. ISBN 978-3-540-92990-1.
  • ^ Hashemi, Milad; Patt, Yale N. (2015-12-05). "Filtered runahead execution with a runahead buffer". Proceedings of the 48th International Symposium on Microarchitecture. MICRO-48. New York, NY, USA: Association for Computing Machinery. pp. 358–369. doi:10.1145/2830772.2830812. ISBN 978-1-4503-4034-2. S2CID 2897777.
  • ^ a b Naithani, Ajeya; Feliu, Josué; Adileh, Almutaz; Eeckhout, Lieven (February 2020). "Precise Runahead Execution". 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). pp. 397–410. doi:10.1109/HPCA47549.2020.00040. hdl:1854/LU-8668193. ISBN 978-1-7281-6149-5. S2CID 215817567.
  • ^ Naithani, Ajeya; Eeckhout, Lieven (April 2022). "Reliability-Aware Runahead". 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE. pp. 772–785. doi:10.1109/HPCA53966.2022.00062. ISBN 978-1-6654-2027-3. S2CID 248865294.

  • Retrieved from "https://en.wikipedia.org/w/index.php?title=Runahead&oldid=1230387253"

    Categories: 
    Instruction processing
    Speculative execution
    Hidden categories: 
    Articles with short description
    Short description matches Wikidata
    Articles needing additional references from July 2023
    All articles needing additional references
    All articles with unsourced statements
    Articles with unsourced statements from July 2023
     



    This page was last edited on 22 June 2024, at 12:10 (UTC).

    Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.



    Privacy policy

    About Wikipedia

    Disclaimers

    Contact Wikipedia

    Code of Conduct

    Developers

    Statistics

    Cookie statement

    Mobile view



    Wikimedia Foundation
    Powered by MediaWiki