MuZero






From Wikipedia, the free encyclopedia
 


MuZero is a computer program developed by artificial intelligence research company DeepMind to master games without knowing their rules.[1][2][3] Its release in 2019 included benchmarks of its performance in Go, chess, shogi, and a standard suite of Atari games. The algorithm uses an approach similar to AlphaZero. It matched AlphaZero's performance in chess and shogi, improved on its performance in Go (setting a new world record), and improved on the state of the art in mastering a suite of 57 Atari games (the Arcade Learning Environment), a visually complex domain.

MuZero was trained via self-play, with no access to rules, opening books, or endgame tablebases. The trained algorithm used the same convolutional and residual architecture as AlphaZero, but with 20 percent fewer computation steps per node in the search tree.[4]

History

MuZero really is discovering for itself how to build a model and understand it just from first principles.

— David Silver, DeepMind, Wired[5]

On November 19, 2019, the DeepMind team released a preprint introducing MuZero.

Derivation from AlphaZero

MuZero (MZ) is a combination of the high-performance planning of the AlphaZero (AZ) algorithm with approaches to model-free reinforcement learning. The combination allows for more efficient training in classical planning regimes, such as Go, while also handling domains with much more complex inputs at each stage, such as visual video games.
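As a rough illustration of what planning with a learned model can look like, the sketch below follows the structure described in the MuZero paper:[4] a representation function encodes an observation into a hidden state, a dynamics function advances that hidden state given an action, and a prediction function outputs a policy and a value. Everything else here is an assumption made for illustration: the function bodies are hand-written toy formulas standing in for trained neural networks, the two-action setup and the discount of 0.99 are hypothetical, and an exhaustive comparison of short action sequences is swapped in for the Monte Carlo tree search that MuZero uses. It is not DeepMind's code.

```python
from typing import List, Tuple

# Hypothetical stand-ins for the three learned functions. In MuZero these are
# neural networks trained end to end, not hand-written formulas.

def representation(observation: List[float]) -> List[float]:
    """Encode a raw observation into a hidden state (toy: scale the features)."""
    return [0.5 * x for x in observation]

def dynamics(state: List[float], action: int) -> Tuple[List[float], float]:
    """Predict the next hidden state and the immediate reward for an action."""
    shift = 1.0 if action == 1 else -1.0
    next_state = [s + shift for s in state]
    reward = 0.1 * sum(next_state)
    return next_state, reward

def prediction(state: List[float]) -> Tuple[List[float], float]:
    """Predict a policy over two actions and a value for a hidden state."""
    total = sum(state)
    value = total / (1.0 + abs(total))   # squashed into (-1, 1)
    p_action_1 = 0.5 + 0.5 * value       # toy policy prior
    return [1.0 - p_action_1, p_action_1], value

def rollout_return(observation: List[float], actions: List[int],
                   discount: float = 0.99) -> float:
    """Score an action sequence entirely inside the model (no game rules used)."""
    state = representation(observation)
    total, scale = 0.0, 1.0
    for action in actions:
        state, reward = dynamics(state, action)
        total += scale * reward
        scale *= discount
    _, value = prediction(state)         # bootstrap from the predicted value
    return total + scale * value

def plan(observation: List[float], depth: int = 3) -> List[int]:
    """Exhaustively compare all action sequences of the given depth."""
    best_score, best_actions = None, None
    for code in range(2 ** depth):
        actions = [(code >> i) & 1 for i in range(depth)]
        score = rollout_return(observation, actions)
        if best_score is None or score > best_score:
            best_score, best_actions = score, actions
    return best_actions
```

The point of the sketch is that `plan` never calls a game simulator: every step of lookahead goes through `dynamics`, which is the part MuZero has to learn.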

MuZero was derived directly from AZ code, sharing its rules for setting hyperparameters. Differences between the approaches include:[6]

Comparison with R2D2

The previous state-of-the-art technique for learning to play the suite of Atari games was R2D2, the Recurrent Replay Distributed DQN.[7]

MuZero surpassed both R2D2's mean and median performance across the suite of games, though it did not do better in every game.

Training and results

For board games, MuZero used 16 third-generation tensor processing units (TPUs) for training and 1000 TPUs for self-play, with 800 simulations per step. For Atari games, it used 8 TPUs for training and 32 TPUs for self-play, with 50 simulations per step.

AlphaZero used 64 second-generation TPUs for training and 5000 first-generation TPUs for self-play. Because TPU design has improved (third-generation chips are individually twice as powerful as second-generation chips, with further advances in bandwidth and networking across chips in a pod), these are comparable training setups.
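The per-chip part of that comparison can be worked through as a back-of-envelope calculation. The snippet below uses only the chip counts and the factor of two quoted above; the variable and function names are made up for illustration. It deliberately ignores the bandwidth and networking improvements, which are not quantified here, so it should be read as a rough estimate rather than a measurement.

```python
# Chip counts and the per-chip factor are the figures quoted in the text.
MUZERO_TRAINING_TPUS_GEN3 = 16
ALPHAZERO_TRAINING_TPUS_GEN2 = 64
GEN3_TO_GEN2_FACTOR = 2.0

def gen2_equivalents(gen3_chips: int, factor: float = GEN3_TO_GEN2_FACTOR) -> float:
    """Convert a count of third-generation chips to second-generation equivalents."""
    return gen3_chips * factor

muzero_equiv = gen2_equivalents(MUZERO_TRAINING_TPUS_GEN3)
ratio = muzero_equiv / ALPHAZERO_TRAINING_TPUS_GEN2

# prints "MuZero training: 32 gen-2 equivalents, 50% of AlphaZero's 64"
print(f"MuZero training: {muzero_equiv:.0f} gen-2 equivalents, "
      f"{ratio:.0%} of AlphaZero's {ALPHAZERO_TRAINING_TPUS_GEN2}")
```

On chip count alone the board-game training setup comes to about half of AlphaZero's, so the comparability claim also rests on the interconnect improvements.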

R2D2 was trained for 5 days through 2 million training steps.

Initial results

MuZero matched AlphaZero's performance in chess and shogi after roughly 1 million training steps. It matched AZ's performance in Go after 500,000 training steps and surpassed it by 1 million steps. It matched R2D2's mean and median performance across the Atari game suite after 500,000 training steps and surpassed it by 1 million steps, though it never performed well on 6 games in the suite.

Reactions and related work

MuZero was viewed as a significant advancement over AlphaZero, and a generalizable step forward in unsupervised learning techniques.[8][9] The work was seen as advancing understanding of how to compose systems from smaller components, a systems-level development more than a pure machine-learning development.[10]

While the development team released only pseudocode, Werner Duvaud produced an open-source implementation based on it.[11]

MuZero has been used as a reference implementation in other work, for instance as a way to generate model-based behavior.[12]

In late 2021, a more efficient variant of MuZero was proposed, named EfficientZero. It "achieves 194.3 percent mean human performance and 109.0 percent median performance on the Atari 100k benchmark with only two hours of real-time game experience".[13]
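Percentages of "human performance" on Atari benchmarks are usually human-normalized scores: for each game, the agent's raw score is rescaled so that a random policy maps to 0 percent and a human reference player to 100 percent, and the per-game values are then aggregated by mean or median. The sketch below assumes that convention. The game names and raw scores are invented placeholders, not benchmark results.

```python
from statistics import mean, median

def human_normalized(agent: float, random: float, human: float) -> float:
    """Rescale a raw score so that random play is 0% and the human reference is 100%."""
    return 100.0 * (agent - random) / (human - random)

# Hypothetical raw scores as (agent, random, human) for three invented games.
scores = {
    "game_a": (300.0, 100.0, 200.0),  # agent well above the human reference
    "game_b": (150.0, 50.0, 250.0),   # agent halfway between random and human
    "game_c": (90.0, 10.0, 90.0),     # agent equal to the human reference
}

normalized = [human_normalized(*raw) for raw in scores.values()]
mean_score = mean(normalized)
median_score = median(normalized)
```

With these placeholder numbers the mean is pulled above the median by the one game where the agent far exceeds the human reference, which is one way a mean figure can sit well above the median figure in reported results.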

In early 2022, a variant of MuZero was proposed to play stochastic games (for example 2048, backgammon), called Stochastic MuZero, which uses afterstate dynamics and chance codes to account for the stochastic nature of the environment when training the dynamics network.[14]
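The split between an afterstate and a chance outcome can be illustrated with a toy, single-row version of 2048. In the sketch below the agent's move produces a deterministic afterstate (tiles slid and merged), and a separate chance event then adds a new tile. This uses hand-written rules where Stochastic MuZero would use learned networks, and an explicit (cell, tile) pair stands in for a learned chance code. The function names and the 90/10 split between new 2 and 4 tiles are assumptions for illustration.

```python
from typing import Callable, List, Tuple

Row = List[int]
ChanceCode = Tuple[int, int]  # (cell index, new tile); stand-in for a learned code

def slide_left(row: Row) -> Row:
    """Deterministic part of a move: slide tiles left and merge equal neighbours."""
    tiles = [t for t in row if t != 0]
    merged: Row = []
    i = 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(2 * tiles[i])
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged))

def chance_outcomes(afterstate: Row) -> List[Tuple[ChanceCode, float]]:
    """Enumerate possible tile spawns with their probabilities (assumed 90/10)."""
    empty = [i for i, t in enumerate(afterstate) if t == 0]
    outcomes: List[Tuple[ChanceCode, float]] = []
    for cell in empty:
        outcomes.append(((cell, 2), 0.9 / len(empty)))
        outcomes.append(((cell, 4), 0.1 / len(empty)))
    return outcomes

def apply_chance(afterstate: Row, code: ChanceCode) -> Row:
    """Stochastic part of a move: place the spawned tile to get the next state."""
    cell, tile = code
    next_state = list(afterstate)
    next_state[cell] = tile
    return next_state

def expected_value(afterstate: Row, value_fn: Callable[[Row], float]) -> float:
    """Average a value estimate over chance outcomes (0.0 if the row is full)."""
    return sum(p * value_fn(apply_chance(afterstate, code))
               for code, p in chance_outcomes(afterstate))
```

For example, `slide_left([2, 2, 4, 0])` gives the afterstate `[4, 4, 0, 0]`, and a planner can then weigh each possible spawn by its probability instead of assuming a single deterministic successor.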

See also

References

  1. Wiggers, Kyle (20 November 2019). "DeepMind's MuZero teaches itself how to win at Atari, chess, shogi, and Go". VentureBeat. Retrieved 22 July 2020.
  2. Friedel, Frederic. "MuZero figures out chess, rules and all". ChessBase GmbH. Retrieved 22 July 2020.
  3. Rodriguez, Jesus. "DeepMind Unveils MuZero, a New Agent that Mastered Chess, Shogi, Atari and Go Without Knowing the Rules". KDnuggets. Retrieved 22 July 2020.
  4. Schrittwieser, Julian; Antonoglou, Ioannis; Hubert, Thomas; Simonyan, Karen; Sifre, Laurent; Schmitt, Simon; Guez, Arthur; Lockhart, Edward; Hassabis, Demis; Graepel, Thore; Lillicrap, Timothy (2020). "Mastering Atari, Go, chess and shogi by planning with a learned model". Nature. 588 (7839): 604–609. arXiv:1911.08265. Bibcode:2020Natur.588..604S. doi:10.1038/s41586-020-03051-4. PMID 33361790. S2CID 208158225.
  5. "What AlphaGo Can Teach Us About How People Learn". Wired. ISSN 1059-1028. Retrieved 2020-12-25.
  6. Silver, David; Hubert, Thomas; Schrittwieser, Julian; Antonoglou, Ioannis; Lai, Matthew; Guez, Arthur; Lanctot, Marc; Sifre, Laurent; Kumaran, Dharshan; Graepel, Thore; Lillicrap, Timothy; Simonyan, Karen; Hassabis, Demis (5 December 2017). "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm". arXiv:1712.01815 [cs.AI].
  7. Kapturowski, Steven; Ostrovski, Georg; Quan, John; Munos, Remi; Dabney, Will. "Recurrent Experience Replay in Distributed Reinforcement Learning". ICLR 2019 – via Open Review.
  8. Shah, Rohin. "[AN #75]: Solving Atari and Go with learned game models, and thoughts from a MIRI employee - LessWrong 2.0". www.lesswrong.com. Retrieved 2020-06-07.
  9. Wu, Jun. "Reinforcement Learning, Deep Learning's Partner". Forbes. Retrieved 2020-07-15.
  10. "Machine Learning & Robotics: My (biased) 2019 State of the Field". cachestocaches.com. Retrieved 2020-07-15.
  11. Duvaud, Werner (2020-07-15), werner-duvaud/muzero-general, retrieved 2020-07-15
  12. van Seijen, Harm; Nekoei, Hadi; Racah, Evan; Chandar, Sarath (2020-07-06). "The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning". arXiv:2007.03158 [cs.stat].
  13. Ye, Weirui; Liu, Shaohuai; Kurutach, Thanard; Abbeel, Pieter; Gao, Yang (2021-12-11). "Mastering Atari Games with Limited Data". arXiv:2111.00210 [cs.LG].
  14. Antonoglou, Ioannis; Schrittwieser, Julian; Ozair, Serjil; Hubert, Thomas; Silver, David (2022-01-28). "Planning in Stochastic Environments with a Learned Model". Retrieved 2023-12-12.

External links


    Retrieved from "https://en.wikipedia.org/w/index.php?title=MuZero&oldid=1230619926"

    This page was last edited on 23 June 2024, at 19:07 (UTC).
