Chinchilla (language model)






From Wikipedia, the free encyclopedia
 


Chinchilla is a family of large language models developed by the research team at DeepMind and presented in March 2022.[1] It is named "Chinchilla" because it is a further development of DeepMind's earlier model family, Gopher. Both model families were trained to investigate the scaling laws of large language models.[2]

Chinchilla was claimed to outperform GPT-3, and it considerably simplifies downstream use because it requires much less compute for inference and fine-tuning. Based on analyses of previously trained language models, DeepMind determined that every doubling of model size should be matched by a doubling of the number of training tokens, and applied this hypothesis when training Chinchilla: for roughly the same training cost as Gopher, Chinchilla has 70B parameters and was trained on four times as much data.[3]

Chinchilla achieves an average accuracy of 67.5% on the Measuring Massive Multitask Language Understanding (MMLU) benchmark, about 7 percentage points higher than Gopher's. Chinchilla was still in the testing phase as of January 12, 2023.[4]

Chinchilla contributes to an effective training paradigm for large autoregressive language models under limited compute budgets. The Chinchilla team recommends doubling the number of training tokens for every doubling of model size, meaning that larger, higher-quality training datasets can lead to better results on downstream tasks.[5][6]
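
This recommendation can be illustrated with a back-of-the-envelope calculation. The sketch below is a minimal illustration, assuming the commonly quoted approximations that training compute is about C ≈ 6·N·D FLOPs for N parameters and D tokens, and that the compute-optimal point has roughly 20 training tokens per parameter; the exact fitted coefficients in the paper differ.

    # Rough Chinchilla-style compute-optimal estimate (illustrative only).
    # Assumptions: training compute C ~ 6*N*D FLOPs, and D ~ 20*N tokens
    # at the compute-optimal point.

    def compute_optimal(flops_budget: float) -> tuple[float, float]:
        """Return (parameters N, training tokens D) for a FLOPs budget."""
        n = (flops_budget / (6 * 20)) ** 0.5   # from C = 6 * N * (20 * N)
        d = 20 * n
        return n, d

    n, d = compute_optimal(5.76e23)            # roughly the Gopher/Chinchilla budget
    print(f"params ~ {n / 1e9:.0f}B, tokens ~ {d / 1e12:.1f}T")
    # -> params ~ 69B, tokens ~ 1.4T
    # At a fixed budget C = 6*N*D, a model four times smaller can be trained
    # on four times as many tokens, which is how a 70B model matches the cost
    # of a 280B model trained on a quarter of the data.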

Architecture

Both the Gopher and Chinchilla families are families of transformer models.

In particular, they are essentially the same as GPT-2, with different sizes and minor modifications. The Gopher family uses RMSNorm instead of LayerNorm and relative positional encoding rather than absolute positional encoding. The Chinchilla family is the same as the Gopher family, but trained with the AdamW optimizer instead of Adam.
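
As a rough illustration of these differences, the sketch below shows an RMSNorm module and the Adam-versus-AdamW optimizer choice in PyTorch. This is a minimal, hypothetical example, not DeepMind's implementation, and the hyperparameters shown are illustrative rather than the published values.

    import torch
    from torch import nn

    class RMSNorm(nn.Module):
        """Root-mean-square layer normalization: unlike LayerNorm, it does not
        subtract the mean and has no bias term, only a learned scale."""
        def __init__(self, dim: int, eps: float = 1e-6):
            super().__init__()
            self.eps = eps
            self.scale = nn.Parameter(torch.ones(dim))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            inv_rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
            return x * inv_rms * self.scale

    # Optimizer choice: Gopher was trained with Adam, Chinchilla with AdamW
    # (Adam with decoupled weight decay).
    block = nn.Linear(512, 512)  # stand-in for a transformer block
    adam  = torch.optim.Adam(block.parameters(), lr=2e-4)
    adamw = torch.optim.AdamW(block.parameters(), lr=1e-4, weight_decay=0.1)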

The Gopher family contains six models of increasing size, from 44 million to 280 billion parameters; the largest is referred to simply as "Gopher". The same naming convention applies to the Chinchilla family.

Table 1 of [2] shows the entire Gopher family:

Model specifications for the Gopher family

Parameter count | Layers | Number of heads | Key/Value size | Internal dimension | Max learning rate | Batch size (tokens)
44M             | 8      | 16              | 32             | 512                | 6 × 10⁻⁴          | 0.25M
117M            | 12     | 12              | 64             | 768                | 6 × 10⁻⁴          | 0.25M
417M            | 12     | 12              | 128            | 1,536              | 2 × 10⁻⁴          | 0.25M
1.4B            | 24     | 16              | 128            | 2,048              | 2 × 10⁻⁴          | 0.25M
7.1B            | 32     | 32              | 128            | 4,096              | 1.2 × 10⁻⁴        | 2M
Gopher 280B     | 80     | 128             | 128            | 16,384             | 4 × 10⁻⁵          | 3M → 6M

Table 4 of [1] compares the 70-billion-parameter Chinchilla with Gopher 280B.

Comparison between Chinchilla and Gopher

Parameter count | Layers | Number of heads | Key/Value size | Internal dimension | Max learning rate | Batch size (tokens)
Gopher 280B     | 80     | 128             | 128            | 16,384             | 4 × 10⁻⁵          | 3M → 6M
Chinchilla 70B  | 80     | 64              | 128            | 8,192              | 1 × 10⁻⁴          | 1.5M → 3M
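
The parameter counts in these tables can be roughly sanity-checked from the layer count and internal dimension using the common approximation of about 12·d_model² parameters per transformer layer (attention plus feed-forward, ignoring embeddings). This is a rule-of-thumb sketch, not the exact accounting used in either paper.

    # Rough parameter-count check against the tables above.
    def approx_params(layers: int, d_model: int) -> float:
        # ~12 * d_model^2 per layer: 4*d^2 for attention, 8*d^2 for the MLP
        return 12 * layers * d_model ** 2

    for name, layers, d_model in [("Gopher", 80, 16_384), ("Chinchilla", 80, 8_192)]:
        print(f"{name}: ~{approx_params(layers, d_model) / 1e9:.0f}B")
    # -> Gopher: ~258B (quoted 280B), Chinchilla: ~64B (quoted 70B)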


References

1. Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Buchatskaya, Elena; Cai, Trevor; Rutherford, Eliza; Casas, Diego de Las; Hendricks, Lisa Anne; Welbl, Johannes; Clark, Aidan; Hennigan, Tom; Noland, Eric; Millican, Katie; Driessche, George van den; Damoc, Bogdan (2022-03-29). "Training Compute-Optimal Large Language Models". arXiv:2203.15556 [cs.CL].
2. Rae, Jack W.; Borgeaud, Sebastian; Cai, Trevor; Millican, Katie; Hoffmann, Jordan; Song, Francis; Aslanides, John; Henderson, Sarah; Ring, Roman; Young, Susannah; Rutherford, Eliza; Hennigan, Tom; Menick, Jacob; Cassirer, Albin; Powell, Richard (2022-01-21). "Scaling Language Models: Methods, Analysis & Insights from Training Gopher". arXiv:2112.11446 [cs.CL].
3. Eliaçık, Eray (January 12, 2023). "Chinchilla AI is coming for the GPT-3's throne". Dataconomy. Archived from the original on March 26, 2023.
4. Hendrycks, Dan (2023-03-14). Measuring Massive Multitask Language Understanding. Archived from the original on 2023-03-15. Retrieved 2023-03-15.
5. Chaithali, G. (April 9, 2022). "Check Out This DeepMind's New Language Model, Chinchilla (70B Parameters), Which Significantly Outperforms Gopher (280B) and GPT-3 (175B) on a Large Range of Downstream Evaluation Tasks". Archived from the original on March 27, 2023. Retrieved January 15, 2023.
6. Wali, Kartik (April 12, 2022). "DeepMind launches GPT-3 rival, Chinchilla". Analytics India Magazine. Archived from the original on March 26, 2023. Retrieved January 15, 2023.
