
Computer Science > Computation and Language

 

arXiv:2108.12409 (cs)  



[Submitted on 27 Aug 2021 (v1), last revised 22 Apr 2022 (this version, v2)]

Title: Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation


Authors: Ofir Press, Noah A. Smith, Mike Lewis

Abstract: Since the introduction of the transformer model by Vaswani et al. (2017), a fundamental question has yet to be answered: how does a model achieve extrapolation at inference time for sequences that are longer than it saw during training? We first show that extrapolation can be enabled by simply changing the position representation method, though we find that current methods do not allow for efficient extrapolation. We therefore introduce a simpler and more efficient position method, Attention with Linear Biases (ALiBi). ALiBi does not add positional embeddings to word embeddings; instead, it biases query-key attention scores with a penalty that is proportional to their distance. We show that this method trains a 1.3 billion parameter model on input sequences of length 1024 that extrapolates to input sequences of length 2048, achieving the same perplexity as a sinusoidal position embedding model trained on inputs of length 2048 but training 11% faster and using 11% less memory. ALiBi's inductive bias towards recency also leads it to outperform multiple strong position methods on the WikiText-103 benchmark.  
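
The biasing step described in the abstract can be illustrated with a minimal sketch. It assumes a single attention head, a causal mask, and one fixed slope; the function names and the slope value 0.5 are hypothetical choices for illustration and are not taken from this page. The raw query-key scores are set to zero so that only the distance penalty shapes the result.

```python
import math


def alibi_bias(seq_len, slope):
    """Build a causal bias matrix: -slope * (i - j) for keys j <= i.

    Future positions (j > i) get -inf so they receive zero attention.
    The slope is an illustrative assumption, not a value from the source.
    """
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(scores, slope):
    """Add the distance penalty to raw query-key scores, then normalize."""
    bias = alibi_bias(len(scores), slope)
    return [
        softmax([s + b for s, b in zip(score_row, bias_row)])
        for score_row, bias_row in zip(scores, bias)
    ]


# Zero raw scores isolate the effect of the bias alone.
scores = [[0.0] * 4 for _ in range(4)]
weights = attention_weights(scores, slope=0.5)
```

With equal raw scores, each query places the most weight on the nearest key and progressively less on more distant ones, which is the recency bias the abstract mentions. No positional embedding is added to the inputs anywhere in this sketch; position enters only through the bias on the scores.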

Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2108.12409 [cs.CL]
  (or arXiv:2108.12409v2 [cs.CL] for this version)
  https://doi.org/10.48550/arXiv.2108.12409




Submission history

From: Ofir Press
[v1] Fri, 27 Aug 2021 17:35:06 UTC (187 KB)
[v2] Fri, 22 Apr 2022 18:20:48 UTC (221 KB)
 


