Formed in 2009, the Archive Team (not to be confused with the archive.org Archive-It Team) is a rogue archivist collective dedicated to saving copies of rapidly dying or deleted websites for the sake of history and digital heritage. The group is 100% composed of volunteers and interested parties, and has expanded into a number of related projects for saving online and digital history.
History is littered with hundreds of conflicts over the future of a community, group, location or business that were "resolved" when one of the parties stepped ahead and destroyed what was there. With the original point of contention destroyed, the debates would fall by the wayside. Archive Team believes that by duplicating condemned data, the conversation and debate can continue, as can the richness and insight gained by keeping the materials. Our projects have ranged in size from a single volunteer downloading the data of a small-but-critical site to over 100 volunteers stepping forward to acquire terabytes of user-created data to save for future generations.
The main site for Archive Team is at archiveteam.org and contains up-to-date information on various projects, manifestos, plans and walkthroughs.
This collection contains the output of many Archive Team projects, both ongoing and completed. Thanks to the generous provision of disk space by the Internet Archive, multi-terabyte datasets can be made available and put to use by the Wayback Machine, providing a path back to lost websites and work.
Our collection has grown to the point of having sub-collections for the types of data we acquire. If you are seeking to browse the contents of these collections, the Wayback Machine is the best first stop. Otherwise, you are free to dig into the stacks to see what you may find.
The Archive Team Panic Downloads are full pulldowns of currently extant websites, meant to serve as emergency backups for sites that are in danger of closing or that would be sorely missed if suddenly lost to hard drive crashes or server failures.
ArchiveBot is an IRC bot designed to automate the archival of smaller websites (e.g. up to a few hundred thousand URLs). You give it a URL to start at, and it grabs all content under that URL, records it in a WARC, and then uploads that WARC to ArchiveTeam servers for eventual injection into the Internet Archive (or other archive sites).
To use ArchiveBot, drop by #archivebot on EFNet. To interact with ArchiveBot, you issue commands by typing them into the channel. Note that you will need channel operator permissions in order to issue archiving jobs. The dashboard shows the sites currently being downloaded.
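As a rough illustration (the command names below follow the Archive Team wiki's documentation of ArchiveBot; exact syntax, options and required permissions may have changed since), a session in #archivebot might look like:

    !archive https://example.com/            start a recursive crawl of everything under example.com
    !archiveonly https://example.com/page    save only that single page, without recursion
    !abort JOBID                             stop a running job (JOBID stands in for the identifier ArchiveBot reports when the job starts)

Each job is written to a WARC and queued for upload, and its progress can be followed on the dashboard.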