The Wayback Machine - http://web.archive.org/web/20200916191239/https://github.com/apache
@apache

The Apache Software Foundation

    Repositories

    Each entry lists, in order: primary language, license (where declared), forks, stars, open issues, open pull requests, and last-updated date.

    C Apache-2.0 164 238 78 (3 issues need help) 17 Updated Sep 16, 2020
  • Apache Ignite

    iot cloud sql database big-data hadoop cache
    Java Apache-2.0 1,583 3,440 0 662 Updated Sep 16, 2020
  • Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication…

    arrow
    C++ Apache-2.0 1,566 6,174 0 88 Updated Sep 16, 2020
  • Muchos (an Apache Fluo project) sets up Apache Accumulo or Apache Fluo on a cluster (optionally launched in Amazon EC2 or Microsoft Azure) for development

    java aws ansible big-data hadoop azure apache
    Python Apache-2.0 32 18 32 1 Updated Sep 16, 2020
  • Uno (an Apache Fluo project) sets up Apache Accumulo or Apache Fluo on a single machine for development

    java big-data fluo
    Shell Apache-2.0 28 26 12 0 Updated Sep 16, 2020
  • Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

    python workflow airflow scheduler apache apache-airflow
    26 packages Python Apache-2.0 7,077 18,246 594 (1 issue needs help) 169 Updated Sep 16, 2020
  • Apache Daffodil (Incubating)

    daffodil
    Scala Apache-2.0 38 27 0 7 Updated Sep 16, 2020
  • Apache Spark - A unified analytics engine for large-scale data processing

    python java r scala sql big-data spark
    Scala Apache-2.0 22,505 27,477 0 232 Updated Sep 16, 2020
  • Apache Flink

    java scala big-data flink
    Java Apache-2.0 7,707 14,142 1 459 Updated Sep 16, 2020
  • Mirror of Apache Phoenix

    java phoenix sql database big-data
    Java Apache-2.0 867 828 0 224 Updated Sep 16, 2020
  • Apache Superset is a Data Visualization and Data Exploration Platform

    react python flask data-science bi analytics superset
    Python Apache-2.0 6,226 30,089 277 (1 issue needs help) 74 Updated Sep 16, 2020
  • Mirror of Apache Kafka

    scala kafka
    Java Apache-2.0 8,990 16,864 0 772 Updated Sep 16, 2020
  • Apache Beam is a unified programming model for Batch and Streaming

    python java golang streaming sql big-data beam
    Java Apache-2.0 2,667 4,244 0 121 Updated Sep 16, 2020
  • Apache NLPCraft - API to convert natural language into actions.

    java nlp scala apache
    Scala Apache-2.0 7 18 0 0 Updated Sep 16, 2020
  • Open deep learning compiler stack for cpu, gpu and specialized accelerators

    javascript machine-learning performance deep-learning metal compiler gpu
    Python Apache-2.0 1,646 5,722 103 66 Updated Sep 16, 2020
  • Apache Geode

    geode
    Java Apache-2.0 604 1,821 0 43 Updated Sep 16, 2020
  • Apache Iceberg

    apache iceberg
    Java Apache-2.0 272 679 191 (3 issues need help) 35 Updated Sep 16, 2020
  • Mirror of Apache Commons FileUpload

    commons
    Java Apache-2.0 108 133 0 6 Updated Sep 16, 2020
  • Apache Commons BCEL

    commons
    Java Apache-2.0 84 129 0 7 Updated Sep 16, 2020
  • Apache Druid: a high performance real-time analytics database.

    druid
    Java Apache-2.0 2,688 10,069 929 77 Updated Sep 16, 2020
  • Apache Druid

    druid
    HTML 40 5 0 1 Updated Sep 16, 2020
  • Apache NiFi

    java nifi
    Java Apache-2.0 1,851 2,245 0 273 Updated Sep 16, 2020
  • Apache MXNet Site

    mxnet
    HTML 19 12 10 1 Updated Sep 16, 2020
  • Apache Tomcat

    java http tomcat javaee network-server
    Java Apache-2.0 3,320 4,838 0 20 Updated Sep 16, 2020
  • Apache Royale Compiler

    royale
    Java 42 80 34 2 Updated Sep 16, 2020
  • Apache Camel is an open source integration framework that empowers you to quickly and easily integrate various systems consuming or producing data.

    java integration camel
    Java Apache-2.0 4,110 3,390 0 3 Updated Sep 16, 2020
  • Apache Traffic Control

    trafficcontrol
    Go 224 467 579 (1 issue needs help) 71 Updated Sep 17, 2020
  • Apache Accumulo

    java big-data accumulo
    Java Apache-2.0 332 822 136 (2 issues need help) 23 Updated Sep 16, 2020
  • Mirror of Apache Storm

    java big-data storm
    Java 4,057 6,137 0 45 Updated Sep 17, 2020
  • Apache Lucene and Solr open-source search software

    search java search-engine information-retrieval backend nosql solr
    Java Apache-2.0 2,541 3,764 0 308 Updated Sep 17, 2020
  • Top languages

    Java HTML JavaScript Python C++

    Most used topics

    java sling network-server big-data cplusplus

    People

    @vic @brianm @infil00p @jthomas @mjwall @mrkn @rubys @esjewett @valdar @reedox @iamaleksey @janl @vanto @realityforge @kocolosk @atoulme @eddyxu @mars @davelester @brondsem