Each bug query file must contain a query keyword, which is the query text for elastic search.
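As a minimal sketch (the exception message is made up for illustration; the tags value mirrors the guidance below), a query file could contain just that keyword:

  query: >-
    message:"ExceptionA" AND tags:"screen-n-cpu.txt"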
Guidelines for good queries:
- Queries should get as close as possible to fingerprinting the root cause. A
  screen log query (e.g. tags:"screen-n-net.txt") is typically better than
  a console one (tags:"console"), since it matches a deep failure rather than
  a surface symptom.

- Queries should not return any hits for successful jobs; if they do, the
  query isn't specific enough. As a rule of thumb, more than 10% of hits
  coming from successful jobs probably means the query isn't good enough.

- If it's impossible to build a query that targets a bug, consider patching
  the upstream program to be explicit when it fails in that particular way.

- Use the 'tags' field rather than the 'filename' field for filtering. This is
  primarily because of grenade jobs, where the same log file shows up on the
  'old' and 'new' side of the grenade job. For example,
  tags:"screen-n-cpu.txt" will query both logs/old/screen-n-cpu.txt and
  logs/new/screen-n-cpu.txt. The tags:"console" filter is also used to
  query console.html as well as tempest and devstack logs.

- Avoid the use of wildcards in queries, since they can put an undue burden on
  the query engine. A common case where wildcards are used, and shouldn't be,
  is querying against a specific set of build_name fields, e.g.
  gate-nova-python26 and gate-nova-python27. Rather than use
  build_name:gate-nova-python*, list the jobs with an OR. For example:

    (build_name:"gate-nova-python26" OR build_name:"gate-nova-python27")

When adding queries you can optionally suppress the creation of graphs and
notifications by adding suppress-graph: true or suppress-notification: true
to the yaml file. These can be used to make sure expected failures don't show
up on the unclassified page.
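For instance (reusing the made-up query from above), a yaml file for an expected failure that should stay off the graphs and notifications might look like:

  query: >-
    message:"ExceptionA" AND tags:"screen-n-cpu.txt"
  suppress-graph: true
  suppress-notification: true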
If the only signature available is overly broad and adding additional logging
can't reasonably make a good signature, you can also filter the results of a
query based on the test_ids that failed for the run being checked.
This can be done by adding a test_ids keyword to the query file and then a
list of the test_ids to verify failed. The test_ids should exclude any attrs,
that is, the list of attrs appended to the test_id between '[]', for example
'smoke', 'slow', and any service tags. This is how subunit-trace prints the
test ids by default if you're using it. If any of the listed test_ids match as
failing for the run being checked with the query, it will return a match.
Since filtering leverages subunit2sql, which only receives tempest test
results from the gate pipeline, this technique will only work on tempest or
grenade jobs in the gate queue. For more information about this, refer to the
infra subunit2sql documentation.

For example, if your query yaml file looked like:

  query: >-
    message:"ExceptionA"
  test_ids:
    - tempest.api.compute.servers.test_servers.test_update_server_name
    - tempest.api.compute.servers.test_servers_negative.test_server_set_empty_name

this will only match the bug if the logstash query had a hit for the run and
either test_update_server_name or test_server_set_empty_name failed during
that run.

In order to support rapidly added queries, it's considered socially acceptable
to approve changes that only add one new bug query, and even for core
reviewers to self-approve those changes.
Reference the bug from the change itself, either by using a Related-Bug tag in
the commit message footer, or by adding a comment to the bug with the query
you identified and a link to the logstash URL for that query search.
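As an illustration only (the summary line is invented and BUGNUMBER stands in for the real bug number), such a commit message could end with:

  Add elastic-recheck query for ExceptionA failures

  Related-Bug: #BUGNUMBER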
Putting the logstash query link in the bug report is also valuable in the
case of rare failures that fall outside the window of how far back log
results are stored. In such cases the bug might be marked as Incomplete
and the e-r query could be removed, only for the failure to re-surface
later. If a link to the query is in the bug report someone can easily
track when it started showing up again.
Add the query to elastic-recheck/queries/BUGNUMBER.yaml
(All queries can be found on git.openstack.org)
and push the patch up for review.
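A rough sketch of that workflow, assuming the usual OpenStack Gerrit setup with git-review installed, might be:

  $ git checkout -b add-query-BUGNUMBER
  $ $EDITOR queries/BUGNUMBER.yaml   # the query, plus any optional keys described above
  $ git add queries/BUGNUMBER.yaml
  $ git commit                       # include a Related-Bug footer as noted earlier
  $ git review                       # push the change up to Gerrit for review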
You can also help classify Unclassified failed jobs, which
is an aggregation of all failed voting gate jobs that don't currently have
elastic-recheck fingerprints.
You can check how often and where a query is hitting by running
elastic-recheck-query against its yaml file. For example:

  $ elastic-recheck-query queries/1331274.yaml
  total hits: 133
  build_status
    100% FAILURE
  build_name
    48% check-grenade-dsvm
    15% check-grenade-dsvm-partial-ncpu
    13% gate-grenade-dsvm
    9% check-grenade-dsvm-icehouse
    9% check-grenade-dsvm-partial-ncpu-icehouse
  build_branch
    95% master
    4% stable/icehouse