The Wayback Machine - http://web.archive.org/web/20180611014808/https://github.com/openstack-infra/elastic-recheck
 
Classify tempest-devstack failures using ElasticSearch    







2,113 commits · 1 branch · 0 releases · Apache-2.0 license

Languages: Python 53.0% · JavaScript 39.3% · HTML 7.3% · CSS 0.4%





doc/source
elastic_recheck
queries
tools
web
.coveragerc
.gitignore
.gitreview
.testr.conf
.zuul.yaml
CONTRIBUTING.rst
LICENSE
MANIFEST.in
README.rst
babel.cfg
bindep.txt
elasticRecheck.conf.sample
recheckwatchbot.yaml
requirements.txt
setup.cfg
setup.py
test-requirements.txt
tox.ini
web_server.py

README.rst  

elastic-recheck


"Use ElasticSearch to classify OpenStack gate failures"


Open Source Software: Apache license

Idea


Identifying the specific bug that is causing a transient error in the gate is difficult. Just identifying which tempest test failed is not enough because a single tempest test can fail due to any number of underlying bugs. If we can find a fingerprint for a specific bug using logs, then we can use ElasticSearch to automatically detect any occurrences of the bug.

Using these fingerprints elastic-recheck can:


Search ElasticSearch for all occurrences of a bug.

Identify bug trends, such as when the bug started, whether it has been fixed, and whether it is getting worse.

Classify bug failures in real time and report back to gerrit if we find a match, so a patch author knows why the test failed.

queries/


All queries are stored in separate yaml files in a queries directory at the top of the elastic-recheck code base. The format of these files is ######.yaml (where ###### is the Launchpad bug number). Each yaml file should have a query keyword whose value is the query text for elastic search.
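As a sketch of that layout (the bug number and message text below are invented for illustration, not a real fingerprint), a minimal query file might look like:

```yaml
# queries/1234567.yaml -- hypothetical bug number and message
query: >-
  message:"Hypothetical error text to fingerprint"
  AND tags:"screen-n-cpu.txt"
```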

Guidelines for good queries:


Queries should get as close as possible to fingerprinting the root cause. A screen log query (e.g. tags:"screen-n-net.txt") is typically better than a console one (tags:"console"), as that's matching a deep failure versus a surface symptom.


Queries should not return any hits for successful jobs; such hits are a sign the query isn't specific enough. A rule of thumb is that more than 10% success hits probably means the query isn't good enough.


If it's impossible to build a query to target a bug, consider patching the upstream program to be explicit when it fails in a particular way.


Use the 'tags' field rather than the 'filename' field for filtering. This is primarily because of grenade jobs where the same log file shows up in the 'old' and 'new' side of the grenade job. For example, tags:"screen-n-cpu.txt" will query in logs/old/screen-n-cpu.txt and logs/new/screen-n-cpu.txt. The tags:"console" filter is also used to query in console.html as well as tempest and devstack logs.


Avoid the use of wildcards in queries since they can put an undue burden on the query engine. A common case where wildcards are used but shouldn't be is querying against a specific set of build_name fields, e.g. gate-nova-python26 and gate-nova-python27. Rather than use build_name:gate-nova-python*, list the jobs with an OR. For example:
(build_name:"gate-nova-python26" OR build_name:"gate-nova-python27")



When adding queries you can optionally suppress the creation of graphs and notifications by adding suppress-graph: true or suppress-notification: true to the yaml file. These can be used to make sure expected failures don't show up on the unclassified page.
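For instance (again with a hypothetical query), suppressing both the graph and notifications would look like:

```yaml
# Hypothetical example: track a known persistent infra failure
# without graphing it or notifying about it.
query: >-
  message:"Known infra failure text"
suppress-graph: true
suppress-notification: true
```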

If the only signature available is overly broad, and additional logging can't reasonably produce a good signature, you can also filter the results of a query based on the test_ids that failed for the run being checked. To do this, add a test_ids keyword to the query file, followed by a list of the test_ids to verify failed. Each test_id should exclude any attrs, i.e. the list of attributes appended to the test_id between '[]' (for example 'smoke', 'slow', or any service tags); this is how subunit-trace prints test ids by default if you're using it. If any of the listed test_ids match as failing for the run being checked, the query will return a match. Since this filtering leverages subunit2sql, which only receives tempest test results from the gate pipeline, the technique will only work on tempest or grenade jobs in the gate queue. For more information, refer to the infra subunit2sql documentation. For example, if your query yaml file looked like:
query: >-
  message:"ExceptionA"
test_ids:
  - tempest.api.compute.servers.test_servers.test_update_server_name
  - tempest.api.compute.servers.test_servers_negative.test_server_set_empty_name

this will only match the bug if the logstash query had a hit for the run and either test_update_server_name or test_server_set_empty_name failed during the run.

In order to support rapidly adding queries, it's considered socially acceptable for core reviewers to approve changes that only add one new bug query, and even to self-approve those changes.

Adding Bug Signatures


Most transient bugs seen in gate are not bugs in tempest associated with a specific tempest test failure, but rather some sort of issue further down the stack that can cause many tempest tests to fail.


Given a transient bug that is seen during the gate, go through the logs and try to find a log that is associated with the failure. The closer to the root cause the better.


Note that queries can only be written against INFO level and higher log messages. This is by design to not overwhelm the search cluster.

Since non-voting jobs are not allowed in the gate queue, and e-r is primarily used for tracking bugs in the gate queue, it doesn't spend time tracking race failures in non-voting jobs; they are considered unstable by definition, because they don't vote.

There is, however, a special 'allow-nonvoting' key that can be added to a query yaml file to allow tracking non-voting job bug failures in the graph. They won't show up in the bot though (IRC or Gerrit comments).





Go to logstash.openstack.org and create an elastic search query to find the log message from step 1. To see the possible fields to search on click on an entry. Lucene query syntax is available at lucene.apache.org.


Tag your commit with a Related-Bug tag in the footer, or add a comment to the bug with the query you identified and a link to the logstash URL for that query search.

Putting the logstash query link in the bug report is also valuable in the case of rare failures that fall outside the window of how far back log results are stored. In such cases the bug might be marked as Incomplete and the e-r query could be removed, only for the failure to re-surface later. If a link to the query is in the bug report someone can easily track when it started showing up again.


Add the query to elastic-recheck/queries/BUGNUMBER.yaml (All queries can be found on git.openstack.org) and push the patch up for review.



You can also help classify Unclassified failed jobs, which is an aggregation of all failed voting gate jobs that don't currently have elastic-recheck fingerprints.

Removing Bug Signatures


Old queries which are no longer hitting in logstash and are associated with fixed or incomplete bugs are routinely deleted. This is to keep the load on the elastic-search engine as low as possible when checking a job failure. If a bug marked as Incomplete does show up again, the bug should be re-opened with a link to the failure and the e-r query should be restored.

Queries that have "suppress-graph: true" in them generally should not be removed; we want to keep those around because they track persistent infra issues that are not going away.

Steps:


1. Go to the All Pipelines page.

2. Look for anything that is grayed out at the bottom, which means it has not had any hits in 10 days.

3. From those, look for the ones whose status is Fixed/Incomplete/Invalid/Won't Fix in Launchpad - those are candidates for removal.



Note

Sometimes bugs are still New/Confirmed/Triaged/In Progress but have not had any hits in over 10 days. Those bugs should be re-assessed to see if they are now actually fixed or incomplete/invalid, marked as such, and the related query removed.

Running Queries Locally


You can execute an individual query locally and analyze the search results:
$ elastic-recheck-query queries/1331274.yaml
total hits: 133
build_status
  100% FAILURE
build_name
  48% check-grenade-dsvm
  15% check-grenade-dsvm-partial-ncpu
  13% gate-grenade-dsvm
  9% check-grenade-dsvm-icehouse
  9% check-grenade-dsvm-partial-ncpu-icehouse
build_branch
  95% master
  4% stable/icehouse
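The percentage breakdown in the report above can be reproduced with a short sketch (the hit dicts and function name here are illustrative, not the elastic-recheck internals):

```python
from collections import Counter

def facet_percentages(hits, field):
    """Integer-percentage breakdown of a field across hits,
    largest first, like the elastic-recheck-query report."""
    counts = Counter(h[field] for h in hits)
    total = sum(counts.values())
    return [(value, 100 * n // total) for value, n in counts.most_common()]
```

For example, 133 hits that are all FAILURE yield [('FAILURE', 100)] for build_status, matching the report above.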

Notes



The html generation will generate links that work with Kibana3's logstash.json dashboard. If you want the links in these generated files to work properly, you will need to host a Kibana3 instance with that dashboard.

View the OpenStack ElasticSearch cluster health here.

Future Work



Move config files into a separate directory

Make unit tests robust

Add debug mode flag

Expand gating testing

Cleanup and document code better

Add ability to check if any resolved bugs return

Move away from polling ElasticSearch to discover whether it's ready or not

Add a nightly job to propose a patch removing bug queries that return no hits -- i.e. the bug hasn't been seen in 2 weeks and should be closed

 









