This user account is a bot operated by The Anome (talk). It is used to make repetitive automated or semi-automated edits that would be extremely tedious to do manually, in accordance with the bot policy. The bot is approved and currently active – the relevant request for approval can be seen here. Administrators: if this bot is malfunctioning or causing harm, please block it.
Note: Blocking will stop further edits. The bot will intermittently retry errors for several minutes, but should then shut itself down automatically until restarted manually; please use a block of ten minutes or longer to be sure of stopping it.
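For reference, the retry-then-shutdown behaviour is roughly of the following form (a minimal sketch, not the bot's actual code; the constants and helper names are illustrative):

```python
import time

MAX_RETRIES = 10   # consecutive failures tolerated (illustrative)
RETRY_DELAY = 60   # seconds between retries; total retry window ~10 minutes

def run_edit_loop(edit_queue, try_edit):
    """Process queued edits; shut down after sustained failures (e.g. a block)."""
    failures = 0
    for edit in edit_queue:
        while True:
            if try_edit(edit):           # returns True on success
                failures = 0
                break
            failures += 1
            if failures >= MAX_RETRIES:  # still failing after ~10 minutes
                print("Persistent errors -- shutting down until restarted.")
                return
            time.sleep(RETRY_DELAY)
```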
This bot is designed to add standardized machine-readable geodata records to relevant articles in the English-language Wikipedia, using GNS and GNIS data, OSGB grid coordinates in UK articles, plaintext geodata scraped from article text, and interwiki-linked geotag data from other-language Wikipedias. -- The Anome 12:13, 22 September 2007 (UTC)
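For illustration, the records emitted are {{coord}}-style templates; generating one from a parsed record might look like this (a minimal sketch, assuming decimal-degree input; the function and field names are hypothetical, not the bot's actual code):

```python
def make_coord_template(lat, lon, type_=None, region=None, display="title"):
    """Format decimal-degree coordinates as a {{coord}} template string."""
    parts = ["{:.4f}".format(lat), "{:.4f}".format(lon)]
    geohack = []
    if type_:
        geohack.append("type:" + type_)      # e.g. "city", "mountain"
    if region:
        geohack.append("region:" + region)   # ISO 3166 code, e.g. "GB"
    if geohack:
        parts.append("_".join(geohack))
    return "{{coord|" + "|".join(parts) + "|display=" + display + "}}"

# e.g. make_coord_template(51.5074, -0.1278, type_="city", region="GB")
# -> "{{coord|51.5074|-0.1278|type:city_region:GB|display=title}}"
```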
Currently backfilling a number of corner cases missed by earlier over-cautious heuristics, using:
machine parsing of plaintext geodata found in dumps (see the sketch after this list)
automatched GNS data
interwiki-matched machine-readable geodata from other language editions
This is very laborious for the bot, as it requires re-scanning large numbers of false positives, and it will result in only a few hundred articles being geocoded; but machine time is cheap, the re-scans are necessary in any case, and this will lay the foundations for larger systematic efforts to come later.
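A minimal sketch of the plaintext parsing mentioned above (one common degrees-minutes-seconds form only; the real scanner must handle many more notations):

```python
import re

# Matches e.g. "45°30′15″N 73°34′20″W"; degree/minute/second marks
# vary widely in practice, so this is one pattern of many.
DMS = re.compile(
    r"(\d{1,3})°(\d{1,2})′(?:(\d{1,2})″)?([NS])\s*"
    r"(\d{1,3})°(\d{1,2})′(?:(\d{1,2})″)?([EW])"
)

def to_decimal(d, m, s, hemi):
    value = int(d) + int(m) / 60.0 + int(s or 0) / 3600.0
    return -value if hemi in "SW" else value

def scan_for_coords(text):
    """Yield (lat, lon) pairs for plaintext DMS coordinates found in text."""
    for match in DMS.finditer(text):
        d1, m1, s1, ns, d2, m2, s2, ew = match.groups()
        yield to_decimal(d1, m1, s1, ns), to_decimal(d2, m2, s2, ew)
```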
Standardize existing geotags: "coor title *" is now done, "coor *" is pending (a conversion sketch follows this list).
Finish adding "coord missing" to all eligible articles.
Ancient sites should be taggable with "coord missing", while still being blocked from automatic coordinate assignment.
Go back and use CatScan to find any remaining franchises mis-tagged as "coord missing".
Rebuild article state map from log file and other stored data.
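The "coor title *" standardization mentioned above amounts to rewrites of roughly this shape (a sketch for the dms variant only; the real conversion covers all the coor templates and their parameter forms, and must cope with nested markup that this simple regex would not):

```python
import re

# {{coor title dms|D|M|S|N/S|D|M|S|E/W|params}} -> {{coord|...|display=title}}
COOR_TITLE_DMS = re.compile(r"\{\{coor title dms\|([^}]*)\}\}", re.IGNORECASE)

def standardize_coor_title_dms(wikitext):
    """Rewrite old coor-title-dms tags as equivalent coord tags."""
    return COOR_TITLE_DMS.sub(
        lambda m: "{{coord|" + m.group(1) + "|display=title}}",
        wikitext,
    )

# e.g. "{{coor title dms|51|30|26|N|0|7|39|W|region:GB}}"
# ->   "{{coord|51|30|26|N|0|7|39|W|region:GB|display=title}}"
```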
Interwiki:
In the absence of up-to-date Kolossus data, start using the externallinks table (via the API) to live-scan non-en: Wikipedia editions for URLs, in order to obtain interwiki patterns (see the sketch after this list)
Use full interwiki data to regenerate fuller tags where only KML data was used for earlier tagging.
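The externallinks live-scan mentioned above could use the MediaWiki exturlusage API module, along these lines (a present-day sketch; the URL pattern in the usage example is illustrative):

```python
import requests

def exturl_pages(lang, url_pattern, limit=100):
    """List pages on a given language edition that cite a URL pattern,
    via the MediaWiki exturlusage API module."""
    api = "https://{}.wikipedia.org/w/api.php".format(lang)
    params = {
        "action": "query",
        "list": "exturlusage",
        "euquery": url_pattern,   # search string, without protocol
        "euprotocol": "http",
        "eulimit": limit,
        "format": "json",
    }
    data = requests.get(api, params=params).json()
    return [page["title"] for page in data["query"]["exturlusage"]]

# e.g. exturl_pages("de", "fallingrain.com")
# -> de: articles citing that gazetteer, candidates for interwiki matching
```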
Consistency and correctness:
Use 1-degree-tile binning to look for outliers (see the sketch after this list)
Look for misuse of coord tags for off-planet locations: report to WikiProject for fixing
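The tile-binning outlier check might be sketched as follows (illustrative only: it flags articles that sit alone in a tile while the rest of their category clusters elsewhere; the thresholds are arbitrary):

```python
import math
from collections import Counter, defaultdict

def tile(lat, lon):
    """Bin a coordinate into its 1-degree tile."""
    return (math.floor(lat), math.floor(lon))

def find_outliers(places, min_cluster=5):
    """places: iterable of (title, category, lat, lon).
    Yield articles whose tile is isolated within their category."""
    by_category = defaultdict(list)
    for title, cat, lat, lon in places:
        by_category[cat].append((title, tile(lat, lon)))
    for cat, members in by_category.items():
        counts = Counter(t for _, t in members)
        for title, t in members:
            # A tile occupied by this article alone, in a category whose
            # other members cluster elsewhere, is a likely outlier.
            if counts[t] == 1 and len(members) >= min_cluster:
                yield cat, title, t
```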
Matching:
Hierarchical matching with disambiguation by subnational entities; rejected some time ago as ineffective, but it may have become feasible given the greater systematization of navboxes over the last year
Open research topic: Bayesian inference of relative locality from the link graph -- this may be an effective way of handling the above; use places with known locations as a training set (see the sketch after this list)
Properly handle undersea features and disputed territories with no applicable recognized country
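The link-graph idea above might be prototyped as an iterative weighted-centroid estimate rather than full Bayesian inference (purely illustrative):

```python
def estimate_locations(links, known, iterations=10):
    """links: dict article -> set of linked articles.
    known: dict article -> (lat, lon) for the training set.
    Iteratively place unknown articles at the centroid of their located
    neighbours -- a crude stand-in for Bayesian locality inference."""
    est = dict(known)
    for _ in range(iterations):
        for article, neighbours in links.items():
            if article in known:
                continue                      # training set stays fixed
            coords = [est[n] for n in neighbours if n in est]
            if coords:
                est[article] = (
                    sum(lat for lat, _ in coords) / len(coords),
                    sum(lon for _, lon in coords) / len(coords),
                )
    return est
```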
Types of places not yet keyword-matched during graph traversal (a keyword-pattern sketch follows this list):
Casinos
Resorts
Historic districts [?]
Ports and harbo[u]rs by country
Bus and some metro stations
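Picking these up would mean extending the traversal's keyword table, roughly as below (a sketch; the patterns and geotag type codes are assumptions, not the bot's actual tables):

```python
import re

# Category-name patterns -> geotag type, for the classes listed above.
KEYWORD_PATTERNS = [
    (re.compile(r"\bcasinos?\b", re.I), "landmark"),
    (re.compile(r"\bresorts?\b", re.I), "landmark"),
    (re.compile(r"\bhistoric districts?\b", re.I), "landmark"),
    (re.compile(r"\b(?:ports|harbou?rs) (?:of|in|by)\b", re.I), "landmark"),
    (re.compile(r"\b(?:bus|metro) stations?\b", re.I), "railwaystation"),
]

def match_category(name):
    """Return the geotag type for a category name, or None if no match."""
    for pattern, type_ in KEYWORD_PATTERNS:
        if pattern.search(name):
            return type_
    return None
```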
New data sources:
Collect lists of country-specific coordinate data
Mine geodata from images included in articles (thanks to User:Planemad for the suggestion)
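The image-geodata idea could draw on EXIF GPS tags, roughly as follows (a sketch using the Pillow library; many uploaded images lack or mangle these tags):

```python
from PIL import Image

GPSINFO_TAG = 34853  # EXIF tag id for the GPS sub-IFD

def dms_to_decimal(dms, ref):
    """Convert EXIF (degrees, minutes, seconds) rationals plus an
    N/S/E/W reference to signed decimal degrees."""
    degrees = float(dms[0]) + float(dms[1]) / 60.0 + float(dms[2]) / 3600.0
    return -degrees if ref in ("S", "W") else degrees

def exif_gps(path):
    """Return (lat, lon) from an image's EXIF GPS tags, or None."""
    exif = Image.open(path).getexif()
    gps = exif.get_ifd(GPSINFO_TAG)
    if not gps:
        return None
    # GPS IFD tags: 1 = LatitudeRef, 2 = Latitude, 3 = LongitudeRef, 4 = Longitude
    try:
        return dms_to_decimal(gps[2], gps[1]), dms_to_decimal(gps[4], gps[3])
    except KeyError:
        return None
```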
With >70,000 data points, I now have enough data to do a spatial analysis of the category tree, and to generate lists of possibly misclassified or mislocated outliers. The cleaned-up bounding data could then be used as a Bayesian classifier for future work. -- The Anome 10:14, 24 August 2007 (UTC)
The category+link graph may be a better choice for this. -- The Anome (talk) 13:59, 12 October 2008 (UTC)
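Either way, a first screening pass might compute per-category centroids and flag far-out members (a sketch; flat-earth distances and an arbitrary sigma threshold, good enough for screening):

```python
import math
from collections import defaultdict

def category_outliers(tagged, sigma=3.0):
    """tagged: iterable of (title, category, lat, lon).
    Yield members more than `sigma` standard deviations from their
    category's centroid (flat-earth distances, adequate for screening)."""
    groups = defaultdict(list)
    for row in tagged:
        groups[row[1]].append(row)
    for cat, rows in groups.items():
        if len(rows) < 10:
            continue                          # too small to estimate spread
        clat = sum(r[2] for r in rows) / len(rows)
        clon = sum(r[3] for r in rows) / len(rows)
        dists = [math.hypot(r[2] - clat, r[3] - clon) for r in rows]
        mean = sum(dists) / len(dists)
        std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
        for r, d in zip(rows, dists):
            if std and d > mean + sigma * std:
                yield cat, r[0], d
```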