For the policy on bot requirements, see WP:BOTREQUIRE.
This is a page for requesting tasks to be done by bots per the bot policy. This is an appropriate place to put ideas for uncontroversial bot tasks, to get early feedback on ideas for bot tasks (controversial or not), and to seek bot operators for bot tasks. Consensus-building discussions requiring large community input (such as request for comments) should normally be held at WP:VPPROP or other relevant pages (such as a WikiProject's talk page).
You can check the "Commonly Requested Bots" box above to see if a suitable bot already exists for the task you have in mind. If you have a question about a particular bot, contact the bot operator directly via their talk page or the bot's talk page. If a bot is acting improperly, follow the guidance outlined in WP:BOTISSUE. For broader issues and general discussion about bots, see the bot noticeboard.
Before making a request, please see the list of frequently denied bots, which were declined either because they are too complicated to program or because they do not have consensus from the Wikipedia community. If you are requesting that a template (such as a WikiProject banner) be added to all pages in a particular category, please be careful to check the category tree for any unwanted subcategories. It is best to give a complete list of categories that should be worked through individually, rather than one category to be analyzed recursively (see example difference).
Alternatives to bot requests
WP:AWBREQ, for simple tasks that involve a handful of articles and/or only need to be done once (e.g. adding a category to a few articles).
WP:URLREQ, for tasks involving changing or updating URLs to prevent link rot (specialized bots deal with this).
WP:SQLREQ, for tasks which might be solved with an SQL query (e.g. compiling a list of articles according to certain criteria).
Note to bot operators: The {{BOTREQ}} template can be used to give common responses, and make it easier to keep track of the task's current status. If you complete a request, note that you did with {{BOTREQ|done}}, and archive the request after a few days (WP:1CA is useful here).
Please add your bot requests to the bottom of this page.
The issue is that files in categories are displayed by default, which violates WP:NFCC#9 if the category contains non-free files. Categories have to be tagged with __NOGALLERY__ to disable the display of non-free files. This is an urgent issue: categories without this tag that contain non-free files are everywhere, and because we take copyright very seriously it cannot wait for a human user to find each category and add the __NOGALLERY__ tag, which is why this task requires a bot. Every other routine task involving non-free files, such as removing instances without a valid fair use tag, is already handled by a bot.
The previous discussion stalled after a user objected and suggested adding a new feature to MediaWiki to disable category galleries by default. That would be less convenient, since it requires WMF action, and it would create the opposite problem: we would need a bot to enable gallery mode on categories that contain only free files. Even though most files hosted locally are non-free, there is no reason a bot couldn't handle adding the necessary __NOGALLERY__ tags at the required scale. Only one other person contributed to the discussion, who objected to the suggested MediaWiki feature because it would hinder navigation of categories specifically for free files, and nothing else happened after that. –LaundryPizza03 (dc̄) 05:58, 30 March 2024 (UTC)[reply]
I'm aware I've been pinged, but I don't have time to look into this right now. Someone else can take this up, or I'll circle back when I have the time. — JJMC89 (T·C) 17:38, 11 April 2024 (UTC)[reply]
@LaundryPizza03: That means the bot would first have to identify categories with non-free files. I am not sure how JJMC89's bot works, but I am guessing it works through recent changes patrolling. @JJMC89: Is the source code of the relevant task public? —usernamekiran (talk) 02:17, 13 April 2024 (UTC)[reply]
Free images are mostly moved to Commons, aren't they? So we will seldom have a category containing only free files. Then the category identification is easy: all file categories should have the nogallery tag. Wikiwerner (talk) 18:43, 15 May 2024 (UTC)[reply]
Here's the SQL query I ended up with:
MariaDB [enwiki_p]> select count(distinct cl2.cl_to)
    -> from categorylinks as cl1
    -> join categorylinks as cl2 on cl1.cl_from = cl2.cl_from
    -> join page on page_title = cl2.cl_to
    -> left join page_props on page_id = pp_page and pp_propname = "nogallery"
    -> where cl1.cl_to = "All_non-free_media"
    -> and cl1.cl_type = "file"
    -> and page_namespace = 14
    -> and pp_propname IS NULL;
+---------------------------+
| count(distinct cl2.cl_to) |
+---------------------------+
|                      5070 |
+---------------------------+
1 row in set (1 hour 35 min 26.445 sec)
So there are roughly 5,000 categories this applies to. I do think that we should make sure this bot also cleans up after itself, once a category no longer has any non-free files in it, the NOGALLERY switch should be removed. Creating a wrapper like {{non-free category gallery}} or something would make it explicit what the intention of the nogallery tag is so it can be safely removed once no longer necessary. Legoktm (talk) 17:32, 21 June 2024 (UTC)[reply]
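For what it's worth, the wikitext change itself is trivial; here is a minimal sketch of the edit such a bot would make (the function name is hypothetical, and the check for the {{non-free category gallery}} wrapper follows Legoktm's suggestion above):

```python
def add_nogallery(wikitext: str) -> str:
    """Prepend __NOGALLERY__ to a category page's wikitext unless the
    behaviour switch (or a wrapper template setting it) is already present."""
    lowered = wikitext.lower()
    if "__nogallery__" in lowered or "{{non-free category gallery" in lowered:
        return wikitext
    return "__NOGALLERY__\n" + wikitext
```

A real bot would wrap this in a pywikibot page loop over the ~5,000 categories the query finds; the hard part is the cleanup direction (removing the switch once a category holds no non-free files), which this sketch does not attempt.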
I wonder whether a bot automatically adding __NOGALLERY__ really is the best idea in the first place, versus a report so humans can decide whether the best solution is to remove the non-free images instead. Anomie⚔ 20:21, 22 June 2024 (UTC)[reply]
Did you stop reading after the first half of the sentence? "so humans can decide whether the best solution is to remove the non-free images instead." Although I was thinking more of Wikipedia:Database reports rather than a maintenance template and category. Anomie⚔ 11:50, 23 June 2024 (UTC)[reply]
I commented here hoping to just provide enough clues for someone else to pick up the task, but I suppose I can add another database report ;) Hopefully that provides enough data on whether this is suitable for an automated task or not. Legoktm (talk) 01:34, 24 June 2024 (UTC)[reply]
I used to have one, but then the toolserver changed and somehow an account isn't easy to come by now (I posted before, but nothing came of it)... ~Loftyabyss 03:09, 10 April 2024 (UTC)[reply]
It's also, generally, what the cvn bots do already there, except this would need to be a more specific page that is watched. ~Loftyabyss 20:16, 13 April 2024 (UTC)[reply]
Obviously, I can't reply for them, but isn't there some way to do this collaboratively, like on GitHub? One person doing it might be a bit much... ~Loftyabyss 10:29, 29 April 2024 (UTC)[reply]
@Lofty abyss: do you just want a bot that relays changes to the AIV wiki page? If so, you can use wm-bot for this. Also, getting a Toolforge account should be straightforward now, see the quickstart. If you end up making a membership request, please ping me and I can approve it for you. Legoktm (talk) 18:24, 21 June 2024 (UTC)[reply]
Bot to update match reports to cite template
Per this discussion on WikiProject Football, it appears the best course of action is to convert match report external links to full citation templates because of WP:LINKROT. I am requesting a bot that could do this automatically, as there are many football pages that use the direct link system. An example of a page that does not use direct links is 2024 OFC Nations Cup qualification, while a major page that does is 2022 FIFA World Cup. Yoblyblob (Talk) :) 13:01, 16 April 2024 (UTC)[reply]
@Mdann52 no, but it does follow WP:LINKROT. I could make another discussion for further consensus, though. Is the best place for that the project page? Yoblyblob (Talk) :) 12:16, 21 May 2024 (UTC)[reply]
I think it is actually poor form (directed at @Yoblyblob:) to start manually making edits that do this task in advance of a BOT that is going to perform it once created. Especially for current tournaments. And especially reverting WP:GOODFAITH edits consistent with most other match results which follow that format, and reverting without any edit summary. Pretty poor form for that. It will just create unnecessary arguments, such as currently at User talk:J man708. Once the BOT is functional, which I understand it is not yet, the changes will just happen and people will get used to it (and not argue with the BOT). It hurts nothing to leave the edits (by @J man708:) alone for the time being. Imagine if you started doing this in the middle of a World Cup or CONMEBOL competition! I recommend @Yoblyblob: simply wait for the BOT functionality to occur. Matilda Maniac (talk) 22:46, 19 June 2024 (UTC)[reply]
I really do not understand the need for this bot at all, and even less so the need for the Match Report function to be changed from what it is. Using an archived match report link prevents linkrot, surely. - J man708 (talk) 00:40, 20 June 2024 (UTC)[reply]
I would argue rather strongly against that. There are plenty of gnomes who do the same things as bots do, and while waiting for a bot request to be actioned, why not manually clean up some of these things? If someone wants to spend their time making edits that might not be made for a while (BRFA is not always a fast process) then more power to 'em. I will, of course, support your statement that edit summaries are Good Things. Primefac (talk) 01:06, 20 June 2024 (UTC)[reply]
@J man708 and Matilda Maniac: happy to put the BRFA on hold if the discussion needs to be resumed. Just let me know. There's also the option of another template to wrap this in to avoid the Bare URL issue, but then this won't automatically be picked up by the various bots that deal with dead links and archiving. Mdann52 (talk) 21:11, 20 June 2024 (UTC)[reply]
Did not expect to see people against this; not sure how to get more participation in a discussion, as the ones at WP Football were evidently too limited. Yoblyblob (Talk) :) 21:21, 20 June 2024 (UTC)[reply]
Bot to mass tag California State University sports seasons
Hi! This is my first request here, so please tell me if I did something wrong. As part of the California State University task force, I'm looking to add {{WikiProject California|calstate=yes|calstate-importance=low}} to the talk pages of miscellaneous sports seasons. The categories below contain the pages I'm looking to tag; the bulk of them are from the football programs at each institution.
With ~800 pages to deal with, this might be worth an AWBTASKS request, but that's mainly because by the time someone other than me puts through a BRFA it could probably be done there (not saying this isn't worth a bot task, just thinking about timing). Primefac (talk) 18:13, 19 April 2024 (UTC)[reply]
I don't know if just removing the template from each member of this category is ideal - I've worked through this backlog before, and (if I remember correctly) quite a number of the redirects in this category were intended to be linked to a Wikidata item, but for whatever reason weren't. My understanding of the point of this tracking category was to allow editors to go through and remove the rcat where there isn't a Wikidata item to link to, and to connect an item to the page where there is -- in at least some cases that I recall, the Q-identifier had already been passed as a parameter to the template, but the page just hadn't been properly linked to the Wikidata item. To take a random example from that category, {{Wikidata redirect|Q78588304}} was added to Hester Ford in this edit; but that redirect currently appears in the tracking category, as this connection was never made on Wikidata itself. Simply removing all the templates and losing this information isn't the best thing to do here, in my opinion. All the best, —a smart kitten [meow] 16:12, 21 April 2024 (UTC)[reply]
You have a point that if there is a QID it may require manual review, but a Wikidata redirect with no QID on an unlinked item carries no useful information, so it can safely be removed. And I genuinely have no idea why, for instance, -ous should be linked to Wikidata - I couldn't find an item on it at all, so the tracking is useless. * Pppery * it has begun... 16:18, 21 April 2024 (UTC)[reply]
Regarding templates with no QIDs passed, there's the possibility that an editor found a matching item but didn't properly link it; in which case a redirect appearing in this category would signify that there may be an item that can be linked - albeit limited information, but info which an interested editor could use to search for an item on Wikidata with a potentially higher likelihood of success than if they'd just chosen an unlinked redirect at random. However, there's also the possibility that a bot on Wikidata created an item for a non-d:WD:N-passing soft redirect, a {{Wikidata redirect}} template was added during an AWB run on enwiki, and the Wikidata item was eventually deleted (resulting in the page appearing within this category); as I speculate may have happened with -ous. I don't have the capacity right now to look that deeply into these scenarios, though, so I'll stick with no formal opinion on the removal of templates without a QID. All the best, —a smart kitten [meow] 16:35, 21 April 2024 (UTC)[reply]
Having thought about it some more, I'm feeling at least slightly opposed to removing QID-less templates - as, otherwise, an unnoticed vandal on Wikidata could disconnect an item and a redirect; and we'd be removing the flag on enwiki that could (in practice) be saying 'this page should be linked to an item on Wikidata, please reconnect it!'. There is a large backlog in this category (which, as with all backlogs, is less than ideal), but my thoughts are that the best course of action to clear that backlog wouldn't be to remove the rcat template from all the category's members; which might just have the result of keeping the same amount of work needed overall, but artificially reducing the backlog size. This seems like a maintenance category where manual review for each item is needed - on that note, I'll try and work through this category a bit myself today. All the best, —a smart kitten [meow] 10:56, 23 April 2024 (UTC)[reply]
Is there any way to see what Wikidata item a Wikipedia page used to be connected to? I know that some Wikidata changes show up in the logs here, but I'm not finding "connection/disconnection" in particular. jlwoodwa (talk) 06:41, 21 June 2024 (UTC)[reply]
In this edit I removed a link to http://www.mycoincollection.co.uk, because it's definitely not the intended page. At the top of the page, you can see "The domain Mycoincollection.co.uk may be for sale. Click here to inquire about this domain."
Would it be practical to write a bot that examines bulk quantities of external links in some manner and identifies links whose target pages display text like "may be for sale" or "is for sale"? I'm thinking of starting with a database dump (partial or complete, who cares), truncating to just the domain names, creating a page with a list of those domain names, and after checking each one, indicating whether it has this text at the top. Bonus points if the bot can be instructed to remove links after human review, e.g. a human checks a batch of links, marks some as "confirmed, rotten", and the bot goes around and removes those links from sections entitled "External links", and marks them with {{dead link}} if they're anywhere else. Nyttend (talk) 23:49, 30 April 2024 (UTC)[reply]
Would only need the root URL. Like in this case, http://mycoincollection.co.uk without the "www" or any other hostname. That will reduce the number of URLs to check. The view source has:
href="https://www.trifega.com/contact.php?domain=mycoincollection.co.uk">The domain Mycoincollection.co.uk may be for sale. Click here to inquire about this domain.</a>
So really we are looking for https://www.trifega.com, because that message "may be for sale" may change .. or not. It is common phrasing. There are a lot of sites like this around. Keep in mind, trifega.com is paying for the domain name and won't hang on to it forever. Eventually it will let it go, the URLs will start returning 404, and they will be repaired by the bots as normal. Or someone else will buy the domain and the old URLs will return 404 since the new owner won't support them. In some cases, we have seen domains for sale this way, a new owner picks up the domain, then serves spammy content at the old URLs. It's a devious way to monetize (steal) the good name of an old site. Another scenario can occur: the old owners go out of business for a while, then reconstitute and bring the old URLs back into working order. -- GreenC 00:57, 1 May 2024 (UTC)[reply]
I'm unclear: do you think this is a (potentially) workable idea or not? I know I've seen this happen heaps of times with various websites, not just trifega — otherwise I'd request something trifega-related — so I'm interested in anything where the site is for sale. Nyttend (talk) 01:04, 1 May 2024 (UTC)[reply]
I've thought about doing this in the past, but for lack of time, the resources it would take, other low-hanging fruit, and the reasons I gave above (it might resolve on its own in time), I never did it. But go for it, it would be an interesting experiment to see what you discover. It will probably be an exercise in edge-case hunting for those 'for sale' strings without getting false positives or false negatives (the things we don't know). Also, I just asked the WaybackMachine developers if they have a mechanism for detecting these, or a database of them. I don't think so, but if they respond I'll let you know. -- GreenC 01:57, 1 May 2024 (UTC)[reply]
How many mass domain squatters are there? The coin site has a privacy link to skenzo.com, who presumably now own the domain and many similar ones. If, say, 90% of parked domains use one of a dozen standard formats, then an option less prone to false positives is to look for the distinctive HTML behind each of them and mark the links as usurped so they can be treated similarly to 404s. Certes (talk) 10:17, 1 May 2024 (UTC)[reply]
I come across them because I do a lot of manual link checking for soft-404s (essentially what these are). The pages and HTML change over time. Maybe we can start a project page off WP:LINKROT, its initial purpose being to record instances as they are encountered. Then we can determine how best to make matches: with keywords, HTML patterns, or other ways. Worth noting they exist in the WaybackMachine also. And confirmed that Wayback does not do anything special to detect them. -- GreenC 14:32, 1 May 2024 (UTC)[reply]
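To make the keyword approach concrete, a first pass could be a small predicate run over each fetched page. The phrase list here is illustrative only, and would need exactly the edge-case hunting described above (plus the HTML-pattern matching Certes suggests):

```python
import re

# Illustrative starting set of parked-domain phrases; a real run would
# grow this list (and add per-squatter HTML patterns) as instances are
# recorded on the proposed project page.
PARKED_PATTERNS = [
    re.compile(r"\bmay be for sale\b", re.IGNORECASE),
    re.compile(r"\bdomain (?:name )?is for sale\b", re.IGNORECASE),
    re.compile(r"\bbuy this domain\b", re.IGNORECASE),
]

def looks_parked(page_text: str) -> bool:
    """Return True if the fetched page text matches a known parked-domain phrase."""
    return any(p.search(page_text) for p in PARKED_PATTERNS)
```

Anything flagged would still go to human review, per Nyttend's proposal, since phrases like "is for sale" will inevitably produce false positives on legitimate commerce pages.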
Since correct placement of stub tags is impossible using the VisualEditor, I've seen a tendency for articles created using VE to exhibit a jumbled mess of stub templates, categories and reference tags at the bottom. This is actually already an AWB genfix, but I wonder if it would be appropriate to have a bot routinely monitor VE edits and implement just this fix, which shouldn't need human supervision. --Paul_012 (talk) 09:39, 10 May 2024 (UTC)[reply]
Not a good task for a bot. Visually speaking, there is no difference whether a stub tag is placed before or after the categories, making it a cosmetic edit, and a rather trivial one at that given that the wikitext is at the very bottom of the page. Primefac (talk) 11:14, 10 May 2024 (UTC)[reply]
The visible difference is that stub tags placed before categories will lead to stub categories appearing first, before the content categories. That's the only reason WP:LAYOUT puts them last. --Paul_012 (talk) 14:28, 10 May 2024 (UTC)[reply]
WP:LAYOUT is general guidance; enforcing that specific provision about stub template order is rather futile. Fine as part of other edits, but simply reordering the categories on its own serves very little purpose. Headbomb {t · c · p · b} 20:17, 19 May 2024 (UTC)[reply]
I would like for the bot to compare our citation templates ({{citation}}, {{cite xxx}}, {{doi}}, {{doi-inline}}, {{pmid}}) against the OriginalPaperDOI and OriginalPaperPubMedID columns of the database.
If the RetractionNature column lists "Retraction" as the reason, add {{Retracted|doi=RetractionDOI|pmid=RetractionPubMedID|URLS ''Retraction Watch''}} as it applies.
If the RetractionNature column lists "Expression of concern" as the reason, instead add {{Expression of Concern|doi=RetractionDOI|pmid=RetractionPubMedID|URLS ''Retraction Watch''}} as it applies.
If the RetractionNature column lists "Reinstatement" as the reason, remove {{Expression of Concern}}/{{Retracted}} entirely.
Lastly, if the DOI/PMID of {{Expression of Concern|doi=RetractionDOI|pmid=RetractionPubMedID|URLS ''Retraction Watch''}} now has a reason of 'Retraction', change it to {{Retracted|doi=RetractionDOI|pmid=RetractionPubMedID|URLS ''Retraction Watch''}}.
For example, if you find
...Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.[1]
^ Restrepo-Arango, Marcos; Gutiérrez-Builes, Lina Andrea; Ríos-Osorio, Leonardo Alberto (April 2018). "Seguridad alimentaria en poblaciones indígenas y campesinas: una revisión sistemática". Ciência & Saúde Coletiva. 23 (4): 1169–1181. doi:10.1590/1413-81232018234.13882016. PMID 29694594.
change it to
...Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.[1]
The bot could run weekly (or maybe daily if it's a quick task?) in the Main/Draft spaces, possibly others, each time redownloading the CSV.
Headbomb {t · c · p · b} 19:42, 19 May 2024 (UTC)[reply]
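The column-to-template mapping above can be sketched as a lookup built from the downloaded CSV. The column names are as given in the request; the template strings are abbreviated here, omitting the doi=/pmid= parameters the bot would fill in, and the function name is illustrative:

```python
import csv
import io

# Maps the RetractionNature value to the template action described above;
# None means any existing {{Retracted}}/{{Expression of Concern}} is removed.
ACTIONS = {
    "Retraction": "{{Retracted}}",
    "Expression of concern": "{{Expression of Concern}}",
    "Reinstatement": None,
}

def retraction_actions(csv_text: str) -> dict:
    """Build a DOI -> template action lookup from the Retraction Watch CSV."""
    lookup = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["RetractionNature"] in ACTIONS:
            lookup[row["OriginalPaperDOI"]] = ACTIONS[row["RetractionNature"]]
    return lookup
```

The bot would then scan articles for {{citation}}/{{cite xxx}}/{{doi}}/{{pmid}} uses, look each DOI (and, analogously, each PMID) up in this table, and apply or remove the template accordingly.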
I think it's a good idea. I don't think this comes up very often, so I doubt that we need daily runs.
I agree it's a good idea. I've got some time this weekend, so happy to contact the bot op and have a look at recoding this up again. Looks like it was stopped over edit warring concerns, so there might well be more work to do before starting this up again (unless someone else is super keen to do this!). Cheers, Mdann52 (talk) 09:01, 21 May 2024 (UTC)[reply]
Coding... - I've gained access to the existing toollabs instance and am upgrading the code to use the new data source Headbomb has linked above. I'm planning to run within the previous approval for now (i.e. just flagging up DOIs), but I'm also ingesting PMIDs, and I'm happy to look at expanding this in due course.
I've got a few additional changes I want to make (mainly because pywikibot has several extra features over the last few years(!)), so I may not be fully happy with this for a few weeks.
I note the request to add the extra columns, I think that would require an addition/update to the BRFA. Certainly happy to consider this as a slower task.
@Headbomb: I note you've suggested a report in that original BRFA - are there any you would be interested in having? Happy to host some reports/data on toollabs if this is of interest. Mdann52 (talk) 14:02, 24 May 2024 (UTC)[reply]
Done - coding complete and loaded on toollabs ready to go. Will be running this under supervision initially with fortnightly runs (just until I find a more reasonable way of finding the references).
I don't think this will require a BRFA as it's an existing task and the bot is still flagged; however, I will do the initial batches under supervision just in case. Mdann52 (talk) 16:06, 25 May 2024 (UTC)[reply]
Bot to change citations to list defined references
I'm writing this from the perspective of a new editor. I've been struggling a lot with the way citations are commonly inserted into articles, and I think it would be a good idea to automatically convert all articles to the list defined references citation standard.
As far as I understand, this is only relevant for editors using the source edit mode (?). Here's my perspective: there are two main problems with the inline citation style. 1) It's very difficult to read the text and find the relevant positions in the article, since citations - especially several citations in a row - create long breaks in the text. 2) The even bigger problem (especially for new editors) is that inserting an already existing citation (or citing something twice) becomes unnecessarily complicated. Finding the original citation in the text, inserting a name tag, and then using that name tag in the new citation is confusing and tedious. List-defined references would alleviate all these problems and make page source code more readable and understandable.
I understand that while editing it can be tedious to go down to the ref list, edit that, and then go back to the position in the text. That's why I think a bot would be a good solution that can clean up articles later without affecting editors' workflow. Apoptheosis (talk) 16:50, 9 June 2024 (UTC)[reply]
Not a good task for a bot. Touching references is one of the most contentious areas on Wikipedia, and changing the manner and style in which citations are done is both a violation of WP:CITEVAR and will never have consensus as a task. Headbomb {t · c · p · b} 16:56, 9 June 2024 (UTC)[reply]
@HouseBlaster: - Coding... - I'll try and pop something together over the next few days. I'm assuming the Wikidata IDs are to be relied on? Looking at some articles (such as d:Q6325806 and KBFL (AM)), I'm wondering if it's worth a parallel task to sync the callsigns with Wikidata as well? (although having them as page titles may make this redundant) Mdann52 (talk) 06:06, 10 June 2024 (UTC)[reply]
Mdann52, I don't love relying on Wikidata, but the few that I spot-checked were okay and I doubt anyone will systematically check the thousands of transclusions. I will note that some articles use the templates with a parameter other than the page title. Up to you if you want to work on the parallel task. Thank you so much for taking this on! — HouseBlaster (talk · he/they) 16:36, 10 June 2024 (UTC)[reply]
@HouseBlaster: I did spot the Wikidata entries; however, the FCC (surprisingly) seems to have a decent API that I can query by callsign, so I can probably get up-to-date data from there to base the callsigns on... this will take slightly longer to code, but it should avoid me having to rely on Wikidata. Give me a few weeks and I'll spin something up, unless this is super urgent, in which case I can just use WD. Mdann52 (talk) 16:44, 10 June 2024 (UTC)[reply]
@HouseBlaster: - that new template will not work for Mexican stations (and some Canadian ones, as noted on the TfD), as the FCC has removed them from the database. I'm happy to remove the template from pages without FCC data, if this is in line with the deletion discussion?
The Mexican IFT have a similar database here, but this doesn't look like it's as easy to link through and the API is not published (it exists, but it's very difficult to use compared to the FCC one!), so that may be a future task. Mdann52 (talk) 05:48, 12 June 2024 (UTC)[reply]
@Qwerfjkl: - I was actually debating uploading the data to wikidata as I go, especially as I'll be getting it from the source directly and there's a lot of fields in there that won't appear on enwiki... but if you prefer to sort afterwards I can cope with that! Mdann52 (talk) 19:29, 11 June 2024 (UTC)[reply]
Tagging women's basketball article talk pages with project tags
I am requesting assistance tagging the talk pages of women's basketball articles with {{WikiProject Basketball|women=yes}} and {{WikiProject Women's sport|basketball=yes}} if not already tagged.
Well, 200K categories isn't going to be considered likely... but is this list just the subcategories of Category:Women's basketball? The ones I've dip-sampled all appear to be members of it. If so, please just say so and spare someone a lot of work! This will likely be a good WP:AWBTASKS run, however, if it's a straightforward addition of WP articles. If you clarify the cats, I'll do some queries to work out exactly how many pages this will likely affect. Mdann52 (talk) 18:05, 10 June 2024 (UTC)[reply]
@Hmlarson: Looks like around 11k pages in those categories, including pages already in those taskforces. Can you just drop a courtesy message onto the WikiProject talk pages and make sure they are happy with the articles being tagged, if you haven't checked with them already, just due to the number of edits needed? Thanks, Mdann52 (talk) 19:15, 10 June 2024 (UTC)[reply]
Usernamekiran, I have some code for this back from Task 26 that's pretty good at handling edge cases when it comes to managing talk pages (it did, after all, run on several million of them). That might help? — Qwerfjkl talk 21:03, 29 June 2024 (UTC)[reply]
Friendly support for Draft categories – feedback request
I've also got time over the next few days to get this sorted - however I'll probably take a different approach to the previous BRFA to do this! Rusty, I'll drop you an email but if you're happy to give this a go, I'll hold off! Mdann52 (talk) 18:57, 16 June 2024 (UTC)[reply]
@Mdann52: Thanks for your email -- I'd be happy to give this a go and get some experience with pwb. I'd appreciate it if you'd point me in the right direction on a new approach to coding the bot. Rusty talk contribs 02:21, 17 June 2024 (UTC)[reply]
I don't like the use of a dictionary to read/write the data into; a database is probably a better way to store/manage the data. Thirdly, I wouldn't have a continuous edit mode, and would just have a scan every 24 hours or so to pick up the new boxes. I haven't dived into the logs too deeply to pick up all the differences, however, so I can't get any more particular on other changes I would make to how they are parsing things! Mdann52 (talk) 09:22, 19 June 2024 (UTC)[reply]
@Qwerfjkl: partially personal preference (this isn't an urgent task, it doesn't really have an impact on anyone if there's a short delay, so I can't see the need for the extra resources that come out of scanning the RC feed), and the lack of changes. Yes, you could scan the page after each edit, but the majority of these won't be closes, and also it means that any malformed closes have time to be fixed before the next run.
However, my day job is in a field where I need to worry about performance, scheduling, and resources, which isn't as much of a priority here! Mdann52 (talk)
Sometimes a user generates citations that point to the same source, but in a separate ref tag (Statement 1<ref>Me!</ref> Statement 2! <ref>Me!</ref>), rather than using <ref name=me>Me!</ref> for the first call, and <ref name=me/> for subsequent ones. This results in a separate entry for each citation.
The easy case is to detect and fix exact copy-paste duplicates.
If you want to see how bad it can get, check out Mavis Beacon Teaches Typing (perma: [2]). That one generates the same citation in different ways.
I'd be happy with a bot that only fixes the first type, however, I'd be overjoyed with something that fixes both.
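For the easy case described above (exact copy-paste duplicates), a sketch of the rewrite might look like the following; the auto-generated naming scheme ("auto1", "auto2", ...) is purely illustrative, and a real bot would need to respect refs that already carry names:

```python
import re
from collections import Counter

# Matches only unnamed refs; refs that already have a name are left alone.
REF_RE = re.compile(r"<ref>(.*?)</ref>", re.DOTALL)

def merge_duplicate_refs(wikitext: str) -> str:
    """Name the first of each set of identical unnamed <ref>...</ref> tags
    and turn later copies into self-closing reuses; singletons are untouched."""
    counts = Counter(m.group(1) for m in REF_RE.finditer(wikitext))
    seen = {}
    def repl(m):
        body = m.group(1)
        if counts[body] < 2:
            return m.group(0)  # not duplicated: leave as-is
        if body in seen:
            return f'<ref name="{seen[body]}" />'
        seen[body] = f"auto{len(seen) + 1}"  # illustrative naming scheme
        return f'<ref name="{seen[body]}">{body}</ref>'
    return REF_RE.sub(repl, wikitext)
```

The hard case (the same source cited in differently-formatted refs, as on Mavis Beacon Teaches Typing) would need citation parsing and DOI/URL normalisation rather than simple string equality.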
Genfixes does this on the condition that named refs are used in the article. Doing it on a mass scale when named refs are not used might ruffle some feathers. Then again, it might not. Headbomb {t · c · p · b} 00:34, 18 June 2024 (UTC)[reply]
I suspect that the condition reflected a time before the Visual Editor was a thing. On copy-pasting a reference, the visual editor already adds named references with the name being something like ":0". Acebulf (talk | contribs) 02:12, 18 June 2024 (UTC)[reply]
How will you know which pages to target? Also, dup refs may already have names, possibly different names. -- GreenC 01:07, 18 June 2024 (UTC)[reply]
So far I've been trying it with random articles, and duplicate references are present on about 2% of all pages with the easy case. At an edit per minute, that's roughly 3 months to do the entire encyclopedia. The scanning is getting me a few matches per minute, so this seems sustainable. Acebulf (talk | contribs) 02:36, 18 June 2024 (UTC)[reply]
OK. That's probably the only way. 2% is a lot, over 100,000 pages. That many edits without a WP:BRFA will probably get noticed. By the time scanning/logging is done, it could be approved. -- GreenC 15:21, 18 June 2024 (UTC)[reply]
I can possibly do this following the footballbox task above. Unfortunately I've got 2 BRFAs open, so won't want to take anymore on until those are closed. Mdann52 (talk) 10:11, 23 June 2024 (UTC)[reply]
The URL scheme of the National Statistical Committee of the Kyrgyz Republic changed from stat.kg to stat.gov.kg; everything else stayed the same. The website is often used as a reference, but the links to it don't work anymore (e.g. in Chaek, all the links lead to 404 Not Found). Is it possible that someone migrates the links? MarcelloIV (talk) 09:10, 23 June 2024 (UTC)[reply]
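Since only the host changed, the rewrite itself is mechanical; a sketch of the substitution a bot (or a WP:URLREQ run) would apply to each page, assuming links appear as ordinary http(s) URLs with an optional www prefix:

```python
import re

def migrate_stat_kg(wikitext: str) -> str:
    """Rewrite links on the old stat.kg host to stat.gov.kg, keeping the
    scheme, optional www prefix, and path unchanged. Already-migrated
    stat.gov.kg links are not touched."""
    return re.sub(r"(//(?:www\.)?)stat\.kg\b", r"\1stat.gov.kg", wikitext)
```

A production run would also want to verify that each rewritten URL actually resolves on the new host before saving, since redesigns sometimes change paths as well.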
Add constituency numbers to Indian assembly constituency boxes
I currently have the bot User:C1MM-bot, which already adds image maps of assembly constituencies (previously uploaded) to the infoboxes of Indian state assembly constituency pages. I would like to extend this to adding constituency numbers to those pages which don't have them in infoboxes already. Numbers are in the filenames of the uploaded images and are from a reliable source, namely the eci.gov.in website. I would also like to add the total number of electors for the constituency, with a source. The bot is run one state at a time. C1MM (talk) 13:47, 24 June 2024 (UTC)[reply]