Jump to content
 







Main menu
   


Navigation  



Main page
Contents
Current events
Random article
About Wikipedia
Contact us
Donate
 




Contribute  



Help
Learn to edit
Community portal
Recent changes
Upload file
 








Search  

































Create account

Log in
 









Create account
 Log in
 




Pages for logged out editors learn more  



Contributions
Talk
 



















Contents

   



(Top)
 


1 Page table slow query  
3 comments  




2 Links in common between two pages, with a twist  
12 comments  




3 Redirects with Possibilities  
4 comments  




4 First edit  
11 comments  













Wikipedia:Request a query






Bahasa Indonesia
עברית
Magyar
 

Edit links
 









Project page
Talk
 

















Read
Edit
Add topic
View history
 








Tools
   


Actions  



Read
Edit
Add topic
View history
 




General  



What links here
Related changes
Upload file
Special pages
Permanent link
Page information
Get shortened URL
Download QR code
Wikidata item
 




Print/export  



Download as PDF
Printable version
 
















Appearance
   

 






From Wikipedia, the free encyclopedia
 


  • WP:QUARRY
  • WP:SQLREQ
  • WP:QUERY
  • This is a page for requesting one-off database queries for certain criteria. Users who are interested and able to perform SQL queries on the projects can provide results from the Quarry website.

    You may also be interested in the following:

    Quarry does not have access to page content, so queries which require checking wikitext cannot be answered with Quarry. However, someone may be able to assist by using Quarry in another way (e.g. checking the table of category links rather than the "Category:" text) or suggest an alternative tool.

    Page table slow query[edit]

    Any way to speed up this page table query?CirrusSearch does it fast, but maybe elasticsearch has its own indexing or something to help make it fast. –Novem Linguae (talk) 18:08, 15 June 2024 (UTC)[reply]

    Yes, CirrusSearch has full-text indices on page titles, while the main db only has normal (prefix) ones.
    Using the page_name_title index speeds this sort of query up considerably, since titles not matching '%.js' can be filtered out there (even though every title still needs to be looked at, the whole row won't need to be fetched). There's no way to force that, though, since we only have access to views. Sometimes you can fool the optimizer into the indexed search if you say which namespaces you want instead of the ones you don't, even if the list is long, something like WHERE page_namespace IN (0,1,3,4,5,6,7,9,10,11,12,13,14,15,100,101,118,119,710,711,828,829), but that doesn't work here. So you're going to have to partition it into several (well, lots of) queries that look small enough that the optimizer uses the index instead of a full table scan - WHERE page_namespace = 0 AND page_title < 'M', WHERE page_namespace =0 AND page_title >= 'M', and so on for the other possible namespaces. —Cryptic 00:38, 22 June 2024 (UTC)[reply]
    An alternate approach: since you know that there won't be many titles ending in '.js' except in userspace and mediawiki:, find the titles yourself by grepping enwiki-20240601-all-titles.gz from a dump. Link to all of them from a page in your userspace, then you can use pagelinks to find them in a query and check page_content_model and page_restrictions. —Cryptic 04:19, 22 June 2024 (UTC)[reply]

    Links in common between two pages, with a twist[edit]

    I realize PetScan can be used to show links that appear on two pages, but I have a need for it to show such a result sorted by the order in which they appear on the first page. Based on my limited understanding of the database, I don't think the order of the links on a page are tracked. But just in case I'm wrong, can someone show me a query to do this? Or suggest an alternative approach? Thanks. Stefen Towers among the rest! GabGruntwerk 23:49, 22 June 2024 (UTC)[reply]

    No, that's not possible; Quarry doesn't contain information about the content of the page.
    Could you give some additional information about what you want this for? There are several alternative approaches that would work, but I need to know a little more about what you want it for. BilledMammal (talk) 23:53, 22 June 2024 (UTC)[reply]
    I thought Quarry knows the links on a page, so that's where I was coming from. Anyway, I have a page that shows links sorted by popularity (views, descending) and another page that shows links to articles having old issues. I'd like an intersection of the links between them, in order of popularity (order of appearance on first page). Stefen Towers among the rest! GabGruntwerk 00:03, 23 June 2024 (UTC)[reply]
    It can see which pages a given page links to (or is linked from), but doesn't have any information on what order it's done in - that would need the page content.
    Is this a one-time query, or will you need it repeated? —Cryptic 00:08, 23 June 2024 (UTC)[reply]
    I'd like for it to be repeatable as the underlying pages will change. I'm fine with having to run it manually. Stefen Towers among the rest! GabGruntwerk 00:10, 23 June 2024 (UTC)[reply]
    (edit conflict) It does, but it doesn't have information beyond that, such as about the text of the page.
    I'm not aware of any tools that can help you with that, but I threw together the information you wanted using a quick and dirty script:

    Extended content

    1. Deion Sanders
    2. Ford Explorer
    3. Maurice Lucas
    4. Josh Hamilton
    5. Fort Knox
    6. Diane Sawyer
    7. Aroldis Chapman
    8. Secretariat (film)
    9. UPS Airlines
    10. Joe Torre
    11. Presbyterian Church (USA)
    12. Damaris Phillips
    13. Jim Beam
    14. Louis Brandeis
    15. Jack McCall
    16. My Morning Jacket
    17. Carlton Fisk
    18. David Pajo
    19. Adam Dunn
    20. Kentucky Colonels
    21. Humana
    22. Earl Weaver
    23. Pope Lick Monster
    24. Meriwether Lewis Clark Jr.
    25. John Marshall Harlan
    26. B. Brian Blair
    27. Frank Ramsey (basketball)
    28. Interstate 71
    29. A. J. Foyt IV
    30. Oldham County, Kentucky
    31. Susanne Zenor
    32. Andy Van Slyke
    33. Harvey Fuqua
    34. Dan Uggla
    35. Homer Bailey
    36. Louisville Cardinals
    37. Aristides (horse)
    38. Playa (band)
    39. Greg Page (boxer)
    40. Terry Pendleton
    41. Kentucky Derby Festival
    42. Louisville Metro Police Department
    43. 5th Cavalry Regiment
    44. Taylor Nichols
    45. David Grissom
    46. Valley of the Drums
    47. Ward Hill Lamon
    48. Jefferson C. Davis
    49. Robert Nardelli
    50. Jim Caldwell (American football)
    51. John Cowan
    52. Mildred J. Hill
    53. Johnny Edwards (musician)
    54. Lance Burton
    55. IWA Mid-South
    56. Mickie Knuckles
    57. Run for the Roses (song)
    58. Taeler Hendrix
    59. Sovereign Grace Churches
    60. Fabian Ver
    61. Tori Hall
    62. Larry Collmus
    63. New Grass Revival
    64. Rebel (bourbon)
    65. 2011 Kentucky Derby
    66. Bullitt County, Kentucky
    67. Catherine McCord
    68. Shelley Duncan
    69. Dan Boyle (ice hockey)
    70. History of Louisville, Kentucky
    71. Rudy Rucker
    72. Big Four Bridge
    73. Optimist International
    74. Larnelle Harris
    75. C. J. Mahaney
    76. Thunder Over Louisville
    77. Belle of Louisville
    78. Bertha Palmer
    79. Sports in Louisville, Kentucky
    80. Gary Matthews Jr.
    81. George Devol
    82. John Yarmuth
    83. Travis Stone
    84. June of 44
    85. Ted Washington
    86. Larry Elmore
    87. Parents Involved in Community Schools v. Seattle School District No. 1
    88. Interstate 64 in Kentucky
    89. Corey Patterson
    90. Stith Thompson
    91. Roman Catholic Archdiocese of Louisville
    92. Louisville Zoo
    93. Boyce Watkins
    94. James Speed
    95. Jefferson County Public Schools (Kentucky)
    BilledMammal (talk) 00:23, 23 June 2024 (UTC)[reply]
    I see you want something that can run repeatedly. I don't have time right now to put something together for you, but if Cryptic doesn't come up with something I'll do it sometime in the next couple of weeks - if I don't, feel free to remind me on my talk page. BilledMammal (talk) 00:24, 23 June 2024 (UTC)[reply]
    Thanks for the list - that's good for a start. Would you mind giving me a few clues on your approach for the script you did? It might snap me into figuring it out. Also, I have thought of using a spreadsheet or text compare software, but was hoping for an on-wiki or otherwise online approach. Stefen Towers among the rest! GabGruntwerk 00:40, 23 June 2024 (UTC)[reply]
    A few Regex operations to get just the links in order, and then a basic python script that works down the first list and if the item exists on the second outputs it. Unfortunately, nothing online ATM. BilledMammal (talk) 00:49, 23 June 2024 (UTC)[reply]
    Thanks for the tips! Stefen Towers among the rest! GabGruntwerk 00:50, 23 June 2024 (UTC)[reply]
    Any pure-sql-against-the-wmf-databases approach would have to start with something similar to either "Make a page in your userspace with redlinks to 1!Tom Cruise, 2!Muhammad Ali, ... 1000!Frank Torre" or "manually edit this stupidly long query that includes all that data" (like how quarry:query/81948 includes the namespace names, but with a thousand items instead of 30). —Cryptic 01:04, 23 June 2024 (UTC)[reply]
    Indeed, those don't seem like tenable approaches. But this discussion has helped me figure out a solution, not optimal but workable:
    1. Copy popular articles wikitext into a text editor, and break down into a flat list using a macro I constructed with regex and other tricks.
    2. Use PetScan to create a flat list of articles with old issues (this didn't have to be in any particular order).
    3. Insert both flat lists into their own column in a spreadsheet, then find matches of first column entries in the second column, then apply a filter of matches, and voila.
    Stefen Towers among the rest! GabGruntwerk 04:18, 23 June 2024 (UTC)[reply]

    Redirects with Possibilities[edit]

    Any way for someone to provide me a list from Category:Redirect-Class Green Bay Packers articles of redirects with possibilities ({{R with possibilities}}). Thank you! « Gonzo fan2007 (talk) @ 19:19, 27 June 2024 (UTC)[reply]

    My first thought was WP:PETSCAN, but it looks like Category:Redirect-Class Green Bay Packers articles is on talk pages, whereas Category:Redirects with possibilities is on article pages. So yeah, will need an SQL query. Will see if I can whip something up. Testing note: 1994 NFC Wild Card playoff game (Detroit–Green Bay) meets the criteria and should appear in the result set. –Novem Linguae (talk) 19:46, 27 June 2024 (UTC)[reply]
    Got it. Here you go. quarry:query/84446. –Novem Linguae (talk) 19:58, 27 June 2024 (UTC)[reply]
    Thank you Novem Linguae! « Gonzo fan2007 (talk) @ 21:07, 28 June 2024 (UTC)[reply]

    First edit[edit]

    In 2013, I heard that about a quarter of newcomers' first edits were to create a page, and three-quarters were to edit an existing page. (My own first edit was to a Talk: page, which was a distinctly unpopular choice.)

    For editors who made their first edit during 2023, is it possible to find out how many of those first edits were page creations vs editing existing pages, and which namespaces they happened in? I imagine a set of numbers like this:

    Namespace New page Existing page
    Article 5339 24209
    Talk 249 3590
    User 1934 781
    User talk 429 9045
    Draft 1930 89

    all adding up to 100% of first edits. WhatamIdoing (talk) 06:06, 29 June 2024 (UTC)[reply]

    Lots of ways for this to go wrong. Consider deleted edits, and that there's no way to reliably tell where a page was when an edit was made to it (unless it's also in the page creation log, but that still means at least half the data is bad). Also no way to tell which of two deleted edits with the same timestamp was actually earlier, but a user making a second edit in the same second as their first is going to be rare enough that we can just pick one. Getting a list of users whose first edit was in 2023 is impractical; a list of users created in 2023 and have made at least one edit is at least close, so that's what I'm doing. quarry:query/84486 should finish in about 15 minutes. It may or may not be right - I haven't seen the results yet - but I've got to get to bed. Will take another look tomorrow. —Cryptic 07:47, 29 June 2024 (UTC)[reply]
    Hm, corrected query finished a lot faster than the obviously-broken ones I ran before (which is usually to be expected, but I didn't think the data would cache that well). Something's not quite right - there shouldn't have that row with the completely blank namespace and 52458 existing-page edits, for starters, and that one creating a page in the MediaWiki namespace looks really suspicious - but otherwise the numbers seem at least plausible. —Cryptic 07:59, 29 June 2024 (UTC)[reply]
    Pretty sure I know why the blank namespace happened, should be fixed when it completes again. The first-edit-was-to-namespace-8 was at MediaWiki:Campaigns-event-discovery-survey-question. —Cryptic 08:15, 29 June 2024 (UTC)[reply]
    If we ran that for, say, last week, would that give us an estimate of how many first edits are being 'lost' or 'misplaced' due to deletion and moving pages? WhatamIdoing (talk) 03:34, 30 June 2024 (UTC)[reply]
    Well, I can run the numbers (quarry:query/84512), but I don't know how much to infer from the comparison with the longer time period. The only edits that are going to be 'lost' are ones that have been revdeleted or oversighted, and in most cases - other than, perhaps for copyvio revdels - that will happen pretty quickly after they're made. I wouldn't care to guess how quickly a typical page gets moved between namespaces after creation, either. —Cryptic 08:17, 30 June 2024 (UTC)[reply]
    Does it only lose revdel edits, and not ordinary/whole page deletion?
    A lot of deletions for WP:UGLY articles happen after draftification, and borderline articles often don't get draftified for a couple of weeks. WhatamIdoing (talk) 01:00, 1 July 2024 (UTC)[reply]
    Just revdelled and suppressed edits. Everything about deleted pages and edits except for edit summaries and page text is in the public database replicas; accounting for those is most of why the query is as complex as it is. —Cryptic 10:52, 2 July 2024 (UTC)[reply]
    So here are a few things I notice:
    • Last week, 4,754 newbies made their first edit on an existing page and 1,842 newbies made their first edit to a new page.
    • Last week, if your first edit was to an existing page, then about 90% of the time, it was to the [i.e., a page most recently in] mainspace.
    • Last week, if your first edit was to create a new page, then about 70% of the time, it was to the User: namespace and about 20% of the time, it was to the Draft: space.
    • Within the first week, about a third of User: space pages and a third of Draft: space pages get deleted.
    • In 2023, 342,087 newbies made their first edit to an existing page and 144,181 newbies made their first edit to a new page.
    • In 2023, if your first edit was to an existing page, then about 90% of the time, it was to the mainspace.
    • In 2023, if your first edit was to create a new page, then about 70% of the time, it was to the User: namespace and about 20% of the time, it was to the Draft: space.
    • Within the last 18 months, looking at pages created on the first edit during 2023, about a third of User: space pages and almost 95% of Draft: space pages were deleted.
      • Conclusion from last week vs 2023: If your User: space page is going to get deleted, it'll happen in the first week. Draft pages are much more likely to get deleted, but it takes longer.
    • Within the last 18 months, only 1.6% of pages created as the account's first-ever edit were moved to the mainspace. 20% of those were also subsequently deleted. One in 75 first-edit page creations are still visible in the mainspace.
      • Actual User: pages probably aren't meant to get moved to the mainspace, and the existence of these pages depresses the overall 'success' numbers. An unknown proportion User sandbox pages probably are meant to move to the mainspace, while others are probably meant to be used for test edits. We don't have a good way to differentiate between these two types of user space contribution.
    WhatamIdoing (talk) 17:17, 2 July 2024 (UTC)[reply]
    Within the last 18 months, looking at pages created on the first edit during 2023, about a third of User: space pages and almost 95% of Draft: space pages were deleted. - this is a shocking statistic but if I understand correctly it can't be taken at face value because the namespace is where the page is now, right? That is, 95% of drafts that were never moved to mainspace are deleted, which is explicable in terms of G13 (I assume the 5% is drafts less than six months old?) – Joe (talk) 20:35, 2 July 2024 (UTC)[reply]
    @Joe Roe (mobile), all of the drafts created in 2023 are more than six months old at this point. There were probably one or two drafts that were a few hours short of being six months old when the query was run, so it's like 0.05%. The 5% that have been retained were probably created later in the year and had an edit made (by anyone) since then. For example, the 5% probably includes articles that were submitted to AFC in December, declined in January, and will be deleted – just not quite yet.
    The namespace listed is the namespace in which either the page exists now (more specifically, at 30 Jun 2024 08:21:10 UTC) or where the page existed at the time that it was deleted (e.g., ordinary CSD or AFD, not revdel or OS/suppression – those latter ones are invisible to the query). WhatamIdoing (talk) 21:14, 2 July 2024 (UTC)[reply]

    Retrieved from "https://en.wikipedia.org/w/index.php?title=Wikipedia:Request_a_query&oldid=1232268185"

    Category: 
    Wikipedia requests
     



    This page was last edited on 2 July 2024, at 21:14 (UTC).

    Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.



    Privacy policy

    About Wikipedia

    Disclaimers

    Contact Wikipedia

    Code of Conduct

    Developers

    Statistics

    Cookie statement

    Mobile view



    Wikimedia Foundation
    Powered by MediaWiki