Wikipedia:AutoWikiBrowser/Regular expression






    A regular expression or regex is a sequence of characters that defines a pattern to be searched for in a text. Each occurrence of the pattern may then be automatically replaced with another string, which may include parts of the matched text. AutoWikiBrowser uses the .NET flavor of regex.[1]

    Syntax

    Anchors

    Used to anchor the search pattern to certain points in the searched text.

    Syntax    Meaning    Comments
    ^    Start of string    Before all other characters on page (or line if multiline option is active). Note that "^" has a different meaning inside a token.
    \A    Start of string    Before all other characters on page
    $    End of string    After all other characters on page (or line if multiline option is active)
    \Z    End of string    After all other characters on page
    \b    On a word boundary    Between a word character (letter, number or underscore) and a non-word character, or at the start or end of the text next to a word character
    \B    Not on a word boundary    Between two word characters, or between two non-word characters
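The anchors above can be tried out in a short sketch. It uses Python's `re` module rather than the .NET engine that AWB uses; the anchors shown behave the same way in both, and the sample text is made up for illustration.

```python
import re

text = "cat concat\ncat catalog"

# Without the multiline option, ^ anchors only at the very start of the text.
starts_default = [m.start() for m in re.finditer(r"^cat", text)]

# With the multiline option, ^ also anchors at the start of each line.
starts_multiline = [m.start() for m in re.finditer(r"^cat", text, re.MULTILINE)]

# \b on both sides keeps "concat" and "catalog" from matching.
whole_words = [m.start() for m in re.finditer(r"\bcat\b", text)]

print(starts_default)    # [0]
print(starts_multiline)  # [0, 11]
print(whole_words)       # [0, 11]
```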

    Character classes

    Expressions which match any character in a pre-defined set. This list is not exhaustive.

    Character class    Will match    Examples
    .    "Wildcard": any character except newline (newline is included if singleline option is active; see #Regex behavior options below)
    \w    Any "word" character (letters, digits, underscore)    abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_
    \W    Any character other than "word" characters    $?!#%*@&;:.,+-±=^"`\|/<>{}[]()~(newline)(tab)(space)
    \s    Any whitespace character    (space) (tab) (literal new line) (return)
    \S    Any character other than white space    abcxyz_ABCXYZ$?!#%*@&;:.,+-=^"/<{[(~0123789 (incomplete list)
    \d    Any digit    0123456789
    \D    Any character other than digits    abcxyz_ABCXYZ$?!#%*@&;:.,+-=^"/<{[(~(newline)(tab)(space) (incomplete list)
    \n    Newline    (newline)
    \p{L}    Any Unicode letter[2]    AaÃãÂâĂăÄäÅå (incomplete list)
    \p{Ll}    Any lowercase Unicode letter    aãâăäå (incomplete list)
    \p{Lu}    Any uppercase Unicode letter    AÃÂĂÄÅ (incomplete list)
    \r    Carriage return    (carriage return)
    \t    Tab    (tab)
    \cX    Control character, where X is the letter of the control character    \cA through \cZ match Ctrl-A through Ctrl-Z (0x01–0x1A)
    \xNN    The character with the two-digit hexadecimal code NN    \x41 matches A
    \NNN    The character with the octal code NNN (two or three octal digits)    \040 matches a space
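As a rough illustration of the most common classes, the following sketch uses Python's `re` module. This is an assumption of convenience: AWB itself uses .NET, and Python's `re` does not support `\p{L}` and the other Unicode property classes listed above. The sample string is invented.

```python
import re

sample = "Room 42,\tBuilding_B\n"

digits = re.findall(r"\d", sample)        # each digit separately
words = re.findall(r"\w+", sample)        # runs of word characters
whitespace = re.findall(r"\s", sample)    # space, tab and newline
non_word = re.findall(r"\W", sample)      # everything that is not a word character

# The wildcard skips the newline unless the single-line option (DOTALL in Python) is on.
dot_default = re.findall(r".", "a\nb")
dot_singleline = re.findall(r".", "a\nb", re.DOTALL)

print(digits)          # ['4', '2']
print(words)           # ['Room', '42', 'Building_B']
print(dot_default)     # ['a', 'b']
print(dot_singleline)  # ['a', '\n', 'b']
```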

    Tokens

    Tokens match a single character from a specified set or range of characters.

    Tokens    Meaning    Examples
    [...]    Set – matches any single character in the brackets    [def] matches d or e or f
    [^...]    Inverse – matches any single character except those in the brackets    [^abc] – anything (including newline) except a or b or c
    [...-...]    Range – matches any single character in the specified range (including the characters given as the endpoints of the range)    [a-q] – any lowercase letter between a and q
                                                                                                                                             [A-Q] – any uppercase letter between A and Q
                                                                                                                                             [0-7] – any digit between 0 and 7
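A small sketch of sets, ranges and inverse sets, written with Python's `re` module (an assumption; the bracket syntax shown is the same in .NET). The sample text is invented.

```python
import re

text = "def abc QXZ 0789"

set_hits = re.findall(r"[def]", text)      # any one of d, e, f
upper_range = re.findall(r"[A-Q]", text)   # uppercase letters A through Q
digit_range = re.findall(r"[0-7]", text)   # digits 0 through 7
inverse = re.findall(r"[^a-z ]", text)     # anything except lowercase letters and spaces

# An inverse set matches newlines too, unlike the "." wildcard.
inverse_newline = re.findall(r"[^abc]", "a\nb")

print(set_hits)     # ['d', 'e', 'f']
print(upper_range)  # ['Q']
print(digit_range)  # ['0', '7']
print(inverse)      # ['Q', 'X', 'Z', '0', '7', '8', '9']
```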

    Groups

    Groups match a string of characters (including tokens) in sequence. By default, matches to groups are captured for later reference. Groups may be nested within other groups.

    Syntax    Meaning    Examples
    (...)    Capture group – matches the string in parentheses. (Output captured groups in the replacement string with $1, $2, etc.)    (abc) matches abc
    (?<name>...)    Named capture group (for use in back references or the replacement string)    (?<year>\b\d{4}\b) matches the whole word 2016. Output the named group using ${year}
    (?:...)    Non-capturing parentheses    (?:abc) matches and consumes, but doesn't capture, abc
    |    Alternation/disjunction (read as "or")    (ab|cd|ef) matches ab or cd or ef
                                                   (ab(cd|ef)) matches abcd or abef
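The group types above can be sketched in Python's `re` module. Note the syntax differences, which are specific to Python and not to AWB: a named group is written `(?P<name>...)` instead of `(?<name>...)`, and a named group is output in the replacement string with `\g<name>` instead of `${name}`. The sample sentence is made up.

```python
import re

sentence = "Released in 2016 worldwide"

# Named capture group: (?P<year>...) in Python, (?<year>...) in .NET.
year_match = re.search(r"(?P<year>\b\d{4}\b)", sentence)
year = year_match.group("year")

# Output the named group in the replacement: \g<year> in Python, ${year} in .NET.
linked = re.sub(r"(?P<year>\b\d{4}\b)", r"[[\g<year>]]", sentence)

# Nested groups are numbered by the position of their opening parenthesis.
nested = re.fullmatch(r"(ab(cd|ef))", "abef").groups()

# A non-capturing group matches and consumes text but is not numbered.
non_capturing = re.fullmatch(r"(?:abc)(d)", "abcd").groups()

print(year)           # 2016
print(linked)         # Released in [[2016]] worldwide
print(nested)         # ('abef', 'ef')
print(non_capturing)  # ('d',)
```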

    Quantifiers

    Quantifiers specify how many of the preceding token or group may be matched.

    Syntax    Meaning    Examples
    *    0 or more    b* matches nothing, b, bb, bbb, etc.
    +    1 or more    b+ matches b, bb, bbb, etc.
    ?    0 or 1    b? matches nothing, or b
    {3}    Exactly 3    b{3} matches bbb
    {3,}    3 or more    b{3,} matches bbb, bbbb, etc.
    {2,4}    At least 2 and no more than 4    b{2,4} matches bb, bbb, or bbbb

    By default, quantifiers are "greedy", meaning they will match as many characters as possible while still allowing the full expression to find a match. Adding a question mark ("?") after a quantifier will make it non-greedy, meaning it will match as few characters as possible while still allowing the full expression to find a match. See #Greed and quantifiers for examples.

    Metacharacters and the escape character

    Metacharacters are characters with special meaning in regex; to match these characters literally, they must be "escaped" by being preceded with the escape character \.

    Escape character    Comments
    \    Escape character: allows metacharacters (listed below) to be matched literally

    Metacharacter    Metacharacter escaped    Comments
    ^    \^    Characters not in this list include = } # ! / % & _ : ; (incomplete list)
    $    \$
    (    \(
    )    \)
    <    \<
    .    \.
    *    \*
    +    \+
    ?    \?
    [    \[
    ]    \]
    {    \{
    \    \\
    |    \|
    >    \>
    -    \-    Hyphens must be escaped within tokens, where they indicate a range; outside of tokens, they do not need to be escaped.
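To illustrate why escaping matters, here is a minimal sketch in Python's `re` module (an assumption; AWB uses .NET, where the escaping rules for the characters shown are the same). `re.escape` is Python's helper for escaping a literal string; the sample text is invented.

```python
import re

text = "Cost: $3.50 (approx.)"

# Escaping by hand: $, ., ( and ) are metacharacters.
by_hand = re.search(r"\$3\.50 \(approx\.\)", text).group()

# Letting the library escape a literal string.
auto_pattern = re.escape("$3.50 (approx.)")
by_helper = re.search(auto_pattern, text).group()

# An unescaped "." is a wildcard, so it matches more than a literal period.
wildcard_hit = re.search(r"3.50", "3x50") is not None
literal_miss = re.search(r"3\.50", "3x50") is None

print(by_hand)       # $3.50 (approx.)
print(wildcard_hit)  # True
print(literal_miss)  # True
```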

    Back references

    Used to match a previously captured group again.

    Syntax    Comments
    \1, \2, \3, etc.    Match unnamed captured groups in order. (\n[^\n]+)\1 matches identical adjacent lines; replacing the match with $1 leaves a single copy.
    \k<name>    Match named captured group (?<name>...).
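The duplicate-line example from the table can be sketched in Python's `re` module. Two differences from AWB's .NET syntax are assumptions of this sketch: the replacement is written `\1` instead of `$1`, and a named back reference is written `(?P=name)` instead of `\k<name>`.

```python
import re

pattern = r"(\n[^\n]+)\1"

# Two identical adjacent lines collapse into one.
deduplicated = re.sub(pattern, r"\1", "alpha\nbeta\nbeta\ngamma")

# Matches do not overlap, so a run of three identical lines leaves two after one pass.
triple = re.sub(pattern, r"\1", "a\nx\nx\nx")

# Named back reference: find a doubled word.
doubled = re.search(r"(?P<word>\b\w+) (?P=word)\b", "it is is fine").group()

print(repr(deduplicated))  # 'alpha\nbeta\ngamma'
print(repr(triple))        # 'a\nx\nx'
print(doubled)             # is is
```

Because each pass removes one copy from each matched pair, a longer run of identical lines may need the rule applied more than once.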

    Look-around

    Used to check what comes before or after, without consuming or capturing. ("Without consuming" means that matches for look-around assertions do not become part of the string to be replaced. In the following examples, only "abc" is consumed.) In .NET regex, all regex syntax can be used within a look-around assertion.

    Syntax    Meaning    Examples
    (?=...)    Positive lookahead    abc(?=xyz) matches abc only if it's followed by xyz
    (?!...)    Negative lookahead    abc(?!xyz) matches abc except when it's followed by xyz
    (?<=...)    Positive lookbehind    (?<=xyz)abc matches abc only if it's preceded by xyz
    (?<!...)    Negative lookbehind    (?<!xyz)abc matches abc except when it's preceded by xyz
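The four assertions can be compared side by side in a short sketch using Python's `re` module. One caveat, which is a limitation of Python and not of AWB: Python only allows fixed-width patterns inside a lookbehind, whereas .NET accepts any pattern. The sample text is invented.

```python
import re

samples = "abcxyz abc123 xyzabc"  # "abc" starts at offsets 0, 7 and 17

def starts(pattern):
    """Return the start offset of every match of pattern in samples."""
    return [m.start() for m in re.finditer(pattern, samples)]

followed = starts(r"abc(?=xyz)")       # followed by xyz
not_followed = starts(r"abc(?!xyz)")   # not followed by xyz
preceded = starts(r"(?<=xyz)abc")      # preceded by xyz
not_preceded = starts(r"(?<!xyz)abc")  # not preceded by xyz

# Only "abc" is consumed, so the "xyz" that was looked at is left untouched.
replaced = re.sub(r"abc(?=xyz)", "ABC", samples)

print(followed, not_followed, preceded, not_preceded)  # [0] [7, 17] [17] [0, 7]
print(replaced)                                        # ABCxyz abc123 xyzabc
```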

    Commenting

    Comments in the search string do not affect the resulting matches.

    Syntax    Meaning    Example
    (?#...)    Comment    (?#Just a comment in here)

    Using captured groups in the replacement string

    Captured groups can be output as part of the replacement string.

    Reference style    Meaning    Example search string    Example output
    $#    Unnamed capture group    (Sam)(Max)(Pete)    $2 returns Max
                                                         ${2}0 returns Max0
    ${...}    Named capture group    (?<foo>ABC)(?<bar>DEF)    ${foo} returns ABC
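For readers who want to try the table's examples outside AWB, here is a sketch in Python's `re` module. The replacement syntax differs from the .NET syntax used by AWB: Python writes `\2` for `$2`, `\g<2>0` for `${2}0`, and `\g<foo>` for `${foo}`.

```python
import re

# .NET "$2" corresponds to Python "\2".
second = re.sub(r"(Sam)(Max)(Pete)", r"\2", "SamMaxPete")

# .NET "${2}0" corresponds to Python "\g<2>0"; the braces (or \g<>) keep the
# group number separate from the literal digit that follows it.
second_then_zero = re.sub(r"(Sam)(Max)(Pete)", r"\g<2>0", "SamMaxPete")

# .NET "${foo}" corresponds to Python "\g<foo>".
named = re.sub(r"(?P<foo>ABC)(?P<bar>DEF)", r"\g<foo>", "ABCDEF")

print(second)            # Max
print(second_then_zero)  # Max0
print(named)             # ABC
```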

    Tokens and groups

    Tokens and groups are portions of a regular expression which can be followed by a quantifier to modify the number of consecutive matches. A token is a character, special character, character class, or range (e.g. [m-q]). A group is formed by enclosing tokens or other groups within parentheses. All of these can be modified to match a number of times by a quantifier. For example: a?, \n+, \d{4}, [m-r]*, (a?\n+\d{4}[m-r]*|not){3,7}, and ((?:97[89]-?)?(?:\d[ -]?){9}[\dXx]).

    Greed and quantifiers

    Greed, in regular expression context, describes the number of characters which will be matched (often also stated as "consumed") by a variable length portion of a regular expression – a token or group followed by a quantifier, which specifies a number (or range of numbers) of tokens. If the portion of the regular expression is "greedy", it will match as many characters as possible. If it is not greedy, it will match as few characters as possible.

    By default, quantifiers in AWB are greedy. To make a quantifier non-greedy, it must be followed by a question mark. For example:

    In this string:

    [[Lorem ipsum]] dolor sit amet, [[consectetur adipisicing]] elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
    

    this expression:

    \[\[.*\]\]
    

    will match [[Lorem ipsum]] dolor sit amet, [[consectetur adipisicing]].

    This expression:

    \[\[.*?\]\]
    

    will match [[Lorem ipsum]] and [[consectetur adipisicing]].

    Be careful with expressions like (\w)(<ref[^<>]*>.*?</ref>)([,.:;]), whose center capture group will span more than one ref group if the outer conditions are met:
    sed do eiusmod tempor<ref>reference</ref> incididunt ut <ref>reference 2</ref>. labore
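The greedy and non-greedy behaviour described above, including the ref pitfall, can be reproduced with Python's `re` module (an assumption; the quantifier behaviour shown is the same in .NET). The last pattern is one possible tightening, offered as a sketch and not as the page's recommended fix: it uses a negative lookahead so that the middle group cannot run past the first closing tag.

```python
import re

text = "[[Lorem ipsum]] dolor sit amet, [[consectetur adipisicing]] elit"

greedy = re.findall(r"\[\[.*\]\]", text)   # one long match
lazy = re.findall(r"\[\[.*?\]\]", text)    # one match per link

refs = ("sed do eiusmod tempor<ref>reference</ref> incididunt ut "
        "<ref>reference 2</ref>. labore")

# The lazy .*? still grows until the outer condition ([,.:;]) can be met,
# so the middle group spans both references.
loose = re.search(r"(\w)(<ref[^<>]*>.*?</ref>)([,.:;])", refs)
spanned = loose.group(2)

# Sketch of a tighter pattern: the middle group may not contain "</ref>".
tight = re.search(r"(\w)(<ref[^<>]*>(?:(?!</ref>).)*</ref>)([,.:;])", refs)

print(greedy)   # ['[[Lorem ipsum]] dolor sit amet, [[consectetur adipisicing]]']
print(lazy)     # ['[[Lorem ipsum]]', '[[consectetur adipisicing]]']
print(spanned)  # <ref>reference</ref> incididunt ut <ref>reference 2</ref>
print(tight)    # None
```

With the tighter pattern there is no match in this text, because neither reference is both preceded by a word character and followed by punctuation.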

    Examples

    Sample patterns

    Regex pattern    Will match
    ([A-Za-z0-9-]+)    One or more letters, numbers or hyphens
    (\d{1,2}\/\d{1,2}\/\d{4})    Any date in dd/mm/yyyy or mm/dd/yyyy format, e.g. 3/24/2008 or 03/24/2008 or 24/03/2008
    \[\[\d{4}\]\]    Any wiki-linked four-digit number, e.g. [[2008]]
    (Jan(?:uary|\.|)|Feb(?:ruary|\.|)|Mar(?:ch|\.|)|Apr(?:il|\.|)|May\.?|Jun(?:e|\.|)|Jul(?:y|\.|)|Aug(?:ust|\.|)|Sep(?:tember|\.|t\.?|)|Oct(?:ober|\.|)|Nov(?:ember|\.|)|Dec(?:ember|\.|))    Full or abbreviated month name. (The inner groups are non-capturing, so only the outer group captures the matched month name.)
    Regular expression examples

    Search for flagicon template and remove
    Find: {{\s*?[Ff]lagicon\s*?\|.*?}}
    Replace with: (nothing)
    Example of text to search: {{flagicon|USA}} [[United States]]
    Result: [[United States]]

    Search for any of three template parameters and replace the value with some new value
    Find: (?<=\|\s*(occupation|spouse|notableworks)\s*=\s*)[^\|}]+(?=\s*(\||}}))
    Replace with: new value
    Example of text to search: {{infobox person|name=Steveo|occupation=dancer|nationality=The moon}}
    Result: {{infobox person|name=Steveo|occupation=new value|nationality=The moon}}
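Both worked examples can be approximated in Python's `re` module. The first pattern carries over unchanged. The second cannot, because Python (unlike .NET) does not allow a variable-length lookbehind; this sketch therefore captures the parameter name in a group and writes it back with `\g<1>`, which is an adaptation and not the pattern AWB would use.

```python
import re

# Example 1: remove the flagicon template (same pattern as in the .NET example).
flag_text = "{{flagicon|USA}} [[United States]]"
without_flag = re.sub(r"{{\s*?[Ff]lagicon\s*?\|.*?}}", "", flag_text)

# Example 2: replace the value of one of three parameters.
# The .NET lookbehind becomes a capture group that is re-emitted in the replacement.
infobox = "{{infobox person|name=Steveo|occupation=dancer|nationality=The moon}}"
param_pattern = r"(\|\s*(?:occupation|spouse|notableworks)\s*=\s*)[^|}]+(?=\s*(?:\||}}))"
updated = re.sub(param_pattern, r"\g<1>new value", infobox)

print(repr(without_flag))  # ' [[United States]]'
print(updated)             # {{infobox person|name=Steveo|occupation=new value|nationality=The moon}}
```

Note that the first replacement leaves the space that followed the template in place, since the pattern matches only the template itself.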

    Commonly used expressions

    Match inside <ref></ref>
    Regex: <ref[^>]*>([^<]|<[^/]|</[^r]|</r[^e]|</re[^f]|</ref[^>])+</ref>
    
    Match inside <ref></ref> using a (?! not match) notation
    Regex: <ref[^>]*>([^<]|<(?!/ref>))+</ref>
    
    Match template {{...}} possibly with templates inside it, but no templates inside those
    Regex: {{([^{]|{[^{]|{{[^{}]+}})+}}
    
    Match words and spaces
    Regex: [\w\s]+
    
    Match bracketed URLs
    Regex: \[(https?://[^\]\[<>\s"]+) *((?<= )[^\n\]]*|)\]
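Two of the expressions above are exercised in the following sketch, written with Python's `re` module (an assumption; both patterns use only syntax that Python shares with .NET). The wikitext samples are invented.

```python
import re

# Match inside <ref></ref> using the negative lookahead form.
ref_pattern = r"<ref[^>]*>([^<]|<(?!/ref>))+</ref>"
wikitext = 'A<ref name="x">one <i>two</i></ref> B<ref>three</ref>'
ref_matches = [m.group(0) for m in re.finditer(ref_pattern, wikitext)]

# Match bracketed URLs: group 1 is the URL, group 2 the optional label.
url_pattern = r'\[(https?://[^\]\[<>\s"]+) *((?<= )[^\n\]]*|)\]'
labelled = re.search(url_pattern, "See [https://example.org Example site].").groups()
bare = re.search(url_pattern, "See [https://example.org].").groups()

print(ref_matches)  # ['<ref name="x">one <i>two</i></ref>', '<ref>three</ref>']
print(labelled)     # ('https://example.org', 'Example site')
print(bare)         # ('https://example.org', '')
```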
    

    Tips and tricks

    Regex behavior options

    Regex offers several options to change the default behavior.[3] Five of these options can be controlled with inline expressions, as described below. Four of these options can also be applied to the entire search pattern with check boxes in the AWB "Find-and-replace" tools. By default, all options are off.

    Option    Inline flag    Check box available    Effect
    IgnoreCase    i    Yes    Specifies case-insensitive matching (upper and lowercase letters are treated the same).
    SingleLine    s    Yes    Treats the searched text as a single line, by allowing (.) to match newlines (\n), which it otherwise does not.
    MultiLine    m    Yes    Changes the meaning of the (^) and ($) anchors to match the beginning and end, respectively, of any line, rather than just the start and end of the whole string.
    ExplicitCapture    n    Yes    Specifies that only groups that are named or numbered (e.g. with the form (?<name>)) will be captured.
    IgnorePatternWhitespace    x    No    Causes whitespace characters (spaces, tabs, and newlines) in the pattern to be ignored, so that they can be used to keep the pattern visually organized.[a]

    a. To match whitespace characters while the IgnorePatternWhitespace option is enabled, they must be identified with character classes, i.e. \s (whitespace), \n (newline), or \t (tab). (To match only a space, but not a tab or newline, use the pattern \p{Zs}.)

    Inline syntax

    The options statement (?flags-flags) turns the options given by "flags" on (or off, for any flags preceded by a minus sign) from the point where the statement appears to the end of the pattern, or to the point where a given option is cancelled by another options statement. For example:

    (?im-s)    #Turn ON IgnoreCase (i) and MultiLine (m) options, and turn OFF SingleLine (s) option, from here to the end of the pattern or until cancelled
    

    Alternatively, the syntax (?flags-flags:pattern) applies the specified options only to the part of the pattern appearing inside the parentheses:

    (?x:pattern1)pattern2    #Apply the IgnorePatternWhitespace (x) option to pattern1, but not to pattern2
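Inline options can be tried in Python's `re` module, with some caveats that apply to Python and not to AWB: Python has no `n` (ExplicitCapture) flag, an unscoped options statement such as `(?i)` must appear at the very start of the pattern, and options can only be switched off inside the scoped `(?flags-flags:pattern)` form. A minimal sketch under those assumptions:

```python
import re

# (?i) IgnoreCase for the whole pattern.
ignore_case = re.findall(r"(?i)wiki", "Wiki wiki WIKI")

# (?m) MultiLine: ^ matches at the start of every line.
line_starts = re.findall(r"(?m)^\w+", "one two\nthree")

# (?s) SingleLine: "." also matches a newline.
dot_without_s = re.search(r"a.b", "a\nb") is None
dot_with_s = re.search(r"(?s)a.b", "a\nb") is not None

# Scoped form: IgnoreCase applies to "wiki" only, "Text" stays case-sensitive.
scoped = re.findall(r"(?i:wiki)Text", "WIKIText wikitext")

# (?x) IgnorePatternWhitespace: spaces in the pattern are ignored.
spaced = re.search(r"(?x) \d{4} - \d{2}", "2016-03").group()

print(ignore_case)  # ['Wiki', 'wiki', 'WIKI']
print(line_starts)  # ['one', 'three']
print(scoped)       # ['WIKIText']
print(spaced)       # 2016-03
```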
    

    User-made shortcut editing macros

    You can make your own shortcut editing macros. While editing a page, enter your shortcut macro keys anywhere in the page where you want AWB to act on them.

    For example, suppose you are examining a page in the AWB edit box and find numerous places that need routine edits: adding {{fact}}, inserting line breaks (<br />), commenting out entire lines (<!--comment-->), inserting state names, inserting <ref>Insert footnote text here</ref>, inserting level 2, 3 or even 4 headings, and so on. All of these can be done with shortcut macro keys.

    1. Create a rule. See Find and replace, Advanced settings.
    2. Edit your page in the edit box. Insert your shortcut macro key(s) anywhere in the page where you want AWB to make the change(s) for you.
    3. Re-parse the page. Right-click on the edit box and select Re-parse from the context pop-up menu. AWB will then re-examine the page, find your shortcut macro key(s) and perform the action you specified in the rule.

    A shortcut macro key can have any name, but it is best to make it unique so that it does not interfere with any other process that AWB may find and suggest. For that reason, /// followed by a few lowercase characters that you can easily remember works well (lowercase, so that you do not have to use the Shift key). You can then enter the shortcut macro keys into the page manually or with the "paste more" function of the edit box context menu. Three slashes are used so that AWB will not confuse the macro with web addresses (URLs) in the page when re-parsing.

    Examples:

    Create a rule as a regular expression.

    User-made shortcut editing macros

    ///col: Comment out entire line
    Shortcut key: ///col
    Name: Comment out entire line
    Find: ///col(.*)
    Replace with: <!--$1-->
    Example before re-parsing: ///colThe quick brown fox jumps over the lazy dog
    Result after re-parsing: <!--The quick brown fox jumps over the lazy dog-->

    ///fac: Insert {{citation needed}} with current date
    Shortcut key: ///fac
    Name: Insert {{citation needed}} with current date
    Find: ///fac
    Replace with: {{citation needed|date={{subst:CURRENTMONTHNAME}} {{subst:CURRENTYEAR}}}}
    Example before re-parsing: The quick brown fox jumps over the lazy dog///fac
    Result after re-parsing: The quick brown fox jumps over the lazy dog[citation needed]
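The find-and-replace step behind the two macros can be imitated with Python's `re.sub`. This is only a rough simulation, not AWB itself: it does not re-parse a page or expand templates, and the replacement is written `\1` where AWB uses `$1`.

```python
import re

# ///col: comment out the rest of the line.
col_line = "///colThe quick brown fox jumps over the lazy dog"
commented = re.sub(r"///col(.*)", r"<!--\1-->", col_line)

# "." does not match a newline, so only the line carrying the macro is affected.
two_lines = re.sub(r"///col(.*)", r"<!--\1-->", "///colfirst\nsecond")

# ///fac: insert the citation-needed template text (left unexpanded here).
fac_replacement = "{{citation needed|date={{subst:CURRENTMONTHNAME}} {{subst:CURRENTYEAR}}}}"
fac_line = "The quick brown fox jumps over the lazy dog///fac"
tagged = re.sub(r"///fac", fac_replacement, fac_line)

print(commented)        # <!--The quick brown fox jumps over the lazy dog-->
print(repr(two_lines))  # '<!--first-->\nsecond'
print(tagged)
```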

    Efficiency

    Efficiency is how long the regex engine takes to find matches, which is a function of how many characters the engine has to read, including backtracking. Complex regular expressions can often be constructed in several different ways, all with the same outputs but with greatly varying efficiency. If AWB is taking a long time to generate results because of a regex rule:

    References

    1. adegeo (18 June 2022). "Regular Expression Language - Quick Reference". learn.microsoft.com. Archived from the original on 5 February 2023. Retrieved 5 February 2023.
    2. "Regex Tutorial – Unicode Characters and Properties". www.regular-expressions.info. Archived from the original on 19 December 2022. Retrieved 3 January 2023.
    3. adegeo (29 June 2022). "Options for regular expression". learn.microsoft.com. Archived from the original on 5 February 2023. Retrieved 5 February 2023.
    External links

    Online regular expressions testing tools

    Desktop regular expression testing tool

    Documentation about regular expressions


     



    This page was last edited on 11 March 2024, at 19:48 (UTC).
