User:Monkbot/task 19: cite iucn update

Task 19 was originally conceived to update, from the IUCN Red List API, the 13,000 or so articles that use {{cite IUCN}} where |url= holds an old-form IUCN url. These articles are listed in Category:cite IUCN maint (1,410).

There are several old-form urls (not all of these work):

These urls are considered 'old-form' because (when they work) they always point to the current assessment.

Most of these old-form urls are used in {{cite IUCN}} templates that are found in the |status_ref= parameter of {{speciesbox}} and {{taxobox}} templates (collectively hereafter 'taxobox') to support the values in the taxobox |status= and |status_system= parameters. Because values for |status= (IUCN uses the term 'category') and for |status_system= can be extracted or derived from the results of an additional IUCN API call, task 19 was expanded to support updating these taxobox parameters.

IUCN API

This task is generally slow. IUCN do not want anyone or anything hammering away at their API as fast as possible so task 19's calls to the IUCN API are spaced about 3 seconds apart. To accomplish this, the AWB Bots→Auto save→Delay setting is 3 seconds. This prevents task 19 from making edits that require only a single IUCN API call too quickly. For edits that require multiple IUCN API calls, task 19 imposes a 3-second pause before executing each IUCN API call after the first one.
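
The pause rule for edits that need more than one IUCN API call can be sketched as follows. This is a minimal Python illustration, not the task's C# code; `PacedCaller`, `fetch`, and the injectable `sleep` argument are hypothetical names introduced here.

```python
import time


class PacedCaller:
    """Pause before every API call after the first one within a single edit."""

    def __init__(self, fetch, delay_seconds=3.0, sleep=time.sleep):
        self.fetch = fetch                  # hypothetical callable that performs one API request
        self.delay_seconds = delay_seconds  # 3 seconds, per the description above
        self.sleep = sleep                  # injectable so the pause can be stubbed or observed
        self.call_count = 0

    def call(self, url):
        if self.call_count > 0:             # the first call in an edit is not delayed
            self.sleep(self.delay_seconds)
        self.call_count += 1
        return self.fetch(url)
```

A new `PacedCaller` would be created for each article, so that only the second and later calls within one edit are delayed; the spacing between edits is left to the AWB auto-save delay.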

IUCN API calls require a token. While the code for this task is published, the task's token is not. Anyone considering reuse of this code must obtain their own token; do not use the publicly available demo token.

Task 19 fetches data from the IUCN API in four forms: two of species data and two of species citations, each fetched either by name or by taxon id. The examples here use Anthus roseatus (the name) and 22718564 (the taxon id). The species data returns are:

name:
{"name":"Anthus roseatus","result":[{"taxonid":22718564,"scientific_name":"Anthus roseatus","kingdom":"ANIMALIA","phylum":"CHORDATA","class":"AVES","order":"PASSERIFORMES","family":"MOTACILLIDAE","genus":"Anthus","main_common_name":"Rosy Pipit","authority":"Blyth, 1847","published_year":2019,"assessment_date":"2019-06-13","category":"LC","criteria":null,"population_trend":"Stable","marine_system":false,"freshwater_system":true,"terrestrial_system":true,"assessor":"BirdLife International","reviewer":"Smith, D.","aoo_km2":null,"eoo_km2":"3530000","elevation_upper":5000,"elevation_lower":2700,"depth_upper":null,"depth_lower":null,"errata_flag":null,"errata_reason":null,"amended_flag":null,"amended_reason":null}]}
taxon id:
{"name":"22718564","result":[{"taxonid":22718564,"scientific_name":"Anthus roseatus","kingdom":"ANIMALIA","phylum":"CHORDATA","class":"AVES","order":"PASSERIFORMES","family":"MOTACILLIDAE","genus":"Anthus","main_common_name":"Rosy Pipit","authority":"Blyth, 1847","published_year":2019,"assessment_date":"2019-06-13","category":"LC","criteria":null,"population_trend":"Stable","marine_system":false,"freshwater_system":true,"terrestrial_system":true,"assessor":"BirdLife International","reviewer":"Smith, D.","aoo_km2":null,"eoo_km2":"3530000","elevation_upper":5000,"elevation_lower":2700,"depth_upper":null,"depth_lower":null,"errata_flag":null,"errata_reason":null,"amended_flag":null,"amended_reason":null}]}

The citation data returns are:

name:
{"name":"Anthus roseatus","result":[{"citation":"BirdLife International 2019. Anthus roseatus. The IUCN Red List of Threatened Species 2019: e.T22718564A152671411. https://dx.doi.org/10.2305/IUCN.UK.2019-3.RLTS.T22718564A152671411.en .Downloaded on 21 September 2021"}]}
taxon id:
{"name":"22718564","result":[{"citation":"BirdLife International 2019. Anthus roseatus. The IUCN Red List of Threatened Species 2019: e.T22718564A152671411. https://dx.doi.org/10.2305/IUCN.UK.2019-3.RLTS.T22718564A152671411.en .Downloaded on 21 September 2021"}]}
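
As an illustration of how a citation return like the ones above might be taken apart, the sketch below pulls the citation string out of the JSON and extracts the taxon id, assessment id, and doi with regular expressions. It is a hedged Python sketch, not the task's own parser; the function name and the returned dictionary keys are assumptions.

```python
import json
import re


def parse_citation_return(api_text):
    """Return the pieces of an IUCN citation return, or None if there is no citation."""
    try:
        data = json.loads(api_text)
    except ValueError:                       # the API returned something that is not JSON
        return None
    results = data.get("result") if isinstance(data, dict) else None
    if not results or "citation" not in results[0]:
        return None                          # a non-citation return
    citation = results[0]["citation"]
    page = re.search(r"e\.T(\d+)A(\d+)", citation)        # page identifier e.T<taxon>A<assessment>
    doi = re.search(r"dx\.doi\.org/(10\.\S+)", citation)  # doi as it appears in the citation url
    return {
        "citation": citation,
        "taxon_id": page.group(1) if page else None,
        "assessment_id": page.group(2) if page else None,
        "doi": doi.group(1) if doi else None,
    }
```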

taxobox updates

Task 19 confirms, updates, or adds taxobox parameters |status=, |status_system=, and |status_ref= using data extracted from the IUCN API. The IUCN API data are fetched using a binomial species name; task 19 does not attempt to fetch IUCN API data using the taxon id found in any existing IUCN references in the taxobox. For taxobox updates, task 19 attempts to get the binomial from various taxobox parameters:

When the taxobox has none of the above parameters, task 19 uses the article title in the IUCN API call.

Task 19 does not confirm, update, or add |status=, |status_system=, and |status_ref= when:

{{speciesbox}} parameters |status2=, |status2_system=, and |status2_ref= are not handled in the same way as their non-enumerated counterparts. This is because there are relatively few instances of the enumerated forms (~25 according to this search 2021-09-20). |status2_ref= may be updated by subsequent task 19 processes but |status2= and |status2_system= will not be.

{{automatic taxobox}} and {{subspeciesbox}} support |status=, |status_system=, and |status_ref= but task 19 does not attempt to update these parameters as a group because the use of these parameters in those templates is comparatively rare and because species names upon which task 19 depends are inconsistent in comparison to {{speciesbox}} and {{taxobox}}. Task 19 may choose to update the content of |status_ref= in these templates if the parameter uses an old-form url or is a plain-text citation but will not attempt to update |status= and |status_system= nor will it remove duplicate |status_ref= references.

IUCN status

From the IUCN API call for species data using the binomial, task 19 extracts the category value and the assessment_date value. The species IUCN status is confirmed when |status= has the same value as the category returned from the IUCN API. When they are different, task 19 updates |status= to the value from the IUCN API. When |status= is missing (because it was never there or because an empty parameter was deleted) task 19 updates |status= or adds a new |status= at the end of the taxobox. Updates, confirmation, and additions are noted in the edit summary.
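
The confirm / update / add decision described above can be sketched in a few lines. This is an illustrative Python sketch rather than the task's C# logic; the function name, the action labels, and the exact (whitespace-trimmed, case-sensitive) comparison are assumptions.

```python
import json


def check_status(taxobox_status, species_api_text):
    """Compare a taxobox |status= value with the category in an API species return.

    Returns (new_status, action), where action is one of
    'confirmed', 'updated', 'added', or 'no data'.
    """
    results = json.loads(species_api_text).get("result") or []
    if not results or not results[0].get("category"):
        return taxobox_status, "no data"     # the API did not return species data
    category = results[0]["category"]
    if taxobox_status is None or taxobox_status.strip() == "":
        return category, "added"             # |status= missing or empty
    if taxobox_status.strip() == category:
        return taxobox_status, "confirmed"   # same value as the API category
    return category, "updated"               # different value: the API wins
```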

IUCN status displayed on an IUCNredlist web page may be different from the category returned from the IUCN API – task 19 uses the IUCN API's category; cf. (as of 2021-09-22):

IUCN status system

To update or add a taxobox |status_system= parameter, task 19 extracts the year portion from the IUCN API's assessment_date value. If the assessment year is 2000 or earlier, task 19 sets |status_system=IUCN2.3; otherwise it sets |status_system=IUCN3.1. The threshold date is taken from Wikipedia:Conservation status. When |status_system= is missing, task 19 adds a new parameter at the end of the taxobox. Updates and additions are noted in the edit summary; confirmations are not.
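
The year threshold can be written as a one-line rule. The Python sketch below assumes that assessment_date always has the YYYY-MM-DD form seen in the example API returns; the function name is hypothetical.

```python
def status_system_for(assessment_date):
    """Map an API assessment_date (assumed 'YYYY-MM-DD') to a |status_system= value."""
    year = int(assessment_date[:4])          # year portion of the assessment date
    return "IUCN2.3" if year <= 2000 else "IUCN3.1"
```

For the Anthus roseatus example (assessment_date 2019-06-13) this gives IUCN3.1.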

IUCN status reference

To update or add |status_ref=, task 19 inspects the parameter value for a date that task 19 would have written (<ref name="iucn status date">...</ref>) or the existing citation's |access-date= (in that order). When a date can be extracted from one of these, it is compared to the current date. Task 19 will attempt to update |status_ref= only when the difference between the current date and the reference date is greater than six months or when no date can be extracted. This six-month limit was arbitrarily chosen on the presumption that IUCN updates their database twice a year.
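
The date check can be sketched as below. This is a hedged Python illustration: it assumes dates are written in the 'day month year' form (for example 21 September 2021), and it approximates six months as 183 days; neither detail is confirmed by the description above, and the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

SIX_MONTHS = timedelta(days=183)             # assumption: six months approximated in days

DATE_SOURCES = (
    r'<ref\s+name\s*=\s*"iucn status ([^"]+)"',   # date that task 19 wrote into the ref name
    r'\|\s*access-date\s*=\s*([^|}]+)',           # else the citation's |access-date=
)


def reference_date(status_ref):
    """Return the first parsable date found in |status_ref=, or None."""
    for pattern in DATE_SOURCES:
        match = re.search(pattern, status_ref)
        if match:
            try:
                return datetime.strptime(match.group(1).strip(), "%d %B %Y").date()
            except ValueError:
                continue                     # not in the assumed format; try the next source
    return None


def needs_update(status_ref, today):
    """True when no date can be extracted or the reference is older than six months."""
    found = reference_date(status_ref)
    return found is None or (today - found) > SIX_MONTHS
```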

Task 19 will not update templated citations in |status_ref= if the citation has one of:

Similarly, task 19 will not update plain-text citations in |status_ref= if the citation has one of:

This is because the IUCN API does not provide the <year> of amendment or errata.

When the six-month limit is met and the citation in |status_ref= does not hold the amended or errata parameters or strings, task 19 then inspects the associated reference tag:

  1. <ref> – unnamed reference:
    • replaces the value assigned to |status_ref= with <ref name="iucn status date"><new {{cite IUCN}} from IUCN API></ref>
    where date in name="iucn status date" is a copy of the value assigned to the new {{cite IUCN}} template's |access-date= parameter
  2. <ref name=name> – named reference:
    • replaces that reference with <ref name="iucn status date"><new {{cite IUCN}} from IUCN API></ref>
    • replaces all instances of <ref name=name /> with <ref name="iucn status date" />
    where date in name="iucn status date" is a copy of the value assigned to the new {{cite IUCN}} template's |access-date= parameter
  3. <ref name=name /> – named self-closed reference:
    • swaps the self-closed reference tag with the reference definition
    • replaces the citation as described in 2
    • if the definition was (and now the self-closed ref tag is) inside {{reflist|refs=}} then the self-closed ref tag is deleted

{{cite IUCN}} template updates

For {{cite IUCN}} templates that have old-form urls, task 19 extracts the taxon id from the url and attempts to fetch citation data from the IUCN API using the taxon id. If the IUCN API does not recognize the taxon ID, task 19 will attempt to get a citation from the API by using the value assigned to |title= in the {{cite IUCN}} template. When successful, task 19 replaces the old {{cite IUCN}} template with a new {{cite IUCN}} template that has parameter values from the IUCN API citation.
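
The id-first, title-second fallback can be sketched as follows. This is a minimal Python illustration; `fetch_by_id` and `fetch_by_name` are hypothetical callables standing in for the two API citation calls, each assumed to return a citation or None.

```python
def fetch_citation(taxon_id, title, fetch_by_id, fetch_by_name):
    """Try the taxon id first, then fall back to the value of |title=."""
    citation = fetch_by_id(taxon_id) if taxon_id else None
    if citation is None and title:
        name = title.replace("''", "").strip()   # |title= is presumed to be an italicized binomial
        citation = fetch_by_name(name)
    return citation                              # None means the template is left unchanged
```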

When the taxon/assessment ids in a new {{cite IUCN}} template's |page= and |doi= parameters are not the same, the citation is not updated because {{cite IUCN}} will emit a |doi= / |url= mismatch error message. The mismatch is usually an indication that the assessment has errata. The citation rendered on an IUCN species web page indicates the errata year but, at the time of this writing, that value is not available in the citation returned from the IUCN API. IUCN have been notified of this discrepancy.
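
The mismatch test amounts to comparing the T<taxon id>A<assessment id> pair found in |page= with the pair found in |doi=. The Python sketch below is an illustration under that reading; treating a missing pair as a mismatch is an assumption, and the function name is hypothetical.

```python
import re

ID_PAIR = re.compile(r"T(\d+)A(\d+)")        # T<taxon id>A<assessment id>


def page_doi_ids_match(page, doi):
    """True when |page= and |doi= carry the same taxon and assessment ids."""
    page_ids = ID_PAIR.search(page)
    doi_ids = ID_PAIR.search(doi)
    if not page_ids or not doi_ids:
        return False                         # assumption: a missing id pair counts as a mismatch
    return page_ids.groups() == doi_ids.groups()
```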

plain-text citation updates

For the purposes of this task, plain-text references are untemplated IUCN references inside named or unnamed <ref>...</ref> tags or IUCN references as a line item in an unordered list (* markup). Task 19 will update plain-text references when it can extract a taxon id from an IUCN page identifier (e.T###A###), from an IUCN doi (as a doi inside {{doi}} or as a url), or from an IUCN url.
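
Taxon id extraction from a plain-text reference can be sketched as an ordered list of patterns tried in turn (page identifier, then doi, then url). This Python sketch is illustrative only: the regular expressions are assumptions, in particular the url pattern, which covers only the /species/ and /details/ url shapes and is not the task's actual list of recognized urls.

```python
import re

TAXON_ID_PATTERNS = [
    re.compile(r"e\.T(\d+)A\d+"),                             # page identifier e.T###A###
    re.compile(r"10\.2305/IUCN\.UK\.[^\s|}]*?T(\d+)A\d+"),    # IUCN doi, bare or inside {{doi}}
    re.compile(r"iucnredlist\.org/(?:species|details)/(\d+)"),  # assumed url shapes
]


def taxon_id_from_plain_text(text):
    """Return the first taxon id found in a plain-text reference, or None."""
    for pattern in TAXON_ID_PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group(1)
    return None                              # nothing recognizable: leave the reference alone
```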

duplicate citations

Task 19 will replace named and unnamed references that hold {{cite IUCN}} templates that match the {{cite IUCN}} in |status_ref= with <ref name="iucn status date" /> tags. <ref name=name /> tags associated with named references that hold {{cite IUCN}} templates that match the {{cite IUCN}} in |status_ref= are also replaced with <ref name="iucn status date" /> tags.

Duplicate references that wholly make up an entry in an unordered list are deleted as redundant.

Task 19 does not remove any other references.

ancillary tasks

Task 19 may update a {{IUCN status}} template's status value in its first positional parameter ({{{1|}}}) from the IUCN API when {{IUCN status}} has a valid taxon id as its second positional parameter ({{{2|}}}).

As with all other Monkbot tasks, task 19 does not run with AWB general fixes turned on.

abandoned edits

Task 19 will abandon edits when:

edit summaries

Task 19 emits terse edit summaries. An edit summary is a concatenation of one or more of these message fragments:

script

/*
use the iucn api to fetch IUCN categories to update {{taxobox}} and {{speciesbox}} |status= and |status_system=
parameters

use the iucn api to fetch assessment citations to update {{taxobox}} and {{speciesbox}} |status_ref= parameters
with current {{cite IUCN}} templates

use the iucn api to fetch assessment citations to update {{cite IUCN}} templates with old-form urls

use the iucn api to fetch IUCN categories to update the first positional parameter (status) in {{IUCN status}} templates

source categories:
  Category:Taxonomy articles created by Polbot
  Category:cite IUCN maint

source searches:
  insource:/Downloaded on [0-3][0-9] +[JFMASOND][a-z]+ +[0-9]{4}/
  hastemplate:"cite IUCN" -incategory:"Taxobox binomials not recognized by IUCN" -insource:/iucn status [0-9]+[^0-9]+2021/
*/


//---------------------------< P R O C E S S A R T I C L E >--------------------------------------------------
//
//
//

List<string> error_log_list = new List<string>();


public string ProcessArticle(string ArticleText, string ArticleTitle, int wikiNamespace, out string Summary, out bool Skip)
 {
 Skip = false;          // assume that something will be changed

              // these use redirect to User:Monkbot/task 19: cite IUCN update
// Summary = "[[User:Monkbot/task 19|Task 19]] (manual dev test): convert/update IUCN references to {{[[Template:cite IUCN|cite IUCN]]}} using data from [[IUCN Red List]] [[API]];";
// Summary = "[[User:Monkbot/task 19|Task 19]] (BRFA trial): convert/update IUCN references to {{[[Template:cite IUCN|cite IUCN]]}} using data from [[IUCN Red List]] [[API]];";
 Summary = "[[User:Monkbot/task 19|Task 19]]: convert/update IUCN references to {{[[Template:cite IUCN|cite IUCN]]}} using data from [[IUCN Red List]] [[API]];";

 int  template_modified_count = 0;    // number of cite IUCN templates that were modified from the iucn api
 int  other_template_modified_count = 0;   // number of cite journal/web templates that were converted to {{cite IUCN}}

              // reset these static counters
 plain_text_modified_count = 0;      // number of plain-text citations that were modified from the iucn api
 plain_text_count = 0;        // total number of plain-text iucn references
 api_call_count = 0;         // number of api calls made
 api_fetch_fail_count = 0;       // number of api fetches that failed
 api_no_cite_return_count = 0;      // number of times that the api returned a non-citation value

 api_no_species_return_name_count = 0;    // number of times that the api returned a non-species value (species binomial)
 api_no_species_return_id_count = 0;     // number of times that the api returned a non-species value (species id for {{IUCN status}})
 iucn_status_confirmed_count = 0;     // number of times that we confirmed the iucn status in taxobox-like templates
 iucn_status_updated_count = 0;      // number of times that we updated the iucn status in taxobox-like templates
 iucn_status_system_updated_count = 0;    // number of times that we updated the iucn status system in taxobox-like templates
 iucn_template_count = 0;       // total number of cite IUCN templates
 other_template_count = 0;       // total number of cite journal/web templates


 parse_fail_count = 0;        // number of times that we couldn't parse the api return
 page_doi_skip_count = 0;       // number of templates or plain-text references skipped because page and doi assessment ID mismatch

 status_added = false;        // set to true when |status= created
 status_system_added = false;      // set to true when |status_system= created
 status_ref_added = false;       // set to true when |status_ref= created
 status_ref_updated = false;       // set to true when |status_ref= updated
 status_ref_current = false;       // set to true when |status_ref= less than 6 months old
 duplicates_removed_count = 0;      // number of duplicate status references removed

 taxobox_blank = null;        // gets blank taxobox as flag
 unrecognized_species_name = null;     // gets taxobox species name that IUCN doesn't recognize


 System.Diagnostics.Stopwatch stopwatch = new System.Diagnostics.Stopwatch();  // set up a stopwatch
 stopwatch.Start();                 // and start it

 if (Regex.Match (ArticleText, @"\{\{\s*#tag:ref").Success)
  {
  Summary = "Article uses {{#tag:ref}} parser function(s)";
  error_log_add ("Article uses " + code_nowiki("{{#tag:ref}}") + " parser function(s)");  // add error message to list
  log_errors (ArticleTitle, error_log_list);          // dump list to the log file
  Skip = true;
  return ArticleText;
  }

 if (Regex.Match (ArticleText, @"\{\{\s*[Rr]\s*\|").Success)
  {
  Summary = "Article has {{r}} template(s)";
  error_log_add ("Article has " + code_nowiki("{{r}}") + " template(s)");   // add error message to list
  log_errors (ArticleTitle, error_log_list);          // dump list to the log file
  Skip = true;
  return ArticleText;
  }

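 if (null == api_token)
  {
  if (System.IO.File.Exists (iucn_api_token_file))
   {
   using (System.IO.StreamReader sr = new System.IO.StreamReader (iucn_api_token_file)) // open the api token file for reading
    {
    string token = sr.ReadLine();              // read the token (must be the only thing in the file)
    if (!string.IsNullOrWhiteSpace (token))
     api_token = "?token=" + token.Trim();      // leave api_token null when the file is empty
    }
   }
  }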

 if (null == api_token)                // but just in case
  {
  Summary = "Failed to read: " + iucn_api_token_file;        // announce failure
  error_log_add ("Failed to read: " + iucn_api_token_file);      // add error message to list
  log_errors (ArticleTitle, error_log_list);          // dump list to the log file
  Skip = true;
  return ArticleText;
  }

 ArticleText = Regex.Replace (ArticleText, @"[\r\n]+\[\[Category:Taxobox binomials not recognized by IUCN\]\][^\r\n]*", "");  // remove if present; will be restored if necessary


//---------------------------< T A X O B O X >----------------------------------------------------------------
//
// <taxobox> holds the content of {{taxobox}} or {{Speciesbox}} and then is modified by taxobox_update().  The
// source template in <ArticleText> is replaced with an empty skeleton ('{{taxobox}}' or '{{Speciesbox}}' but
// without contents).  At the end, this skeleton is replaced with the modified taxobox held in <taxobox>.
//
// The reason for this round-about is to prevent other portions of this script from evaluating and tallying
// the reference in |status_ref=.  Also permits easy replacement of references that duplicate the reference in
// |status_ref=.
//

 ArticleText = Regex.Replace (ArticleText, hide_non_ref_tag_pattern, hide_non_ref_replace_val);

 ArticleText = hide (ArticleText, HIDE_ALL_BUT_TAXOBOX);        // hide all templates except taxobox-like templates
 ArticleText = hide (ArticleText, HIDE_ALL_BUT_TAXOBOX);        // hide all templates except taxobox-like templates
//if (1 == 1) return ArticleText;
 string taxobox = taxobox_get (ArticleText);
 taxobox_status_ref = null;               // reset the 'new' value for |status_ref; used at the end to remove duplicates
 taxobox_status_ref_open_tag = null;             // its matching ref open tag
 taxobox_status_ref_sc_tag = null;             // and its matching self-closed tag

 taxobox_update (ref taxobox, ref ArticleText, ArticleTitle);      // update the taxobox |status=, |status_system=, and |status_ref=

 ArticleText = unhide (ArticleText);


//---------------------------< C I T E   I U C N   U P D A T E S >--------------------------------------------
//
// this segment updates {{cite IUCN}} templates that have old-form urls.  There are a variety of old-form urls
// but the most common indicator is the taxon id followed by a zero (0) for the assessment id.  This section
// fetches the current citation from the IUCN API using the taxon id (preferred) or using the 'name' in |title=.
// The 'name' in |title= is presumed to be an italicized binomial
//
// {{cite IUCN}} templates with |ref= holding any value retain the parameter so that {{sfn}} or {{harv}} links
// aren't broken.  Any replacement citation that does not use |ref= may have a different author list from the
// 'original' so, when the underlying {{cite journal}} creates a CITEREF id for the new name list, the {{sfn}}
// or {{harv}} links will be broken ...
//
// does not update references in the taxobox (|status_ref= handled above); example: [[Picea abies]]
//

 ArticleText = hide (ArticleText, IS_CITE_IUCN);         // hide all templates except cite IUCN templates

 if (Regex.Match (ArticleText, iucn_template_pattern).Success)
  ArticleText = Regex.Replace (ArticleText, iucn_template_pattern,
   delegate(Match match)
    {
    string template = match.Groups[0].Value;       // this will be returned if no changes
    string ref_param = null;

    iucn_template_count++;            // bump total number of cite IUCN templates tally

    string id = taxon_id_from_old_form_url_get (template);

    if (null == id)              // not an old-form-url template so ignore it
     return template;

    if (Regex.Match (template, @"__P1P3__\s*(?:errata|amends)\s*=\s*\d{4}").Success)
     {
     error_log_add ("[cite IUCN update]: template has |errata= or |amends= parameter (id: " + id + ")");
     return template;
     }

    string name = null;
    if (Regex.Match (template, iucn_title).Success)
     {
     name = Regex.Match (template, iucn_title).Groups[1].Value.Trim();
     name = species_name_cleanup (name);        // remove markup, extinction markers, disambiguation, etc
     }

    string api_url_id = api_id_url + id + api_token;     // build the url from its various parts
    string api_url_name = api_name_url + name + api_token;    // build the url from its various parts

    string cite_iucn = cite_iucn_get (api_url_id, api_url_name, ArticleTitle, id, name);
    if (null == cite_iucn)
     return template;

    template = Regex.Replace (template, ref_param_empty, "$1");   // remove empty |ref= parameters from template

    if (Regex.Match (template, ref_param_not_empty).Success)   // if this template has |ref=<something>
     ref_param = Regex.Match (template, ref_param_not_empty).Groups[1].Value.Trim(); // get its assigned value

    if (null != ref_param)
     cite_iucn = Regex.Replace (cite_iucn, @"(\}\})", " |ref=" + ref_param + "$1"); // add the preexisting |ref= param

    template_modified_count++;
    return cite_iucn;
    });

 ArticleText = unhide (ArticleText);            // unhide all that is hidden


//---------------------------< C I T E   J O U R N A L / W E B   U P D A T E S >------------------------------
//
// this segment updates {{cite journal}} and {{cite web}} templates that have iucn urls, pages, or dois.  This
// section fetches the current citation from the IUCN API using the taxon id (preferred) or using the 'name'
// in |title=.  The 'name' in |title= is presumed to be an italicized binomial
//
// {{cite journal}} and {{cite web}} templates with |ref= holding any value retain the parameter so that {{sfn}}
// or {{harv}} links aren't broken.  Any replacement {{cite IUCN}} that does not use |ref= may have a different
// author list from the 'original' so, when the underlying {{cite journal}} creates a CITEREF id for the new name
// list, the {{sfn}} or {{harv}} links will be broken ...
//
// does not update references in the taxobox (|status_ref= handled above)
//

 ArticleText = hide (ArticleText, IS_CITE_OTHER);       // hide all templates except cite journal and cite web templates

 if (Regex.Match (ArticleText, other_template_pattern).Success)
  ArticleText = Regex.Replace (ArticleText, other_template_pattern,
   delegate(Match match)
    {
    string template = match.Groups[0].Value;       // this will be returned if no changes
    string ref_param = null;

    other_template_count++;            // bump total number of cite journal/web templates tally

    string id = plain_text_taxon_id_get (template);     // attempt to get taxon id from page -> doi -> url

    if (null == id)              // not an 'iucn' template so ignore it
     return template;

// cite journal and cite web don't support |errata= or |amends=
//    if (Regex.Match (template, @"__P1P3__\s*(?:errata|amends)\s*=\s*\d{4}").Success)
//     {
//     error_log_add ("[cite IUCN update]: template has |errata= or |amends= parameter (id: " + id + ")");
//     return template;
//     }

    string name = null;
    if (Regex.Match (template, iucn_title).Success)      // get value assigned to |title=
     {
     name = Regex.Match (template, iucn_title).Groups[1].Value.Trim();
     name = species_name_cleanup (name);        // remove markup, extinction markers, disambiguation, etc
     }

    string api_url_id = api_id_url + id + api_token;     // build the api url from its various parts
    string api_url_name = api_name_url + name + api_token;    // build the api url from its various parts

    string cite_iucn = cite_iucn_get (api_url_id, api_url_name, ArticleTitle, id, name);
    if (null == cite_iucn)
     return template;

    template = Regex.Replace (template, ref_param_empty, "$1");   // remove empty |ref= parameters from template

    if (Regex.Match (template, ref_param_not_empty).Success)   // if this template has |ref=<something>
     ref_param = Regex.Match (template, ref_param_not_empty).Groups[1].Value.Trim(); // get its assigned value

    if (null != ref_param)
     cite_iucn = Regex.Replace (cite_iucn, @"(\}\})", " |ref=" + ref_param + "$1"); // add the preexisting |ref= param

    other_template_modified_count++;
    return cite_iucn;
    });

 ArticleText = unhide (ArticleText);            // unhide all that is hidden


//---------------------------< P L A I N _ T E X T _ R E F _ U P D A T E >------------------------------------
//
// update plain-text references first in ArticleText and then in the taxobox
//

 ArticleText = plain_text_ref_update (ArticleText, ArticleTitle);
                     // all of these create or rely on <ref iucn status <'date'>>{{cite IUCN}}
 if ((status_added || (0 != iucn_status_confirmed_count) || (0 != iucn_status_updated_count)) && (status_ref_added || status_ref_updated || status_ref_current))
  taxobox = plain_text_ref_update (taxobox, ArticleTitle);     // update plain-text references in taxobox only when the above holds because |status_ref= might be plain text


//---------------------------< I U C N   P L A I N - T E X T   B I B L I O G R A P H Y   U P D A T E >--------
//
// this is the plain-text form API id only.  Plain-text references in bibliographies must be in unordered list
// markup \n*...\n
//
// known issues:
//  because this attempts to locate 'correct' plain-text citations and because any non-template and non-
//  wikilink text is plain text, plain text that is part of the unordered list item that is not part of the
//  actual IUCN citation will be treated as part of the citation and will be replaced with the {{cite IUCN}}
//  template if the API returns a citation for the taxon id.
//

 if (Regex.Match (ArticleText, plain_text_bib_pattern).Success)     // must have the form \n*plain text\n must be constrained because article is plain text
  ArticleText = Regex.Replace (ArticleText, plain_text_bib_pattern,
   delegate(Match match)
    {
    string plain_text = match.Groups[0].Value;       // this will be returned if no changes

    string taxon_id = plain_text_taxon_id_get (plain_text);   // attempt to get taxon id
    if (null == taxon_id)
     return plain_text;            // no taxon id so abandon

    if (is_plain_text_rejected (plain_text))       // returns true when plain_text is rejected
     return plain_text;

    string ref_open = match.Groups[1].Value;       // the opening \n*
    string ref_close = match.Groups[3].Value;       // the closing \n tag

    plain_text_count++;             // bump total number of plain-text references found

    string api_url = api_id_url + taxon_id + api_token;     // build the url from its various parts
    string cite_iucn = cite_iucn_get (api_url, null, ArticleTitle, taxon_id, null); // go build a {{cite IUCN}} template from the api

    if (null == cite_iucn)
     return plain_text;            // template build failed

    plain_text_modified_count++;
    return ref_open + cite_iucn + ref_close;
    });


//---------------------------< I U C N   S T A T U S   T E M P L A T E >--------------------------------------
//
// Update status in {{IUCN status|<status>|<taxon id>|<options>}}
//

 if (Regex.Match (ArticleText, iucn_status_template_pattern).Success)
  ArticleText = Regex.Replace (ArticleText, iucn_status_template_pattern,
   delegate(Match match)
    {
    string template = match.Groups[0].Value;       // if no change, return this
    string status = null;
    string id = null;

    if (Regex.Match (template, iucn_status_status).Success)
     status = Regex.Match (template, iucn_status_status).Groups[2].Value;
    else
     return template;

    if (Regex.Match (template, iucn_status_id).Success)
     id = Regex.Match (template, iucn_status_id).Groups[2].Value;
    else
     return template;

    string species_from_api;           // species data from the API will go here
    string api_url = api_species_id_url + id + api_token;    // build the url from its various parts

    species_from_api = api_fetch (api_url, ArticleTitle);    // fetch species data from the IUCN API

    if (null == species_from_api)          // if api_fetch() failed
     return template;

    string status_from_api = null;
    if (Regex.Match (species_from_api, status_from_api_pattern).Success)
     status_from_api = Regex.Match (species_from_api, status_from_api_pattern).Groups[1].Value;
    else
     {
     error_log_add ("[iucn status template]: API did not return species data: " + code_nowiki (species_from_api));
     api_no_species_return_id_count++;
     return template;
     }

    if (status == status_from_api)          // if status same as api status
     iucn_status_confirmed_count++;         // bump the confirmed count and done
    else
     {
     template = Regex.Replace (template, iucn_status_lead + status, "$1" + status_from_api); // update
     iucn_status_updated_count++;         // bump the updated count
     }

    return template;
    });


//---------------------------< R E M O V E   D U P L I C A T E   S T A T U S   R E F >------------------------
//
// convert |status_ref= {{cite IUCN}} template into a regex to find duplicates of itself in ArticleText and
// then replace any duplicates with the |status_ref= self-closed tag from |status_ref=
//
// replaces duplicates in taxobox only after hiding the |status_ref= definition so that we don't lose the definition
//
// problem: if the duplicate is named and is the definition for other self-closed ref tags, all of those tags
// need to be renamed ... argh example: [[Bellamya trochlearis]], [[Catarina pupfish]]
//

 if ((null != taxobox_status_ref) && (null != taxobox_status_ref_sc_tag))
  {
  string taxobox_status_ref_pattern = taxobox_status_ref;

  foreach (string symbol in symbols)
   taxobox_status_ref_pattern = Regex.Replace (taxobox_status_ref_pattern, symbol, symbol);   // convert taxobox_status_ref to a regex pattern

                            // references in unordered lists always ok to replace
  ArticleText = counted_replace (ArticleText, bib_open_ul + taxobox_status_ref_pattern + bib_close_ul, "$1", ref duplicates_removed_count);

                            // references with unnamed <ref> tags always ok to replace
  ArticleText = counted_replace (ArticleText, ref_open_tag_unnamed + @"\s*" + taxobox_status_ref_pattern + @"\s*" + ref_close_tag, taxobox_status_ref_sc_tag, ref duplicates_removed_count);
  taxobox = counted_replace (taxobox, ref_open_tag_unnamed + @"\s*" + taxobox_status_ref_pattern + @"\s*" + ref_close_tag, taxobox_status_ref_sc_tag, ref duplicates_removed_count);

  taxobox = hide_taxobox_status_ref (taxobox, taxobox_status_ref_open_tag, taxobox_status_ref_pattern); // hide |status_ref= {{cite IUCN}} template so we don't replace it with sc tag

  named_status_ref_dup_remove (ref ArticleText, ref taxobox, taxobox_status_ref_pattern, taxobox_status_ref_sc_tag); // remove duplicates

                          // remove sequential instances of taxobox_status_ref_open_tag_sc TODO: this could be improved
  string taxobox_status_ref_open_tag_sc = Regex.Replace (taxobox_status_ref_open_tag, @"([^\>]+)\>", "$1 />");

  taxobox = Regex.Replace (taxobox, taxobox_status_ref_open_tag_sc + @"\s*" + taxobox_status_ref_open_tag_sc, taxobox_status_ref_sc_tag);
  ArticleText = Regex.Replace (ArticleText, taxobox_status_ref_open_tag_sc + @"\s*" + taxobox_status_ref_open_tag_sc, taxobox_status_ref_sc_tag);
  }


//---------------------------< C L E A N U P >----------------------------------------------------------------

 if (null != taxobox)
  taxobox = unhide (taxobox);

 ArticleText = hide (ArticleText, "[Rr]eflist");

 while (Regex.Match (ArticleText, reflist_cleanup).Success)       // remove self-closed ref tags from {{reflist}} (European fire-bellied toad)
  {
  ArticleText = Regex.Replace (ArticleText, reflist_cleanup, "$1");
  ArticleText = Regex.Replace (ArticleText, @"(\{\{)\s*([Rr]eflist[^\|]*)\s*\|\s*refs\s*=\s*(\}\})", "$1$2$3");
  }

 ArticleText = unhide (ArticleText);

 if (null != taxobox)
  ArticleText = Regex.Replace (ArticleText, taxobox_blank_pattern, taxobox);

 ArticleText = Regex.Replace (ArticleText, angle_open, "<");
 ArticleText = Regex.Replace (ArticleText, angle_close, ">");


//---------------------------< F I N I S H >------------------------------------------------------------------

 if (status_added)               // build our edit summary
  Summary = summary_concat (Summary, " IUCN status added;");
 if (0 != iucn_status_confirmed_count)          // build our edit summary
  Summary = summary_concat (Summary, " IUCN status confirmed" + ((1 < iucn_status_confirmed_count) ? " (" + iucn_status_confirmed_count + "×);" : ";"));
 if (0 != iucn_status_updated_count)
  Summary = summary_concat (Summary, " IUCN status updated" + ((1 < iucn_status_updated_count) ? " (" + iucn_status_updated_count + "×);" : ";"));

 if ((0 != iucn_status_confirmed_count) || (0 != iucn_status_updated_count) || status_added)
  {
  if (0 != iucn_status_system_updated_count)
   Summary = summary_concat (Summary, " IUCN status system updated;");
  else if (status_system_added)
   Summary = summary_concat (Summary, " IUCN status system added;");
  }

 string dup_text = "";
 switch (duplicates_removed_count)
  {
  case 0:
   dup_text = ";";
   break;
  case 1:
   dup_text = " [duplicate removed];";
   break;
  default:
   dup_text = " [duplicates removed (" + duplicates_removed_count + "×)];";
   break;
  }

 if (status_ref_added)
  Summary = summary_concat (Summary, " IUCN status ref added" + dup_text);

 if (status_ref_updated)
  Summary = summary_concat (Summary, " IUCN status ref updated" + dup_text);

 if (status_ref_current)
  Summary = summary_concat (Summary, " IUCN status ref current;");


 if (0 != plain_text_count)             // build our edit summary
  {
  Summary = summary_concat (Summary, " evaluated " + plain_text_count + " reference" + (1 == plain_text_count ? ";" : "s;"));

  if (0 != plain_text_modified_count)
   Summary = summary_concat (Summary, " " + plain_text_modified_count + " reference" + (1 == plain_text_modified_count ? " " : "s ") + "modified;");
  }

 if (0 != iucn_template_count)
  {
  Summary = summary_concat (Summary, " evaluated " + iucn_template_count + " {{cite IUCN}}" + (1 == iucn_template_count ? ";" : "s;"));

  if (0 != template_modified_count)
   Summary = summary_concat (Summary, " " + template_modified_count + " template" + (1 == template_modified_count ? " " : "s ") + "modified;");
  }

 if ((0 != other_template_count) && (0 != other_template_modified_count)) // only report 'other templates' when we modify
  {
  Summary = summary_concat (Summary, " evaluated " + other_template_count + " other template" + (1 == other_template_count ? ";" : "s;"));

  if (0 != other_template_modified_count)
   Summary = summary_concat (Summary, " " + other_template_modified_count + " template" + (1 == other_template_modified_count ? " " : "s ") + "modified;");
  }

 if (0 != page_doi_skip_count)
  Summary = summary_concat (Summary, " skipped doi/page mismatch (" + page_doi_skip_count + "×);");

 if (0 != api_no_cite_return_count)
  Summary = summary_concat (Summary, " API cite nil return (" + api_no_cite_return_count + "×);");

 if (0 != api_no_species_return_id_count)         // for {{IUCN status}}
  Summary = summary_concat (Summary, " API species nil return (id) (" + api_no_species_return_id_count + "×);");

 if (0 != api_no_species_return_name_count)
  Summary = summary_concat (Summary, " API species nil return (name) (" + api_no_species_return_name_count + "×);");

 if (null != unrecognized_species_name)
  Summary = summary_concat (Summary, " unrecognized binomial: " + unrecognized_species_name + ";");

 stopwatch.Stop();               // stop the stopwatch
 TimeSpan ts = stopwatch.Elapsed;           // get the elapsed time and tack it onto the edit summary
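 // worked example of the suffix built below (illustrative values): 3 API calls and an elapsed time of
 // 1 min 5.23 s append " (3/01:05.23);" because ts.Milliseconds / 10 yields hundredths of a second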

 Summary = Summary + " (" + api_call_count + "/" + String.Format("{0:00}:{1:00}.{2:00}", ts.Minutes, ts.Seconds, ts.Milliseconds / 10) + ");";

 if (!status_ref_added && !status_ref_updated && (0 == iucn_status_updated_count)) // iucn_status_updated_count for {{IUCN status}} updates (List of reptiles of North America)
  {
  if (0 == iucn_template_count)
   {
   if ((0 != plain_text_count) && (plain_text_count == page_doi_skip_count))
    {
    error_log_add ("auto-skipped: doi/page mismatch");
    Skip = true;
    }

   if ((0 != plain_text_count) && (plain_text_count == api_no_cite_return_count))
    {
    error_log_add ("auto-skipped: number of plain-text citations is same as number of API citation nil returns");
    Skip = true;
    }
   }
  if (0 == plain_text_count)
   {
   if ((0 != iucn_template_count) && (iucn_template_count == page_doi_skip_count))
    {
    error_log_add ("auto-skipped: doi/page mismatch");
    Skip = true;
    }

   if ((0 != iucn_template_count) && (iucn_template_count == api_no_cite_return_count))
    {
    error_log_add ("auto-skipped: number of cite IUCN templates is same as number of API citation nil returns");
    Skip = true;
    }
   }
  }

 if ("" == ArticleText)              // trap to see if the 'blanked' pages that sometimes occur are the fault of this script
  {
  error_log_add ("auto-skipped: ArticleText is empty string");   // error message
  Skip = true;               // force a skip
  }

 if (0 != error_log_list.Count)
  log_errors (ArticleTitle, error_log_list);
 return ArticleText;
 }


//===========================<< S U P P O R T >>==============================================================

//---------------------------< N A M E D _ S T A T U S _ R E F _ D U P _ R E M O V E >------------------------
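//
// finds named <ref> definitions whose content duplicates the |status_ref= {{cite IUCN}} template, first in the
// taxobox and then in the article text.  For each duplicate found: every self-closed <ref name=... /> tag that
// uses the duplicate's name is renamed to the |status_ref= self-closed tag, then the duplicate definition itself
// is replaced with that self-closed tag
//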

//private string named_status_ref_dup_remove (ref string text, string taxobox_status_ref_pattern, string taxobox_status_ref_sc_tag)
// {
// Match dup_match = Regex.Match (text, @"\<[Rr][Ee][Ff]\s*name\s*=\s*""?([^""\>]+)""?\>\s*" + taxobox_status_ref_pattern + @"\s*\</[Rr][Ee][Ff]\>");
// if (dup_match.Success)
//  {
//  string name = dup_match.Groups[1].Value;             // get the reference's name from <ref name=...> tag
//  string ref_tag_replace_pattern = @"\<[Rr][Ee][Ff]\s*name\s*=\s*""""?" + name + @"""""?\s*\>"; // make a <ref name=... > pattern from name
//  string sc_replace_pattern = @"\<[Rr][Ee][Ff]\s*name\s*=\s*""""?" + name + @"""""?\s*/\>"; // make a self-closed <ref name=... /> pattern from name

//  text = Regex.Replace (text, sc_replace_pattern, taxobox_status_ref_sc_tag);     // replace any <ref name=... /> with <ref name="iucn status <date> /> sc tag
//  text = counted_replace (text, ref_open_tag_named + @"\s*" + taxobox_status_ref_pattern + @"\s*" + ref_close_tag, taxobox_status_ref_sc_tag, ref duplicates_removed_count); // now remove any duplicates

//  return sc_replace_pattern;
//  }
// return null;
// }

private void named_status_ref_dup_remove (ref string article_text, ref string taxobox, string taxobox_status_ref_pattern, string taxobox_status_ref_sc_tag)
 {
 Match dup_match;
 string name = null;
 string ref_tag_replace_pattern = null;
 string sc_replace_pattern = null;

 dup_match = Regex.Match (taxobox, @"\<[Rr][Ee][Ff]\s*name\s*=\s*""?([^""\>]+)""?\>\s*" + taxobox_status_ref_pattern + @"\s*\</[Rr][Ee][Ff]\>");
 while (dup_match.Success)
  {
  name = Regex.Escape (dup_match.Groups[1].Value);           // get the reference's name from <ref name=...> tag; escaped so regex metacharacters in the name match literally

  ref_tag_replace_pattern = @"\<[Rr][Ee][Ff]\s*name\s*=\s*""?" + name + @"""?\s*\>";     // make a <ref name=... > pattern from name; quotes optional to agree with the dup_match pattern, else an unquoted name is never replaced and this loop never ends
  sc_replace_pattern = @"\<[Rr][Ee][Ff]\s*name\s*=\s*""?" + name + @"""?\s*/\>";      // make a self-closed <ref name=... /> pattern from name

  taxobox = Regex.Replace (taxobox, sc_replace_pattern, taxobox_status_ref_sc_tag);      // replace any <ref name=... /> with <ref name="iucn status <date>" /> sc tag
  article_text = Regex.Replace (article_text, sc_replace_pattern, taxobox_status_ref_sc_tag);    // replace any <ref name=... /> with <ref name="iucn status <date>" /> sc tag

  taxobox = counted_replace (taxobox, ref_tag_replace_pattern + @"\s*" + taxobox_status_ref_pattern + @"\s*" + ref_close_tag, taxobox_status_ref_sc_tag, ref duplicates_removed_count); // now remove any duplicates

  dup_match = Regex.Match (taxobox, @"\<[Rr][Ee][Ff]\s*name\s*=\s*""?([^""\>]+)""?\>\s*" + taxobox_status_ref_pattern + @"\s*\</[Rr][Ee][Ff]\>");
  }

 dup_match = Regex.Match (article_text, @"\<[Rr][Ee][Ff]\s*name\s*=\s*""?([^""\>]+)""?\>\s*" + taxobox_status_ref_pattern + @"\s*\</[Rr][Ee][Ff]\>");
 while (dup_match.Success)
  {
  name = Regex.Escape (dup_match.Groups[1].Value);           // get the reference's name from <ref name=...> tag; escaped so regex metacharacters in the name match literally

  ref_tag_replace_pattern = @"\<[Rr][Ee][Ff]\s*name\s*=\s*""?" + name + @"""?\s*\>";     // make a <ref name=... > pattern from name
  sc_replace_pattern = @"\<[Rr][Ee][Ff]\s*name\s*=\s*""?" + name + @"""?\s*/\>";      // make a self-closed <ref name=... /> pattern from name; quotes optional to agree with the dup_match pattern

  article_text = Regex.Replace (article_text, sc_replace_pattern, taxobox_status_ref_sc_tag);    // replace any <ref name=... /> with <ref name="iucn status <date>" /> sc tag
  taxobox = Regex.Replace (taxobox, sc_replace_pattern, taxobox_status_ref_sc_tag);      // replace any <ref name=... /> with <ref name="iucn status <date>" /> sc tag

  article_text = counted_replace (article_text, ref_tag_replace_pattern + @"\s*" + taxobox_status_ref_pattern + @"\s*" + ref_close_tag, taxobox_status_ref_sc_tag, ref duplicates_removed_count); // now remove any duplicates

  dup_match = Regex.Match (article_text, @"\<[Rr][Ee][Ff]\s*name\s*=\s*""?([^""\>]+)""?\>\s*" + taxobox_status_ref_pattern + @"\s*\</[Rr][Ee][Ff]\>");
  }
 }


//---------------------------< H I D E _ T A X O B O X _ S T A T U S _ R E F >--------------------------------
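//
// hides the {{cite IUCN}} template in the |status_ref= definition so that duplicate removal does not replace
// the definition itself with a self-closed tag; returns <taxobox> unchanged when the definition is not found
//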

private string hide_taxobox_status_ref (string taxobox, string taxobox_status_ref_open_tag, string taxobox_status_ref_pattern)
 {
 Match dup_match = Regex.Match (taxobox, "(" + taxobox_status_ref_open_tag +")(" + taxobox_status_ref_pattern + ")"); // look for and capture |status_ref= definition
 if (dup_match.Success)
  {
  string hidden_status_ref = hide (dup_match.Groups[2].Value, IS_TAXOBOX);       // spoof to hide {{cite IUCN}} in |status_ref=
  return Regex.Replace (taxobox, "(" + taxobox_status_ref_open_tag +")(" + taxobox_status_ref_pattern + ")", "$1" + hidden_status_ref); // replace with the hidden definition
  }

 return taxobox;
 }




//---------------------------< I U C N   P L A I N - T E X T   R E F E R E N C E   U P D A T E >--------------
//
// updates plain-text IUCN citations; uses the taxon-id form of the API call only.  Plain-text citations must be
// wrapped in <ref ...>...</ref> tags
//
// known issues:
//  because this attempts to locate 'correct' plain-text citations and because any non-template and non-
//  wikilink text is plain text, plain text inside <ref ...>...</ref> that is not part of the actual IUCN
//  citation will be treated as part of the citation and will be replaced with the {{cite IUCN}} template
//  if the API returns a citation for the taxon id.
//
//  does not update plain-text references in the taxobox (|status_ref= handled above); example: [[Picea abies]]
//

private string plain_text_ref_update (string text, string article_title)
 {
 if (Regex.Match (text, plain_text_ref_pattern).Success)       // must have the form <ref ...>plain text</ref>; the match is constrained because the whole article is plain text
  text = Regex.Replace (text, plain_text_ref_pattern,
   delegate(Match match)
    {
    string plain_text = match.Groups[0].Value;       // this will be returned if no changes

    string taxon_id = plain_text_taxon_id_get (plain_text);   // attempt to get taxon id
    if (null == taxon_id)
     return plain_text;            // no taxon id so abandon

    if (is_plain_text_rejected (plain_text))       // returns true when plain_text is rejected
     return plain_text;

    string ref_open = match.Groups[1].Value.Trim();     // the opening <ref> tag
    string ref_close = match.Groups[3].Value.Trim();     // the closing </ref> tag

    plain_text_count++;             // bump total number of plain-text references found

    string api_url = api_id_url + taxon_id + api_token;     // build the url from its various parts
    string cite_iucn = cite_iucn_get (api_url, null, article_title, taxon_id, null); // go build a {{cite IUCN}} template from the api

    if (null == cite_iucn)
     return plain_text;            // template build failed

    plain_text_modified_count++;
    return ref_open + cite_iucn + ref_close;
    });

 return text;
 }


//---------------------------< T A X O B O X _ G E T >--------------------------------------------------------
//
// gets the {{taxobox}} or {{speciesbox}} template from <article_text>
//

private string taxobox_get (string article_text)
 {
 Match match = Regex.Match (article_text, taxobox_template_pattern);   // match once; null when no taxobox found
 return match.Success ? match.Groups[0].Value : null;
 }


//---------------------------< T A X O B O X _ U P D A T E >--------------------------------------------------
//
// updates |status=, |status_system=, and |status_ref= parameters; returns false when there is no taxobox or
// when the update is abandoned; true otherwise (including when nothing needed to change)
//

private bool taxobox_update (ref string taxobox, ref string article_text, string article_title)
 {
 if (null == taxobox)                // if no taxobox
  return false;

 taxobox_blank = Regex.Replace (taxobox, taxobox_template_pattern, "$1$3");

 taxobox = Regex.Replace (taxobox, stray_dot, "$1");         // delete stray . because I found one such
 taxobox = Regex.Replace (taxobox, stray_splat, "$1");        // delete stray * because I found one such
 taxobox = Regex.Replace (taxobox, stray_equal, "$1");        // delete stray = because I found one such
 taxobox = Regex.Replace (taxobox, stray_nbsp, "$1");        // delete stray &nbsp; because I found one such
 taxobox = Regex.Replace (taxobox, html_comment, "$1");        // and html comments (Euconocephalus remotus)

 string taxobox_status_val = null;
 string taxobox_status_system_val = null;
 string taxobox_status_ref_val = null;
 string taxobox_status_ref_type = null;
 string taxobox_status_ref_name = null;            // original name from <ref name="original name"> or <ref name="original name" />
 bool taxobox_status_ref_is_empty = false;
 string taxobox_status_date = null;
 int  taxobox_status_date_diff = 100;
 string taxobox_species_name_val = null;

 string api_status_val = null;
 string api_status_system_val = null;

 taxobox_species_name_val = taxobox_species_name_get (taxobox, article_title);  // get species name from taxobox or article title
 if (api_species_data_get (taxobox_species_name_val, ref api_status_val, ref api_status_system_val, article_title))
  {                        // when here presume that we can also get citation data from api
  taxobox_status_val = taxobox_status_get (taxobox);
  taxobox_status_system_val = taxobox_system_get (taxobox);

  if ((((null != taxobox_status_val) && is_iucn_status (taxobox_status_val)) ||     // has a value that is an IUCN status or
   ((null != taxobox_status_system_val) && is_iucn_system (taxobox_status_system_val))) ||  // has a value that is an IUCN system or
   ((null == taxobox_status_val) && (null == taxobox_status_system_val)))      // both are missing or empty
    {
    taxobox_status_update (ref taxobox, api_status_val, taxobox_status_val);
    taxobox_system_update (ref taxobox, api_status_system_val, taxobox_status_system_val);
    }
  else
   return false;

  taxobox_status_ref_val = taxobox_status_ref_get (taxobox, ref taxobox_status_ref_is_empty);

  if (null != taxobox_status_ref_val)
   {
   if (Regex.Match (taxobox_status_ref_val, amended_text).Success)
    {
    error_log_add ("taxobox_update(): plain-text |status_ref= has amended text");
    return false;
    }

   if (Regex.Match (taxobox_status_ref_val, errata_text).Success)
    {
    error_log_add ("taxobox_update(): plain-text |status_ref= has errata text");
    return false;
    }

   if (Regex.Match (taxobox_status_ref_val, @"__P1P3__\s*(?:errata|amends)\s*=\s*\d{4}").Success)
    {
    error_log_add ("taxobox_update(): |status_ref= citation has |errata= or |amends= parameter");
    return false;
    }
   }

  taxobox_status_ref_type = taxobox_status_ref_type_get (taxobox_status_ref_val, ref taxobox_status_ref_name);

  string api_url = null;

  if (("named" == taxobox_status_ref_type) || ("unnamed" == taxobox_status_ref_type) || (null == taxobox_status_ref_type))
   {
   if (null != taxobox_status_ref_val)
    {
    taxobox_status_date = taxobox_status_date_get (taxobox_status_ref_val, taxobox_status_ref_name);
    taxobox_status_date_diff = taxobox_status_date_diff_get (taxobox_status_date);
    }

   if (6 < taxobox_status_date_diff)
    {
    api_url = api_name_url + taxobox_species_name_val + api_token;   // build citation url from its various parts
    taxobox_status_ref = cite_iucn_get (api_url, null, article_title, null, taxobox_species_name_val); // go build a {{cite IUCN}} template from the api

    if (null == taxobox_status_ref)
     return false;              // template build failed

    new_ref_tags_make (taxobox_status_ref, ref taxobox_status_ref_sc_tag, ref taxobox_status_ref_open_tag);

    if (null == taxobox_status_ref_val)          // if empty or missing
     {
     if (taxobox_status_ref_is_empty)
      {
      taxobox = Regex.Replace (taxobox, taxobox_status_ref_empty_pattern, "$1" + taxobox_status_ref_open_tag + taxobox_status_ref + "</ref>$2");
      status_ref_added = true;
      }
     else                // here when |status_ref= is missing
      {
      taxobox = Regex.Replace (taxobox, taxobox_new_stat_sys_ref_pattern, "$1$2|status_ref=" + taxobox_status_ref_open_tag + taxobox_status_ref + "</ref>$2$3");
      status_ref_added = true;
      }
     }
    else
     {
     taxobox = Regex.Replace (taxobox, taxobox_status_ref_pattern, "$1" + taxobox_status_ref_open_tag + taxobox_status_ref + "</ref>");
     if ("named" == taxobox_status_ref_type)        // go rename all of the self-closed ref tags in article text and in the taxobox
      {
      article_text = Regex.Replace (article_text, sc_ref_tag_begin + taxobox_status_ref_name + sc_ref_tag_end, taxobox_status_ref_sc_tag);
      taxobox = Regex.Replace (taxobox, sc_ref_tag_begin + taxobox_status_ref_name + sc_ref_tag_end, taxobox_status_ref_sc_tag);
      }

     status_ref_updated = true;
     }
    }
   else
    status_ref_current = true;
   }
  else if ("named_sc" == taxobox_status_ref_type)
   {
   if (Regex.Match (article_text, ref_def_begin + taxobox_status_ref_name + ref_def_end).Success)
    {
    taxobox_status_ref_val = Regex.Match (article_text, ref_def_begin + taxobox_status_ref_name + ref_def_end).Groups[0].Value;
    taxobox_status_ref_val = unhide (taxobox_status_ref_val);
    taxobox_status_date = taxobox_status_date_get (taxobox_status_ref_val, taxobox_status_ref_name);
    taxobox_status_date_diff = taxobox_status_date_diff_get (taxobox_status_date);

    if (6 < taxobox_status_date_diff)
     {
     api_url = api_name_url + taxobox_species_name_val + api_token;   // build citation url from its various parts
     taxobox_status_ref = cite_iucn_get (api_url, null, article_title, null, taxobox_species_name_val); // go build a {{cite IUCN}} template from the api

     if (null == taxobox_status_ref)
      return false;              // template build failed

     new_ref_tags_make (taxobox_status_ref, ref taxobox_status_ref_sc_tag, ref taxobox_status_ref_open_tag);

                       // replace original definition with new sc ref tag
     article_text = Regex.Replace (article_text, ref_def_begin + taxobox_status_ref_name + ref_def_end, taxobox_status_ref_sc_tag);

                       // replace original |status_ref= sc ref tag with new definition
     taxobox = Regex.Replace (taxobox, taxobox_status_sc_ref_pattern, "$1" + taxobox_status_ref_open_tag + taxobox_status_ref + "</ref>");

                       // rename original sc ref tags
     article_text = Regex.Replace (article_text, sc_ref_tag_begin + taxobox_status_ref_name + sc_ref_tag_end, taxobox_status_ref_sc_tag);
     taxobox = Regex.Replace (taxobox, sc_ref_tag_begin + taxobox_status_ref_name + sc_ref_tag_end, taxobox_status_ref_sc_tag);

     status_ref_updated = true;
     }
    }

   else
    error_log_add ("taxobox_update(): no definition for: " + code_nowiki (taxobox_status_ref_val));
   }
  else
   {
   error_log_add ("taxobox_update(): no " + code_nowiki ("|status_ref="));
   }
  }

 else                          // here when binomial is not recognized by iucn
  {
  if (null != taxobox_species_name_val)
   {
   taxobox_status_val = taxobox_status_get (taxobox);             // if either of these then add a maintenance category and ...
   taxobox_status_system_val = taxobox_system_get (taxobox);           // ... save unrecognized binomial for edit summary only when ...

   if ((((null != taxobox_status_val) && is_iucn_status (taxobox_status_val)) ||      // ... |status= has a value that is an IUCN status or
    ((null != taxobox_status_system_val) && is_iucn_system (taxobox_status_system_val))) ||   // |status_system= has a value that is an IUCN system or
    ((null == taxobox_status_val) && (null == taxobox_status_system_val)))       // both are missing or empty (example: Barlow's lark)
     {
     unrecognized_species_name = Uri.UnescapeDataString (taxobox_species_name_val);    // remove percent encoding
     string cat_plus_name = "[[Category:Taxobox binomials not recognized by IUCN]]" + " <!-- " + unrecognized_species_name + " -->";

     MatchCollection matches = Regex.Matches (article_text, @"__WL1NK_O__[Cc]ategory:.+__WL1NK_C__"); // find all of the categories

     if (0 != matches.Count)         // non-zero when categories found
      {
      Match last_category = matches[matches.Count - 1];   // add our category after the last one; Insert() so that the category text is not treated as a regex pattern
      article_text = article_text.Insert (last_category.Index + last_category.Length, "\n" + cat_plus_name);
      }
     else             // here when no categories; look for stub templates
      {
      matches = Regex.Matches (article_text, @"__0P3N__.+\-stub__CL0S3__"); // find all of the stub templates
      if (0 != matches.Count)        // non-zero when stub templates found
       article_text = article_text.Insert (matches[0].Index, cat_plus_name + "\n\n"); // add our category ahead of the first stub template; Insert() so that the template text is not treated as a regex pattern
      else            // here when no categories and no stub templates
       article_text = article_text + '\x0A' + cat_plus_name; // no cats and no stub templates, add to the end
      }

     // binomial may not be recognized for a global assessment but is recognized for a regional assessment;
     // this script cannot know which region so cannot use the regional form of the citation API call:
     //  /api/v3/species/citation/:name/region/:region_identifier?token='YOUR TOKEN'
     // binomial may be recognized in iucn search box (as a redirect-like name) but that is not available
     // to the API (and if it were probably shouldn't be used)
     }
   }
  }
 taxobox = unhide (taxobox);
 article_text = Regex.Replace (article_text, taxobox_template_pattern, taxobox_blank); // install a blank so that we don't spend time evaluating the citation in |status_ref=
 return true;
 }


//---------------------------< N E W _ R E F _ T A G S _ M A K E >--------------------------------------------
//
// makes self-closed and normal <ref> tags for new |status_ref= {{cite IUCN}} reference using |access-date= from
// the {{cite IUCN}} template
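//
// worked example (illustrative; assumes the access_date pattern captures the date text): a template holding
// |access-date=29 September 2021 yields
//  self-closed tag: <ref name="iucn status 29 September 2021" />
//  opening tag:     <ref name="iucn status 29 September 2021">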
//

private void new_ref_tags_make (string cite_iucn, ref string new_self_closed_tag, ref string taxobox_status_ref_open_tag)
 {
 string date = Regex.Match (cite_iucn, access_date).Groups[1].Value.Trim();  // date from new {{cite IUCN}} |access-date=
 new_self_closed_tag = @"<ref name=""iucn status " + date + @""" />";   // make a version to replace short-form ref tags that need to be renamed
 taxobox_status_ref_open_tag = @"<ref name=""iucn status " + date + @""">";  // make a version for |status_ref=
 }


//---------------------------< T A X O B O X _ S T A T U S _ G E T >------------------------------------------
//
// gets value assigned to {{taxobox}} or {{speciesbox}} |status= parameter; returns that value; status validation
// is done by calling function; returns null if |status= is missing or empty.
//

private string taxobox_status_get (string taxobox_template)
 {
 if (!Regex.Match (taxobox_template, taxobox_status_missing).Success || Regex.Match (taxobox_template, taxobox_status_empty).Success)
  return null;               // |status= is missing or empty

 return Regex.Match (taxobox_template, taxobox_status_value).Groups[2].Value.Trim();
 }


//---------------------------< I S _ I U C N _ S T A T U S >--------------------------------------------------
//
// return true if <status> is known IUCN category; false else
//

private bool is_iucn_status (string status)
 {
 if (null == status)
  return false;

 return Regex.Match (status, IS_IUCN_STATUS).Success;
 }


//---------------------------< T A X O B O X _ S T A T U S _ U P D A T E >------------------------------------
//
// updates, adds, or confirms |status= in taxobox using value from iucn API
//

private void taxobox_status_update (ref string taxobox, string api_status_val, string taxobox_status_val)
 {
 if (null == api_status_val)          // did api return species data with IUCN category?
  return;

 if (!Regex.Match (taxobox, taxobox_status_missing).Success)  // if |status= not in taxobox
  {
  taxobox = Regex.Replace (taxobox, taxobox_new_stat_sys_ref_pattern, "$1$2|status=" + api_status_val + "$2$3");
  status_added = true;
  }
 else if (api_status_val != taxobox_status_val)
  {
  taxobox = Regex.Replace (taxobox, taxobox_status_pattern, "$1" + api_status_val + "$2");
  iucn_status_updated_count++;
  }
 else               // here when <api_status_val> == <taxobox_status_val>
  iucn_status_confirmed_count++;        // bump the confirmed count and done
 }


//---------------------------< T A X O B O X _ S Y S T E M _ G E T >------------------------------------------
//
// gets value assigned to {{taxobox}} or {{speciesbox}} |status_system= parameter; returns that value; status_system
// validation is done by calling function; returns null if |status_system= is missing or empty.
//

private string taxobox_system_get (string taxobox_template)
 {
 if (!Regex.Match (taxobox_template, taxobox_system_missing).Success || Regex.Match (taxobox_template, taxobox_system_empty).Success)
  return null;               // |status_system= is missing or empty

 return Regex.Match (taxobox_template, taxobox_system_value).Groups[2].Value.Trim();
 }


//---------------------------< I S _ I U C N _ S Y S T E M >--------------------------------------------------
//
// return true if <system> is known IUCN status system; false else
//

private bool is_iucn_system (string system)
 {
 if (null == system)
  return false;

 return Regex.Match (system, IS_IUCN_SYSTEM).Success;
 }


//---------------------------< T A X O B O X _ S Y S T E M _ U P D A T E >------------------------------------
//
// updates, adds, or confirms |status_system= in taxobox using value from iucn API
//

private void taxobox_system_update (ref string taxobox, string api_status_system_val, string taxobox_status_system_val)
 {
 if (null == api_status_system_val)        // did api return species data with IUCN status system?
  return;

 if (!Regex.Match (taxobox, taxobox_system_missing).Success)  // if |status_system= not in taxobox
  {
  taxobox = Regex.Replace (taxobox, taxobox_new_stat_sys_ref_pattern, "$1$2|status_system=" + api_status_system_val + "$2$3");
  status_system_added = true;
  }

 else if (api_status_system_val != taxobox_status_system_val)
  {
  taxobox = Regex.Replace (taxobox, taxobox_system_pattern, "$1" + api_status_system_val + "$2");
  iucn_status_system_updated_count++;
  }
 }


//---------------------------< T A X O B O X _ S T A T U S _ R E F _ G E T >----------------------------------
//
// gets value assigned to {{taxobox}} or {{speciesbox}} |status_ref= parameter; returns that value; ref tags,
// ref name, and reference text extracted by calling function
//

private string taxobox_status_ref_get (string taxobox, ref bool taxobox_status_ref_is_empty)
 {
 if (!Regex.Match (taxobox, taxobox_status_ref_missing).Success)
  return null;               // |status_ref= is missing

 if (Regex.Match (taxobox, taxobox_status_ref_empty).Success)
  {
  taxobox_status_ref_is_empty = true;
  return null;               // |status_ref= is empty
  }

 return Regex.Match (taxobox, taxobox_status_ref_value).Groups[2].Value.Trim();
 }


//---------------------------< T A X O B O X _ S T A T U S _ R E F _ T Y P E _ G E T >------------------------
//
// look at opening <ref> tag and return its type (order of evaluation is important here):
//  <ref> returns 'unnamed'
//  <ref ... name = .../> returns 'named_sc'
//  <ref ... name = ...> returns 'named'
// if none of these, or <taxobox_status_ref_val> is null, returns null
//

private string taxobox_status_ref_type_get (string taxobox_status_ref_val, ref string taxobox_status_ref_name)
 {
 if (null == taxobox_status_ref_val)
  return null;

 if (Regex.Match (taxobox_status_ref_val, ref_tag_unnamed_pattern).Success)
  return "unnamed";

 if (Regex.Match (taxobox_status_ref_val, ref_tag_named_sc_pattern).Success)  // order here important; named_sc test before named test
  {
  taxobox_status_ref_name = Regex.Match (taxobox_status_ref_val, ref_tag_named_sc_pattern).Groups[2].Value.Trim();
  return "named_sc";
  }

 if (Regex.Match (taxobox_status_ref_val, ref_tag_named_pattern).Success)  // order here important; named test after named_sc test
  {
  taxobox_status_ref_name = Regex.Match (taxobox_status_ref_val, ref_tag_named_pattern).Groups[2].Value.Trim();
  return "named";
  }

 return null;           // should never get here
 }


//---------------------------< T A X O B O X _ S T A T U S _ D A T E _ G E T >--------------------------------
//
// attempt to get date of last status update from ref tag (<ref name="iucn status 29 September 2021">) or from
// |access-date= value
//

private string taxobox_status_date_get (string taxobox_status_ref_val, string taxobox_status_ref_name)
 {
 if ((null != taxobox_status_ref_name) && Regex.Match (taxobox_status_ref_name, preferred_status_ref_tag_name).Success)
  return Regex.Match (taxobox_status_ref_name, preferred_status_ref_tag_name).Groups[1].Value.Trim();

 taxobox_status_ref_val = unhide (taxobox_status_ref_val);

 if (Regex.Match (taxobox_status_ref_val, access_date).Success)
  return Regex.Match (taxobox_status_ref_val, access_date).Groups[1].Value.Trim(); // date from |access-date=

 return null;
 }


//---------------------------< T A X O B O X _ S T A T U S _ D A T E _ D I F F _ G E T >----------------------
//
// return the difference in months between today's date and a date from the |status_ref= <ref> tag or from the
// |status_ref= citation's |access-date=
//
// script will not update |status_ref= if date difference is less than 7 months
//
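// worked example (a sketch; assumes <months>, defined elsewhere, maps lower-case month names to numbers so
// that months["september"] is 9): for the date '29 September 2021' evaluated in March 2022, the difference is
//  ((2022 - 2021) * 12) + 3 - 9 = 6
// which is less than 7 so |status_ref= is left alone; in April 2022 the same date gives 7.  The day of the
// month is ignored.
//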

private int taxobox_status_date_diff_get (string date)
 {
 if (null == date)
  {
 // error_log_add ("taxobox_status_date_diff_get(): nil date value; forcing update"); // not really an error
  return 100;           // any value greater than 6 forces citation update attempt
  }

 int  current_month = DateTimeOffset.Now.Month;
 int  current_year = DateTimeOffset.Now.Year;

 string month = null;
 string year = null;

 foreach(KeyValuePair<string, string> date_pattern in date_patterns)
  {
  Match match = Regex.Match (date, date_pattern.Value);
  if (match.Success)
   {
   if ("ymd" == date_pattern.Key)     // because year precedes month, Group[1] and Group[2] are ordered differently
    {
    month = match.Groups[2].Value.Trim().ToLower();
    year = match.Groups[1].Value.Trim();
    }
   else           // here when dmy or mdy
    {
    month = match.Groups[1].Value.Trim().ToLower();
    year = match.Groups[2].Value.Trim();
    }
   }
  }

 if ((null == month) || (null == year))
  {
  error_log_add ("taxobox_status_date_diff_get(): month and/or year null; forcing update");
  error_log_add ("year: " + year);
  error_log_add ("month: " + month);
  return 100;           // any value greater than 6 forces citation update attempt
  }

 if (months.ContainsKey (month))
  return ((current_year - Int32.Parse(year)) * 12) + current_month - months[month];
 else
  {
  error_log_add ("taxobox_status_date_diff_get(): month not recognized: " + month + "; forcing update");
  return 100;
  }
 }


//---------------------------< T A X O B O X _ S P E C I E S _ N A M E _ G E T >------------------------------
//
// attempts to get binomial from various parameters in {{taxobox}} or {{speciesbox}} and failing that the article
// title.
//
// taxobox: |binomial= -> |name= -> article title
// speciesbox: |taxon= -> |genus= + |species= -> |name= -> article title
//
// returns null when <name> is not binomial-like (two words); example [[Africanogyrus]]
//

private string taxobox_species_name_get (string taxobox, string article_title)
 {
 string template_name = Regex.Match (taxobox, taxobox_template_pattern).Groups[2].Value.ToLower();   // capture is the template name (Taxobox, Speciesbox, etc)

 string name = null;            // name of this species from various possible parameters in the taxobox template

 if ("taxobox" == template_name)
  {
  if (Regex.Match (taxobox, binomial_pattern).Success)
   name = Regex.Match (taxobox, binomial_pattern).Groups[1].Value.Trim();  // use |binomial=
  else if (Regex.Match (taxobox, name_pattern).Success)
   name = Regex.Match (taxobox, name_pattern).Groups[1].Value.Trim();   // fallback to |name=
  }
 else if ("speciesbox" == template_name)
  {
  if (Regex.Match (taxobox, taxon_pattern).Success)
   name = Regex.Match (taxobox, taxon_pattern).Groups[1].Value.Trim();   // use |taxon=
  else if (Regex.Match (taxobox, genus_pattern).Success && Regex.Match (taxobox, species_pattern).Success)
   name = Regex.Match (taxobox, genus_pattern).Groups[1].Value.Trim() + " " + Regex.Match (taxobox, species_pattern).Groups[1].Value.Trim();
  else if (Regex.Match (taxobox, name_pattern).Success)
   name = Regex.Match (taxobox, name_pattern).Groups[1].Value.Trim();   // fallback to |name=
  }

 if (null == name)             // when none of the above
  {
  name = article_title;           // TODO: don't use article title?
  error_log_add ("using article title");
  }

 name = species_name_cleanup (name);         // remove markup, extinction markers, disambiguation, etc

 if (!Regex.Match (Uri.UnescapeDataString (name), @"[A-Za-z]+ [A-Za-z]+").Success) // does <name> look like a binomial?
  {
  error_log_add ("name not a binomial: " + name);
  return null;
  }

 return name;
 }


//---------------------------< T A X O N _ I D _ F R O M _ O L D _ F O R M _ U R L _ G E T >------------------
//
// loops through a series of old-form IUCN urls and returns the taxon id if the pattern matches; null else
//

private string taxon_id_from_old_form_url_get (string text)
 {
 foreach (string url_pattern in url_patterns)      // loop through a series of old-form url patterns
  {
  Match url_match = Regex.Match (text, url_pattern);
  if (url_match.Success)           // if found
   return url_match.Groups[1].Value.Trim();     // extract and return the taxon id
  }
 return null;
 }


//---------------------------< P L A I N _ T E X T _ T A X O N _ I D _ G E T >--------------------------------
//
// extract taxon id from IUCN page, doi, or url.  For plain-text citations, accept any form of iucn url when
// attempting to get the taxon id; prefer page -> doi -> url; returns taxon id if available, null else
//

private string plain_text_taxon_id_get (string plain_text)
 {
 if (Regex.Match (plain_text, plain_text_page_taxon_id).Success)  // get taxon id from page?
  return Regex.Match (plain_text, plain_text_page_taxon_id).Groups[1].Value;

 if (Regex.Match (plain_text, plain_text_doi_taxon_id).Success)  // get taxon id from doi?
  return Regex.Match (plain_text, plain_text_doi_taxon_id).Groups[1].Value;

 if (Regex.Match (plain_text, plain_text_taxon_id_url).Success)  // get taxon id from url?
  return Regex.Match (plain_text, plain_text_taxon_id_url).Groups[1].Value;

 return null;              // couldn't find taxon id; might not be iucn reference
 }


//---------------------------< I S _ P L A I N _ T E X T _ R E J E C T E D >----------------------------------
//
// evaluates <plain_text> looking for things that oughtn't to be there or that are not currently supported
// returns true when <plain_text> is rejected; false else
//

private bool is_plain_text_rejected (string plain_text)
 {
 if (Regex.Match (plain_text, @"\{\{\s*[Cc]it[ae]").Success)   // if 'plain text' has {{cit...}} template
  {
 // error_log_add ("is_plain_text_rejected(): plain-text has cite template: " + plain_text); // don't do this because it alarms on valid cite IUCN templates
  return true;             // skip this reference
  }

 if (Regex.Match (plain_text, amended_text).Success)
  {
  error_log_add ("is_plain_text_rejected(): plain-text has amended text");
  return true;             // because API doesn't yet identify amended assessment year
  }

 if (Regex.Match (plain_text, errata_text).Success)
  {
  error_log_add ("is_plain_text_rejected(): plain-text has errata text");
  return true;             // because API doesn't yet identify errata assessment year
  }

 return false;
 }


//---------------------------< S P E C I E S _ N A M E _ C L E A N U P >--------------------------------------
//
// removes stuff that isn't part of the binomial; returns name modified or not.
//

private string species_name_cleanup (string name)
 {
 name= Regex.Replace (name, "__4ng13_0__", "<");      // unhide html comments that might be part of <name>
 name= Regex.Replace (name, "__4ng13_C__", ">");

 foreach (string [] cleanup_pattern in cleanup_patterns)
  name = Regex.Replace (name, cleanup_pattern[0], cleanup_pattern[1]);

 name = name.Trim();             // and remove any leading/trailing whitespace
 name = Uri.EscapeDataString (name);         // percent encode uri reserved characters

 return name;
 }


//---------------------------< C I T E _ I U C N _ G E T >----------------------------------------------------
//
// creates {{cite IUCN}} template from api call.  Tries <first_url> first; if that returns a citation,
// <second_url> is ignored, else tries <second_url>.  Returns null if a fetch fails outright.
//

private string cite_iucn_get (string first_url, string second_url, string ArticleTitle, string taxon_id, string species_name)
 {
 string citation_from_api = null;
 string raw_citation = null;

 if ((null == first_url) && (null == second_url))
  return null;

 var urls = new List<string>();
  urls.Add (first_url);
  urls.Add (second_url);

 foreach (string url in urls)
  {
  if (null != url)
   {
   citation_from_api = api_fetch (url, ArticleTitle);    // fetch citation from the IUCN API

   if (null == citation_from_api)
    return null;

   if (Regex.Match (citation_from_api, citation_from_api_pattern).Success)
    {
    raw_citation = Regex.Match (citation_from_api, citation_from_api_pattern).Groups[1].Value.Trim();
    break;
    }
   }
  }

 if (null == raw_citation)            // <raw_citation> must have a value
  {
  string text = "cite_iucn_get(): API did not return citation:";
  if (null != taxon_id)
   text = text + " id: " + taxon_id;
  if (null != species_name)
   text = text + " name: " + species_name;

  text = text + " " + code_nowiki (citation_from_api);

  error_log_add (text);
  api_no_cite_return_count++;
  return null;
  }

 string author_list = "";
 string date = "";
 string title = "";
 string volume = "";
 string page = "";
 string page_assessment = "";
 string doi = "";
 string doi_assessment = "";
 string access_date = "";

 Match parse = Regex.Match (raw_citation, parse_pattern);
 if (parse.Success)
  {
  author_list = author_names_get (parse.Groups[1].Value.Trim());
  date = @" |date=" + parse.Groups[2].Value.Trim();
  title = title_get (parse.Groups[3].Value.Trim());
  volume = @" |volume=" + parse.Groups[4].Value.Trim();
  page = @" |page=" + parse.Groups[5].Value.Trim();
  page_assessment = parse.Groups[6].Value.Trim();
  doi = @" |doi=" + parse.Groups[7].Value.Trim();
  doi_assessment = parse.Groups[8].Value.Trim();
  access_date = @" |access-date=" + parse.Groups[9].Value.Trim();
  }
 else
  {
  error_log_add ("cite_iucn_get(): parse failure: " + code_nowiki (citation_from_api));
  parse_fail_count++;
  return null;
  }

 if (page_assessment != doi_assessment)      // until errata date information available from the API
  {
  error_log_add ("cite_iucn_get(): doi/page mismatch: page assessment: " + code_nowiki (parse.Groups[5].Value.Trim()));
  page_doi_skip_count++;         // skip template when page- and doi-assessment ids are mismatched
  return null;
  }

 return @"{{cite IUCN" + author_list + date + title + volume + page + doi + access_date + @"}}";
 }


//---------------------------< A P I _ S P E C I E S _ D A T A _ G E T >--------------------------------------
//
// using taxon name, attempt to get species data from the IUCN API.
//

private bool api_species_data_get (string taxobox_species_name_val, ref string api_status_val, ref string api_status_system_val, string article_title)
 {
 if (null == taxobox_species_name_val)             // when taxobox_species_name_get() can't get a binomial-like name
  return false;

 string api_url = api_species_url + taxobox_species_name_val + api_token;    // build a url from its various parts (taxon name)
 string species_from_api = api_fetch (api_url, article_title);       // fetch species data from the IUCN API (taxon name)

 if (null == species_from_api)               // if the api call failed
  return false;                  // abandon

 if (Regex.Match (species_from_api, status_from_api_pattern).Success)     // update <api_status_val> from api return
  api_status_val = Regex.Match (species_from_api, status_from_api_pattern).Groups[1].Value;

 if (Regex.Match (species_from_api, status_system_from_api_pattern).Success)    // update <api_status_system_val> from api return
  {
  int year = Int32.Parse (Regex.Match (species_from_api, status_system_from_api_pattern).Groups[1].Value); // convert to an integer
  api_status_system_val = ((2000 < year) ? "IUCN3.1" : "IUCN2.3");     // and then convert to the appropriate status system
  }

 if ((null == api_status_val) || (null == api_status_system_val))      // if either of these are null, declare an error
  {
  error_log_add ("api_species_data_get(): API did not return species data: " + code_nowiki (species_from_api));
  api_no_species_return_name_count++;
  return false;                  // and abandon
  }

 return true;
 }


//---------------------------< A P I _ F E T C H >------------------------------------------------------------
//
// calls the iucn api with <api_url>; returns raw data string on success; null else.  Bumps the api call counter
//
//

private string api_fetch (string api_url, string ArticleTitle)
 {
 if (0 < api_call_count)            // pause here for 3 seconds if <api_call_count> is greater than 0 (pause is skipped for the first api access)
  System.Threading.Thread.Sleep (3000);       // this prevents us from banging on the API too quickly

 api_call_count++;             // bump the call counter
 string string_from_api = null;

 try
  {
  // this WebRequest code courtesy of en.wiki editor User:DavidBrooks
  System.Net.HttpWebRequest webRequest = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(api_url);
  webRequest.UserAgent = "Wikipedia IUCN citation update experiment (https://en.wikipedia.org/wiki/User:Trappist_the_monk)";
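  using (System.Net.WebResponse response = webRequest.GetResponse())   // dispose of the response and reader when done
  using (System.IO.StreamReader reader = new System.IO.StreamReader (response.GetResponseStream()))
   string_from_api = reader.ReadToEnd();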
  }
 catch
  {
  error_log_add ("api_fetch(): Exception occurred reading: " + code_nowiki (api_url));
  api_fetch_fail_count++;
  return null;
  }

 return string_from_api;
 }


//---------------------------< A U T H O R _ N A M E S _ G E T >----------------------------------------------
//
// attempts to extract individual author names from iucn api citation.  Derived from [[Module:cite IUCN]] function
// make_cite_iucn()
//
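// illustration (author names here are made up): the raw list
//  Smith, J., Jones, A.B. & Brown, C.
// should come back as
//   |author=Smith, J. |author2=Jones, A.B. |author3=Brown, C.
// and a trailing parenthetical such as '(Some Specialist Group)' would be moved to |collaboration=
//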

private string author_names_get (string raw_author_list)
 {
 string collaboration = null;
 string pattern = @"(,\s+[A-Z]),";             // for when iucn forgets to include final dot
 raw_author_list = Regex.Replace (raw_author_list, pattern, "$1" + ".,");
 pattern = @"(\.[A-Z]),";               // for when iucn forgets to include final dot
 raw_author_list = Regex.Replace (raw_author_list, pattern, "$1" + ".,");

 pattern = @"\s\(([^\)]+)\)$";

 if (Regex.Match (raw_author_list, pattern).Success)
  {
  collaboration = Regex.Match (raw_author_list, pattern).Groups[1].Value.Trim(); // save the collaboration name
  raw_author_list = Regex.Replace (raw_author_list, pattern, "");     // remove collaboration from raw_author_list
  }

 raw_author_list = Regex.Replace (raw_author_list, @"\.?,?\s+&\s+", ".|");   // replace <opt. dot><opt. comma><space><ampersand><space> with <dot><pipe>
 raw_author_list = Regex.Replace (raw_author_list, @"\.,\s+", ".|");     // replace <dot><comma><space> with <dot><pipe>
 raw_author_list = Regex.Replace (raw_author_list, @"(\.[A-Z]),\s+", "$1.|");  // special case where iucn drops the dot after an initial

 string   author_list = "";
 string[] authors = Regex.Split (raw_author_list, @"\|");       // split the string on the <pipe>
 int   i = 1;

 foreach (string author in authors)
  {
  if (1 == i)
   author_list = author_list + " |author" + "=" + author;      // don't enumerate first author
  else
   author_list = author_list + " |author" + i + "=" + author;
  i++;
  }

 if (null != collaboration)
  author_list = author_list + " |collaboration=" + collaboration;

 return author_list;
 }


//---------------------------< T I T L E _ G E T >------------------------------------------------------------
//
// extracts title from iucn API citation; attempts to add markup so that it renders correctly
//
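// illustration (taxon names chosen only as examples): with the patterns in <search_and_replaces>,
//  Panthera leo ssp. persica
// should come back as
//   |title=''Panthera leo'' ssp. ''persica''
// and a plain binomial such as 'Panthera leo' matches none of the patterns so falls through to
//   |title=''Panthera leo''
//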

private string title_get (string raw_title)
 {
 string title = null;              // formatted title goes here
 string errata = "";               // errata year, if present, goes here; empty string for concatenation
 string amends = "";               // amends year, if present, goes here; empty string for concatenation
 string pattern = null;
 string replace = null;

 foreach (string[] search_and_replace in search_and_replaces)
  {
  pattern = search_and_replace[0];
  replace = search_and_replace[1];          // replace includes wiki markup for title
  if (Regex.Match (raw_title, pattern).Success)
   {
   title = Regex.Replace (raw_title, pattern, replace);
   break;
   }
  }

 if (null == title)
  {
  title = "''" + raw_title + "''";          // pattern not found apply italic markup to raw_title from API citation
 // error_log_add ("title_get(): using raw title: " + raw_title);   // not really an error
  }

 pattern = errata_text;              // look for an errata string; as of 2021-10-01, errata string not available in API citation
 Match match = Regex.Match (title, pattern);
 if (match.Success)
  errata = " |errata=" + match.Groups[1].Value.Trim();

 pattern = amended_text;              // look for an amended string; as of 2021-10-01, amended string not available in API citation
 match = Regex.Match (title, pattern);
 if (match.Success)
  amends = " |amends=" + match.Groups[1].Value.Trim();

 return " |title=" + title + errata + amends;
 }


//---------------------------< H I D E >----------------------------------------------------------------------
//
// HIDE TEMPLATES: find templates that are not <dont_hide>; replace the opening {{ with __0P3N__, the closing }}
// with __CL0S3__, and internal | (pipes) with __P1P3__
//
// single curly braces in urls and other parameter values can confuse other regex in this code so replace {
// with __0CU!21Y__ and } with __CCU!21Y__
//

private string hide (string ArticleText, string dont_hide)
 {
 string pattern = @"\{\{(?!\s*" + dont_hide + @")[^\{\}]*\}\}";
 if (Regex.Match (ArticleText, pattern).Success)
  {
  ArticleText = Regex.Replace(ArticleText, pattern,
   delegate(Match match)
    {
    string fixed_template;          // a hidden template is assembled here
    string raw_template = match.Groups[0].Value;    // the whole template

    pattern = @"\{\{";           // hide the opening {{
    fixed_template = Regex.Replace (raw_template, pattern, "__0P3N__");

    pattern = @"\}\}";           // hide the closing }}
    fixed_template = Regex.Replace (fixed_template, pattern, "__CL0S3__");

    pattern = @"\|";           // and hide the pipes
    fixed_template = Regex.Replace (fixed_template, pattern, "__P1P3__");

    return fixed_template;
    });
  }

 pattern = @"(\<!\-{2,}\s*[^\>\|\}]*)\{\{(\s*" + dont_hide + @"[^\}]*)\}\}([^\>]*\-{2,}\>)";  // <!-- {{citx...}} -->
 ArticleText = Regex.Replace(ArticleText, pattern, "$1__0P3N__$2__CL0S3__$3");

 pattern = @"\{\|";              // open table markup
 ArticleText = Regex.Replace(ArticleText, pattern, "__0T4BL3__");

 pattern = @"\|\}(?!\})";            // close table markup
 ArticleText = Regex.Replace(ArticleText, pattern, "__CT4BL3__");

 pattern = @"([^\{])\{([^\{])";           // single opening curly brace
 ArticleText = Regex.Replace(ArticleText, pattern, "$1__0CU!21Y__$2");

 pattern = @"([^\}])\}([^\}])";           // single closing curly brace
 ArticleText = Regex.Replace(ArticleText, pattern, "$1__CCU!21Y__$2");

 pattern = @"\[\[(?![Ff]ile|[Ii]mage)([^\|\]]+)\|([^\]]+)\]\]";   // HIDE complex wikilinks: [[article title|label]] to __WL1NK_O__article title__P1P3__label__WL1NK_C__
 ArticleText = Regex.Replace(ArticleText, pattern, "__WL1NK_O__$1__P1P3__$2__WL1NK_C__"); // [[File: with wikilinks inside can be confusing

 pattern = @"\[\[([^\]]+)\]\]";           // HIDE simple wikilinks: [[article title]] to __WL1NK_O__article title__WL1NK_C__
 ArticleText = Regex.Replace(ArticleText, pattern, "__WL1NK_O__$1__WL1NK_C__");

 return ArticleText;
 }


//---------------------------< U N H I D E >------------------------------------------------------------------
//
// UNHIDE TEMPLATES: find templates and wikilinks that are hidden; replace the 'hide' keywords with the
// appropriate wiki markup
//

private string unhide (string ArticleText)
 {
 ArticleText = Regex.Replace(ArticleText, @"__WL1NK_O__", "[[");  // UNHIDE: replace __WL1NK_O__ with [[
 ArticleText = Regex.Replace(ArticleText, @"__WL1NK_C__", "]]");  // UNHIDE: replace __WL1NK_C__ with ]]
 ArticleText = Regex.Replace(ArticleText, @"__P1P3__", "|");   // UNHIDE: replace __P1P3__ with |

 ArticleText = Regex.Replace(ArticleText, @"__0T4BL3__", "{|");  // UNHIDE: replace __0T4BL3__ with {|
 ArticleText = Regex.Replace(ArticleText, @"__CT4BL3__", "|}");  // UNHIDE: replace __CT4BL3__ with |}

 ArticleText = Regex.Replace(ArticleText, @"__0CU!21Y__", "{");  // UNHIDE: replace __0CU!21Y__ with {
 ArticleText = Regex.Replace(ArticleText, @"__CCU!21Y__", "}");  // UNHIDE: replace __CCU!21Y__ with }

 ArticleText = Regex.Replace(ArticleText, @"__0P3N__", "{{");  // UNHIDE: replace __0P3N__ with {{
 ArticleText = Regex.Replace(ArticleText, @"__CL0S3__", "}}");  // UNHIDE: replace __CL0S3__ with }}

 return ArticleText;
 }


//---------------------------< S U M M A R Y _ C O N C A T >--------------------------------------------------
//
// concatenates text onto an existing edit summary string, limiting the string to a length of no more than 347
// characters.  When <summary> appended with <text> would be longer than the allowed 347 character limit, this
// function replaces <text> with an ellipsis.  Once an ellipsis is added, no more <text> can be added to <summary>
//
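// for example: a 340-character <summary> and a 10-character <text> give 340 + 10 + 3 = 353, which is more
// than 347, so <summary> gets the ellipsis; with a 4-character <text> the sum is exactly 347 and <text> is
// appended
//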

private string summary_concat (string summary, string text)
 {
 if (0 <= summary.IndexOf ("..."))         // if ellipsis already present in <summary>, abandon
  return summary;

 if (347 >= (summary.Length + text.Length + 3))      // if adding <text> to summary will not overrun the 347 char limit (+ 3 leaves room for a later ellipsis)
  return summary + text;           // append <text> to <summary> and done

 return summary + "...";            // append ellipsis instead
 }


//---------------------------< C O D E _ N O W I K I >--------------------------------------------------------
//
// wraps 'text' in <code><nowiki>text</nowiki></code> tags for error log
//

private string code_nowiki (string text)
 {
 return "<code><nowiki>" + text + "</nowiki></code>";
 }


//---------------------------< E R R O R _ L O G _ A D D >----------------------------------------------------
//
// adds an error message to the error log list.  Probably superfluous.
//

private void error_log_add (string message)
 {
 error_log_list.Add (message);
 }


//---------------------------< L O G _ E R R O R S >----------------------------------------------------------
//
// writes the content of the error log list to the log file, prettified with wiki markup.
//

private void log_errors (string article_title, List<string> error_log_list)
 {
 System.IO.StreamWriter sw;
 string time = DateTimeOffset.Now.ToString("u").Substring (11, 9);
 string date = DateTimeOffset.Now.ToString("u").Substring (0, 10);

 string log_file = @"Z:\Wikipedia\AWB\Monkbot_tasks\Monkbot_task_19_cite_iucn_update\logs\" + date + ".txt";

 sw = System.IO.File.AppendText (log_file);
 sw.WriteLine ("*[[" + article_title + "]] (" + time + "):");

 foreach (string list_item in error_log_list)
  sw.WriteLine ("*:" + list_item);

 error_log_list.Clear();

 sw.Close();
 }


//---------------------------< C O U N T E D _ R E P L A C E >------------------------------------------------
//
// common function to replace <pattern> with <replace> and bump <count> until no more <pattern>
//

private string counted_replace (string template, string pattern, string replace, ref int count)
 {
 Regex rgx = new Regex (pattern);           // make a new regex from <pattern>

 while (Regex.Match (template, pattern).Success)        // look for <pattern> in <template>
  {
  template = rgx.Replace (template, replace, 1);       // replace one copy of <pattern> with <replace>
  count++;                // bump the counter
  }

 return template;
 }


//===========================<< S T A T I C   D A T A >>======================================================

static bool  status_added = false;     // set to true when |status= created in taxobox

static int  plain_text_modified_count = 0;   // number of plain-text citations that were modified from the iucn api
static int  plain_text_count = 0;     // total number of plain-text iucn references

static int  api_call_count = 0;      // number of api calls made; this value not reported in edit summary
static int  api_fetch_fail_count = 0;    // number of api fetches that failed
static int  api_no_cite_return_count = 0;   // number of times that the api returned a non-citation value like: {"value":"0","species":"202965"}
static int  parse_fail_count = 0;     // number of times that we couldn't parse the api return
static int  page_doi_skip_count = 0;    // number of templates or plain-text references skipped because page and doi assessment ID mismatch (could be errata but since no errata date ...)
static int  api_no_species_return_name_count = 0; // number of times that the api returned a non-species value (species name)
static int  api_no_species_return_id_count = 0;  // number of times that the api returned a non-species value (species id for {{IUCN status}})
static int  iucn_status_updated_count = 0;   // number of times that we updated the iucn status in taxobox-like templates
static int  iucn_status_confirmed_count = 0;  // number of times that we confirmed the iucn status in taxobox-like templates
static int  iucn_status_system_updated_count = 0; // number of times that we updated the iucn status system in taxobox-like templates

static string taxobox_blank = null;     // gets blank taxobox as flag
static bool  status_ref_added = false;    // set to true when |status_ref= created
static bool  status_system_added = false;   // set to true when |status_system= created
static bool  status_ref_updated = false;    // set to true when |status_ref= updated
static bool  status_ref_current = false;    // set to true when |status_ref= less than 7 months old
static int  duplicates_removed_count = 0;   // number of duplicate status references removed


static string sc_ref_tag_begin = @"\<[Rr][Ee][Ff]\s*name\s*=\s*""?"; // these for taxobox |status_ref= handling
static string sc_ref_tag_end = @"""?\s*/\>";

static string ref_def_begin = @"\<[Rr][Ee][Ff]\s*name\s*=\s*""?";  // these for taxobox |status_ref= <ref name=... /> handling to locate the matching definition
static string ref_def_end = @"""?\s*\>[^\<]*\</[Rr][Ee][Ff]\>";

static string reflist_cleanup = @"(\{\{\s*[Rr]eflist[^\}]*\|\s*refs\s*=[^\}]*)\<\s*[Rr][Ee][Ff][^\>]*/\>";

static string hide_non_ref_tag_pattern = @"\<((?!/[Rr][Ee][Ff]|[Rr][Ee][Ff])[^\>]*)\>";
static string angle_open = "__4ng13_0__";
static string angle_close = "__4ng13_C__";
static string hide_non_ref_replace_val = angle_open + "$1" + angle_close;

static int  iucn_template_count = 0;    // total number of cite IUCN templates
static int  other_template_count = 0;    // total number of cite journal/web templates


//---------------------------< A P I >------------------------------------------------------------------------

static string api_species_url = "http://apiv3.iucnredlist.org/api/v3/species/"; // for fetching species data from the api by name
static string api_species_id_url = api_species_url + "id/";      // for fetching species data from the api by taxon id (for {{IUCN status}})
static string api_id_url = api_species_url + "citation/id/";      // for fetching citation data from the api using taxon id
static string api_name_url = api_species_url + "citation/";      // for fetching citation data from the api using binomial

static string iucn_api_token_file = @"Z:\Wikipedia\AWB\Monkbot_tasks\Monkbot_task_19_cite_iucn_update\iucn_api_token"; // token required to be private; stored locally here
static string api_token = null;             // stored at iucn_api_token_file


//---------------------------< C I T E   I U C N >------------------------------------------------------------

 static string IS_CITE_IUCN = @"(?:[Cc]ite iucn|[Cc]ite IUCN)";
 static string iucn_template_pattern = @"\{\{\s*" + IS_CITE_IUCN + @"[^\}]+\}\}";    // basic cite IUCN template pattern
 static string iucn_title = @"\|\s*title\s*=([^\|\}]*)";          // everything in cite IUCN |title= for api calls

 static string[] url_patterns = new string[]
  {
  @"https?://www\.iucnredlist\.org/details/(\d+)/\b(?:all|full)",
  @"https?://www\.iucnredlist\.org/details/full/(\d+)/\d+",
  @"https?://www\.iucnredlist\.org/details/(\d+)/\d+",
  @"https?://www\.iucnredlist\.org/details/(\d+)/?",
  @"https?://www\.iucnredlist\.org/details/summary/(\d+)",
  @"https?://www\.iucnredlist\.org/search/details\.php/(\d+)/(?:all|summ)",
  @"https?://oldredlist\.iucnredlist\.org/details/(\d+)/\d+",
  };

 static string ref_param_empty = @"\|\s*ref\s*=\s*([\|\}])";
 static string ref_param_not_empty = @"\|\s*ref\s*=\s*([^\|\}]+)";


//---------------------------< C I T E   J O U R N A L / W E B >----------------------------------------------

 static string IS_CITE_OTHER = @"(?:[Cc]ite journal|[Cc]ite web)";  // TODO: expand this to include more redirects?
 static string other_template_pattern = @"\{\{\s*" + IS_CITE_OTHER + @"[^\}]+\}\}";    // basic cite journal/web template pattern



//---------------------------< N E W   C I T E   I U C N >----------------------------------------------------
//
// parse_pattern doesn't work for citations like this (from [[Cantleya]]) because of the 'extra' year ahead of
// the binomial:
//  Asian Regional Workshop (Conservation & Sustainable Management of Trees, Viet Nam, August 1996) 1998. Cantleya corniculata. The IUCN Red List of Threatened Species 1998: e.T33197A9760751. https://dx.doi.org/10.2305/IUCN.UK.1998.RLTS.T33197A9760751.en .Downloaded on 1 October 2021
//
// Haven't seen enough of these to attempt a second parse pattern
//

//static string citation_from_api_pattern = @"\[\{""citation"":""([^""]*)""\}\]";
static string citation_from_api_pattern = @"\[\{""citation"":""([^\}]*)""\}\]";
static string parse_pattern = @"(^\D+)(\d{4})\.(\D+)\. The IUCN Red List of Threatened Species (\d{4}): (e\.T\d+A(\d+))\.\D+(10\.2305\/IUCN\.UK\.[\d\-]+\.RLTS\.T\d+A(\d+)\S+)\D+(\d{1,2} [A-Za-z]+ \d{4})";

static string[][] search_and_replaces =
 {
 new string[] {@"(.+?)\sssp\.\s+(.+?)\s(\([^\)]+\))$",  @"''$1'' ssp. ''$2'' $3"},  // binomen ssp. subspecies (zoology) with errata or amended text
 new string[] {@"(.+?)\sssp\.\s+(.+)",      @"''$1'' ssp. ''$2''"},   // binomen ssp. subspecies (zoology)
 new string[] {@"(.+?)\ssubsp\.\s+(.+?)\s(\([^\)]+\))$",  @"''$1'' subsp. ''$2'' $3"}, // binomen subsp. subspecies (botany) with errata or amended text
 new string[] {@"(.+?)\ssubsp\.\s+(.+)",      @"''$1'' subsp. ''$2''"},  // binomen subsp. subspecies (botany)
 new string[] {@"(.+?)\svar\.\s+(.+?)\s+(\([^\)]+\))$",  @"''$1'' var. ''$2'' $3"},  // binomen var. variety (botany) with errata or amended text
 new string[] {@"(.+?)\svar\.\s+(.+)",      @"''$1'' var. ''$2''"},   // binomen var. variety (botany)
 new string[] {@"(.+?)\ssubvar\.\s+(.+?)\s(\([^\)]+\))$", @"''$1'' subvar. ''$2'' $3"}, // binomen subvar. subvariety (botany) with errata or amended text
 new string[] {@"(.+?)\ssubvar\.\s+(.+)",     @"''$1'' subvar. ''$2''"},  // binomen subvar. subvariety (botany)
 new string[] {@"(.+?)\s*(\([^\)]+\))$",      @"''$1'' $2"}     // binomen with errata or amended text
 };

static string errata_text = @"\(errata version published in (\d{4})\)";
static string amended_text = @"\(amended version of (\d{4}) assessment\)";


//---------------------------< T A X O B O X >----------------------------------------------------------------

static string HIDE_ALL_BUT_TAXOBOX = @"(?:[Tt]axobox\s*\||[Ss]peciesbox\s*\|)";       // this to prevent confusion with {{Taxobox authority}} when hiding
static string IS_TAXOBOX = @"(?:[Tt]axobox|[Ss]peciesbox)";            // for hiding all non-taxobox-like templates
static string taxobox_template_pattern = @"(\{\{\s*(" + IS_TAXOBOX + @"))[^\}]+(\}\})";     // basic taxobox-like template pattern; TODO: {{subspeciesbox}}?
static string taxobox_blank_pattern = @"\{\{\s*" + IS_TAXOBOX + @"\}\}";

static string taxobox_new_stat_sys_ref_pattern = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]+?)(\s*)(\}\})";  // used to create new |status=, |status_system=, and |status_ref= params in taxobox
static string taxobox_status_ref_pattern = @"(\|\s*status_ref\s*=\s*)(\<ref[^\>]*\>)[^\<]*(\</ref\>)"; // used to replace |status_ref= param in taxobox
static string taxobox_status_ref_empty_pattern = @"(\|\s*status_ref\s*=[ \t]*)([\r\n]*[\|\}])";   // used to add reference to |status_ref= param in taxobox

static string taxobox_status_sc_ref_pattern = @"(\|\s*status_ref\s*=\s*)(\<[Rr][Ee][Ff][^\>]+/\>)";  // used to replace |status_ref= param in taxobox

static string taxobox_status_ref = null;                 // the 'new' value for |status_ref
static string taxobox_status_ref_open_tag = null;               // its matching ref open tag
static string taxobox_status_ref_sc_tag = null;               // and its matching self-closed tag

static string stray_dot = @"(\|\s*status_ref\s*=\s*)\.";             // delete stray dot; because I found one such (Astroblepus pholeter)
static string stray_splat = @"(\|\s*status_ref\s*=\s*)\*";            // delete stray splat; because I found one such (Gray short-tailed bat)
static string stray_equal = @"(\|\s*status_ref\s*=\s*)=";             // delete stray equal; because I found one such (Cyprinus hieni)
static string stray_nbsp = @"(\|\s*status_ref\s*=\s*)&nbsp;";            // delete stray &nbsp; because I found one such (Euconocephalus remotus)
static string html_comment = @"(\|\s*status_ref\s*=[^\|\}]*)\<!\-\-[^\>]*\-\-\>";       // and html comments
static string unrecognized_species_name = null;               // gets taxobox species name that IUCN doesn't recognize


//---------------------------< T A X O B O X _ S T A T U S >--------------------------------------------------

static string IS_IUCN_STATUS = @"(\b(?:LC|LR/lc|NT|LR/nt|LR/cd|VU|EN|CR|PE|PEW|EW|EX|DD|NE)\b)";   // also used with {{IUCN status}}

static string taxobox_status_missing = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status\s*=";
static string taxobox_status_empty = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status\s*=\s*([\|\}])";
static string taxobox_status_value = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status\s*=\s*([^\|\}]+)";

static string taxobox_status_pattern = @"(\|\s*status\s*=\s*)[^\|\}]*?(\s*[\|\}])";

static string status_from_api_pattern = @"""category"":""([^""]+)""";    // for |status=


//---------------------------< T A X O B O X _ S Y S T E M >--------------------------------------------------

static string IS_IUCN_SYSTEM = @"(\b(?:IUCN2\.3|IUCN3\.1)\b)";   // dots escaped so that '.' matches only a literal dot

static string taxobox_system_missing = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status_system\s*=";
static string taxobox_system_empty = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status_system\s*=\s*([\|\}])";
static string taxobox_system_value = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status_system\s*=\s*([^\|\}]+)";

static string taxobox_system_pattern = @"(\|\s*status_system\s*=\s*)[^\|\}]*([^\|\}])";

static string status_system_from_api_pattern = @"""assessment_date"":""(\d+)"; // for |status_system=


//---------------------------< T A X O B O X _ S T A T U S _ R E F >------------------------------------------

static string taxobox_status_ref_missing = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status_ref\s*=";
static string taxobox_status_ref_empty = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status_ref\s*=\s*([\|\}])";
static string taxobox_status_ref_value = @"(\{\{\s*" + IS_TAXOBOX + @"[^\}]*)\|\s*status_ref\s*=\s*([^\|\}]+)";

static string ref_tag_named_pattern = @"(\<[Rr][Ee][Ff][^\>]*name\s*=\s*""?([^""\>]*)""?\s*\>)";
static string ref_tag_named_sc_pattern = @"(\<[Rr][Ee][Ff][^\>]*name\s*=\s*""?([^""/]*)""?\s*/\s*\>)";
static string ref_tag_unnamed_pattern = @"(\<[Rr][Ee][Ff]\>)";


//---------------------------< T A X O B O X _ S P E C I E S _ N A M E >--------------------------------------

static string binomial_pattern = @"\|\s*binomial\s*=\s*([^\|\}]*)";    // taxobox

static string taxon_pattern = @"\|\s*taxon\s*=\s*([^\|\}]*)";      // speciesbox
static string genus_pattern = @"\|\s*genus\s*=\s*([^\|\}]*)";      // these two combined to make binomial name
static string species_pattern = @"\|\s*species\s*=\s*([^\|\}]*)";

static string name_pattern = @"\|\s*name\s*=\s*([^\|\}]*)";      // taxobox and speciesbox


//---------------------------< D A T E S >--------------------------------------------------------------------

static Dictionary<string, string> date_patterns = new Dictionary<string, string>()
 {
 {"dmy", @"\d{1,2}\s+([JFMASOND][a-z]+)\s+(\d{4})"},  // dmy
 {"mdy", @"([JFMASOND][a-z]+)\s+\d{1,2}\s*,\s+(\d{4})"}, // mdy
 {"ymd", @"(\d{4})\-(\d{2})\-\d{2}"}      // ymd
 };

static string preferred_status_ref_tag_name = @"iucn status (\d{1,2}\s+([JFMASOND][a-z]+)\s+(\d{4}))";
static string access_date = @"\|access\-?date=([^\|\}]+)";

static Dictionary<string, int> months = new Dictionary<string, int>()
 {
 {"january", 1},          // these for dmy and mdy
 {"february", 2},
 {"march", 3},
 {"april", 4},
 {"may", 5},
 {"june", 6},
 {"july", 7},
 {"august", 8},
 {"september", 9},
 {"october", 10},
 {"november", 11},
 {"december", 12},
 {"jan", 1},           // these for dmy and mdy
 {"feb", 2},
 {"mar", 3},
 {"apr", 4},
// {"may", 5},           // same as whole month name; can't have two with the same key
 {"jun", 6},
 {"jul", 7},
 {"aug", 8},
 {"sep", 9},
 {"oct", 10},
 {"nov", 11},
 {"dec", 12},
 {"01", 1},           // these for ymd
 {"02", 2},
 {"03", 3},
 {"04", 4},
 {"05", 5},
 {"06", 6},
 {"07", 7},
 {"08", 8},
 {"09", 9},
 {"10", 10},
 {"11", 11},
 {"12", 12},
 };


//---------------------------< R E M O V E   D U P L I C A T E   S T A T U S   R E F >-------------------------

static string[] symbols = new string[]
 {
 @"\{",
 @"\(",
 @"\|",
 @"\.",
 @"\-",
 @"\)",
 @"\}",
 };

static string ref_open_tag_unnamed = @"\<[Rr][Ee][Ff]\>";
static string ref_open_tag_named = @"\<[Rr][Ee][Ff][^\>]*\>";
static string ref_close_tag = @"\</[Rr][Ee][Ff]>";
static string bib_open_ul = @"[\r\n]+\*\s*";
static string bib_close_ul = @"([\r\n]+)";


//---------------------------< S P E C I E S _ N A M E _ C L E A N U P >--------------------------------------
//
// these things must be removed from binomial before calling the api with the binomial
//

static string[][] cleanup_patterns =
 {
 new string[]  {ref_open_tag_named + @"[^\<]*" + ref_close_tag, ""}, // references; [[Lampadioteuthis]] caused api fetch exception
 new string[]  {@"\<[Rr][Ee][Ff][^\>]+/\>", ""},      // self-closed references; [[Sand cat]]
 new string[] {@"\<!\-\-[^\>]*\-\-\>",  ""},      // html comment
 new string[]  {@"[\.;:]+$",  ""},         // trailing punctuation
 new string[]  {"'''(.+)'''",  "$1"},         // bold wiki markup
 new string[]  {"''(.+)''$",  "$1"},         // italic wiki markup
 new string[]  {@"""",    ""},         // double quote marks
 new string[]  {"†",    ""},         // extinction markers
 new string[]  {@"\[\[",   ""},         // opening wikilink markup
 new string[]  {@"\]\]",   ""},         // closing wikilink markup
 new string[]  {@"\s*\([^\)]+\)", ""},         // disambiguation
 new string[]  {@"[\.;:]+$",  ""},         // trailing punctuation (again)
 new string[]  {@"\<nowiki/\>", ""},         // self-closed <nowiki/> tag
 new string[]  {@"\<nowiki\>",  ""},         // opening <nowiki> tag
 new string[]  {@"\</nowiki\>", ""},         // closing </nowiki> tag
 };


//----------------------------------------< P L A I N _ T E X T >---------------------------------------------
//
// for plaintext references wrapped in <ref>...</ref> tags or in unordered markup (bibliography); must have a
// recognizable page identifier or doi or a url from which a taxon id can be extracted
//

static string plain_text_ref_pattern = @"(\< *ref[^\>]*\>)([^\<]*)(\</ref>)";        // <ref>anything</ref> ref tags and reference are captured
static string plain_text_bib_pattern = @"([\r\n]+\*)([^\r\n]*iucnredlist\.org[^\r\n]*)([\r\n]+)";   // some sort of iucn ref in unordered list

static string plain_text_page_taxon_id = @"\be\.T(\d+)A\d+";            // get taxon id from page
static string plain_text_doi_taxon_id = @"\bRLTS\.T(\d+)A\d+";           // get taxon id from doi
static string plain_text_taxon_id_url = @"https?://(?:www|oldredlist)\.iucnredlist\.org/\S+?/(\d+)\S+"; // get taxon id from url


//---------------------------< I U C N   S T A T U S >--------------------------------------------------------

static string iucn_status_template_pattern = @"(\{\{\s*IUCN status[^\}]+\})";
static string iucn_status_lead = @"(\{\{\s*IUCN status\s*\|\s*)";
static string iucn_status_status = iucn_status_lead + IS_IUCN_STATUS;
static string iucn_status_id = @"(\{\{\s*IUCN status\s*\|[^\|]+\|\s*)(\d+)";


// Monkbot_task_19_cite_iucn_update.cs
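
The search_and_replaces table above is ordered from most specific to most general, which suggests that the first matching pair is meant to win. The following is a minimal sketch of that idea, not the script's actual code: the class TaxonNameSketch, the method Italicize, and the fallback that italicizes the whole name are hypothetical, and 'Aus bus ssp. cus' is a placeholder taxon name. Only three of the table's rows are copied here.

```csharp
using System;
using System.Text.RegularExpressions;

static class TaxonNameSketch
{
    // Three rows copied verbatim from the search_and_replaces table in the script.
    static readonly string[][] search_and_replaces =
    {
        new string[] {@"(.+?)\sssp\.\s+(.+?)\s(\([^\)]+\))$", @"''$1'' ssp. ''$2'' $3"}, // ssp. with errata or amended text
        new string[] {@"(.+?)\sssp\.\s+(.+)",                 @"''$1'' ssp. ''$2''"},    // ssp.
        new string[] {@"(.+?)\s*(\([^\)]+\))$",               @"''$1'' $2"},             // binomen with errata or amended text
    };

    // Hypothetical helper: apply the first pair whose pattern matches, then stop.
    // Assumption: a name that matches no pattern is italicized as a whole.
    public static string Italicize(string name)
    {
        foreach (string[] pair in search_and_replaces)
        {
            if (Regex.IsMatch(name, pair[0]))
                return Regex.Replace(name, pair[0], pair[1]);
        }
        return "''" + name + "''";
    }

    static void Main()
    {
        Console.WriteLine(Italicize("Aus bus ssp. cus"));
        Console.WriteLine(Italicize("Aus bus (errata version published in 2017)"));
        Console.WriteLine(Italicize("Aus bus"));
    }
}
```

Stopping at the first match matters because the general 'ssp.' row would also match a name that carries errata text, and would then pull the parenthetical inside the italic markup.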


This page was last edited on 29 July 2023, at 05:31 (UTC).
