Help:Export






From Wikipedia, the free encyclopedia
 


Wiki pages can be exported in a special XML format, either to import them into another MediaWiki installation or to use them elsewhere, for instance to analyse the content. See also m:Syndication feeds for exporting information other than pages, and Help:Import for importing pages.

How to export

There are at least six ways to export pages:

By default only the current version of a page is included. Optionally you can get all versions with date, time, user name and edit summary.

Additionally, you can copy the SQL database. This is how dumps of the database were made available before MediaWiki 1.5; it is not explained further here.

Using 'Special:Export'

For example, to export all pages of a namespace:

1. Get the names of pages to export

2. Perform the export

and finally...

Now you can use this XML file to perform an import.
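As a rough illustration of the export step, the sketch below builds (but does not send) a POST request for Special:Export. The base URL is only an example, the helper name build_export_request is hypothetical, and the field names pages, curonly and action are assumptions taken from MW:Parameters to Special:Export that should be checked against the wiki you are exporting from.

```python
from urllib.parse import urlencode

# Assumed base URL; replace it with the index.php URL of your own wiki.
INDEX_URL = "https://en.wikipedia.org/w/index.php"


def build_export_request(titles, current_only=True):
    """Return (url, body) for a POST request to Special:Export.

    The field names are assumptions based on MW:Parameters to
    Special:Export; verify them for your MediaWiki version.
    """
    fields = {
        "title": "Special:Export",
        "action": "submit",
        "pages": "\n".join(titles),  # one page title per line
    }
    if current_only:
        fields["curonly"] = "1"  # only the current revision of each page
    return INDEX_URL, urlencode(fields).encode("ascii")


url, body = build_export_request(["Main Page", "Help:Export"])
print(url)
print(body.decode("ascii"))
```

The returned pair can be passed to urllib.request.urlopen(url, data=body), and the response saved to a file for a later import.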

Exporting the full history

A checkbox in the Special:Export interface selects whether to export the full history (all versions of an article) or only the most recent version. A maximum of 1000 revisions is returned per request; further revisions can be requested as detailed in MW:Parameters to Special:Export.
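One way to fetch a history longer than the limit is to request it in batches. The sketch below only prepares the form fields for each batch; it assumes the offset and limit parameters described in MW:Parameters to Special:Export, and it assumes revisions are returned oldest first, so the newest timestamp of one batch serves as the offset of the next. Both helper names are hypothetical.

```python
def history_batch_fields(title, after_timestamp=None, limit=1000):
    """Form fields requesting up to `limit` revisions of one page.

    'offset' and 'limit' are parameter names assumed from
    MW:Parameters to Special:Export; verify them for your wiki.
    """
    fields = {
        "title": "Special:Export",
        "action": "submit",
        "pages": title,
        "limit": str(limit),
    }
    if after_timestamp is not None:
        # Assumed to be exclusive: revisions after this timestamp are returned.
        fields["offset"] = after_timestamp
    return fields


def next_offset(timestamps):
    """Newest timestamp of a batch, or None if the batch was empty.

    ISO 8601 UTC timestamps of equal length sort chronologically as strings.
    """
    return max(timestamps) if timestamps else None


first = history_batch_fields("Page title")
# Timestamps that would be read from the <timestamp> elements of the first batch.
batch = ["2001-01-15T13:10:27Z", "2001-01-15T13:15:00Z"]
second = history_batch_fields("Page title", after_timestamp=next_offset(batch))
print(second["offset"])
```

Repeating this until a batch comes back empty would cover the whole history, under the assumptions stated above.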

Export format

The format of the XML file you receive is the same whichever export method you use. It is codified in an XML Schema at http://www.mediawiki.org/xml/export-0.6.xsd. The format is not intended for viewing in a web browser, though some browsers show pretty-printed XML with "+" and "-" links to expand or collapse selected parts. Alternatively, the XML source can be viewed using the browser's "view source" feature or, after saving the XML file locally, with a program of your choice. If you read the XML source directly, the actual wikitext is not difficult to find. Unless you use a special XML editor, "<" and ">" in the wikitext appear as &lt; and &gt; to avoid a conflict with XML tags; to avoid ambiguity, "&" is coded as &amp;.
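The escaping described above can be reproduced with Python's standard library. This is a small sketch with a made-up piece of wikitext, showing how the three characters are encoded and decoded again.

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical wikitext containing all three special characters.
wikitext = "<ref>Smith & Jones</ref>"

escaped = escape(wikitext)    # the form stored inside the <text> element
restored = unescape(escaped)  # the form an editor would see

print(escaped)   # &lt;ref&gt;Smith &amp; Jones&lt;/ref&gt;
print(restored)  # <ref>Smith & Jones</ref>
```

An XML parser performs the decoding step automatically, so manual unescaping is only needed when the file is processed as plain text.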

In the current version, the export format does not contain an XML replacement for wiki markup (see Wikipedia DTD for an older proposal, or Wiki Markup Language). You get only the wikitext, exactly as it appears when editing the article. (After export, you can use alternative parsers to convert the wikitext to other formats.)

Example

  <mediawiki xml:lang="en">
    <page>
      <title>Page title</title>
      <!-- page namespace code -->
      <ns>0</ns>
      <id>2</id>
      <!-- If the page is a redirect, the "redirect" element gives the title of the target page -->
      <redirect title="Redirect page title" />
      <restrictions>edit=sysop:move=sysop</restrictions>
      <revision>
        <timestamp>2001-01-15T13:15:00Z</timestamp>
        <contributor>
          <username>Foobar</username>
          <id>65536</id>
        </contributor>
        <comment>I have just one thing to say!</comment>
        <text>A bunch of [[text]] here.</text>
        <minor />
      </revision>
      <revision>
        <timestamp>2001-01-15T13:10:27Z</timestamp>
        <contributor><ip>10.0.0.2</ip></contributor>
        <comment>new!</comment>
        <text>An earlier [[revision]].</text>
      </revision>
      <revision>
        <!-- deleted revision example -->
        <id>4557485</id>
        <parentid>1243372</parentid>
        <timestamp>2010-06-24T02:40:22Z</timestamp>
        <contributor deleted="deleted" />
        <model>wikitext</model>
        <format>text/x-wiki</format>
        <text deleted="deleted" />
        <sha1/>
      </revision>
    </page>
    
    <page>
      <title>Talk:Page title</title>
      <revision>
        <timestamp>2001-01-15T14:03:00Z</timestamp>
        <contributor><ip>10.0.0.2</ip></contributor>
        <comment>hey</comment>
        <text>WHYD YOU LOCK PAGE??!!! i was editing that jerk</text>
      </revision>
    </page>
  </mediawiki>
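A small export like the one above can be read in one go. The following sketch uses Python's xml.etree.ElementTree on a shortened copy of the example and returns the newest wikitext of each page; the function name latest_texts is hypothetical. It assumes, as in the example, that the elements carry no XML namespace; a real export declares one on the mediawiki element, in which case the tag names would need the namespace prefix.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example above (no XML namespace declared).
SAMPLE = """<mediawiki xml:lang="en">
  <page>
    <title>Page title</title>
    <revision>
      <timestamp>2001-01-15T13:15:00Z</timestamp>
      <contributor><username>Foobar</username><id>65536</id></contributor>
      <comment>I have just one thing to say!</comment>
      <text>A bunch of [[text]] here.</text>
    </revision>
    <revision>
      <timestamp>2001-01-15T13:10:27Z</timestamp>
      <contributor><ip>10.0.0.2</ip></contributor>
      <comment>new!</comment>
      <text>An earlier [[revision]].</text>
    </revision>
  </page>
</mediawiki>"""


def latest_texts(xml_source):
    """Map each page title to the wikitext of its newest revision."""
    root = ET.fromstring(xml_source)
    result = {}
    for page in root.iter("page"):
        revisions = page.findall("revision")
        if not revisions:
            continue
        # ISO 8601 UTC timestamps sort chronologically as plain strings.
        newest = max(revisions, key=lambda rev: rev.findtext("timestamp", ""))
        result[page.findtext("title")] = newest.findtext("text", "")
    return result


print(latest_texts(SAMPLE))
```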

DTD

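Here is an unofficial, abbreviated Document Type Definition (DTD) of the format. It describes version 0.3, so some elements that appear in the example above, such as ns and redirect, are not declared in it. If you don't know what a DTD is, just ignore it.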

<!ELEMENT mediawiki (siteinfo?,page*)>
<!-- version contains the version number of the format (currently 0.3) -->
<!ATTLIST mediawiki
  version  CDATA  #REQUIRED 
  xmlns CDATA #FIXED "http://www.mediawiki.org/xml/export-0.3/"
  xmlns:xsi CDATA #FIXED "http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation CDATA #FIXED
    "http://www.mediawiki.org/xml/export-0.3/ http://www.mediawiki.org/xml/export-0.3.xsd"
>
<!ELEMENT siteinfo (sitename,base,generator,case,namespaces)>
<!ELEMENT sitename (#PCDATA)>      <!-- name of the wiki -->
<!ELEMENT base (#PCDATA)>          <!-- url of the main page -->
<!ELEMENT generator (#PCDATA)>     <!-- MediaWiki version string -->
<!ELEMENT case (#PCDATA)>          <!-- how cases in page names are handled -->
   <!-- possible values: 'first-letter' | 'case-sensitive'
                         'case-insensitive' option is reserved for future -->
<!ELEMENT namespaces (namespace+)> <!-- list of namespaces and prefixes -->
  <!ELEMENT namespace (#PCDATA)>     <!-- contains namespace prefix -->
  <!ATTLIST namespace key CDATA #REQUIRED> <!-- internal namespace number -->
<!ELEMENT page (title,id?,restrictions?,(revision|upload)*)>
  <!ELEMENT title (#PCDATA)>         <!-- Title with namespace prefix -->
  <!ELEMENT id (#PCDATA)> 
  <!ELEMENT restrictions (#PCDATA)>  <!-- optional page restrictions -->
<!ELEMENT revision (id?,timestamp,contributor,minor?,comment,text)>
  <!ELEMENT timestamp (#PCDATA)>     <!-- according to ISO8601 -->
  <!ELEMENT minor EMPTY>             <!-- minor flag -->
  <!ELEMENT comment (#PCDATA)> 
  <!ELEMENT text (#PCDATA)>          <!-- Wikisyntax -->
  <!ATTLIST text xml:space CDATA  #FIXED "preserve">
<!ELEMENT contributor ((username,id) | ip)>
  <!ELEMENT username (#PCDATA)>
  <!ELEMENT ip (#PCDATA)>
<!ELEMENT upload (timestamp,contributor,comment?,filename,src,size)>
  <!ELEMENT filename (#PCDATA)>
  <!ELEMENT src (#PCDATA)>
  <!ELEMENT size (#PCDATA)>

Processing XML export

Many tools can process the exported XML. If you process a large number of pages (for instance, a whole dump), the document probably will not fit in main memory, so you will need a parser based on SAX or another event-driven method.
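As a minimal sketch of the event-driven approach, the handler below uses Python's built-in xml.sax module to collect page titles and count revisions while reading the input as a stream, so the whole document is never held in memory. The class and function names are hypothetical, and the sample data is made up; for a real dump you would pass an open file instead of the in-memory stream.

```python
import io
import xml.sax


class TitleCounter(xml.sax.ContentHandler):
    """Collects page titles and counts revisions while streaming."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self.revisions = 0
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []
        elif name == "revision":
            self.revisions += 1

    def characters(self, content):
        # Text may arrive in several chunks, so it is buffered until the end tag.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def scan(stream):
    """Parse a file-like object and return the filled handler."""
    handler = TitleCounter()
    xml.sax.parse(stream, handler)
    return handler


SAMPLE = b"""<mediawiki>
  <page><title>Page title</title><revision/><revision/></page>
  <page><title>Talk:Page title</title><revision/></page>
</mediawiki>"""

result = scan(io.BytesIO(SAMPLE))
print(result.titles, result.revisions)
```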

You can also use regular expressions to directly process parts of the XML code. These run fast but are difficult to maintain.
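For comparison, here is a sketch of the regular-expression approach applied to a made-up fragment. It pulls out page titles without parsing the XML, which is quick to write but breaks as soon as the markup differs from what the pattern expects (attributes on the element, line breaks inside it, and so on).

```python
import re
from xml.sax.saxutils import unescape

# Matches the content of each <title> element on a single line.
TITLE_RE = re.compile(r"<title>(.*?)</title>")


def titles_in(xml_text):
    """Return all page titles found in a chunk of export XML."""
    # The regex sees raw XML, so the entities must be decoded by hand.
    return [unescape(raw) for raw in TITLE_RE.findall(xml_text)]


sample = (
    "<page><title>Tom &amp; Jerry</title></page>"
    "<page><title>Talk:Page title</title></page>"
)
print(titles_in(sample))  # ['Tom & Jerry', 'Talk:Page title']
```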

Please list methods and tools for processing XML export here:

Details and practical advice

/mediawiki/siteinfo/namespaces/namespace
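The path above addresses the namespace elements in the siteinfo header. As an illustration, the sketch below reads them into a mapping from namespace number to prefix. The sample header is abbreviated and its values are made up, the function name namespace_prefixes is hypothetical, and, as before, the code assumes that the elements carry no XML namespace.

```python
import xml.etree.ElementTree as ET

# Abbreviated, made-up siteinfo header following the DTD above.
SAMPLE = """<mediawiki version="0.3">
  <siteinfo>
    <sitename>Example wiki</sitename>
    <namespaces>
      <namespace key="0" />
      <namespace key="1">Talk</namespace>
      <namespace key="2">User</namespace>
    </namespaces>
  </siteinfo>
</mediawiki>"""


def namespace_prefixes(xml_source):
    """Map each internal namespace number to its prefix ('' if it has none)."""
    root = ET.fromstring(xml_source)
    # The path is relative to the root element, /mediawiki.
    return {
        int(ns.get("key")): ns.text or ""
        for ns in root.findall("siteinfo/namespaces/namespace")
    }


print(namespace_prefixes(SAMPLE))  # {0: '', 1: 'Talk', 2: 'User'}
```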

See also

Help desk

Wikipedia-specific help


 



This page was last edited on 6 July 2023, at 04:46 (UTC).

Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.


