Webarchiv






From Wikipedia, the free encyclopedia
 


Type of site: Digital library
Available in: Czech, English
Founded: 2000
Headquarters: Prague, Czech Republic
Parent: National Library of the Czech Republic
URL: Webarchiv.cz
Launched: 2001

Webarchiv is a digital archive of important Czech web resources (i.e. those published on the Internet), which are collected with the aim of long-term preservation.

Preservation began in 2000, organized with the help of the National Library of the Czech Republic in cooperation with the Moravian Library and the Institute of Computer Science at Masaryk University. Today Webarchiv is run by the National Library of the Czech Republic alone.

Webarchiv uses tools developed by the Internet Archive and the International Internet Preservation Consortium (IIPC), such as Heritrix, for web archiving.[1]

Webarchiv has been a member of the IIPC since 2007.

Types of harvests

The main aim of the Webarchiv project is to implement a comprehensive solution for archiving the national web, i.e. online-born Bohemica (documents relating to the Czech lands). That includes tools and methods for collecting, archiving and preserving web resources, as well as providing long-term access to them. Both large-scale automated harvesting of the entire national web and selective archiving are carried out, including thematic "event-based" collections. At present these methods are being tested and are the subject of further research. To run all operations routinely, two conditions must be met: long-term funding has to be provided and the current legal issues have to be resolved (primarily the legal deposit legislation).[2]

Webarchiv has two collections of archived websites. One is available online; it is a limited dataset whose content is covered by agreements with the original publishers. The second collection can only be accessed in the Library. Under Czech copyright law, online access to an archived website requires either an agreement with the website owner or a Creative Commons licence. Websites without such an agreement are blocked from the online archive and are accessible only from the library terminals.[3]
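The access rule described above can be illustrated with a minimal Python sketch. It is not Webarchiv's actual software: the `ArchivedSite` record, its field names and the `access_level` function are hypothetical and exist only to show the decision between online access and access from library terminals.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    ONLINE = "online"
    LIBRARY_ONLY = "library terminals only"


@dataclass
class ArchivedSite:
    # Hypothetical record; the field names are assumptions for illustration.
    url: str
    has_publisher_agreement: bool = False
    has_creative_commons_licence: bool = False


def access_level(site: ArchivedSite) -> Access:
    """Return where an archived site may be viewed under the rule above."""
    # Online access needs an agreement with the owner or a CC licence.
    if site.has_publisher_agreement or site.has_creative_commons_licence:
        return Access.ONLINE
    # Everything else stays restricted to the library terminals.
    return Access.LIBRARY_ONLY
```

For example, `access_level(ArchivedSite("https://example.cz"))` yields `Access.LIBRARY_ONLY`, because neither condition for online access is set.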

Comprehensive harvests

The main focus of comprehensive crawls is to automatically harvest the largest possible number of Czech web resources. The list of URLs is provided by the organisation CZ.NIC.

Selective harvests

Selective harvests build a collection of manually selected resources of historical, scientific or cultural value. The collection is accessible online thanks to contracts with publishers.

Comprehensive crawls, whose main focus is to automatically harvest the largest possible number of Czech web resources, have the following requirements:

Domain – web resources in the Czech domain (.cz) are collected. Resources in other domains can also be harvested, but they have to meet the optional requirements.

Other requirements are optional:[4]

Format – which resource formats are harvested depends on the technical settings of the harvester[4]

Access – only freely accessible resources are harvested[4]

Number of files – maximum 5000 files from one domain[4]
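The requirements listed above can be sketched as a simple URL filter in Python. This is an illustrative sketch only, not the configuration Webarchiv or Heritrix actually uses. The function names and the `is_freely_accessible` callback are hypothetical, the file limit is counted per hostname as a simplification, and resources outside the .cz domain are skipped rather than checked against the optional requirements.

```python
from collections import Counter
from urllib.parse import urlsplit

MAX_FILES_PER_DOMAIN = 5000  # limit stated in the requirements above


def is_czech_domain(url: str) -> bool:
    """True if the URL's hostname is in the .cz top-level domain."""
    host = urlsplit(url).hostname or ""
    return host.endswith(".cz")


def select_for_harvest(urls, is_freely_accessible, max_files=MAX_FILES_PER_DOMAIN):
    """Keep URLs that pass the domain, access and file-count checks.

    `is_freely_accessible` is a caller-supplied predicate (an assumption
    of this sketch) that says whether a resource is freely accessible.
    """
    per_host = Counter()  # files accepted so far for each hostname
    selected = []
    for url in urls:
        if not is_czech_domain(url):
            continue  # simplification: non-.cz resources are skipped
        if not is_freely_accessible(url):
            continue  # only freely accessible resources are harvested
        host = urlsplit(url).hostname
        if per_host[host] >= max_files:
            continue  # per-domain file limit reached
        per_host[host] += 1
        selected.append(url)
    return selected
```

A caller would pass the seed list together with its own accessibility check, for example `select_for_harvest(seed_urls, lambda url: True)` to apply only the domain and file-count checks.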

Topic harvests

Topic collections are collections of resources related to a certain event or topic, for example elections.

References

  1. "Overview of the WebArchiv project". WebArchiv. Retrieved 18 March 2014.
  2. "About Webarchiv | Webarchiv.cz".
  3. "Frequently Asked Questions | Webarchiv.cz".
  4. "Comprehensive Harvests". Retrieved 31 October 2023.

External links


    This page was last edited on 31 October 2023, at 16:54 (UTC).
