Sketch Engine






 


Original author(s): Adam Kilgarriff, Pavel Rychlý
Developer(s): Lexical Computing CZ s.r.o.
Initial release: 23 July 2003[1]
Written in: Go, JavaScript, jQuery, C++, Python
Operating system: Linux, Mac OS X
Platform: IA-32, x64 or IA-64
Standard(s): Unicode
Available in: 11 languages (Arabic, Crimean Tatar, Czech, English, French, German, Irish, Italian, Nko, Spanish, Ukrainian)
Type: Corpus manager for 90+ languages, database management system
License: Proprietary software; both commercial and freeware editions are available
Website: www.sketchengine.eu

Sketch Engine is a corpus manager and text analysis software developed by Lexical Computing since 2003. Its purpose is to enable people studying language behaviour (lexicographers, researchers in corpus linguistics, translators or language learners) to search large text collections using complex, linguistically motivated queries. Sketch Engine takes its name from one of its key features, word sketches: one-page, automatic, corpus-derived summaries of a word's grammatical and collocational behaviour.[2] It supports and provides corpora in over 90 languages.[3]

History of development

Sketch Engine is a product of Lexical Computing, a company founded in 2003 by the lexicographer and research scientist Adam Kilgarriff.[4] He started a collaboration with Pavel Rychlý, a computer scientist working at the Natural Language Processing Centre, Masaryk University,[5] and the developer of Manatee and Bonito (two major parts of the software suite). Kilgarriff also introduced the concept of word sketches.

Since then, Sketch Engine has been commercial software. However, all the core features of Manatee and Bonito that were developed by 2003 (and extended since then) are freely available under the GPL within the NoSketch Engine suite.[6]

Features

A list of tools available in Sketch Engine:

Keywords and terminology extraction

Sketch Engine can perform automatic term extraction by identifying words typical of a particular corpus, document, or text. Single words and multi-word units can be extracted from monolingual or bilingual texts. The terminology extraction feature provides a list of relevant terms based on comparison with a large corpus of general language. This functionality is also available as a separate service called OneClick Terms with a dedicated interface.[8]
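The comparison against a general-language corpus can be illustrated with a small sketch. The article does not state how terms are scored, so the formula used here (frequency per million in the focus corpus divided by frequency per million in the reference corpus, with an additive smoothing constant) is an assumption chosen for illustration, and the function names and toy corpora are hypothetical.

```python
from collections import Counter


def per_million(count: int, corpus_size: int) -> float:
    """Normalise a raw count to a frequency per million tokens."""
    return count * 1_000_000 / corpus_size


def keyness(focus_tokens, reference_tokens, smoothing=1.0):
    """Rank focus-corpus words by relative frequency against a reference corpus.

    The ratio with additive smoothing is an assumed, illustrative formula.
    """
    focus = Counter(focus_tokens)
    reference = Counter(reference_tokens)
    n_focus, n_ref = len(focus_tokens), len(reference_tokens)
    scores = {}
    for word, count in focus.items():
        fpm_focus = per_million(count, n_focus)
        fpm_ref = per_million(reference[word], n_ref)  # 0 if the word is absent
        scores[word] = (fpm_focus + smoothing) / (fpm_ref + smoothing)
    # Highest score first: words most typical of the focus corpus.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data standing in for a specialised text and a general corpus.
focus_corpus = "the corpus query returns the concordance of the corpus".split()
reference_corpus = "the cat sat on the mat and the dog saw the cat".split()
ranked = keyness(focus_corpus, reference_corpus)
```

In this toy run, a word such as "corpus", which is frequent in the focus text and absent from the reference text, ranks at the top, while "the", which is equally common in both, scores close to 1. A larger smoothing value shifts the ranking towards more frequent words.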

SKELL

SKELL (formerly SkELL) is a free web service based on Sketch Engine and aimed at language learners and teachers. It uses Sketch Engine's proprietary GDEX (Good Dictionary Examples) scoring function to provide authentic example sentences for specific target words. Results are drawn from a special corpus of high-quality texts covering everyday, standard, formal, and professional language, and are displayed as a concordance. SKELL also includes simplified versions of Sketch Engine's word sketch and thesaurus functions.[9]

It has been suggested that SKELL can be used, for instance, to help students understand the meaning or usage of a word or phrase; to help teachers who want to use example sentences in class; to discover and explore collocates; to create gap-fill exercises; and to teach various kinds of homonyms and polysemous words.[10][11] SKELL was first presented in 2014, when only English was supported.[9] Later, support was added for Russian,[12] Czech,[13] German,[14] Italian[15] and Estonian.[16]

List of text corpora

Sketch Engine provides access to more than 700 text corpora. There are monolingual as well as multilingual corpora of different sizes (from thousands of words up to 60 billion words) drawn from various sources (e.g. web, books, subtitles, legal documents). The list of corpora includes the British National Corpus, the Brown Corpus, the Cambridge Academic English Corpus and Cambridge Learner Corpus, CHILDES corpora of child language, OpenSubtitles (a set of 60 parallel corpora), 24 multilingual corpora of EUR-Lex documents, the TenTen Corpus Family (multi-billion web corpora), and Trends corpora (monitor corpora with daily updates).

Architecture

Figure: Sketch Engine thesaurus page showing a thesaurus cloud of the lemma "work".

Sketch Engine consists of three main components: an underlying database management system called Manatee, a web-based search front-end called Bonito, and a web interface for corpus building and management called Corpus Architect.[17]

Manatee

Manatee is a database management system specifically devised for effective indexing of large text corpora. It is based on the idea of inverted indexing (keeping an index of all positions of a given word in the text). It has been used to index text corpora comprising tens of billions of words.[18]
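The idea of inverted indexing can be shown with a minimal sketch. This is a generic in-memory illustration of the technique, not Manatee's actual data structures or API; the function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(tokens):
    """Map each word to the ascending list of positions where it occurs."""
    index = defaultdict(list)
    for position, word in enumerate(tokens):
        index[word].append(position)
    return dict(index)


def find_phrase(index, phrase):
    """Return start positions of a multi-word phrase using only the index."""
    if not phrase or any(word not in index for word in phrase):
        return []
    # Position sets of the second, third, ... word of the phrase.
    following = [set(index[word]) for word in phrase[1:]]
    return [
        start
        for start in index[phrase[0]]
        if all(start + offset + 1 in positions
               for offset, positions in enumerate(following))
    ]


tokens = "to be or not to be that is the question".split()
index = build_inverted_index(tokens)
phrase_hits = find_phrase(index, ["to", "be"])
```

Looking up a word is a single dictionary access regardless of corpus length, and a phrase search only inspects the positions of the words involved rather than scanning the whole text. A system handling billions of words would additionally need compressed, disk-based storage, which this sketch leaves out.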

Searching corpora indexed by Manatee is performed by formulating queries in the Corpus Query Language (CQL).[19]
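A CQL query describes a sequence of tokens, each constrained by attributes such as word form, lemma, or part-of-speech tag, with values given as regular expressions. The following toy matcher sketches that idea over a list of annotated tokens. It is a simplified stand-in, not the CQL parser or Manatee's evaluation strategy, and the tag names and sample data are hypothetical.

```python
import re


def matches(token, constraint):
    """True if every attribute pattern in the constraint fully matches the token."""
    return all(
        re.fullmatch(pattern, token.get(attribute, "")) is not None
        for attribute, pattern in constraint.items()
    )


def search(corpus, query):
    """Return start positions where consecutive tokens satisfy the query."""
    hits = []
    for start in range(len(corpus) - len(query) + 1):
        window = corpus[start:start + len(query)]
        if all(matches(tok, con) for tok, con in zip(window, query)):
            hits.append(start)
    return hits


# A tiny annotated corpus; the tagset here is assumed for illustration.
corpus = [
    {"word": "She", "lemma": "she", "tag": "PRON"},
    {"word": "works", "lemma": "work", "tag": "VERB"},
    {"word": "hard", "lemma": "hard", "tag": "ADV"},
    {"word": "at", "lemma": "at", "tag": "ADP"},
    {"word": "work", "lemma": "work", "tag": "NOUN"},
]

# Two-token query: the lemma "work" followed by a token whose tag starts with "AD".
query = [{"lemma": "work"}, {"tag": "AD.*"}]
hits = search(corpus, query)
```

Because the query constrains the lemma rather than the word form, it finds "works hard" even though the surface form differs from "work". This linear scan is only for clarity; an indexed system would resolve each constraint through its index instead.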

Manatee is written in C++ and offers APIs for a number of other programming languages, including Python, Java, Perl and Ruby. It was later rewritten in Go for faster processing of corpus queries.[20]

Bonito

Bonito is a web interface for Manatee providing access to corpus search. In the client–server model, Manatee is the server and Bonito plays the client part. It is written in Python.[17]

Corpus Architect

Corpus Architect is a web interface providing corpus building and management features. It is also written in Python.

Applications

Sketch Engine has been used by major British and other publishing houses to produce dictionaries; examples include the Macmillan English Dictionary and work at Dictionnaires Le Robert, Oxford University Press and Shogakukan. Four of the United Kingdom's five biggest dictionary publishers use Sketch Engine.[21]

References

  1. Companies House. Searched on the United Kingdom's registrar of companies (Company name: LEXICAL COMPUTING LIMITED or Company number: 04841901).
  2. Kilgarriff, Adam; Baisa, Vít; Bušta, Jan; Jakubíček, Miloš; Kovář, Vojtěch; Michelfeit, Jan; Rychlý, Pavel; Suchomel, Vít (10 July 2014). "The Sketch Engine: ten years on". Lexicography. 1 (1): 7–36. doi:10.1007/s40607-014-0009-9. ISSN 2197-4292.
  3. "Languages in Sketch Engine". Sketch Engine. Lexical Computing CZ s.r.o. 7 June 2016. Retrieved 22 January 2018.
  4. Adam Kilgarriff's home page.
  5. Natural Language Processing Centre, Masaryk University.
  6. NoSketch Engine.
  7. Kilgarriff, Adam; Herman, Ondřej; Bušta, Jan; Rychlý, Pavel; Jakubíček, Miloš (2015). "DIACRAN: a framework for diachronic analysis" (PDF). Corpus Linguistics 2015: 65–70.
  8. Baisa, Vít (2017). "Simplifying terminology extraction: OneClick Terms" (PDF). Proceedings of the 9th International Corpus Linguistics Conference.
  9. Baisa, Vít; Suchomel, Vít (2014). "SkELL: Web Interface for English Language Learning" (PDF). Eighth Workshop on Recent Advances in Slavonic Natural Language Processing. NLP Consulting: 63–70.
  10. Brown, Michael H. (7 April 2016). "SkELL: Easy to use for teachers and students". Corpus Linguistics 4 EFL. Retrieved 3 December 2018.
  11. Brown, Michael H. (19 April 2016). "SkELL: Homonymy and Polysemy". Corpus Linguistics 4 EFL. Retrieved 3 December 2018.
  12. Valentina, A., Vitalevna, B. O., Малолетняя, А. П., Olga, K., & Vit, B. (2016). "RuSkELL: Online Language Learning Tool for Russian Language". Proceedings of the XVII EURALEX International Congress. Lexicography and Linguistic Diversity (6–10 September 2016). Ivane Javakhishvili Tbilisi State University: 292–300.
  13. Cukr, Michal (2017). Český korpus příkladových vět (Czech corpus of example sentences) (Master's thesis) (in Czech). Brno: Masaryk University, Faculty of Arts. Retrieved 22 June 2017.
  14. "deSkELL – German corpus for SkELL | Sketch Engine". www.sketchengine.eu. Retrieved 3 December 2018.
  15. "itSkELL – Italian corpus for SkELL | Sketch Engine". www.sketchengine.eu. Retrieved 3 December 2018.
  16. "etSkELL – Estonian corpus for SkELL | Sketch Engine". www.sketchengine.eu. Retrieved 3 December 2018.
  17. Rychlý, Pavel (2007). "Manatee/bonito–a modular corpus manager" (PDF). 1st Workshop on Recent Advances in Slavonic Natural Language Processing: 65–70.
  18. Pomikálek, Jan; Jakubíček, Miloš; Rychlý, Pavel (2012). "Building a 70 billion word corpus of English from ClueWeb" (PDF). Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12).
  19. "CQL – Corpus Query Language". Sketch Engine. Lexical Computing CZ s.r.o. 15 May 2015. Retrieved 22 January 2018.
  20. Rychlý, Pavel; Rábara, Radoslav (2015). "Concurrent Processing of Text Corpus Queries" (PDF). Workshop on Recent Advances in Slavonic Natural Language Processing: 49–58.
  21. "Using Computational Lexicography for Dictionary Production with the Sketch Engine". REF Impact Case Studies. University of Brighton. Retrieved 18 April 2015.
    Retrieved from "https://en.wikipedia.org/w/index.php?title=Sketch_Engine&oldid=1224690270"




    This page was last edited on 19 May 2024, at 21:45 (UTC).
