Speech corpus






A speech corpus (or spoken corpus) is a database of speech audio files and text transcriptions. In speech technology, speech corpora are used, among other things, to create acoustic models (which can then be used with a speech recognition or speaker identification engine).[1] In linguistics, spoken corpora are used for research in phonetics, conversation analysis, dialectology and other fields.[2][3]

A corpus is a single such database; corpora is the plural of corpus (i.e. several such databases).

There are two types of speech corpora:

  1. Read Speech – which includes:
    • Book excerpts
    • Broadcast news
    • Lists of words
    • Sequences of numbers
  2. Spontaneous Speech – which includes:
    • Dialogs – between two or more people (includes meetings; one such corpus is the KEC);
    • Narratives – a person telling a story (one such corpus is the Buckeye Corpus);
    • Map-tasks – one person explains a route on a map to another;
    • Appointment-tasks – two people try to find a common meeting time based on individual schedules.
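As a rough illustration of the definition and the two types above, a corpus can be thought of as a collection of records, each pairing an audio file with its transcription and a label for the kind of speech. The sketch below is a minimal one under stated assumptions: the names `SpeechType`, `Utterance` and `group_by_type`, the file paths and the transcripts are all hypothetical and do not correspond to the format of any particular corpus.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class SpeechType(Enum):
    """The two broad types of speech corpora described above."""
    READ = "read"
    SPONTANEOUS = "spontaneous"


@dataclass(frozen=True)
class Utterance:
    """One corpus entry: an audio file paired with its transcription."""
    audio_path: str          # hypothetical path to the recording
    transcript: str          # text transcription of the recording
    speech_type: SpeechType  # read or spontaneous
    genre: str               # e.g. "broadcast news", "map task"


def group_by_type(utterances):
    """Group corpus entries by their speech type."""
    groups = defaultdict(list)
    for utterance in utterances:
        groups[utterance.speech_type].append(utterance)
    return dict(groups)


# Invented example entries, for illustration only.
corpus = [
    Utterance("audio/0001.wav", "one two three four",
              SpeechType.READ, "sequence of numbers"),
    Utterance("audio/0002.wav", "turn left at the old mill",
              SpeechType.SPONTANEOUS, "map task"),
    Utterance("audio/0003.wav", "good evening here is the news",
              SpeechType.READ, "broadcast news"),
]

groups = group_by_type(corpus)
```

Real corpora typically carry further metadata (speaker, recording conditions, time alignment), which this sketch omits.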

A special kind of speech corpus is the non-native speech database, which contains speech with a foreign accent.


References

  1. Sarangi, Susanta; Sahidullah, Md; Saha, Goutam (September 2020). "Optimization of data-driven filterbank for automatic speaker verification". Digital Signal Processing. 104: 102795. arXiv:2007.10729. Bibcode:2020DSP...10402795S. doi:10.1016/j.dsp.2020.102795. S2CID 220665533.
  2. Reece, Andrew; Cooney, Gus; Bull, Peter; Chung, Christine; Dawson, Bryn; Fitzpatrick, Casey; Glazer, Tamara; Knox, Dean; Liebscher, Alex; Marin, Sebastian (2022-03-01). "Advancing an Interdisciplinary Science of Conversation: Insights from a Large Multimodal Corpus of Human Speech". arXiv:2203.00674 [cs.CL].
  3. "Santa Barbara Corpus of Spoken American English | Department of Linguistics - UC Santa Barbara". www.linguistics.ucsb.edu. Retrieved 2023-04-26.