Word-sense induction







From Wikipedia, the free encyclopedia
 


In computational linguistics, word-sense induction (WSI) or discrimination is an open problem of natural language processing, which concerns the automatic identification of the senses (i.e. meanings) of a word. Because the output of word-sense induction is a set of senses for the target word (a sense inventory), the task is closely related to word-sense disambiguation (WSD), which relies on a predefined sense inventory and aims to resolve the ambiguity of words in context.

Approaches and methods

The output of a word-sense induction algorithm is a clustering of contexts in which the target word occurs or a clustering of words related to the target word. Three main methods have been proposed in the literature:[1][2]

Context clustering

The underlying hypothesis of this approach is that words are semantically similar if they appear in similar documents, within similar context windows, or in similar syntactic contexts.[3] Each occurrence of a target word in a corpus is represented as a context vector. These context vectors can be either first-order vectors, which directly represent the context at hand, or second-order vectors, under which two contexts of the target word are considered similar if their words tend to co-occur together. The vectors are then clustered into groups, each identifying a sense of the target word. A well-known approach to context clustering is the Context-group Discrimination algorithm,[4] which is based on large matrix computation methods.
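The idea of first-order context vectors followed by clustering can be illustrated with a minimal sketch. It is not the Context-group Discrimination algorithm: the stop-word list, the window size, the similarity threshold, the example sentences, and the greedy single-pass clustering are all assumptions chosen to keep the example short.

```python
from collections import Counter
from math import sqrt

# Hypothetical stop-word list, used only to keep the toy vectors informative.
STOPWORDS = {"the", "in", "from", "was", "she", "they"}

def context_vector(tokens, index, window=3):
    """First-order vector: counts of non-stop-words within `window` positions of the target."""
    lo, hi = max(0, index - window), min(len(tokens), index + window + 1)
    return Counter(
        tok for i, tok in enumerate(tokens[lo:hi], start=lo)
        if i != index and tok not in STOPWORDS
    )

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster_contexts(vectors, threshold=0.2):
    """Greedy clustering: join the most similar centroid if it is similar enough,
    otherwise open a new cluster. Each cluster stands for one induced sense."""
    centroids, labels = [], []
    for vec in vectors:
        sims = [cosine(vec, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            centroids[best].update(vec)  # Counter.update adds the counts
            labels.append(best)
        else:
            centroids.append(Counter(vec))
            labels.append(len(centroids) - 1)
    return labels

sentences = [
    "she deposited money in the bank account",
    "the bank approved the money loan",
    "they fished from the river bank",
    "the river bank was muddy",
]
vectors = []
for sentence in sentences:
    tokens = sentence.split()
    vectors.append(context_vector(tokens, tokens.index("bank")))

labels = cluster_contexts(vectors)
print(labels)  # one cluster id per occurrence of "bank"
```

On this toy input the two financial contexts share a cluster and the two river contexts share another. A realistic system would use a much larger corpus, weighted or dimensionality-reduced vectors, and a more robust clustering method.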

Word clustering

Word clustering is a different approach to the induction of word senses. It consists of clustering words that are semantically similar and can thus convey a specific meaning. Lin's algorithm[5] is a prototypical example of word clustering: it uses syntactic dependency statistics gathered from a corpus to produce a set of words for each discovered sense of a target word.[6] Clustering By Committee (CBC)[7] also uses syntactic contexts, but exploits a similarity matrix to encode the similarities between words and relies on the notion of committees to output the different senses of the word of interest. These approaches are hard to apply on a large scale for many domains and languages.
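The general shape of word clustering can be sketched as follows. This is neither Lin's algorithm nor CBC: the dependency features are hand-written and hypothetical, Jaccard overlap stands in for a corpus-based similarity measure, and the greedy grouping and the `min_sim` threshold are assumptions made for illustration.

```python
# Hypothetical dependency features: word -> set of (relation, co-occurring word) pairs.
features = {
    "bass":   {("obj_of", "play"), ("obj_of", "catch"), ("mod", "electric"), ("mod", "striped")},
    "guitar": {("obj_of", "play"), ("mod", "electric"), ("obj_of", "tune")},
    "drum":   {("obj_of", "play"), ("obj_of", "tune")},
    "trout":  {("obj_of", "catch"), ("mod", "striped"), ("obj_of", "cook")},
    "salmon": {("obj_of", "catch"), ("obj_of", "cook")},
}

def jaccard(a, b):
    """Share of syntactic features that two words have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def induce_senses(target, features, min_sim=0.15):
    """Collect words similar to the target, then group them greedily so that
    each group of mutually similar words stands for one sense of the target."""
    neighbours = [
        word for word in features
        if word != target and jaccard(features[target], features[word]) >= min_sim
    ]
    senses = []
    for word in neighbours:
        for group in senses:
            if any(jaccard(features[word], features[other]) >= min_sim for other in group):
                group.append(word)
                break
        else:
            senses.append([word])  # no similar group found: open a new sense
    return senses

senses = induce_senses("bass", features)
print(senses)
```

Here the target "bass" yields one group of instrument words and one group of fish words, each acting as a set of words that characterises a sense.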

Co-occurrence graphs

The main hypothesis of co-occurrence graph approaches is that the semantics of a word can be represented by a co-occurrence graph, whose vertices are co-occurrences and whose edges are co-occurrence relations. These approaches are related to word clustering methods, where co-occurrences between words can be obtained on the basis of grammatical[8] or collocational relations.[9] HyperLex is a successful graph-based approach that identifies hubs in co-occurrence graphs, but it requires tuning a large number of parameters.[10] To deal with this issue, several graph-based algorithms built on simple graph patterns have been proposed, namely Curvature Clustering, Squares, Triangles and Diamonds (SquaT++), and Balanced Maximum Spanning Tree Clustering (B-MST).[11] The patterns aim at identifying meanings using the local structural properties of the co-occurrence graph. Chinese Whispers is a randomized algorithm that partitions the graph vertices by iteratively transferring the mainstream message (i.e. word sense) to neighboring vertices.[12] Co-occurrence graph approaches have been shown to achieve state-of-the-art performance in standard evaluation tasks.
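The label-propagation idea behind Chinese Whispers can be sketched in a few lines. This is a simplified illustration rather than a reference implementation: the toy graph and its weights are invented, the number of iterations is fixed, and ties are broken by label name so that the example is reproducible.

```python
import random
from collections import defaultdict

def chinese_whispers(edges, iterations=10, seed=0):
    """Label propagation on a weighted, undirected co-occurrence graph.
    Every vertex starts in its own class and repeatedly adopts the class
    with the largest total edge weight among its neighbours."""
    graph = defaultdict(dict)
    for u, v, weight in edges:
        graph[u][v] = weight
        graph[v][u] = weight

    labels = {node: node for node in graph}
    rng = random.Random(seed)
    nodes = sorted(graph)
    for _ in range(iterations):
        rng.shuffle(nodes)  # vertices are visited in random order
        for node in nodes:
            scores = defaultdict(float)
            for neighbour, weight in graph[node].items():
                scores[labels[neighbour]] += weight
            # Simplification: ties are broken by label name for reproducibility.
            labels[node] = max(scores, key=lambda label: (scores[label], label))
    return labels

# Hypothetical co-occurrence graph for the contexts of "bank".
edges = [
    ("money", "loan", 3), ("money", "account", 3), ("loan", "account", 3),
    ("river", "water", 3), ("river", "shore", 3), ("water", "shore", 3),
    ("account", "river", 1),  # weak link between the two regions
]
labels = chinese_whispers(edges)

senses = defaultdict(set)
for word, label in labels.items():
    senses[label].add(word)
clusters = sorted(sorted(words) for words in senses.values())
print(clusters)
```

Each resulting cluster of vertices is read as one induced sense. The weak edge between "account" and "river" is outweighed by the edges inside each triangle, so the two regions keep separate labels.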

Applications

Software

See also

References

  1. Navigli, R. (2009). "Word Sense Disambiguation: A Survey". ACM Computing Surveys. 41 (2): 1–69. doi:10.1145/1459352.1459355. S2CID 461624.
  2. Nasiruddin, M. (2013). A State of the Art of Word Sense Induction: A Way Towards Word Sense Disambiguation for Under-Resourced Languages. TALN-RÉCITAL 2013. Les Sables d'Olonne, France. pp. 192–205.
  3. Van de Cruys, T. (2010). "Mining for Meaning. The Extraction of Lexico-Semantic Knowledge from Text".
  4. Schütze, H. (1998). Dimensions of meaning. 1992 ACM/IEEE Conference on Supercomputing. Los Alamitos, CA: IEEE Computer Society Press. pp. 787–796. doi:10.1109/SUPERC.1992.236684.
  5. Lin, D. (1998). Automatic retrieval and clustering of similar words. 17th International Conference on Computational linguistics (COLING). Montreal, Canada. pp. 768–774.
  6. Van de Cruys, Tim; Apidianaki, Marianna (2011). "Latent Semantic Word Sense Induction and Disambiguation".
  7. Lin, D.; Pantel, P. (2002). Discovering word senses from text. 8th International Conference on Knowledge Discovery and Data Mining (KDD). Edmonton, Canada. pp. 613–619. CiteSeerX 10.1.1.12.6771.
  8. Widdows, D.; Dorow, B. (2002). A graph model for unsupervised lexical acquisition. 19th International Conference on Computational Linguistics (COLING). Taipei, Taiwan. pp. 1–7.
  9. Véronis, J. (2004). "Hyperlex: Lexical cartography for information retrieval". Computer Speech and Language. 18 (3): 223–252. CiteSeerX 10.1.1.66.6499. doi:10.1016/j.csl.2004.05.002.
  10. Agirre, E.; Martinez, D.; De Lacalle, O. Lopez; Soroa, A. Two graph-based algorithms for state-of-the-art WSD. 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP). Sydney, Australia. pp. 585–593.
  11. Di Marco, A.; Navigli, R. (2013). "Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction". Computational Linguistics. 39 (3): 709–754. doi:10.1162/coli_a_00148. S2CID 1775181.
  12. Biemann, C. (2006). "Chinese Whispers - an Efficient Graph Clustering Algorithm and its Application to Natural Language Processing Problems".
  13. Navigli, R.; Crisafulli, G. Inducing Word Senses to Improve Web Search Result Clustering. 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP 2010). Massachusetts, USA: MIT Stata Center. pp. 116–126.
  14. Nasiruddin, M.; Schwab, D.; Tchechmedjiev, A.; Sérasset, G.; Blanchon, H. Induction de sens pour enrichir des ressources lexicales (Word Sense Induction for the Enrichment of Lexical Resources). 21ème conférence sur le Traitement Automatique des Langues Naturelles (TALN 2014). Marseille, France. pp. 598–603.

  • Retrieved from "https://en.wikipedia.org/w/index.php?title=Word-sense_induction&oldid=1172648440"




    This page was last edited on 28 August 2023, at 11:56 (UTC).
