To the top

Page Manager: Webmaster
Last update: 9/11/2012 3:13 PM

Tell a friend about this page
Print version

Splitting rocks: Learning… - University of Gothenburg, Sweden Till startsida
To content Read more about how we use cookies on

Splitting rocks: Learning word sense representations from corpora and lexica

Doctoral thesis
Authors Luis Nieto Piña
Date of public defense 2019-09-13
Opponent at public defense Lilja Øvrelid
ISBN 978-91-87850-75-2
Publisher University of Gothenburg
Place of publication Gothenburg
Publication year 2019
Published at Department of Swedish
Language en
Keywords language technology, natural language processing, distributional models, semantic representations, distributed representations, word senses, embeddings, word sense disambiguation, linguistic resources, corpus, lexicon, machine learning, neural networks
Subject categories Computational linguistics, Language Technology (Computational Linguistics)


The representation of written language semantics is a central problem of language technology and a crucial component of many natural language processing applications, from part-of-speech tagging to text summarization. These representations of linguistic units, such as words or sentences, allow computer applications that work with language to process and manipulate the meaning of text. In particular, a family of models has been successfully developed based on automatically learning semantics from large collections of text and embedding them into a vector space, where semantic or lexical similarity is a function of geometric distance. Co-occurrence information of words in context is the main source of data used to learn these representations. Such models have typically been applied to learning representations for word forms, which have been widely applied, and proven to be highly successful, as characterizations of semantics at the word level. However, a word-level approach to meaning representation implies that the different meanings, or senses, of any polysemic word share one single representation. This might be problematic when individual word senses are of interest and explicit access to their specific representations is required. For instance, in cases such as an application that needs to deal with word senses rather than word forms, or when a digital lexicon's sense inventory has to be mapped to a set of learned semantic representations. In this thesis, we present a number of models that try to tackle this problem by automatically learning representations for word senses instead of for words. In particular, we try to achieve this by using two separate sources of information: corpora and lexica for the Swedish language. Throughout the five publications compiled in this thesis, we demonstrate that it is possible to generate word sense representations from these sources of data individually and in conjunction, and we observe that combining them yields superior results in terms of accuracy and sense inventory coverage. Furthermore, in our evaluation of the different representational models proposed here, we showcase the applicability of word sense representations both to downstream natural language processing applications and to the development of existing linguistic resources.

Page Manager: Webmaster|Last update: 9/11/2012

The University of Gothenburg uses cookies to provide you with the best possible user experience. By continuing on this website, you approve of our use of cookies.  What are cookies?