Page manager: Web editorial team
Page updated: 2012-09-11 15:12


What goes into a word: generating image descriptions with top-down spatial knowledge

Conference paper
Authors: Mehdi Ghanimifard, Simon Dobnik
Published in: Proceedings of the 12th International Conference on Natural Language Generation (INLG-2019)
Publisher: Association for Computational Linguistics
Place of publication: Tokyo, Japan
Year of publication: 2019
Published at: Department of Philosophy, Linguistics and Theory of Science
Language: English
Links: https://www.aclweb.org/anthology/W1...
https://www.inlg2019.com/assets/pap...
https://gup.ub.gu.se/file/207900
https://gup.ub.gu.se/file/207901
Keywords: spatial descriptions, grounded neural language models, attention, representation learning
Subject categories: Computational linguistics, Linguistics, Cognitive science

Abstract

Generating grounded image descriptions requires associating linguistic units with their corresponding visual cues. A common method is to train a decoder language model with an attention mechanism over convolutional visual features. Attention weights align the stratified visual features, arranged by their location, with tokens (most commonly words) in the target description. However, words such as spatial relations (e.g. next to and under) do not refer directly to geometric arrangements of pixels but to complex geometric and conceptual representations. The aim of this paper is to evaluate which representations facilitate generating image descriptions with spatial relations and lead to better grounded language generation. In particular, we investigate the contribution of four different representational modalities to generating relational referring expressions: (i) (pre-trained) convolutional visual features, (ii) spatial attention over visual features, (iii) top-down geometric relational knowledge between objects, and (iv) world knowledge captured by contextual embeddings in language models.
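The spatial attention the abstract describes (modality ii) can be illustrated, at one decoding step, as attention weights computed over a flattened convolutional feature map and used to form a visual context vector. The sketch below is not from the paper; it uses additive (Bahdanau-style) attention with illustrative dimensions and random weights in place of learned parameters.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical dimensions: a 7x7 convolutional feature map (49 locations)
# with 512 channels, and a 256-dim decoder hidden state.
rng = np.random.default_rng(0)
num_locations, feat_dim, hid_dim = 49, 512, 256

V = rng.standard_normal((num_locations, feat_dim))  # visual feature per image location
h = rng.standard_normal(hid_dim)                    # decoder state at one time step

# Stand-ins for learned projections mapping both modalities to a shared space.
W_v = rng.standard_normal((feat_dim, hid_dim)) * 0.01
W_h = rng.standard_normal((hid_dim, hid_dim)) * 0.01
w = rng.standard_normal(hid_dim) * 0.01

scores = np.tanh(V @ W_v + h @ W_h) @ w  # one relevance score per location
alpha = softmax(scores)                  # attention weights over the 49 locations
context = alpha @ V                      # location-weighted visual context vector
```

At generation time, `context` would be fed back into the decoder when predicting the next token; the weights `alpha` are what align each word with regions of the image. The paper's point is that for spatial relations this pixel-region alignment alone is insufficient, motivating modalities (iii) and (iv).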

