From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions

Transactions of the Association for Computational Linguistics (to appear)
Peter Young, Alice Lai, Micah Hodosh, Julia Hockenmaier

Abstract


 We propose to use the visual denotations of linguistic expressions (i.e. the set of images they describe) to define novel denotational similarity metrics, which we show to be at least as beneficial as distributional similarities for two tasks that require semantic inference. To compute these denotational similarities, we construct a denotation graph, i.e. a subsumption hierarchy over constituents and their denotations, based on a large corpus of 30K images and 150K descriptive captions.

30k Image Caption Corpus



 To produce the denotation graph, we have created an image caption corpus consisting of 158,915 crowd-sourced captions describing 31,783 images. This is an extension of our previous Flickr 8k Dataset. The new images and captions focus on people involved in everyday activities and events.
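As a quick consistency check, 158,915 / 31,783 = 5, so the corpus averages exactly five captions per image. The sketch below groups caption lines by image. It assumes a hypothetical tab-separated line format `<image>#<index>\t<caption>`, which this page does not specify, so check it against the files you actually download. The image names and captions are placeholders.

```python
from collections import defaultdict


def group_captions(lines):
    """Map each image ID to its list of captions.

    Assumes a hypothetical line format "<image>#<index>\t<caption>".
    """
    captions = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, caption = line.split("\t", 1)
        image_id, _, _ = key.partition("#")  # drop the caption index
        captions[image_id].append(caption)
    return dict(captions)


# Placeholder lines, not real corpus entries.
sample = [
    "img1.jpg#0\tTwo dogs running in the grass.",
    "img1.jpg#1\tTwo dogs run across a field.",
    "img2.jpg#0\tA man holds a bat.",
]
by_image = group_captions(sample)
print(len(by_image))     # 2
print(158_915 / 31_783)  # 5.0 captions per image on average
```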




Denotation Graph



HTML view of the Denotation Graph
Denotation Graph Download

We define $[\![s]\!]$, the visual denotation of a linguistic expression $s$ (e.g., a sentence, verb phrase, or noun phrase), to be the set of images that depict what $s$ describes. The denotation graph pairs a large number of linguistic expressions with their visual denotations and defines a large subsumption hierarchy over these expressions. Consider the following fragment of our denotation graph¹:
[Figure: a fragment of the denotation graph]


Each node in the graph corresponds to a string $s$ and its denotation $[\![s]\!]$.

If $s$ (e.g., $s$ = "two dogs running") is the parent of $t$ (e.g., $t$ = "two dogs running in the grass") in the denotation graph, then $s$ is more generic than $t$, and a single linguistic operation (here, adding the prepositional phrase "in the grass") turns $s$ into $t$. Hence, any image that depicts $t$ (two dogs running in the grass) must also depict $s$ (two dogs running), so $[\![t]\!] \subseteq [\![s]\!]$. Thus, the visual denotation of a parent $s$ subsumes the visual denotation of each of its children $t$.
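The parent/child argument can be written as a short derivation, where $i$ ranges over images. Because $\subseteq$ is transitive, the same inclusion holds between a node and any of its ancestors. Denotations can therefore only shrink, or stay the same, as one moves down the graph.

```latex
\begin{aligned}
s \text{ is a parent of } t
  &\;\Rightarrow\; \forall i:\; \bigl(i \in [\![t]\!] \Rightarrow i \in [\![s]\!]\bigr) \\
  &\;\Rightarrow\; [\![t]\!] \subseteq [\![s]\!] \\
  &\;\Rightarrow\; \bigl|[\![t]\!]\bigr| \le \bigl|[\![s]\!]\bigr|
\end{aligned}
```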

Below we provide the data files that make up the denotation graph we created from the Flickr 30k Dataset. The graph consists of three parts. First, a set of strings that define its nodes ("dog", "running", "grass", etc.). Second, the edges that connect those nodes: "dog running" can be created from "running" by adding the subject "dog", and from "dog" by adding the verb "running". Third, the images that depict each string in the graph.
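The sketch below holds the three components listed above (node strings, labeled edges, and per-node image sets) in memory. The class `DenotationGraph`, its methods, and the image IDs are hypothetical illustrations, not the format of the released data files. Node strings are lemmatized as described in footnote 1. The `violations` method checks the subsumption property, i.e., that every child's denotation is contained in its parent's.

```python
from collections import defaultdict


class DenotationGraph:
    """Toy denotation graph (hypothetical structure, not the released format).

    Nodes are strings; each edge is labeled with the linguistic operation
    that turns the parent string into the child string; each node stores
    the set of image IDs it denotes.
    """

    def __init__(self):
        self.images = defaultdict(set)     # node string -> set of image IDs
        self.children = defaultdict(dict)  # parent -> {child: operation}

    def add_node(self, s, image_ids=()):
        self.images[s].update(image_ids)

    def add_edge(self, parent, child, operation):
        self.add_node(parent)
        self.add_node(child)
        self.children[parent][child] = operation

    def denotation(self, s):
        return frozenset(self.images.get(s, ()))

    def violations(self):
        """Edges (parent, child) whose child denotation is not a subset of the parent's."""
        return [
            (parent, child)
            for parent, kids in self.children.items()
            for child in kids
            if not self.denotation(child) <= self.denotation(parent)
        ]


# Hypothetical image IDs; strings are lemmatized ("dog run", not "dog running").
g = DenotationGraph()
g.add_node("dog", {"img1", "img2", "img3"})
g.add_node("run", {"img1", "img2", "img4"})
g.add_node("dog run", {"img1", "img2"})
g.add_edge("run", "dog run", "add subject 'dog'")
g.add_edge("dog", "dog run", "add verb 'run'")
print(g.violations())  # []
```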

Additionally, we have computed two kinds of denotational similarity metrics over the nodes of the graph: (normalized) pointwise mutual information, PMI(s, t), and conditional probability, P(s | t). Our paper shows that these similarity metrics are at least as beneficial as distributional similarities for two tasks that require semantic inference. For example, with t = "play baseball":

s                 PMI(s, play baseball)   P(play baseball | s)
tag him           0.673                   0.600
hold bat          0.627                   0.368
try to tag        0.616                   0.517
slide into base   0.569                   0.278
pitch             0.561                   0.200
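Both metrics can be computed directly from node denotations. The sketch below assumes that probabilities are estimated from image counts over $N$ images: $P(s) = |[\![s]\!]|/N$, $P(s, t) = |[\![s]\!] \cap [\![t]\!]|/N$, and $P(s \mid t) = |[\![s]\!] \cap [\![t]\!]| / |[\![t]\!]|$. It also uses the standard normalization of PMI by $-\log P(s, t)$, which bounds the score in $[-1, 1]$. The paper's exact estimation details may differ. Under this reading, the table's third column corresponds to `cond_prob(den["play baseball"], den[s])`. The denotations below are toy data, not values from the graph.

```python
import math


def cond_prob(den_s, den_t):
    """P(s | t): fraction of the images depicting t that also depict s."""
    if not den_t:
        raise ValueError("t has an empty denotation")
    return len(den_s & den_t) / len(den_t)


def npmi(den_s, den_t, num_images):
    """Normalized PMI of s and t, in [-1, 1].

    Probabilities are estimated as image counts divided by num_images
    (an assumption made for this sketch).
    """
    joint = len(den_s & den_t) / num_images
    if joint == 0:
        return -1.0  # s and t never depict the same image
    if joint == 1:
        return 1.0   # degenerate case: both denote every image
    p_s = len(den_s) / num_images
    p_t = len(den_t) / num_images
    return math.log(joint / (p_s * p_t)) / -math.log(joint)


# Toy denotations over 10 images (hypothetical IDs, not graph data).
den = {
    "play baseball": {1, 2, 3, 4},
    "hold bat": {1, 2, 5},
    "make face": {8, 9},
}
print(cond_prob(den["play baseball"], den["hold bat"]))  # 2 of 3 images, about 0.667
print(npmi(den["play baseball"], den["hold bat"], 10))   # about 0.317
print(npmi(den["play baseball"], den["make face"], 10))  # -1.0 (disjoint)
```

PMI is symmetric in s and t, whereas the conditional probability is directional: P(play baseball | s) generally differs from P(s | play baseball).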

Approximate Textual Entailment


The approximate textual entailment task generates textual entailment items using the Flickr 30k Dataset and our denotation graph. We use captions from the Flickr 30k Dataset as premises, and try to determine if they entail strings from the denotation graph.

Premises:
A woman with dark hair in bending, open mouthed, towards the back of a dark headed toddler's head.
A dark-haired woman has her mouth open and is hugging a little girl while sitting on a red blanket.
A grown lady is snuggling on the couch with a young girl and the lady has a frightened look.
A mom holding her child on a red sofa while they are both having fun.
Hypothesis: make face
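The sketch below shows how an item like the one above could be built. It rests on an assumption that this page does not state: one of an image's five captions is held out, the remaining four form the premise, and the item is labeled positive when the image lies in the hypothesis's denotation $[\![h]\!]$. The paper's actual sampling and filtering procedure may differ. `EntailmentItem`, `make_item`, the captions, and the image IDs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntailmentItem:
    premises: tuple   # captions of one image, with one caption held out
    hypothesis: str   # a denotation-graph node string
    label: bool       # True if the image lies in the hypothesis's denotation


def make_item(image_id, captions, hypothesis, denotations, held_out=0):
    """Build one approximate-entailment item (hypothetical procedure)."""
    premises = tuple(c for i, c in enumerate(captions) if i != held_out)
    label = image_id in denotations.get(hypothesis, frozenset())
    return EntailmentItem(premises, hypothesis, label)


# Toy data: placeholder captions and image IDs.
captions = [f"caption {i}" for i in range(5)]
denotations = {"make face": {"img7"}, "play baseball": {"img3"}}

positive = make_item("img7", captions, "make face", denotations)
negative = make_item("img7", captions, "play baseball", denotations)
print(len(positive.premises), positive.label, negative.label)  # 4 True False
```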

Downloads


Please fill in the following form to request access to the Flickr 30k Dataset and the Denotation Graph. Note that the Flickr 30k Dataset includes images obtained from Flickr. Use of the images must abide by the Flickr Terms of Use. We do not own the copyright of the images. They are provided at the link below solely for researchers and educators who wish to use the dataset for non-commercial research and/or educational purposes.


1. In our actual denotation graph, words are lemmatized, so "two dogs running" becomes "two dog run".