OVERVIEW
S3VQA provides a new approach to open-domain visual question answering that involves Select, Substitute, and Search (SSS).
S3 arrives at the natural language answer to a VQA query by first reformulating the input question
(using Select and Substitute) and then retrieving facts from an external knowledge source (using Search). As part of this work, we provide:
a) OKVQAS3, produced by subsetting and annotating OKVQA to fit S3 specifications,
b) S3VQA, created from the ground up to S3 specifications.
S3VQA
- 6765 question-image pairs
- Average answer length: 3 (words)
- Automatic evaluation metric
OKVQAS3
- 2640 question-image pairs
- Average answer length: 1.5 (words)
- Automatic evaluation metric
DATASET
OKVQAS3
We provide additional annotations for the OKVQA dataset.
We tag each question in the OK-VQA dataset with one of four types:
- Type 1 - Questions that require detecting objects and subsequent reasoning over an external knowledge source to arrive at the answer
- Type 2 - Questions that require reading text from the image (OCR) (and no other information) to answer
- Type 3 - Questions that are based on personal opinion or speculation
- Type 4 - Other
Additional annotations for the Type 1 subset of OKVQA (OKVQAS3): for each question in this subset,
we provide an 'annotated span' and an 'annotated object', which are needed for query reformulation.
Span and Object can be described as:
- Span : part of the question which needs to be replaced
- Object : detected object from the image
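The Substitute step implied by the span and object annotations can be sketched as a plain string replacement. This is only an illustration under stated assumptions: the function name `reformulate` and the example question are hypothetical and are not taken from the dataset or the released code, and the actual reformulation may be more involved.

```python
def reformulate(question: str, span: str, obj: str) -> str:
    """Replace the first occurrence of the annotated span with the object label.

    Hypothetical helper for illustration; not part of the released code.
    """
    if span not in question:
        raise ValueError(f"span {span!r} not found in question")
    return question.replace(span, obj, 1)


# Made-up example, not taken from the dataset.
query = reformulate("What does this animal eat?", "this animal", "zebra")
print(query)
```

The reformulated question no longer depends on the image, so it could be passed as-is to a text-based search over an external knowledge source.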
Download the annotated files :
S3VQA
As part of this work, we also release a new challenge dataset, S3VQA. The dataset was created using images from the
Open Images dataset.
For our dataset, we provide two files for each of the train and test splits:
- one which contains question_id, annotated span, annotated object and annotated answer
- one which contains question_id, image_id and question
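Since both files share question_id, one way to work with them is to merge their records on that field. The sketch below assumes each file has already been loaded into a list of dictionaries; the key names `span`, `object`, and `answer`, the function name `merge_by_question_id`, and the sample records are all hypothetical, because the on-disk layout is not described here.

```python
from typing import Dict, List


def merge_by_question_id(annotations: List[dict], questions: List[dict]) -> Dict[str, dict]:
    """Combine the two record lists into one dict keyed by question_id.

    Key names other than question_id, image_id and question are assumptions.
    """
    merged = {q["question_id"]: dict(q) for q in questions}
    for ann in annotations:
        qid = ann["question_id"]
        if qid not in merged:
            raise KeyError(f"annotation for unknown question_id {qid!r}")
        merged[qid].update(ann)
    return merged


# Made-up records for illustration only.
questions = [
    {"question_id": "q1", "image_id": "img1", "question": "What does this animal eat?"},
]
annotations = [
    {"question_id": "q1", "span": "this animal", "object": "zebra", "answer": "grass"},
]
examples = merge_by_question_id(annotations, questions)
```

Each merged entry then carries the image reference, the question, and its annotations together.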
Download the annotated files :
CHALLENGE
Task : Given an image and a natural language question about the image, the task is to find each of the following items
There are 2 tracks in this challenge:
OKVQAS3
- Train and test sets, containing 2640 question-image pairs.
S3VQA
- Train and test sets, containing 6765 question-image pairs.
You will need to create a JSON file named "output.json" containing your results in the correct format, compress it, and submit the resulting ".zip" file.
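Packaging a submission could look like the following minimal sketch. The required result format is defined by the challenge and is not reproduced here, so the record layout (`question_id`, `answer`), the function name `package_submission`, and the archive name `submission.zip` are assumptions; only the file name "output.json" comes from the instructions above.

```python
import json
import tempfile
import zipfile
from pathlib import Path


def package_submission(results, out_dir=".", zip_name="submission.zip"):
    """Write results to output.json and wrap that file in a zip archive."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    json_path = out / "output.json"
    json_path.write_text(json.dumps(results, indent=2), encoding="utf-8")
    zip_path = out / zip_name
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # Store the file at the archive root under its required name.
        zf.write(json_path, arcname="output.json")
    return zip_path


# Hypothetical result records; check the challenge page for the real format.
results = [{"question_id": "q1", "answer": "grass"}]
with tempfile.TemporaryDirectory() as tmp:
    zip_path = package_submission(results, out_dir=tmp)
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        payload = json.loads(zf.read("output.json"))
```

Writing to a temporary directory here only keeps the example free of side effects; in practice `out_dir` would point to wherever the submission is assembled.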
Follow the link below to access the challenge:
LEADERBOARD
OKVQAS3
S3VQA
CODE
Refer to the code at the link given below:
Source Code
Contact Us
Address
Indian Institute Of Technology, Main gate road, IIT Area, Powai
Mumbai, India
mayankk@cse.iitb.ac.in