S3VQA

Select, Substitute, Search

A New Benchmark for Knowledge-Augmented Visual Question Answering


This website is a resource for the paper "Select, Substitute, Search: A New Benchmark for Knowledge-Augmented Visual Question Answering" (arXiv link).

OVERVIEW

S3VQA introduces Select, Substitute, and Search (SSS, or S3), an approach to open-domain visual question answering. S3 produces the natural-language answer to a VQA query by first reformulating the input question (Select and Substitute) and then retrieving facts from an external knowledge source (Search). As part of this work, we provide: a) OKVQAS3, produced by subsetting and annotating OKVQA to fit the S3 specifications, and b) S3VQA, created from the ground up to the S3 specifications.

S3VQA

  • 6765 question-image pairs
  • Average answer length: 3 (words)
  • Automatic evaluation metric

OKVQAS3

  • 2640 question-image pairs
  • Average answer length: 1.5 (words)
  • Automatic evaluation metric

DATASET

OKVQAS3

We provide additional annotations for the OKVQA dataset. We tag each question in the OKVQA dataset with one of four types:

Additional annotations for the Type 1 subset of OKVQA (OKVQAS3): for each question in this subset, we provide an 'annotated span' and an 'annotated object', which are needed for query reformulation.
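As a rough illustration of how the annotated span and annotated object could be used for query reformulation, the sketch below replaces the span in the question with the object name to form a text query for the Search step. The function name reformulate, the case-insensitive first-match rule, and the example question are assumptions made for illustration; they are not taken from the released code.

```python
def reformulate(question: str, span: str, obj: str) -> str:
    """Substitute the first (case-insensitive) occurrence of the
    selected span with the object name.

    Hypothetical helper: the matching rule is an assumption.
    """
    idx = question.lower().find(span.lower())
    if idx == -1:
        # Span not present in the question: leave the question unchanged.
        return question
    return question[:idx] + obj + question[idx + len(span):]


# Made-up example, not a record from the dataset.
query = reformulate("Which country is this dish from?", "this dish", "sushi")
print(query)
```

The reformulated string no longer depends on the image, so it can be passed to a text search engine or knowledge base.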


Span and Object can be described as:

Download the annotated files:

OKVQAS3 Questions

Images (COCO)

S3VQA

As part of this work, we also release a new challenge dataset, S3VQA. The dataset was created using images from the Open Images dataset. For each of the train and test splits, we provide two files:
  • one containing question_id, annotated span, annotated object and annotated answer
  • one containing question_id, image_id and question
Download the annotated files:

Images (OpenImages)
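Since the two S3VQA files for a split share question_id, they can be joined into one record per question. The sketch below assumes each file has already been parsed (for example with json.load) into a list of dictionaries; the exact key spellings (span, object, answer) and the sample records are assumptions, so adjust them to match the downloaded files.

```python
def merge_by_question_id(questions, annotations):
    """Join question records with annotation records on question_id.

    Questions without a matching annotation are skipped.
    """
    ann_by_id = {a["question_id"]: a for a in annotations}
    merged = []
    for q in questions:
        a = ann_by_id.get(q["question_id"])
        if a is None:
            continue
        # Annotation fields are added to the question fields.
        merged.append({**q, **a})
    return merged


# Made-up records with assumed key names, for illustration only.
questions = [
    {"question_id": 1, "image_id": "img_a", "question": "Which country is this dish from?"},
    {"question_id": 2, "image_id": "img_b", "question": "Who invented this device?"},
]
annotations = [
    {"question_id": 1, "span": "this dish", "object": "sushi", "answer": "Japan"},
]

records = merge_by_question_id(questions, annotations)
print(len(records), records[0]["image_id"], records[0]["answer"])
```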


CHALLENGE

Task: Given an image and a natural-language question about the image, find each of the following items:

There are 2 tracks in this challenge:
OKVQAS3
S3VQA
You will need to create a JSON file named "output.json" containing your results in the correct format, compress it into a ".zip" file, and submit that file.
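A minimal sketch of packaging a submission with the Python standard library is shown below. Only the file name "output.json" and the ".zip" requirement come from this page; the record layout (question_id, span, object, answer), the archive name submission.zip, and the helper package_submission are assumptions, so check the challenge page for the required format.

```python
import json
import tempfile
import zipfile
from pathlib import Path


def package_submission(predictions, out_dir, zip_name="submission.zip"):
    """Write predictions to output.json and wrap it in a zip archive."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    json_path = out_dir / "output.json"
    json_path.write_text(json.dumps(predictions, indent=2), encoding="utf-8")

    zip_path = out_dir / zip_name
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Store the file at the archive root under its required name.
        zf.write(json_path, arcname="output.json")
    return zip_path


# Hypothetical record layout, for illustration only.
predictions = [
    {"question_id": 1, "span": "this dish", "object": "sushi", "answer": "Japan"},
]

with tempfile.TemporaryDirectory() as tmp:
    zip_path = package_submission(predictions, tmp)
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        roundtrip = json.loads(zf.read("output.json"))

print(names)
```

Reading the archive back, as done here, is a cheap way to confirm that output.json sits at the top level of the zip before uploading.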

Follow the link below to access the challenge:


LEADERBOARD

OKVQAS3

S3VQA


CODE

Refer to the code at the link below:

Source Code


TUTORIAL




Contact Us

Address

Indian Institute of Technology, Main Gate Road, IIT Area, Powai

Mumbai, India

mayankk@cse.iitb.ac.in