S3VQA

OVERVIEW

S3VQA provides a new approach to open-domain visual question answering built on three steps: Select, Substitute, and Search (SSS). S3 arrives at the natural language answer to a VQA query by first reformulating the input question (Select and Substitute) and then retrieving facts from an external knowledge source (Search). As part of this work, we provide: a) OKVQA_S3, produced by subsetting and annotating OKVQA to fit S3 specifications, and b) S3VQA, created from the ground up to S3 specifications.

S3VQA

6765 question-image pairs
Average answer length: 3 (words)
Automatic evaluation metric

OKVQA_S3

2640 question-image pairs
Average answer length: 1.5 (words)
Automatic evaluation metric

DATASET

OKVQA_S3

We provide additional annotations for the OKVQA dataset. We tag each question in the OKVQA dataset with one of four types:

Type 1 - Questions that require detecting objects and then reasoning over an external knowledge source to arrive at the answer
Type 2 - Questions that require reading text from the image (OCR), and no other information, to answer
Type 3 - Questions that are based on personal opinion or speculation
Type 4 - Other
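
The four tags above can be represented as an enumeration when filtering the data. The sketch below is illustrative only: the member names and the `type` field on each record are assumptions, not the layout of the released annotation files.

```python
from enum import IntEnum


class QuestionType(IntEnum):
    """The four question tags; member names are hypothetical labels."""
    KNOWLEDGE = 1  # object detection plus external knowledge
    OCR = 2        # answerable by reading text in the image
    OPINION = 3    # personal opinion or speculation
    OTHER = 4


def select_type1(records):
    """Keep only records tagged Type 1 (assumes a hypothetical `type` field)."""
    return [r for r in records if QuestionType(r["type"]) is QuestionType.KNOWLEDGE]


# Toy records, invented for illustration.
records = [
    {"question_id": 1, "type": 1},
    {"question_id": 2, "type": 2},
    {"question_id": 3, "type": 1},
]
type1 = select_type1(records)
```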

Additional annotations for the Type 1 subset of OKVQA (OKVQA_S3): for each question in this subset, we provide an 'annotated span' and an 'annotated object', both of which are needed for query reformulation.

The span and the object are defined as follows:

Span: the part of the question that needs to be replaced
Object: the object detected in the image
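
Given these two annotations, the Substitute step can be sketched as a string replacement: the span in the question is swapped for the detected object. This is a minimal sketch, not the authors' implementation; the function name `substitute` and the example question, span, and object are invented for illustration.

```python
import re


def substitute(question: str, span: str, obj: str) -> str:
    """Replace the first case-insensitive occurrence of `span` with `obj`."""
    pattern = re.compile(re.escape(span), flags=re.IGNORECASE)
    # A callable replacement keeps `obj` literal (no backslash escapes).
    reformulated, count = pattern.subn(lambda _m: obj, question, count=1)
    if count == 0:
        raise ValueError(f"span {span!r} not found in question")
    return reformulated


# Hypothetical example, not taken from the dataset.
example = substitute(
    "What country is this animal native to?", "this animal", "giraffe"
)
```

The reformulated question no longer depends on the image, so it can be passed to a text-only search over an external knowledge source.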

Download the annotated files:

S3VQA

As part of this work, we also release a new challenge dataset, S3VQA. The dataset was created using images from the Open Images dataset. We provide two files for each of the train and test splits:

An annotations file, which contains question_id, annotated span, annotated object, and annotated answer
A questions file, which contains question_id, image_id, and question
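
Because both files share question_id, they can be joined into one record per question. The sketch below works on in-memory lists of dictionaries and assumes hypothetical key names (`span`, `object`, `answer`) for the annotation fields; the actual file layout and key names may differ.

```python
def join_by_question_id(annotations, questions):
    """Merge annotation and question records that share a question_id."""
    questions_by_id = {q["question_id"]: q for q in questions}
    merged = []
    for ann in annotations:
        question = questions_by_id.get(ann["question_id"])
        if question is None:
            continue  # annotation without a matching question; skip it
        merged.append({**question, **ann})
    return merged


# Toy records, invented for illustration.
questions = [
    {"question_id": 7, "image_id": "img_a", "question": "Where does this fruit grow?"},
    {"question_id": 8, "image_id": "img_b", "question": "Who makes this vehicle?"},
]
annotations = [
    {"question_id": 7, "span": "this fruit", "object": "banana", "answer": "tropics"},
]
merged = join_by_question_id(annotations, questions)
```

Questions with no annotation, such as those in a held-out split, are simply absent from the merged list.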

Download the annotated files:

S3VQA Annotations

Train annotations

Dev annotations

S3VQA Questions

Train questions

Dev questions

Test questions

Images (OpenImages)

Images

CHALLENGE

Task: Given an image and a natural language question about the image, find each of the following:

Span
Object
Accurate natural language answer

There are two tracks in this challenge:

OKVQA_S3

Train and test sets containing 2640 question-image pairs.

S3VQA

Train and test sets containing 6765 question-image pairs.

To submit, create a JSON file named "output.json" containing your results in the correct format, then compress it and submit the resulting ".zip" file.

Follow the link below to access the challenge:

Access challenge here
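
Packaging a submission might look like the sketch below. The required schema of "output.json" is defined by the challenge page and is not reproduced here, so the structure of `predictions` is left to the caller; the helper name `write_submission` and the archive name "submission.zip" are assumptions.

```python
import json
import zipfile
from pathlib import Path


def write_submission(predictions, out_dir="."):
    """Write predictions to output.json and wrap it in a .zip archive."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    json_path = out_dir / "output.json"
    json_path.write_text(json.dumps(predictions, indent=2), encoding="utf-8")

    # "submission.zip" is a hypothetical archive name.
    zip_path = out_dir / "submission.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # Store the file at the archive root under its required name.
        zf.write(json_path, arcname="output.json")
    return zip_path
```

A call such as `write_submission(predictions, "my_run")` would create "my_run/output.json" and "my_run/submission.zip"; check the challenge page for the exact fields each prediction must carry.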

S3VQA

Select, Substitute, Search

This website is a resource for the paper Select, Substitute, Search: A New Benchmark for Knowledge-Augmented Visual Question Answering (arXiv link)

