Visual Commonsense Reasoning

On VCR, a model must not only answer commonsense visual questions, but also provide a rationale that explains why the answer is true.


Submitting to the leaderboard

Submission is easy! You just need to email Rowan with your predictions. Formatting instructions are below:

Please include in your email 1) a name for your model, 2) your team name (including your affiliation), and optionally, 3) a github repo or paper link.

I'll try to get back to you within a few days, usually sooner. Teams can only submit results from a model once every 7 days.

I reserve the right to not score any of your submissions if you cheat -- for instance, please don't make up a bunch of fake names / email addresses and send me multiple submissions under those names.


What kinds of submissions are allowed?

The only constraint is that your system must predict the answer first, then the rationale. (The rationales were selected to be highly relevant to the correct Q,A pair, so they leak information about the correct answer.)

  • To deter models from exploiting this leak, the submission format involves submitting predictions for each possible rationale, conditioned on each possible answer (a sketch of such a submission file follows this list).
  • A simple way of setting up the experiments (used in the paper) is to treat each subtask as choosing among four response choices given a query. For Q->A, the query is the question and the response choices are the answers. For QA->R, the query is the question and correct answer concatenated together, and the response choices are the rationales.
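
For concreteness, here's a minimal sketch (in Python) of what a submission file in this format could look like. The CSV layout and column names (annot_id, answer_i, rationale_conditioned_on_ai_j) are illustrative assumptions rather than the official schema, and the two scoring functions are placeholders for your model.

```python
import csv

def score_answers(example):
    """Q->A: return 4 scores, one per candidate answer.
    Placeholder (uniform); swap in your model's predictions."""
    return [0.25, 0.25, 0.25, 0.25]

def score_rationales(example, answer_idx):
    """QA->R: return 4 scores over the rationales, conditioned on
    candidate answer `answer_idx`. Placeholder (uniform)."""
    return [0.25, 0.25, 0.25, 0.25]

def write_submission(examples, path="predictions.csv"):
    # 4 answer scores plus a 4x4 grid of rationale scores per example,
    # so rationale predictions are conditioned on every possible answer,
    # not just the one the model believes is correct.
    header = (["annot_id"]
              + [f"answer_{i}" for i in range(4)]
              + [f"rationale_conditioned_on_a{i}_{j}"
                 for i in range(4) for j in range(4)])
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for ex in examples:
            row = [ex["annot_id"], *score_answers(ex)]
            for i in range(4):
                row.extend(score_rationales(ex, i))
            writer.writerow(row)

if __name__ == "__main__":
    write_submission([{"annot_id": "val-0"}])
```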

Questions?

If it's not about something private, check out the Google group.

VCR Leaderboard

There are two different subtasks to VCR:

  • Question Answering (Q->A): In this setup, a model is provided a question, and has to pick the best answer out of four choices. Only one of the four is correct.
  • Answer Justification (QA->R): In this setup, a model is provided a question along with the correct answer, and it has to justify that answer by picking the best rationale out of four choices.

We combine the two parts with the Q->AR metric, in which a model only gets a question right if it answers correctly and picks the right rationale. Models are evaluated in terms of accuracy (%). How well will your model do?
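
To make the metric concrete, here is a minimal sketch (in Python with NumPy; the function name and array layout are my own) of how the three accuracies relate: Q->AR only credits examples where both subtask predictions are right.

```python
import numpy as np

def vcr_accuracies(answer_pred, rationale_pred, answer_gold, rationale_gold):
    """Return (Q->A, QA->R, Q->AR) accuracies in %.

    Each argument is an (N,) integer array of four-way choice indices:
    predicted vs. gold answers, and predicted vs. gold rationales (the
    rationale prediction scored is the one conditioned on the correct answer).
    """
    a_ok = answer_pred == answer_gold
    r_ok = rationale_pred == rationale_gold
    return (100 * a_ok.mean(),            # Q->A
            100 * r_ok.mean(),            # QA->R
            100 * (a_ok & r_ok).mean())   # Q->AR: both must be right
```

This is also why random performance in the table below is 25.0 / 25.0 / 6.2: a random guesser is right on each four-way subtask 25% of the time, independently, so its Q->AR accuracy is 0.25 × 0.25 = 6.25%.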

| Rank | Date | Model | Team | Link | Q->A | QA->R | Q->AR |
|------|------|-------|------|------|------|-------|-------|
| | | Human Performance (Zellers et al. '18) | University of Washington | | 91.0 | 93.0 | 85.0 |
| 1 | September 30, 2019 | UNITER-large (ensemble) | MS D365 AI | https://arxiv.org/abs/1909.11740 | 79.8 | 83.4 | 66.8 |
| 2 | September 23, 2019 | UNITER-large (single model) | MS D365 AI | https://arxiv.org/abs/1909.11740 | 77.3 | 80.8 | 62.8 |
| 3 | August 9, 2019 | ViLBERT (ensemble of 10 models) | Georgia Tech & Facebook AI Research | https://arxiv.org/abs/1908.02265 | 76.4 | 78.0 | 59.8 |
| 4 | September 23, 2019 | VL-BERT (single model) | MSRA & USTC | https://arxiv.org/abs/1908.08530 | 75.8 | 78.4 | 59.7 |
| 5 | August 9, 2019 | ViLBERT (ensemble of 5 models) | Georgia Tech & Facebook AI Research | https://arxiv.org/abs/1908.02265 | 75.7 | 77.5 | 58.8 |
| 6 | September 4, 2019 | Unicoder-VL (ensemble of 2 models) | MSRA & PKU | https://arxiv.org/abs/1908.06066 | 76.0 | 77.1 | 58.6 |
| 7 | September 23, 2019 | UNITER-base (single model) | MS D365 AI | https://arxiv.org/abs/1909.11740 | 75.0 | 77.2 | 58.2 |
| 8 | May 13, 2019 | B2T2 (ensemble of 5 models) | Google Research | http://arxiv.org/abs/1908.05054 | 74.0 | 77.1 | 57.1 |
| 9 | May 13, 2019 | B2T2 (single model) | Google Research | http://arxiv.org/abs/1908.05054 | 72.6 | 75.7 | 55.0 |
| 10 | August 27, 2019 | Unicoder-VL (single model) | MSRA & PKU | https://arxiv.org/abs/1908.06066 | 73.4 | 74.5 | 54.9 |
| 11 | July 30, 2019 | ViLBERT (single model) | Georgia Tech & Facebook AI Research | https://arxiv.org/abs/1908.02265 | 73.3 | 74.6 | 54.8 |
| 12 | November 14, 2019 | CAR | Sun Yat-sen University | | 72.2 | 73.4 | 53.2 |
| 13 | May 22, 2019 | TNet (ensemble of 5) | FlyingPig (Sun Yat-sen University) | | 72.7 | 72.6 | 53.0 |
| 14 | July 5, 2019 | VisualBERT | UCLA & AI2 & PKU | https://arxiv.org/abs/1908.03557 | 71.6 | 73.2 | 52.4 |
| 15 | September 4, 2019 | MKDN | Peking University | | 70.7 | 71.5 | 50.5 |
| 16 | October 25, 2019 | TAB-VCR | UIUC | https://arxiv.org/abs/1910.14671 | 70.4 | 71.7 | 50.5 |
| 17 | May 22, 2019 | TNet (single model) | FlyingPig (Sun Yat-sen University) | | 70.9 | 70.6 | 50.4 |
| 18 | November 29, 2019 | WWR-Net | Tianjin University | | 70.8 | 70.7 | 50.2 |
| 19 | October 13, 2019 | transformer-r2c | SYSU NEW BANNER (Sun Yat-sen University) | | 70.7 | 70.5 | 50.0 |
| 20 | September 4, 2019 | CAR | Sun Yat-sen University | | 70.5 | 70.4 | 49.8 |
| 21 | May 13, 2019 | HGL | HCP | | 70.1 | 70.8 | 49.8 |
| 22 | May 18, 2019 | CCD | Anonymous | | 68.5 | 70.5 | 48.4 |
| 23 | May 16, 2019 | MRCNet | MILAB (Seoul National University) | | 68.4 | 70.5 | 48.4 |
| 24 | May 17, 2019 | MUGRN | NeurIPS 2019 submission ID 21 | | 68.2 | 69.4 | 47.5 |
| 25 | May 13, 2019 | SGRE | UTS | https://github.com/AmingWu/Multi-modal-Circulant-Fusion/ | 67.5 | 69.7 | 46.9 |
| 26 | February 19, 2019 | FAIR | Facebook AI Research | | 65.7 | 70.1 | 46.3 |
| 27 | November 28, 2019 | DAF | Beijing Institute of Technology | | 66.9 | 68.7 | 46.0 |
| 28 | February 25, 2019 | CKRE | Peking University | | 66.9 | 68.2 | 45.9 |
| 29 | May 14, 2019 | emnet | DCP | | 66.6 | 68.0 | 45.4 |
| 30 | November 28, 2018 | Recognition to Cognition Networks | University of Washington | https://github.com/rowanz/r2c | 65.1 | 67.3 | 44.0 |
| 31 | May 20, 2019 | DVD | SL | | 66.3 | 65.0 | 43.3 |
| 32 | March 27, 2019 | GS Reasoning | UC San Diego | | 65.7 | 61.0 | 41.1 |
| 33 | May 17, 2019 | R2R (text only) | Anonymous | | 58.4 | 69.1 | 40.5 |
| 34 | November 28, 2018 | BERT-Base | Google AI Language (experiment by Rowan) | https://github.com/google-research/bert | 53.9 | 64.5 | 35.0 |
| 35 | November 28, 2018 | MLB | Seoul National University (experiment by Rowan) | https://github.com/jnhwkim/MulLowBiVQA | 46.2 | 36.8 | 17.2 |
| 36 | November 15, 2019 | Visual-Lang-base (ensemble) | Tiny-Group | | 17.7 | 74.1 | 13.3 |
| | | Random Performance | | | 25.0 | 25.0 | 6.2 |