Visual Commonsense Reasoning

On VCR, a model must not only answer commonsense visual questions, but also provide a rationale that explains why the answer is true.


Submitting to the leaderboard

Submission is easy! You just need to email Rowan (rowanz at cs.washington.edu) with your predictions. Formatting instructions are below:

Please include in your email 1) a name for your model, 2) your team name (including your affiliation), and optionally, 3) a GitHub repo or paper link.

If you submit an ensemble, please tell me how many models you used in your ensemble.

I'll try to get back to you within a few days, usually sooner. Teams can only submit results from a model once every 7 days.

I reserve the right to not score any of your submissions if you cheat -- for instance, please don't make up a bunch of fake names / email addresses and send me multiple submissions under those names.


What kinds of submissions are allowed?

The only constraint is that your system must predict the answer first, then the rationale. (The rationales were selected to be highly relevant to the correct Q,A pair, so they leak information about the correct answer.)

  • To keep the rationales from leaking the answer, the submission format requires predictions for each possible rationale, conditioned on each possible answer.
  • A simple way of setting up the experiments (used in the paper) is to consider a task with a query and four response choices. For Q->A, the query is the question, and the response choices are the answers. For QA->R, the query is the question and answer, concatenated together, and the response choices are the rationales.
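The query/response setup described in the bullets above can be sketched in Python as follows. This is only an illustration, not the official data loader or submission format: the `VCRItem` container, its field names, and the helper functions are hypothetical, and joining the question and answer with a single space is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VCRItem:
    # Hypothetical container; field names are assumptions, not the official schema.
    question: str
    answers: List[str]     # the four answer choices
    rationales: List[str]  # the four rationale choices


def q_to_a(item: VCRItem) -> Tuple[str, List[str]]:
    """Q->A: the query is the question, the responses are the answers."""
    return item.question, list(item.answers)


def qa_to_r(item: VCRItem, answer_idx: int) -> Tuple[str, List[str]]:
    """QA->R: the query is question + answer, the responses are the rationales."""
    query = f"{item.question} {item.answers[answer_idx]}"
    return query, list(item.rationales)


def all_conditioned_queries(item: VCRItem) -> List[Tuple[str, List[str]]]:
    """Build one QA->R instance per possible answer, so that rationale
    predictions can be made without knowing which answer is correct."""
    return [qa_to_r(item, i) for i in range(len(item.answers))]
```

With four answers and four rationales, `all_conditioned_queries` yields four instances of four rationale choices each, which mirrors the "each possible rationale, conditioned on each possible answer" requirement.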

Questions?

If it's not about something private, check out the Google Group below:

VCR Leaderboard

There are two different subtasks to VCR:

  • Question Answering (Q->A): In this setup, a model is provided a question, and has to pick the best answer out of four choices. Only one of the four is correct.
  • Answer Justification (QA->R): In this setup, a model is provided a question, along with the correct answer, and it has to justify it by picking the best rationale out of four choices.

We combine the two parts with the Q->AR metric, in which a model only gets a question right if it answers correctly and picks the right rationale. Models are evaluated in terms of accuracy (%). How well will your model do?
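As a rough illustration of how the three numbers relate, the sketch below computes them from predicted and gold choice indices. The function name and the list-based input layout are assumptions made for this example, not the official evaluation script; it also assumes that each rationale prediction is the one made when the model is given the correct answer.

```python
from typing import Dict, Sequence


def vcr_accuracy(
    answer_pred: Sequence[int],
    answer_gold: Sequence[int],
    rationale_pred: Sequence[int],
    rationale_gold: Sequence[int],
) -> Dict[str, float]:
    """Return Q->A, QA->R and Q->AR accuracy in percent.

    Assumption: rationale_pred[i] is the rationale chosen when the model
    is given the correct answer for example i (the QA->R setting).
    """
    n = len(answer_gold)
    if n == 0:
        raise ValueError("need at least one example")
    if not (len(answer_pred) == len(rationale_pred) == len(rationale_gold) == n):
        raise ValueError("all inputs must have the same length")

    answer_hit = [p == g for p, g in zip(answer_pred, answer_gold)]
    rationale_hit = [p == g for p, g in zip(rationale_pred, rationale_gold)]
    # Q->AR counts an example only when both choices are right.
    joint_hit = [a and r for a, r in zip(answer_hit, rationale_hit)]

    return {
        "Q->A": 100.0 * sum(answer_hit) / n,
        "QA->R": 100.0 * sum(rationale_hit) / n,
        "Q->AR": 100.0 * sum(joint_hit) / n,
    }
```

For intuition, uniform guessing over four choices gives 25% on each subtask and 0.25 × 0.25 = 6.25% jointly, which is in line with the Random Performance row of the leaderboard.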

Rank | Date | Model | Team | Reference | Q->A | QA->R | Q->AR
- | - | Human Performance | University of Washington | (Zellers et al. '18) | 91.0 | 93.0 | 85.0
1 | July 10, 2024 | OV-Grounding | Anonymous | - | 91.4 | 92.8 | 85.0
2 | September 11, 2023 | GPT4RoI | Anonymous | - | 89.4 | 91.0 | 81.6
3 | January 19, 2024 | ViP-LLaVa | Anonymous | - | 89.2 | 90.9 | 81.3
4 | March 14, 2024 | LLME-VCR | Advanced Data Engineering and Real-time Computing Laboratory (ADE), School of Computer Science and Technology, Huazhong University of Science and Technology (HUST) | - | 87.0 | 86.6 | 75.7
5 | November 28, 2022 | HunYuan_vcr | Tencent Data Platform | - | 85.8 | 88.0 | 75.6
6 | January 28, 2023 | SP-VCR (ensemble of 4 models) | Shopee MMU | - | 83.6 | 88.6 | 74.4
7 | April 8, 2022 | KS-MGSR | KDDI Research and SNAP | - | 85.3 | 86.9 | 74.3
8 | January 17, 2022 | VLUA+ | Kuaishou MMU | - | 84.8 | 87.0 | 74.0
9 | February 21, 2022 | VQA-GNN + MerlotReserve-Large (ensemble of 2 models) | Anonymous | - | 85.2 | 86.6 | 74.0
10 | April 8, 2022 | VLMT-VCR | Anonymous | - | 84.7 | 85.9 | 72.9
11 | December 14, 2021 | VL-RoBERTa | Joint Laboratory of HIT and iFLYTEK Research (HFL) | - | 84.2 | 86.4 | 72.8
12 | August 25, 2022 | SP-VCR (single model) | Shopee MMU | - | 82.3 | 87.4 | 72.2
13 | June 28, 2021 | VLUA (single model) | Kuaishou MMU | - | 82.3 | 87.0 | 72.0
14 | November 1, 2021 | 🍷MerlotReserve-Large | University of Washington / AI2 | https://rowanzellers.com/merlotreserve | 84.0 | 84.9 | 71.5
15 | May 10, 2021 | UNIMO+ERNIE (ensemble of 7 models) | Baidu NLP | https://arxiv.org/abs/2012.15409 | 82.3 | 86.5 | 71.4
16 | November 19, 2020 | BLENDER (single model) | WeSee AI team, Tencent | - | 81.6 | 86.4 | 70.8
17 | June 24, 2020 | ERNIE-ViL-large (ensemble of 15 models) | ERNIE-team - Baidu | https://arxiv.org/abs/2006.16934 | 81.6 | 86.1 | 70.5
18 | January 12, 2024 | EventLens-large | Advanced Data Engineering and Real-time Computing Laboratory (ADE), School of Computer Science and Technology, Huazhong University of Science and Technology (HUST) | - | 82.7 | 82.7 | 68.5
19 | October 28, 2020 | MMCNet (ensemble of 4 models) | UC Berkeley | - | 80.0 | 83.1 | 66.9
20 | September 30, 2019 | UNITER-large (ensemble of 10 models) | MS D365 AI | https://arxiv.org/abs/1909.11740 | 79.8 | 83.4 | 66.8
21 | April 7, 2022 | RobustNet | Anonymous | - | 78.8 | 84.0 | 66.5
22 | June 24, 2020 | ERNIE-ViL-large (single model) | ERNIE-team - Baidu | https://arxiv.org/abs/2006.16934 | 79.2 | 83.5 | 66.3
23 | November 15, 2021 | ADVL (single model, formerly CLIP-TD) | Anonymous | - | 79.6 | 82.9 | 66.2
24 | June 8, 2021 | DELV | Intel Labs Cognitive AI & MSRA NLC | - | 79.4 | 82.5 | 65.8
25 | May 22, 2020 | VILLA-large (single model) | MS D365 AI | https://arxiv.org/pdf/2006.06195.pdf | 78.9 | 82.8 | 65.7
26 | January 12, 2024 | EventLens-large | Advanced Data Engineering and Real-time Computing Laboratory (ADE), School of Computer Science and Technology, Huazhong University of Science and Technology (HUST) | - | 81.0 | 80.3 | 65.5
27 | May 28, 2021 | MERLOT (single model) | University of Washington / AI2 | https://rowanzellers.com/merlot | 80.6 | 80.4 | 65.1
28 | October 30, 2020 | VitsNet | Carnegie Mellon University | - | 78.8 | 81.8 | 64.6
29 | July 16, 2020 | gnimix | Anonymous | - | 78.9 | 81.0 | 64.1
30 | June 5, 2021 | SEITU | Anonymous | https://github.com/MyLittleChange/SEITU | 77.9 | 80.7 | 63.0
31 | September 23, 2019 | UNITER-large (single model) | MS D365 AI | https://arxiv.org/abs/1909.11740 | 77.3 | 80.8 | 62.8
32 | February 15, 2022 | VQA-GNN | Anonymous | - | 77.9 | 80.0 | 62.8
33 | November 1, 2021 | 🍷MerlotReserve-Base | University of Washington / AI2 | https://rowanzellers.com/merlotreserve | 79.3 | 78.7 | 62.6
34 | March 15, 2022 | VCR-test | Anonymous | - | 77.3 | 80.2 | 62.4
35 | June 24, 2020 | ERNIE-ViL-base (single model) | ERNIE-team - Baidu | https://arxiv.org/abs/2006.16934 | 77.0 | 80.3 | 62.1
36 | September 10, 2021 | Kam-net | Anonymous | - | 77.4 | 79.2 | 61.8
37 | April 9, 2024 | ICAR: Image Compression and Attentional Redundancy for Visual Commonsense Reasoning | Prof. Xie Yiyuan and student Shang Chaofan, School of Electronic Information Engineering, Southwest University, Beibei District, Chongqing, China | - | 77.1 | 79.1 | 61.3
38 | May 22, 2020 | VILLA-base (single model) | MS D365 AI | https://arxiv.org/pdf/2006.06195.pdf | 76.4 | 79.1 | 60.6
39 | October 22, 2020 | MMCNet | UC Berkeley | - | 76.0 | 79.3 | 60.6
40 | February 17, 2021 | Test_VILLA | CCR | - | 76.6 | 79.0 | 60.6
41 | December 3, 2022 | VILLA_TEST | Alignment, Shandong University | - | 76.6 | 79.0 | 60.6
42 | December 3, 2022 | VILLA_GIST | Alignment, Shandong University | - | 76.4 | 79.0 | 60.4
43 | April 23, 2020 | KVL-BERT | Beijing Institute of Technology | - | 76.4 | 78.6 | 60.3
44 | August 9, 2019 | ViLBERT (ensemble of 10 models) | Georgia Tech & Facebook AI Research | https://arxiv.org/abs/1908.02265 | 76.4 | 78.0 | 59.8
45 | March 23, 2019 | VVT | runningcat | - | 75.3 | 78.9 | 59.7
46 | September 23, 2019 | VL-BERT (single model) | MSRA & USTC | https://arxiv.org/abs/1908.08530 | 75.8 | 78.4 | 59.7
47 | August 2, 2021 | PVL (single model) | PVL-team (UCLA) | - | 76.9 | 77.1 | 59.7
48 | September 10, 2021 | SGEITL | Anonymous | - | 76.0 | 78.0 | 59.6
49 | April 1, 2023 | UNITER_kd | WalkingDog, Shandong University | - | 75.7 | 77.7 | 59.0
50 | January 8, 2021 | vlt (single model) | Anonymous | - | 75.3 | 77.8 | 58.9
51 | August 9, 2019 | ViLBERT (ensemble of 5 models) | Georgia Tech & Facebook AI Research | https://arxiv.org/abs/1908.02265 | 75.7 | 77.5 | 58.8
52 | December 14, 2022 | UNITER_joint | WalkingDog, Shandong University | - | 75.6 | 77.5 | 58.8
53 | June 16, 2021 | GITRL | Anonymous | - | 75.5 | 77.5 | 58.7
54 | September 4, 2019 | Unicoder-VL (ensemble of 2 models) | MSRA & PKU | https://arxiv.org/abs/1908.06066 | 76.0 | 77.1 | 58.6
55 | March 5, 2022 | PEVL | Anonymous | - | 76.0 | 76.7 | 58.6
56 | December 14, 2022 | UNITER_independent | WalkingDog, Shandong University | - | 75.5 | 77.3 | 58.6
57 | September 23, 2019 | UNITER-base (single model) | MS D365 AI | https://arxiv.org/abs/1909.11740 | 75.0 | 77.2 | 58.2
58 | November 5, 2020 | TDN | VARMS (Sun Yat-sen University) | - | 75.7 | 76.4 | 58.0
59 | May 13, 2019 | B2T2 (ensemble of 5 models) | Google Research | http://arxiv.org/abs/1908.05054 | 74.0 | 77.1 | 57.1
60 | April 5, 2024 | BLIP-VCR | Anonymous | - | 75.8 | 74.3 | 56.8
61 | March 13, 2023 | GPT4 4-shot | OpenAI (experiment by Rowan) | - | 73.5 | 75.4 | 56.2
62 | September 11, 2021 | YTX | Harbin Institute of Technology | - | 73.8 | 74.8 | 55.4
63 | April 1, 2023 | VL-BERT_kd | WalkingDog, Shandong University | - | 73.2 | 75.7 | 55.4
64 | May 13, 2019 | B2T2 (single model) | Google Research | http://arxiv.org/abs/1908.05054 | 72.6 | 75.7 | 55.0
65 | August 27, 2019 | Unicoder-VL (single model) | MSRA & PKU | https://arxiv.org/abs/1908.06066 | 73.4 | 74.5 | 54.9
66 | July 30, 2019 | ViLBERT (single model) | Georgia Tech & Facebook AI Research | https://arxiv.org/abs/1908.02265 | 73.3 | 74.6 | 54.8
67 | November 29, 2022 | VL-BERT_prec | WalkingDog, Shandong University | - | 73.4 | 74.5 | 54.8
68 | March 5, 2022 | ALBEF | Salesforce Research (experiment by Anonymous) | https://arxiv.org/abs/2107.07651 | 72.9 | 74.5 | 54.7
69 | May 26, 2021 | ViLBERT_GCN | Beijing Institute of Technology | - | 73.3 | 74.4 | 54.6
70 | April 7, 2021 | DCGR | Harbin Institute of Technology | - | 72.8 | 74.2 | 54.3
71 | March 11, 2021 | CARC | Harbin Institute of Technology | - | 72.9 | 73.8 | 54.1
72 | November 14, 2019 | HGL | Sun Yat-sen University | - | 72.2 | 73.4 | 53.2
73 | May 22, 2019 | TNet (ensemble of 5) | FlyingPig (Sun Yat-sen University) | - | 72.7 | 72.6 | 53.0
74 | March 31, 2021 | CMR | flying melon | - | 72.3 | 72.8 | 52.8
75 | July 5, 2019 | VisualBERT | UCLA & AI2 & PKU | https://arxiv.org/abs/1908.03557 | 71.6 | 73.2 | 52.4
76 | February 22, 2022 | TAB-KD | Anonymous | - | 71.6 | 72.8 | 52.4
77 | March 8, 2021 | SAC | SAC | - | 71.7 | 72.8 | 52.2
78 | January 5, 2022 | GTEHG | MIC-Tongji University | - | 71.2 | 72.4 | 51.7
79 | April 13, 2021 | A3 Net | A3 Net | - | 71.2 | 72.0 | 51.4
80 | January 17, 2022 | JCL | Tianjin University | - | 70.7 | 71.6 | 50.9
81 | November 29, 2022 | TAB-VCR_attribute | WalkingDog, Shandong University | - | 70.5 | 71.6 | 50.8
82 | September 4, 2019 | MKDN | Peking University | - | 70.7 | 71.5 | 50.5
83 | October 25, 2019 | TAB-VCR | UIUC | https://arxiv.org/abs/1910.14671 | 70.4 | 71.7 | 50.5
84 | June 3, 2020 | RobustCL | NeurIPS 2020 submission ID1218 | - | 70.4 | 71.5 | 50.5
85 | October 30, 2021 | YXY | Peking University | - | 70.6 | 71.2 | 50.5
86 | May 22, 2019 | TNet (single model) | FlyingPig (Sun Yat-sen University) | - | 70.9 | 70.6 | 50.4
87 | March 26, 2021 | UABE | WalkingDog from Qilu University of Technology | - | 70.3 | 71.3 | 50.4
88 | November 29, 2019 | WWR-Net | Tianjin University | - | 70.8 | 70.7 | 50.2
89 | October 13, 2019 | transformer-r2c | SYSU NEW BANNER (Sun Yat-sen University) | - | 70.7 | 70.5 | 50.0
90 | May 13, 2019 | HGL | HCP | https://arxiv.org/abs/1910.11475 | 70.1 | 70.8 | 49.8
91 | September 4, 2019 | CAR | Sun Yat-sen University | - | 70.5 | 70.4 | 49.8
92 | December 6, 2020 | SIA V1 | SIA | - | 69.6 | 71.3 | 49.8
93 | September 3, 2020 | SFW1 | Jiangsu University | - | 70.2 | 70.7 | 49.7
94 | January 20, 2022 | CCN-KD | Anonymous | - | 70.0 | 70.6 | 49.7
95 | September 3, 2020 | SFW2 | Jiangsu University | - | 69.6 | 71.1 | 49.6
96 | May 3, 2021 | RKB | Hefei University of Technology | - | 69.6 | 70.7 | 49.3
97 | June 10, 2021 | PUV (Pretrain UNITER by VC feature) | Anonymous | - | 69.8 | 70.2 | 49.3
98 | February 3, 2021 | vlb (single model) | Anonymous | - | 69.8 | 69.8 | 48.9
99 | June 10, 2021 | BLU | Anonymous | - | 69.3 | 70.1 | 48.9
100 | May 16, 2019 | MRCNet | MILAB (Seoul National University) | - | 68.4 | 70.5 | 48.4
101 | May 18, 2019 | CCD | Anonymous | - | 68.5 | 70.5 | 48.4
102 | May 17, 2019 | MUGRN | NeurIPS 2019 submission ID 21 | - | 68.2 | 69.4 | 47.5
103 | May 13, 2019 | SGRE | UTS | https://github.com/AmingWu/Multi-modal-Circulant-Fusion/ | 67.5 | 69.7 | 46.9
104 | January 30, 2022 | R2C-KD | Anonymous | - | 66.9 | 69.4 | 46.6
105 | February 19, 2019 | FAIR | Facebook AI Research | - | 65.7 | 70.1 | 46.3
106 | January 20, 2022 | CCN-NKD | Anonymous | - | 67.2 | 68.2 | 46.1
107 | November 28, 2019 | DAF | Beijing Institute of Technology | - | 66.9 | 68.7 | 46.0
108 | February 25, 2019 | CKRE | Peking University | - | 66.9 | 68.2 | 45.9
109 | October 30, 2021 | MIE | flying melon | - | 66.2 | 68.8 | 45.5
110 | August 8, 2022 | ATGAN | MMT, Tianjin University | - | 63.4 | 71.9 | 45.5
111 | May 14, 2019 | emnet | DCP | - | 66.6 | 68.0 | 45.4
112 | November 28, 2018 | Recognition to Cognition Networks | University of Washington | https://github.com/rowanz/r2c | 65.1 | 67.3 | 44.0
113 | May 20, 2019 | DVD | SL | - | 66.3 | 65.0 | 43.3
114 | March 27, 2019 | GS Reasoning | UC San Diego | - | 65.7 | 61.0 | 41.1
115 | May 17, 2019 | R2R (text only) | Anonymous | - | 58.4 | 69.1 | 40.5
116 | May 28, 2021 | R2CC | Arjun Singh | - | 60.7 | 61.7 | 37.6
117 | November 28, 2018 | BERT-Base | Google AI Language (experiment by Rowan) | https://github.com/google-research/bert | 53.9 | 64.5 | 35.0
118 | June 18, 2021 | MUR | Anonymous | - | 46.5 | 46.0 | 25.6
119 | September 15, 2021 | SG-QA-model | Anonymous | - | 70.9 | 24.9 | 17.7
120 | November 28, 2018 | MLB | Seoul National University (experiment by Rowan) | https://github.com/jnhwkim/MulLowBiVQA | 46.2 | 36.8 | 17.2
121 | October 23, 2021 | BERT-base-vc-ft | LUKA-Axe | - | 58.6 | 24.9 | 14.5
122 | November 15, 2019 | Visual-Lang-base (ensemble) | Tiny-Group | - | 17.7 | 74.1 | 13.3
- | - | Random Performance | - | - | 25.0 | 25.0 | 6.2