VCR v1.0 readme

Packaged Dec 3, 2018. Contact Rowan Zellers for help. Download the VCR annotations and images from the VCR website.

Downloading requires agreeing to the VCR dataset license as well as the VCR website terms and conditions.

Disclaimer/Content warning (borrowed from the one for ATOMIC): the content in VCR was annotated by Mechanical Turk workers on top of movie images. Many of the images depict nudity, violence, or other problematic things (such as Nazis, because in many movies Nazis are the villains). We left these in, though, partly for the purpose of learning the (probably negative, but still important) commonsense implications of the scenes. Even so, the content covered by movies is still pretty biased and problematic, which definitely shows up in our data (e.g., men are more common than women). We filtered out some especially problematic images where we could, but we probably missed some. If you find something or have any concerns, please contact Rowan.

Once you download the files, place them into a directory (I suggest vcr1/) and unzip them.
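If you'd rather script the unzip step, here is a minimal sketch using Python's standard zipfile module. It assumes the downloaded archives all sit directly in the directory you chose (vcr1/ by default) and extracts each one in place; since the archive names aren't listed here, it simply matches every *.zip file. The function name unzip_in_place is hypothetical.

```python
import zipfile
from pathlib import Path


def unzip_in_place(directory="vcr1"):
    """Extract every .zip archive found in `directory` into that same directory.

    Assumption: the archive file names are unknown, so all *.zip files match.
    Returns the names of the archives that were extracted, in sorted order.
    """
    directory = Path(directory)
    extracted = []
    for archive in sorted(directory.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(directory)  # keeps the paths stored inside the archive
        extracted.append(archive.name)
    return extracted
```

Running unzip_in_place("vcr1") after moving the downloads there should leave the layout described under "Structure of the dataset", assuming the archives store paths such as vcr1images/....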

Quick overview

The dataset consists of image and metadata files, plus annotations. Each annotation file (train.jsonl, val.jsonl, and test.jsonl) is a jsonl file, where each line is a separate JSON object. The test file has its labels removed to preserve the integrity of the leaderboard.
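Because each line is its own JSON object, an annotation file can be read line by line with Python's standard json module. This is a minimal sketch, not an official loader; the name load_jsonl is hypothetical, and blank lines are skipped defensively.

```python
import json


def load_jsonl(path):
    """Read a .jsonl file: one JSON object per line, blank lines skipped."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

For example, load_jsonl("vcr1/val.jsonl") returns a list of dicts, one per annotation. For train.jsonl you may prefer to iterate lazily instead of building the whole list in memory.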

Here are the important things from the annotations:

Here's what you get from the JSON file located at metadata_fn:

These are pretty much all you need to get started. I'll try to upload code and a dataloader soon.
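Each annotation points to its per-image metadata through the metadata_fn field. The sketch below assumes metadata_fn is a path relative to the vcr1images/ directory and, based on the example names in the directory tree (Sv_GcxkmW4Y@29.jpg next to Sv_GcxkmW4Y@29.json), that the image shares the metadata file's stem with a .jpg extension. Both are assumptions, and load_metadata is a hypothetical helper; check the paths against your download.

```python
import json
from pathlib import Path


def load_metadata(annotation, images_root="vcr1/vcr1images"):
    """Return (metadata dict, image path) for one annotation record.

    Assumptions: annotation["metadata_fn"] is relative to images_root, and the
    image sits next to the metadata file with the same stem and a .jpg suffix.
    """
    meta_path = Path(images_root) / annotation["metadata_fn"]
    with open(meta_path, encoding="utf-8") as f:
        metadata = json.load(f)
    image_path = meta_path.with_suffix(".jpg")  # swaps only the final ".json"
    return metadata, image_path
```

Pass it one record parsed from train.jsonl or val.jsonl, e.g. the result of json.loads on a single line.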

Structure of the dataset

|-- vcr1images/
|   |-- VERSION.txt
|   |-- movie name, like movieclips_A_Fistful_of_Dollars
|   |   |-- image files, like Sv_GcxkmW4Y@29.jpg
|   |   |-- metadata files, like Sv_GcxkmW4Y@29.json
|-- train.jsonl
|-- val.jsonl
|-- test.jsonl
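Following the tree above, you can index the images without the annotation files by walking each movie directory under vcr1images/ and pairing every .jpg with the .json of the same stem. This is a sketch that relies only on the naming pattern shown in the tree; index_images is a hypothetical name, and images with no matching metadata file are skipped.

```python
from pathlib import Path


def index_images(images_root="vcr1/vcr1images"):
    """Map "movie_dir/image.jpg" keys to the matching metadata .json path.

    Top-level files such as VERSION.txt are ignored because only
    subdirectories are scanned.
    """
    pairs = {}
    movie_dirs = sorted(p for p in Path(images_root).iterdir() if p.is_dir())
    for movie_dir in movie_dirs:
        for img in sorted(movie_dir.glob("*.jpg")):
            meta = img.with_suffix(".json")
            if meta.exists():
                pairs[f"{movie_dir.name}/{img.name}"] = meta
    return pairs
```

With the layout above, a key would look like "movieclips_A_Fistful_of_Dollars/Sv_GcxkmW4Y@29.jpg".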

More detailed information for people who are really curious:

I included extra information in the annotations in case it helps. Some of these fields were removed from the test set to mask the labels.
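To see which fields were removed from the test set, one option is to compare the keys of a validation record with those of a test record. This sketch assumes every line within a split has the same set of keys, so it only inspects the first record of each file; the function names are hypothetical.

```python
import json


def first_record_keys(path):
    """Return the set of top-level keys in the first JSON line of a .jsonl file."""
    with open(path, encoding="utf-8") as f:
        return set(json.loads(f.readline()))


def fields_removed_from_test(val_path="vcr1/val.jsonl", test_path="vcr1/test.jsonl"):
    """List (sorted) the fields present in val.jsonl but missing from test.jsonl."""
    return sorted(first_record_keys(val_path) - first_record_keys(test_path))
```

If the same-keys assumption doesn't hold for your copy, take the union of keys over all lines of each file instead.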

The VERSION.txt contains the version info of the release.