Since the YFCC100M was released, several expansion packs have been produced that enrich the collection with additional material. This page describes the three expansion packs produced by Yahoo Labs; other supplemental materials produced by us as part of MMCOMMONS are described here.
Getting the Expansion Packs: The expansion packs are distributed together with the YFCC100M dataset by Yahoo Webscope. If you can access the dataset, you can also access the expansion packs. Please see here for how to request the dataset. More information on each of the expansion packs described below can be found in the README document that accompanies the dataset.
A deep learning approach was used to detect 1,570 different “visual concepts” (e.g. people, animals, objects, food, events, architecture, and scenery) that are visible in the photos and in the first frames of the videos. The concept classifiers (binary SVMs) were trained on features from the next-to-last AlexNet layer, using the deep convolutional neural network (CNN) framework Caffe, and tuned to achieve at least 90% precision. The training dataset for the SVMs consisted of 15 million crowd-labeled Flickr images that are not publicly available. A probability fit (of unspecified kind) was applied to each classifier to produce a confidence score between 0.5 and 1; if the score was below 0.5, the concept was considered not present. These automatic annotations constitute the autotags expansion pack. In 4,307,331 cases, no concept was detected or the image could not be processed.
File Format: The autotags are stored in a single Bzip2-compressed file named
yfcc100m_autotags.bz2. The file contains a line for each of the photos and videos in the YFCC100M, which are listed in exactly the same order as in the dataset. Each line begins with the photo or video identifier and, separated by a tab, is followed by a comma-separated set of autotags for that photo or video (if any). Each autotag consists of a concept label and a confidence score, separated by a colon (e.g.
dune:0.5100). If none of the classifiers indicated that a concept was present, the photo or video identifier will still be present in the file, but its set of autotags will be empty.
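The line layout described above can be parsed with a few string splits. The following is a minimal sketch in Python; the function name and the example identifier are illustrative, not part of the dataset:

```python
def parse_autotags_line(line):
    """Parse one line of the autotags file.

    Each line is: <photo/video id> TAB <comma-separated label:score pairs>,
    where the set of pairs may be empty.
    """
    identifier, _, tags_field = line.rstrip("\n").partition("\t")
    autotags = {}
    if tags_field:
        for pair in tags_field.split(","):
            # Split on the last colon, since the score always follows it.
            label, _, score = pair.rpartition(":")
            autotags[label] = float(score)
    return identifier, autotags

# A line with two autotags, using the example format from above:
parse_autotags_line("123456\tdune:0.5100,beach:0.9210")
# -> ("123456", {"dune": 0.51, "beach": 0.921})
```

A file such as yfcc100m_autotags.bz2 can be streamed line by line without decompressing it to disk, e.g. with Python's bz2.open(path, "rt").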
Errata: In some cases, there are duplicate entries for certain autotags. In such cases, we suggest keeping the entry with the higher confidence score. The affected photo identifiers are: [2122571487, 2120710600, 2118579816, 2132046377, 2140310774, 2120231414, 2091791454, 2132653214, 2100459728, 2095260440, 2084662654, 2114095680, 2137130224, 2132488211, 2201856950, 2109731049, 2107839134, 2080254977, 2123082193, 152095622, 2149273632, 2122570887, 2103042115, 2120091082, 2119282617, 2103985017, 2139791601, 2109730863, 2096455518, 152127540, 2131087079, 2132648718, 2123346778, 2130744902, 2107612480].
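The suggested workaround (keep the higher score when a concept label appears more than once on a line) can be sketched as a small helper; it assumes the (label, score) pairs have already been parsed from a line:

```python
def dedupe_autotags(pairs):
    """Collapse duplicate concept labels, keeping the higher confidence score.

    `pairs` is an iterable of (label, score) tuples parsed from one line.
    """
    best = {}
    for label, score in pairs:
        if label not in best or score > best[label]:
            best[label] = score
    return best

# Two entries for the same concept: the higher score wins.
dedupe_autotags([("dune", 0.51), ("dune", 0.64)])
# -> {"dune": 0.64}
```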
About half of the photos and videos in the YFCC100M are associated with a geographic coordinate, which was either automatically assigned by a GPS chip embedded in or connected to the camera, or manually assigned by the user. Yahoo’s reverse geocoding service was used to translate each coordinate into human-readable place names (e.g. the coordinate -81.804885, 24.550558 was converted to Duval Street, Key West, Monroe County, FL 33040, United States). These reverse geocodes constitute the places expansion pack. Depending on the accuracy parameter (provided by the YFCC100M metadata) associated with the coordinates (longitude, latitude), these estimates can be very accurate or rather inaccurate. In 48,469,829 cases, coordinates were provided.
File Format: The reverse geocodes are stored in a single Bzip2-compressed file named
yfcc100m_places.bz2. The file contains a line for each of the photos and videos in the YFCC100M, which are listed in exactly the same order as in the dataset. Each line begins with the photo or video identifier and, separated by a tab, is followed by a comma-separated set of places for that photo or video (if any). Each of the places includes its unique Where-On-Earth IDentifier, its name, and its type, separated by colons (e.g. 91476962:Lettershea:Town). As the original place names are written in their local character sets, they are URL-encoded in the expansion pack to make them easier for computer programs to process. If a photo or video was not associated with a geographic coordinate or it could not be reverse geocoded (e.g. it was taken in international airspace or international waters), the photo or video identifier will still be present in the file, but its set of places will be empty.
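A line of this file can be parsed analogously to the autotags, with one extra step to undo the URL-encoding. This is a sketch assuming that, because names and types are URL-encoded, any literal colons inside them are percent-escaped, so splitting on ":" is safe:

```python
from urllib.parse import unquote

def parse_places_line(line):
    """Parse one line of the places file.

    Each line is: <photo/video id> TAB <comma-separated places>,
    where each place is WOEID:name:type with name and type URL-encoded.
    """
    identifier, _, places_field = line.rstrip("\n").partition("\t")
    places = []
    if places_field:
        for entry in places_field.split(","):
            woeid, name, place_type = entry.split(":")
            places.append((woeid, unquote(name), unquote(place_type)))
    return identifier, places

# Using the example place from above (identifier is illustrative):
parse_places_line("42\t91476962:Lettershea:Town")
# -> ("42", [("91476962", "Lettershea", "Town")])
```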
When a photo is captured, the camera usually embeds metadata into the image data. This so-called Exif metadata contains information about the camera used, what the aperture size was at the time of capture, whether the flash was used, etc. When a user uploads a photo to Flickr, the original file is maintained (including the Exif), but several versions of varying sizes are created and used for display purposes. Flickr strips this metadata from these additional versions, so that less data needs to be stored and the versions can be served more quickly. The exif expansion pack was created to make the Exif metadata readily available to those who would like to use it.
File Format: The Exif metadata is stored in a single Bzip2-compressed file named
yfcc100m_exif.bz2. The file contains a line for each of the photos and videos in the YFCC100M, which are listed in exactly the same order as in the dataset. Each line begins with the photo or video identifier and, separated by a tab, is followed by a comma-separated set of colon-separated key-value pairs for that photo or video (if any). The key itself consists of two period-separated parts, namely an Exif category and an Exif tag (e.g. Exif+SubIFD.Focal+Length:50.0+mm). Since some categories, tags, and values can contain free-form text, they are all URL-encoded. If no Exif metadata was available for a photo or video, its identifier will still be present in the file, but the set of key-value pairs will be empty.
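Since the example Exif+SubIFD.Focal+Length:50.0+mm encodes spaces as "+", decoding calls for form-style URL-decoding. The following sketch assumes that layout; the function name and example identifier are illustrative:

```python
from urllib.parse import unquote_plus

def parse_exif_line(line):
    """Parse one line of the exif file.

    Each line is: <photo/video id> TAB <comma-separated key:value pairs>,
    where a key is <category>.<tag>; categories, tags, and values are
    URL-encoded with spaces written as '+'.
    """
    identifier, _, exif_field = line.rstrip("\n").partition("\t")
    exif = {}
    if exif_field:
        for pair in exif_field.split(","):
            # The value follows the first colon; the key precedes it.
            key, _, value = pair.partition(":")
            category, _, tag = key.partition(".")
            exif[(unquote_plus(category), unquote_plus(tag))] = unquote_plus(value)
    return identifier, exif

# Using the example key-value pair from above:
parse_exif_line("9\tExif+SubIFD.Focal+Length:50.0+mm")
# -> ("9", {("Exif SubIFD", "Focal Length"): "50.0 mm"})
```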