Tools and Demos

The Multimedia Commons effort includes interactive demos that show how the dataset is being used, along with analysis and retrieval tools to help researchers take advantage of this massive resource. Some of these tools are included in the resources hosted on Amazon Web Services, while other tools and demos arise from independent efforts and are hosted externally.

Jump to:
YFCC100M Browsers
Multimedia Commons Search
audioCaffe: Audio Analysis With Deep Neural Nets
videosearch100M: Semantic Search
YFCC100M Visual Search Utility
MI-File CBIR: Similarity-Based Image Retrieval
Evento360: Discovering Social Events
Other Demos and Projects

YFCC100M Browsers

There are two browsers with which the dataset can be easily explored. For example, they allow you to visualize the distribution of media with particular tags across users, times, and places; view the original media with those tags on Flickr; and download the metadata for your query results. More details are available on a separate page.

audioCaffe: Audio Analysis With Deep Neural Nets

The audioCaffe audio-content analysis tool and a demonstration experiment may be found in the tools/audioCaffe/ directory in our AWS S3 data store, and both are also included in the Multimedia Commons CloudFormation Template. The demonstration experiment, “a MED-ium Cup of audioCaffe,” uses data from the YLI Multimedia Event Detection (MED) subcorpus. It will give you a taste of what you can do with a big corpus of computed audio features like YLI and a flexible set of analysis tools.

The directory also includes a build of audioCaffe; you can check for updates at GitHub. audioCaffe is a deep neural net-based audio content analysis tool that leverages the deep-learning framework Caffe. audioCaffe is an open-source resource being developed as part of the SMASH project, which aims to provide a single software framework for a variety of content-analysis tasks (speech recognition, audio analysis, video event detection, etc.).

Citation: Khalid Ashraf, Benjamin Elizalde, Forrest Iandola, Matthew Moskewicz, Gerald Friedland, Kurt Keutzer, and Julia Bernd. 2015. Audio-Based Multimedia Event Detection with DNNs and Sparse Sampling. In Proceedings of the 5th ACM International Conference on Multimedia Retrieval (ICMR ’15). New York: ACM, 611-614.

Cheers to: This cup of audioCaffe was grown, roasted, ground, and brewed by Khalid Ashraf, Kurt Keutzer, Gerald Friedland, Benjamin Elizalde, and Jia Yangqing.

Funding: AudioCaffe is funded by a National Science Foundation grant for the SMASH project (grant IIS-1251276). (Any opinions, findings, and conclusions expressed on this website are those of the individual researchers and do not necessarily reflect the views of the funding agency.)

Contacts: Questions can be directed to ashrafkhalid[chez]berkeley[stop]edu.

videosearch100M: Semantic Search

A team at Carnegie Mellon University has been building a video search system that matches (relatively) simple semantic concepts, like “dancing,” against visual and motion features (including convolutional neural network features and dense trajectories). They added features and annotations extracted from the ~800,000 YFCC100M videos to their system, and have released those resources publicly.

Watch this space for updates on (hopefully) getting the full set of new features on AWS!

YFCC100M Visual Search Utility

Researchers at the Information Technologies Institute in Thermi, Greece, have produced a nearest-neighbor-based visual search utility for the YFCC100M, using an IVFPQ index based on SURF+VLAD features.
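To give a feel for what an IVFPQ (inverted file with product quantization) index does, here is a minimal Python sketch. It is not the Utility's implementation. The class name TinyIVFPQ, the helper functions, the 4-dimensional toy vectors, and the hand-written codebooks are all assumptions made for illustration; a real index would learn its codebooks (typically with k-means) from high-dimensional descriptors such as SURF+VLAD.

```python
from collections import defaultdict

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, codebook):
    """Index of the codebook entry closest to vec."""
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))

class TinyIVFPQ:
    """Toy IVFPQ index (hypothetical); codebooks are supplied, not learned."""

    def __init__(self, coarse_centroids, sub_codebooks):
        self.coarse = coarse_centroids       # coarse quantizer: the "IVF" part
        self.subs = sub_codebooks            # one codebook per sub-vector: the "PQ" part
        self.sub_dim = len(sub_codebooks[0][0])
        self.lists = defaultdict(list)       # cell id -> [(item id, PQ code)]

    def _split(self, vec):
        d = self.sub_dim
        return [vec[i:i + d] for i in range(0, len(vec), d)]

    def add(self, item_id, vec):
        # Assign the vector to its nearest coarse cell, then store only a
        # short code describing the residual (vector minus cell centroid).
        cell = nearest(vec, self.coarse)
        residual = [x - c for x, c in zip(vec, self.coarse[cell])]
        code = tuple(nearest(part, cb)
                     for part, cb in zip(self._split(residual), self.subs))
        self.lists[cell].append((item_id, code))

    def search(self, query, k=1, nprobe=1):
        # Visit only the nprobe cells whose centroids are closest to the query.
        cells = sorted(range(len(self.coarse)),
                       key=lambda c: sq_dist(query, self.coarse[c]))[:nprobe]
        scored = []
        for cell in cells:
            residual = [x - c for x, c in zip(query, self.coarse[cell])]
            # Per-cell lookup tables: distance from each query sub-vector
            # to every codeword of the matching sub-codebook.
            tables = [[sq_dist(part, word) for word in cb]
                      for part, cb in zip(self._split(residual), self.subs)]
            for item_id, code in self.lists[cell]:
                approx = sum(tables[m][code[m]] for m in range(len(code)))
                scored.append((approx, item_id))
        return [item_id for _, item_id in sorted(scored)[:k]]

# Illustrative data only: two coarse cells, two sub-codebooks of two codewords.
coarse = [[0.0, 0.0, 0.0, 0.0], [10.0, 10.0, 10.0, 10.0]]
subs = [[[-1.0, -1.0], [1.0, 1.0]], [[-1.0, -1.0], [1.0, 1.0]]]
index = TinyIVFPQ(coarse, subs)
index.add("img_a", [1.0, 1.0, -1.0, -1.0])
index.add("img_b", [-1.0, -1.0, 1.0, 1.0])
index.add("img_c", [11.0, 11.0, 9.0, 9.0])
print(index.search([0.9, 1.1, -0.8, -1.2], k=1))
```

The trade-off is controlled by nprobe: probing more cells costs more time but lowers the chance of missing a true neighbor that fell into an adjacent cell. Distances are approximate because each stored item is represented only by its code.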

Check out the YFCC100M Visual Search Utility here.

Available in the Multimedia Commons S3 data store on AWS:

  • The index file and the learning files for the Visual Search Utility: features/features/image/vgg-vlad-yfcc/vlad/ and /
  • The SURF+VLAD features used as the basis for the Utility: features/features/image/vgg-vlad-yfcc/vlad/full/ — see the Other Feature Corpora page for a description.

Citation: Adrian Popescu, Eleftherios Spyromitros-Xioufis, Symeon Papadopoulos, Hervé Le Borgne, and Yiannis Kompatsiaris. Towards an Automatic Evaluation of Retrieval Performance With Large Scale Image Collections. In Proceedings of the ACM Multimedia 2015 Workshop on Community-Organized Multimodal Mining: Opportunities for Novel Solutions (MMCommons ’15), Brisbane, Australia, October 2015. PDF of Preprint.

MI-File CBIR: Similarity-Based Image Retrieval

A research group at the Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo” (ISTI) has developed a similarity-based search engine for the YFCC100M that retrieves YFCC100M images based on visual similarity to a query image and predicts the most appropriate tag. The system uses deep neural network features and the Metric Inverted File technique.
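The general idea behind permutation-based indexing, on which the Metric Inverted File technique builds, can be sketched in a few lines of Python. This is a simplified illustration, not the ISTI system: the class name TinyMIFile, the 2-dimensional toy vectors, the four reference points, and the scoring details are assumptions. Each item is described by which reference points it is closest to and in what order, and two items are considered similar when those orderings agree.

```python
from collections import defaultdict

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ranked_refs(vec, refs, keep):
    """Ids of the `keep` reference points closest to vec, nearest first."""
    order = sorted(range(len(refs)), key=lambda r: sq_dist(vec, refs[r]))
    return order[:keep]

class TinyMIFile:
    """Toy permutation-based inverted file (hypothetical, for illustration)."""

    def __init__(self, refs, keep):
        self.refs = refs                     # reference points ("pivots")
        self.keep = keep                     # how many nearest refs describe an item
        self.postings = defaultdict(list)    # ref id -> [(item id, rank of that ref)]

    def add(self, item_id, vec):
        for rank, ref in enumerate(ranked_refs(vec, self.refs, self.keep)):
            self.postings[ref].append((item_id, rank))

    def search(self, query, k=1):
        q_refs = ranked_refs(query, self.refs, self.keep)
        penalty = self.keep  # charged for each query ref missing from an item
        # Every candidate starts at the worst score; each shared reference
        # point replaces one penalty with the actual rank difference
        # (a truncated Spearman footrule distance between the orderings).
        scores = defaultdict(lambda: penalty * len(q_refs))
        for q_rank, ref in enumerate(q_refs):
            for item_id, rank in self.postings[ref]:
                scores[item_id] += abs(q_rank - rank) - penalty
        return sorted(scores, key=lambda i: (scores[i], i))[:k]

# Illustrative data only: four reference points at the corners of a square.
refs = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]]
index = TinyMIFile(refs, keep=2)
index.add("near_origin", [2.0, 1.0])
index.add("right", [8.0, 1.0])
index.add("top_right", [9.0, 8.0])
print(index.search([3.0, 2.0], k=1))
```

Only items that share at least one reference point with the query are ever scored, which is what lets an inverted file skip most of a large collection. In a system like the one described above, the vectors would be deep neural network features rather than 2-dimensional points.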

Check out the MI-File CBIR Search Engine and Tag Predictor here.

The Hybrid-CNN features that were used as the basis for the search engine are available in the Multimedia Commons S3 data store on AWS, under features/features/image/hybrid-cnn/. See the Other Feature Corpora page for a description.

For More Information: See the ISTI Deep Feature Corpus website.

Evento360: Discovering Social Events

The YFCC100M images were used for a Grand Challenge sponsored by Yahoo at ACM Multimedia 2015. Participants were asked to build systems to automatically detect particular social or cultural events in the YFCC100M dataset, analyze their structure, and summarize them. The Grand Challenge provided an example of how having a freely available web-scale multimedia dataset like YFCC100M can move research forward at the level of a whole community.

ICSI researchers participated in the Challenge, producing a retrieval system called “Evento360” that uses hierarchical clustering based on both visual and audio information, as well as clustering of metadata. Link to demo coming soon! We’re having technical difficulties.

ImageSnippets: a system for creating structured, transportable metadata for your images

ImageSnippets is a new kind of software tool that lets you compose and attach machine-readable descriptions to images, using cutting-edge Semantic Web and Linked Data technology and standards.

Once created, these descriptions have many uses: apps can do more meaningful searching for images in an image corpus; your described images can be found on the Web in powerful new ways, and seamlessly incorporated into other linked data tools; and your image and copyright information can be held secure on protected servers, safe from metadata stripping in social media. But first you have to create the descriptions, which has been a problem. ImageSnippets makes this easy.

ImageSnippets is a new kind of digital asset management system built on principles from the semantic web. We are a group of designers, cognitive scientists, programmers, and image makers interested in taking image management to a whole new level.

The lightweight image ontology (LIO) used by ImageSnippets was designed by Patrick Hayes and Margaret Warren and is open and freely usable and can be found at:

The ImageSnippets triple-store of image data can be accessed at

Other Demos and Projects

Demos and Tools Using Multimedia Commons Data:

  • The YFCC100M images were used by researchers at the University of North Carolina as the basis for a tool that creates 3D reconstructions of landmarks from diverse sets of user-generated images.
  • The ISI Foundation developed Flickr Cities, a tool for visualizing what types of images get captured in particular cities at different times and seasons.

The projects listed here form only a sample of what is available!

Research in the Multimedia Commons: