The Multimedia Commons effort includes interactive demos that show how the dataset is being used, along with analysis and retrieval tools to help researchers take advantage of this massive resource. Some of these tools are included in the resources hosted on Amazon Web Services, while other tools and demos arise from independent efforts and are hosted externally.
There are two browsers with which the dataset can be easily explored. For example, they allow you to visualize the distribution of media with particular tags across users, times, and places; view the original media with those tags on Flickr; and download the metadata for your query results. More details are available on a separate page.
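As a rough illustration of the kind of aggregation such a browser performs, the sketch below counts records carrying a given tag, grouped by user or by year. The records and field names (`user`, `date_taken`, `tags`) are hypothetical and are not the actual YFCC100M metadata schema or the browsers' own code.

```python
from collections import Counter

# Hypothetical metadata records; field names are illustrative only.
records = [
    {"user": "u1", "date_taken": "2012-06-03", "tags": ["beach", "sunset"]},
    {"user": "u2", "date_taken": "2012-07-14", "tags": ["beach"]},
    {"user": "u1", "date_taken": "2013-01-20", "tags": ["snow"]},
    {"user": "u3", "date_taken": "2013-06-09", "tags": ["beach", "dog"]},
]


def tag_distribution(records, tag, key):
    """Count the records that carry `tag`, grouped by key(record)."""
    return Counter(key(r) for r in records if tag in r["tags"])


# Distribution of "beach" media across users and across years.
by_user = tag_distribution(records, "beach", lambda r: r["user"])
by_year = tag_distribution(records, "beach", lambda r: r["date_taken"][:4])
```

The same function could group by place if the records carried a location field, which is an assumption about the data rather than something shown here.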
audioCaffe: Audio Analysis With Deep Neural Nets
The audioCaffe audio-content analysis tool and a demonstration experiment can be found in the tools/audioCaffe/ directory in our AWS S3 data store, and are also included in the Multimedia Commons CloudFormation Template. The demonstration experiment, "a MED-ium Cup of audioCaffe", uses data from the YLI Multimedia Event Detection (MED) subcorpus. It will give you a taste of what you can do with a big corpus of computed audio features like YLI and a flexible set of analysis tools.
The directory also includes a build of audioCaffe; you can check for updates at GitHub. audioCaffe is a deep neural net-based audio content analysis tool that leverages the deep-learning framework Caffe. audioCaffe is an open-source resource being developed as part of the SMASH project, which aims to provide a single software framework for a variety of content-analysis tasks (speech recognition, audio analysis, video event detection, etc.).
Citation: Khalid Ashraf, Benjamin Elizalde, Forrest Iandola, Matthew Moskewicz, Gerald Friedland, Kurt Keutzer, and Julia Bernd. 2015. Audio-Based Multimedia Event Detection with DNNs and Sparse Sampling. In Proceedings of the 5th ACM International Conference on Multimedia Retrieval (ICMR ’15). New York: ACM, 611-614.
Cheers to: This cup of audioCaffe was grown, roasted, ground, and brewed by Khalid Ashraf, Kurt Keutzer, Gerald Friedland, Benjamin Elizalde, and Jia Yangqing.
Funding: audioCaffe is funded by a National Science Foundation grant for the SMASH project (grant IIS-1251276). (Any opinions, findings, and conclusions expressed on this website are those of the individual researchers and do not necessarily reflect the views of the funding agency.)
Contacts: Questions can be directed to ashrafkhalid[chez]berkeley[stop]edu.
videosearch100M: Semantic Search
A team at Carnegie Mellon University has been building a video search system based on matching (relatively) simple semantic concepts like dancing with visual and motion features (including convolutional neural network features and dense trajectories). They added features and annotations extracted from the ~800,000 YFCC100M videos to their system, and have released those resources publicly.
- Check out the videosearch100M demo here.
- Find out how to use the videosearch100M API here.
- Get the features and concept annotations here.
Watch this space for updates on (hopefully) getting the full set of new features on AWS!
YFCC100M Visual Search Utility
Researchers at the Information Technologies Institute in Thermi, Greece, have produced a nearest-neighbor based visual search utility for the YFCC100M, using an IVFPQ index based on SURF+VLAD features.
Check out the YFCC100M Visual Search Utility here.
Available in the Multimedia Commons S3 data store on AWS:
- The index file and the learning files for the Visual Search Utility:
- The SURF+VLAD features used as the basis for the Utility: features/features/image/vgg-vlad-yfcc/vlad/full/ (see the Other Feature Corpora page for a description).
Citation: Adrian Popescu, Eleftherios Spyromitros-Xioufis, Symeon Papadopoulos, Hervé Le Borgne, and Yiannis Kompatsiaris. Towards an Automatic Evaluation of Retrieval Performance With Large Scale Image Collections. In Proceedings of the ACM Multimedia 2015 Workshop on Community-Organized Multimodal Mining: Opportunities for Novel Solutions (MMCommons ’15), Brisbane, Australia, October 2015. PDF of Preprint.
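To give a feel for what an IVFPQ index does, here is a toy sketch in pure Python. It is not the Utility's implementation: the class name `ToyIVFPQ`, its parameters, and the random 4-dimensional vectors are all assumptions made for illustration. A coarse k-means quantizer assigns each vector to an inverted list, and the residual is compressed with a product quantizer (one small codebook per sub-vector).

```python
import random


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    return min(range(len(centroids)), key=lambda i: sqdist(point, centroids[i]))


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means with a seeded random initialisation."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


class ToyIVFPQ:
    """Hypothetical toy index: inverted lists plus product-quantized residuals."""

    def __init__(self, n_lists, n_subvectors, n_codes):
        self.n_lists, self.m, self.n_codes = n_lists, n_subvectors, n_codes

    def _split(self, v):
        step = len(v) // self.m
        return [v[i * step:(i + 1) * step] for i in range(self.m)]

    def _residual(self, v, list_id):
        return [x - c for x, c in zip(v, self.coarse[list_id])]

    def train(self, points):
        # Learn the coarse centroids, then one codebook per residual sub-vector.
        self.coarse = kmeans(points, self.n_lists)
        residuals = [self._residual(p, nearest(p, self.coarse)) for p in points]
        self.codebooks = [
            kmeans([self._split(r)[j] for r in residuals], self.n_codes)
            for j in range(self.m)
        ]
        self.lists = [[] for _ in range(self.n_lists)]

    def add(self, vec_id, v):
        list_id = nearest(v, self.coarse)
        parts = self._split(self._residual(v, list_id))
        code = tuple(nearest(p, cb) for p, cb in zip(parts, self.codebooks))
        self.lists[list_id].append((vec_id, code))

    def search(self, q, k=1, nprobe=1):
        order = sorted(range(self.n_lists), key=lambda i: sqdist(q, self.coarse[i]))
        hits = []
        for list_id in order[:nprobe]:
            parts = self._split(self._residual(q, list_id))
            # Lookup table: distance from each query sub-vector to each codeword.
            table = [[sqdist(p, c) for c in cb] for p, cb in zip(parts, self.codebooks)]
            for vec_id, code in self.lists[list_id]:
                hits.append((sum(table[j][code[j]] for j in range(self.m)), vec_id))
        return [vec_id for _, vec_id in sorted(hits)[:k]]


# Two well-separated synthetic clusters of 20 vectors each (ids 0-19 and 20-39).
rng = random.Random(42)
data = [[rng.gauss(c, 0.1) for _ in range(4)] for c in (0.0, 5.0) for _ in range(20)]
index = ToyIVFPQ(n_lists=2, n_subvectors=2, n_codes=4)
index.train(data)
for i, v in enumerate(data):
    index.add(i, v)
result = index.search(data[25], k=3, nprobe=1)
```

Because the codes are lossy, the returned neighbours are approximate; with `nprobe=1` only the inverted list closest to the query is scanned, which is where the speed-up over exhaustive search comes from.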
MI-File CBIR: Similarity-Based Image Retrieval
A research group at the Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo” (ISTI) has developed a similarity-based search engine for the YFCC100M that retrieves YFCC100M images based on visual similarity to a query image and predicts the most appropriate tag. The system uses deep neural network features and the Metric Inverted File technique.
Check out the MI-File CBIR Search Engine and Tag Predictor here.
The Hybrid-CNN features that were used as the basis for the search engine are available in the Multimedia Commons S3 data store on AWS, under features/features/image/hybrid-cnn/. See the Other Feature Corpora page for a description.
For More Information: See the ISTI Deep Feature Corpus website.
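The general idea behind permutation-based inverted files of this kind can be sketched as follows. This is a simplified illustration under stated assumptions, not the ISTI system: each object is described by the ranking of its nearest reference objects, the inverted file stores (object, position) pairs per reference object, and candidates are scored with a truncated Spearman footrule distance. The function names, the 2-D points, and the missing-reference penalty are all hypothetical choices.

```python
def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ranking(vec, refs, top):
    """Ids of the `top` reference objects closest to vec, nearest first."""
    order = sorted(range(len(refs)), key=lambda i: sqdist(vec, refs[i]))
    return order[:top]


def build_index(objects, refs, top):
    """posting[r] holds (object_id, position of r in that object's ranking)."""
    posting = {r: [] for r in range(len(refs))}
    for obj_id, vec in objects.items():
        for pos, r in enumerate(ranking(vec, refs, top)):
            posting[r].append((obj_id, pos))
    return posting


def search(query, posting, refs, top, k):
    """Rank candidates by a truncated footrule distance between rankings."""
    scores = {}
    for q_pos, r in enumerate(ranking(query, refs, top)):
        for obj_id, pos in posting[r]:
            # Every candidate starts at the maximum penalty (top per reference);
            # a shared reference replaces one penalty by the position difference.
            scores[obj_id] = scores.get(obj_id, top * top) - top + abs(q_pos - pos)
    return sorted(scores, key=lambda o: (scores[o], o))[:k]


# Hypothetical 2-D "features" and reference objects.
refs = [(0, 0), (10, 0), (0, 10), (10, 10)]
objects = {"a": (2, 1), "b": (9, 2), "c": (2, 9), "d": (1, 3)}
posting = build_index(objects, refs, top=2)
result = search((1.5, 1.2), posting, refs, top=2, k=2)
```

Only objects that share at least one nearby reference object with the query are ever scored, so the posting lists act as a candidate filter before any ranking takes place.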
Evento360: Discovering Social Events
The YFCC100M images were used for a Grand Challenge sponsored by Yahoo at ACM Multimedia 2015. Participants were asked to build systems to automatically detect particular social or cultural events in the YFCC100M dataset, analyze their structure, and summarize them. The Grand Challenge provided an example of how having a freely available web-scale multimedia dataset like YFCC100M can move research forward at the level of a whole community.
ICSI researchers participated in the Challenge, producing a retrieval system called “Evento360” that uses hierarchical clustering based on both visual and audio information, as well as clustering of metadata. Link to demo coming soon! We’re having technical difficulties.
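For readers unfamiliar with hierarchical clustering, the sketch below shows a minimal agglomerative (bottom-up, single-linkage) variant on toy feature vectors. The features, the linkage criterion, and the stopping rule are assumptions chosen for brevity; the actual Evento360 features and clustering settings are not described here.

```python
def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def agglomerate(points, n_clusters):
    """Single-linkage agglomerative clustering.

    Starts with one cluster per point and repeatedly merges the two closest
    clusters until `n_clusters` remain. Returns lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(sqdist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical feature vectors, e.g. a visual and an audio descriptor per item.
features = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (9.0, 0.0)]
clusters = agglomerate(features, n_clusters=3)
```

Stopping at different values of `n_clusters` yields coarser or finer groupings of the same items, which is the property that makes the hierarchy useful for analyzing event structure.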
ImageSnippets: a system for creating structured, transportable metadata for your images
ImageSnippets is a new kind of software tool that lets you compose and attach machine-readable descriptions to images, using cutting-edge Semantic Web and Linked Data technology and standards.
Once created, these descriptions have many uses: apps can do more meaningful searching for images in an image corpus; your described images can be found on the Web in powerful new ways, and seamlessly incorporated into other linked data tools; and your image and copyright information can be held secure on protected servers, safe from metadata stripping in social media. But first the descriptions have to be created, which has been a problem. ImageSnippets makes this easy.
ImageSnippets is a new kind of digital asset management system built on principles from the semantic web. We are a group of designers, cognitive scientists, programmers, and image makers interested in taking image management to a whole new level.
The lightweight image ontology (LIO) used by ImageSnippets was designed by Patrick Hayes and Margaret Warren. It is open and freely usable, and can be found at: http://lov.okfn.org/dataset/lov/vocabs/lio
The ImageSnippets triple-store of image data can be accessed at datahub.io
Other Demos and Projects
Demos and Tools Using Multimedia Commons Data:
- The YFCC100M images were used by researchers at the University of North Carolina as the basis for a tool that creates 3D reconstructions of landmarks from diverse sets of user-generated images.
- The ISI Foundation developed Flickr Cities, a tool for visualizing what types of images get captured in particular cities at different times and seasons.
The projects listed here form only a sample of what is available!
Research in the Multimedia Commons:
- The MediaEval Placing Task is using the YLI-GEO dataset (with features, YFCC100M metadata, and original media) for its 2014, 2015, and 2016 benchmarks. Check out: 2014 Working Notes Proceedings | 2015 Working Notes Proceedings
- The first Multimedia COMMONS workshop featured research that used the YFCC100M dataset and other Multimedia Commons features, annotations, and resources. Check out the Proceedings of MMCommons’15.
- Find other papers referencing the YFCC100M.