Abstract

Recent advances in machine learning (ML) have produced a new form of semantic indexing that lets users enhance searches and gain new insights into their media libraries. Unlike typical search systems that rely on extracted metadata, semantic indexing allows users to find relevant material without tagging the media with selections from a predefined taxonomy. With semantic search, users can simply enter unstructured text, and the system finds the best-matching media clips. This article extends the use of this technology to gather analytics on the data, which can then be correlated to generate further insights. This form of media indexing can be performed with the Contrastive Language-Image Pretraining (CLIP) model from OpenAI for images, and with similar models for video and audio. These models encode media into embeddings that can be searched for the closest semantic match, enriched by the cultural knowledge learned during pretraining. Such systems can find media based on keywords, synonyms, and summaries. They can also be used for analytics and insights, such as segmentation, shot detection, and creating a 2D map to display correlations. The article ends with a discussion of next steps, including the use of knowledge graphs for semantic search.
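
As a rough illustration of the embedding-based search described above, the sketch below indexes a few images with an openly released CLIP checkpoint and ranks them against a free-text query by cosine similarity. It is a minimal sketch under stated assumptions: the model name, file paths, and query string are illustrative and are not taken from the article.

    # Minimal sketch of CLIP-style semantic search over an image library.
    # Assumes the Hugging Face Transformers library and the public
    # "openai/clip-vit-base-patch32" checkpoint; media paths and the query
    # are hypothetical examples.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    # Index step: encode each image into a normalized embedding.
    image_paths = ["frame_001.jpg", "frame_002.jpg"]  # hypothetical media frames
    images = [Image.open(p) for p in image_paths]
    image_inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        image_embeds = model.get_image_features(**image_inputs)
    image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)

    # Query step: encode unstructured text and rank images by cosine similarity.
    query = "a crowd celebrating at a night-time concert"
    text_inputs = processor(text=[query], return_tensors="pt", padding=True)
    with torch.no_grad():
        text_embed = model.get_text_features(**text_inputs)
    text_embed = text_embed / text_embed.norm(dim=-1, keepdim=True)

    scores = (image_embeds @ text_embed.T).squeeze(-1)  # cosine similarities
    best = scores.argmax().item()
    print(f"Best match: {image_paths[best]} (score {scores[best]:.3f})")

The same pattern extends to the analytics uses the abstract mentions: once every clip or frame is an embedding, the vectors can be clustered for segmentation, compared frame-to-frame for shot detection, or projected to 2D to display correlations.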
