Abstract

Recent advances in Machine Learning (ML) have produced a new form of semantic indexing that lets users enhance searches and gain new insights into their media libraries. Unlike typical search systems that rely on extracted metadata, semantic indexing allows users to find relevant material without tagging the media with selections from a predefined taxonomy. With semantic search, users can simply enter unstructured text, and the system will find the best-matching media clips. This paper extends the same technology to gather analytics on the data, which can then be correlated to generate further insights.

This new form of media indexing can be performed with the CLIP model from OpenAI. The model encodes images and text into embeddings that can be searched for the closest semantic match, enhanced with learned cultural knowledge. This type of indexing can be made practical using a database such as Elasticsearch. The system has the benefit of finding media based on keywords, synonyms, and summaries. The same system can also be used for analytics and insights, such as clustering, shot detection, and creating a two-dimensional map to display correlations.

The paper also presents extensions to semantic search systems. Based on a study of multiple existing models, these extensions provide new capabilities: handling many media types, supporting additional languages, searching for spoken phrases in audio files, finding both verbatim and semantically similar phrases, extracting semantic information that leverages multiple video frames, and searching for ambient sounds.
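As a rough illustration of the kind of pipeline the abstract describes, the following Python sketch encodes images and text queries with CLIP and searches the resulting embeddings in Elasticsearch. It assumes the Hugging Face transformers CLIP implementation and an Elasticsearch 8.x dense_vector index; the index name, field names, and model checkpoint are illustrative assumptions, not the configuration used in the paper.

```python
# Minimal sketch of CLIP-based semantic indexing with Elasticsearch.
# Index name, field names, and model checkpoint are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor
from elasticsearch import Elasticsearch

MODEL_NAME = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(MODEL_NAME)
processor = CLIPProcessor.from_pretrained(MODEL_NAME)
es = Elasticsearch("http://localhost:9200")

# Create an index whose dense_vector field matches CLIP's 512-dim embedding.
es.indices.create(
    index="media-clips",
    mappings={
        "properties": {
            "clip_id": {"type": "keyword"},
            "embedding": {
                "type": "dense_vector",
                "dims": 512,
                "index": True,
                "similarity": "cosine",
            },
        }
    },
)

def index_frame(clip_id: str, image_path: str) -> None:
    """Encode one representative video frame and store its embedding."""
    image = Image.open(image_path)
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        emb = model.get_image_features(**inputs)[0]
    emb = emb / emb.norm()  # normalise so cosine similarity is meaningful
    es.index(
        index="media-clips",
        document={"clip_id": clip_id, "embedding": emb.tolist()},
    )

def search(query: str, k: int = 10):
    """Encode an unstructured text query and run an approximate kNN search."""
    inputs = processor(text=[query], return_tensors="pt", padding=True)
    with torch.no_grad():
        emb = model.get_text_features(**inputs)[0]
    emb = emb / emb.norm()
    return es.search(
        index="media-clips",
        knn={
            "field": "embedding",
            "query_vector": emb.tolist(),
            "k": k,
            "num_candidates": 100,
        },
    )

# Example: find clips that semantically match a free-text description.
# hits = search("a crowd celebrating at a night-time football match")
```

The same stored embeddings could also feed the analytics mentioned above, for example by clustering them or projecting them to two dimensions for visualisation.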
