Abstract

Scene segmentation of a video, a book or a TV series makes it possible to organize it into Logical Story Units and is an essential step for representing, extracting and understanding its narrative structure. We propose an automatic scene segmentation method for TV series based on the grouping of adjacent shots, relying on a combination of multimodal neural features: visual features and textual features, further augmented with temporal information, which may improve the clustering of adjacent shots. The reported experiments compare early and late fusion of the features, video frame subsampling and various shot clustering algorithms. The proposed method achieved good recall, precision and F-measure when tested on several seasons of two popular TV series.
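As a rough illustration of the kind of pipeline the abstract describes (not the authors' implementation), the sketch below performs early fusion of per-shot visual and textual embeddings, augments them with a temporal position feature, and groups shots into scenes with an adjacency-constrained agglomerative clustering; the feature dimensions, the `temporal_weight` parameter and the number of scenes are assumptions introduced here for illustration.

```python
# Minimal sketch: early fusion of multimodal shot features plus temporal
# augmentation, followed by adjacency-constrained agglomerative clustering.
# Not the paper's method; names and parameters are illustrative assumptions.
import numpy as np
from scipy.sparse import diags
from sklearn.preprocessing import normalize
from sklearn.cluster import AgglomerativeClustering


def segment_shots(visual_feats, textual_feats, n_scenes, temporal_weight=0.1):
    """Group adjacent shots into scenes.

    visual_feats:  (n_shots, d_v) array of per-shot visual embeddings.
    textual_feats: (n_shots, d_t) array of per-shot textual embeddings.
    """
    n_shots = visual_feats.shape[0]

    # Early fusion: concatenate L2-normalised visual and textual features.
    fused = np.hstack([normalize(visual_feats), normalize(textual_feats)])

    # Augment with the (scaled) temporal position of each shot.
    position = np.arange(n_shots, dtype=float).reshape(-1, 1) / max(n_shots - 1, 1)
    fused = np.hstack([fused, temporal_weight * position])

    # Connectivity matrix allowing merges between temporally adjacent shots
    # only, so every resulting cluster is a contiguous run of shots (a scene).
    connectivity = diags([1, 1], offsets=[-1, 1], shape=(n_shots, n_shots))

    labels = AgglomerativeClustering(
        n_clusters=n_scenes, connectivity=connectivity, linkage="ward"
    ).fit_predict(fused)
    return labels
```

A late-fusion variant would instead cluster (or score shot-boundary similarities) separately on the visual and textual features and combine the resulting decisions, rather than concatenating the feature vectors up front.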
