Abstract

Existing automatic temporal video scene segmentation methods still lack the efficacy required by most practical multimedia systems. Those showing the best results are multimodal and based on late fusion. Early fusion, on the other hand, has not been sufficiently investigated for this task because of the well-known barriers of the approach: correlation identification, temporal synchronization, and unique representation. This work presents a feature fusion method that addresses these difficulties and produces features that can improve the efficacy of existing temporal video scene segmentation methods. The fusion is performed on single-modal Bag-of-Features vectors and is intended to enrich previously captured latent semantics by temporally clustering features, providing a unified representation of multiple temporally related features. The feature fusion process has been coupled with two off-the-shelf scene segmentation algorithms, yielding competitive results when compared with two state-of-the-art multimodal temporal scene segmentation methods. The results indicate that the proposed early fusion feature representation is a promising alternative for boosting video retrieval tasks.
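The early fusion idea above can be illustrated with a minimal sketch: per-shot Bag-of-Features histograms from two modalities are concatenated into one vector, and a temporal smoothing step then produces a unified representation for temporally related shots. The window size, the mean pooling, and all function names here are illustrative assumptions, not the paper's exact clustering procedure.

```python
import numpy as np

def early_fuse(visual_bof, audio_bof):
    """Concatenate per-shot BoF histograms from two modalities (early fusion)."""
    return np.hstack([visual_bof, audio_bof])

def temporal_pool(fused, window=3):
    """Average each shot's fused vector with its temporal neighbours,
    a stand-in for the temporal clustering of features described above."""
    n = len(fused)
    pooled = np.empty_like(fused, dtype=float)
    for i in range(n):
        lo, hi = max(0, i - window // 2), min(n, i + window // 2 + 1)
        pooled[i] = fused[lo:hi].mean(axis=0)
    return pooled

# Toy example: 5 shots, 4-bin visual and 3-bin audio histograms.
rng = np.random.default_rng(0)
visual = rng.random((5, 4))
audio = rng.random((5, 3))
features = temporal_pool(early_fuse(visual, audio))
print(features.shape)  # (5, 7)
```

The pooled vectors could then be fed to any shot-based scene segmentation algorithm in place of the original single-modal features.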
