Abstract
With recent trends in data generation, it is evident that ever-increasing volumes of data will continue to be produced, and our ability to provide services to customers will likely be limited by the kind of analysis and knowledge we can extract from that data. Images constitute a fair share of the media used for communication, alongside text, video, audio, and their meaningful combinations. Summarization of videos and events has been of recent interest to the computer vision and multimedia research communities, and advances in optimization, especially deep learning, have yielded significant improvements in video summarization. Image collection summarization, however, remains an elusive task because of its inherent challenges and its differences from video summarization. Video frames carry strong temporal links that can be exploited by temporal neural networks such as Long Short-Term Memory (LSTM) networks or Recurrent Neural Networks (RNNs), which makes such models effective building blocks for deep learning based event and video summarization architectures. Similarly, for text, long passages and sentences exhibit a high-level sequential structure that can be exploited for summarization. In a collection of images, by contrast, there is no temporal sequence between two images for a network to exploit [14, 24], which has made the problem esoteric in nature. To remedy this, this article surveys the challenges in image collection summarization, the need for gold standards in the definition of summarization, datasets and quantitative evaluation metrics based on those datasets, and the major papers that have aimed to solve the problem in the past.