Abstract

A video saliency detection model based on deep learning is proposed, which improves an existing fully convolutional network (FCN)‐based model by introducing a convolutional long short‐term memory (ConvLSTM) module. The ConvLSTM splits the input into two flows of two layers each. The two flows use different dilation rates, giving them different receptive fields, which enables the proposed model to depict object contours more accurately. Unlike FCN modules, which process unordered frames, the ConvLSTM module receives frames in sequence, so the proposed model can learn both the spatial and temporal information of video data. To compensate for the scarcity of manually labeled annotations, data augmentation is used during training to expand the dataset: performing mirror transformations, adding Gaussian noise, and discarding every other frame to simulate fast motion. The proposed FCNs‐ConvLSTM model is trained and evaluated on a widely used dataset. At a threshold of 125, it improves the recall rate from 0.52 to 0.64 while maintaining a similar precision rate (0.72), and it raises the maximum F‐measure from 0.66 to 0.70, indicating that the proposed model has a better capacity for detecting moving objects.
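The three augmentation steps named in the abstract can be sketched as array operations on a video clip. This is a minimal illustration, not the paper's implementation; the function name `augment_clip` and the noise standard deviation are assumptions for the example.

```python
import numpy as np

def augment_clip(frames, noise_std=5.0, seed=0):
    """Sketch of the augmentations described in the abstract (hypothetical
    helper): mirror transformation, Gaussian noise, and dropping every
    other frame to simulate fast movement."""
    rng = np.random.default_rng(seed)
    clip = np.asarray(frames, dtype=np.float32)   # shape (T, H, W)
    mirrored = clip[:, :, ::-1]                    # horizontal mirror of each frame
    noisy = np.clip(clip + rng.normal(0.0, noise_std, clip.shape), 0.0, 255.0)
    fast = clip[::2]                               # keep every other frame
    return mirrored, noisy, fast
```

Each variant would be added to the training set alongside the original clip, multiplying the number of labeled sequences available.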
