The growing availability of video content on the Internet and on television has driven the development of automatic search procedures for retrieving a desired video. Automatic classification of videos can simplify and speed up such searches. This paper proposes a descriptor called the Spatio-Temporal Histogram of Radon Projections (STHRP) for representing the temporal pattern of the contents of a video and demonstrates its application to video classification and retrieval. The first step in computing the STHRP pattern is to represent a video as Three Orthogonal Planes (TOPs), i.e., XY, XT and YT, which capture its spatial and temporal content. Frames corresponding to each plane are partitioned into overlapping blocks. Radon projections are obtained over these blocks at different orientations, resulting in weighted transform coefficients that are normalized and grouped into bins. Linear Discriminant Analysis (LDA) is then performed over these coefficients of the TOPs to arrive at a compact description of the STHRP pattern. Compared to existing classification and retrieval approaches, the proposed descriptor is highly robust to translation, rotation and illumination variations in videos. To evaluate the capabilities of the invariant STHRP pattern, we conduct classification and retrieval experiments on the UCF-101, HMDB51, 10contexts and TRECVID data sets using a bagged tree model. The classification experiments show that the STHRP pattern achieves classification rates of 96.15%, 71.7%, 93.24% and 97.3% on the UCF-101, HMDB51, 10contexts and TRECVID 2005 data sets, respectively. Retrieval experiments on the TRECVID 2005, JHMDB and 10contexts data sets show that the STHRP pattern returns videos relevant to the user's query in minimal time (0.05 s) with a good precision rate.
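The descriptor stages named in the abstract (orthogonal planes, overlapping blocks, Radon projections at several orientations, normalization and binning) can be illustrated with a rough sketch. This is not the authors' implementation. It assumes a grayscale video stored as nested lists `video[t][y][x]`, takes only the central slice of each plane, restricts the projections to four orientations (column, row, and the two diagonal directions) that can be summed exactly on a pixel grid, and uses min-max normalization as a stand-in for the paper's normalization. The coefficient weighting, the LDA step, and the bagged tree classifier are omitted, and all function names and parameter values are hypothetical.

```python
from typing import Iterator, List, Tuple

Plane = List[List[float]]   # one 2-D slice
Video = List[Plane]         # assumed layout: video[t][y][x]


def orthogonal_planes(video: Video) -> Tuple[Plane, Plane, Plane]:
    """Return the central XY, XT and YT slices (an assumption: one slice per plane)."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    xy = video[T // 2]                                         # rows: y, cols: x
    xt = [video[t][H // 2] for t in range(T)]                  # rows: t, cols: x
    yt = [[video[t][y][W // 2] for y in range(H)] for t in range(T)]  # rows: t, cols: y
    return xy, xt, yt


def overlapping_blocks(plane: Plane, size: int, stride: int) -> Iterator[Plane]:
    """Yield square blocks; blocks overlap whenever stride < size."""
    H, W = len(plane), len(plane[0])
    for r in range(0, H - size + 1, stride):
        for c in range(0, W - size + 1, stride):
            yield [row[c:c + size] for row in plane[r:r + size]]


def radon_projections(block: Plane) -> List[float]:
    """Discrete line sums of a square block along four orientations."""
    n = len(block)
    col_sums = [sum(block[r][c] for r in range(n)) for c in range(n)]
    row_sums = [sum(row) for row in block]
    diag = [0.0] * (2 * n - 1)        # lines with constant r + c
    anti = [0.0] * (2 * n - 1)        # lines with constant r - c
    for r in range(n):
        for c in range(n):
            diag[r + c] += block[r][c]
            anti[r - c + n - 1] += block[r][c]
    return col_sums + diag + row_sums + anti


def block_histogram(block: Plane, bins: int) -> List[float]:
    """Min-max normalize the projection coefficients and bin them."""
    coeffs = radon_projections(block)
    lo, hi = min(coeffs), max(coeffs)
    span = hi - lo
    hist = [0] * bins
    for v in coeffs:
        idx = 0 if span == 0 else min(int((v - lo) / span * bins), bins - 1)
        hist[idx] += 1
    return [h / len(coeffs) for h in hist]


def sthrp_like_descriptor(video: Video, size: int = 4, stride: int = 2,
                          bins: int = 8) -> List[float]:
    """Concatenate the block-averaged histograms of the three planes."""
    descriptor: List[float] = []
    for plane in orthogonal_planes(video):
        acc = [0.0] * bins
        count = 0
        for block in overlapping_blocks(plane, size, stride):
            acc = [a + b for a, b in zip(acc, block_histogram(block, bins))]
            count += 1
        descriptor.extend(a / max(count, 1) for a in acc)
    return descriptor
```

For a video with at least `size` pixels along every axis, `sthrp_like_descriptor(video)` returns a vector of `3 * bins` values, one averaged histogram per plane. In a fuller pipeline, vectors like this would be passed to a dimensionality-reduction step and then to a classifier, as the abstract describes.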