Web videos are rich resources for satisfying people's information and entertainment needs. Previous studies have categorized web videos by applying clustering methods to the textual information (tags) supplied by uploaders, which helps users find the videos they want and thus increases user satisfaction. However, web video categorization remains a challenging task because it is difficult to accurately measure the semantic relation between terms in videos. In this paper, a novel framework for social web video clustering is proposed that improves the similarity calculation for web videos and uses a clustering ensemble. It consists of the following steps: 1) A new semantic Vector Space Model (Semantic VSM) is defined by considering the semantic relations of terms obtained from the lexical reference system WordNet. 2) Word2vec is used to capture continuous vectors as semantic information, in the form of the set of term vectors in a document. 3) A comprehensive extension of the Semantic VSM that utilizes the Normalized Google Distance is presented. 4) A linear combining function merges the similarities, using the optimal values of the parameter that controls the weights of the models, before they are applied to clustering paradigms, and a Clustering Ensemble with Must-Link constraints integrates the results of the individual clusterings. Experimental evaluations on real-world social web video datasets demonstrate that the proposed method effectively facilitates clustering and achieves promising performance.
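Steps 3 and 4 can be illustrated with a minimal sketch. The Normalized Google Distance below follows its standard definition from search hit counts. The combining function is only an assumption of what a weighted linear combination of similarities might look like, since the abstract does not give its exact form. The function names, the hit counts, and the weights are hypothetical.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Standard NGD from hit counts.

    fx, fy: number of pages containing each term
    fxy:    number of pages containing both terms
    n:      total number of indexed pages (assumed known)
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # The terms never co-occur, so treat them as maximally distant.
        return float("inf")
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def combine_similarities(sims, weights):
    """Weighted linear combination of per-model similarity scores.

    Assumption: the weights are non-negative and sum to 1. The paper's
    actual parameterization may differ.
    """
    if len(sims) != len(weights):
        raise ValueError("need one weight per similarity score")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * s for w, s in zip(weights, sims))


# Hypothetical hit counts for two tags and for their co-occurrence.
ngd = normalized_google_distance(fx=1000, fy=100, fxy=10, n=100000)

# Hypothetical similarities from three models, with made-up weights.
combined = combine_similarities([0.2, 0.4, 0.6], [0.5, 0.3, 0.2])
```

The combined score would then populate the pairwise similarity matrix passed to each clustering algorithm. In practice the weights would be tuned on held-out data rather than fixed by hand.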