Local feature analysis of visual content, in particular with Scale Invariant Feature Transform (SIFT) descriptors, has been deployed in the ‘bag-of-visual words’ (BVW) model as an effective method to represent visual content information and to enhance its classification and retrieval. This paper makes three key contributions. First, it presents a novel approach to visual word construction that takes the physical spatial information, angle, and scale of keypoints into account in order to preserve the semantic information of objects in visual content and to enhance the traditional bag-of-visual words model. Second, it gives a method to identify and eliminate similar keypoints, so as to form semantic visual words of high quality and to strengthen the discrimination power for visual content classification. Third, it specifies an approach to discover sets of semantically similar visual words and to form visual phrases that represent visual content more distinctively and help narrow the semantic gap.