Mining Ambiguities Using Pixel-Based Content Extraction

B S Charulatha, Arun Rajaraman, Paul Rodrigues, T Chitralekha

doi:10.1007/978-81-322-2674-1_50

Abstract

Internet and mobile computing have become a major societal force, addressing everyday needs that range from online shopping to obtaining driving directions in unfamiliar places. A major concern in such communication is that Web content should reach the user quickly, so information extraction needs to work at a basic level and be easy to implement without depending on any major software. The present study focuses on extracting information from text and media-type data after conversion into digital form. The approach takes the basic pixel-map representation of the data and converts it to numerical form, so that language, text script, and format do not pose problems. From the numerically converted data, key clusters, analogous to the keywords used in search methods, are developed, and content is extracted through different approaches that accept additional computation in exchange for ease of implementation. In one approach, statistical features are extracted from the pixel map of each image and presented to a fuzzy clustering algorithm. Euclidean distance serves as the similarity metric, and the resulting accuracy is compared and presented. The paper also introduces the concept of ambiguity by comparing an object such as ‘computer’, whose content can be represented explicitly, with an abstract subject such as ‘soft-computing’, whose representation may be vague and ambiguous. With this as the objective, the content-extraction approaches are compared, and the study examines how, within certain bounds, the content can be extracted.
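As an illustration of the pipeline outlined in the abstract, the following is a minimal sketch of statistical feature extraction from a pixel map followed by fuzzy c-means clustering with Euclidean distance. It is not the authors' implementation: the choice of features (mean, standard deviation, skewness), the fuzzifier m = 2, the stopping tolerance, and the function names `pixel_features` and `fuzzy_c_means` are all assumptions made for this example, since the abstract does not specify them.

```python
import math
import random


def pixel_features(image):
    """Mean, standard deviation, and skewness of a 2-D pixel map (list of rows).

    The feature set is an assumption; the abstract only says "statistical features".
    """
    pixels = [float(p) for row in image for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    # Skewness is defined as 0 for a constant image to avoid dividing by zero.
    skew = 0.0 if std == 0 else sum(((p - mean) / std) ** 3 for p in pixels) / n
    return [mean, std, skew]


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuzzy_c_means(data, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (centers, membership matrix)."""
    rng = random.Random(seed)
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in data:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    dim = len(data[0])
    centers = []
    for _ in range(n_iter):
        # Centres are membership-weighted means of the feature vectors.
        centers = []
        for j in range(n_clusters):
            weights = [u[i][j] ** m for i in range(len(data))]
            total = sum(weights)
            centers.append(
                [sum(w * x[d] for w, x in zip(weights, data)) / total
                 for d in range(dim)]
            )
        # Memberships are updated from inverse Euclidean distances to the centres.
        new_u = []
        for x in data:
            inv = [max(euclidean(x, c), 1e-12) ** (-2.0 / (m - 1.0))
                   for c in centers]
            total = sum(inv)
            new_u.append([v / total for v in inv])
        change = max(abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb))
        u = new_u
        if change < tol:
            break
    return centers, u
```

Each image is reduced to one feature vector with `pixel_features`, and the list of vectors is passed to `fuzzy_c_means`. Each row of the returned membership matrix gives the degree to which that image belongs to each cluster, which is one way to express the graded, ambiguous assignments the abstract refers to.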

Full Text