Abstract

The Internet and mobile computing have become a major societal force through which everyday problems and issues are addressed. For this to be effective, information and content extraction must operate at a basic, generic level so that it can handle different sources and types of web documents, preferably without depending on any major software package. The present study is a step in this direction and focuses on extracting information from text and media-type data as it is stored in the computer in digital form. The approach operates on generic pixel maps, as stored for any data, so that differences of language, text script, and format do not pose problems. With the pixel maps as a basis, different methods convert them into a numerical form suitable for neural modeling, and content is then extracted directly, making the approach universal. Statistical features are computed from the pixel-map matrix of the image. The extracted features are presented to a neural model, and the standard backpropagation algorithm with hidden layers is used to extract content. Accuracy is compared to validate the approach and to show how content extraction within certain bounds is possible for any web page.
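To make the described pipeline concrete, the following is a minimal sketch, assuming a grayscale pixel map, a small hand-picked set of statistical features, and a single hidden layer trained with plain backpropagation. The feature set, network size, and all names here are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch (not the paper's code): statistical features from a
# pixel-map matrix, fed to a one-hidden-layer network trained with
# standard backpropagation (sigmoid units, squared-error gradient).
import numpy as np
from PIL import Image

def pixel_map_features(path):
    """Simple statistics of a document image's pixel-map matrix."""
    pm = np.asarray(Image.open(path).convert("L"), dtype=np.float64) / 255.0
    hist, _ = np.histogram(pm, bins=32, range=(0.0, 1.0))
    p = hist / hist.sum()                     # intensity distribution
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return np.array([pm.mean(), pm.std(), np.median(pm),
                     entropy, pm.min(), pm.max()])

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_backprop(X, y, hidden=8, lr=0.5, epochs=5000, seed=0):
    """Fit a one-hidden-layer MLP by gradient-descent backpropagation.

    X: (n_samples, n_features) feature rows; y: (n_samples, n_classes)
    one-hot content labels. Returns the two weight matrices.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    W2 = rng.normal(0.0, 0.5, (hidden, y.shape[1]))
    for _ in range(epochs):
        h = sigmoid(X @ W1)                   # hidden-layer activations
        out = sigmoid(h @ W2)                 # network output
        d_out = (out - y) * out * (1 - out)   # output-layer delta
        d_h = (d_out @ W2.T) * h * (1 - h)    # hidden-layer delta (backprop)
        W2 -= lr * (h.T @ d_out)              # gradient-descent updates
        W1 -= lr * (X.T @ d_h)
    return W1, W2
```

Given feature rows stacked into X and one-hot labels in y, train_backprop returns the learned weights, and a new page's class follows from the forward pass sigmoid(sigmoid(x @ W1) @ W2). The choice of six histogram-level statistics is only one plausible reading of "statistical features of the pixel-maps".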


Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.