Abstract

Communication and the Internet are two major resources in today’s technical, social and scientific disciplines, offering a wide range of possibilities for introducing new approaches and varying existing ones. Web documents are growing steadily in size and volume over time, which creates a need to access and process them offline and online over the Internet with a PC or a smartphone. Viewed in the Indian context, web documents pose different kinds of challenges, and the present study addresses some of them, taking into account the vagaries of the Indian languages. This has become very relevant in the Indian education scenario, where bilingual and multilingual communication and web documents are being generated through online courses. When regional native dialects come into the picture, another dimension of complexity is added. After presenting the different kinds of web pages in the Indian perspective, the case for developing a generic approach is highlighted, so that it can blend with current data mining tools and at the same time cater to the vagaries of Indian texts. The approach is based on pixel-level addressing of data, which is large in size; the data is later modified and reduced to numerical equivalents using matrix manipulations so that they form inputs to classification approaches such as statistical, pattern matching and neural models. Some typical case studies on text letters and words are presented to highlight the generality of the approach and its flexibility to fit into different tools.
