Abstract

The Internet is arguably the most successful distributed computing system ever built. However, our capabilities for querying and manipulating data on the Internet remain rudimentary at best. User expectations have grown over the past few decades along with the volume of operational data, and users now expect deeper, more exact, and more detailed results. The quality of the results retrieved for a query always depends on how the data is stored and indexed. In information retrieval (IR) systems, tokenization is an integral step whose primary objective is to identify tokens and their counts. In this paper, we propose an effective tokenization approach based on a training vector, and our results demonstrate the efficiency and effectiveness of the proposed algorithm. Tokenizing documents helps satisfy the user's information need more precisely and sharply reduces the search space, and is therefore considered a part of information retrieval. Preprocessing of the input document is an integral part of tokenization: the document is preprocessed and its tokens are generated, and on the basis of these tokens a probabilistic IR model computes its scores over a reduced search space. The comparative analysis is based on two parameters: the number of tokens generated and the preprocessing time.
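To make the preprocessing step concrete, the sketch below shows a minimal tokenization and token-counting routine in Python. It is not the paper's training-vector-based algorithm; it only illustrates the baseline operation the abstract describes, splitting a document into tokens and recording each token's frequency, which a probabilistic IR model could then use for scoring. All function names here are illustrative assumptions.

```python
# Minimal tokenization and token-counting sketch (illustrative only; not the
# proposed training-vector-based approach from the paper).

import re
from collections import Counter

def tokenize(document: str) -> list[str]:
    """Lowercase the document and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", document.lower())

def preprocess(document: str) -> Counter:
    """Return a term-frequency map (token -> count) for one document."""
    return Counter(tokenize(document))

if __name__ == "__main__":
    doc = "Tokenization splits a document into tokens; tokens drive IR scoring."
    counts = preprocess(doc)
    print(len(counts), "distinct tokens")   # number of tokens generated
    print(counts.most_common(3))            # sample token frequencies
```

The two quantities printed here correspond loosely to the evaluation parameters named in the abstract (number of tokens generated, and the preprocessing work needed to produce them).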
