A Survey on Web Text Information Retrieval in Text Mining

Tapaswini Nayak, Srinivash Prasad, Manas Ranjan Senapati

doi:10.19026/rjaset.10.1884

Open Access

https://doi.org/10.19026/rjaset.10.1884

Abstract

In this study we analyze different techniques for information retrieval in text mining, with the aim of identifying approaches to web text information retrieval. Text mining, roughly equivalent to text analytics, is the process of deriving high-quality information from text. High-quality information is typically derived by devising patterns and trends through means such as statistical pattern learning. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, creation of coarse taxonomies, sentiment analysis, document summarization and entity relation modeling. Text mining is used to mine hidden information from unstructured or semi-structured data. This capability is necessary because a large amount of Web information is semi-structured owing to the nested structure of HTML code, is linked and is redundant. Web content categorization with a content database is the most important tool for the efficient use of search engines. A customer requesting information on a particular subject or item would otherwise have to search through hundreds of results to find the information most relevant to the query. Text mining reduces these hundreds of results, which eliminates the aggravation and improves the navigation of information on the Web.

Highlights

Text mining, also referred to as “text analytics”, is one way to make qualitative or “unstructured” data usable by a computer (Vasumathi and Moorthi, 2012).
Guernsey explains that, to the unskilled, it may seem that Google and other Web search engines do something similar, since they also pore through reams of documents in split-second intervals (Wang et al., 2011) (Fig. 1).
For frequent pattern mining to become an essential task in data mining, much research is still needed to further develop pattern-based mining methods.

Summary

Introduction

Text mining, also referred to as “text analytics”, is one way to make qualitative or “unstructured” data usable by a computer (Vasumathi and Moorthi, 2012). Qualitative data is explanatory data that cannot be measured in numbers and often includes qualities of appearance such as color, texture and textual description. Examples include customer care web chats, e-mails, mobile application or web articles, news sites, social sites, internal reports, call center logs, journal papers and blog entries, to name a few. The Oxford English Dictionary defines text mining as the process or practice of examining large collections of written resources in order to generate new information, typically using specialized computer software. It is a subset of the larger field of data mining. Guernsey explains that, to the unskilled, it may seem that Google and other Web search engines do something similar, since they also pore through reams of documents in split-second intervals (Wang et al., 2011) (Fig. 1).
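
As a rough illustration of what making unstructured text usable by a computer can mean, the sketch below turns free-form text into term counts. It is a minimal example under stated assumptions and not a method taken from the survey: the function name `term_frequencies`, the small stop-word list and the sample documents are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "is", "and", "to", "of", "was", "my", "in", "it"}


def term_frequencies(text):
    """Lowercase the text, split it into alphabetic tokens,
    drop stop words, and count the remaining terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


# Made-up sample documents standing in for web chats or e-mails.
documents = [
    "The delivery was late and the package was damaged.",
    "Great support chat, my refund was processed quickly.",
]

for doc in documents:
    # Show the three most frequent remaining terms per document.
    print(term_frequencies(doc).most_common(3))
```

Counts of this kind are one simple numeric representation of text; tasks such as categorization or clustering would typically build further processing on top of a representation like it.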

Objectives

Findings

Discussion

Conclusion