Abstract

Information Retrieval (IR) is a powerful technique for finding information that satisfies the need expressed in a query. Plain text is comparatively easy to process, and many established algorithms can retrieve it efficiently. Retrieving geospatial information, by contrast, is considerably more complex and requires additional operations, because geospatial data carry richer details than general data, such as location and direction. To handle geographical queries, we propose a Density Probabilistic Document Correlation (DPDC) approach. The approach first categorizes the geographical features in the text that satisfy the given query. Existing text classification techniques are unsuitable for geospatial text classification because of the distinctive nature of geographical features. From the DPDC result we predict the overlap of the feature set for each document, and the documents are then ranked on the basis of this overlap and their document correlation. The documents with the highest ranking scores are considered the most relevant and are extracted. The experimental results show that the proposed method efficiently retrieves the list of relevant documents.
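The abstract does not state how DPDC computes feature overlap, document correlation, or the final score, so the following is only a minimal sketch of the general overlap-plus-correlation ranking idea, not the authors' method. It assumes (hypothetically) that geographic features are detected by lookup in a tiny hand-made gazetteer, that overlap is the fraction of the query's geographic features found in a document, that document correlation is the cosine similarity of term-count vectors, and that the score is the product of the two. `GAZETTEER`, `geo_features`, `cosine`, `rank`, and the scoring rule are illustrative choices.

```python
from collections import Counter
import math

# HYPOTHETICAL gazetteer: maps surface tokens to geographic feature categories.
# The paper's actual feature categorization is not described in the abstract.
GAZETTEER = {
    "river": "hydrography", "lake": "hydrography",
    "north": "direction", "south": "direction",
    "chennai": "place", "delhi": "place",
}

def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    words = (t.strip(".,;:!?").lower() for t in text.split())
    return [w for w in words if w]

def geo_features(tokens):
    """Categorize tokens: return the (token, category) pairs found in the gazetteer."""
    return {(t, GAZETTEER[t]) for t in tokens if t in GAZETTEER}

def cosine(a, b):
    """Cosine similarity of two term-count vectors (stand-in for 'document correlation')."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank(query, docs):
    """Score = overlap * correlation (an ASSUMED rule); zero-score documents are dropped."""
    q_tokens = tokenize(query)
    q_geo = geo_features(q_tokens)
    q_vec = Counter(q_tokens)
    scored = []
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        # Fraction of the query's geographic features that also occur in this document.
        overlap = len(q_geo & geo_features(tokens)) / len(q_geo) if q_geo else 0.0
        score = overlap * cosine(q_vec, Counter(tokens))
        if score > 0:
            scored.append((doc_id, score))
    # Highest score first: the top of the list holds the most relevant documents.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, with `docs = {"d1": "Flood warning for the river north of Chennai.", "d2": "Chennai weather is hot.", "d3": "Stock prices rose today."}`, calling `rank("river north of Chennai", docs)` places `d1` first (score about 0.707), then `d2` (about 0.083), and omits `d3`, which shares no geographic feature with the query. Multiplying by the overlap is what lets such a document be eliminated outright rather than merely ranked low.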

Highlights

  • Over the past several years, geographical data have proven useful for large spatial data sets

  • The most relevant documents for a user query appear at the top of the list, while irrelevant documents are eliminated by the Density Probabilistic Document Correlation (DPDC) approach and are not retrieved

  • Because of the complex nature of spatial data types and the correlation relationships that exist among spatial data, information retrieval of spatial data is laborious

Summary

INTRODUCTION

Spatial data mining is the process of discovering interesting, previously unknown, but potentially useful patterns in spatial data. The correlations and relationships that exist among spatial data are frequently handled by data mining algorithms. Relevant documents are retrieved on the basis of the chosen features. Users express their interest as queries to a component that performs the search operation. Stop words affect the retrieval process: they appear frequently in documents but carry little meaning, and they distort the weighting process carried out in our Density Probabilistic Document Correlation (DPDC) approach. The preprocessed documents (data sets) and the user query are passed to the DPDC component. To retrieve relevant documents, we use a ranking algorithm that determines the occurrence of trained features in each document. The most relevant documents for a user query appear at the top of the list, while irrelevant documents are eliminated by the DPDC approach and are not retrieved.
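The paper does not give its stop-word list, tokenizer, or weighting formula. As a hypothetical illustration of why stop words distort term weighting, the sketch below uses relative term frequency as a stand-in weight and a small made-up stop-word list. `STOP_WORDS`, `preprocess`, `term_weights`, `feature_occurrences`, and the set of "trained features" passed to the last function are illustrative assumptions, not part of DPDC.

```python
from collections import Counter

# HYPOTHETICAL stop-word list; practical systems use a much larger curated list.
STOP_WORDS = {"the", "of", "a", "an", "in", "is", "and", "to", "for", "on"}

def preprocess(text, remove_stop_words=True):
    """Lowercase, strip surrounding punctuation, and optionally drop stop words."""
    words = (t.strip(".,;:!?").lower() for t in text.split())
    return [w for w in words if w and not (remove_stop_words and w in STOP_WORDS)]

def term_weights(tokens):
    """Relative term frequency (an ASSUMED stand-in weight): count / total tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()} if total else {}

def feature_occurrences(tokens, trained_features):
    """Count how many tokens belong to a (hypothetical) set of trained features."""
    return sum(1 for t in tokens if t in trained_features)
```

On the sentence "The lake is to the north of the city and the river is to the south.", keeping stop words gives "the" a weight of 5/16, five times that of each content term such as "lake" (1/16). After `preprocess` removes stop words, only the five content terms remain, each weighted 0.2, and `feature_occurrences` with the features `{"lake", "river", "north"}` counts 3 occurrences.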

RELATED WORK
PROPOSED METHODOLOGY
Feature Selection
Document Preprocessing
DPDC Approach
Estimate Probability of Feature Occurrence
Predict Feature Overlap
Estimate Document Weight
Determine Document Score
Rank and Retrieve the Documents
EXPERIMENTAL EVALUATION
Findings
CONCLUSION
