Abstract

The project "Enhanced semantic preserved concept based mining model for enhancing document clustering" proposes an enhanced data mining model for efficient information retrieval. Concept-based mining is a challenging and very active field that is of great importance in text categorization applications. A lot of research has been done in this field, but there is still a need to categorize a collection of text documents into mutually exclusive categories by extracting concepts or features using the supervised learning paradigm and different classification algorithms. This project aims to develop a concept-based mining model that preserves the meaning of a sentence using a semantic net and a synonym dictionary. The new concept definition can be expressed in the form of a triplet. This triplet is the basic unit for the processing and preprocessing tasks. To increase performance, SVD (Singular Value Decomposition) is used.

I. INTRODUCTION

Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by the retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time-consuming to resolve.
They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.

The semantic preserved concept-based mining model is used to avoid the problems of polysemy and synonymy in text mining applications. Finding accurate and relevant knowledge in text documents, so that users can find what they actually want, is a challenging issue. The main advantage of term-based mining is that it has higher computational performance than concept-based mining. A lot of research has been done in this field, but there is still a need to categorize a collection of text documents into mutually exclusive categories by extracting concepts or features using the supervised learning paradigm and different classification algorithms.

II. EXISTING SYSTEM

The existing system is based on keywords and their frequency. When a query is submitted, the system counts the frequency of its words and searches based on that frequency. Its main disadvantages are that it is manual, costly, and time-consuming; it lacks semantic consideration; its term matching is inefficient; it does not consider synonyms or term dependencies; it does not provide partial matching; and its retrieval performance is poor.

III. PROPOSED SYSTEM

In the proposed system, the concepts of a passage are considered, and documents are clustered on the basis of their semantics. The proposed model can efficiently find significant matching concepts between documents according to the semantics of their sentences. The similarity between documents is calculated using a new concept-based similarity measure. A raw text document is the input to the proposed model. Each document has well-defined sentence boundaries. Each sentence in the document is labeled automatically based on the PropBank notations. A sentence that has many labeled verb-argument structures includes many verbs associated with their arguments.
The labeled verb-argument structures, the output of the role labeling task, are captured and analyzed by the concept-based model at the sentence and document levels. In this model, both the verb and the argument are considered as terms. One term can be an argument to more than one verb in the same sentence. This means that the term can have more than one semantic role in the same sentence.

The main advantages are that the model can be used to create a platform capable of identifying and classifying medical-care-related information from patients; it considers the semantic meaning of the entered text; it matches terms efficiently; it considers synonyms and term dependencies; it provides partial matching; and it is accurate.
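
The role of verb-argument structures and the synonym dictionary can be illustrated with a minimal sketch. It assumes the role labeler's output is already available, here written by hand as (verb, arguments) pairs with PropBank-style labels. The `SYNONYMS` table, the function names, and the use of cosine similarity over concept counts are illustrative assumptions; they stand in for, and are not, the concept-based similarity measure of the proposed model, which is not detailed in this text.

```python
import math
from collections import Counter, defaultdict

# Hypothetical synonym dictionary: surface term -> canonical concept.
SYNONYMS = {"physician": "doctor", "medication": "medicine"}


def canonical(term):
    """Lowercase a term and map it to its concept when a synonym is known."""
    term = term.lower()
    return SYNONYMS.get(term, term)


def roles_per_term(structures):
    """Collect every (verb, role) pair in which each argument term appears."""
    roles = defaultdict(set)
    for verb, args in structures:
        for role, term in args.items():
            roles[canonical(term)].add((canonical(verb), role))
    return roles


def concept_counts(structures):
    """Count verbs and arguments alike, since both are treated as terms."""
    counts = Counter()
    for verb, args in structures:
        counts[canonical(verb)] += 1
        for term in args.values():
            counts[canonical(term)] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two concept-count vectors (an assumption)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hand-labeled stand-in for role-labeler output.
# "The doctor examined the patient and prescribed medicine."
doc_a = [
    ("examined", {"ARG0": "doctor", "ARG1": "patient"}),
    ("prescribed", {"ARG0": "doctor", "ARG1": "medicine"}),
]
# "The physician prescribed medication."
doc_b = [("prescribed", {"ARG0": "physician", "ARG1": "medication"})]

doctor_roles = roles_per_term(doc_a)["doctor"]
similarity = cosine(concept_counts(doc_a), concept_counts(doc_b))

print(sorted(doctor_roles))
print(round(similarity, 3))
```

In this sketch "doctor" is an argument of two verbs in the same sentence, so it carries two (verb, role) entries. The second document shares no argument word with the first, yet the two still match once "physician" and "medication" are mapped to the same concepts as "doctor" and "medicine"; a purely keyword-based comparison would only match on "prescribed".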
