Document Categorization Using Decision Tree: Preliminary Study

Wan M.U Noormanshah, Puteri N.E Nohuddin, Zuraini Zainol

doi:10.14419/ijet.v7i4.34.26907


Open Access

https://doi.org/10.14419/ijet.v7i4.34.26907


Abstract

This preliminary study proposes a classification technique that classifies documents using text mining, together with an algorithm that automatically assigns each document to its folder based on the document's content, while also performing sentiment analysis on the data sets and summarizing them. The objective of this paper is to identify an efficient text mining classification technique that achieves the highest accuracy in classifying documents into document folders and that can extract valuable information from context-based terms for use by the automatic classification algorithm, and to evaluate that technique. The methodology of this study comprises five modules: 1) Document Collection, 2) Pre-Processing, 3) Term Frequency-Inverse Document Frequency, 4) Classification Technique and Algorithm, and 5) Evaluation and Visualization of the classification result. The proposed framework uses Term Frequency-Inverse Document Frequency (TF-IDF) and the Decision Tree technique: TF-IDF ranks all terms by their TF-IDF weight, from highest to lowest, while the decision tree decides which folder a document belongs to.
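The five modules can be sketched end to end in Python. This is a minimal, hypothetical sketch and not the authors' implementation (the paper's outline lists RStudio as its tool): the stop-word list, the smoothed IDF formula, Gini-based splitting, `max_depth`, and the toy corpus with its `finance`/`sports` folders are all assumptions. Document collection is stood in for by an in-memory list, evaluation is reduced to accuracy, and visualization and sentiment analysis are omitted. The tree is grown with the standard library only so the example stays self-contained.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}

def preprocess(text):
    """Pre-processing (assumed steps): lowercase, tokenize, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def fit_idf(token_lists):
    """Smoothed IDF (an assumed variant), learned from training documents only."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}

def tfidf(tokens, idf):
    """TF-IDF weights for one document; terms unseen in training are ignored."""
    counts, total = Counter(tokens), max(len(tokens), 1)
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}

def gini(labels):
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Find the (term, threshold) split with the lowest weighted Gini impurity."""
    best = None
    for f in features:
        values = sorted({r.get(f, 0.0) for r in rows})
        for thr in values[:-1]:  # excluding the max keeps both sides non-empty
            left = [i for i, r in enumerate(rows) if r.get(f, 0.0) <= thr]
            right = [i for i, r in enumerate(rows) if r.get(f, 0.0) > thr]
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, thr, left, right)
    return best

def build_tree(rows, labels, features, depth=0, max_depth=5):
    """Grow a small CART-style tree; each leaf holds a folder name."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels, features)
    if split is None:  # every term has the same weight in all rows
        return majority

    def grow(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          features, depth + 1, max_depth)

    _, f, thr, left, right = split
    return (f, thr, grow(left), grow(right))

def predict(tree, row):
    """Walk to a leaf: go left if the term weight <= threshold, else right."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if row.get(f, 0.0) <= thr else right
    return tree

# Hypothetical toy corpus: (document text, folder).
train = [
    ("Invoice for the annual budget and payment schedule", "finance"),
    ("Quarterly budget report and tax payment", "finance"),
    ("Football match report and league table", "sports"),
    ("The team won the league match on Sunday", "sports"),
]
test = [("The budget payment is due", "finance"),
        ("Highlights of the football match", "sports")]

train_tokens = [preprocess(text) for text, _ in train]
idf = fit_idf(train_tokens)
rows = [tfidf(tokens, idf) for tokens in train_tokens]
tree = build_tree(rows, [folder for _, folder in train], sorted(idf))

predictions = [predict(tree, tfidf(preprocess(text), idf)) for text, _ in test]
accuracy = sum(p == folder for p, (_, folder) in zip(predictions, test)) / len(test)
print(tree)                   # ('budget', 0.0, 'sports', 'finance')
print(predictions, accuracy)  # ['finance', 'sports'] 1.0
```

On this tiny corpus a single split on the term `budget` separates the two folders, which also shows a weakness of such small trees: a finance document that never mentions `budget` would be routed to `sports`. A real evaluation would need a larger labelled collection and a held-out test set.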

Highlights

Text Mining (TM) is an analytics process designed to analyze a collection of unstructured textual material in order to derive high-quality information and essential knowledge contained in raw text; TM is specifically intended to handle unstructured information.
By combining TM with a Term Document Matrix (TDM) [2], every term that appears in each document can be indexed and counted in tabular form, with columns representing the terms and rows representing the document identifiers, or vice versa.
In this research, we suggest term frequency (TF)-inverse document frequency (IDF) as an attribute reduction technique to be combined with a decision tree; TF-IDF is a purely statistical method for assessing the significance of a word based on its frequency of occurrence in a document and in the related corpus.
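The term document matrix and TF-IDF weighting described in these highlights can be illustrated with a short Python sketch. It assumes already pre-processed toy documents with hypothetical IDs `doc1`, `doc2` and `doc3`, and it uses the common textbook form tf(t, d) = count of t in d / length of d and idf(t) = log(N / df(t)), where N is the number of documents and df(t) the number of documents containing t. The paper may use a different TF-IDF variant.

```python
import math
from collections import Counter

# Hypothetical, already pre-processed documents (token lists).
docs = {
    "doc1": ["budget", "payment", "budget", "invoice"],
    "doc2": ["match", "league", "payment"],
    "doc3": ["match", "team", "match", "league"],
}

# Term document matrix: one row per document ID, one column per term, raw counts.
terms = sorted({t for tokens in docs.values() for t in tokens})
tdm = {}
for doc_id, tokens in docs.items():
    counts = Counter(tokens)
    tdm[doc_id] = [counts[t] for t in terms]

# Document frequency: in how many documents each term appears.
n_docs = len(docs)
df = {t: sum(1 for row in tdm.values() if row[j] > 0) for j, t in enumerate(terms)}

def tfidf_row(doc_id):
    """tf(t, d) = count / document length; idf(t) = log(N / df(t))."""
    row = tdm[doc_id]
    length = sum(row)
    return {t: (row[j] / length) * math.log(n_docs / df[t])
            for j, t in enumerate(terms)}

# Rank the terms present in doc1 from highest to lowest TF-IDF weight.
weights = tfidf_row("doc1")
ranking = sorted((t for t in terms if weights[t] > 0), key=weights.get, reverse=True)
print(terms)        # ['budget', 'invoice', 'league', 'match', 'payment', 'team']
print(tdm["doc1"])  # [2, 1, 0, 0, 1, 0]
print(ranking)      # ['budget', 'invoice', 'payment']
```

`payment` ranks last in `doc1` because it also occurs in `doc2`, so its IDF is lower. With this unsmoothed IDF, a term that occurs in every document receives weight 0. That is one way TF-IDF can act as attribute reduction: terms with little or no weight can be dropped before the decision tree is trained.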

Summary

Related Work

We live in an era in which computing technology is growing rapidly and data collection has become significant, contributing to many fields such as medicine, business, education, reference, and reporting. Real-world data come in many types, such as qualitative, quantitative, and discrete. These data can be recorded and visualized in a variety of media, such as electronic documents and databases. Data mining, also known as knowledge discovery in databases, is the process of extracting hidden, useful knowledge from large data sets with the help of data analysis tools. Classification is one of the components of data mining; it analyzes data and predicts the target class to which each data item belongs. The first subsection introduces text mining, the second covers term frequency-inverse document frequency, and the last compares classification techniques in text mining.

Text Mining

Term Frequency-Inverse Document Frequency

Text Classification Technique and Comparison

Framework of Keyword-Based Text Classification

Proposed Framework

RSTUDIO

Expected Outcomes

Conclusion


Journal: International Journal of Engineering & Technology	Publication Date: Dec 13, 2018
Citations: 8	License type: cc-by


Similar Papers

A comparative analysis of text representation, classification and clustering methods over real project proposals
Meltem Aksoy ... Mehmet Fatih Amasyali
International Journal of Intelligent Computing and Cybernetics | VOL. 16
28 Feb 2023

Biomedical Text Mining for Diagnosing Diseases - A Review
R Priya ... R Padmajavalli
Indian Journal of Science and Technology | VOL. 9
28 Jun 2016

Improve the Accuracy of Support Vector Machine Using Chi Square Statistic and Term Frequency Inverse Document Frequency on Movie Review Sentiment Analysis
Ukhti Ikhsani Larasati ... Much Aziz Muslim
Scientific Journal of Informatics | VOL. 6
24 May 2019

Bibliometric and Text Mining Analysis on COVID-19 Research Projects in Iran
Meisam Dastani ... Mohammad Ghorbani
Depiction of Health | VOL. 12
03 Nov 2021
