Abstract

Objectives: To develop a desktop application that automatically assigns each document to the accreditation area it belongs to. Specifically, the study aims to: a) create a predictive model for the document classification task; b) design and develop an application that classifies documents by accreditation area; and c) evaluate the performance of the automatic document classification.

Methods: We introduce an approach for the automatic classification of accreditation documents. Scanned or captured documents are included in the classification task by extracting their text with Optical Character Recognition (OCR). The text is preprocessed using TF-IDF (Term Frequency-Inverse Document Frequency) with stop-word removal and an n-gram range of 1-2. The model is built with a Naive Bayes classifier using additive (Laplace/Lidstone) smoothing.

Results: Accuracy, precision, recall, and F-score were computed to evaluate the approach. The results showed 82% accuracy, 84% precision, 82% recall, and an 82% F1 score. These results indicate that the proposed combination of OCR for text extraction, TF-IDF for text preprocessing, and a Naive Bayes classifier is effective.

Conclusions: Input documents in any form, whether captured images, scanned documents, or plain text documents, were classified using OCR, TF-IDF, and a Naive Bayes classifier. The approach provides an efficient way to classify accreditation documents automatically, and it addresses limiting factors of previous work, i.e., classification that depends on one person's opinion and is time-consuming.

Keywords: Accreditation Document Classification; Document Classification Objective Evaluation; TF-IDF; Term frequency-inverse document frequency; Multinomial Naive Bayes; OCR; Optical Character Recognition
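The pipeline described in the Methods (stop-word removal, n-grams of 1-2, TF-IDF weighting, and Naive Bayes with additive smoothing) can be illustrated with a minimal, standard-library sketch. This is not the authors' implementation: the class name `TfidfNaiveBayes`, the tiny stop-word list, the area labels, and the sample documents are all hypothetical, the OCR step is assumed to have already produced plain text, and the TF-IDF weights are left unnormalized for brevity.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption; a real system would use a fuller one).
STOPWORDS = {"the", "of", "and", "a", "an", "to", "in", "for", "is", "on", "with"}


def tokenize(text):
    """Lowercase, drop stop words, and return unigrams plus bigrams (n-gram range 1-2)."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + bigrams


class TfidfNaiveBayes:
    """Multinomial Naive Bayes over TF-IDF weights with additive smoothing (alpha)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # alpha = 1 is Laplace smoothing; 0 < alpha < 1 is Lidstone

    def fit(self, docs, labels):
        tokenized = [tokenize(d) for d in docs]
        self.n_docs = len(docs)
        # Document frequency: in how many documents each term appears.
        df = Counter(term for toks in tokenized for term in set(toks))
        # Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1.
        self.idf = {t: math.log((1 + self.n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
        self.class_counts = Counter(labels)
        # Sum of TF-IDF weights per term within each class.
        self.term_weights = defaultdict(Counter)
        for toks, label in zip(tokenized, labels):
            for term, weight in self._tfidf(toks).items():
                self.term_weights[label][term] += weight
        return self

    def _tfidf(self, tokens):
        """TF-IDF weights for known terms; terms unseen in training are ignored."""
        tf = Counter(t for t in tokens if t in self.idf)
        return {t: count * self.idf[t] for t, count in tf.items()}

    def predict(self, text):
        weights = self._tfidf(tokenize(text))
        vocab_size = len(self.idf)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total = sum(self.term_weights[label].values())
            score = math.log(count / self.n_docs)  # log prior of the class
            for term, weight in weights.items():
                # Additive smoothing keeps unseen (term, class) pairs from zeroing the score.
                p = (self.term_weights[label][term] + self.alpha) / (total + self.alpha * vocab_size)
                score += weight * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: text already extracted from documents, with made-up area labels.
train_docs = [
    "faculty profile and teaching load of faculty members",
    "faculty development program and research output",
    "library holdings and book acquisitions",
    "library services and reading room hours",
    "curriculum map and course syllabus",
    "course outline and curriculum revision",
]
train_labels = ["faculty", "faculty", "library", "library", "curriculum", "curriculum"]

model = TfidfNaiveBayes(alpha=1.0).fit(train_docs, train_labels)
print(model.predict("list of new book acquisitions for the library"))
print(model.predict("revised course syllabus"))
```

With a real corpus, the training texts would come from the OCR and text-extraction step, and the labels from documents already sorted by the accreditation task force.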

Introduction

Accreditation is one way for Higher Education Institutions (HEIs) to keep themselves in check with the standards. Accreditation requires documents that show whether a particular program in the HEI meets those standards. In each HEI, an accreditation task force is assigned to collect and classify these documents, which come in the form of Portable Document Format (PDF) files, Word document (Doc) files, and scanned PDFs. The traditional way of classifying these documents depends on the judgment of the assigned task force and is therefore subjective. Because it relies on individual judgment, it is also time-consuming. A system that automatically classifies documents is invaluable to the assigned accreditation task force.