Abstract
This paper presents an approach to classifying scanned documents by identifying each document's title with a Naive Bayes classifier and then validating the predicted title against a domain-driven knowledge base. Levenshtein distance is used to mitigate errors introduced by optical character recognition (OCR). This approach produced significantly better results than the Naive Bayes classifier alone. The study also contributes a rich, domain-specific knowledge base as a resource for the intelligent processing of real estate documents.
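As an illustration of the validation step only, the following minimal sketch shows how a predicted title might be matched against a knowledge base using Levenshtein distance so that small OCR errors are tolerated. The title list, the function names `levenshtein` and `validate_title`, and the `max_ratio` threshold are assumptions made for this example and are not taken from the paper. The Naive Bayes prediction step is not shown.

```python
from typing import Optional, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


# Hypothetical knowledge-base entries; the paper's actual knowledge base is not shown here.
KNOWN_TITLES = ["Warranty Deed", "Deed of Trust", "Quitclaim Deed"]


def validate_title(
    predicted: str,
    known_titles: Sequence[str] = KNOWN_TITLES,
    max_ratio: float = 0.25,  # assumed tolerance: edits allowed per character of the known title
) -> Optional[str]:
    """Return the closest known title, or None if nothing is close enough."""
    normalized = predicted.strip().lower()
    best_title, best_distance = None, None
    for title in known_titles:
        distance = levenshtein(normalized, title.lower())
        if best_distance is None or distance < best_distance:
            best_title, best_distance = title, distance
    if best_title is not None and best_distance <= max_ratio * len(best_title):
        return best_title
    return None
```

Under these assumptions, an OCR-damaged prediction such as `"Warranly Deed"` would be mapped to the known title `"Warranty Deed"`, while a prediction that is far from every known title would be rejected.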