Abstract

Post-processing is an important part of any document processing system. It can be performed at two levels: word-level correction and sentence-level correction. Word-level correction proceeds in two steps: first detecting the erroneous word, and then replacing it with the most similar word from a dictionary. This is called the dictionary-based approach. An alternative, known as the probabilistic approach, selects the most probable word instead. To build the probabilistic model, which comprises unigram, bigram and trigram statistics, online text from various Gujarati newspaper websites is used. The proposed system will use models such as Naive Bayes and the Hidden Markov Model to correct word-level errors. It will be tested on a synthetic dataset generated by introducing random word-level errors into actual documents.
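The abstract combines three ideas: dictionary lookup of similar words, an n-gram language model, and HMM-style decoding, evaluated on synthetically corrupted text. The sketch below shows one way these could fit together. It is not the authors' implementation, and several choices are assumptions not stated in the abstract. Levenshtein distance serves as the similarity measure. The model uses add-one smoothed bigrams only, with trigrams omitted. The HMM emission score is simply the negative edit distance, and errors are injected by single-character substitution. A tiny English toy corpus stands in for the Gujarati newspaper text. All class and function names are hypothetical.

```python
import math
import random
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


class BigramHMMCorrector:
    """Hypothetical word-level corrector: hidden states are dictionary words,
    observations are the (possibly misspelled) input tokens."""

    def __init__(self, corpus_tokens, max_dist=2):
        self.unigrams = Counter(corpus_tokens)
        self.bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
        self.total = len(corpus_tokens)
        self.vocab = sorted(self.unigrams)  # sorted for deterministic tie-breaking
        self.max_dist = max_dist

    def candidates(self, token):
        # Dictionary-based step: keep in-vocabulary tokens, otherwise propose
        # every dictionary word within max_dist edits (assumed threshold).
        if token in self.unigrams:
            return [token]
        return [w for w in self.vocab if levenshtein(token, w) <= self.max_dist]

    def log_transition(self, prev, word):
        # Add-one smoothed unigram probability at sentence start, else bigram P(word | prev).
        v = len(self.vocab)
        if prev is None:
            return math.log((self.unigrams[word] + 1) / (self.total + v))
        return math.log((self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + v))

    def correct(self, tokens):
        """Viterbi decoding over candidate lattices; emission log-score is -edit_distance."""
        if not tokens:
            return []
        # Tokens with no candidates are kept as-is.
        lattice = [self.candidates(t) or [t] for t in tokens]
        best = {w: (self.log_transition(None, w) - levenshtein(tokens[0], w), [w])
                for w in lattice[0]}
        for tok, cands in zip(tokens[1:], lattice[1:]):
            new_best = {}
            for w in cands:
                emit = -levenshtein(tok, w)
                # Best predecessor state for w, including transition and emission scores.
                score, prev_path = max(
                    ((s + self.log_transition(p, w) + emit, path)
                     for p, (s, path) in best.items()),
                    key=lambda item: item[0])
                new_best[w] = (score, prev_path + [w])
            best = new_best
        return max(best.values(), key=lambda item: item[0])[1]


def inject_errors(tokens, rate, rng):
    """Synthetic test data: with probability `rate`, overwrite one random
    character of a token with a random character from the corpus alphabet
    (the replacement may occasionally equal the original character)."""
    alphabet = sorted(set("".join(tokens)))
    noisy = []
    for tok in tokens:
        if tok and rng.random() < rate:
            i = rng.randrange(len(tok))
            tok = tok[:i] + rng.choice(alphabet) + tok[i + 1:]
        noisy.append(tok)
    return noisy


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat ate the rat".split()
    corrector = BigramHMMCorrector(corpus)
    print(corrector.correct(["teh", "cat", "st", "on", "the", "mt"]))
    # → ['the', 'cat', 'sat', 'on', 'the', 'mat']
```

Decoding the whole sentence with Viterbi, rather than correcting each word independently, lets the bigram context decide between candidates at equal edit distance. For example, "teh" is two edits from both "the" and "ate", and the language model prefers "the" at the start of this toy sentence. A real system would add trigram transitions and a learned error model in place of the raw edit distance.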
