Abstract
Corpus analysis is one of the most powerful methods for text mining, data discovery, and finding relationships among documents. In linguistics, a corpus (plural corpora) is a large, structured set of texts to be classified by artificial intelligence systems. The performance of conventional text classifiers on corpora is usually unsatisfactory. In this paper, a novel text classifier for corpus analysis is proposed that combines advanced numerical unconstrained nonlinear optimization with neural networks. The proposed approach, an artificial neural network trained with the relaxed conjugate gradient (RCG) method, classifies each document using n-gram token features weighted by TF-IDF, that is, each token's term frequency (TF) score multiplied by its inverse document frequency (IDF) score. The proposed updating formula for training neural networks combines the good numerical performance of the Polak–Ribière technique with the strong global convergence properties of the Fletcher–Reeves method, and, through a relaxation equation, also inherits some adaptations from the Hestenes–Stiefel and Dai–Yuan conjugate gradient updating procedures. The proposed algorithm was evaluated on verses of the Holy Quran, and its results were compared with those of competing methods: the classical gradient descent algorithm, the modified quickprop algorithm, conjugate gradient algorithms with Hestenes–Stiefel, Polak–Ribière, and Fletcher–Reeves updates, the scaled conjugate gradient algorithm, the variable-memory Broyden–Fletcher–Goldfarb–Shanno update, and the smoothed regularized conjugate gradient method. Based on these experiments, the proposed RCG is able to accurately classify the text corpus at low computational cost.