Text mining, which fundamentally applies quantitative techniques to textual data, can be used to discover knowledge and to support scholarly research goals. For large-scale data such as corpus text, intelligent learning methods have proven effective. In this paper, an artificial neural network with a quasi-Newton updating procedure is presented for multi-label multi-class text classification. This unconstrained numerical training technique, the Multi-Label extension of the Log-Loss function used in the Limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm (ML4BFGS), provides a noteworthy opportunity for text mining and leads to a significant improvement in text classification performance. The ML4BFGS training approach assigns one or more of the available classes to each sentence. We evaluate this method on English translations of the Holy Quran. These religious texts were chosen for the experiments in this manuscript because each verse (sentence) usually has multiple labels (topics), and different translations of the same verse should share the same labels. Experimental results show that ML4BFGS is well-suited for multi-label multi-class classification on the Quranic corpus. Several advanced updating methods, such as ITCG, BFGS, L-BFGS-B, and L3BFGS, as well as other multi-label approaches such as ML-k-NN and the well-known SVM, are compared against the proposed ML4BFGS under the same evaluation criteria, and the outcomes are fully described in this study. The performance measures, including Hamming loss, recall, precision, and F1 score, show that ML4BFGS achieves the best results in extracting the relevant classes for each verse, while the proposed network requires the fewest epochs among the compared training approaches to complete the learning phase. At the same time, the elapsed time for ML4BFGS is only 78% (in seconds) of the best time among the other approaches.
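To make the training formulation concrete, the following is a minimal illustrative sketch, not the authors' ML4BFGS network: a single-layer multi-label classifier whose element-wise logistic log-loss is minimized with the standard L-BFGS-B routine from SciPy. The feature and label matrices are synthetic placeholders standing in for verse features and topic labels.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multilabel_log_loss(w_flat, X, Y):
    """Mean of independent binary cross-entropies, one per label column."""
    n_features, n_labels = X.shape[1], Y.shape[1]
    W = w_flat.reshape(n_features, n_labels)
    P = sigmoid(X @ W)
    eps = 1e-12  # numerical floor to avoid log(0)
    loss = -np.mean(Y * np.log(P + eps) + (1 - Y) * np.log(1 - P + eps))
    grad = X.T @ (P - Y) / (X.shape[0] * n_labels)
    return loss, grad.ravel()

# Toy data: 200 "sentences" with 10 features and 4 possible topic labels
# (in the paper these would be verse features and Quranic topics).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_W = rng.normal(size=(10, 4))
Y = (sigmoid(X @ true_W) > 0.5).astype(float)  # one or more labels per row

# Limited-memory BFGS minimization of the multi-label log-loss.
res = minimize(multilabel_log_loss, np.zeros(10 * 4), args=(X, Y),
               jac=True, method="L-BFGS-B")
pred = sigmoid(X @ res.x.reshape(10, 4)) > 0.5
train_accuracy = (pred == Y).mean()
```

Because each label gets its own sigmoid output, the model can assign any subset of the four labels to a sentence, which is the multi-label setting the abstract describes.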
Compared with several state-of-the-art algorithms, ML4BFGS has a lower computational cost, a faster convergence rate, and higher accuracy in corpus analysis.
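The reported measures can be computed directly from the binary label matrices. The sketch below shows Hamming loss and micro-averaged precision, recall, and F1 on toy true/predicted label matrices (placeholders, not the Quranic data).

```python
import numpy as np

# Toy multi-label ground truth and predictions: 3 sentences, 3 labels.
Y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
Y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0]])

# Hamming loss: fraction of label slots predicted incorrectly.
hamming_loss = np.mean(Y_true != Y_pred)

# Micro-averaged precision / recall / F1 over all label slots.
tp = np.sum((Y_true == 1) & (Y_pred == 1))
fp = np.sum((Y_true == 0) & (Y_pred == 1))
fn = np.sum((Y_true == 1) & (Y_pred == 0))
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
```

For these toy matrices the predictor misses two true labels and invents none, so precision is perfect while recall and F1 are penalized, illustrating why all four measures are reported together.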