Extractive Text Summarization Models for Urdu Language

Ali Nawaz, Maheen Bakhtyar, Junaid Baber, Ihsan Ullah, Waheed Noor, Abdul Basit

doi:10.1016/j.ipm.2020.102383

Abstract

In recent years, considerable progress has been made in Urdu linguistics, and many portals and news websites now generate large amounts of Urdu text every day. However, there is still no publicly available dataset or framework for automatic Urdu extractive summarization. In automatic extractive summarization, the sentences with the highest weights are selected for inclusion in the summary, where a sentence's weight is computed as the sum of the weights of its words. Two well-known approaches are used to compute word weights for English: the local weights (LW) approach and the global weights (GW) approach. In the LW approach, the weights depend on the content of the text, so the same word may receive different weights in different articles. In the GW approach, by contrast, word weights are computed from an independent dataset, so each word keeps the same weight in every article. In the proposed framework, both LW- and GW-based approaches are modeled for the Urdu language. The sentence weight method and the weighted term-frequency method are LW-based approaches that compute a sentence's weight as the number of important words it contains and as the sum of the frequencies of those important words, respectively. The vector space model (VSM) is a GW-based approach: it computes word weights from an independent dataset, and these weights then remain the same for all texts. GW weighting is widely used for English in applications such as information retrieval and text classification. Extractive summaries are generated with the LW- and GW-based approaches and evaluated against ground-truth summaries produced by experts, with the VSM serving as the baseline for sentence weighting. Experiments show that the LW-based approaches are better suited to extractive summary generation.
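The three weighting schemes and the top-weight sentence selection described above can be illustrated with a minimal Python sketch. It is not the authors' released code, and several details are assumptions: whitespace tokenization, a small hypothetical Urdu stop-word list standing in for the "important words" filter, and inverse document frequency (IDF) learned from an independent corpus as a stand-in for the VSM's global weights. All function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical, illustrative Urdu stop-word list (not the authors' list).
# Any token outside this set is treated as an "important word".
STOP_WORDS = {"کے", "کی", "کا", "اور", "میں", "ہے", "سے", "کو", "پر", "نے"}


def important_words(sentence, stop_words=STOP_WORDS):
    """Whitespace tokenization (a simplifying assumption), then drop stop words."""
    return [w for w in sentence.split() if w not in stop_words]


def sentence_weight_scores(sentences):
    """LW: a sentence's weight is the number of important words it contains."""
    return [len(important_words(s)) for s in sentences]


def weighted_tf_scores(sentences):
    """LW: a sentence's weight is the sum of the frequencies, within this
    article, of its important words, so weights change from article to article."""
    freq = Counter(w for s in sentences for w in important_words(s))
    return [sum(freq[w] for w in important_words(s)) for s in sentences]


def global_idf(corpus):
    """GW: learn word weights once from an independent corpus (a list of
    articles, each a list of sentences). IDF is an assumed weighting choice."""
    df = Counter()
    for article in corpus:
        # Count each word at most once per article (document frequency).
        df.update({w for s in article for w in important_words(s)})
    return {w: math.log(len(corpus) / n) for w, n in df.items()}


def vsm_scores(sentences, idf):
    """GW: sum the fixed global weights; words unseen in the corpus score 0."""
    return [sum(idf.get(w, 0.0) for w in important_words(s)) for s in sentences]


def extract_summary(sentences, scores, k):
    """Keep the k highest-weighted sentences, restored to their original order.
    Python's sort is stable, so ties keep their original relative order."""
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

For example, with the placeholder tokens `article = ["x y z", "x y", "w"]`, `weighted_tf_scores(article)` gives `[5, 4, 1]`, and `extract_summary(article, weighted_tf_scores(article), 2)` returns `["x y z", "x y"]`. The key contrast is where the weights come from: both LW functions derive them from the article itself, while `vsm_scores` only looks up weights fixed in advance by `global_idf`.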
The sentence weight method and the weighted term-frequency method achieve F-scores of 80% and 76%, respectively, whereas the VSM achieves only 62% accuracy on the same dataset. Both the datasets, with their ground-truth summaries, and the code are publicly available to researchers.

Journal: Information Processing & Management
Publication Date: Sep 25, 2020
Citations: 24

Similar Papers

NERWS: Towards Improving Information Retrieval of Digital Library Management System Using Named Entity Recognition and Word Sense
Ahmed Aliwy ... Ayad Abbas
Big Data and Cognitive Computing | VOL. 5
28 Oct 2021

Improving learning accuracy of fuzzy decision trees by hybrid neural networks
E.C.C Tsang ... X.Z Wang
IEEE Transactions on Fuzzy Systems | VOL. 8
01 Jan 1999

N-layer Approach to Web Information Retrieval
H.B Kekre ... S.S Sane
International Journal of Applied Information Systems | VOL. 5
10 Jan 2013

Research On Text Classification Based On Deep Neural Network
Deageon Kim
International Journal of Communication Networks and Information Security (IJCNIS) | VOL. 14
31 Dec 2022