Two-fold complex network approach to discover the impact of word-order in Urdu language

Nuzhat Khan, Muhammad Paend Bakht, Usman Ullah Sheikh, Mohamad Anuar Kamaruddin

doi:10.11591/ijeecs.v23.i2.pp1039-1048

Abstract

This work examines standard Urdu text to confirm the impact of word order on the structure of the language. A complex network approach is used to obtain universal properties of two different word co-occurrence networks. The networks are examined in two folds, at the macro scale and at the micro scale, for structure discovery. While preserving the vocabulary size, two networks are generated from the same text, one with and one without standard word order. In addition, the text networks are benchmarked against a random network to extract global features. The outcomes indicate that most Urdu sentences follow a certain word order. The normal and shuffled text networks demonstrate similar large-scale characteristics. The results show that average path length and network diameter are reduced after shuffling. On the other hand, the clustering coefficient is higher in the shuffled text than in the normal text. Our results validate that a few short sentences, in the range of three words, are fully free in word order. The observations reveal that long sentences are ambiguous without standard order. Both networks are topologically similar, but shuffling causes a massive discrepancy in network composition and sentence structure. In the graph view, word connectivity based on grammatical association exists in the normal text network. With this universal approach, the impact of word order in the Urdu language is confirmed. Meanwhile, this finding points toward uncovering language composition by extracting small sentences as motifs.
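The comparison described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. It assumes that co-occurrence means adjacency of two words within a sentence, that edges are undirected and unweighted, and that shuffling is done inside each sentence so the vocabulary is preserved. The function names and the toy English sentences are hypothetical placeholders for a real Urdu corpus, and the random network benchmark is left out.

```python
import random
from collections import deque
from itertools import combinations


def build_cooccurrence_network(sentences):
    """Link every pair of adjacent words in a sentence (assumed definition)."""
    graph = {}
    for sentence in sentences:
        words = sentence.split()
        for word in words:
            graph.setdefault(word, set())
        for left, right in zip(words, words[1:]):
            if left != right:  # ignore self-loops
                graph[left].add(right)
                graph[right].add(left)
    return graph


def shuffle_sentences(sentences, seed=0):
    """Shuffle word order inside each sentence; the vocabulary is unchanged."""
    rng = random.Random(seed)
    shuffled = []
    for sentence in sentences:
        words = sentence.split()
        rng.shuffle(words)
        shuffled.append(" ".join(words))
    return shuffled


def average_clustering(graph):
    """Mean local clustering coefficient; nodes of degree < 2 count as 0."""
    total = 0.0
    for neighbors in graph.values():
        k = len(neighbors)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


def bfs_distances(graph, source):
    """Shortest hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def path_metrics(graph):
    """Return (average shortest path length, diameter) over reachable pairs."""
    total, pairs, diameter = 0, 0, 0
    for source in graph:
        for target, d in bfs_distances(graph, source).items():
            if target != source:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return (total / pairs if pairs else 0.0), diameter


# Toy placeholder text; a real study would load tokenized Urdu sentences.
sentences = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat saw the dog",
]
normal = build_cooccurrence_network(sentences)
shuffled = build_cooccurrence_network(shuffle_sentences(sentences))

for name, net in (("normal", normal), ("shuffled", shuffled)):
    avg_path, diameter = path_metrics(net)
    print(name, len(net), round(average_clustering(net), 3),
          round(avg_path, 3), diameter)
```

On a corpus this small the numbers carry no linguistic meaning. The sketch only shows how the three reported quantities (clustering coefficient, average path length, diameter) can be computed for both networks over the same vocabulary.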
