Abstract

Users from different cultures and backgrounds often feel comfortable expressing their thoughts on trending topics by generating content in their regional languages. Multilingual information has grown explosively in recent years, and a massive amount of multilingual textual data is added to the Internet daily. Hashtags can help low-resource multilingual content overcome language barriers: they expose content to a wider audience and make it easier for people interested in a topic to find relevant content, regardless of the language in which it was written. Hashtag recommendation for multilingual low-resource content is therefore essential to support linguistic diversity and universal access to information. Several approaches have been put forth to recommend content-based and personalized hashtags for multimodal content in high-resource languages. However, limited data availability and linguistic differences often hinder the development of hashtag recommendation methods for low-resource Indic languages, and hashtag recommendation for tweets in these languages has seldom been addressed. Moreover, personalization and language usage have yet to be explored as signals for recommending hashtags for such tweets. We therefore propose TAGALOG, an automated hashtag recommendation system for tweets posted in low-resource Indic languages that recommends personalized and language-specific hashtags. We employ user-guided and language-guided attention mechanisms to distill indicative features from low-resource tweets according to the user’s topical and linguistic preferences. We propose a graph-based neural network that mines users’ posting behavior by connecting the historical tweets of a particular user, and captures language relatedness by linking tweets according to language families, i.e., Indo-Aryan and Dravidian. Experimental results on a dataset curated from Twitter show that the proposed model outperforms recognized pre-trained language models and existing approaches, with average F1-score improvements of 12.3% and 12.8%, respectively. TAGALOG recommends hashtags that align with the user’s interests and linguistic preferences, yielding a more tailored and engaging user experience. Personalized and multilingual hashtag recommendation systems for low-resource Indic languages can help to improve the discoverability and relevance of content in these languages.
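
The graph described above, which links a user's historical tweets and links tweets by language family, can be illustrated with a minimal sketch. The abstract does not specify the actual graph construction, so everything here is an assumption made for illustration: the function name `build_tweet_graph`, the `(tweet_id, user_id, lang)` tuple layout, the ISO 639-1 language codes, the particular languages listed in `LANGUAGE_FAMILY`, and the choice to fully connect tweets within each group.

```python
from collections import defaultdict
from itertools import combinations

# Assumption: an illustrative subset of Indic languages (ISO 639-1 codes)
# mapped to the two families named in the abstract.
LANGUAGE_FAMILY = {
    "hi": "Indo-Aryan", "bn": "Indo-Aryan", "mr": "Indo-Aryan", "gu": "Indo-Aryan",
    "ta": "Dravidian", "te": "Dravidian", "kn": "Dravidian", "ml": "Dravidian",
}


def build_tweet_graph(tweets):
    """Build an undirected tweet graph (hypothetical helper).

    tweets: iterable of (tweet_id, user_id, lang) tuples.
    Returns a dict mapping an edge (a, b) with a < b to the set of
    relation labels that connect the two tweets.
    """
    by_user = defaultdict(list)
    by_family = defaultdict(list)
    for tweet_id, user_id, lang in tweets:
        by_user[user_id].append(tweet_id)
        family = LANGUAGE_FAMILY.get(lang)
        if family is not None:  # skip languages outside the mapping
            by_family[family].append(tweet_id)

    edges = defaultdict(set)
    for groups, label in ((by_user, "same_user"), (by_family, "same_family")):
        for members in groups.values():
            # Connect every pair of tweets that share a user or a family.
            for a, b in combinations(sorted(members), 2):
                edges[(a, b)].add(label)
    return dict(edges)


tweets = [
    (1, "u1", "hi"),  # user u1, Hindi
    (2, "u1", "ta"),  # user u1, Tamil
    (3, "u2", "bn"),  # user u2, Bengali
    (4, "u2", "te"),  # user u2, Telugu
]
graph = build_tweet_graph(tweets)
```

In this toy input, tweets 1 and 2 are linked because they share a user, while tweets 1 and 3 are linked because Hindi and Bengali are both Indo-Aryan. Connecting every pair within a group grows quadratically with group size, so a real system would likely sample or cap neighbors; that design choice is likewise not stated in the abstract.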
