Keyphrase Extraction on Covid-19 Tweets Based on Doc2Vec and YAKE

Fahri Firdausillah, Erika Devi Udayanti

doi:10.33633/jais.v6i1.4454

Open Access

https://doi.org/10.33633/jais.v6i1.4454

Abstract

Keyword and keyphrase extraction is one of the foundations for several text processing operations, such as summarization and document clustering. YAKE is one of the techniques used for unsupervised and independent keyphrase extraction: it does not require a corpus or linguistic tools such as NER and POS tagging. However, applying YAKE to microblogging documents such as tweets often yields less representative keyphrases, because a single short document contains too few words for ranking. This paper addresses the problem by using Doc2Vec to find similar tweets during the keyphrase extraction process, so that more words are available to the YAKE ranking step. Covid-19-related tweets are used as the dataset, since the topic is currently widely discussed on social media, to show that the proposed approach can improve keyphrase extraction performance.
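
The idea described in the abstract (retrieve similar tweets, pool their text with the target tweet, then rank terms over the pooled text) can be illustrated with a minimal sketch. This is not the authors' implementation: to stay dependency-free, bag-of-words cosine similarity stands in for Doc2Vec retrieval and simple term frequency stands in for YAKE scoring. All function names, parameters, and the stopword handling here are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_tweets(target, tweets, top_n=2):
    """Stand-in for Doc2Vec retrieval (assumption): rank the other tweets
    by bag-of-words cosine similarity and keep the top_n with overlap."""
    target_vec = Counter(tokenize(target))
    scored = [(cosine(target_vec, Counter(tokenize(t))), t)
              for t in tweets if t != target]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for score, t in scored[:top_n] if score > 0]


def rank_terms(text, stopwords, top_k=3):
    """Stand-in for YAKE ranking (assumption): order non-stopword terms
    by raw frequency in the given text."""
    counts = Counter(t for t in tokenize(text) if t not in stopwords)
    return [term for term, _ in counts.most_common(top_k)]


def extract_with_expansion(target, tweets, stopwords, top_n=2, top_k=3):
    """Pool the target tweet with its most similar tweets, then rank terms
    over the pooled text instead of the single short tweet."""
    pooled = " ".join([target] + similar_tweets(target, tweets, top_n))
    return rank_terms(pooled, stopwords, top_k)
```

Calling `extract_with_expansion(tweet, all_tweets, stopwords)` ranks terms over the expanded text, so a term that appears once in the target tweet but recurs across similar tweets gains evidence it would not have in isolation. In a setup closer to the paper, the two stand-in functions would be replaced by a trained Doc2Vec model and a YAKE extractor.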
