Abstract

Keyword and keyphrase extraction is one of the foundations for several text processing operations, such as summarization and document clustering. YAKE is an unsupervised, corpus-independent keyphrase extraction technique; it requires neither a corpus nor linguistic tools such as NER and POS tagging. However, applying YAKE to microblogging documents such as tweets often yields less representative keyphrases, because each document contains too few words to rank reliably. This paper addresses the problem by using Doc2Vec to find similar tweets during keyphrase extraction, so that more words are available to the YAKE ranking process. Covid-19-related tweets are used as the dataset, since the topic is currently widely discussed on social media, to demonstrate that the proposed approach can improve keyphrase extraction performance.
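
The expansion idea described above can be illustrated with a minimal, standard-library-only sketch. It is not the paper's implementation: bag-of-words cosine similarity stands in for Doc2Vec, and a plain term-frequency ranking stands in for YAKE's scoring. The function names, the stopword list, and the `top_k` and `top_n` parameters are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "is", "are", "in", "on", "of", "to",
             "and", "for", "as", "by", "with", "this"}


def tokenize(text):
    """Lowercase the text and keep word tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower())
            if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def similar_tweets(target, corpus, top_k=2):
    """Return up to top_k corpus tweets most similar to the target.

    Stand-in for the Doc2Vec similarity search: a real system would
    compare learned document embeddings instead of raw term counts.
    """
    target_vec = Counter(tokenize(target))
    scored = [(cosine(target_vec, Counter(tokenize(t))), t)
              for t in corpus if t != target]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for score, t in scored[:top_k] if score > 0]


def expand(target, corpus, top_k=2):
    """Group the target tweet with its most similar tweets."""
    return [target] + similar_tweets(target, corpus, top_k)


def rank_terms(texts, top_n=3):
    """Rank terms by frequency over the given texts.

    Stand-in for YAKE's ranking; ties are broken alphabetically.
    """
    counts = Counter(w for t in texts for w in tokenize(t))
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ordered[:top_n]]
```

With a single short tweet, every term occurs once and the ranking has nothing to discriminate on. After `expand` adds similar tweets, terms shared across the group gain frequency and rise to the top, which is the effect the proposed approach relies on.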

