Abstract

Search engines have become powerful tools for retrieving information from the Web. The common practice of referring to the same entity by multiple names, however, creates problems when collecting data. For example, Mohandas Karamchand Gandhi, the Father of the Nation, is also known as Gandhiji and by other pseudonyms. Such synonyms (different names for one entity) can reduce the relevance of search engine results. Extracting the aliases of an entity is important for various Web tasks, such as automatic metadata extraction, entity disambiguation, and social network analysis. In this paper, an alias detection framework is proposed that finds alternate names of a given entity by applying lexical patterns to automatically downloaded web snippets. Similarity scores for the extracted alias names are calculated using string similarity measures, and a novel method is introduced for detecting irrelevant alias names by ranking them with an Extreme Learning Machine (ELM). ELM is a learning algorithm for single-hidden-layer feedforward neural networks that offers good generalization performance and fast learning, training the network in a single iteration. Its ranking performance is compared against that of a Support Vector Machine (SVM). On the given alias dataset, ELM outperformed SVM, with reported figures for precision, recall, and F-score of 17.28%, 90.70%, and 0.34%, respectively.
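
The first two stages described above, pattern-based extraction and string-similarity scoring, can be illustrated with a minimal sketch. The abstract does not list the actual lexical patterns or similarity measures, so everything below is an assumption made for illustration: the patterns in `PATTERNS`, the helper names `extract_aliases` and `similarity_features`, and the choice of the `difflib` ratio and character-bigram Jaccard as stand-in measures. The ranking stage (ELM or SVM) is not shown.

```python
import re
from difflib import SequenceMatcher

# Hypothetical lexical patterns; the paper's real pattern set is not given
# in the abstract. "{name}" is replaced by the (escaped) entity name.
PATTERNS = [
    r"{name},? (?:also|better|popularly) known as ([A-Z]\w*(?: [A-Z]\w*)*)",
    r"{name},? (?:alias|aka) ([A-Z]\w*(?: [A-Z]\w*)*)",
]


def extract_aliases(name, snippets):
    """Return candidate aliases matched by PATTERNS, in first-seen order."""
    candidates = []
    for pattern in PATTERNS:
        regex = re.compile(pattern.format(name=re.escape(name)))
        for snippet in snippets:
            for match in regex.finditer(snippet):
                alias = match.group(1)
                if alias not in candidates:
                    candidates.append(alias)
    return candidates


def _bigrams(text):
    """Set of character bigrams of the lowercased text."""
    lowered = text.lower()
    return {lowered[i:i + 2] for i in range(len(lowered) - 1)}


def similarity_features(name, alias):
    """Two assumed string-similarity features, each in [0, 1]:
    the difflib matching ratio and the character-bigram Jaccard index."""
    ratio = SequenceMatcher(None, name.lower(), alias.lower()).ratio()
    a, b = _bigrams(name), _bigrams(alias)
    jaccard = len(a & b) / len(a | b) if (a | b) else 0.0
    return ratio, jaccard


# Made-up snippets standing in for downloaded web snippets.
snippets = [
    "Mohandas Karamchand Gandhi, also known as Gandhiji, led the movement.",
    "Mohandas Karamchand Gandhi popularly known as Bapu was born in Porbandar.",
    "A museum about Gandhi opened last year.",
]
name = "Mohandas Karamchand Gandhi"
aliases = extract_aliases(name, snippets)
for alias in aliases:
    print(alias, similarity_features(name, alias))
```

In a full pipeline, feature vectors like these would be passed to the ranker so that irrelevant candidates fall to the bottom of the list. Pure string similarity scores a nickname such as "Bapu" poorly, which suggests why a learned ranker is used rather than a fixed threshold.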

