Abstract

Social media networks have evolved into a large repository of short documents, which makes it challenging to retrieve content from them effectively. Several factors contribute to this difficulty, such as the restricted length of the content, informal use of language (e.g., slang, abbreviations, and stylistic variation), and the low contextualization of user-generated content. To address these problems, recent studies on context-based information searching have added semantics to user-generated content by linking it to existing knowledge bases. Earlier work also used a bag-of-concepts representation to link potential noun phrases to existing knowledge sources. In this paper, we utilize the relationships among concepts, and the equivalence among the related concepts of the selected named entities, to derive the potential meaning of the entities and to measure the semantic similarity between the named entities and three potential reference sources (DBpedia, Anchor Texts, and Twitter Trends).

Highlights

  • Searching on microblogging systems suffers heavily from data sparseness and data redundancy

  • To capture the semantic proximity between a set of ambiguous mentions from DBpedia and their candidate entities, we measure semantic similarity by considering the weights and the paths that exist between the connected nodes

  • Once the potential named entities have been identified from the Twitter datasets using any of the three methods described above, the crucial task is to assign the extracted named entities to their predefined classes, such as person, product, geographical location, time, company, etc. Although many Information Retrieval (IR) techniques have been proposed for document processing (Ifrim, Shi, & Brigadir, 2014; Liang et al., 2014), they fail to categorize entities into their associated domains or classes when the entities are extracted from unstructured text such as Twitter streams
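The path-and-weight idea in the second highlight can be illustrated with a minimal sketch. It assumes, purely for illustration, that the knowledge graph is a small weighted adjacency map and that similarity is defined as 1 / (1 + cheapest path cost). The graph contents, the edge weights, the scoring formula, and the function names `shortest_path_cost` and `path_similarity` are all hypothetical and are not taken from the paper.

```python
import heapq

def shortest_path_cost(graph, source, target):
    """Dijkstra over a dict-of-dicts weighted graph; returns inf if unreachable."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return float("inf")

def path_similarity(graph, mention, candidate):
    """Assumed scoring: cheaper connecting paths give similarity closer to 1."""
    return 1.0 / (1.0 + shortest_path_cost(graph, mention, candidate))

# Hypothetical toy graph; lower edge weight means a stronger relation.
TOY_GRAPH = {
    "Apple": {"Apple_Inc.": 1.0, "Apple_(fruit)": 1.0},
    "Apple_Inc.": {"iPhone": 0.5, "Apple": 1.0},
    "Apple_(fruit)": {"Orchard": 0.5, "Apple": 1.0},
    "iPhone": {"Apple_Inc.": 0.5},
    "Orchard": {"Apple_(fruit)": 0.5},
}

for candidate in ("Apple_Inc.", "Apple_(fruit)"):
    print(candidate, round(path_similarity(TOY_GRAPH, "iPhone", candidate), 3))
```

In this toy setting, a context node such as "iPhone" scores higher against "Apple_Inc." than against "Apple_(fruit)" because the connecting path is shorter and cheaper. A real system would derive nodes and weights from DBpedia rather than from a hand-written map.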

Summary

Introduction

Searching on microblogging systems suffers heavily from data sparseness and data redundancy. To overcome these problems, it is necessary to model a semantics-based retrieval system that removes the ambiguity persisting in the (unstructured) text and links the entities in the text to the appropriate real-world entity sets. This has brought entity-based retrieval into focus for microblog search, in which entities are disambiguated against populated knowledge base ontologies (such as DBpedia, Freebase, and YAGO). Such systems encounter a large number of ambiguous entities, which yield contradictory results. To resolve these ambiguities, we propose a three-way strategy consisting of a DBpedia-based Semantic Measure, Anchor Text-based Cosine Similarity, and Twitter Popularity Trend Detection, which filters out ambiguous candidate entities and maps the remaining entities to the context of the given tweet(s). We chose this topic for empirical analysis because it attained a wide reach and collected a high volume of responses.
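As a rough illustration of the anchor-text cosine similarity component, the sketch below ranks candidate entities by the cosine similarity between a tweet and the anchor text associated with each candidate, using plain term-frequency vectors. The tokenizer, the term-frequency weighting, the sample tweet, the anchor texts, and the helper names `tokenize`, `cosine_similarity`, and `rank_candidates` are assumptions made for this example; the paper's actual features and weighting may differ.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into alphanumeric tokens (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine_similarity(text_a, text_b):
    """Cosine similarity between term-frequency vectors of two texts."""
    vec_a, vec_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)

def rank_candidates(tweet, anchor_texts):
    """Return (entity, score) pairs sorted from most to least similar."""
    scores = {entity: cosine_similarity(tweet, text)
              for entity, text in anchor_texts.items()}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical tweet and anchor texts for two candidate entities.
tweet = "new apple iphone launch event today"
anchor_texts = {
    "Apple_Inc.": "apple iphone mac technology company",
    "Apple_(fruit)": "apple fruit tree orchard",
}

for entity, score in rank_candidates(tweet, anchor_texts):
    print(entity, round(score, 3))
```

With this toy input, the company sense ranks above the fruit sense because it shares more terms with the tweet. In a fuller pipeline, this score would be only one of three signals, alongside the DBpedia-based measure and the Twitter popularity trend.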

Related works
Proposed semantic retrieval context
DBpedia based entity disambiguation
Entity labeling
Disambiguation pages
Redirect pages
Anchor text-based similarity measure
Twitter popularity-based trends measure
Classification of named entities
Empirical results
Conclusion
Findings
Further research directions
