Abstract

Tamil literature contains many valuable ideas that can help people lead successful and happy lives. Tamil literary works are abundantly available and widely searched on the World Wide Web (WWW), but existing search systems follow a keyword-based matching strategy that fails to satisfy user needs. This creates a need for a focused Information Retrieval system that semantically analyses Tamil literary text and thereby improves search performance. This paper proposes a novel Information Retrieval framework that uses discourse processing techniques to support the semantic analysis and representation of Tamil literary text. The proposed framework has been tested using two ancient literary works, the Thirukkural and Naladiyar, which were written during 300 BCE. The Thirukkural comprises 1330 couplets, each 7 words long, while the Naladiyar consists of 400 quatrains, each 15 words long. The proposed system, tested with all 1330 Thirukkural couplets and 400 Naladiyar quatrains, achieved a mean average precision (MAP) score of 89%. The performance of the proposed framework has been compared with Google Tamil search and with a keyword-based search, which is a reduced version of the proposed framework. Google Tamil search achieved a MAP score of 56% and the keyword-based method achieved a MAP score of 62%, which shows that discourse processing techniques improve the search performance of an Information Retrieval system.
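
The MAP scores quoted above can be read against the standard definition of mean average precision. The following is a minimal sketch of that standard metric, not the authors' evaluation code; the function names, verse identifiers, and relevance judgments are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked result list against its relevant set."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant item appears.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the per-query average precision values."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Hypothetical judgments for two queries (identifiers are made up).
runs = [
    (["k1", "k7", "k3"], {"k1", "k3"}),  # AP = (1/1 + 2/3) / 2
    (["n2", "n5"], {"n5"}),              # AP = (1/2) / 1
]
print(round(mean_average_precision(runs), 4))
```

A score of 89% therefore means that, averaged over all test queries, relevant couplets and quatrains tend to appear near the top of the ranked list.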

Highlights

  • Tamil language and literature have a long and glorious tradition, with the written form of the Tamil language dating back to 600 Before the Common Era (BCE)

  • Although many literary texts are available online, users cannot fully access them because of the lack of language tools. This paper addresses this issue and attempts to enhance existing language tools such as the morphological analyzer

  • This paper puts forth a semantic analysis that suits both contemporary and literary Tamil text in order to improve search performance. Tamil literary works such as the Thirukkural and Naladiyar are didactic in nature, conveying information in verses with a fixed number of words

Summary

INTRODUCTION

Tamil language and literature have a long and glorious tradition, with the written form of the Tamil language dating back to 600 Before the Common Era (BCE). This paper puts forth a semantic analysis that suits both contemporary and literary Tamil text in order to improve search performance. Tamil literary works such as the Thirukkural and Naladiyar are didactic in nature, conveying information in verses with a fixed number of words. Offline processing involves constructing a discourse parser that captures the semantic relations in the couplets and quatrains of the Thirukkural and Naladiyar, resulting in a discourse structure. A user query is processed to make it compatible with the index structure. It is then matched against the indices to retrieve the requested couplets and quatrains, along with relevant explanations. The paper thus proposes a discourse-based indexing technique to semantically retrieve relevant Tamil text and literature information from the web. The final sections present the results and discussion, then conclude the paper and offer directions for future work.
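
The offline indexing and query matching steps described above can be illustrated with a small sketch. This is not the authors' implementation: the cue-word table, the relation labels, the `Segment` type, and the English placeholder terms are all assumptions made for illustration, and a real system would work on morphologically analysed Tamil text.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Hypothetical cue words mapped to discourse relations (illustrative only).
CUE_WORDS: Dict[str, str] = {
    "because": "cause",
    "if": "condition",
    "but": "contrast",
}


@dataclass(frozen=True)
class Segment:
    """One discourse unit produced by an (assumed) offline discourse parser."""
    verse_id: str
    relation: str
    terms: Tuple[str, ...]


def build_index(segments: List[Segment]) -> Dict[Tuple[str, str], Set[str]]:
    """Offline step: index verse ids by (discourse relation, term)."""
    index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for seg in segments:
        for term in seg.terms:
            index[(seg.relation, term)].add(seg.verse_id)
    return index


def search(query: str, index: Dict[Tuple[str, str], Set[str]]) -> List[str]:
    """Online step: detect relations from cue words, then match the index."""
    words = query.lower().split()
    # Relations signalled by the query; fall back to every indexed relation.
    relations = {CUE_WORDS[w] for w in words if w in CUE_WORDS}
    relations = relations or {rel for rel, _ in index}
    terms = [w for w in words if w not in CUE_WORDS]
    scores: Dict[str, int] = defaultdict(int)
    for term in terms:
        for relation in relations:
            for verse_id in index.get((relation, term), ()):
                scores[verse_id] += 1
    # Rank by number of matched (relation, term) pairs, then by id for stability.
    return sorted(scores, key=lambda v: (-scores[v], v))


# Toy data with made-up identifiers and English placeholder terms.
index = build_index([
    Segment("verse-A", "cause", ("learning", "wealth")),
    Segment("verse-B", "condition", ("learning", "practice")),
    Segment("verse-C", "cause", ("anger", "loss")),
])
print(search("wealth because learning", index))
```

Keying the index on the relation as well as the term is what separates this from plain keyword matching: a query that signals a causal relation only matches verses whose parsed structure carries that relation.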

RELATED WORKS
Background
If the word in query has any of the following
Let DisRel be the discourse relations identified from the query
RESULTS AND DISCUSSION
Findings
CONCLUSION