Applying Text Analytics to the Mind-section Literature of the Tibetan Tradition of the Great Perfection

Ravi Krishna, Kurt Keutzer, Norman Mu

doi:10.1145/3392047

Abstract

Over the past decade, a growing corpus of Tibetan literature has become available as e-texts in Unicode format, produced through a mixture of optical character recognition and manual input. With the creation of such a corpus, the techniques of text analytics that have been applied to English and other modern languages may now be applied to Tibetan. In this work, we narrow our focus to a modest portion of that literature: the Mind-section literature of the Tibetan tradition of the Great Perfection. We use text analytics tools based on machine learning techniques to investigate a number of questions of interest to scholars of this and related traditions of the Great Perfection. It has been necessary for us to participate in every stage of this process: corpus identification and text edition selection, rendering the texts as Unicode e-texts using both optical character recognition and manual entry, data cleaning and transformation, implementation of software for text analysis, and interpretation of results. For this reason, we hope this study can serve as a model for other low-resource languages whose communities are just beginning to develop text analytics for their language.

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Publication Date: Mar 31, 2021
Citations: 1
