Text Segmentation for Language Identification in Greek Forums

Pavlina Fragkou

doi:10.1016/j.sbspro.2014.07.140

Open Access

https://doi.org/10.1016/j.sbspro.2014.07.140

Journal: Procedia - Social and Behavioral Sciences
Publication Date: Aug 1, 2014
Citations: 3
License type: cc-by-nc-nd

Affiliation: Technological Educational Institute of Athens

#Text Segmentation #Collecting Web Pages

Abstract

In this paper, we examine the benefit of applying text segmentation methods to language identification in forums. The focus is on forums that mix text written in Greek, English, and Greeklish, where Greeklish is the rendering of Greek words in the Latin alphabet. For the evaluation, a corpus was created manually by collecting web pages from Greek university forums, more specifically pages that combine Greek with English technical terminology and Greeklish. The evaluation, using two well-known text segmentation algorithms, leads to the conclusion that, despite the difficulty of the problem examined, text segmentation appears to be a promising solution.
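
To make the problem setting concrete, the following is a minimal sketch of splitting a forum post into contiguous runs of tokens by script. It is not the method evaluated in the paper, which relies on two established text segmentation algorithms that the abstract does not name. The function names `script_of` and `segment_by_script` are hypothetical, and the approach is a deliberate simplification: it can separate Greek-script text from Latin-script text, but it cannot tell English from Greeklish, since both use Latin characters. That limitation is part of what makes the problem difficult.

```python
import re
from itertools import groupby


def script_of(token: str) -> str:
    """Label a token 'greek', 'latin', or 'other' by counting its letters."""
    # Greek and Coptic block plus Greek Extended (polytonic) block.
    greek = sum(
        1 for ch in token
        if "\u0370" <= ch <= "\u03ff" or "\u1f00" <= ch <= "\u1fff"
    )
    latin = sum(1 for ch in token if ch.isascii() and ch.isalpha())
    if greek == 0 and latin == 0:
        return "other"  # digits, underscores, other scripts
    return "greek" if greek >= latin else "latin"


def segment_by_script(text: str) -> list[tuple[str, str]]:
    """Group consecutive tokens that share a script into one segment.

    Assumption: tokens labeled 'other' are dropped, so they never
    split a segment. 'latin' covers both English and Greeklish.
    """
    tokens = re.findall(r"\w+", text)
    labeled = [(script_of(tok), tok) for tok in tokens]
    labeled = [pair for pair in labeled if pair[0] != "other"]
    return [
        (label, " ".join(tok for _, tok in group))
        for label, group in groupby(labeled, key=lambda pair: pair[0])
    ]


if __name__ == "__main__":
    post = "Καλημέρα σας to server einai down"
    for label, segment in segment_by_script(post):
        print(label, "|", segment)
```

In the sample post, "to server einai down" mixes Greeklish ("to", "einai") with English technical terms ("server", "down"), yet the sketch returns it as a single Latin-script segment. Telling those apart requires lexical or statistical evidence beyond the character set.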

Full Text