Transformation-based named entity extraction from spoken content for personal memory aid

Ji-Hwan Kim

doi:10.1109/tce.2010.5681147

Abstract

This paper proposes a rule-based Named Entity (NE) extraction method for spoken content, in which the rules are inferred automatically by a transformation-based approach, for use in a personal memory aid. The proposed transformation-based rule inference proves to be a viable alternative to the stochastic approach to NE extraction, while retaining the advantages of a rule-based approach: low memory and computation requirements, and easy extension to include personal information. The performance of the proposed system is compared with that of one of the successful stochastic systems. Both systems perform almost equally when only word sequences are available, and also when additional information such as punctuation, capitalisation and name lists is provided. The best F-measure obtained by the proposed system was 0.9134. When the input text is corrupted by speech recognition errors, the performance of both systems degrades at almost the same rate (an F-measure loss of 0.0062 per 1% of additional speech recognition error). However, the proposed system requires only 94 KB and simple computation on the transition diagram of a finite automaton.
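As a rough illustration of the reported degradation rate, the arithmetic can be sketched as below. This is a minimal sketch, not part of the paper: the function name `projected_f_measure` is hypothetical, the strictly linear extrapolation is an assumption, and the baseline of 0.9134 is used purely as an example value (the abstract does not state that the degradation rate was measured from that baseline).

```python
def projected_f_measure(baseline_f, added_error_pct, loss_per_pct=0.0062):
    """Estimate F-measure after extra speech recognition error.

    Assumes a linear loss of `loss_per_pct` F-measure per 1% of
    additional recognition error (the rate quoted in the abstract).
    The linearity over the whole range is an assumption of this sketch.
    """
    if added_error_pct < 0:
        raise ValueError("added_error_pct must be non-negative")
    # Clamp at zero, since an F-measure cannot be negative.
    return max(0.0, baseline_f - loss_per_pct * added_error_pct)


if __name__ == "__main__":
    # Example baseline only; see the assumptions stated above.
    baseline = 0.9134
    for pct in (0, 5, 10, 20):
        estimate = projected_f_measure(baseline, pct)
        print(f"{pct:>2}% additional error: F-measure ~ {estimate:.4f}")
```

Under these assumptions, 10% of additional recognition error corresponds to a loss of about 0.062 in F-measure.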

Full Text