Abstract

Power grid enterprises hold a large volume of text recorded in Chinese, and these texts contain abundant information about the power system. Mining this information manually is inefficient, and its accuracy varies from dispatcher to dispatcher. This paper takes power fault countermeasure texts as its object of study and investigates an information extraction method for Chinese power texts. First, the power texts are segmented using natural language processing (NLP), and an ontology lexicon is built from the attributes of power-domain words found in the fault countermeasure texts. Next, drawing on the syntactic role of punctuation, the concept of a separate parsing phrase is introduced to guide the division of long texts, so that each resulting sentence contains only one power entity and its related information. Then, syntax rule templates applicable to separate parsing phrases are built from meta-character templates (generalization slot, fixed word combination, wildcard character, and registry function) to extract information from power fault preplan texts and output it in structured form. Finally, the generalization ability and universality of the templates are analyzed. Examples show that the rule templates apply to information extraction from most texts, with strong universality and high accuracy.
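
The pipeline summarized above (segmentation against a lexicon, punctuation-based division into phrases, then template matching with meta-characters) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption for illustration, not the paper's implementation: the lexicon entries and their attributes, the delimiter set, the template definitions, and the names `segment`, `split_phrases`, `match`, `REGISTRY`, and `extract` are all hypothetical. Forward maximum matching is swapped in as a simple stand-in for a real NLP segmenter, and the phrase splitter only cuts at Chinese semicolons and full stops, which is a simplification of the separate parsing phrase idea.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# Hypothetical ontology lexicon: power word -> attribute (illustrative only).
LEXICON: Dict[str, str] = {
    "1号主变": "DEVICE",
    "2号主变": "DEVICE",
    "开关": "COMPONENT",
    "跳闸": "FAULT",
    "拉开": "OPERATION",
    "合上": "OPERATION",
}
MAX_WORD = max(len(w) for w in LEXICON)


def segment(text: str) -> List[str]:
    """Forward maximum matching against LEXICON; unknown characters become single tokens."""
    tokens, i = [], 0
    while i < len(text):
        for n in range(min(MAX_WORD, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in LEXICON:
                tokens.append(piece)
                i += n
                break
    return tokens


def split_phrases(text: str, delimiters: str = "；。") -> List[str]:
    """Divide a long text at clause-ending punctuation (assumed delimiter set)."""
    return [p for p in re.split(f"[{re.escape(delimiters)}]", text) if p]


@dataclass
class Slot:      # generalization slot: one token with the given lexicon attribute
    attr: str
    field: str


@dataclass
class Fixed:     # fixed word combination: an exact token sequence
    words: List[str]


@dataclass
class Wildcard:  # wildcard: any (possibly empty) run of tokens
    pass


Element = Union[Slot, Fixed, Wildcard]


def match(template: List[Element], tokens: List[str], out: Dict[str, str]) -> bool:
    """Backtracking matcher; fills `out` with slot values on success."""
    if not template:
        return not tokens
    head, rest = template[0], template[1:]
    if isinstance(head, Slot):
        if tokens and LEXICON.get(tokens[0]) == head.attr:
            out[head.field] = tokens[0]
            if match(rest, tokens[1:], out):
                return True
            del out[head.field]  # undo the binding before backtracking
        return False
    if isinstance(head, Fixed):
        n = len(head.words)
        return tokens[:n] == head.words and match(rest, tokens[n:], out)
    # Wildcard: try consuming 0, 1, 2, ... tokens (lazy matching).
    return any(match(rest, tokens[k:], out) for k in range(len(tokens) + 1))


# Registry functions turn a matched record into structured output (hypothetical names).
REGISTRY: Dict[str, Callable[[Dict[str, str]], Dict[str, str]]] = {
    "fault": lambda r: {"type": "fault", **r},
    "operation": lambda r: {"type": "operation", **r},
}

# Hypothetical rule templates paired with the registry function to apply.
TEMPLATES = [
    ([Slot("DEVICE", "device"), Wildcard(), Slot("FAULT", "fault")], "fault"),
    ([Slot("OPERATION", "action"), Slot("DEVICE", "device"), Fixed(["开关"])], "operation"),
]


def extract(text: str) -> List[Dict[str, str]]:
    """Split, segment, and apply the first matching template to each phrase."""
    records = []
    for phrase in split_phrases(text):
        tokens = segment(phrase)
        for template, reg in TEMPLATES:
            fields: Dict[str, str] = {}
            if match(template, tokens, fields):
                records.append(REGISTRY[reg](fields))
                break
    return records
```

For the made-up input `"1号主变差动保护动作跳闸；拉开1号主变开关。"`, `extract` yields one fault record (`device` = `1号主变`, `fault` = `跳闸`) and one operation record (`action` = `拉开`, `device` = `1号主变`). The wildcard lets the fault template skip the unlisted words between the device and the fault, which is the kind of generalization the templates rely on; a production system would use a trained segmenter and a far larger lexicon.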
