Military equipment entity extraction is a crucial component in constructing military knowledge bases, and it holds significant research value for guiding the development and improvement of equipment support forces. In the military domain, equipment entities are often nested, with one entity contained within another, and are frequently represented by abbreviations or codes. To address this complexity, this paper proposes an entity extraction method named CoTNER. First, a large language model performs chain-of-thought data augmentation on the original dataset, providing additional semantic and contextual information. Next, a small-scale language model is fine-tuned on the augmented dataset to adapt it to military equipment entity extraction and to enhance its ability to learn the complex rules specific to this domain. Additionally, a high-quality data filtering strategy based on instruction-following difficulty scoring is proposed to address the catastrophic forgetting that may occur during the fine-tuning of large language models. Experimental results demonstrate that the proposed method outperforms mainstream traditional deep learning methods, validating the effectiveness of CoTNER.
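To make the filtering idea concrete, the following is a minimal sketch of instruction-following difficulty (IFD) filtering. It assumes one common definition of the score: the model's average loss on the answer given the instruction, divided by its average loss on the answer alone. The abstract does not state the exact formula, threshold, or selection rule used by CoTNER, so the `Sample` layout, `keep_ratio`, and `max_score` below are hypothetical, and the per-token losses are assumed to have been computed beforehand by a language model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One training example with precomputed per-token losses (hypothetical layout)."""
    text: str
    cond_losses: List[float]    # losses on answer tokens, given the instruction
    direct_losses: List[float]  # losses on answer tokens, without the instruction


def ifd_score(sample: Sample) -> float:
    """Ratio of conditioned to unconditioned average answer loss (assumed definition)."""
    cond = sum(sample.cond_losses) / len(sample.cond_losses)
    direct = sum(sample.direct_losses) / len(sample.direct_losses)
    return cond / direct


def filter_by_ifd(samples: List[Sample], keep_ratio: float = 0.5,
                  max_score: float = 1.0) -> List[Sample]:
    """Keep the hardest samples among those the instruction still helps with.

    Samples scoring at or above max_score are dropped, on the assumption that
    the instruction gives the model no benefit for them.
    """
    scored = [(ifd_score(s), s) for s in samples]
    eligible = [(score, s) for score, s in scored if score < max_score]
    eligible.sort(key=lambda pair: pair[0], reverse=True)  # hardest first
    keep = max(1, int(len(samples) * keep_ratio))
    return [s for _, s in eligible[:keep]]
```

Under this assumed definition, a score near 1 means the instruction barely helps the model produce the answer, while a low score means the sample is already easy. Keeping the higher-scoring samples below the cutoff is one plausible way to select a smaller, more informative fine-tuning set.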