Abstract
Document representation is the basis of language modeling. Its goal is to turn unstructured natural language text into a structured form that a computer can store and process. Most existing text-representation methods rely on the bag-of-words model. However, these methods ignore how phrases are used in the text, which degrades the performance of downstream natural language processing tasks. Representing text semantics through phrases is a promising research direction, but it is difficult to do well because phrases are organized hierarchically and mining them is inefficient. In this paper, we propose hierarchical text semantic representation using the knowledge graph (HTSRKG), a method that uses syntactic structure features to identify hierarchical phrases and uses knowledge graphs to improve phrase evaluation. First, we build a syntax tree for each sentence using the CKY algorithm with a probabilistic context-free grammar (PCFG). Second, we traverse the parse tree with a hierarchical routing process to obtain the mixed phrase semantics of passages. Finally, we introduce the knowledge graph to improve both the efficiency of text semantic extraction and the accuracy of text representation, which provides a solid foundation for downstream natural language processing tasks. Extensive experiments on real-world datasets show that HTSRKG outperforms baseline approaches in text semantic representation, and the results of a recent benchmarking study support this.
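To make the first step concrete, the following is a minimal sketch of probabilistic CKY parsing over a PCFG in Chomsky normal form. It is not the HTSRKG implementation: the toy grammar, its probabilities, and the names `BINARY`, `LEXICAL`, `cky_parse`, and `build_tree` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form (illustration only).
# Binary rules A -> B C, indexed by the right-hand side (B, C).
BINARY = {
    ("NP", "VP"): [("S", 1.0)],
    ("Det", "N"): [("NP", 1.0)],
    ("V", "NP"): [("VP", 1.0)],
}
# Lexical rules A -> word, indexed by the word.
LEXICAL = {
    "the": [("Det", 1.0)],
    "dog": [("N", 0.5)],
    "cat": [("N", 0.5)],
    "chased": [("V", 1.0)],
}


def cky_parse(words):
    """Fill the CKY chart: best[(i, j)][A] = (probability, backpointer)."""
    n = len(words)
    best = defaultdict(dict)
    # Width-1 spans come from the lexical rules.
    for i, word in enumerate(words):
        for label, prob in LEXICAL.get(word, []):
            best[(i, i + 1)][label] = (prob, word)
    # Wider spans combine two adjacent sub-spans at every split point k.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for left, (p_left, _) in best[(i, k)].items():
                    for right, (p_right, _) in best[(k, j)].items():
                        for parent, p_rule in BINARY.get((left, right), []):
                            prob = p_rule * p_left * p_right
                            current = best[(i, j)].get(parent, (0.0, None))[0]
                            if prob > current:
                                best[(i, j)][parent] = (prob, (k, left, right))
    return best


def build_tree(best, i, j, label):
    """Follow backpointers to recover the most probable tree as nested tuples."""
    _, back = best[(i, j)][label]
    if isinstance(back, str):  # lexical rule: the backpointer is the word
        return (label, back)
    k, left, right = back
    return (label, build_tree(best, i, k, left), build_tree(best, k, j, right))


if __name__ == "__main__":
    sentence = "the dog chased the cat".split()
    chart = cky_parse(sentence)
    print(build_tree(chart, 0, len(sentence), "S"))
```

The nested tuples returned by `build_tree` expose the phrase hierarchy (here, noun phrases nested inside a verb phrase) that a later traversal step could walk. A real system would learn the grammar and its probabilities from a treebank and would typically work in log space to avoid underflow on long sentences.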