Abstract

The challenges of automatically analysing and representing human language have been evident since the inception of Natural Language Processing (NLP). Machines are easily fooled when interpreting sentences and extracting meaning from texts. Semantically driven processing requires machines to have a deep understanding of natural language, and algorithms that rely on word co-occurrence and frequencies cannot activate semantically related concepts and experiences as the human brain does.

This thesis presents computational methods for semantic analysis and for quantifying the meaning of short scientific texts. The methods attempt to extract semantic features that are not explicitly expressed in the text and to provide predictions about human cognition. Rather than using psychological properties, we describe the situation of use of words in scientific texts through a science-specific description: the subject categories of the text.

First, this thesis investigates the Bag of Words model on a corpus of students' answers. Automated scoring systems were created for marking short-answer questions and for providing feedback to students on their answers. Students' marks were predicted by a mathematical model from the words selected to transmit information.

Second, we introduce novel techniques for quantifying the meaning of words and then of texts. The Leicester Scientific Corpus (LSC) and the Leicester Scientific Thesaurus (LScT) were built for empirical studies. The LSC is a corpus of 1,673,350 scientific texts, and the LScT is a thesaurus of 5,000 words extracted from the LSC. Methodologies for semantic analysis were developed based on an informational representation of meaning, extracted from the occurrence of a word in texts across the scientific categories. A vector representation of words was created in the newly constructed Meaning Space (MS) and used to represent text meaning. The Feature Vector of Text (FVT) was introduced and created for LSC texts as a vector representation of meaning. This approach outperforms the standard frequency representation in identifying science-specific meanings.

Finally, this thesis presents research on evaluating the impact of scientific articles through their informational semantics. The newly developed approach to meaning offers a way to predict the scientific impact of papers, and the study details examples of text classification with success rates starting from 80% in distinguishing highly cited from less-cited papers.
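
The idea of representing a word by its occurrence in texts across subject categories, and a text by the vectors of its words, can be illustrated with a small sketch. This is not the thesis's actual method: the toy corpus, the category names, the function names, and the choice of a simple conditional frequency P(category | word) with averaging are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict

# Toy corpus of (text, subject category) pairs; invented for illustration.
CORPUS = [
    ("neural network training data", "computer science"),
    ("network protocol latency", "computer science"),
    ("protein folding network", "biology"),
    ("cell protein expression data", "biology"),
]


def build_word_vectors(corpus):
    """Map each word to a vector of P(category | word) over sorted categories.

    This is a stand-in for an informational representation of meaning,
    not the measure used in the thesis.
    """
    categories = sorted({category for _, category in corpus})
    # word -> category -> number of texts in that category containing the word
    counts = defaultdict(Counter)
    for text, category in corpus:
        for word in set(text.split()):
            counts[word][category] += 1
    vectors = {}
    for word, per_category in counts.items():
        total = sum(per_category.values())
        vectors[word] = [per_category[c] / total for c in categories]
    return categories, vectors


def text_vector(text, vectors, n_categories):
    """Represent a text as the average of the vectors of its known words."""
    known = [vectors[w] for w in text.split() if w in vectors]
    if not known:
        return [0.0] * n_categories
    return [sum(column) / len(known) for column in zip(*known)]


categories, vectors = build_word_vectors(CORPUS)
example = text_vector("protein network", vectors, len(categories))
```

In this toy setting, a word that appears in only one category gets a vector concentrated on that category, while a word spread across categories gets a flatter vector. A plain frequency count would not make that distinction.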

