Abstract

This research proposes neural network methods that incorporate the dependency tree structure of a text into classification tasks for Russian texts. The author profiling task of gender identification was chosen to test the models, and two corpora were used in the experiments: one collected by crowdsourcing and one collected by in-person polling. The first approach is based on long short-term memory (LSTM) layers and a purpose-developed graph embedding algorithm. The second is based on a graph convolutional network and an LSTM. Two syntactic parsers were used to obtain dependency trees from the texts. Input data were represented in different forms: binary morphological vectors, FastText vectors, and their combination. The results of the developed models were compared with the state of the art, a neural network model based on convolutional and LSTM layers. Finally, we demonstrate that including the dependency tree structure in the input feature space improves the F1-score of gender classification by 4% on the RusPersonality dataset and by 7% on the crowdsourced dataset, on average. The resulting F1-scores of the developed models are 84% and 83%, respectively.
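
To illustrate how a dependency tree can enter the input feature space, the sketch below applies one generic graph convolution step to token features over a tree given as head indices. It is only an illustration under stated assumptions: the row-normalized propagation rule, the function names `adjacency_from_heads` and `gcn_layer`, the toy feature vectors, and the identity weight matrix are all hypothetical and are not taken from the models described above.

```python
from typing import List

Matrix = List[List[float]]


def adjacency_from_heads(heads: List[int]) -> Matrix:
    """Build a symmetric adjacency matrix with self-loops from head indices.

    heads[i] is the index of token i's syntactic head, or -1 for the root.
    """
    n = len(heads)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child][head] = 1.0
            adj[head][child] = 1.0
    return adj


def gcn_layer(adj: Matrix, feats: Matrix, weight: Matrix) -> Matrix:
    """One propagation step: ReLU(D^-1 (A + I) H W), using row normalization."""
    n = len(adj)
    d_in, d_out = len(feats[0]), len(weight[0])
    out = []
    for i in range(n):
        degree = sum(adj[i])  # at least 1 because of the self-loop
        # Average the features of token i and its tree neighbours.
        agg = [
            sum(adj[i][j] * feats[j][k] for j in range(n)) / degree
            for k in range(d_in)
        ]
        # Linear projection followed by ReLU.
        out.append([
            max(0.0, sum(agg[k] * weight[k][m] for k in range(d_in)))
            for m in range(d_out)
        ])
    return out


# Toy three-token sentence: token 1 is the root, tokens 0 and 2 depend on it.
heads = [1, -1, 1]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical token features
weight = [[1.0, 0.0], [0.0, 1.0]]             # identity, for readability
out = gcn_layer(adjacency_from_heads(heads), feats, weight)
```

In a full model, the per-token outputs of such a layer would typically be passed to a sequence encoder such as an LSTM; here the weight matrix is fixed rather than learned so that the effect of the tree structure on each token's representation is easy to inspect.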
