Abstract

Text readability plays an important role in meeting people's information needs, and with the explosive growth of information, the demand for measuring it is increasing. Considering text structure at the word, sentence, and document levels, this paper proposes a hybrid network model based on a convolutional neural network to measure the readability of English texts. Traditional methods of English text readability measurement rely heavily on human experts to extract features, which limits their practicality. As the variety and quantity of readability features to be extracted grow, manually extracting deep features becomes increasingly difficult, and irrelevant or redundant features are easily introduced, degrading model performance. This paper introduces the concept of a hybrid network model from deep learning; constructs a hybrid network model suited to English text readability measurement by combining a convolutional neural network, a bidirectional long short-term memory network, and an attention mechanism; and replaces manual feature extraction with automatic feature extraction through machine learning, which greatly improves the efficiency and performance of text readability measurement.
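
To make the proposed architecture concrete, here is a minimal PyTorch sketch of a CNN-BiLSTM-attention hybrid readability classifier in the spirit of the model described above. The layer sizes, vocabulary size, and number of readability levels are illustrative assumptions, not the paper's reported configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridReadabilityNet(nn.Module):
    """CNN for local n-gram features -> BiLSTM for long-distance
    dependencies -> attention pooling -> readability-level logits."""

    def __init__(self, vocab_size=30000, embed_dim=128, num_filters=100,
                 kernel_size=3, lstm_hidden=128, num_levels=5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size, padding=1)
        self.bilstm = nn.LSTM(num_filters, lstm_hidden,
                              batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * lstm_hidden, 1)   # one score per time step
        self.classifier = nn.Linear(2 * lstm_hidden, num_levels)

    def forward(self, token_ids):                     # (batch, seq_len)
        x = self.embedding(token_ids)                 # (batch, seq_len, embed)
        x = F.relu(self.conv(x.transpose(1, 2)))      # (batch, filters, seq_len)
        h, _ = self.bilstm(x.transpose(1, 2))         # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)  # attention over steps
        pooled = (weights * h).sum(dim=1)             # weighted-sum pooling
        return self.classifier(pooled)                # logits per level
```

A forward pass on a (batch, seq_len) tensor of token ids yields one logit per readability level, so such a model can be trained with an ordinary cross-entropy loss.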

Highlights

  • It is generally believed that convolutional neural networks (CNNs) are good at capturing local features of language, while long short-term memory networks (LSTMs) are good at processing sequence data and capturing long-distance dependencies

  • Fu et al. [29] used a CNN-BiLSTM hybrid network model for beautiful-sentence recognition. Through experimental comparison with CNN and BiLSTM networks, the results show that the hybrid network model achieves higher accuracy

  • In theory, long short-term memory networks should handle sequence inputs better than convolutional neural networks (CNNs), but in our experiments the long short-term memory network performed slightly worse than the convolutional neural network model. The reason may be that our network model takes the whole text, a word sequence of uncertain and often considerable length, as input, which limits the performance of the LSTM to a certain extent (see the padding/truncation sketch after this list)
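
A minimal sketch of one common mitigation for the variable-length issue raised in the last highlight: padding or truncating every text to a fixed number of tokens before it reaches the LSTM. The 500-token cap and pad id of 0 are illustrative assumptions, not values from the paper.

```python
import torch

def pad_or_truncate(token_ids, max_len=500, pad_id=0):
    """Clamp a list of token ids to exactly max_len entries."""
    clipped = token_ids[:max_len]                           # truncate long texts
    padded = clipped + [pad_id] * (max_len - len(clipped))  # pad short texts
    return torch.tensor(padded)
```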

Summary

Introduction

(Flattened hyperparameter table: L2 regularization lambda, LSTM hidden layer size, batch size, and sequence length; the associated score is 0.836.)
Experiments show that it is better to retain the outputs of the intermediate states of the recurrent network and connect them to a pooling layer, so the choice of pooling layer placed after the LSTM output deserves careful consideration. Using an attention mechanism as the pooling layer gives the best model, achieving an accuracy (ACC) of 0.886 and a Pearson correlation coefficient (PCC) of 0.938 on the WeeBit data set
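
To make the two reported metrics concrete, the snippet below shows how accuracy and the Pearson correlation coefficient are typically computed when readability levels are treated as ordinal labels. The label arrays are illustrative, not results from the paper.

```python
from scipy.stats import pearsonr

y_true = [1, 2, 2, 3, 4, 4, 5, 5]   # gold readability levels (illustrative)
y_pred = [1, 2, 3, 3, 4, 5, 5, 4]   # model predictions (illustrative)

acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)  # exact matches
pcc, _ = pearsonr(y_true, y_pred)   # correlation between ordinal levels
print(f"ACC = {acc:.3f}, PCC = {pcc:.3f}")
```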

English Text Readability Measurement Based on Convolutional Neural Network
Word Vector
Convolution Layer and Max-Pooling Layer
Recurrent Layer
Case Study
Pearson Correlation
Experimental Environment and Hyperparameter Settings
Comparison with CNN and LSTM Related Models
Comparison with the Existing Traditional Methods
Findings
Conclusions