Current evidence indicates that deep neural network-based sentence models generate better semantic representations of question and answer sentences than traditional methods in community answer selection tasks. In particular, the widely used self-attention model computes the similarity between a specific word and all words in the same sentence and generates a new semantic representation through a similarity-weighted sum of the representations of all words. However, because self-attention aggregates every signal through this weighted sum, the attention distribution becomes dispersed, which may cause the relations among neighboring signals to be overlooked. This issue becomes serious when applying the self-attention model to online community question answering platforms because of the varied lengths of user-generated questions and answers. To address this problem, we introduce an enhanced attention mechanism, local self-attention (LSA), which restricts the range of the original self-attention with a local window mechanism and therefore scales linearly with sequence length. Furthermore, we propose stacking multiple LSA layers to model relationships among multiscale $n$-gram features: the first layer captures word-to-word relationships, and deeper layers capture chunk-to-chunk (such as lexical $n$-gram phrase) relationships. We also test the effectiveness of the proposed model by applying the representations learned by the LSA model to a Siamese network and a classification network in community question answer selection tasks. Experiments on public datasets show that the proposed LSA achieves good performance.
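To illustrate the idea of restricting self-attention to a local window and stacking such layers, the following is a minimal sketch assuming PyTorch; the function and parameter names (e.g., local_self_attention, window_size) are illustrative and not the authors' implementation. This dense version masks scores outside the window for clarity; a full implementation would compute only within-window scores to obtain the linear scaling described above.

import torch
import torch.nn.functional as F

def local_self_attention(x, window_size=3):
    # x: (batch, seq_len, dim) token representations
    b, n, d = x.shape
    scores = torch.matmul(x, x.transpose(1, 2)) / d ** 0.5   # (b, n, n) similarities
    # Mask positions farther than window_size from each query,
    # so every word attends only to its neighbors.
    idx = torch.arange(n)
    mask = (idx[None, :] - idx[:, None]).abs() > window_size  # (n, n) boolean mask
    scores = scores.masked_fill(mask, float('-inf'))
    weights = F.softmax(scores, dim=-1)                       # local attention weights
    return torch.matmul(weights, x)                           # new local representations

# Stacking: the first layer mixes word-level neighbors; feeding its output into
# another local layer lets chunk-level (n-gram-like) features interact.
x = torch.randn(2, 10, 64)
h1 = local_self_attention(x)    # word-to-word within a local window
h2 = local_self_attention(h1)   # chunk-to-chunk over a wider effective context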