Noisy Text Data Research Articles

Community Question Answering (CQA) sites facilitate knowledge sharing: users post questions and other users share their answers. Selecting the top-quality answers from the set of answers in a thread is a significant and challenging task in Natural Language Processing (NLP). To address this issue, we propose a deep learning based spatial-temporal Bidirectional Long Short-Term Memory (Bi-LSTM) algorithm. Existing studies mainly focus only on computing semantic similarity between questions and answers using votes given by the users. The proposed hybrid approach, based on both forward and backward LSTMs, considers question-to-answer and answer-to-answer similarity. The forward LSTM captures the spatial impact of the answer to estimate its relevance, whereas the backward LSTM learns temporal features of the answer to predict the best-quality answer. Moreover, the spatial Bi-LSTM captures past and future dependencies for a better understanding of context and to improve the effectiveness of answer selection. To extract meaningful information from noisy text data, the data is preprocessed following standard steps such as tokenization, parsing, lemmatization, stop-word removal, part-of-speech tagging and entity extraction. The word-embedding-based paragraph-to-vector model (par2vec) has additional input nodes that represent paragraph information as a vector for context understanding. The empirical analysis carried out on the SemEval CQA dataset shows that the proposed model outperforms state-of-the-art answer selection approaches.
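
As a rough illustration of two of the preprocessing steps named in this abstract (tokenization and stop-word removal), the sketch below cleans a question and its candidate answers and ranks the answers by bag-of-words cosine similarity. This is not the Bi-LSTM model the abstract proposes; the cosine baseline is a simple stand-in, and the stop-word list and the function names `preprocess`, `cosine` and `rank_answers` are hypothetical choices made for this example. Lemmatization, parsing, part-of-speech tagging and entity extraction are omitted because they would need an NLP library.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "on",
              "and", "or", "how", "do", "i", "my", "it", "you"}


def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize on alphanumeric runs, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_answers(question: str, answers: list[str]) -> list[tuple[float, str]]:
    """Return (score, answer) pairs sorted from most to least similar."""
    q = Counter(preprocess(question))
    scored = [(cosine(q, Counter(preprocess(a))), a) for a in answers]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    question = "How do I renew my visa in Qatar?"
    answers = ["You can renew the visa at the immigration office.",
               "I like football!!!"]
    for score, answer in rank_answers(question, answers):
        print(f"{score:.2f}  {answer}")
```

A lexical-overlap score like this ignores word order and context, which is the gap that sequence models such as the Bi-LSTM described above are meant to address.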

The proliferation of the Internet has led not only to the generation of huge volumes of unstructured information in the form of web documents, but also to a large amount of text in the form of emails, blogs, feedback, etc. The data generated from online communication is a potential gold mine for discovering knowledge, particularly for market researchers. Text analytics has matured and is being successfully employed to mine important information from unstructured text documents. The chief bottleneck in designing text mining systems for handling blogs arises from the fact that online communication text data are often noisy. These texts are informally written. They suffer from spelling mistakes, grammatical errors, improper punctuation and irregular capitalization. This paper focuses on opinion extraction from noisy text data. It aims to extract and consolidate customer opinions from blogs and feedback at multiple levels of granularity. We propose a framework in which these texts are first cleaned using domain knowledge and then subjected to mining. Ours is a semi-automated approach, in which the system aids knowledge assimilation for knowledge-base building and also performs the analytics. Domain experts ratify the knowledge base and also provide training samples for the system to automatically gather more instances for ratification. The system identifies opinion expressions as phrases containing opinion words, opinionated features and opinion modifiers. These expressions are categorized as positive or negative, with membership values varying from zero to one. Opinion expressions are identified and categorized using localized linguistic techniques. Opinions can be aggregated at any desired level of specificity, i.e. feature level or product level, user level or site level, etc. We have developed a system based on this approach, which provides the user with a platform to analyze opinion expressions crawled from a set of pre-defined blogs.
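
The aggregation step described in this abstract can be sketched as follows, under stated assumptions. The abstract does not say how membership values are combined, so averaging signed memberships (positive counted as +m, negative as -m) is an assumption of this example, as are the `Opinion` record, its field names and the `aggregate` function.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    """One extracted opinion expression (hypothetical record layout)."""
    product: str
    feature: str
    polarity: str      # "positive" or "negative"
    membership: float  # degree of membership in [0, 1]


def aggregate(opinions: list[Opinion], level: str) -> dict:
    """Average signed membership per key; level is 'feature' or 'product'."""
    if level not in ("feature", "product"):
        raise ValueError("level must be 'feature' or 'product'")
    buckets = defaultdict(list)
    for op in opinions:
        # Assumption: negative opinions contribute their membership with a minus sign.
        sign = 1.0 if op.polarity == "positive" else -1.0
        key = (op.product, op.feature) if level == "feature" else op.product
        buckets[key].append(sign * op.membership)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


if __name__ == "__main__":
    sample = [
        Opinion("phone", "battery", "positive", 0.8),
        Opinion("phone", "battery", "negative", 0.4),
        Opinion("phone", "screen", "positive", 0.5),
    ]
    print(aggregate(sample, "feature"))
    print(aggregate(sample, "product"))
```

Keying the buckets on a different field, such as a user or site identifier, would give the user-level or site-level aggregation the abstract mentions.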

Articles published on Noisy Text Data

Improving End-to-End Speech Translation by Leveraging Auxiliary Speech and Text Data

Context-aware Answer Selection in Community Question Answering Exploiting Spatial Temporal Bidirectional Long Short-Term Memory

Authorship Attribution of Noisy Text Data With a Comparative Study of Clustering Methods

Opinion mining from noisy text data
