This paper first reviews the current state of research on text sentiment analysis and potential customer identification, and introduces the construction of sentiment dictionaries together with feature selection, feature screening, and common classification algorithms in text analysis. Second, focusing on the sentiment dictionary, the most widely used tool for sentiment analysis, it studies the rules for determining the sentiment polarity of sentiment words. To address the shortcomings of relying on a single recognition algorithm when building sentiment dictionaries, an improved ensemble rule is designed and an automatic method for constructing domain sentiment dictionaries in the social media environment is proposed. The paper then analyzes the sentiment topic information present in user-generated content and adds the domain sentiment dictionary to the joint sentiment topic model as a posteriori information to extract sentiment topic features; on this basis, feature engineering for potential customer identification is carried out and the feature set is constructed. In addition, to handle the class imbalance found in real data, a sample resampling method and a diverse ensemble framework for imbalanced data are designed to work together on the potential customer identification task under skewed data. Finally, an experimental study on a social media text corpus validates the proposed methods. Both the proposed domain sentiment dictionary construction method and the potential customer identification method based on joint domain sentiment topics perform well in the different control group experiments. In theory, the paper provides an in-depth study of domain sentiment dictionary construction and imbalanced classification; in practice, it offers companies a solution for discovering potential customers, giving it both theoretical significance and practical value.