Abstract

Sentiment analysis and spam detection of social media text messages are two challenging data analysis tasks due to sparse and high-dimensional feature vectors. Learning classifier systems (LCS) are rule-based evolutionary computing systems and have limited capabilities to handle real valued sparse high-dimensional big data sets. LCS techniques use interval based representations to handle real valued feature vectors. In the work presented here, interval based representation is replaced by genetic programming based tree like structures to classify high-dimensional real valued text feature vectors. Multiple experiments are conducted on different social media text data sets, i.e. tweets, movie reviews, amazon and yelp reviews, SMS and Email spam message to evaluate the proposed scheme. Real valued feature vectors are generated from these data sets using term frequency inverse document frequency and/or sentiment lexicons-based features. Results depicts the supremacy of the new encoding scheme over interval based representations in both small and large social media text data sets.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.