Abstract

XCS, an evolutionary computing technique, can classify data using both bit-string and real-valued representations. Real-valued XCS (XCSR) commonly uses the min-max interval-based representation (MMR) for continuous-valued data sets. Text data sets can be represented using a bag-of-words-based real-valued representation, e.g. the term frequency-inverse document frequency (TF-IDF) of features. In this work we use XCSR, for the first time, to classify short informal social media text messages from two major domains: spam detection and sentiment analysis. We perform spam detection on SMS and email messages, and sentiment analysis on reviews and tweets. Feature vectors extracted from short text messages are very sparse, and XCSR with MMR cannot handle sparse data sets well. We propose XCSR#, which uses MMR with explicit don't care intervals to handle sparse social media data sets. The experimental results indicate that introducing explicit don't care intervals improved performance, with a statistically significant effect specifically on the spam detection data sets. Further, XCSR# produced more accurate and more general rules than XCSR.
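
To illustrate the idea of an interval condition with explicit don't care entries, the following is a minimal sketch and not the authors' implementation. It assumes that a don't care interval matches any feature value and encodes it as `None`; the names `matches`, `generality`, `rule`, and `message`, as well as the feature values, are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence, Tuple

# An interval is (lower, upper). None stands for an explicit don't care
# interval; this encoding is an assumption of the sketch.
Interval = Optional[Tuple[float, float]]


def matches(condition: Sequence[Interval], features: Sequence[float]) -> bool:
    """Return True if every constrained interval contains its feature value."""
    if len(condition) != len(features):
        raise ValueError("condition and feature vector must have equal length")
    for interval, value in zip(condition, features):
        if interval is None:  # don't care: any value is accepted
            continue
        lower, upper = interval
        if not (lower <= value <= upper):
            return False
    return True


def generality(condition: Sequence[Interval]) -> float:
    """Fraction of attributes marked as don't care (0.0 for an empty rule)."""
    if not condition:
        return 0.0
    return sum(interval is None for interval in condition) / len(condition)


# Hypothetical sparse TF-IDF vector for one short message (mostly zeros).
message = [0.0, 0.0, 0.62, 0.0, 0.0, 0.41]

# Hypothetical rule that constrains only features 2 and 5 and ignores the rest.
rule = [None, None, (0.5, 1.0), None, None, (0.3, 0.8)]

print(matches(rule, message))
print(generality(rule))
```

In this sketch a rule only has to specify intervals for the few features it depends on, which is one plausible reading of why explicit don't care intervals suit sparse feature vectors.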
