Abstract
Social media is one of the main platforms on which people express their sentiments, and a growing number of posts contain suicide-related content. Manual annotation cannot keep pace with the rate at which such posts are generated, which makes it very difficult to build a fast, real-time predictive model. To identify users expressing suicidal ideation on social media, this study proposes a mechanism for building a suicidal ideation prediction model from a small sample. Through this mechanism, the model selects the most appropriate samples for labeling and obtains accurate annotation data at the lowest cost. Because natural language processing is complex and the quality of the input text representation matters, this study adopts BERT, with its multi-head attention mechanism, as the text representation extraction model. Since social media content is usually used to express personal sentiments, the text representation is additionally weighted by a layer of text sentiment representation. We use Random Forest, an ensemble learning method, as the classifier. To ensure that the model is trained on the most informative data, this research also uses active learning. The random forest model is paired with the active learning mechanism and trained on both weighted and unweighted text representations, and predictions are then made on the test set. The proposed architecture achieves promising results on small data sets: combining the weighted text representation with active learning improves the accuracy of the baseline classifier by 10%. The proposed framework can therefore increase the accuracy of the baseline model while reducing the labor and time costs of labeling large amounts of data. It is expected to help suicide prevention and control organizations address the difficulties posed by online content.
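The selection step of active learning described above can be illustrated with a minimal sketch. The abstract does not state which query strategy the study uses, so the sketch assumes uncertainty sampling, a common choice in which the posts whose predicted class probability is closest to 0.5 are sent for annotation. The function names `uncertainty` and `select_queries` and the example probabilities are hypothetical and do not come from the study.

```python
from typing import List, Sequence


def uncertainty(prob_positive: float) -> float:
    """Score a binary prediction: 1.0 at p = 0.5 (least certain), 0.0 at p = 0 or 1."""
    return 1.0 - abs(prob_positive - 0.5) * 2.0


def select_queries(pool_probs: Sequence[float], batch_size: int) -> List[int]:
    """Return the indices of the unlabeled posts the classifier is least sure about.

    pool_probs holds the predicted probability of the positive class for each
    unlabeled post, e.g. the output of a random forest (an assumption here).
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: uncertainty(pool_probs[i]),
        reverse=True,  # most uncertain first
    )
    return ranked[:batch_size]


# Hypothetical predicted probabilities for five unlabeled posts.
pool_probs = [0.95, 0.52, 0.10, 0.45, 0.80]
print(select_queries(pool_probs, batch_size=2))  # prints [1, 3]
```

In a full loop, the selected posts would be labeled by annotators, added to the training set, and the classifier retrained before the next round of selection.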