Quantifying Feature Importance for Detecting Depression using Random Forest

Hatoon Alsagri, Mourad Ykhlef

doi:10.14569/ijacsa.2020.0110577

Abstract

Feature selection based on importance is a fundamental step in machine learning because it directs the use of variables toward those that are most efficient and effective for a given model. In this study, an explainable machine learning model based on random forest is built to identify the depression level of Twitter users. The model demonstrates its transparency by calculating feature importance. There are several techniques for quantifying the importance of features. In this study, random forest is used both as a classifier, which outperforms many classifiers such as decision trees, and as a method for weighting the input features according to their importance. Feature importance is measured using different techniques, including random forest, and the results of these techniques are compared. Furthermore, feature importance is used to weight the input variables inside a complete system for recommending a solution for depressed persons. The experimental results confirm the superiority of random forest over other classifiers using three different methods for measuring feature importance. The accuracy of random forest classification reached 84.7%, and the use of feature importance increased the classifier accuracy to 84.9%.
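To make the idea of impurity-based feature weighting concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simplified forest of depth-1 trees (decision stumps) trained on bootstrap samples, and it scores each feature by its mean decrease in Gini impurity, which is one common way a random forest can weight its inputs. The data are synthetic, and the names `make_toy_data`, `stump_forest_importance`, and the three feature labels are hypothetical; they do not correspond to the Twitter features used in the study.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, feature_ids):
    """Return (feature, threshold, gain) with the largest impurity decrease."""
    parent = gini(y)
    n = len(y)
    best = (None, None, 0.0)
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must put samples on both sides
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - child
            if gain > best[2]:
                best = (f, t, gain)
    return best


def stump_forest_importance(X, y, n_trees=100, seed=0):
    """Normalized mean decrease in impurity over a forest of stumps."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    k = max(1, int(d ** 0.5))  # random feature subset size per tree
    totals = [0.0] * d
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        f, _, gain = best_split(Xb, yb, rng.sample(range(d), k))
        if f is not None:
            totals[f] += gain
    s = sum(totals)
    return [t / s for t in totals] if s > 0 else totals


def make_toy_data(n=150, seed=1):
    """Synthetic binary data: one informative, one weak, one noise feature."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = rng.gauss(2.0 * label, 1.0)  # strongly tied to label
        weak = rng.gauss(0.5 * label, 1.0)         # weakly tied to label
        noise = rng.gauss(0.0, 1.0)                # unrelated to label
        X.append([informative, weak, noise])
        y.append(label)
    return X, y


X, y = make_toy_data()
importances = stump_forest_importance(X, y)
for name, score in zip(["informative", "weak", "noise"], importances):
    print(f"{name}: {score:.3f}")
```

The scores are normalized to sum to 1, so they can be read as relative weights for the input features. A full random forest grows deeper trees and accumulates the impurity decrease at every internal node, but the weighting principle is the same.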

Highlights

Depression is a leading cause of disability worldwide and a common mental illness
More than 300 million people are estimated to suffer from depression every year. Face-to-face clinical diagnosis is needed to diagnose depression, but 70% of patients would not consult a doctor when they are at the early stages of depression
Several studies have reported that the diagnosis of mental illnesses has increased because of the use of social media platforms [2], and these mental illnesses are one of the leading causes of disability and among the most devastating diseases that individuals suffer from worldwide, according to the World Health Organization [3], [4], [5]

Summary

Introduction

More than 300 million people are estimated to suffer from depression every year. Face-to-face clinical diagnosis is needed to diagnose depression, but 70% of patients would not consult a doctor when they are at the early stages of depression. This might cause patients to reach advanced stages of their condition [1]. This is beneficial for machine learning researchers, and it makes the data high dimensional: it is quite common for datasets to have hundreds of features or more. Feature importance techniques help in understanding the features and their importance.

Methods

Results

Conclusion