One of the significant issues in a smart city is maintaining a healthy environment. To improve the environment, huge amounts of data are gathered, manipulated, analyzed, and utilized, and these data might include noise, uncertainty, or unexpected mistreatment of the data. In some datasets, the class imbalance problem skews the learning performance of the classification algorithms. In this paper, we propose a case-based reasoning method that combines the use of crowd knowledge from open source data and collective knowledge. This method mitigates the class imbalance issues resulting from datasets, which diagnose wellness levels in patients suffering from stress or depression. We investigate effective ways to mitigate class imbalance issues in which the datasets have a higher proportion of one class over another. The results of this proposed hybrid reasoning method, using a combination of crowd knowledge extracted from open source data (i.e., a Google search, or other publicly accessible source) and collective knowledge (i.e., case-based reasoning), were that it performs better than other traditional methods (e.g., SMO, BayesNet, IBk, Logistic, C4.5, and crowd reasoning). We also demonstrate that the use of open source and big data improves the classification performance when used in addition to conventional classification algorithms.
Read full abstract