In this paper, three ensemble methods (Random Forest, XGBoost, and a Hybrid Ensemble method) were implemented to classify imbalanced pulsar candidates. To support these methods, tree models were used to select among 30 pulsar-candidate features drawn from the literature. The skewness of the integrated pulse profile, the chi-squared value for the sine-squared fit to the amended profile, and the best S/N value play important roles in Random Forest, while the skewness of the integrated pulse profile is one of the most significant features in XGBoost. More than 20 features were selected according to their relative scores and then used in the three ensemble methods. In the Hybrid Ensemble method, we combined Random Forest and XGBoost with EasyEnsemble. By varying the thresholds, we sought a trade-off in which Recall and Precision are approximately equal and as high as possible. Experiments on the HTRU 1 and HTRU 2 datasets show that the Hybrid Ensemble method achieves higher Recall than the other two algorithms. On the HTRU 1 dataset, the Recall, Precision, and F-Score of the Hybrid Ensemble method are $0.967$, $0.971$, and $0.969$, respectively. On the HTRU 2 dataset, the corresponding values are $0.920$, $0.917$, and $0.918$, respectively.
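The threshold trade-off described above can be illustrated with a minimal sketch. It assumes that the classifier outputs one score per candidate, that a candidate is labeled a pulsar when its score is at or above the threshold, and that the F-Score is the harmonic mean of Precision and Recall (an assumption that is consistent with the reported values). The function names and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed F-Score definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(
    scores: List[float], labels: List[int], threshold: float
) -> Tuple[float, float, float]:
    """Metrics when 'score >= threshold' is read as a pulsar prediction."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_score(precision, recall)


def balanced_threshold(
    scores: List[float], labels: List[int]
) -> Tuple[float, float, float, float]:
    """Sweep the observed scores as candidate thresholds and return
    (threshold, precision, recall, f) with the smallest |precision - recall|;
    ties are broken in favor of the higher F-Score."""
    best_key = None
    best = None
    for t in sorted(set(scores)):
        p, r, f = precision_recall_f(scores, labels, t)
        key = (abs(p - r), -f)
        if best_key is None or key < best_key:
            best_key, best = key, (t, p, r, f)
    return best


# Hypothetical toy scores and labels (1 = pulsar, 0 = non-pulsar).
toy_scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90, 0.95]
toy_labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

threshold, precision, recall, f = balanced_threshold(toy_scores, toy_labels)
print(threshold, precision, recall, f)

# Consistency check of the reported F-Scores under the assumed definition.
print(round(f_score(0.971, 0.967), 3), round(f_score(0.917, 0.920), 3))
```

On this toy data the sweep selects the threshold 0.60, at which Precision and Recall both equal 0.8. Applying `f_score` to the reported Precision and Recall gives 0.969 for HTRU 1 and 0.918 for HTRU 2 after rounding to three decimals, which matches the stated F-Scores.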