Abstract

In several research areas, it is common to have datasets with more explanatory variables than observations, known as high-dimensional data. This condition can lead to multicollinearity. The least absolute shrinkage and selection operator (LASSO) addresses the problem by shrinking the estimated coefficients toward zero, setting some exactly to zero, so that it performs variable selection and parameter estimation simultaneously. However, LASSO performs poorly when the data contain outliers in the response or explanatory variables. Robust methods based on the least-absolute-deviation approach, such as LAD-LASSO and WLAD-LASSO, have been proposed to address this problem. This research evaluates the performance of the LAD-LASSO and WLAD-LASSO methods on high-dimensional and low-dimensional data containing outliers. To that end, a simulation study was conducted with three scenarios: no outliers; outliers in the response variable (5%, 10%, 15%); and outliers in both the response and explanatory variables (5%, 10%, 15%). The Minimum Regularized Covariance Determinant (MRCD) estimator was used to calculate the weights for WLAD-LASSO. The best method from the simulation was then applied to sembung leaf extract data to identify antioxidant marker compounds. The simulation results show that LAD-LASSO tends to be too strict in selecting variables, while LASSO tends to be too loose. WLAD-LASSO falls between the two and performs best in correctly identifying the important variables. Moreover, the weights make WLAD-LASSO more robust than LAD-LASSO against outliers in the response and explanatory variables. The performance of all these methods is lower on high-dimensional data than on low-dimensional data, and it also tends to decrease as the proportion of outliers increases.
WLAD-LASSO was then applied to the actual data to identify antioxidant marker compounds in the sembung leaf extract. The compounds/formulas obtained are Umbelliferone, 12-Hydroxyjasmonic Acid, C22H14N8O2, and Acetyleugenol (with a prediction error of 0.133050). These compounds/formulas could be developed as natural antioxidants and have potential as medicinal ingredients.
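
For reference, the three estimators compared in this study are commonly written as the penalized objectives below. This is a sketch in assumed notation, not the paper's own: $y_i$ is the response, $\mathbf{x}_i$ the vector of $p$ explanatory variables for observation $i$, $\boldsymbol{\beta}$ the coefficient vector, $\lambda \ge 0$ the tuning parameter, and $w_i$ the observation weights. Some formulations use coefficient-specific penalties $\lambda_j$ or a different scaling of the penalty term; a single $\lambda$ is used here for simplicity.

```latex
\begin{align*}
\hat{\boldsymbol{\beta}}_{\mathrm{LASSO}}
  &= \arg\min_{\boldsymbol{\beta}}
     \sum_{i=1}^{n} \bigl(y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}\bigr)^{2}
     + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert, \\
\hat{\boldsymbol{\beta}}_{\mathrm{LAD\text{-}LASSO}}
  &= \arg\min_{\boldsymbol{\beta}}
     \sum_{i=1}^{n} \bigl\lvert y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta} \bigr\rvert
     + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert, \\
\hat{\boldsymbol{\beta}}_{\mathrm{WLAD\text{-}LASSO}}
  &= \arg\min_{\boldsymbol{\beta}}
     \sum_{i=1}^{n} w_i \bigl\lvert y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta} \bigr\rvert
     + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert.
\end{align*}
```

Replacing the squared loss with the absolute loss limits the influence of outliers in the response. The weights $w_i$, which this study derives from the MRCD estimator, are intended to additionally downweight observations that are outlying in the explanatory variables.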

