Abstract

Survival analysis is a statistical technique used mainly to analyze time-to-event data. Identifying influential observations is important because it can lead to the discovery of new prognostic factors. In survival data, an influential observation typically corresponds to an individual whose survival time is extremely short or long in comparison to others. In particular, when the data contain more covariates than observations, classical approaches fail. Dimensionality reduction is therefore necessary to choose appropriate variables, and it is commonly carried out with popular techniques such as LASSO and the elastic net. This paper considers high-dimensional breast cancer data and reduces its dimensionality using variable selection methods. Subsequently, the rank product test and martingale residuals are used to identify influential observations. Furthermore, a resampling technique is used to validate the consistency and robustness of the methods. The novelty of this paper lies in comparing the prediction accuracy of datasets with and without outliers using Random Survival Forest (RSF) for different training fractions. The RSF results demonstrate that the LASSO approach outperforms the others in the absence of outliers. Therefore, we suggest first reducing dimensionality using the LASSO variable selection technique and then removing likely outliers to improve the performance of classification algorithms.
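To illustrate how martingale residuals can flag unusually long or short survivors, the following is a minimal sketch and not the paper's implementation. It assumes a null model with no covariates, in which the residual of subject i is the event indicator minus the Nelson-Aalen cumulative hazard at that subject's observed time; the paper itself would use residuals from a model fitted on the selected covariates. The function names and the toy data are hypothetical.

```python
def nelson_aalen(times, events):
    """Map each distinct event time to the Nelson-Aalen cumulative hazard."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    cumulative, hazard = 0.0, {}
    for t in event_times:
        # d: number of events at time t; n: number of subjects still at risk at t
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        n = sum(1 for ti in times if ti >= t)
        cumulative += d / n
        hazard[t] = cumulative
    return hazard


def martingale_residuals(times, events):
    """Null-model martingale residuals: event indicator minus cumulative hazard."""
    hazard = nelson_aalen(times, events)
    grid = sorted(hazard)
    residuals = []
    for t, e in zip(times, events):
        h = 0.0
        for g in grid:  # cumulative hazard at the last event time <= t
            if g <= t:
                h = hazard[g]
            else:
                break
        residuals.append(e - h)
    return residuals


# Toy data (hypothetical): the last subject survives far longer than the rest.
times = [2.0, 3.0, 3.0, 5.0, 8.0, 40.0]
events = [1, 1, 0, 1, 1, 1]  # 1 = event observed, 0 = censored
residuals = martingale_residuals(times, events)

# The most negative residual marks the subject who outlived the model's expectation.
candidate = min(range(len(residuals)), key=lambda i: residuals[i])
```

In this sketch a strongly negative residual points to a subject who survived longer than expected, while a residual close to 1 points to an early event. Any cutoff for declaring an observation influential is a separate modelling choice that this example does not address.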
