Analytic for data-driven decision-making in complex high-dimensional time-to-event data

Keivan Sadeghzadeh

doi:10.17760/d20195338

Abstract

In the era of big data, analyzing large and complex datasets consumes time and money and can introduce errors and misinterpretations. Inaccurate or erroneous reasoning can in turn lead to poor inference and decision-making, and sometimes to irreversible and catastrophic events. Conversely, proper management and use of valuable data can significantly increase knowledge and reduce cost through preventive action. In many fields there is great interest in the timing and causes of events. Time-to-event data analysis is at the core of risk assessment and plays an essential role in predicting the probability that an event occurs. Variable selection and classification are also integral parts of data analysis: the information revolution has brought larger datasets with more variables, and traditional approaches have increasingly struggled to process streaming high-dimensional time-to-event data, particularly when observations are censored. When data are large-scale and complex, especially in the number of variables, methods that simplify them efficiently are therefore needed. Most traditional variable selection methods rely on computational algorithms for problems that are non-deterministic polynomial-time hard (NP-hard), which makes these procedures infeasible. Although recent methods may run faster and use different estimation methods and assumptions, their applications are limited, their assumptions are restrictive, their computational cost is high, or their robustness is inconsistent. This research is motivated by the importance of applied variable reduction in complex high-dimensional time-to-event data, both to avoid these difficulties in decision-making and to facilitate time-to-event data analysis. Quantitative statistical and computational methodologies that use combinatorial heuristic algorithms for variable selection and classification are proposed. The purpose of these methodologies is to reduce the number of explanatory variables and to identify the set of most influential variables in such datasets.
