Abstract

We propose stringing, a class of methods where one views high-dimensional observations as functional data. Stringing takes advantage of the high dimension by representing such data as discretized and noisy observations that originate from a hidden smooth stochastic process. Assuming that the observations result from scrambling the original ordering of the observations of the process, stringing reorders the components of the high-dimensional vectors, followed by transforming the high-dimensional vector observations into functional data. Established techniques from functional data analysis can be applied for further statistical analysis once an underlying stochastic process and the corresponding random trajectory for each subject have been identified. Stringing of high-dimensional data is implemented with distance-based metric multidimensional scaling, mapping high-dimensional data to locations on a real interval, such that predictors that are close in a suitable sample metric also are located close to each other on the interval. We provide some theoretical support, showing that under certain assumptions, an underlying stochastic process can be constructed asymptotically, as the dimension p of the data tends to infinity. Stringing is illustrated for the analysis of tree ring data and for the prediction of survival time from high-dimensional gene expression data and is shown to lead to new insights. In regression applications involving high-dimensional predictors, stringing compares favorably with existing methods. The theoretical results and proofs and also additional simulation results are provided in online Supplemental Material.
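
The reordering step described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes classical (Torgerson) multidimensional scaling as the metric scaling step, Euclidean distance between predictor columns as the sample metric, and a simulated smooth process with two random components. The function name `string_order`, the simulation settings, and the noise level are all hypothetical choices made for illustration.

```python
import numpy as np


def string_order(X):
    """Map the p columns (predictors) of an n x p matrix X to positions in [0, 1].

    Assumption: classical MDS on Euclidean distances between columns;
    other metrics or scaling algorithms could be substituted here.
    """
    cols = X.T  # p x n: each predictor is a point in R^n
    sq = np.sum(cols ** 2, axis=1)
    # Squared Euclidean distances between predictors
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * cols @ cols.T, 0.0)
    p = D2.shape[0]
    J = np.eye(p) - np.ones((p, p)) / p       # centering matrix
    B = -0.5 * J @ D2 @ J                     # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    coord = eigvecs[:, -1] * np.sqrt(max(eigvals[-1], 0.0))
    # Rescale the one-dimensional configuration to the unit interval
    return (coord - coord.min()) / (coord.max() - coord.min())


# Simulated data (assumption): noisy, discretized smooth random trajectories
rng = np.random.default_rng(0)
n, p = 200, 50
t = np.linspace(0.0, 1.0, p)
xi = rng.normal(size=(n, 2))
X_true = xi[:, [0]] * t + 0.3 * xi[:, [1]] * np.sin(np.pi * t)
X_true = X_true + 0.05 * rng.normal(size=(n, p))

# Scramble the predictor order, as in the stringing setup
perm = rng.permutation(p)
X_scrambled = X_true[:, perm]

# Stringing: estimate positions on [0, 1], then reorder the columns
positions = string_order(X_scrambled)
order = np.argsort(positions)
X_strung = X_scrambled[:, order]

# True grid locations of the columns in their recovered order;
# the order is identifiable only up to reversal of the interval.
recovered_t = t[perm][order]
corr = np.corrcoef(recovered_t, np.arange(p))[0, 1]
print("correlation between recovered and true order:", corr)
```

After reordering, each row of `X_strung` can be treated as a noisy, discretized trajectory and passed to standard functional data analysis tools such as smoothing or functional principal component analysis. Because the positions are determined only up to reflection, a correlation near either +1 or -1 indicates that the original ordering was recovered.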
