Abstract

Modern science frequently involves the study of complex relationships among effects and factors, and flexible statistical tools are commonly used to visualize nonlinear associations. When the interest is in the discrimination capacity of a multivariate marker for a binary outcome, the theoretical transformation that yields the optimal results in terms of sensitivity and specificity is already established. Knowing this function is particularly useful, not only for allocating items to groups but also for understanding the relationship between the multivariate marker and the outcome. In this paper, we explore the use of the multivariate kernel density estimator to approximate this transformation. Large-sample properties of the resulting estimator are outlined, while its finite-sample behavior is studied via Monte Carlo simulations. We consider six bivariate and three additional higher-dimensional scenarios. The performance of the estimator is studied using four different automatically computed tuning parameters. In addition, a cross-validation algorithm is incorporated to reduce potential overfitting. The proposed methodology is applied to study the capacity of two molecular characteristics to predict the toxicity of some chemical products. The results suggest that smoothing techniques are promising, classical, and simple statistical tools that can be used to better understand some current scientific problems. However, incorporating additional machine learning techniques such as cross-validation is advisable to control the frequently overoptimistic results, especially in cases with small sample sizes. The function implementing the proposed methodology is provided as supplementary material.
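
The abstract does not spell out the transformation or the tuning parameters, so the following is only a minimal sketch under stated assumptions, not the authors' supplementary function. It assumes the optimal transformation is the likelihood ratio of the two group densities (scored here through the equivalent monotone form f1/(f0+f1)), estimates each density with SciPy's `gaussian_kde` using Scott's rule as a stand-in bandwidth, and compares the apparent (resubstitution) AUC with a k-fold cross-validated one. The helper names `kde_score`, `auc`, and `cv_auc` and the simulated data are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde


def kde_score(x_neg, x_pos, x_new, bw_method="scott"):
    """Score x_new by f1/(f0+f1), a monotone function of the likelihood ratio.

    All arrays have shape (n_samples, n_features). The bandwidth rule is an
    assumption; the paper's four tuning parameters are not reproduced here.
    """
    f0 = gaussian_kde(x_neg.T, bw_method=bw_method)  # SciPy expects (d, n)
    f1 = gaussian_kde(x_pos.T, bw_method=bw_method)
    d0, d1 = f0(x_new.T), f1(x_new.T)
    return d1 / (d0 + d1 + 1e-300)  # tiny constant guards against 0/0


def auc(s_neg, s_pos):
    """Empirical AUC (Mann-Whitney form), counting ties as one half."""
    diff = s_pos[:, None] - s_neg[None, :]
    return np.mean((diff > 0) + 0.5 * (diff == 0))


def cv_auc(x_neg, x_pos, k=5, seed=0):
    """k-fold cross-validated AUC: each point is scored by KDEs fitted without it."""
    rng = np.random.default_rng(seed)
    folds_neg = np.array_split(rng.permutation(len(x_neg)), k)
    folds_pos = np.array_split(rng.permutation(len(x_pos)), k)
    s_neg, s_pos = np.empty(len(x_neg)), np.empty(len(x_pos))
    for i_neg, i_pos in zip(folds_neg, folds_pos):
        tr_neg = np.delete(x_neg, i_neg, axis=0)
        tr_pos = np.delete(x_pos, i_pos, axis=0)
        s_neg[i_neg] = kde_score(tr_neg, tr_pos, x_neg[i_neg])
        s_pos[i_pos] = kde_score(tr_neg, tr_pos, x_pos[i_pos])
    return auc(s_neg, s_pos)


# Hypothetical bivariate scenario: same mean, different spread, so no linear
# combination of the two markers separates the groups well.
rng = np.random.default_rng(1)
x_neg = rng.normal(scale=1.0, size=(150, 2))
x_pos = rng.normal(scale=2.5, size=(150, 2))

apparent = auc(kde_score(x_neg, x_pos, x_neg), kde_score(x_neg, x_pos, x_pos))
cross_validated = cv_auc(x_neg, x_pos)
print(f"apparent AUC: {apparent:.3f}, cross-validated AUC: {cross_validated:.3f}")
```

Scoring the same points used to fit the densities tends to flatter the estimator, which is the overoptimism the cross-validated figure is meant to temper. Pooling scores across folds is a simplification; averaging fold-wise AUCs is an equally reasonable choice.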
