Abstract

Scale inconsistency is a widely encountered issue in multi-output learning problems. Specifically, target sets with multiple real valued or a mixture of categorical and real valued variables require addressing the scale differences to obtain predictive models with sufficiently good performance. Data transformation techniques are often employed to solve that problem. However, these operations are susceptible to different shortcomings such as changing the statistical properties of the data and increase the computational burden. Scale differences also pose problem in semi-supervised learning (SSL) models as they require processing of unsupervised information where distance measures are commonly employed. Classical distance metrics can be criticized as they lose efficiency when variables exhibit type or scale differences, too. Besides, in higher dimensions distance metrics cause problems due to loss of discriminative power. This paper introduces alternative semi-supervised tree-based strategies that are robust to scale differences both in terms of feature and target variables. We propose use of a scale-invariant proximity measure by means of tree-based ensembles to preserve the original characteristics of the data. We update classical tree derivation procedure to a multi-criteria form to resolve scale inconsistencies. We define proximity based clustering indicators and extend the supervised model with unsupervised criteria. Our experiments show that proposed method significantly outperforms its benchmark learning model that is predictive clustering trees.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.