Learning from Heterogeneous Sources via Gradient Boosting Consensus

Xiaoxiao Shi, Jean-Francois Paiement, David Grangier, Philip S Yu

doi:10.1137/1.9781611972825.20

Abstract

Multiple data sources containing different types of features may be available for a given task. For instance, a recommendation system can be built from users’ profiles, and it can also use users’ historical behaviors and social networks to infer their interest in related products. We argue that it is desirable to use all available heterogeneous data sources collectively in order to build effective learning models. We call this framework heterogeneous learning. In our proposed setting, data sources can include (i) non-overlapping features, (ii) non-overlapping instances, and (iii) multiple networks (i.e., graphs) that connect instances. In this paper, we propose a general optimization framework for heterogeneous learning and devise a corresponding learning model based on gradient boosting. The idea is to minimize the empirical loss subject to two constraints: (1) there should be consensus among the predictions of overlapping instances (if any) from different data sources; and (2) connected instances in graph datasets may have similar predictions. The objective function is minimized with stochastic gradient boosting trees. Furthermore, a weighting strategy is designed to emphasize informative data sources and de-emphasize noisy ones. We formally prove that the proposed strategy leads to a tighter error bound. This approach consistently outperforms a standard concatenation of data sources on movie rating prediction, number recognition, and terrorist attack detection tasks. We observe that the proposed model can reduce the out-of-sample error rate by as much as 80%.
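
The abstract states the objective only in words. The following is a minimal Python sketch of one way such a penalized objective could be boosted. It is not the authors' GBC implementation, and it rests on several assumptions that do not come from the paper:

- squared loss;
- exactly two sources that share all instances;
- depth-1 regression stumps in place of full trees;
- one undirected graph over the instances;
- no source weighting.

Under these assumptions, the objective is Σ_i ½[(y_i − f_A(i))² + (y_i − f_B(i))²] + (λ/2) Σ_i (f_A(i) − f_B(i))² + (μ/2) Σ_{(i,j)∈E} Σ_{s∈{A,B}} (f_s(i) − f_s(j))². Each round fits one stump per source to the negative gradient of this objective on a random row subsample, in the style of stochastic gradient boosting. The names `consensus_boost`, `fit_stump`, `objective`, `lam` (consensus weight) and `mu` (graph-smoothness weight) are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


@dataclass
class Stump:
    """Depth-1 regression tree (an assumed stand-in for full boosted trees)."""
    feature: int
    threshold: float
    left: float
    right: float

    def predict(self, x: Sequence[float]) -> float:
        return self.left if x[self.feature] <= self.threshold else self.right


def fit_stump(X: Matrix, r: List[float], idx: List[int]) -> Stump:
    """Least-squares stump fitted to residuals r on the rows listed in idx."""
    best = None
    n = len(idx)
    for f in range(len(X[0])):
        pairs = sorted((X[i][f], r[i]) for i in idx)
        total = sum(v for _, v in pairs)
        left_sum = 0.0
        for k in range(1, n):
            left_sum += pairs[k - 1][1]
            if pairs[k - 1][0] == pairs[k][0]:
                continue  # no threshold separates equal feature values
            right_sum = total - left_sum
            # Maximizing this score minimizes the split's squared error.
            score = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if best is None or score > best[0]:
                thr = (pairs[k - 1][0] + pairs[k][0]) / 2
                best = (score, Stump(f, thr, left_sum / k, right_sum / (n - k)))
    if best is None:  # every feature is constant on idx: predict the mean
        mean = sum(r[i] for i in idx) / n
        return Stump(0, float("inf"), mean, mean)
    return best[1]


def objective(y, fA, fB, edges, lam, mu):
    """Hypothetical penalized objective: loss + consensus + graph smoothness."""
    loss = sum(0.5 * ((y[i] - fA[i]) ** 2 + (y[i] - fB[i]) ** 2) for i in range(len(y)))
    consensus = 0.5 * lam * sum((a - b) ** 2 for a, b in zip(fA, fB))
    smooth = 0.5 * mu * sum((f[i] - f[j]) ** 2 for f in (fA, fB) for i, j in edges)
    return loss + consensus + smooth


def consensus_boost(XA: Matrix, XB: Matrix, y: List[float],
                    edges: List[Tuple[int, int]], rounds: int = 50,
                    lr: float = 0.1, lam: float = 1.0, mu: float = 0.1,
                    subsample: float = 0.8, seed: int = 0):
    n = len(y)
    rng = random.Random(seed)
    fA, fB = [0.0] * n, [0.0] * n
    modelA: List[Stump] = []
    modelB: List[Stump] = []
    nbrs: List[List[int]] = [[] for _ in range(n)]
    for i, j in edges:  # undirected graph: each edge touches both endpoints
        nbrs[i].append(j)
        nbrs[j].append(i)
    for _ in range(rounds):
        idx = rng.sample(range(n), max(1, int(subsample * n)))  # stochastic subsample
        for X, f, other, model in ((XA, fA, fB, modelA), (XB, fB, fA, modelB)):
            # Negative gradient of the objective w.r.t. this source's predictions.
            r = [(y[i] - f[i]) - lam * (f[i] - other[i])
                 - mu * sum(f[i] - f[j] for j in nbrs[i]) for i in range(n)]
            stump = fit_stump(X, r, idx)
            model.append(stump)
            for i in range(n):  # shrunken additive update, applied in place
                f[i] += lr * stump.predict(X[i])
    return modelA, modelB, fA, fB


# Toy data: two sources with different features over the same 10 instances.
n = 10
y = [1.0 if i >= 5 else 0.0 for i in range(n)]
XA = [[float(i)] for i in range(n)]
XB = [[float(10 - i), 0.5] for i in range(n)]
edges = [(i, i + 1) for i in range(n - 1)]  # chain graph linking neighbors
modelA, modelB, fA, fB = consensus_boost(XA, XB, y, edges)
print(objective(y, fA, fB, edges, 1.0, 0.1))
print(max(abs(a - b) for a, b in zip(fA, fB)))
```

Within each round, source B is updated after source A has already moved, rather than both sources being updated in parallel. This ordering is an arbitrary choice in the sketch. The abstract's source-weighting strategy and the case of non-overlapping instances are not modeled here.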

Similar Papers

GBC: Gradient boosting consensus model for heterogeneous data
Xiaoxiao Shi ... David Grangier
Statistical Analysis and Data Mining: The ASA Data Science Journal | VOL. 7
22 May 2013

Identifying biologically relevant genes via multiple heterogeneous data sources
Zheng Zhao ... Yung Chang
24 Aug 2008

Forecasting stock price movements with multiple data sources: Evidence from stock market in China
Zhongbao Zhou ... Helu Xiao
Physica A: Statistical Mechanics and its Applications | VOL. 542
04 Nov 2019

Heterogeneous Embedding via Aggregating Multiple Sources
Xiaoxiao Shi ... Philip S Yu
Annals of Data Science | VOL. 1
01 Mar 2014
