Abstract

We consider multi-response and multi-task regression models, where the parameter matrix to be estimated is expected to have an unknown grouping structure. The groupings can be along tasks, or features, or both, the last one indicating a bi-cluster or “checkerboard” structure. Discovering this grouping structure along with parameter inference makes sense in several applications, such as multi-response Genome-Wide Association Studies (GWAS). By inferring this additional structure we can obtain valuable information on the underlying data mechanisms (e.g., relationships among genotypes and phenotypes in GWAS). In this paper, we propose two formulations to simultaneously learn the parameter matrix and its group structures, based on convex regularization penalties. We present optimization approaches to solve the resulting problems and provide numerical convergence guarantees. Extensive experiments demonstrate much better clustering quality compared to other methods, and our approaches are also validated on real datasets concerning phenotypes and genotypes of plant varieties.

Highlights

  • We consider multi-response and multi-task regression models, which generalize single-response regression to learn predictive relationships between multiple input and multiple output variables, referred to as tasks (Borchani et al, 2015)

  • The convex bi-clustering method (Chi et al, 2014) groups observations and features in a data matrix, whereas our approaches discover groupings in the parameter matrix of a multi-response regression model while jointly estimating that matrix; the discovered groupings reflect groupings in features and responses

  • We introduce a surrogate parameter matrix Ŵ that will be used for bi-clustering


Summary

INTRODUCTION

We consider multi-response and multi-task regression models, which generalize single-response regression to learn predictive relationships between multiple input and multiple output variables, referred to as tasks (Borchani et al, 2015). A motivating example is multi-response Genome-Wide Association Studies (GWAS) (Schifano et al, 2013), where, for instance, a group of Single Nucleotide Polymorphisms or SNPs (input variables or features) might influence a group of phenotypes (output variables or tasks) in a similar way, while having little or no effect on another group of phenotypes. As another example, the stock values of related companies can affect the future value of a group of stocks.
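To make the "checkerboard" idea concrete, the following is a minimal sketch of a multi-response regression whose parameter matrix is constant within each (feature group, task group) block. It is an illustration only and not the estimator proposed in the paper: the sizes, group labels, block values, and names such as `W_true` and `block_values` are all hypothetical, and the fit is plain least squares with no grouping penalty.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: n samples, p features (e.g. SNPs), k tasks (e.g. phenotypes).
n, p, k = 200, 6, 4

# Hypothetical group labels: two feature groups and two task groups.
feature_groups = np.array([0, 0, 0, 1, 1, 1])
task_groups = np.array([0, 0, 1, 1])

# One shared coefficient per (feature group, task group) block.
# Zeros encode "little or no effect" of a feature group on a task group.
block_values = np.array([[2.0, 0.0],
                         [0.0, -1.5]])

# Expand the block values into a p-by-k checkerboard parameter matrix.
W_true = block_values[feature_groups][:, task_groups]

# Multi-response linear model: every column of Y is one task.
X = rng.standard_normal((n, p))
Y = X @ W_true + 0.1 * rng.standard_normal((n, k))

# Ordinary least squares fits all tasks at once but ignores the grouping.
W_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(np.round(W_ols, 2))
```

With enough samples the least-squares estimate is close to `W_true`, but its entries within a block are only approximately equal, so the grouping has to be recovered by a separate step. Methods of the kind described here aim to estimate the matrix and its row and column groups jointly.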

Contributions
Related Work
Roadmap
PROBLEM STATEMENT AND PROPOSED METHODS
Formulation 1: “Hard Fusion”
Formulation 2: “Soft Fusion”
OPTIMIZATION ALGORITHMS FOR THE PROPOSED FORMULATIONS
Optimization for Formulation 1
Optimization for Formulation 2
Numerical Convergence
Weights and Sparsity Regularization
Penalty Multiplier Tuning
Result
Solution Paths
Bi-clustering Thresholds
SYNTHETIC DATA EXPERIMENTS
Performance Measures
Simulation Setup and Results
REAL DATA EXPERIMENTS
Phenotypic Trait Prediction From Remote Sensed Data
Multi-Response GWAS
CONCLUDING REMARKS