Abstract
We consider multi-response and multi-task regression models, where the parameter matrix to be estimated is expected to have an unknown grouping structure. The groupings can be along tasks, or features, or both, the last one indicating a bi-cluster or “checkerboard” structure. Discovering this grouping structure along with parameter inference makes sense in several applications, such as multi-response Genome-Wide Association Studies (GWAS). By inferring this additional structure we can obtain valuable information on the underlying data mechanisms (e.g., relationships among genotypes and phenotypes in GWAS). In this paper, we propose two formulations to simultaneously learn the parameter matrix and its group structures, based on convex regularization penalties. We present optimization approaches to solve the resulting problems and provide numerical convergence guarantees. Extensive experiments demonstrate much better clustering quality compared to other methods, and our approaches are also validated on real datasets concerning phenotypes and genotypes of plant varieties.
Highlights
We consider multi-response and multi-task regression models, which generalize single-response regression to learn predictive relationships between multiple input and multiple output variables, referred to as tasks (Borchani et al, 2015)
Whereas the convex bi-clustering method (Chi et al, 2014) groups observations and features in a data matrix, our approaches discover groupings in the parameter matrix of a multi-response regression model while jointly estimating that matrix; the discovered groupings reflect groupings in features and responses
We introduce a surrogate parameter matrix Ŵ that will be used for bi-clustering
Summary
We consider multi-response and multi-task regression models, which generalize single-response regression to learn predictive relationships between multiple input and multiple output variables, referred to as tasks (Borchani et al, 2015). A motivating example is that of multi-response Genome-Wide Association Studies (GWAS) (Schifano et al, 2013), where for instance a group of Single Nucleotide Polymorphisms or SNPs (input variables or features) might influence a group of phenotypes (output variables or tasks) in a similar way, while having little or no effect on another group of phenotypes. As another example, the stock values of related companies can affect the future value of a group of stocks.
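To make the setting concrete, the following is a minimal sketch of a penalized multi-response objective, not the formulations proposed in the paper. It assumes a squared-error loss on Y ≈ XW and an unweighted sum-of-norms fusion penalty over all pairs of rows and columns of W, in the spirit of convex bi-clustering (Chi et al, 2014). The function names, the single weight `lam`, and the uniform pair weights are illustrative assumptions.

```python
import numpy as np


def fusion_penalty(M):
    """Sum of Euclidean distances between all pairs of rows of M.

    The penalty is zero exactly when all rows are identical, so
    penalizing it encourages rows to merge into groups.
    """
    total = 0.0
    n = M.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            total += np.linalg.norm(M[i] - M[j])
    return total


def biclustering_objective(X, Y, W, lam):
    """Squared-error loss plus row and column fusion penalties on W.

    X: (n_samples, n_features), Y: (n_samples, n_tasks),
    W: (n_features, n_tasks). `lam` is a hypothetical trade-off weight.
    """
    loss = 0.5 * np.linalg.norm(Y - X @ W, "fro") ** 2
    # Rows of W correspond to features, columns to tasks (responses).
    penalty = fusion_penalty(W) + fusion_penalty(W.T)
    return loss + lam * penalty


if __name__ == "__main__":
    # A "checkerboard" parameter matrix: two feature groups, two task groups.
    W_true = np.array([[1, 1, 0, 0],
                       [1, 1, 0, 0],
                       [0, 0, 2, 2],
                       [0, 0, 2, 2]], dtype=float)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 4))
    Y = X @ W_true
    print(biclustering_objective(X, Y, W_true, lam=0.1))
```

In this sketch, rows of W that belong to the same feature group contribute nothing to the penalty, so only between-group differences are charged. A solver for such an objective would need to handle the non-smooth norm terms, which this example does not attempt.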