Abstract
Non-negative least squares regression (NLS) is a constrained least squares problem in which the coefficients are restricted to be non-negative. It is useful for modeling non-negative responses such as time measurements, count data, and histograms. Existing NLS solvers are designed for cases where the predictor and response variables are linearly related, and they do not consider interactions among predictor variables. In this paper, we solve NLS in the complete space of power sets of variables. Such an extension is particularly useful in biology, for modeling genetic associations. Our new algorithms solve NLS problems exactly while reducing the computational burden by using an active set method. The algorithm proceeds iteratively: a branch-and-bound subroutine searches for an optimal interaction term, and the terms found are added to the solution set one after another. The resulting large search space is efficiently restricted by novel pruning conditions and two kinds of sparsity-promoting regularization: the $l_{1}$ norm and non-negativity constraints. In computational experiments using HIV-1 datasets, 99% of the search space was safely pruned without losing the optimal variables. In mutagenicity datasets, the proposed method could identify long and accurate patterns compared to the original NLS. Code is available from https://github.com/afiveithree/inlars.
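To make "the complete space of power sets of variables" concrete, the following minimal sketch enumerates every non-empty subset of a small set of variables and builds one candidate column per subset. It assumes binary predictors and models an interaction as the product of the variables in the subset; that encoding, the function name `interaction_columns`, and the toy data are illustrative assumptions rather than details taken from the paper. The sketch enumerates by brute force and does not implement the branch-and-bound search or the pruning conditions.

```python
from itertools import combinations


def interaction_columns(rows, max_size=None):
    """Map each non-empty subset of variable indices to its interaction column.

    Assumption: an interaction term is the product of the variables in the
    subset, which for 0/1 data equals a logical AND.
    """
    n_vars = len(rows[0])
    if max_size is None:
        max_size = n_vars
    columns = {}
    for size in range(1, max_size + 1):
        for subset in combinations(range(n_vars), size):
            column = []
            for row in rows:
                product = 1
                for j in subset:
                    product *= row[j]
                column.append(product)
            columns[subset] = column
    return columns


# Hypothetical toy data: 4 samples, 3 binary predictors.
rows = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 1],
    [1, 1, 0],
]
columns = interaction_columns(rows)

print(len(columns))        # number of candidate terms: 2**3 - 1
print(columns[(0, 1)])     # interaction of variables 0 and 1
print(columns[(0, 1, 2)])  # interaction of all three variables
```

With p variables there are 2^p - 1 candidate terms, so explicit enumeration of this kind becomes infeasible quickly as p grows. That growth is the reason the search space has to be pruned.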
Highlights
Non-negative least squares regression (NLS) is a constrained least squares problem in which the coefficients are restricted to be non-negative.
For example, in computational chemistry it is used for estimating concentrations from spectral data, in which the percentage of a component cannot be negative [Bro and Jong, 1997]. Another example, from computational biology, is mass spectrometry analysis, in which an observed spectrum is recovered by fitting template isotope patterns [Slawski et al., 2012].
As we show later in this paper, there are promising applications in biology and chemistry in which NLS with variable interactions is useful.
Summary
Non-negative least squares regression (NLS) is a constrained least squares problem in which the coefficients are restricted to be non-negative. It is useful for modeling non-negative data such as time measurements, count data, or prices. It was introduced in [Lawson and Hanson, 1995], and many algorithms have been developed since. For example, in computational chemistry it is used for estimating concentrations from spectral data, in which the percentage of a component cannot be negative [Bro and Jong, 1997]. Another example, from computational biology, is mass spectrometry analysis, in which an observed spectrum is recovered by fitting template isotope patterns [Slawski et al., 2012]. A review paper by [Chen and Plemmons, 2007] covers other applications, such as text mining and speech recognition, as well as algorithmic solutions.
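The definition above can be illustrated with a small numerical sketch: minimize the squared error between a response and a linear combination of columns, subject to every coefficient being non-negative. The solver below uses projected coordinate descent, chosen here only because it is short. It is not the active set method of Lawson and Hanson, nor the algorithm proposed in this paper. The function name `nnls_coordinate_descent` and the two "template" columns are hypothetical. In practice a library routine such as `scipy.optimize.nnls` would normally be used instead.

```python
def nnls_coordinate_descent(X, y, n_sweeps=500):
    """Approximately minimize ||y - X b||^2 subject to b >= 0.

    Each coordinate is updated to its one-dimensional minimizer and then
    clipped at zero. X is a list of rows, y a list of responses.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = [float(v) for v in y]  # equals y - X beta while beta is zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot reduce the error
            corr = sum(X[i][j] * residual[i] for i in range(n))
            new_bj = max(0.0, beta[j] + corr / col_sq[j])
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                beta[j] = new_bj
    return beta


# Hypothetical example: two overlapping templates (columns) and an
# observed non-negative signal y.
X = [
    [1, 0],
    [1, 1],
    [0, 1],
]
y = [3, 1, 0]

beta = nnls_coordinate_descent(X, y)
print(beta)
```

For this toy input, unconstrained least squares gives the coefficients (7/3, -2/3). The non-negativity constraint instead sets the second coefficient to zero and refits the first, which illustrates how the constraint by itself promotes sparsity.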