Abstract

In biological data, observations are often available for only a subset of samples. When a kernel matrix is derived from such data, the entries corresponding to the unavailable samples must be left missing. In this paper, we complete the missing entries by exploiting an auxiliary kernel matrix derived from another information source. A parametric model of kernel matrices is constructed as the set of spectral variants of the auxiliary kernel matrix, and the missing entries are estimated by fitting this model to the existing entries. For model fitting, we adopt the em algorithm (distinguished from the EM algorithm of Dempster et al., 1977) based on the information geometry of positive definite matrices. We report promising results on bacteria clustering experiments using two marker sequences: 16S and gyrB.
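The fitting procedure can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: the observed samples are ordered first, so the known entries form the top-left block `K_vv`; a "spectral variant" is taken to mean a matrix sharing the eigenvectors of the auxiliary kernel `M` but with freely adjustable positive eigenvalues; and the divergence being minimized is the Kullback-Leibler divergence between zero-mean Gaussians with the two matrices as covariances. The function name `em_complete` and the closed-form steps below are illustrative and follow from those assumptions.

```python
import numpy as np

def em_complete(K_vv, M, n_iter=50, floor=1e-10):
    """Complete a kernel matrix from its observed top-left block K_vv.

    Assumed setup (hypothetical, see lead-in): M is a positive definite
    auxiliary kernel over all samples; the model is the set of matrices
    V diag(b) V^T, where V holds the eigenvectors of M and b > 0.
    """
    n_v = K_vv.shape[0]
    # Eigenvectors stay fixed; only the spectrum b is fitted.
    eigvals, V = np.linalg.eigh(M)
    b = np.maximum(eigvals, floor)
    for _ in range(n_iter):
        # Current model point.
        Mb = (V * b) @ V.T
        M_vv, M_vh, M_hh = Mb[:n_v, :n_v], Mb[:n_v, n_v:], Mb[n_v:, n_v:]
        # e-step: keep K_vv fixed and fill in the missing blocks so that
        # the conditional structure of the model point is preserved.
        A = np.linalg.solve(M_vv, M_vh)            # M_vv^{-1} M_vh
        K_vh = K_vv @ A
        K_hh = M_hh - M_vh.T @ A + A.T @ K_vv @ A
        K = np.block([[K_vv, K_vh], [K_vh.T, K_hh]])
        # m-step: refit each eigenvalue as b_i = v_i^T K v_i.
        b = np.maximum(np.sum(V * (K @ V), axis=0), floor)
    return K
```

Under these assumptions, each iteration alternates between filling in the missing blocks given the current spectrum and refitting the spectrum given the completed matrix. The observed block is never modified, so the returned matrix agrees with `K_vv` on the known entries.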
