Abstract

Blind Source Separation (BSS) is a powerful tool for analyzing composite data patterns in many areas, such as computational biology. We introduce a novel BSS method, Convex Analysis of Mixtures (CAM), for separating non-negative well-grounded sources, which learns the mixing matrix by identifying the lateral edges of the convex data scatter plot. We propose and prove a sufficient and necessary condition for identifying the mixing matrix through edge detection in the noise-free case, which enables CAM to identify the mixing matrix not only in the exact-determined and over-determined scenarios, but also in the under-determined scenario. We show the optimality of the edge detection strategy, even for cases where source well-groundedness is not strictly satisfied. The CAM algorithm integrates plug-in noise filtering using sector-based clustering, an efficient geometric convex analysis scheme, and stability-based model order selection. The superior performance of CAM against a panel of benchmark BSS techniques is demonstrated on numerically mixed gene expression data of ovarian cancer subtypes. We apply CAM to dissect dynamic contrast-enhanced magnetic resonance imaging data taken from breast tumors and time-course microarray gene expression data derived from in-vivo muscle regeneration in mice, both producing biologically plausible decomposition results.

Highlights

  • We applied CAM to dissect a time-course gene expression dataset obtained from a mouse skeletal muscle regeneration process [25]

  • We prove for the first time a necessary and sufficient condition (i.e. assumption (A3)) for identifying the mixing matrix in non-negative well-grounded BSS problems through edge detection

  • We show the optimality of the edge detection strategy that identifies the data points with maximum source dominance, even when WGPs do not exist


Summary

CAM Theory

This section develops the theory of CAM for the noise-free scenario, including the model assumptions, identifiability, and optimality. The lateral edges of the convex cone $C\{A\} = \{\sum_{k=1}^{K} \alpha_k a_k \mid a_k \in \{A\},\ \alpha_k \ge 0\}$ are the $K$ mixing matrix column vectors $a_1, \ldots, a_K$ if and only if (A3) holds. Theorem 1 follows directly from Lemmas 1 and 2. It states that, for separating non-negative well-grounded sources, (A3) is a necessary and sufficient condition for an edge detection solution to uniquely identify the mixing matrix $A$ from the observed data $X$. When well-grounded points (WGPs) exist, the lateral edges of the cone $C\{X\}$ are the mixing matrix column vectors $a_1, \ldots, a_K$. Even when WGPs do not exist, edge detection can still identify the optimal estimates of the mixing matrix column vectors among all observed data points. If (A1), (A2), and (A3) are satisfied (which can happen in the exact-determined and over-determined cases, and in the under-determined case where there are at least three mixtures), the mixing matrix $A$ and the number of sources are identifiable, while $S$ cannot in general be uniquely determined.
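The edge detection idea can be illustrated with a small noise-free toy example. The sketch below is not the CAM algorithm itself (which, per the abstract, adds sector-based clustering and model order selection); it only shows that, when one well-grounded point per source is present, the extreme points of the data cone coincide with the scaled mixing matrix columns. The matrix `A`, the source values, the perspective projection onto the simplex, and the monotone chain hull routine are all assumptions chosen for illustration. The columns of the toy `A` are linearly independent and sum to one, so none lies inside the cone of the others.

```python
import random

def mix(A, s):
    """Return x = A s for a mixing matrix A (list of rows) and source vector s."""
    return [sum(a * v for a, v in zip(row, s)) for row in A]

def project(x):
    """Scale a non-negative 3-vector to unit sum and keep the first two
    coordinates (the third is then redundant). Cone edges become hull vertices."""
    total = sum(x)
    return (x[0] / total, x[1] / total)

def cross(o, a, b):
    """2D cross product of vectors o->a and o->b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def hull_vertices(points):
    """Andrew's monotone chain: indices of the strict convex hull vertices."""
    order = sorted(range(len(points)), key=lambda i: points[i])

    def half(seq):
        chain = []
        for i in seq:
            while len(chain) >= 2 and cross(points[chain[-2]], points[chain[-1]], points[i]) <= 0:
                chain.pop()
            chain.append(i)
        return chain[:-1]  # drop the last point; the other half starts with it

    return half(order) + half(reversed(order))

random.seed(0)

# Hypothetical 3x3 mixing matrix; columns are a_1, a_2, a_3 and each sums to 1.
A = [[0.7, 0.2, 0.1],
     [0.2, 0.6, 0.3],
     [0.1, 0.2, 0.6]]

# One well-grounded point per source (only that source is non-zero),
# followed by points where all three sources are strictly positive.
wgps = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 1.5]]
interior = [[random.uniform(0.1, 1.0) for _ in range(3)] for _ in range(200)]
sources = wgps + interior

points = [project(mix(A, s)) for s in sources]
edge_idx = sorted(hull_vertices(points))

# Rebuild the third coordinate to get unit-sum estimates of the columns of A.
edges = [(points[i][0], points[i][1], 1.0 - points[i][0] - points[i][1]) for i in edge_idx]
print(edge_idx)
print(edges)
```

Because the mixed data are only defined up to a positive scale along each cone edge, the recovered vectors match the columns of `A` only after normalization; here both are scaled to unit sum. With noisy data the hull would pick up spurious vertices, which is why a practical method needs a noise filtering step before convex analysis.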

CAM Algorithm
Results
Skeletal muscle regeneration gene expression data
Conclusion and Discussion
Additional Information