Abstract
We introduce a groundbreaking approach: the minimum free energy-based Gaussian Self-Benchmarking (MFE-GSB) framework, designed to combat the myriad of biases inherent in RNA-seq data. Central to our methodology is the MFE concept, facilitating the adoption of a Gaussian distribution model tailored to effectively mitigate all co-existing biases within a k-mer counting scheme. The MFE-GSB framework operates on a sophisticated dual-model system, juxtaposing modeling data of uniform k-mer distribution against the real, observed sequencing data characterized by nonuniform k-mer distributions. The framework applies a Gaussian function, guided by the predetermined parameters-mean and SD-derived from modeling data, to fit unknown sequencing data. This dual comparison allows for the accurate prediction of k-mer abundances across MFE categories, enabling simultaneous correction of biases at the single k-mer level. Through validation with both engineered RNA constructs and human tissue RNA samples, its wide-ranging efficacy and applicability are demonstrated.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.