Contrastive pre-training for sequence based genomics models.

Ksenia Sokolova, Olga Troyanskaya, Kathleen M Chen

doi:10.1101/2024.06.10.598319

Abstract

In recent years, deep learning has become a central approach in many applications, including many tasks in genomics. However, as models grow in depth and complexity, they require either more data or a strategic initialization technique to perform well. In this project, we introduce cGen, a novel unsupervised, model-agnostic contrastive pre-training method for sequence-based models. cGen can be used before training to initialize weights, reducing the size of the dataset needed. It works by learning the intrinsic features of the reference genome and makes no assumptions about the underlying structure. We show that the embeddings produced by the unsupervised model are already informative for gene expression prediction and that the sequence features provide a meaningful clustering. We demonstrate that cGen improves model performance in various sequence-based deep learning applications, such as chromatin profiling prediction and gene expression prediction. Our findings suggest that using cGen, particularly in areas constrained by data availability, could improve the performance of deep learning genomic models without the need to modify the model architecture.
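
The abstract does not specify cGen's loss function, augmentations, or encoder, so the following is only a generic illustration of what a contrastive objective over DNA sequences can look like, not the authors' method. It assumes an InfoNCE-style loss, a toy k-mer count vector standing in for a learned encoder, and overlapping windows of one sequence as the positive pair. The names `kmer_embed`, `two_views`, and `info_nce` are hypothetical.

```python
import math
from itertools import product


def kmer_embed(seq, k=2):
    """Toy stand-in encoder (assumption): L2-normalised k-mer counts.

    A real pre-training setup would use a trainable neural network here.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = {kmer: 0 for kmer in kmers}
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing N or other symbols
            counts[window] += 1
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return [counts[kmer] / norm for kmer in kmers]


def two_views(seq, shift=4):
    """Two overlapping windows of one sequence, used as a positive pair.

    This augmentation is an assumption chosen for illustration.
    """
    return seq[:-shift], seq[shift:]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    Each anchor should score its own positive (same index) above the
    positives of every other sequence in the batch.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [
            sum(a * p for a, p in zip(anchor, pos)) / temperature
            for pos in positives
        ]
        top = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


# Three toy sequences with very different composition.
sequences = ["A" * 20, "GC" * 10, "T" * 20]
views = [two_views(s) for s in sequences]
anchors = [kmer_embed(a) for a, _ in views]
positives = [kmer_embed(b) for _, b in views]

loss_matched = info_nce(anchors, positives)
# Rotating the positives pairs every anchor with the wrong sequence.
loss_mismatched = info_nce(anchors, positives[1:] + positives[:1])
print(f"matched: {loss_matched:.4f}, mismatched: {loss_mismatched:.4f}")
```

In this sketch the loss is low when each anchor is paired with a view of its own sequence and high when the pairing is scrambled. With a trainable encoder, minimising such a loss would yield weights that could then initialise a downstream supervised model, which is the role the abstract describes for pre-training.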

More From: bioRxiv : the preprint server for biology

Similar Papers

An Unsupervised Learning-Based Regional Deformable Model for Automated Multi-Organ Contour Propagation.
Xiaokun Liang ... Yuming Jiang
Journal of digital imaging | VOL. 36
30 Jan 2023

Deep transfer learning radiomics model based on temporal bone CT for assisting in the diagnosis of inner ear malformations
Xing Zhao ... Pu Dai
Lin chuang er bi yan hou tou jing wai ke za zhi = Journal of clinical otorhinolaryngology, head, and neck surgery | VOL. 38
01 Jun 2024

Abstract 184: The utility of deep metric learning for breast cancer identification on mammographic images
Justin Du ... Sanjay Aneja
Cancer Research | VOL. 81
01 Jul 2021

Deep generative learning for automated EHR diagnosis of traditional Chinese medicine
Zhaohui Liang ... Jimmy Xiangji Huang
Computer Methods and Programs in Biomedicine | VOL. 174
04 May 2018
