Biological Sequences Research Articles

Bioinformatics and computational biology have used GPUs for over two decades to accelerate data processing, with CUDA the most widely used programming model. However, relying exclusively on CUDA creates a portability problem: it runs only on NVIDIA GPUs and not on other architectures, such as AMD or Intel GPUs or other types of accelerator. To address this portability challenge, the Khronos Group recently introduced the SYCL standard, a multi-platform programming model that offers a high-level programming interface. The standard facilitates the development of portable applications that can efficiently use different hardware devices, such as NVIDIA, Intel, and AMD GPUs, without significant modifications to the source code. The general objective of this thesis was therefore to evaluate the feasibility of SYCL as a unified, portable, and efficient heterogeneous programming model for designing and developing computationally demanding applications on heterogeneous GPU-based systems, specifically in the field of bioinformatics. First, a detailed investigation into heterogeneous programming models, performance metrics, and bioinformatics concepts was conducted to establish the theoretical foundations of the thesis. Then, the SW# suite was chosen as the case study, as it is a clear example of a CUDA-based bioinformatics application for biological sequence alignment. Using the SYCLomatic tool, the CUDA code was migrated completely to SYCL, which involved modifying the generated code and resolving runtime errors. In addition, the functionality was verified, optimizations were applied, and the resulting SYCL code was standardized to be compatible with other SYCL implementations.
Subsequently, multiple experiments were conducted to evaluate the functionality and performance portability of the migrated software. These experiments involved running the application on a wide variety of HPC platforms, including CPUs and GPUs from various manufacturers. The results showed performance comparable to CUDA in most configurations, confirming the effectiveness of SYCL. Good performance portability across platforms was also observed, owing to SYCL's ability to run on various hardware combinations, and performance remained consistent when switching SYCL implementations. In conclusion, this study demonstrates that SYCL is a viable alternative as a unified, portable, and efficient programming model for heterogeneous GPU computing in bioinformatics applications. The findings lay the groundwork for the transition of legacy applications and the development of new solutions that use the capabilities of SYCL.

Introduction/Background: Deep neural networks have shown great promise in advancing drug discovery and precision medicine. By leveraging large amounts of complex biomedical and chemical data, deep learning approaches can identify novel targets, predict drug-target and drug-drug interactions, generate new molecular structures, and assist in personalized treatment selection and development. However, fully utilizing deep learning techniques for optimization across the drug development pipeline remains an ongoing challenge.
Materials and Methods: A comprehensive literature review was conducted using major bibliographic databases including PubMed, Web of Science, and Scopus. Search terms included combinations of "deep learning", "drug discovery", "precision medicine", "biomedical data", and "neural networks". Over 200 papers published between 2010 and 2023 on deep learning applications in pharmacology and genomics were identified and reviewed.
Results: Deep learning has been widely applied at various stages of the drug discovery process, including target identification and prioritization, lead generation and optimization, and prediction of molecular properties. Convolutional neural networks are commonly used to represent and classify biological sequence and image data for tasks such as gene expression analysis and pathogen detection from microscopy images. Graph neural networks effectively model compound structures and interactome networks to predict molecular binding and disease associations. Multi-modal neural networks integrate diverse data types for personalized treatment response prediction and biomarker discovery. Challenges remain around data and model interpretation, generalization to new targets and diseases, and integration across domains.
Discussion: While deep learning has shown promise, rigorous benchmarking and validation on real-world clinical endpoints are still needed to establish its usefulness in decision-making. Data and model transparency must be improved to enable scientific insights. Privacy and security risks accompanying "real world" biomedical big data will require ethical practices. Standardization and sharing of resources and protocols could accelerate progress by enabling comparison of techniques. Combining deep learning with other AI paradigms, such as causal inference, may further improve its utility in drug discovery and precision healthcare.
Conclusion: Deep neural networks demonstrate potential for optimizing drug development and precision medicine applications. Continued advancement relies on addressing challenges around data, models, validation, and ethics. Multi-disciplinary collaborations integrating machine learning, molecular biology, medicine, and other domains are needed to fully realize benefits to patients.

Articles published on Biological Sequences

Elliptic geometry-based kernel matrix for improved biological sequence classification

Density estimation for ordinal biological sequences and its applications

MMLmiRLocNet: miRNA Subcellular Localization Prediction based on Multi-view Multi-label Learning for Drug Design.

PreMLS: The undersampling technique based on ClusterCentroids to predict multiple lysine sites.

Efficient Storage and Analysis of Genomic Data: A k-mer Frequency Mapping and Image Representation Method.

Viability Study of SYCL as a Unified Programming Model for Heterogeneous Systems Based on GPUs in Bioinformatics

Transfer learning Bayesian optimization for competitor DNA molecule design for use in diagnostic assays.

Eight novel diagnostic markers differentiate lineages of the highly invasive myrtle rust pathogen Austropuccinia psidii.

Mathematics behind the identifying CpG islands

DNASimCLR: a contrastive learning-based deep learning approach for gene sequence data classification

AnnoDUF: A Web-Based Tool for Annotating Functions of Proteins Having Domains of Unknown Function.

SWQC: Efficient sequencing data quality control on the next-generation sunway platform

Evaluating deep neural networks in optimizing drug discovery and precision medicine: A review

Optimized Spectral Clustering Methods For Potentially Divergent Biological Sequences

Artificial intelligence-guided strategies for next-generation biological sequence design

Hyperdimensional computing: A fast, robust, and interpretable paradigm for biological data.

BAD2matrix: Phylogenomic matrix concatenation, indel coding, and more

Graph-based analysis of DNA sequence comparison in closed cotton species: A generalized method to unveil genetic connections.

Distinguishing word identity and sequence context in DNA language models

Sequencing of Targeted Therapy in Psoriasis: Does it Matter?
