dsRNAPredictor-II: An improved predictor of identifying dsRNA and its silencing efficiency for Tribolium castaneum based on sequence length distribution

RNA interference (RNAi) has been widely used to investigate gene function and has significant potential for the control of pest insects. However, recent studies have revealed that the target insect species, dsRNA length, target gene, and other experimental factors can affect the efficiency of RNAi-mediated control, restricting the further development and application of this technology. The aim of this study was therefore to establish a bioinformatics-based deep learning model that helps researchers identify the dsRNA fragments with the highest RNAi efficiency. We optimized an existing model, dsRNAPredictor, by designing sub-models for different sequence lengths. The data were divided into two groups: sequences of 130–399 bp and sequences of 400–616 bp. One-hot encoding was then used to extract sequence information. A convolutional neural network comprising three convolutional layers, three average pooling layers, a flatten layer, and three dense layers served as the classifier. By adjusting the parameters, we established two sub-models, one for each length distribution. Using multiple independent test datasets and hypothesis testing, we demonstrated that our model exhibits superior performance to dsRNAPredictor and strong robustness. Our model may therefore help pre-screen candidate dsRNAs during design and facilitate further research and applications.
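
As a rough illustration of the preprocessing described in this abstract, the sketch below one-hot encodes a nucleotide sequence and assigns it to one of the two length groups. The function names, the A/C/G/T channel order, the handling of unknown bases, and the treatment of out-of-range lengths are all assumptions made for illustration; the abstract does not specify them.

```python
# Minimal sketch, not the authors' implementation.
# Assumption: channel order A, C, G, T; RNA 'U' is treated as 'T'.
ALPHABET = "ACGT"


def one_hot_encode(seq):
    """Return one 4-element row per nucleotide (hypothetical helper)."""
    seq = seq.upper().replace("U", "T")
    rows = []
    for base in seq:
        row = [0, 0, 0, 0]
        if base in ALPHABET:  # unknown bases such as 'N' stay all-zero
            row[ALPHABET.index(base)] = 1
        rows.append(row)
    return rows


def length_group(seq):
    """Pick a sub-model by length, using the ranges given in the abstract."""
    n = len(seq)
    if 130 <= n <= 399:
        return "short"
    if 400 <= n <= 616:
        return "long"
    return None  # assumption: lengths outside both ranges are not routed


encoded = one_hot_encode("ACGU")
group = length_group("A" * 250)
```

In practice the encoded rows would be padded to a fixed length before being passed to a convolutional network; that step is omitted here because the abstract does not describe it.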

Just Published
A heterogeneous graph transformer framework for accurate cancer driver gene prediction and downstream analysis

Accurately predicting cancer driver genes remains a formidable challenge amidst the burgeoning volume and intricacy of cancer genomic data. In this investigation, we propose HGTDG, an innovative heterogeneous graph transformer framework tailored for precisely predicting cancer driver genes and exploring downstream tasks. A heterogeneous graph construction module is central to the framework, which assembles a gene-protein heterogeneous network leveraging the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and protein-protein interactions sourced from the STRING (search tool for recurring instances of neighboring genes) database. Moreover, our framework introduces a pioneering heterogeneous graph transformer module, harnessing multi-head attention mechanisms for nuanced node embedding. This transformative module proficiently captures distinct representations for both nodes and edges, thereby enriching the model's predictive capacity. Subsequently, the generated node embeddings are seamlessly integrated into a classification module, facilitating the discrimination between driver and non-driver genes. Our experimental findings evince the superiority of HGTDG over existing methodologies, as evidenced by the enhanced performance metrics, including the area under the receiver operating characteristic curves (AUROC) and the area under the precision-recall curves (AUPRC). Furthermore, the downstream analysis utilizing the newly identified cancer driver genes underscores the efficacy and versatility of our proposed framework.
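
The AUROC metric named in this abstract can be computed directly from labels and scores. The sketch below uses the pairwise (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as half. The function name and the example labels and scores are made up for illustration and are not data from the study.

```python
# Minimal sketch of AUROC via pairwise comparison (O(P*N) time).
def auroc(labels, scores):
    """labels: 1 for driver, 0 for non-driver; scores: higher means more likely driver."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Illustrative values only.
example = auroc([1, 0, 1, 0], [0.9, 0.8, 0.2, 0.1])
```

For large gene sets a rank-based implementation is faster, but the pairwise form makes the definition explicit.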

Just Published
Gluconeogenesis unraveled: A proteomic Odyssey with machine learning

Gluconeogenesis, the metabolic pathway that produces glucose from non-carbohydrate substrates, is essential for maintaining balanced blood sugar levels during fasting. Predicting gluconeogenesis accurately is important for recognizing metabolic disorders and designing effective treatment strategies. The use of deep learning and machine learning methods to forecast complex biological processes has been gaining popularity in recent years. Understanding both the regulation of the pathway and the possible therapeutic applications of its proteins depends on accurately identifying the proteins associated with gluconeogenesis. This article analyzes the use of machine learning and deep learning models to predict gluconeogenesis efficiency. The study also discusses the challenges posed by restricted data availability and limited model interpretability, as well as possible applications in personalized healthcare, metabolic disease treatment, and drug discovery. The predictor uses statistical moments on the structures of gluconeogenesis and their enzymes, with Random Forest as the classifier. The method was validated using an independent test, a self-consistency test, 10-fold cross-validation, and a jackknife test, which achieved 92.33%, 91.87%, 87.88%, and 87.02%, respectively. Accurate prediction of gluconeogenesis has significant implications for understanding metabolic disorders and developing targeted therapies. This study contributes to the growing field of predictive biology by combining deep learning and machine learning algorithms with the study of metabolic pathways.
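
To make the validation schemes named in this abstract concrete, the sketch below generates train/test index splits for k-fold cross-validation and treats the jackknife test as the special case where every fold holds a single sample (leave-one-out). The function names are hypothetical, and the splits are deterministic with no shuffling or stratification, which is a simplification; the abstract does not say how the folds were formed.

```python
# Minimal sketch of k-fold and jackknife (leave-one-out) index splits.
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, test_indices) pairs."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Assign sample i to fold i % k so fold sizes differ by at most one.
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        splits.append((train, test))
    return splits


def jackknife_indices(n_samples):
    """Leave-one-out: each sample is the test set exactly once."""
    return k_fold_indices(n_samples, n_samples)


ten_fold = k_fold_indices(100, 10)   # 10 splits, 10 test samples each
leave_one_out = jackknife_indices(5)  # 5 splits, 1 test sample each
```

A classifier would be retrained on each `train` list and scored on the matching `test` list, with the per-split scores averaged to give the reported figure.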

Just Published
MVCLST: A spatial transcriptome data analysis pipeline for cell type classification based on multi-view comparative learning

Recent advancements in spatial transcriptomics sequencing technologies can not only provide gene expression within individual cells or cell clusters (spots) in a tissue but also pinpoint the exact location of this expression and generate detailed images of stained tissue sections, which offers invaluable insights into cell type identification and cell function exploration. However, effectively integrating the gene expression data, spatial location information, and tissue images from spatial transcriptomics data presents a significant challenge for computational methods in cell classification. In this work, we propose MVCLST, a multi-view comparative learning method to analyze spatial transcriptomics data for accurate cell type classification. MVCLST constructs two views based on gene expression profiles, cell coordinates and image features. The multi-view method we proposed can significantly enhance the effectiveness of feature extraction while avoiding the impact of erroneous information in organizing image or gene expression data. The model employs four separate encoders to capture shared and unique features within each view. To ensure consistency and facilitate information exchange between the two views, MVCLST incorporates a contrastive learning loss function. The extracted shared and private features from both views are fused using corresponding decoders. Finally, the model utilizes the Leiden algorithm to cluster the learned features for cell type identification. Additionally, we establish a framework called MVCLST-CCFS for spatial transcriptomics data analysis based on MVCLST and consistent clustering. Our method achieves excellent results in clustering on human dorsolateral prefrontal cortex data and the mouse brain tissue data. It also outperforms state-of-the-art techniques in the subsequent search for highly variable genes across cell types on the mouse olfactory bulb data.
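
The abstract does not say which contrastive loss MVCLST uses. As one common possibility, the sketch below implements an InfoNCE-style loss between two views: the embedding of spot i in one view should be more similar to the embedding of the same spot in the other view than to any other spot. The function names, the use of cosine similarity, and the temperature value are assumptions, not details from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss; row i of view_a and row i of view_b are a positive pair."""
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates in the other view.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]  # negative log-softmax of the positive pair
    return total / n


# Toy embeddings for two spots; illustrative values only.
view_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(view_a, [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce(view_a, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is small when matching spots agree across views (`aligned`) and large when they do not (`swapped`), which is the consistency pressure the abstract describes.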

Just Published
Cleaving the way for heterologous peptide production: An overview of cleavage strategies

One of the main bottlenecks for recombinant peptide production is choosing the proper cleavage method to remove fusion protein tags from target peptides. While these tags are crucial for inhibiting the activity of the target peptide during heterologous expression, incorporating a cleavage site is essential for their later removal, ensuring the pure sequencing of the peptide. This review evaluates different cleavage methods, including protease-mediated, self-cleavable protein, and chemical-mediated sites, regarding their advantages and limitations. For instance, intein, Npro EDDIE, enterokinase, factor Xa, SUMO, and CNBr are options for residue-free cleavage. Although protease-mediated cleavage is widely used, it can be expensive, due to its own cost added to the whole process. As an alternative, self-cleavable sites eliminate the requirement for proteinases. Another crucial step in defining the proper cleavage method is cost consideration, which relates to the purpose of peptide production. Here, we explore a range of cleavage approaches, meeting the needs of both cost-constrained applications and a more flexible budget. Overall, selecting the most suitable cleavage method should be based on careful consideration of toxicity, cost, accuracy, and specific application requirements to ensure a state-of-the-art approach.

Just Published
In silico identification of Histone Deacetylase inhibitors using Streamlined Masked Transformer-based Pretrained features

Histone Deacetylases (HDACs) are enzymes that regulate gene expression by removing acetyl groups from histones. They are involved in various diseases, including neurodegenerative, cardiovascular, inflammatory, and metabolic disorders, as well as fibrosis in the liver, lungs, and kidneys. Successfully identifying potent HDAC inhibitors may offer a promising approach to treating these diseases. In addition to experimental techniques, researchers have introduced several in silico methods for identifying HDAC inhibitors. However, these existing computer-aided methods have shortcomings in their modeling stages, which limit their applications. In our study, we present a Streamlined Masked Transformer-based Pretrained (SMTP) encoder, which can be used to generate features for downstream tasks. The training process of the SMTP encoder was directed by masked attention-based learning, enhancing the model's generalizability in encoding molecules. The SMTP features were used to develop 11 classification models identifying 11 HDAC isoforms. We trained SMTP, a lightweight encoder, with only 1.9 million molecules, a smaller number than other known molecular encoders, yet its discriminant ability remains competitive. The results revealed that machine learning models developed using the SMTP feature set outperformed those developed using other feature sets in 8 out of 11 classification tasks. Additionally, chemical diversity analysis confirmed the encoder's effectiveness in distinguishing between two classes of molecules.

Just Published