dsRNAPredictor-II: An improved predictor of identifying dsRNA and its silencing efficiency for Tribolium castaneum based on sequence length distribution

RNA interference (RNAi) has been widely utilized to investigate gene functions and has significant potential for control of pest insects. However, recent studies have revealed that the target insect species, dsRNA molecule length, target genes, and other experimental factors can affect the efficiency of RNAi-mediated control, restricting the further development and application of this technology. Therefore, the aim of this study was to establish a deep learning model using bioinformatics to help researchers identify dsRNA fragments with the highest RNAi efficiency. In this study, we optimized an existing model, namely, dsRNAPredictor, by designing sub-models based on different sequence lengths. Accordingly, the data were divided into two groups: sequences of 130–399 bp and sequences of 400–616 bp. Then, one-hot encoding was employed to extract sequence information. A convolutional neural network comprising three convolutional layers, three average pooling layers, a flatten layer, and three dense layers was employed as the classifier. By adjusting the parameters, we established two sub-models for the two sequence length distributions. Testing on multiple independent datasets showed that our model outperforms dsRNAPredictor, and hypothesis testing indicated that it is also more robust. Therefore, our model may help pre-screen candidate dsRNAs at the design stage and facilitate further research and applications.
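
The length-based split and the one-hot encoding described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `one_hot_encode` and `select_submodel`, the A/C/G/T channel order, and the all-zero handling of unknown bases are hypothetical. Only the two length ranges and the use of one-hot encoding come from the abstract.

```python
# Channel order is an assumption; the abstract does not specify it.
BASES = "ACGT"


def one_hot_encode(seq):
    """Encode a nucleotide sequence as one 4-element row per base.

    U is treated as T, and any other symbol (e.g. N) becomes an
    all-zero row. Both choices are assumptions made for this sketch.
    """
    index = {base: i for i, base in enumerate(BASES)}
    rows = []
    for base in seq.upper().replace("U", "T"):
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


def select_submodel(seq):
    """Pick a sub-model label from the sequence length.

    The ranges 130-399 bp and 400-616 bp are taken from the abstract;
    the labels "short" and "long" are hypothetical.
    """
    n = len(seq)
    if 130 <= n <= 399:
        return "short"
    if 400 <= n <= 616:
        return "long"
    raise ValueError(f"length {n} bp is outside the 130-616 bp range")


if __name__ == "__main__":
    fragment = "ACGU" * 40  # 160 bp toy sequence
    print(select_submodel(fragment), len(one_hot_encode(fragment)))
```

In a full pipeline, the encoded matrix would be passed to whichever convolutional sub-model `select_submodel` names; the network itself is omitted here because the abstract gives no layer sizes.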

Just Published
A heterogeneous graph transformer framework for accurate cancer driver gene prediction and downstream analysis

Accurately predicting cancer driver genes remains a formidable challenge amidst the burgeoning volume and intricacy of cancer genomic data. In this investigation, we propose HGTDG, an innovative heterogeneous graph transformer framework tailored for precisely predicting cancer driver genes and exploring downstream tasks. A heterogeneous graph construction module is central to the framework, which assembles a gene-protein heterogeneous network leveraging the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and protein-protein interactions sourced from the STRING (search tool for recurring instances of neighboring genes) database. Moreover, our framework introduces a pioneering heterogeneous graph transformer module, harnessing multi-head attention mechanisms for nuanced node embedding. This transformative module proficiently captures distinct representations for both nodes and edges, thereby enriching the model's predictive capacity. Subsequently, the generated node embeddings are seamlessly integrated into a classification module, facilitating the discrimination between driver and non-driver genes. Our experimental findings evince the superiority of HGTDG over existing methodologies, as evidenced by the enhanced performance metrics, including the area under the receiver operating characteristic curves (AUROC) and the area under the precision-recall curves (AUPRC). Furthermore, the downstream analysis utilizing the newly identified cancer driver genes underscores the efficacy and versatility of our proposed framework.

Just Published
HistoSPACE: Histology-Inspired Spatial Transcriptome Prediction And Characterization Engine

Spatial transcriptomics (ST) enables the visualization of gene expression within the context of tissue morphology. This emerging discipline has the potential to serve as a foundation for developing tools to design precision medicines. However, due to the higher costs and expertise required for such experiments, its translation into regular clinical practice might be challenging. Although modern deep learning has been applied to enhance the information obtained from histological images, these efforts have been constrained by limitations in the diversity of information. In this paper, we developed a model, HistoSPACE, that explores the diversity of histological images available with ST data to extract molecular insights from tissue images. Further, our approach allows us to link the predicted expression with disease pathology. We built an image encoder derived from a universal image autoencoder. This image encoder was connected to convolution blocks to build the final model, which was then fine-tuned with ST data. The model has a small number of parameters and requires less system memory and relatively less training time, making it lightweight in comparison to traditional histological models. Our model demonstrates significant efficiency compared to contemporary algorithms, achieving a correlation of 0.56 in leave-one-out cross-validation. Finally, its robustness was validated on an independent dataset, where its predictions were consistent with predefined disease pathology. Our code is available at https://github.com/samrat-lab/HistoSPACE.

Just Published
Gluconeogenesis unraveled: A proteomic Odyssey with machine learning

The metabolic pathway known as gluconeogenesis, which produces glucose from non-carbohydrate substrates, is essential for maintaining balanced blood sugar levels while fasting. It is extremely important to anticipate gluconeogenesis rates accurately in order to recognize metabolic disorders and create efficient treatment strategies. The use of deep learning and machine learning methods to forecast complex biological processes has been gaining popularity in recent years. Understanding both the regulation of the pathway and the possible therapeutic applications of proteins depends on accurately identifying their gluconeogenesis patterns. This article analyzes the use of machine learning and deep learning models to predict gluconeogenesis efficiency. The study also discusses the challenges that come with restricted data availability and model interpretability, as well as possible applications in personalized healthcare, metabolic disease treatment, and drug discovery. The predictor uses statistical moments of the structures of gluconeogenesis and their enzymes as features, while Random Forest is used as the classifier to ensure the accuracy of the model. The method was validated using an independent test, a self-consistency test, 10-fold cross-validation, and a jackknife test, which achieved 92.33 %, 91.87 %, 87.88 %, and 87.02 %, respectively. An accurate prediction of gluconeogenesis has significant implications for understanding metabolic disorders and developing targeted therapies. This study contributes to the growing field of predictive biology by combining deep learning and machine learning algorithms with the study of metabolic pathways.

Just Published
MVCLST: A spatial transcriptome data analysis pipeline for cell type classification based on multi-view comparative learning

Recent advancements in spatial transcriptomics sequencing technologies can not only provide gene expression within individual cells or cell clusters (spots) in a tissue but also pinpoint the exact location of this expression and generate detailed images of stained tissue sections, which offers invaluable insights into cell type identification and cell function exploration. However, effectively integrating the gene expression data, spatial location information, and tissue images from spatial transcriptomics data presents a significant challenge for computational methods in cell classification. In this work, we propose MVCLST, a multi-view comparative learning method to analyze spatial transcriptomics data for accurate cell type classification. MVCLST constructs two views based on gene expression profiles, cell coordinates and image features. The multi-view method we proposed can significantly enhance the effectiveness of feature extraction while avoiding the impact of erroneous information in organizing image or gene expression data. The model employs four separate encoders to capture shared and unique features within each view. To ensure consistency and facilitate information exchange between the two views, MVCLST incorporates a contrastive learning loss function. The extracted shared and private features from both views are fused using corresponding decoders. Finally, the model utilizes the Leiden algorithm to cluster the learned features for cell type identification. Additionally, we establish a framework called MVCLST-CCFS for spatial transcriptomics data analysis based on MVCLST and consistent clustering. Our method achieves excellent results in clustering on human dorsolateral prefrontal cortex data and the mouse brain tissue data. It also outperforms state-of-the-art techniques in the subsequent search for highly variable genes across cell types on the mouse olfactory bulb data.
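
The abstract does not say which contrastive loss MVCLST uses. As an illustration of how such a loss can tie two views together, here is a minimal sketch of an InfoNCE-style objective, chosen as an assumed stand-in. The function name `contrastive_loss`, the `temperature` value, and the toy embeddings are all hypothetical. Row i of one view is treated as the positive match for row i of the other view, and all other rows act as negatives.

```python
import math


def _normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def contrastive_loss(view_a, view_b, temperature=0.5):
    """InfoNCE-style loss between two lists of embedding vectors.

    Assumed stand-in for the unspecified loss in the abstract.
    Similarity is cosine similarity divided by `temperature`.
    """
    a = [_normalize(v) for v in view_a]
    b = [_normalize(v) for v in view_b]
    total = 0.0
    for i, a_i in enumerate(a):
        sims = [sum(x * y for x, y in zip(a_i, b_j)) / temperature for b_j in b]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(sims)
        log_denominator = top + math.log(sum(math.exp(s - top) for s in sims))
        total += log_denominator - sims[i]
    return total / len(a)


if __name__ == "__main__":
    spots = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # toy embeddings
    print(contrastive_loss(spots, spots))        # views agree
    print(contrastive_loss(spots, spots[::-1]))  # views mismatched
```

The loss is lower when the two views embed the same spot similarly, which is the consistency property the abstract attributes to its contrastive term.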

Just Published
In silico identification of Histone Deacetylase inhibitors using Streamlined Masked Transformer-based Pretrained features

Histone Deacetylases (HDACs) are enzymes that regulate gene expression by removing acetyl groups from histones. They are involved in various diseases, including neurodegenerative, cardiovascular, inflammatory, and metabolic disorders, as well as fibrosis in the liver, lungs, and kidneys. Successfully identifying potent HDAC inhibitors may offer a promising approach to treating these diseases. In addition to experimental techniques, researchers have introduced several in silico methods for identifying HDAC inhibitors. However, these existing computer-aided methods have shortcomings in their modeling stages, which limit their applications. In our study, we present a Streamlined Masked Transformer-based Pretrained (SMTP) encoder, which can be used to generate features for downstream tasks. The training process of the SMTP encoder was directed by masked attention-based learning, enhancing the model's generalizability in encoding molecules. The SMTP features were used to develop 11 classification models identifying 11 HDAC isoforms. We trained SMTP, a lightweight encoder, with only 1.9 million molecules, a smaller number than other known molecular encoders, yet its discriminant ability remains competitive. The results revealed that machine learning models developed using the SMTP feature set outperformed those developed using other feature sets in 8 out of 11 classification tasks. Additionally, chemical diversity analysis confirmed the encoder's effectiveness in distinguishing between two classes of molecules.

Just Published
A roadmap to cysteine specific labeling of membrane proteins for single-molecule photobleaching studies

Single-molecule photobleaching analysis is a useful approach for quantifying reactive membrane protein oligomerization in membranes. It provides a binary readout of a fluorophore attached to a protein subunit at dilute conditions. However, quantification of protein stoichiometry from these data requires information about the subunit labeling yields and whether there is non-specific background labeling. Any increase in subunit-specific labeling improves the ability to determine oligomeric states with confidence. A common strategy for site-specific labeling is conjugation of a fluorophore bearing a thiol-reactive maleimide group to a substituted cysteine. Yet, cysteine reactivity can be difficult to predict as it depends on many factors such as solvent accessibility and electrostatics from the surrounding protein structure. Here we report a general methodology for screening potential cysteine labeling sites on purified membrane proteins. We present the results of two example systems for which the dimerization reactions in membranes have been characterized: (1) the CLC-ec1 Cl-/H+ antiporter, an Escherichia coli homologue of voltage-gated chloride ion channels in humans, and (2) a mutant form of a member of the family of fluoride channels, Fluc from Bordetella pertussis (Fluc-Bpe-N43S). To demonstrate how we identify such sites, we first discuss considerations of residue positions hypothesized to be suitable and describe the specific steps to rigorously assess site-specific labeling while maintaining functional activity and robust single-molecule fluorescence signals. We find that our initial, well-rationalized choices are not strong predictors of success, as rigorous testing of the labeling sites shows that only ≈30 % of sites end up being useful for single-molecule photobleaching studies.
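
To see why the labeling yield matters for stoichiometry, a simple binomial model is a useful reference point. The sketch below is an illustration, not the authors' analysis: it assumes every subunit is labeled independently with the same probability, ignores the non-specific background labeling that the abstract mentions, and assumes that complexes with no fluorophore are invisible. The function name `step_distribution` is hypothetical.

```python
from math import comb


def step_distribution(n_subunits, labeling_yield):
    """Expected fraction of visible spots with k photobleaching steps.

    Simplified binomial model (an assumption of this sketch):
    independent labeling with probability `labeling_yield`, no
    background labeling, and unlabeled complexes (k = 0) excluded
    because they cannot be observed.
    """
    p = labeling_yield
    if not 0.0 < p <= 1.0:
        raise ValueError("labeling_yield must be in (0, 1]")
    raw = [
        comb(n_subunits, k) * p**k * (1.0 - p) ** (n_subunits - k)
        for k in range(n_subunits + 1)
    ]
    visible = 1.0 - raw[0]
    return {k: raw[k] / visible for k in range(1, n_subunits + 1)}


if __name__ == "__main__":
    for p in (0.3, 0.7, 0.9):
        print(p, step_distribution(2, p))
```

Under this model, a dimer labeled at a low yield shows mostly single-step traces and so resembles a monomer, which is why higher subunit-specific labeling makes oligomeric states easier to distinguish.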

Open Access Just Published