  • Open Access
  • Research Article
  • Cite Count: 5
  • 10.1093/bfgp/elae050
A lossless reference-free sequence compression algorithm leveraging grammatical, statistical, and substitution rules.
  • Jan 8, 2025
  • Briefings in functional genomics
  • Subhankar Roy + 2 more

Deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) sequence compressors for novel species frequently face challenges when processing wide-scale raw, FASTA, or multi-FASTA structured data. For years, molecular sequence databases have favored the widely used general-purpose Gzip and Zstd compressors. The absence of sequence-specific characteristics in these encoders results in subpar performance, and their use depends on time-consuming parameter adjustments. To address these limitations, in this article, we propose a reference-free, lossless sequence compressor called GraSS (Grammatical, Statistical, and Substitution Rule-Based). GraSS compresses sequences more effectively by taking advantage of certain characteristics seen in DNA and RNA sequences. It supports various formats, including raw, FASTA, and multi-FASTA, commonly found in GenBank DNA and RNA files. We evaluate GraSS's performance using ten benchmark DNA sequences with a reduced number of repeats, two highly repetitive RNA sequences, and fifteen raw DNA sequences. Test results indicate that the weighted average compression ratios (WACR) for DNA and RNA sequences are 4.5 and 19.6, respectively. Additionally, the entire DNA sequence corpus has a total compression time (TCT) of 246.8 seconds (s). These results demonstrate that the proposed compression method performs better than several advanced algorithms specifically designed to handle various levels of sequence redundancy. The decompression times, memory usage, and CPU usage are also very competitive. Contact: anirban@klyuniv.ac.in.
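
The abstract reports a weighted average compression ratio (WACR) but does not define it. The sketch below assumes one plausible convention: each file's ratio (original size divided by compressed size) is weighted by its share of the total original size. It uses Python's general-purpose `zlib` as a stand-in compressor on toy sequences, not GraSS, and the function names are hypothetical.

```python
import zlib


def compression_ratio(original_size, compressed_size):
    """Ratio of original to compressed size; larger means better compression."""
    return original_size / compressed_size


def weighted_average_compression_ratio(sizes):
    """Average the per-file ratios, weighting each by its original size.

    `sizes` is a list of (original_bytes, compressed_bytes) pairs.
    ASSUMPTION: this weighting is illustrative; the paper may define WACR differently.
    """
    total_original = sum(original for original, _ in sizes)
    return sum(
        (original / total_original) * compression_ratio(original, compressed)
        for original, compressed in sizes
    )


# Toy sequences (made up for illustration), compressed with zlib at level 9.
sequences = ["ACGT" * 1000, "ACGGTTACGATCAGGCTA" * 50]
sizes = [(len(s), len(zlib.compress(s.encode("ascii"), 9))) for s in sequences]
wacr = weighted_average_compression_ratio(sizes)
print(f"per-file sizes: {sizes}")
print(f"weighted average compression ratio: {wacr:.2f}")
```

Under this convention, large files dominate the average, so a compressor that does well only on small files gains little.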

  • Open Access
  • Research Article
  • Cite Count: 6
  • 10.1093/bfgp/elae051
m6A RNA modification pathway: orchestrating fibrotic mechanisms across multiple organs.
  • Jan 5, 2025
  • Briefings in functional genomics
  • Xiangfei Huang + 7 more

Organ fibrosis, a common consequence of chronic tissue injury, presents a significant health challenge. Recent research has revealed the regulatory role of N6-methyladenosine (m6A) RNA modification in fibrosis of various organs, including the lung, liver, kidney, and heart. In this comprehensive review, we summarize the latest findings on the mechanisms and functions of m6A modification in organ fibrosis. By highlighting the potential of m6A modification as a therapeutic target, our goal is to encourage further research in this emerging field and support advancements in the clinical treatment of organ fibrosis.

  • Open Access
  • Research Article
  • Cite Count: 1
  • 10.1093/bfgp/elae049
Using artificial intelligence and statistics for managing peritoneal metastases from gastrointestinal cancers.
  • Dec 30, 2024
  • Briefings in functional genomics
  • Adam Wojtulewski + 4 more

The primary objective of this study is to investigate various applications of artificial intelligence (AI) and statistical methodologies for analyzing and managing peritoneal metastases (PM) caused by gastrointestinal cancers. Relevant keywords and search criteria were comprehensively researched on PubMed and Google Scholar to identify articles and reviews related to the topic. The AI approaches considered were conventional machine learning (ML) and deep learning (DL) models, and the relevant statistical approaches included biostatistics and logistic models. The systematic literature review yielded nearly 30 articles meeting the predefined criteria. Analyses of these studies showed that AI methodologies consistently outperformed traditional statistical approaches. In the AI approaches, DL consistently produced the most precise results, while classical ML demonstrated varied performance but maintained high predictive accuracy. The sample size was the recurring factor that increased the accuracy of the predictions for models of the same type. AI and statistical approaches can detect PM developing among patients with gastrointestinal cancers. Therefore, if clinicians integrated these approaches into diagnostics and prognostics, they could better analyze and manage PM, enhancing clinical decision-making and patients' outcomes. Collaboration across multiple institutions would also help in standardizing methods for data collection and allowing consistent results.

  • Open Access
  • Research Article
  • Cite Count: 1
  • 10.1093/bfgp/elae048
STLBRF: an improved random forest algorithm based on standardized-threshold for feature screening of gene expression data.
  • Dec 30, 2024
  • Briefings in functional genomics
  • Huini Feng + 4 more

When the traditional random forest (RF) algorithm is used to select feature elements in biostatistical data, a large amount of noise data and parameters can affect the importance of the selected feature elements, making the control of feature selection difficult. Therefore, it is a challenge for the traditional RF algorithm to preserve the accuracy of algorithm results in the presence of noise data. Generally, directly removing noise data can result in significant bias in the results. In this study, we develop a new algorithm, standardized threshold, and loops based random forest (STLBRF), and apply it to the field of gene expression data for feature gene selection. This algorithm, based on the traditional RF algorithm, combines backward elimination and K-fold cross-validation to construct a cyclic system and set a standardized threshold: error increment. The algorithm overcomes the shortcomings of existing gene selection methods. We compare ridge regression, lasso regression, elastic net regression, the traditional RF algorithm, and our improved RF algorithm using three real gene expression datasets and conducting a quantitative analysis. To ensure the reliability of the results, we validate the effectiveness of the genes selected by these methods using the Random Forest classifier. The results indicate that, compared to other methods, the STLBRF algorithm achieves not only higher effectiveness in feature gene selection but also better control over the number of selected genes. Our method offers reliable technical support for feature expression analysis and research on biomarker selection.
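
The loop described above (backward elimination, cross-validated error, and an error-increment threshold) can be sketched generically. This is a minimal illustration, not the STLBRF implementation: `cv_error` and `importance` are hypothetical callables standing in for K-fold cross-validation error and random forest feature importance, and the abstract does not give the exact threshold rule, so a plain "stop when error rises by more than `threshold`" rule is assumed.

```python
def backward_eliminate(features, cv_error, importance, threshold):
    """Drop the least important feature while the error increment stays small.

    cv_error(selected)   -> float, e.g. mean K-fold error (hypothetical callable)
    importance(selected) -> dict mapping feature name to importance score
    threshold            -> largest tolerated rise in error per removal (assumed rule)
    """
    selected = list(features)
    current_error = cv_error(selected)
    while len(selected) > 1:
        scores = importance(selected)
        weakest = min(selected, key=lambda f: scores[f])
        candidate = [f for f in selected if f != weakest]
        candidate_error = cv_error(candidate)
        if candidate_error - current_error > threshold:
            break  # removing this feature costs too much accuracy
        selected, current_error = candidate, candidate_error
    return selected


# Toy stand-ins: g1 and g2 are informative, g3 and g4 are noise.
INFORMATIVE = {"g1", "g2"}
TOY_SCORES = {"g1": 1.0, "g2": 1.0, "g3": 0.1, "g4": 0.05}


def toy_cv_error(selected):
    # Error grows by 0.2 for every informative gene that has been removed.
    return 0.1 + 0.2 * len(INFORMATIVE - set(selected))


def toy_importance(selected):
    return {f: TOY_SCORES[f] for f in selected}


kept = backward_eliminate(["g1", "g2", "g3", "g4"], toy_cv_error, toy_importance, 0.05)
print(kept)
```

In a real pipeline the two callables would wrap a random forest fit inside a K-fold split; they are kept abstract here so the control flow stays visible.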

  • Research Article
  • 10.1093/bfgp/elae047
Discoveries by the genome profiling, symbolic powers of non-next generation sequencing methods.
  • Nov 27, 2024
  • Briefings in functional genomics
  • Koichi Nishigaki

Next-generation sequencing and other sequencing approaches have made significant progress in DNA analysis. However, nonsequencing methods retain indispensable advantages: they are fast, cost-effective, widely applicable, and straightforward. Among the nonsequencing methods, the genome profiling method is worth reviewing because of its high potential. This article first reviews its basic properties, highlights the key concept of species identification dots (spiddos), and then summarizes its various applications.

  • Open Access
  • Research Article
  • Cite Count: 4
  • 10.1093/bfgp/elae046
SAMP: Identifying antimicrobial peptides by an ensemble learning model based on proportionalized split amino acid composition.
  • Nov 22, 2024
  • Briefings in functional genomics
  • Junxi Feng + 7 more

It is projected that 10 million deaths could be attributed to drug-resistant bacterial infections in 2050. Identifying new-generation antibiotics is an effective way to address this concern. Antimicrobial peptides (AMPs), a class of innate immune effectors, have received significant attention for their capacity to eliminate drug-resistant pathogens, including viruses, bacteria, and fungi. Recent years have witnessed widespread applications of computational methods, especially machine learning (ML) and deep learning (DL), for discovering AMPs. However, existing methods only use features including compositional, physicochemical, and structural properties of peptides, which cannot fully capture sequence information from AMPs. Here, we present SAMP, an ensemble random projection (RP) based computational model that leverages a new type of feature called proportionalized split amino acid composition (PSAAC) in addition to conventional sequence-based features for AMP prediction. With this new feature set, SAMP captures the residue patterns like sorting signals at both the N-terminal and the C-terminal, while also retaining the sequence order information from the middle peptide fragments. Benchmarking tests on different balanced and imbalanced datasets demonstrate that SAMP consistently outperforms existing state-of-the-art methods, such as iAMPpred and AMPScanner V2, in terms of accuracy, Matthews correlation coefficient (MCC), G-measure, and F1-score. In addition, by leveraging an ensemble RP architecture, SAMP is scalable to processing large-scale AMP identification with further performance improvement, compared to those models without RP. To facilitate the use of SAMP, we have developed a Python package that is freely available at https://github.com/wan-mlab/SAMP.
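
To make the idea of a split amino acid composition concrete, the sketch below cuts a peptide into N-terminal, middle, and C-terminal fragments whose lengths are proportions of the full length, then computes the amino acid frequencies of each fragment. The 25% fractions, the function names, and the exact splitting rule are assumptions for illustration; this is not the API of the SAMP package and may differ from the authors' PSAAC definition.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(fragment):
    """Frequency of each standard amino acid in a fragment (zeros if empty)."""
    if not fragment:
        return [0.0] * len(AMINO_ACIDS)
    return [fragment.count(aa) / len(fragment) for aa in AMINO_ACIDS]


def split_composition(peptide, n_frac=0.25, c_frac=0.25):
    """Concatenate compositions of the N-terminal, middle, and C-terminal parts.

    n_frac and c_frac are hypothetical proportions of the peptide length.
    Returns a 60-dimensional feature vector (3 fragments x 20 residues).
    """
    peptide = peptide.upper()
    n_len = max(1, round(len(peptide) * n_frac))
    c_len = max(1, round(len(peptide) * c_frac))
    n_term = peptide[:n_len]
    c_term = peptide[len(peptide) - c_len:]
    middle = peptide[n_len:len(peptide) - c_len]
    return composition(n_term) + composition(middle) + composition(c_term)


features = split_composition("AAAAKKKKCCCC")
print(len(features))
```

Because each fragment is described separately, a residue pattern concentrated at one terminus is not averaged away as it would be in a whole-sequence composition.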

  • Open Access
  • Research Article
  • Cite Count: 1
  • 10.1093/bfgp/elae045
Less is more: relative rank is more informative than absolute abundance for compositional NGS data.
  • Nov 20, 2024
  • Briefings in functional genomics
  • Xubin Zheng + 10 more

High-throughput gene expression data have been extensively generated and utilized in biological mechanism investigations, biomarker detection, disease diagnosis and prognosis. These applications encompass not only bulk transcriptome, but also single cell RNA-seq data. However, extracting reliable biological information from transcriptome data remains challenging due to the constraints of Compositional Data Analysis. Current data preprocessing methods, including dataset normalization and batch effect correction, are insufficient to address these issues and improve data quality for downstream analysis. Alternatively, qualification methods focusing on the relative order of gene expression (ROGER) are more informative than the quantification methods that rely on gene expression abundance. The Pairwise Analysis of Gene expression method is an enhancement of ROGER, designed for data integration in either sample space or feature space. In this review, we summarize the methods applied to transcriptome data analysis and discuss their potential in predicting clinical outcomes.
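
A small sketch of why relative order can be more robust than absolute abundance: within-sample ranks do not change when every count in the sample is multiplied by the same factor, as happens with differing sequencing depth. The function below is a generic average-rank transform with made-up values, not an implementation of ROGER or the Pairwise Analysis of Gene expression method.

```python
def rank_transform(values):
    """Return within-sample ranks (1 = lowest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


# Hypothetical expression values for four genes in one sample.
sample = [10.0, 200.0, 35.0, 35.0]
deeper = [v * 3 for v in sample]  # same sample at three times the depth

print(rank_transform(sample))
print(rank_transform(deeper) == rank_transform(sample))
```

The abundances differ threefold between the two vectors, yet the rank vectors are identical, which is the property rank-based methods rely on.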

  • Open Access
  • Research Article
  • Cite Count: 3
  • 10.1093/bfgp/elae043
DeepMEns: an ensemble model for predicting sgRNA on-target activity based on multiple features.
  • Nov 11, 2024
  • Briefings in functional genomics
  • Shumei Ding + 2 more

The CRISPR/Cas9 system developed from Streptococcus pyogenes (SpCas9) has high potential in gene editing. However, its successful application is hindered by the considerable variability in target efficiencies across different single guide RNAs (sgRNAs). Although several deep learning models have been created to predict sgRNA on-target activity, the intrinsic mechanisms of these models are difficult to explain, and there is still scope for improvement in prediction performance. To overcome these issues, we propose an ensemble interpretable model termed DeepMEns based on deep learning to predict sgRNA on-target activity. By using five different training and validation datasets, we constructed five sub-regressors, each comprising three parts. The first part uses one-hot encoding, wherein a 0-1 representation of the secondary structure is used as the input to the convolutional neural network (CNN) with Transformer encoder. The second part uses the DNA shape feature matrix as the input to the CNN with Transformer encoder. The third part uses positional encoding feature matrices as the proposed input into a long short-term memory network with an attention mechanism. These three parts are concatenated through the flattened layer, and the final prediction result is the average of the five sub-regressors. Extensive benchmarking experiments indicated that DeepMEns achieved the highest Spearman correlation coefficient for 6 of 10 independent test datasets as compared to previous predictors. This finding confirmed that DeepMEns can accomplish state-of-the-art performance. Moreover, the ablation analysis also indicated that the ensemble strategy may improve the performance of the prediction model.
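
Two steps named in the abstract are simple enough to sketch: one-hot encoding of a nucleotide sequence and averaging the outputs of several sub-regressors. The code below is an illustration under stated assumptions, not the DeepMEns architecture: the sub-regressors are hypothetical constant functions standing in for the trained CNN, Transformer, and LSTM components, and the guide sequence is made up.

```python
BASES = "ACGT"


def one_hot(sequence):
    """Encode a nucleotide sequence as a list of 4-element 0/1 rows (A, C, G, T)."""
    return [[1 if base == b else 0 for b in BASES] for base in sequence.upper()]


def ensemble_predict(sub_regressors, features):
    """Average the predictions of all sub-regressors for one input."""
    predictions = [regressor(features) for regressor in sub_regressors]
    return sum(predictions) / len(predictions)


# Hypothetical stand-ins for five trained sub-regressors.
sub_regressors = [lambda x, s=score: s for score in (0.1, 0.2, 0.3, 0.4, 0.5)]

encoded = one_hot("ACGTTGCAAGGCTTACGGAT")  # made-up 20-nt guide sequence
activity = ensemble_predict(sub_regressors, encoded)
print(len(encoded), len(encoded[0]))
print(round(activity, 3))
```

Averaging smooths out the idiosyncrasies of sub-regressors trained on different data splits, which is consistent with the ablation result the abstract mentions.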

  • Open Access
  • Research Article
  • Cite Count: 1
  • 10.1093/bfgp/elae044
Application of computational algorithms for single-cell RNA-seq and ATAC-seq in neurodegenerative diseases.
  • Nov 5, 2024
  • Briefings in functional genomics
  • Hwisoo Choi + 4 more

Recent advancements in single-cell technologies, including single-cell RNA sequencing (scRNA-seq) and Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq), have greatly improved our insight into the epigenomic landscapes across various biological contexts and diseases. This paper reviews key computational tools and machine learning approaches that integrate scRNA-seq and scATAC-seq data to facilitate the alignment of transcriptomic data with chromatin accessibility profiles. Applying these integrated single-cell technologies in neurodegenerative diseases, such as Alzheimer's disease and Parkinson's disease, reveals how changes in chromatin accessibility and gene expression can illuminate pathogenic mechanisms and identify potential therapeutic targets. Despite facing challenges like data sparsity and computational demands, ongoing enhancements in scATAC-seq and scRNA-seq technologies, along with better analytical methods, continue to expand their applications. These advancements promise to revolutionize our approach to medical research and clinical diagnostics, offering a comprehensive view of cellular function and disease pathology.

  • Research Article
  • Cite Count: 1
  • 10.1093/bfgp/elae041
The frontier of precision medicine: application of single-cell multi-omics in preimplantation genetic diagnosis.
  • Nov 1, 2024
  • Briefings in functional genomics
  • Jinglei Zhang + 3 more

The advent of single-cell multi-omics technologies has revolutionized the landscape of preimplantation genetic diagnosis (PGD), offering unprecedented insights into the genetic, transcriptomic, and proteomic profiles of individual cells in early-stage embryos. This breakthrough holds the promise of enhancing the accuracy, efficiency, and scope of PGD, thereby significantly improving outcomes in assisted reproductive technologies (ARTs) and genetic disease prevention. This review provides a comprehensive overview of the importance of PGD in the context of precision medicine and elucidates how single-cell multi-omics technologies have transformed this field. We begin with a brief history of PGD, highlighting its evolution and application in detecting genetic disorders and facilitating ART. Subsequently, we delve into the principles, methodologies, and applications of single-cell genomics, transcriptomics, and proteomics in PGD, emphasizing their role in improving diagnostic precision and efficiency. Furthermore, we review significant recent advances within this domain, including key experimental designs, findings, and their implications for PGD practices. The advantages and limitations of these studies are analyzed to assess their potential impact on the future development of PGD technologies. Looking forward, we discuss the emerging research directions and challenges, focusing on technological advancements, new application areas, and strategies to overcome existing limitations. In conclusion, this review underscores the pivotal role of single-cell multi-omics in PGD, highlighting its potential to drive the progress of precision medicine and personalized treatment strategies, thereby marking a new era in reproductive genetics and healthcare.