Biological Sequences Research Articles

Chaos game representation (CGR) has been applied in bioinformatics for over 30 years, and many extensions have been proposed in that time. Numerical encoding of biological sequences is especially convenient for visualisation, alignment-free methods and the preparation of input for machine learning techniques. The development and applications of CGR have mainly concerned linear nucleotide sequences, although there have also been attempts to create a representation of proteins. The latter needs to be more sophisticated, because arbitrary coordinates for amino acids do not reflect their properties, which is crucial during the encoding process. In this paper, we summarise the variants of CGR and their limitations. We began by studying the PROSITE motifs and showed the immense number of amino acid properties employed by different proteins. To this end, we used Principal Component Analysis (PCA) and studied the relation between the explained variance and the number of features that describe the amino acids. It appeared that, even after many reductions, about 50 features are non-redundant. For this reason, we introduced the embedding concept from natural language processing, which enables features to be adjusted for a given list of sequences. We presented a simple neural network architecture with one hidden layer containing a single neuron and showed that it provides satisfactory results in phylogenetic tree construction for the ND5 and SPARC proteins. For the tree construction, we transformed the CGR representations of all considered sequences using the Discrete Fourier Transform (DFT) and applied the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) algorithm. Moreover, we indicated some similarities between CGR and Recurrent Neural Networks (RNNs). Finally, we attempted to include information about the RNA secondary structure and defined some measures to validate biological significance. We studied the properties of these measures and demonstrated their usefulness on the ALMV-3 example.
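For readers unfamiliar with the basic construction, the following is a minimal sketch of the classic CGR for a nucleotide sequence, not the authors' implementation. It assumes the unit square with one nucleotide per corner; the corner assignment, the starting point at the centre and the function name `cgr_points` are illustrative choices that the abstract does not specify. Each symbol moves the current point halfway towards that symbol's corner.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Assumed corner assignment (one common convention); the abstract does not fix it.
CORNERS: Dict[str, Point] = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_points(seq: str, start: Point = (0.5, 0.5)) -> List[Point]:
    """Return the CGR trajectory of a nucleotide sequence (hypothetical helper).

    Starting from `start`, each nucleotide moves the current point
    halfway towards the corner assigned to that nucleotide.
    Symbols without a corner (e.g. N) are skipped.
    """
    x, y = start
    points: List[Point] = []
    for base in seq.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue  # ambiguous or unknown symbol: no move is made
        x = (x + corner[0]) / 2.0
        y = (y + corner[1]) / 2.0
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in cgr_points("ACGT"):
        print(point)
```

The resulting list of coordinates is the numerical encoding of the sequence; a protein variant would need a different, property-aware assignment of coordinates, which is the difficulty the abstract describes.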

Background: Merino sheep exhibit high wool production and excellent wool quality. The fleece of Merino sheep is predominantly composed of wool fibers grown from hair follicles (HFs). The HF is a complex biological system involved in a dynamic process governed by gene regulation, and gene expression is in turn regulated by microRNAs (miRNAs). miRNAs inhibit gene expression posttranscriptionally by binding specifically to target messenger RNAs (mRNAs) and play an important role in regulating gene expression, the cell cycle and biological development sequences. The purpose of this study was to examine mRNA and miRNA binding to identify key miRNAs and target genes related to HF development. This will provide new and important insights into the fundamental mechanisms that regulate cellular activity and cell fate decisions within and outside of the skin.

Results: We analyzed miRNA data from skin tissues collected from 18 Merino sheep on four embryonic days (E65, E85, E105 and E135) and two postnatal days (D7 and D30) and identified 87 differentially expressed miRNAs (DE-miRNAs). These six stages were further divided into two longer developmental stages based on heatmap cluster analysis, and the results showed that the differentially expressed mRNAs (DE-mRNAs) in Stage A were closely related to HF morphogenesis. A coanalysis of Stage A DE-mRNAs and DE-miRNAs revealed that 9 DE-miRNAs and 17 DE-mRNAs presented targeting relationships in Stage A. We found that miR-23b and miR-133 could target and regulate ACVR1B and WNT10A. In dermal fibroblasts, the overexpression of miR-133 significantly reduced the mRNA and protein expression levels of ACVR1B, and the overexpression of miR-23b significantly reduced the mRNA and protein expression levels of WNT10A.

Conclusion: This study provides a new reference for understanding the molecular basis of HF development and lays a foundation for further improving sheep HF breeding.

miRNAs and target genes related to hair follicle development were identified, providing a theoretical basis for the molecular breeding of fine-wool sheep.

Articles published on Biological Sequences

Growth Factors: Key Biological Mediators Involved in Periodontal Regeneration

Remediation of arsenic-containing ferrihydrite in soil using iron- and sulfate-reducing bacteria: Implications for microbially-assisted clean technology

Management of DNA reference libraries for barcoding and metabarcoding studies with the R package refdb.

Multifarious aspects of the chaos game representation and its applications in biological sequence analysis

Integrated analysis of miRNAs and mRNA profiling reveals the potential roles of miRNAs in sheep hair follicle development

Protein language models trained on multiple sequence alignments learn phylogenetic relationships

The dynseq browser track shows context-specific features at nucleotide resolution.

Relation is an option for processing context information.

Function-based classification of hazardous biological sequences: Demonstration of a new paradigm for biohazard assessments.

SODA: a TypeScript/JavaScript library for visualizing biological sequence annotation.

E. coli 6S RNA complexed to RNA polymerase maintains product RNA synthesis at low cellular ATP levels by initiation with noncanonical initiator nucleotides.

Information Theory for Biological Sequence Classification: A Novel Feature Extraction Technique Based on Tsallis Entropy

Improving language model of human genome for DNA-protein binding prediction based on task-specific pre-training.

ENA Source Attribute Helper: An Application Programming Interface to facilitate accurate reference to biological source data

Differentiable Learning of Sequence-Specific Minimizer Schemes with DeepMinimizer.

Morganella Phage Mecenats66 Utilizes an Evolutionarily Distinct Subtype of Headful Genome Packaging with a Preferred Packaging Initiation Site.

Integrating temporal single-cell gene expression modalities for trajectory inference and disease prediction

A survey on improving pattern matching algorithms for biological sequences

Developmental Trajectories of Student Self-Perception over a Yearlong Introductory Biology Sequence.

SEMgraph: an R package for causal network inference of high-throughput data with structural equation models.
