Correction to: Classification of the human THAP protein family identifies an evolutionarily conserved coiled coil region

Similar Papers
  • Research Article
  • Cite count: 19
  • 10.1186/s12900-019-0102-2
Classification of the human THAP protein family identifies an evolutionarily conserved coiled coil region
  • Mar 5, 2019
  • BMC Structural Biology
  • Hiral M Sanghavi + 2 more

Background: The THAP (Thanatos Associated Proteins) protein family in humans is implicated in various important cellular processes such as epigenetic regulation, maintenance of pluripotency, and transposition, and in disorders such as cancers and hemophilia. The human THAP protein family, which consists of twelve members of different lengths, has a well-characterized amino-terminal, zinc-coordinating, DNA-binding domain called the THAP domain. However, the carboxy terminus of most THAP proteins is yet to be structurally characterized. A coiled coil region is known to help in protein oligomerization in THAP1 and THAP11. It is not known if other human THAP proteins oligomerize. We have used bioinformatic tools to explore the possibility of dimerization of THAP proteins via a coiled coil region. Results: Classification of human THAP proteins into three size-based groups led to the identification of an evolutionarily conserved alpha helical region, downstream of the amino-terminal THAP domain. Secondary structure predictions, alpha helical wheel plots and protein models demonstrated the strong possibility of coiled coil formation in this conserved, leucine-rich region of all THAP proteins except THAP10. Conclusions: The identification of a predicted oligomerization region in the human THAP protein family opens new directions to investigate the members of this protein family.
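The heptad logic behind a helical wheel plot can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: it assumes an ideal alpha helix (100 degrees of rotation per residue) and a caller-supplied heptad register, and the function names and the toy sequence in the usage note are hypothetical.

```python
# Minimal sketch: heptad positions and helical wheel angles for a peptide.
# Assumptions (not from the paper): ideal alpha helix geometry and a known
# heptad register; real coiled coil predictors infer the register themselves.

HEPTAD = "abcdefg"
DEGREES_PER_RESIDUE = 100  # 360 degrees / 3.6 residues per turn


def helical_wheel(sequence, start_register="a"):
    """Return (residue, heptad position, wheel angle in degrees) per residue."""
    offset = HEPTAD.index(start_register)
    rows = []
    for i, residue in enumerate(sequence):
        angle = (i * DEGREES_PER_RESIDUE) % 360
        position = HEPTAD[(i + offset) % 7]
        rows.append((residue, position, angle))
    return rows


def core_leucine_fraction(sequence, start_register="a"):
    """Fraction of the 'a' and 'd' (hydrophobic core) positions that are leucine."""
    core = [res for res, pos, _ in helical_wheel(sequence, start_register)
            if pos in "ad"]
    if not core:
        return 0.0
    return sum(res == "L" for res in core) / len(core)
```

For example, calling `core_leucine_fraction("LAALAAA" * 3)` on this made-up repeat places a leucine at every `a` and `d` position. A high fraction at these positions is the kind of leucine-rich pattern that a wheel plot makes visible, although it does not by itself demonstrate coiled coil formation.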

  • Research Article
  • Cite count: 95
  • 10.1016/j.jmb.2006.04.024
From the Similarity Analysis of Protein Cavities to the Functional Classification of Protein Families Using Cavbase
  • Apr 25, 2006
  • Journal of Molecular Biology
  • Daniel Kuhn + 4 more

  • Research Article
  • Cite count: 718
  • 10.1128/mcb.21.21.7117-7136.2001
Structural and evolutionary relationships among protein tyrosine phosphatase domains.
  • Nov 1, 2001
  • Molecular and Cellular Biology
  • Jannik N Andersen + 9 more

With the current access to the whole genomes of various organisms and the completion of the first draft of the human genome, there is a strong need for a structure-function classification of protein families as an initial step in moving from DNA databases to a comprehensive understanding of human biology. As a result of the explosion in nucleic acid sequence information and the concurrent development of methods for high-throughput functional characterization of gene products, the genomic revolution also promises to provide a new paradigm for drug discovery, enabling the identification of molecular drug targets in a significant number of human diseases. This molecular view of diseases has contributed to the importance of combining primary sequence data with three-dimensional structure and has increased the awareness of computational homology modeling and its potential to elucidate protein function. In particular, when important proteins or novel therapeutic targets are identified—like the family of protein tyrosine phosphatases (PTPs) (reviewed in reference 53)—a structure-function classification of such protein families becomes an invaluable framework for further advances in biomedical science. Here, we present a comparative analysis of the structural relationships among vertebrate PTP domains and provide a comprehensive resource for sequence analysis of phosphotyrosine-specific PTPs.

  • Research Article
  • Cite count: 20
  • 10.1371/journal.pone.0258625
An improved deep learning model for hierarchical classification of protein families.
  • Oct 20, 2021
  • PLOS ONE
  • Pahalage Dhanushka Sandaruwan + 1 more

Although genes carry information, proteins play the main role in providing all the functionalities of a living organism. Large numbers of different proteins are involved in every function that occurs in a cell. These amino acid sequences can be hierarchically classified into a set of families and subfamilies depending on their evolutionary relatedness and similarities in their structure or function. Protein characterization to identify protein structure and function is done accurately using laboratory experiments. With the rapidly increasing number of novel protein sequences, these experiments have become difficult to carry out since they are expensive, time-consuming, and laborious. Therefore, many computational classification methods have been introduced to classify proteins and predict their functional properties. As the performance of computational techniques has improved, deep learning has come to play a key role in many areas. Novel deep learning models such as DeepFam and ProtCNN have recently been presented to classify proteins into their families. However, these deep learning models have been used to carry out the non-hierarchical classification of proteins. In this research, we propose a deep learning neural network model named DeepHiFam with high accuracy to classify proteins hierarchically into different levels simultaneously. The model achieved an accuracy of 98.38% for protein family classification and more than 80% accuracy for the classification of protein subfamilies and sub-subfamilies. Further, DeepHiFam performed well in the non-hierarchical classification of protein families and achieved an accuracy of 98.62% and 96.14% for the popular Pfam dataset and COG dataset respectively.

  • Research Article
  • Cite count: 54
  • 10.1007/s004220050357
Classification of protein families and detection of the determinant residues with an improved self-organizing map.
  • Jul 22, 1997
  • Biological cybernetics
  • Miguel A Andrade + 3 more

Using a SOM (self-organizing map) we can classify sequences within a protein family into subgroups that generally correspond to biological subcategories. These maps tend to show sequence similarity as proximity in the map. By combining maps generated at different levels of resolution, the structure of relations in protein families can be captured that could not otherwise be represented in a single map. The underlying representation of maps enables us to retrieve characteristic sequence patterns for individual subgroups of sequences. Such patterns tend to correspond to functionally important regions. We present a modified SOM algorithm that includes a convergence test that dynamically controls the learning parameters to adapt them to the learning set instead of being fixed and externally optimized by trial and error. Given the variability of protein family size and distribution, the addition of this feature is necessary. The method is successfully tested with a number of families. The rab family of small GTPases is used to illustrate the performance of the method.

  • Research Article
  • Cite count: 59
  • 10.1093/nar/gks1157
HAMAP in 2013, new developments in the protein family classification and annotation system
  • Nov 26, 2012
  • Nucleic Acids Research
  • Ivo Pedruzzi + 12 more

HAMAP (High-quality Automated and Manual Annotation of Proteins—available at http://hamap.expasy.org/) is a system for the classification and annotation of protein sequences. It consists of a collection of manually curated family profiles for protein classification, and associated annotation rules that specify annotations that apply to family members. HAMAP was originally developed to support the manual curation of UniProtKB/Swiss-Prot records describing microbial proteins. Here we describe new developments in HAMAP, including the extension of HAMAP to eukaryotic proteins, the use of HAMAP in the automated annotation of UniProtKB/TrEMBL, providing high-quality annotation for millions of protein sequences, and the future integration of HAMAP into a unified system for UniProtKB annotation, UniRule. HAMAP is continuously updated by expert curators with new family profiles and annotation rules as new protein families are characterized. The collection of HAMAP family classification profiles and annotation rules can be browsed and viewed on the HAMAP website, which also provides an interface to scan user sequences against HAMAP profiles.

  • Research Article
  • Cite count: 29
  • 10.1109/tcbb.2020.2966633
Protein Family Classification from Scratch: A CNN Based Deep Learning Approach.
  • Jan 16, 2020
  • IEEE/ACM Transactions on Computational Biology and Bioinformatics
  • Da Zhang + 1 more

Next-generation sequencing techniques provide us with an opportunity for generating sequenced proteins and identifying the biological families and functions of these proteins. However, compared with identified proteins, uncharacterized proteins constitute a notable percentage of the overall proteins in the bioinformatics research field. Traditional family classification methods often devote themselves to extracting N-Gram features from sequences while ignoring motif information as well as affinity information between motifs and adjacent amino acids. Previous clustering-based algorithms have typically been used to define protein features with domain knowledge and annotate protein families based on extensive data samples. In this paper, we apply CNN-based amino acid representation learning with limited characterized proteins to explore the performance of annotated protein families by taking into account the amino acid location information. Additionally, we apply the method to all reviewed protein sequences with their families retrieved from the UniProt database to evaluate our approach. Last but not least, we verify our model using unreviewed protein records, which are typically ignored by other methods.

  • Conference Article
  • Cite count: 2
  • 10.1109/iacc.2016.57
Understanding of Genetic Code Degeneracy and New Way of Classifying of Protein Family: A Mathematical Approach
  • Feb 1, 2016
  • Jayanta Kumar Das + 3 more

The genetic code is the set of rules by which information encoded in genetic material (DNA or RNA sequences) is translated into proteins (amino acid sequences) by living cells. The code defines a mapping between tri-nucleotide sequences, called codons, and amino acids. Since there are 20 amino acids and 64 possible tri-nucleotide sequences, more than one of these 64 triplets can code for a single amino acid, which gives rise to the problem of degeneracy. This manuscript explains the underlying logic of the degeneracy of the genetic code from a mathematical point of view using a parameter named Impression. Classification of protein families is also a long-standing problem in the fields of biochemistry and genomics. Proteins belonging to a particular class have some similar biochemical properties which are of utmost importance for new drug design. Using the same parameter Impression and graph theoretic properties, we have also devised a new way of classifying a protein family.
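The 64-to-20 mapping described here can be made concrete with a short Python sketch that counts how many codons encode each amino acid. It uses the standard genetic code (NCBI translation table 1) written in the conventional T, C, A, G base order. It does not compute the paper's "Impression" parameter, and the variable and function names are illustrative assumptions.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), with codons enumerated
# in T, C, A, G order for each of the three positions. '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 DNA codons to its one-letter amino acid (or stop).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Degeneracy: number of codons that encode each amino acid (or stop).
DEGENERACY = Counter(CODON_TABLE.values())


def codons_for(amino_acid):
    """Return the sorted list of codons that encode the given amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)
```

Inspecting `DEGENERACY` shows the uneven spread behind the degeneracy problem: leucine, serine and arginine are each encoded by six codons, while methionine and tryptophan have a single codon each.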

  • Book Chapter
  • Cite count: 2
  • 10.1016/s0076-6879(05)03002-8
Use of Search Algorithms to Define Specificity in Rab GTPase Domain Function
  • Jan 1, 2005
  • Methods in Enzymology
  • Maria Nussbaum + 1 more

  • Research Article
  • Cite count: 44
  • 10.1038/srep42571
Phylogenetic analysis of eukaryotic NEET proteins uncovers a link between a key gene duplication event and the evolution of vertebrates
  • Feb 16, 2017
  • Scientific Reports
  • Madhuri A Inupakutika + 7 more

NEET proteins belong to a unique family of iron-sulfur proteins in which the 2Fe-2S cluster is coordinated by a CDGSH domain that is followed by the “NEET” motif. They are involved in the regulation of iron and reactive oxygen metabolism, and have been associated with the progression of diabetes, cancer, aging and neurodegenerative diseases. Despite their important biological functions, the evolution and diversification of eukaryotic NEET proteins are largely unknown. Here we used the three members of the human NEET protein family (CISD1, mitoNEET; CISD2, NAF-1 or Miner 1; and CISD3, Miner2) as our guides to conduct a phylogenetic analysis of eukaryotic NEET proteins and their evolution. Our findings identified the slime mold Dictyostelium discoideum’s CISD proteins as the closest to the ancient archetype of eukaryotic NEET proteins. We further identified CISD3 homologs in fungi that were previously reported not to contain any NEET proteins, and revealed that plants lack homolog(s) of CISD3. Furthermore, our study suggests that the mammalian NEET proteins, mitoNEET (CISD1) and NAF-1 (CISD2), emerged via gene duplication around the origin of vertebrates. Our findings provide new insights into the classification and expansion of the NEET protein family, as well as offer clues to the diverged functions of the human mitoNEET and NAF-1 proteins.

  • Book Chapter
  • Cite count: 3
  • 10.1007/11532323_20
Sequence Motif Identification and Protein Family Classification Using Probabilistic Trees
  • Jan 1, 2005
  • Florencia Leonardi + 1 more

Efficient family classification of newly discovered protein sequences is a central problem in bioinformatics. We present a new algorithm, using Probabilistic Suffix Trees, which identifies equivalences between the amino acids in different positions of a motif for each family. We also show that better classification can be achieved identifying representative fingerprints in the amino acid chains.

  • Research Article
  • Cite count: 4
  • 10.1007/s11033-023-09064-2
Real-time expression and in silico characterization of pea genes involved in salt and water-deficit stress.
  • Dec 15, 2023
  • Molecular biology reports
  • Muhammad Farooq + 8 more

To tolerate salt and water-deficit stress, the plant adapts to the adverse environment by regulating its metabolism and expressing certain stress-induced metabolic pathways. This research analyzed the relative expression of four pea genes (P5CR, PAL1, SOD, and POX) in three pea varieties (Climax, Green grass, and Meteor) under different levels of salt and water-deficit stress. The experiments on salt stress and water-deficit stress were carried out in a greenhouse under a controlled environment. The saturation percentage was employed to create artificial salinity conditions: Control without NaCl treatment, Treatment 1: 50 mM NaCl treatment, Treatment 2: 75 mM NaCl treatment, and Treatment 3: 100 mM NaCl treatment. Field capacity (FC) was used for the development of artificial water-deficit treatments in the pots, i.e., Treatment 1 (Control; water application 100% of FC), Treatment 2 (water application 75% of FC), and Treatment 3 (water application 50% of FC). Pea genes involved in the biosynthetic pathways of proline, flavonoids, and antioxidant enzymes, including P5CR, PAL1, SOD, and POX, were selected based on the literature. Quantitative real-time PCR using cDNA as a template was used to analyze the gene expression. Phylogenetic analysis of the pea genes was performed against closely related crops with a percent identity of 80% and above. In silico characterization of selected proteins, including the family classification, was done with the NCBI CDD and INTERPRO online servers. Results from RT-qPCR analysis showed increased expression of P5CR, PAL1, and POX genes, while SOD gene expression decreased under both stresses. Climax exhibited superior stress tolerance with elevated expression of P5CR and PAL1, while Meteor showed better tolerance through increased POX expression. Phylogenetic analysis revealed common ancestry with other species like chickpea, red clover, mung bean, and barrel clover, suggesting the cross relationship among these plant species. Conserved domain analysis of the respective proteins revealed that these proteins contain PLN02688, PLN02457, Cu-Zn Superoxide dismutase, and secretory peroxidase conserved domains. Furthermore, protein family classification indicated that the oxidation-reduction process is the most common chemical process involved in the stresses applied to the pea plant, which validates the relationship of these proteins. Salt and water-deficit stresses trigger distinct metabolic pathways, leading to the up-regulation of specific genes and the synthesis of corresponding proteins. These findings further emphasize the conservation of stress-tolerance-related genes and proteins across various plant species. This knowledge enhances our understanding of plant adaptation to stress and offers opportunities for developing strategies to improve stress resilience in crops, thereby addressing global food security challenges.

  • Book Chapter
  • 10.1201/9781003527442-18
Classification of Protein Family from the Protein Sequence Using Machine Learning
  • Jun 20, 2024
  • R Priyanka + 1 more

  • Research Article
  • Cite count: 316
  • 10.1093/nar/gkt1208
MMDB and VAST+: tracking structural similarities between macromolecular complexes
  • Dec 6, 2013
  • Nucleic Acids Research
  • Thomas Madej + 6 more

The computational detection of similarities between protein 3D structures has become an indispensable tool for the detection of homologous relationships, the classification of protein families and functional inference. Consequently, numerous algorithms have been developed that facilitate structure comparison, including rapid searches against a steadily growing collection of protein structures. To this end, NCBI’s Molecular Modeling Database (MMDB), which is based on the Protein Data Bank (PDB), maintains a comprehensive and up-to-date archive of protein structure similarities computed with the Vector Alignment Search Tool (VAST). These similarities have been recorded on the level of single proteins and protein domains, comprising in excess of 1.5 billion pairwise alignments. Here we present VAST+, an extension to the existing VAST service, which summarizes and presents structural similarity on the level of biological assemblies or macromolecular complexes. VAST+ simplifies structure neighboring results and shows, for macromolecular complexes tracked in MMDB, lists of similar complexes ranked by the extent of similarity. VAST+ replaces the previous VAST service as the default presentation of structure neighboring data in NCBI’s Entrez query and retrieval system. MMDB and VAST+ can be accessed via http://www.ncbi.nlm.nih.gov/Structure.

  • Research Article
  • Cite count: 10
  • 10.3390/ijms24043507
Structural and Functional Annotation of Hypothetical Proteins from the Microsporidia Species Vittaforma corneae ATCC 50505 Using in silico Approaches
  • Feb 9, 2023
  • International Journal of Molecular Sciences
  • Lilian Mbaisi Ang’Ang’O + 2 more

Microsporidia are spore-forming eukaryotes that are related to fungi but have unique traits that set them apart. They have compact genomes as a result of evolutionary gene loss associated with their complete dependency on hosts for survival. Despite having a relatively small number of genes, a disproportionately high percentage of the genes in microsporidia genomes code for proteins whose functions remain unknown (hypothetical proteins—HPs). Computational annotation of HPs has become a more efficient and cost-effective alternative to experimental investigation. This research developed a robust bioinformatics annotation pipeline of HPs from Vittaforma corneae, a clinically important microsporidian that causes ocular infections in immunocompromised individuals. Here, we describe various steps to retrieve sequences and homologs and to carry out physicochemical characterization, protein family classification, identification of motifs and domains, protein–protein interaction network analysis, and homology modelling using a variety of online resources. Classification of protein families produced consistent findings across platforms, demonstrating the accuracy of annotation utilizing in silico methods. A total of 162 out of 2034 HPs were fully annotated, with the bulk of them categorized as binding proteins, enzymes, or regulatory proteins. The protein functions of several HPs from Vittaforma corneae were accurately inferred. This improved our understanding of microsporidian HPs despite challenges related to the obligate nature of microsporidia, the absence of fully characterized genes, and the lack of homologous genes in other systems.
