Abstract

Background

Studying genetic variation distribution in proteins containing charged regions, called charge clusters (CCs), is of great interest to unravel their functional role. Charge clusters are 20 to 75 residue segments with high net positive charge, high net negative charge, or high total charge relative to the overall charge composition of the protein. We previously developed a bioinformatics tool (FCCP) to detect charge clusters in proteomes and scanned the human proteome for the occurrence of CCs. In this paper we investigate the genetic variations in the human proteins harbouring CCs.

Results

We studied the coding regions of 317 positively charged clusters and 1020 negatively charged ones previously detected in human proteins. Results revealed that coding parts of CCs are richer in sequence variants than their corresponding genes, full mRNAs, and exonic + intronic sequences, and that these variants are predominantly rare (minor allele frequency < 0.005). Furthermore, variants occurring in the coding parts of positively charged regions of proteins are more often pathogenic than those occurring in negatively charged ones. Classification of variants by type showed that substitutions are the major type, followed by indels (insertions-deletions). Among substitutions, within clusters of both charges, charged amino acids were the group that lost the most residues, whereas polar residues were the group that gained the most.

Conclusions

Our findings highlight the prominent features of the human charged regions, from the DNA up to the protein sequence, which might provide clues to improve the current understanding of those charged regions and their implication in the emergence of diseases.
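To make the two variant summaries in the Results concrete, the following is a minimal sketch of how a rare-variant flag and a gain/loss tally per residue class could be computed. Only the 0.005 frequency threshold comes from the abstract. The function names, the input format, and above all the grouping of amino acids into "charged", "polar" and "nonpolar" are assumptions made for illustration; the grouping used in the study is not stated here and may differ.

```python
from collections import Counter

RARE_MAF = 0.005  # minor allele frequency threshold quoted in the abstract

# Assumed residue classes (one common convention, not necessarily the study's).
CHARGED = set("DEKR")
POLAR = set("STNQCYH")


def residue_class(aa):
    """Return the assumed class of a one-letter amino acid code."""
    if aa in CHARGED:
        return "charged"
    if aa in POLAR:
        return "polar"
    return "nonpolar"


def is_rare(maf, threshold=RARE_MAF):
    """A variant counts as rare when its MAF is strictly below the threshold."""
    return maf < threshold


def class_balance(substitutions):
    """Net gain (+) or loss (-) per residue class.

    `substitutions` is an iterable of (reference_aa, alternate_aa) pairs.
    Substitutions that stay within one class do not change the balance.
    """
    balance = Counter()
    for ref, alt in substitutions:
        ref_class, alt_class = residue_class(ref), residue_class(alt)
        if ref_class != alt_class:
            balance[ref_class] -= 1
            balance[alt_class] += 1
    return dict(balance)
```

With a tally of this kind, a class with a strongly negative balance would be a "loser" group and one with a strongly positive balance a "gainer" group, in the sense used in the Results.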

Highlights

  • Studying genetic variation distribution in proteins containing charged regions, called charge clusters (CCs), is of great interest to unravel their functional role

  • We found that the distributions of Negative Charge Cluster (NCC) and Positive Charge Cluster (PCC) in human genes are similar

  • Variant distributions: the CC-encoding sequences were found to be significantly more exposed to variants than their corresponding genes, full messenger ribonucleic acid (mRNA) and exonic + intronic sequences (p ≥ 10⁻⁶, Fig. 1)


Summary

Introduction

Studying genetic variation distribution in proteins containing charged regions, called charge clusters (CCs), is of great interest to unravel their functional role. Karlin [1] defined and identified CCs in proteins as 20 to 75 residue segments with high net positive charge (Positive Charge Clusters, PCCs), high net negative charge (Negative Charge Clusters, NCCs), or high total charge (mixed charge clusters) relative to the overall charge composition of the protein. In a recent unpublished study, we found that CCs are mainly intrinsically disordered or contained in intrinsically disordered proteins. This result was reported by Choura and Rebai [5].
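The definition above can be illustrated with a simple sliding-window scan. This is a minimal sketch under stated assumptions, not the FCCP algorithm and not Karlin's statistical criterion: it scores each fixed-length window by how far its net charge departs from what the protein's overall composition would predict, and flags windows past a fixed cut-off. The function names, the `min_excess` cut-off and the choice to count only K/R as positive and D/E as negative are hypothetical.

```python
POSITIVE = set("KR")  # assumed positively charged residues
NEGATIVE = set("DE")  # assumed negatively charged residues


def residue_charge(aa):
    """+1 for K/R, -1 for D/E, 0 otherwise (histidine ignored by assumption)."""
    if aa in POSITIVE:
        return 1
    if aa in NEGATIVE:
        return -1
    return 0


def scan_charge_windows(seq, window=20, min_excess=6):
    """Flag windows whose net charge departs from the protein-wide expectation.

    Returns a list of (start, end, net_charge, label) tuples with 0-based,
    end-exclusive coordinates. `min_excess` is an arbitrary illustrative
    cut-off, not a statistically derived threshold.
    """
    seq = seq.upper()
    if len(seq) < window:
        return []
    charges = [residue_charge(aa) for aa in seq]
    # Expected net charge of a window given the whole-protein composition.
    expected = sum(charges) / len(charges) * window
    hits = []
    net = sum(charges[:window])
    for start in range(len(seq) - window + 1):
        if start > 0:
            # Slide the window by one residue: add the new one, drop the old one.
            net += charges[start + window - 1] - charges[start - 1]
        excess = net - expected
        if excess >= min_excess:
            hits.append((start, start + window, net, "positive"))
        elif excess <= -min_excess:
            hits.append((start, start + window, net, "negative"))
    return hits
```

Repeating the scan for window lengths from 20 to 75 and merging overlapping hits would bring the sketch closer to the segment lengths in the definition; a mixed charge cluster would additionally require scoring total rather than net charge.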
