Abstract

Amino acids, and therefore their properties, are unevenly distributed in proteins, and differences in composition determine protein features ranging from solubility to stability and functionality. The Gini index, a measure of how uniformly a quantity is distributed, is widely used in macroeconomics and has numerous other statistical applications. Here, the Gini index is used to analyze the distribution of hydrophobicity in proteins and to compare it between globular and intrinsically disordered proteins. Analysis of carefully selected, high-quality data sets of proteins extracted from the Protein Data Bank (http://www.rcsb.org) and the DisProt database (http://www.disprot.org/) shows that hydrophobicity is distributed more heterogeneously in intrinsically disordered proteins than in folded, soluble globular proteins. This correlates with the observation that amino acid composition deviates more from uniformity (estimated with the Shannon and Gini-Simpson indices) in intrinsically disordered proteins than in soluble globular proteins. Although statistical tools like the Gini index have received little attention in molecular biology, these results show that they make it possible to estimate sequence diversity and help delineate trends that would otherwise be hard to describe simply and concisely.
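The three indices named above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not say which hydrophobicity scale was used or how negative values were handled, so the Kyte-Doolittle scale and the shift that maps the least hydrophobic residue to 0 are assumptions, as are all function names (`gini_index`, `hydrophobicity_gini`, `shannon_entropy`, `gini_simpson`), which are hypothetical.

```python
from collections import Counter
import math

# Kyte-Doolittle hydropathy scale. ASSUMPTION: the abstract does not name the scale used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gini_index(values):
    """Gini index of non-negative values: 0 for a perfectly uniform distribution,
    approaching 1 as the total concentrates in a single element."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Closed form for ascending data: G = sum_i (2i - n - 1) * x_i / (n * sum(x)), i = 1..n
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def hydrophobicity_gini(sequence, scale=KYTE_DOOLITTLE):
    """Gini index of per-residue hydrophobicity values.
    ASSUMPTION: the scale is shifted so its minimum is 0, because the Gini index
    needs non-negative inputs. Residues missing from the scale are skipped."""
    offset = -min(scale.values())
    return gini_index([scale[aa] + offset for aa in sequence if aa in scale])


def shannon_entropy(sequence):
    """Shannon entropy (bits) of amino acid composition; log2(20) when all 20 are equally frequent."""
    n = len(sequence)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(sequence).values())


def gini_simpson(sequence):
    """Gini-Simpson index 1 - sum(p_i^2): probability that two randomly drawn residues differ."""
    n = len(sequence)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(sequence).values())


if __name__ == "__main__":
    # Toy sequences only; these are not data from the study.
    for seq in ("ACDEFGHIKLMNPQRSTVWY", "RRRRIIII", "AAAAAAAA"):
        print(seq, round(hydrophobicity_gini(seq), 3),
              round(shannon_entropy(seq), 3), round(gini_simpson(seq), 3))
```

In this sketch, a sequence that mixes very hydrophilic and very hydrophobic residues (such as `RRRRIIII`) has a higher hydrophobicity Gini index than a homopolymer, while the Shannon and Gini-Simpson indices reach their maxima when all 20 amino acids occur equally often.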

