Abstract

In this study, we delineate an unsupervised clustering algorithm, minimum span clustering (MSC), and apply it to detect G-protein coupled receptor (GPCR) sequences and to study the GPCR network using a base dataset of 2770 GPCR and 652 non-GPCR sequences. High detection accuracy can be achieved when the base dataset is properly chosen. The clustering results of GPCRs derived from MSC show a strong correlation between their sequences and functions. Comparing our level 1 MSC results with the GPCRdb classification gives a consistency of 87.9% for the fourth level of GPCRdb, 89.2% for the third level, 98.4% for the second level, and 100% for the top level (the lowest resolution level of GPCRdb). The MSC results of GPCRs can be well explained by estimating the selective pressure of GPCRs, as exemplified by the two largest subfamilies in class A GPCRs, peptide receptors (PRs) and olfactory receptors (ORs). PRs are decomposed into three groups due to a positive selective pressure, whilst ORs remain as a single group due to a negative selective pressure. Finally, we construct and compare phylogenetic trees using distance-based and character-based methods, a combination of which could convey more comprehensive information about the evolution of GPCRs.
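The abstract does not say how the consistency between MSC clusters and the GPCRdb classification is computed. As an illustration only, the sketch below uses cluster purity, one common agreement measure: each cluster is assigned its majority reference label, and the score is the fraction of sequences that carry that label. The function name `cluster_purity` and the toy labels are hypothetical, and this is not necessarily the measure the authors used.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, reference_labels):
    """Fraction of items whose reference label is the majority label of their cluster.

    Assumption: purity stands in for the paper's "consistency"; the source
    does not define the measure it actually uses.
    """
    if len(cluster_ids) != len(reference_labels):
        raise ValueError("cluster_ids and reference_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one item is required")

    # Collect the reference labels that fall into each cluster.
    labels_by_cluster = defaultdict(list)
    for cluster, label in zip(cluster_ids, reference_labels):
        labels_by_cluster[cluster].append(label)

    # For each cluster, count the items that carry its majority label.
    agreeing = sum(
        Counter(labels).most_common(1)[0][1]
        for labels in labels_by_cluster.values()
    )
    return agreeing / len(cluster_ids)


# Toy data (hypothetical): six sequences, three clusters, three reference families.
clusters = [0, 0, 0, 1, 1, 2]
families = ["peptide", "peptide", "amine", "olfactory", "olfactory", "amine"]
purity = cluster_purity(clusters, families)
print(f"purity = {purity:.3f}")
```

Repeating the calculation against the labels of each GPCRdb level in turn would give one score per level, comparable in spirit to the four percentages quoted above.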

Highlights

  • We delineate an unsupervised clustering algorithm, minimum span clustering (MSC), and apply it to detect G-protein coupled receptor (GPCR) sequences and to study the GPCR network using a base dataset of 2770 GPCR and 652 non-GPCR sequences

  • Our results suggest a significant difference in sequence between GPCR and non-GPCR proteins, and MSC is able to detect GPCR sequences with high accuracy if the base dataset is properly chosen

  • The clustering of 2770 GPCR sequences was performed at various resolution levels of MSC


Summary

Introduction

We delineate an unsupervised clustering algorithm, minimum span clustering (MSC), and apply it to detect G-protein coupled receptor (GPCR) sequences and to study the GPCR network using a base dataset of 2770 GPCR and 652 non-GPCR sequences. Class C includes the metabotropic glutamate family, GABA receptors, calcium-sensing receptors, and taste receptors. These receptors are characterized by seven transmembrane (TM) helices and a large extracellular N-terminal domain of approximately 600 residues, to which ligands bind. The amino acid sequences of GPCRs in classes D-F contain seven hydrophobic domains that are considered to be TM helices. Another classification system of GPCRs, called “GRAFS”, has been proposed based on the phylogenetic tree of approximately 800 human GPCR sequences[13]. This system contains five main families named Glutamate (G), Rhodopsin (R), Adhesion (A), Frizzled/Taste2 (F), and Secretin (S). The main difference between the GRAFS system and the A-F system is that GRAFS further divides class B into the Secretin family and the Adhesion family, based on a preliminary finding that the two families have distinct evolutionary histories.
