Identification of Pockets on Protein Surface to Predict Protein–Ligand Binding Sites

Bingding Huang

doi:10.1007/978-94-007-5285-6_2

Abstract

Proteins perform their biological functions in different cellular processes mainly by interacting with other molecules such as other proteins, ligands, DNA and RNA. Only a subset of a protein's residues is involved in such interactions, so identifying these interacting residues is of great importance for understanding protein function. Among these interactions, protein–ligand interactions have been widely studied in protein–ligand docking, virtual screening and structure-based drug design. A protein surface contains a number of cavities, or pockets, where small molecules might bind, and identifying such pocket sites is therefore often the first step in protein–ligand binding site prediction. Many computational algorithms and tools have been developed in recent decades to predict protein–ligand binding sites by identifying pockets on protein structures. Examples include POCKET (Levitt and Banaszak 1992), LIGSITE (Hendlich et al. 1997), CAST (Dundas et al. 2006; Binkowski et al. 2003), LIGSITEcs/c (Huang and Schroeder 2006), PASS (Brady and Stouten 2000), Q-SiteFinder (Laurie and Jackson 2005), SURFNET (Laskowski 1995), Fpocket (Le Guilloux et al. 2009), GHECOM (Kawabata 2010), ConCavity (Capra et al. 2009), POCASA (Yu et al. 2010), PocketPicker (Weisel et al. 2007) and SiteHound (Ghersi and Sanchez 2009; Hernandez et al. 2009). Some of these methods are described in detail in other chapters. Most existing methods for protein–ligand binding site prediction can be classified into two types: geometry-based and energy-based. Geometry-based methods can be further divided into grid-based, sphere-based and α-shape-based methods (Kawabata 2010; Yu et al. 2010).
In grid-based methods, the protein structure is projected onto a 3D grid, and the grid points are categorized into types such as "outside the protein", "inside the protein" and "near the protein surface" according to their positions relative to the protein. The grid points that are not inside the protein are then clustered using geometric attributes, so that the grid points at pocket sites can finally be recognized. LIGSITEcs, GHECOM, PocketPicker and ConCavity are representative of this type. In LIGSITEcs, the grid points are categorized into three types: inside the protein, near the surface and in the solvent. Every solvent point is then scanned along seven directions (the x, y and z axes and the four cubic diagonals) and scored by the number of SSS (surface-solvent-surface) events it takes part in, that is, the number of directions in which it lies on a stretch of solvent enclosed by surface points on both sides. A grid point with five or more such events is normally located at a pocket site. LIGSITEcs will be explained in detail in the next section. GHECOM also first projects the protein onto a 3D grid, but the geometric tool it uses is mathematical morphology, whose theory it applies to define the pocket regions on the protein surface. In mathematical morphology (Masuya and Doi 1995), the four basic operations used with a probe to define a pocket site are dilation, erosion, opening and closing. ConCavity likewise constructs a 3D grid enclosing the protein; each grid point is scored using both structural and evolutionary information, and regions containing many high-scoring grid points are taken as pocket sites. In sphere-based approaches, the common strategy is to fill the protein surface with spheres layer by layer, with a cutting step applied during the filling; the final pocket sites are the regions rich in such spheres. Methods of this kind include SURFNET, PASS, PHECOM (Kawabata and Go 2007) and POCASA (Yu et al. 2010).
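To make the SSS counting concrete, here is a minimal Python sketch of a LIGSITEcs-style scan. It assumes the protein has already been mapped onto a boolean NumPy grid, `occupied`, in which True marks protein or surface points; this input and the function names `hits_protein`, `sss_counts` and `pocket_points` are hypothetical. A solvent point counts as enclosed along a direction if walking both ways from it reaches an occupied point before leaving the grid. It illustrates the idea only and is not the LIGSITEcs implementation.

```python
import numpy as np

# Seven scan directions: the three axes and the four cubic diagonals.
DIRECTIONS = [
    (1, 0, 0), (0, 1, 0), (0, 0, 1),
    (1, 1, 1), (1, 1, -1), (1, -1, 1), (-1, 1, 1),
]


def hits_protein(occupied, start, step):
    """Walk from `start` along `step`; True if an occupied point is reached inside the grid."""
    nx, ny, nz = occupied.shape
    x, y, z = (start[0] + step[0], start[1] + step[1], start[2] + step[2])
    while 0 <= x < nx and 0 <= y < ny and 0 <= z < nz:
        if occupied[x, y, z]:
            return True
        x, y, z = x + step[0], y + step[1], z + step[2]
    return False


def sss_counts(occupied):
    """For every solvent point, count the directions in which it is enclosed on both sides."""
    counts = np.zeros(occupied.shape, dtype=int)
    for idx in zip(*np.nonzero(~occupied)):
        for d in DIRECTIONS:
            back = (-d[0], -d[1], -d[2])
            if hits_protein(occupied, idx, d) and hits_protein(occupied, idx, back):
                counts[idx] += 1
    return counts


def pocket_points(occupied, min_events=5):
    """Solvent points with at least `min_events` SSS events (threshold taken from the text)."""
    return (~occupied) & (sss_counts(occupied) >= min_events)


if __name__ == "__main__":
    grid = np.ones((5, 5, 5), dtype=bool)
    grid[2, 2, 2] = False  # a single solvent point buried inside the "protein"
    print(np.argwhere(pocket_points(grid)))  # → [[2 2 2]]
```

The walk from every solvent point is deliberately naive and only suited to small grids. Note also that a flat surface yields no pocket points, because any scan line leaving the surface at an angle escapes into open solvent on one side.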
Approaches based on α-shapes include CAST and Fpocket. CAST computes a triangulation of the protein's surface atoms and groups the resulting triangles by letting small ones flow toward neighboring larger ones; the pocket sites are the collections of empty triangles. Unlike CAST, Fpocket uses the concept of an α-sphere, a sphere that contacts four atoms on its boundary and contains no atom inside. Clusters of nearby α-spheres are then identified, and these clusters are potential pocket sites. In contrast to geometry-based methods, energy-based methods such as Q-SiteFinder (Laurie and Jackson 2005) find pocket sites by computing the interaction energy between protein atoms and a small-molecule probe. In Q-SiteFinder, layers of methyl (–CH3) probes are placed on the protein surface, and the van der Waals interaction energy between the protein atoms and each probe is calculated. The probes are then clustered into groups, which are ranked by the total interaction energy of their probes; the clusters with the most favorable energy are the potential ligand binding sites. SiteHound (Ghersi and Sanchez 2009; Hernandez et al. 2009) is similar to Q-SiteFinder but includes Lennard-Jones and electrostatic energy terms and uses different types of probes to calculate the interaction energy. Table 2.1 briefly summarizes the categories of these existing computational methods.
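The α-sphere definition above can be checked directly. Four non-coplanar atom centres determine a unique circumscribed sphere, and that sphere qualifies if no other atom centre lies strictly inside it. The brute-force Python sketch below enumerates such spheres for a handful of points. It treats atoms as bare centres and omits the further filtering and the clustering step that Fpocket performs. The function names `circumsphere` and `alpha_spheres` are hypothetical, and a practical tool would typically derive these spheres from a Voronoi/Delaunay tessellation rather than testing every quadruple.

```python
import itertools

import numpy as np


def circumsphere(points):
    """Return (centre, radius) of the sphere through four 3D points, or None if coplanar."""
    p = np.asarray(points, dtype=float)
    # |c - p_i|^2 = |c - p_0|^2 for i = 1..3 reduces to 2 (p_i - p_0) . c = |p_i|^2 - |p_0|^2.
    a = 2.0 * (p[1:] - p[0])
    b = np.sum(p[1:] ** 2, axis=1) - np.sum(p[0] ** 2)
    if np.linalg.matrix_rank(a) < 3:
        return None  # coplanar (degenerate) quadruple: no unique sphere
    centre = np.linalg.solve(a, b)
    return centre, float(np.linalg.norm(centre - p[0]))


def alpha_spheres(atoms, tol=1e-6):
    """Enumerate empty spheres touching four atom centres (brute force over all quadruples)."""
    atoms = np.asarray(atoms, dtype=float)
    found = []
    for quad in itertools.combinations(range(len(atoms)), 4):
        sphere = circumsphere(atoms[list(quad)])
        if sphere is None:
            continue
        centre, radius = sphere
        dist = np.linalg.norm(atoms - centre, axis=1)
        # Empty: no atom centre strictly inside (the four contact atoms lie on the boundary).
        if np.all(dist >= radius - tol):
            found.append((quad, centre, radius))
    return found


if __name__ == "__main__":
    corners = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
    print(len(alpha_spheres(corners)))  # → 1
    # An extra atom at that sphere's centre makes it non-empty, so it is rejected.
    crowded = corners + [(0.5, 0.5, 0.5)]
    print(any(q == (0, 1, 2, 3) for q, _, _ in alpha_spheres(crowded)))  # → False
```

The number of quadruples grows as the fourth power of the atom count, which is why this version is only useful for checking the definition on toy inputs.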
