Constant time approximation scheme for largest well predicted subset

Bin Fu,Lusheng Wang

doi:10.1007/s10878-010-9371-1

Abstract

The largest well predicted subset problem is formulated for comparison of two predicted 3D protein structures from the same sequence. A 3D protein structure is represented by an ordered point set A={a 1,?,a n }, where each a i is a point in 3D space. Given two ordered point sets A={a 1,?,a n } and B={b 1,b 2,?b n } containing n points, and a threshold d, the largest well predicted subset problem is to find the rigid body transformation T for a largest subset B opt of B such that the distance between a i and T(b i ) is at most d for every b i in B opt . A meaningful prediction requires that the size of B opt is at least ?n for some constant ? (Li et al. in CPM 2008, 2008). We use LWPS(A,B,d,?) to denote the largest well predicted subset problem with meaningful prediction. An (1+? 1,1?? 2)-approximation for LWPS(A,B,d,?) is to find a transformation T to bring a subset B?⊆B of size at least (1?? 2)|B opt | such that for each b i ?B?, the Euclidean distance between the two points distance?(a i ,T(b i ))≤(1+? 1)d. We develop a constant time (1+? 1,1?? 2)-approximation algorithm for LWPS(A,B,d,?) for arbitrary positive constants ? 1 and ? 2. To our knowledge, this is the first constant time algorithm in this area. Li et al. (CPM 2008, 2008) showed an $O(n(\log n)^{2}/\delta_{1}^{5})$ time randomized (1+? 1)-distance approximation algorithm for the largest well predicted subset problem under meaningful prediction. We also study a closely related problem, the bottleneck distance problem, where we are given two ordered point sets A={a 1,?,a n } and B={b 1,b 2,?b n } containing n points and the problem is to find the smallest d opt such that there exists a rigid transformation T with distance(a i ,T(b i ))≤d opt for every point b i ?B. A (1+?)-approximation for the bottleneck distance problem is to find a transformation T, such that for each b i ?B, distance?(a i ,T(b i ))≤(1+?)d opt , where ? is a constant. For an arbitrary constant ?, we obtain a linear O(n/? 6) time (1+?)-algorithm for the bottleneck distance problem. The best known algorithms for both problems require super-linear time (Li et al. in CPM 2008, 2008).

Full Text