Abstract

Affinity Propagation (AP) has been widely used since it was proposed, and this paper shows the advantages of applying it to duplicate detection. To further improve the effectiveness and efficiency of AP, a hybrid feature selection scheme is proposed. First, a filter criterion for feature subset evaluation identifies the best subset for each given cardinality, which reduces the complexity of the search. Then the performance of AP itself is used as the evaluation criterion to select the final subset from among these per-cardinality best subsets. For data with a large number of features, sampling search and simulated annealing are used to reduce the time cost. Experiments show that the scheme is both effective and efficient: sampling search achieves good effectiveness with relatively good efficiency, while simulated annealing achieves acceptable effectiveness with good efficiency.
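The two-stage scheme described above can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the function name `hybrid_select`, the toy `relevance` weights, and both scoring functions are hypothetical. In particular, `toy_wrapper_score` is only a stand-in for "performance of AP on the subset", which in practice would mean running AP on the data restricted to those features and scoring the resulting clustering. The exhaustive enumeration per cardinality is feasible only for few features; the abstract names sampling search and simulated annealing as the alternatives for larger feature sets.

```python
from itertools import combinations
from typing import Callable, List, Optional, Tuple

Subset = Tuple[int, ...]


def hybrid_select(
    n_features: int,
    filter_score: Callable[[Subset], float],
    wrapper_score: Callable[[Subset], float],
    max_size: Optional[int] = None,
) -> Tuple[Subset, List[Subset]]:
    """Hybrid filter/wrapper feature selection (illustrative sketch).

    Stage 1 (filter): for each cardinality k, keep the subset with the
    highest filter score. Stage 2 (wrapper): among those per-cardinality
    winners, return the one with the highest wrapper score.
    """
    if max_size is None:
        max_size = n_features
    candidates: List[Subset] = []
    for k in range(1, max_size + 1):
        # Exhaustive search over all subsets of size k (small n only).
        best_k = max(combinations(range(n_features), k), key=filter_score)
        candidates.append(best_k)
    # The expensive wrapper criterion is evaluated only once per cardinality.
    best = max(candidates, key=wrapper_score)
    return best, candidates


# Hypothetical per-feature relevance weights, used only for this toy example.
relevance = [0.9, 0.1, 0.8, 0.2]


def toy_filter_score(subset: Subset) -> float:
    # Assumed filter criterion: total relevance of the chosen features.
    return sum(relevance[i] for i in subset)


def toy_wrapper_score(subset: Subset) -> float:
    # Stand-in for AP performance: relevance minus a per-feature cost.
    return sum(relevance[i] for i in subset) - 0.3 * len(subset)


best, candidates = hybrid_select(4, toy_filter_score, toy_wrapper_score)
print(candidates)
print(best)
```

The point of the design is that the cheap filter criterion prunes the search to one candidate per cardinality, so the costly clustering-based criterion is evaluated at most `max_size` times rather than once per subset.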
