Abstract

Bump hunting is an important approach to the extraction of insights from Euclidean datasets. Recently, it has been explored for graph datasets for the first time, and a single bump is hunted in an unweighted graph in this exploration. Here, we extend this exploration by hunting multiple bumps in a weighted graph. Given a weighted graph and a set of query nodes exhibiting a property of interest, our objective is to find k non-overlapping and connected subgraphs, i.e., bumps, in which the discrepancy between the numbers of query and non-query nodes is maximized and the sum of edge costs is minimized simultaneously. We prove that our extended bump hunting problem can be transformed to a recently formulated Prize-Collecting Steiner Forest Problem (PCSFP). We further prove that PCSFP is NP-hard even in trees. Then, we propose a fast approximation algorithm for solving PCSFP in trees. Based on this algorithm, we improve the state-of-the-art approximation algorithm for solving PCSFP in graphs, and prove that the solutions of our improvement are always better than or equal to those of the state-of-the-art algorithm. Moreover, we adapt the existing bump hunting algorithms for solving our extended bump hunting problem. We evaluate our methodology via real datasets, and show that 1) our improvement scales well to large graphs, while producing solutions that dominate those of the state-of-the-art algorithm; and 2) our adaptation of an existing bump hunting algorithm can also produce solutions that are better than those of the state-of-the-art algorithm in some cases.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.