McatCS: A Highly Efficient Cross-matching Scheme for Multi-band Astronomical Catalogs

Bingyao Li, Xiaoteng Hu, Ce Yu, Jian Xiao, Shanjiang Tang, Dongwei Fan, Chen Li, Chenzhou Cui

doi:10.1088/1538-3873/ab024c

Open Access

Abstract

Multi-band astronomical catalog cross-matching has always been, and will continue to be, indispensable to astronomy research. However, the volume of archived data in each waveband is extremely large, which makes cross-matching computationally expensive and slow to respond. The continuous growth of observational data will compound the problem. In this paper, we present mcatCS (multi-band catalog Cross-matching Scheme), a distributed cross-matching scheme that efficiently integrates celestial object data from billion-row multi-band astronomical catalogs. It is deployed on a cluster of commodity machines and provides a command-line interface to the end user. To allow fast cross-matching, catalog data are reformatted into the Grouped Spatial Index File, a specially designed uniform format for multi-band catalogs. Furthermore, a min-conflicts data layout strategy is used to maximize the parallelism of cross-matching. Using real data archived at the National Astronomical Observatories of China, we verify that mcatCS performs efficient and reliable cross-matching between billion-row multi-band catalogs. Experimental results show that its query response speed is 38% to 45% greater than that of MongoDB and 21% to 32% greater than that of PostgreSQL with the HEALPix B-tree index. Moreover, although Q3C and H3C, the extension index packages for PostgreSQL, offer faster query response for fewer than 85 million sources, mcatCS becomes advantageous once the number of sources reaches 100 million, and achieves time reductions of 30.3% and 30.7% compared to Q3C and H3C, respectively, for 200 million sources.
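
For readers unfamiliar with the operation itself, positional cross-matching can be illustrated with a minimal single-machine sketch. This is not the mcatCS implementation: the abstract does not describe the layout of the Grouped Spatial Index File, so the sketch substitutes a simple declination-zone bucketing as the spatial index. The function names, the `(id, ra_deg, dec_deg)` tuple layout, and the nearest-neighbour matching rule are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(cat_a, cat_b, radius_arcsec):
    """For each source in cat_a, find the nearest cat_b source within the radius.

    Catalogs are lists of (id, ra_deg, dec_deg) tuples (hypothetical layout).
    Returns a list of (id_a, id_b, separation_arcsec).
    """
    radius_deg = radius_arcsec / 3600.0

    # Bucket cat_b into declination zones one match radius high, so that any
    # counterpart of a source must lie in its own zone or an adjacent one.
    zones = defaultdict(list)
    for src in cat_b:
        zones[math.floor((src[2] + 90.0) / radius_deg)].append(src)

    matches = []
    for id_a, ra_a, dec_a in cat_a:
        zone = math.floor((dec_a + 90.0) / radius_deg)
        best = None
        for z in (zone - 1, zone, zone + 1):
            for id_b, ra_b, dec_b in zones.get(z, ()):
                sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b) * 3600.0
                if sep <= radius_arcsec and (best is None or sep < best[1]):
                    best = (id_b, sep)
        if best is not None:
            matches.append((id_a, best[0], best[1]))
    return matches
```

Calling `cross_match(optical, infrared, 1.0)` on two such lists pairs each optical source with its closest infrared counterpart within 1 arcsecond. The zone buckets limit the candidates in declination only; a production index would also prune in right ascension, which is the role that HEALPix, Q3C, and H3C style indexes play in the systems compared above.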
