Abstract

The ground-based wide-angle camera array (GWAC), a part of the SVOM space mission, will search for various types of optical transients by continuously imaging a field of view (FOV) of 5000 degrees2 every 15 s. Each exposure consists of 36 × 4k × 4k pixels, typically resulting in 36 × ∼175,600 extracted sources. For a modern time-domain astronomy project like GWAC, which produces massive amounts of data with a high cadence, it is challenging to search for short timescale transients in both real-time and archived data, and to build long-term light curves for variable sources. Here, we develop a high-cadence, high-density light curve pipeline (HCHDLP) to process the GWAC data in real-time, and design a distributed shared-nothing database to manage the massive amount of archived data which will be used to generate a source catalog with more than 100 billion records during 10 years of operation. First, we develop HCHDLP based on the column-store DBMS of MonetDB, taking advantage of MonetDB’s high performance when applied to massive data processing. To realize the real-time functionality of HCHDLP, we optimize the pipeline in its source association function, including both time and space complexity from outside the database (SQL semantic) and inside (RANGE-JOIN implementation), as well as in its strategy of building complex light curves. The optimized source association function is accelerated by three orders of magnitude. Second, we build a distributed database using a two-level time partitioning strategy via the MERGE TABLE and REMOTE TABLE technology of MonetDB. Intensive tests validate that our database architecture is able to achieve both linear scalability in response time and concurrent access by multiple users. In summary, our studies provide guidance for a solution to GWAC in real-time data processing and management of massive data.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call