Abstract

The amount of observational data produced by time-domain astronomy is increasing exponentially, and human inspection alone is not an effective way to identify genuine transients in the data. An automatic real-bogus classifier is needed, and machine learning techniques are commonly used to achieve this goal. Building a training set with a sufficiently large number of verified transients is challenging because of the human verification required. We present an approach for creating a training set that uses all detections in the science images as the sample of real detections, and all detections in the corresponding difference images, generated by the difference imaging process used to find transients, as the sample of bogus detections. This strategy effectively minimizes the labour involved in data labelling for supervised machine learning methods. We demonstrate the utility of the training set by using it to train several classifiers on data observed with the Gravitational-wave Optical Transient Observer (GOTO) prototype, using as the feature representation the normalized pixel values in 21 × 21 pixel stamps centred on the detection position. The real-bogus classifier trained with this strategy can provide up to 95 per cent prediction accuracy on real detections at a false alarm rate of 1 per cent.
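A minimal sketch of the stamp-based feature construction described above, assuming detections are supplied as (x, y) pixel coordinates and each frame is a 2-D numpy array; the function names and the min-max normalization choice are illustrative assumptions, not the authors' exact pipeline:

```python
import numpy as np

STAMP_SIZE = 21          # stamp width/height in pixels, as quoted in the abstract
HALF = STAMP_SIZE // 2

def extract_stamp(image, x, y):
    """Cut a 21 x 21 pixel stamp centred on the detection at (x, y).

    `image` is a 2-D numpy array (e.g. one frame loaded from FITS);
    detections too close to the image edge are skipped by returning None.
    """
    r, c = int(round(y)), int(round(x))
    if (r - HALF < 0 or c - HALF < 0 or
            r + HALF + 1 > image.shape[0] or c + HALF + 1 > image.shape[1]):
        return None
    return image[r - HALF:r + HALF + 1, c - HALF:c + HALF + 1]

def normalise(stamp):
    """Rescale pixel values to [0, 1] (an assumed min-max normalization)."""
    lo, hi = stamp.min(), stamp.max()
    return (stamp - lo) / (hi - lo) if hi > lo else np.zeros_like(stamp)

def build_samples(image, detections, label):
    """Real samples come from science-image detections (label 1),
    bogus samples from difference-image detections (label 0)."""
    stamps = (extract_stamp(image, x, y) for x, y in detections)
    return [(normalise(s).ravel(), label) for s in stamps if s is not None]
```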

Highlights

  • Transient astronomy focuses on astrophysical objects that vary on timescales of hours to years, and can originate from events such as supernovae, accreting binaries, stellar flares, tidal disruption events and gamma-ray bursts

  • We further reduce contaminants by removing detections whose normalized full-width at half-maximum (FWHM) falls outside the 0.3–99.7 percentile range of the per-image FWHM distribution, and by removing detections brighter than m = 12 to limit contamination from bright objects with diffraction spikes (see the sketch after this list)

  • We compare the performance of classifiers trained on the quick-build training set with that of classifiers trained on the injection data set
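A possible implementation of the per-image quality cuts described in the second highlight, assuming the detection catalogue provides per-detection FWHM and magnitude arrays; the function name and argument defaults are illustrative, only the percentile range and the m = 12 cut come from the text:

```python
import numpy as np

def filter_detections(fwhm, mag, low=0.3, high=99.7, bright_limit=12.0):
    """Keep detections whose FWHM lies between the 0.3 and 99.7 percentiles
    of the per-image FWHM distribution and which are fainter than m = 12.

    `fwhm` and `mag` are 1-D arrays for a single image; returns a boolean
    mask to apply to the detection catalogue.
    """
    lo, hi = np.percentile(fwhm, [low, high])
    return (fwhm >= lo) & (fwhm <= hi) & (mag > bright_limit)
```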


Summary

INTRODUCTION

Transient astronomy focuses on astrophysical objects that vary on timescales of hours to years, originating from events such as supernovae, accreting binaries, stellar flares, tidal disruption events and gamma-ray bursts. The random forest (RF) technique is a machine learning algorithm built from an ensemble of decision trees. It performed best in terms of the figure-of-merit (FOM) in both the W15 and G17 studies, i.e., using either isophotal measurements or normalized pixel values as the classification features. Some authors (e.g., Gieseke et al. 2017) have claimed that a CNN shows the best performance at picking out real candidates in difference images. As their model was trained on a relatively small data set containing 2237 instances and tested on a sample containing only 227 real transients, further tests on significantly larger data sets are required.
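A minimal sketch of training a random forest real-bogus classifier on flattened pixel stamps with scikit-learn, and scoring it at a 1 per cent false-positive rate in the spirit of the FOM comparison above; the hyper-parameters, file names and train/test split are illustrative assumptions, not the configuration used in the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve

# X: flattened 21 x 21 stamps (441 features per detection); y: 1 = real, 0 = bogus.
X = np.load("stamps.npy")   # placeholder file names, not from the paper
y = np.load("labels.npy")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
clf.fit(X_train, y_train)

# True-positive rate at a 1 per cent false-positive rate.
scores = clf.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, scores)
print("TPR at FPR = 1%%: %.3f" % np.interp(0.01, fpr, tpr))
```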

Motivation
Image processing
DATA SETS
Injection data set
FEATURE EXTRACTION AND PREPROCESSING
CLASSIFICATION ALGORITHMS
RESULTS AND PERFORMANCE
Performance of the injection test
Performance on the MP data set
Feature importance