Abstract

FITS (Flexible Image Transport System) is the most widely used data format in astronomy. A single FITS file ranges in size from megabytes (MB) to gigabytes (GB) and even terabytes (TB), and astronomers are among the first researchers to encounter Big Data. In most cases, astronomers are interested only in small sub-areas of time-series images. However, loading the whole raw FITS file from an HDD (hard disk drive) every time and then cutting out the target sub-area is both time-consuming and wasteful of I/O. No existing cache scheme is optimized for subset retrieval from FITS files. We propose a Pattern-Aware (PA) cache management strategy that efficiently retrieves sub-image data from large numbers of FITS files. It recognizes hot sub-areas from the latest query history, then loads and merges related sub-images via a coordinate-mapping algorithm. We compared our method with the traditional LRU, LFU and LRFU strategies applied to full FITS files and to sub-files, respectively. The results show that the PA strategy maintains a high hit ratio of 64.32% and reduces the average response time by about 24% compared with the best of these traditional schemes. It does so with a cache whose size is 8.77% of the raw requested data size.
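The abstract gives only the outline of the PA strategy. The Python sketch below illustrates the general idea under stated assumptions and is not the authors' algorithm.

It works as follows:

- A sub-area counts as hot once at least `hot_threshold` overlapping requests on the same file appear in a sliding window of recent queries.
- Those requests are merged into one bounding box, which is read and cached as a single sub-image.
- A later request contained in a cached box is served by mapping its full-image coordinates into that sub-image's local pixel grid.

The names `Region`, `PatternAwareCache` and `load_region`, the window size, the threshold, and LRU eviction among cached sub-images are all hypothetical choices. Pixels come from an in-memory list rather than a real FITS reader.

```python
from collections import OrderedDict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Pixel rectangle [x0, x1) x [y0, y1) in full-image coordinates."""
    x0: int
    y0: int
    x1: int
    y1: int

    def contains(self, other: "Region") -> bool:
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and other.x1 <= self.x1 and other.y1 <= self.y1)

    def overlaps(self, other: "Region") -> bool:
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)


def bounding_box(regions):
    """Smallest Region covering all given regions (the merged sub-area)."""
    return Region(min(r.x0 for r in regions), min(r.y0 for r in regions),
                  max(r.x1 for r in regions), max(r.y1 for r in regions))


def map_to_local(outer: Region, inner: Region) -> Region:
    """Coordinate mapping: express `inner` relative to the origin of `outer`."""
    return Region(inner.x0 - outer.x0, inner.y0 - outer.y0,
                  inner.x1 - outer.x0, inner.y1 - outer.y0)


def cut(pixels, r: Region):
    """Slice rows y0:y1 and columns x0:x1 from a row-major 2-D list."""
    return [row[r.x0:r.x1] for row in pixels[r.y0:r.y1]]


class PatternAwareCache:
    """Hypothetical sketch; window, threshold and LRU eviction are assumptions."""

    def __init__(self, load_region, capacity, window=50, hot_threshold=2):
        self.load_region = load_region       # (file_id, Region) -> 2-D pixels
        self.capacity = capacity             # max number of cached sub-images
        self.history = deque(maxlen=window)  # recent (file_id, Region) queries
        self.hot_threshold = hot_threshold
        self.entries = OrderedDict()         # (file_id, Region) -> pixels
        self.hits = 0
        self.misses = 0

    def get(self, file_id, region):
        self.history.append((file_id, region))
        key = next((k for k in self.entries
                    if k[0] == file_id and k[1].contains(region)), None)
        if key is not None:
            # Hit: a cached sub-image covers the request; cut it locally.
            self.hits += 1
            self.entries.move_to_end(key)
            return cut(self.entries[key], map_to_local(key[1], region))

        self.misses += 1
        related = [r for fid, r in self.history
                   if fid == file_id and r.overlaps(region)]
        if len(related) < self.hot_threshold:
            # Cold sub-area: read only what was asked for, do not cache it.
            return self.load_region(file_id, region)

        # Hot sub-area: read the merged bounding box once and cache it.
        merged = bounding_box(related)
        pixels = self.load_region(file_id, merged)
        self.entries[(file_id, merged)] = pixels
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return cut(pixels, map_to_local(merged, region))


# Usage with a synthetic 100x100 "image" whose pixel value is 100*y + x.
image = [[100 * y + x for x in range(100)] for y in range(100)]
reads = []


def load_region(file_id, r):
    reads.append(r)  # record every simulated disk read
    return cut(image, r)


cache = PatternAwareCache(load_region, capacity=4, window=10, hot_threshold=2)
a = cache.get("obs1.fits", Region(10, 10, 20, 20))  # miss, cold: not cached
b = cache.get("obs1.fits", Region(15, 15, 25, 25))  # miss, hot: caches merged box
c = cache.get("obs1.fits", Region(12, 12, 22, 22))  # hit inside merged box
```

In this run, the second request makes the sub-area hot, so the merged box from (10, 10) to (25, 25) is read from "disk" once. The third, overlapping but non-identical request is then served from memory. Caching merged boxes rather than exact cuts is what lets such neighbouring queries hit in this sketch.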
