Abstract

The main drawback of conventional tools for digital image processing is the long processing time caused by the high complexity of their algorithms. This worsens when these algorithms must be applied sequentially to large image sets. To alleviate this problem, this paper introduces a general-purpose tool for massively processing large digital image sets using Apache Spark. The proposed tool allows users to extract image rasters and store them in either of Spark's basic distributed data representations, namely Resilient Distributed Datasets (RDDs) and DataFrames (DFs), so that all subsequent image operations can be treated as RDD/DF transformations. Our experiments reveal that, with our proposal, distributed image processing tasks can be scheduled and executed in less time than with another Spark-based massive image processing tool. In these experiments, we applied several algorithms to 25,000 images (the MIRFLICKR-25000 set), reaching a maximum speedup of 54x. In addition, we observed that the number of images also influences the speedup once the cluster memory is fully occupied. We can therefore claim that, using our proposal, more complex image processing workflows can be built and applied massively to large image sets, achieving competitive speedups.
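
To make the RDD/DF pattern described above concrete, the sketch below shows one way to express an image operation as an ordinary Spark transformation. This is an illustrative example under our own assumptions, not the paper's actual tool: the HDFS path, the pixel-inversion operation, and the choice of binaryFiles plus java.awt decoding are hypothetical, standing in for whatever raster extraction and operations the tool implements.

```scala
import org.apache.spark.sql.SparkSession
import javax.imageio.ImageIO

object DistributedImageSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DistributedImageSketch")
      .getOrCreate()

    // Load raw image files as an RDD of (path, binary stream) pairs.
    // binaryFiles is a standard SparkContext API; the path is hypothetical.
    val images = spark.sparkContext
      .binaryFiles("hdfs:///data/mirflickr/*.jpg")

    // An image operation expressed as a plain RDD transformation:
    // decode each file and invert its RGB channels (example operation only).
    val inverted = images.mapValues { stream =>
      val in = stream.open()
      val img = try ImageIO.read(in) finally in.close()
      for (x <- 0 until img.getWidth; y <- 0 until img.getHeight)
        img.setRGB(x, y, img.getRGB(x, y) ^ 0x00FFFFFF) // flip RGB bits, keep alpha
      img
    }

    // Actions trigger the lazily built distributed computation.
    println(s"Processed ${inverted.count()} images")
    spark.stop()
  }
}
```

The DataFrame counterpart would start from Spark's built-in image data source (available since Spark 2.4), spark.read.format("image").load(path), which yields a struct column containing the decoded raster (origin, height, width, nChannels, mode, data), over which image operations can then be written as DF transformations.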

