Abstract

In this paper, we implement data reduction methods to shrink image datasets collected from cotton fields for use in a high-throughput phenotyping (HTP) pipeline, so that the data can be transferred more quickly over poor internet connections. We investigate dimensionality reduction methods to accomplish this goal. Specifically, we use Principal Component Analysis (PCA) to compress image data into a lower-dimensional space; when decompressed, the images retain much of the variability of the originals. To demonstrate that PCA produces quality reconstructions, we consider the example use case of detecting cotton bloom flowering patterns in reconstructed images. We employ Open Source Computer Vision (OpenCV) to generate pixel-wise masks, which both further reduce the byte size of the data and successfully identify cotton bloom flowering. The results indicate substantial data reduction from the original to the reconstructed images: PCA reduces byte sizes by 93% while preserving around 98% of the variance with a much smaller number of components. Bitwise masking with OpenCV yields a 99% reduction in file size. These results show the potential of machine learning techniques for data reduction as a pre-processing step before subsequent analysis. This data reduction is a crucial step in developing a field-based HTP big data pipeline.
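
To make the compress-then-reconstruct idea concrete, the following is a minimal sketch of PCA-based image compression, not the implementation used in this work. It assumes a single-channel image held as a NumPy array, treats each image row as one observation, and computes the principal components with a singular value decomposition. The function names `pca_compress` and `pca_reconstruct`, the synthetic test image, and the choice of 8 components are all hypothetical and chosen only for illustration.

```python
import numpy as np


def pca_compress(image, n_components):
    """Project the rows of a 2-D image onto its top principal components.

    Returns the per-row scores, the principal axes, the column means, and
    the fraction of total variance captured by the retained components.
    """
    X = image.astype(np.float64)
    mean = X.mean(axis=0)                  # column means, shape (W,)
    Xc = X - mean                          # center each column
    # Rows of Vt are the principal axes; S holds the singular values.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]         # shape (k, W)
    scores = Xc @ components.T             # shape (H, k)
    explained = float((S[:n_components] ** 2).sum() / (S ** 2).sum())
    return scores, components, mean, explained


def pca_reconstruct(scores, components, mean):
    """Map the compressed representation back to image space."""
    return scores @ components + mean


# Synthetic stand-in for a field image (assumption: real data would be
# loaded from disk, one channel at a time).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 128)
image = 255.0 * np.outer(np.sin(3 * x) ** 2, np.cos(2 * x) ** 2)
image += rng.normal(0.0, 2.0, size=image.shape)

k = 8  # hypothetical number of retained components
scores, components, mean, explained = pca_compress(image, k)
reconstruction = pca_reconstruct(scores, components, mean)

# Values that must be stored or transmitted instead of the full image.
stored_values = scores.size + components.size + mean.size
print(f"variance retained: {explained:.4f}")
print(f"stored values: {stored_values} of {image.size}")
```

Only the scores, the retained components, and the column means need to be transferred, so the saving grows as the number of components shrinks relative to the image width. The reduction in stored values here is not the same quantity as the byte-size reduction reported above, which depends on the file format and on the images themselves.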
