Abstract

In this paper, we implement data reduction methods that shrink image datasets collected from cotton fields for use in a high-throughput phenotyping (HTP) pipeline, so that the data can be transferred more quickly over poor internet connections. We investigate dimensionality reduction methods to accomplish this goal. Specifically, we use Principal Component Analysis (PCA) to compress image data into a lower-dimensional space; when decompressed, the images retain a significant share of the variability of the originals. To demonstrate that PCA produces quality reconstructions, we consider the example use case of detecting cotton bloom flowering patterns in reconstructed images. We employ Open Source Computer Vision (OpenCV) to generate pixel-wise masks, which both further reduce the byte size of the data and successfully identify cotton bloom flowering. The results indicate substantial data reduction from the original to the reconstructed images: byte sizes are reduced by 93% through PCA while preserving around 98% of the variance with a much smaller number of components. Bitwise masking with OpenCV yields a 99% reduction in file size. These results demonstrate the potential of machine learning techniques as a data reduction pre-processing step before subsequent analysis. This data reduction is a crucial step in developing a field-based HTP big data pipeline.
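The PCA compression and reconstruction step described above can be illustrated with a minimal sketch. It is not the pipeline used in the paper: it assumes a single-channel image whose rows are treated as samples, uses a synthetic array in place of a cotton-field photograph, and computes the principal components with NumPy's SVD. The function names `pca_compress` and `pca_reconstruct`, the image size, and the choice of 10 components are all hypothetical.

```python
import numpy as np


def pca_compress(image, n_components):
    """Project the rows of a 2-D image onto its top principal components."""
    mean = image.mean(axis=0)            # per-column mean, removed before PCA
    centered = image - mean
    # SVD of the centered data: the rows of vt are the principal axes.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T     # compressed representation
    # Fraction of total variance captured by the kept components.
    explained = (s[:n_components] ** 2).sum() / (s ** 2).sum()
    return scores, components, mean, explained


def pca_reconstruct(scores, components, mean):
    """Map compressed scores back to the original pixel space."""
    return scores @ components + mean


# Synthetic stand-in for a grayscale field image (assumption, not real data):
# low-rank structure plus a small amount of noise.
rng = np.random.default_rng(0)
base = rng.random((120, 8)) @ rng.random((8, 160))
image = base + 0.01 * rng.standard_normal((120, 160))

scores, components, mean, explained = pca_compress(image, n_components=10)
restored = pca_reconstruct(scores, components, mean)

# Values that must be stored or transferred versus values in the original.
stored_values = scores.size + components.size + mean.size
reduction = 1 - stored_values / image.size
mse = float(np.mean((restored - image) ** 2))

print(f"variance retained: {explained:.4f}")
print(f"size reduction:    {reduction:.2%}")
print(f"reconstruction MSE: {mse:.6f}")
```

Only the scores, the kept components, and the column means need to be transferred; the receiver rebuilds the image with `pca_reconstruct`. How much variance survives for a given number of components depends on the image content, so the figures printed here say nothing about the percentages reported in the abstract.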
