Abstract

Cloud computing makes it possible to store and process massive amounts of remotely sensed hyperspectral data in a distributed way. Dimensionality reduction is an important task in hyperspectral imaging, as hyperspectral data often contain redundancy that can be removed before the data are analyzed in repositories. The development of dimensionality reduction techniques for cloud computing environments can therefore provide both efficient storage and efficient preprocessing of the data. In this paper, we develop a parallel and distributed implementation of principal component analysis (PCA), a widely used technique for hyperspectral dimensionality reduction, on cloud computing architectures. Our implementation uses the Hadoop distributed file system (HDFS) for distributed storage and Apache Spark as the computing engine, and it is built on the MapReduce parallel model, taking full advantage of the high-throughput access and high-performance distributed computing capabilities of cloud environments. We first optimize the traditional PCA algorithm to make it well suited for parallel and distributed computing, and then implement it on a real cloud computing architecture. Experimental results on several hyperspectral datasets reveal very high performance for the proposed distributed parallel method.
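
To make the described pipeline concrete, the following is a minimal sketch of distributed PCA over hyperspectral pixels using Spark's built-in MLlib `RowMatrix` API, which computes the covariance (Gramian) across the cluster in MapReduce fashion and eigendecomposes it on the driver. This is not the authors' optimized implementation; the HDFS path, the CSV layout (one pixel spectrum per line), and the choice of k = 10 components are hypothetical assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

object HyperspectralPCA {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("HyperspectralPCA").getOrCreate()
    val sc = spark.sparkContext

    // Each record is one pixel: a comma-separated spectral vector,
    // one value per band. The HDFS path is hypothetical.
    val pixels = sc.textFile("hdfs:///data/hyperspectral/scene.csv")
      .map(line => Vectors.dense(line.split(',').map(_.toDouble)))
      .cache()

    // Distributed row matrix of size (pixels x bands).
    val mat = new RowMatrix(pixels)

    // Top-k principal components (k = 10 is an arbitrary example).
    // Internally, MLlib aggregates the covariance matrix across workers
    // and performs the eigendecomposition on the driver.
    val k = 10
    val pc = mat.computePrincipalComponents(k)

    // Project every pixel onto the k-dimensional PCA subspace
    // (note: MLlib does not mean-center rows before this projection).
    val projected: RowMatrix = mat.multiply(pc)

    projected.rows.take(5).foreach(println)
    spark.stop()
  }
}
```

Because the pixel spectra are held as an RDD partitioned across HDFS blocks, both the covariance aggregation and the final projection run in parallel on the workers, which is the property the distributed design described above exploits.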
