Abstract
This paper aims to address two significant challenges in Deep Learning (DL) model generation: the high computational cost of the training phase and the interpretability of high-dimensional data. The computational cost is curbed using a matrix factorization technique known as CUR decomposition, applied in both centralized and distributed settings. CUR decomposition has the advantage of reducing the dimensions of the dataset without transforming the data and with minimal information loss. Although Singular Value Decomposition (SVD) yields the lowest reconstruction error among matrix decomposition techniques, it transforms the data, so the original data cannot be interpreted from the decomposed matrices when applying DL techniques. CUR decomposition is therefore leveraged to reduce the number of features in high-dimensional data while preserving the essential information. Extensive experimental analysis is conducted to evaluate and compare the performance of the CUR and SVD techniques in terms of reconstruction error and time complexity in both centralized and distributed settings. The results show that CUR outperforms SVD in computational efficiency, especially in the distributed setting, where the decomposition time for CUR was a fraction of that for SVD. At the same time, DL models trained on the CUR-reduced dataset achieve accuracy comparable to models trained on the original high-dimensional dataset and on the reduced dataset obtained through SVD. This work also improves the accessibility of the CUR algorithm: an application is developed that enables users to execute the algorithm on any dataset. The application's user-friendly interface supports uploading datasets, configuring the desired decomposition rank, and executing the algorithm with a single click.
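As a rough illustration of the interpretability point above, the following sketch builds a CUR approximation A ≈ C·U·R from actual columns and rows of a data matrix and reports its reconstruction error. The uniform random column/row sampling, the rank k, and the synthetic matrix sizes are illustrative assumptions only; they are not the sampling strategy, rank, or datasets used in the paper.

```python
import numpy as np

# Minimal CUR sketch on a synthetic matrix (illustrative assumptions only).
# Columns and rows are sampled uniformly at random here; the paper's actual
# sampling scheme may differ.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50)) @ rng.standard_normal((50, 100))  # low-rank-ish data

k = 20                                           # desired decomposition rank (assumed)
col_idx = rng.choice(A.shape[1], size=k, replace=False)
row_idx = rng.choice(A.shape[0], size=k, replace=False)

C = A[:, col_idx]                                # actual columns of A (stay interpretable)
R = A[row_idx, :]                                # actual rows of A (stay interpretable)
U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)    # link matrix so that A ≈ C @ U @ R

approx = C @ U @ R
rel_err = np.linalg.norm(A - approx) / np.linalg.norm(A)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Because C and R are copies of original columns and rows rather than transformed combinations of them, the reduced representation can still be read in terms of the original features, which is the property the abstract contrasts with SVD.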