In domains such as the stock market and manufacturing, vast volumes of data are generated rapidly, creating a growing demand for faster and more accurate methods of identifying data distributions to support real-time decision-making. Traditional methods often rely on manual inspection, a limited set of statistical tests and time-consuming analysis, which leads to inefficient and inaccurate classification. This research presents a novel approach that uses Deep Learning (DL) models to automate the process: synthetic data points are generated and a DL model is trained on them to identify different distribution types. The primary objective of the study is to develop a DL model that classifies the data points of an input dataset as following a specific distribution. For model training and evaluation, a total of 1000 datasets are generated, each comprising 1000 data points. The study considers five distributions (Normal, Uniform, Exponential, Log-normal and Beta), with 200 datasets generated for each using randomly selected parameters. The DL model is first trained and then evaluated on a separate, unseen test dataset, and its classification performance is assessed using metrics such as accuracy and loss. The results demonstrate that the proposed approach accurately classifies the distribution of data points, providing valuable insights into the application of DL to distribution classification tasks. By combining convolutional neural networks with advanced preprocessing techniques, the proposed method enhances scalability, robustness and efficiency.
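The synthetic data generation described above (five distributions, 200 datasets each, 1000 points per dataset) can be sketched roughly as follows. This is a minimal illustration using NumPy, not the study's actual code: the function names `sample_dataset` and `make_corpus`, the fixed seed, and every parameter range are assumptions chosen for the example, since the abstract states only that parameters are randomly selected.

```python
import numpy as np

# Class labels follow the order of this list (0 = normal, ..., 4 = beta).
DISTRIBUTIONS = ["normal", "uniform", "exponential", "lognormal", "beta"]


def sample_dataset(name, n_points, rng):
    """Draw one dataset from the named distribution with random parameters.

    All parameter ranges here are illustrative assumptions, not the
    values used in the study.
    """
    if name == "normal":
        return rng.normal(loc=rng.uniform(-5, 5),
                          scale=rng.uniform(0.5, 3), size=n_points)
    if name == "uniform":
        low = rng.uniform(-5, 5)
        return rng.uniform(low, low + rng.uniform(1, 10), size=n_points)
    if name == "exponential":
        return rng.exponential(scale=rng.uniform(0.5, 3), size=n_points)
    if name == "lognormal":
        return rng.lognormal(mean=rng.uniform(-1, 1),
                             sigma=rng.uniform(0.25, 1), size=n_points)
    if name == "beta":
        return rng.beta(a=rng.uniform(0.5, 5),
                        b=rng.uniform(0.5, 5), size=n_points)
    raise ValueError(f"unknown distribution: {name}")


def make_corpus(n_per_class=200, n_points=1000, seed=0):
    """Build a labelled corpus: one row per dataset, one label per row."""
    rng = np.random.default_rng(seed)
    n_total = n_per_class * len(DISTRIBUTIONS)
    X = np.empty((n_total, n_points))
    y = np.empty(n_total, dtype=np.int64)
    row = 0
    for label, name in enumerate(DISTRIBUTIONS):
        for _ in range(n_per_class):
            X[row] = sample_dataset(name, n_points, rng)
            y[row] = label
            row += 1
    return X, y


# 5 distributions x 200 datasets = 1000 datasets of 1000 points each.
X, y = make_corpus()
```

Each row of `X` is one dataset and `y` holds its distribution label, so the pair can be shuffled and split into training and unseen test sets before being passed to a classifier.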