Abstract

This article considers the task of estimating the transmission time of large data sets through a distributed data management environment using a custom neural network model, and investigates the properties of this environment using statistical analysis methods. For the initial analysis, metadata on successful file transfers in the system was obtained, and the variables that affect file transfer time were transformed and identified. Different samples were used in the analysis to check whether the results were consistent across the available data. Correlation and regression analysis were applied to study the environment. It was found that there is no clear correlation between the transmission time and one of the input parameters. The file transfer time depends on a number of external factors that cannot be obtained from the metadata, but the environment can still be partially investigated using the metadata that is available. The model uses two input layers, one for numerical and one for categorical variables, whose outputs are then combined into a single branch. The prediction results are reported using the root-mean-square error (RMSE) and a scatter plot that compares the target and predicted values. The calculations show satisfactory prediction results.
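
As a rough illustration of the two quantities mentioned above, the following minimal sketch computes a Pearson correlation coefficient between one input variable and the transfer time, and the RMSE between target and predicted values. The function names, variable names, and all numbers are hypothetical and invented for illustration; they are not taken from the study's data or code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(targets, predictions):
    """Root-mean-square error between target and predicted values."""
    n = len(targets)
    squared_errors = ((t - p) ** 2 for t, p in zip(targets, predictions))
    return math.sqrt(sum(squared_errors) / n)


# Made-up sample: file sizes, observed transfer times, model predictions.
file_size_gb = [1.0, 2.0, 3.0, 4.0]
transfer_time_s = [35.0, 12.0, 48.0, 20.0]
predicted_time_s = [30.0, 15.0, 44.0, 25.0]

r = pearson(file_size_gb, transfer_time_s)
error = rmse(transfer_time_s, predicted_time_s)
print(f"correlation(size, time) = {r:.3f}")
print(f"RMSE = {error:.3f} s")
```

A coefficient close to zero for a given input would be in line with the observation that transfer time is not clearly explained by that parameter, while the RMSE is expressed in the same units as the transfer time itself.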

