Abstract

The weather data of Kashmir province has six attributes recorded at three different substations. This paper proposes a distributed decision tree algorithm and its implementation on historical geographical data of Kashmir province. A standard machine learning decision tree applied to the full Kashmir province dataset achieves an accuracy of 81.54%. The distributed decision tree builds multiple trees from partitions of the original dataset, in which the data is segregated by substation (42026, 42027 and 42044). The resulting partitions hold 32.38%, 34.19% and 33.42% of the data respectively, a near-even split that is well suited to parallel processing. The distributed implementation produces a specified number of sub-trees (depending on the number of partitions of the input dataset) and at the end either collects votes or averages the predictions or classifications. In this paper, we implement the hard-voting approach to calculate the overall performance of the n trees in a distributed environment. The empirical results demonstrate that the distributed decision tree approach does not improve overall accuracy compared with a single tree trained on the original dataset without partitioning.
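
The partition-and-vote scheme described above can be illustrated with a minimal sketch. It is not the paper's implementation: the record layout, the `station` key, and the function names `partition_by_station` and `hard_vote` are assumptions made for illustration, and the per-tree predictions are supplied by hand rather than produced by trained trees.

```python
from collections import Counter, defaultdict


def partition_by_station(records, key="station"):
    """Group records by substation ID (the 'station' key is an assumed field name)."""
    parts = defaultdict(list)
    for rec in records:
        parts[rec[key]].append(rec)
    return dict(parts)


def hard_vote(per_tree_predictions):
    """Combine label predictions from several trees by majority (hard) voting.

    per_tree_predictions holds one list of labels per tree, all of equal length.
    Ties are resolved in favour of the label seen first among the tied ones.
    """
    voted = []
    for labels in zip(*per_tree_predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Hypothetical records; only the substation IDs come from the abstract.
records = [
    {"station": 42026, "label": "rain"},
    {"station": 42027, "label": "dry"},
    {"station": 42044, "label": "rain"},
    {"station": 42026, "label": "dry"},
]
partitions = partition_by_station(records)

# Hand-written predictions standing in for three sub-trees on two test samples.
combined = hard_vote([
    ["rain", "dry"],
    ["rain", "rain"],
    ["dry", "rain"],
])
```

With three sub-trees and two class labels a tie cannot occur; with more classes, a tie-breaking rule like the one noted in the docstring has to be chosen explicitly.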
