Abstract

<span>Random forest is a machine learning algorithm, built primarily as a classification method, that makes predictions using an ensemble of decision trees. Many machine learning studies have applied random forest to perform deep analysis of different cancer diseases in order to understand their complex characteristics and behaviour. However, the massive and complex data generated for such diseases makes it difficult to run random forest on a single machine, so advanced tools are required to analyse these data. In this paper, the random forest algorithm is implemented with Apache Mahout on Hadoop-based software-defined networking (SDN) to perform prediction and analysis on large lung cancer datasets. Several experiments, conducted on nine virtual nodes, are used to evaluate the proposed system. They show that the proposed implementation of random forest outperforms its implementation in a traditional environment with respect to execution time. A comparison between the proposed system (Hadoop-based SDN) and Hadoop alone shows that random forest on Hadoop-based SDN achieves a lower execution time than on Hadoop alone. Furthermore, the experiments reveal that the implemented system is more efficient in terms of execution time, accuracy and reliability.</span>
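The abstract describes random forest as a classifier that predicts by combining many decision trees. As a minimal single-machine illustration of that idea (the paper itself runs random forest via Apache Mahout on a Hadoop cluster; the scikit-learn API and the synthetic dataset below are stand-ins, not the authors' setup):

```python
# Minimal sketch of random forest classification on one machine.
# Assumptions: scikit-learn as a stand-in for Mahout's distributed
# random forest, and a synthetic dataset in place of lung cancer data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data standing in for a real dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An ensemble of decision trees; each tree votes and the majority wins.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```

On large datasets this single-machine approach is exactly what becomes infeasible, which is the motivation for distributing training and prediction across Hadoop nodes as the paper proposes.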

