Abstract

The purpose of this project was to determine whether I could develop a model for early, accurate breast cancer detection that could reduce mortality among women, using a novel dual-layered Random Forest with Null Handling. Mammograms have an accuracy of about 86.9% and are susceptible to false negatives and false positives. To train and test my model, the Wisconsin breast cancer data set was collected and duplicated, and random values were deleted from the duplicate. The first random forest was trained on x% of the processed data. The second random forest was trained on the output of the first random forest together with the processed data, and served to fine-tune the first model's results. Finally, the majority vote of the individual random forests gave the cancer prediction. I found that dual-layered random forests with null values in their training data reached an accuracy of 94.4%, about 7.5 percentage points higher than the 86.9% accuracy of mammogram screening. This model also overcame overfitting. All of my dual-layered models, and all models trained with appended null data, performed better than human detection. Each could be built and tested in under 7 seconds through an easy-to-use interface, allowing for results in the same hospital visit. The best model had a first layer of 200 trees and a second layer of 800 trees. This model is fast and accurate, and it can save people's lives.
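
The pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code. It assumes scikit-learn, and it uses the Wisconsin Diagnostic data set bundled with scikit-learn, which may differ from the data used in the project. The abstract does not say how the deleted values were handled, how the first layer's output was passed on, or how votes were combined, so the median imputation, the out-of-bag probabilities, and the probability averaging below are all assumptions. The names `augment_with_nulls`, `fit_dual_layer`, `predict_dual_layer`, and `null_fraction` are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split


def augment_with_nulls(X, y, null_fraction=0.05, seed=0):
    """Append a duplicate of X in which random cells are replaced by NaN."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    X_copy = X.copy()
    X_copy[rng.random(X_copy.shape) < null_fraction] = np.nan
    return np.vstack([X, X_copy]), np.concatenate([y, y])


def fit_dual_layer(X_train, y_train, n_trees_1=200, n_trees_2=800, seed=0):
    """Fit two stacked random forests; layer 2 also sees layer 1's output."""
    # Assumption: missing cells are filled with the column median.
    imputer = SimpleImputer(strategy="median").fit(X_train)
    X_filled = imputer.transform(X_train)

    layer1 = RandomForestClassifier(
        n_estimators=n_trees_1, oob_score=True, random_state=seed, n_jobs=-1
    ).fit(X_filled, y_train)

    # Assumption: out-of-bag probabilities are passed on, so layer 2 does not
    # simply memorise layer 1's in-sample predictions.
    p1 = np.nan_to_num(layer1.oob_decision_function_[:, [1]], nan=0.5)

    layer2 = RandomForestClassifier(
        n_estimators=n_trees_2, random_state=seed, n_jobs=-1
    ).fit(np.hstack([X_filled, p1]), y_train)
    return imputer, layer1, layer2


def predict_dual_layer(model, X):
    """Combine both layers by averaging their class-1 probabilities."""
    imputer, layer1, layer2 = model
    X_filled = imputer.transform(X)
    p1 = layer1.predict_proba(X_filled)[:, [1]]
    p2 = layer2.predict_proba(np.hstack([X_filled, p1]))[:, [1]]
    return ((p1 + p2) / 2 >= 0.5).astype(int).ravel()


if __name__ == "__main__":
    X, y = load_breast_cancer(return_X_y=True)
    # Split first so that duplicates of test rows never reach training.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0
    )
    X_aug, y_aug = augment_with_nulls(X_tr, y_tr)
    model = fit_dual_layer(X_aug, y_aug)
    print("held-out accuracy:", (predict_dual_layer(model, X_te) == y_te).mean())
```

The defaults of 200 and 800 trees mirror the best configuration reported above. The held-out accuracy from this sketch depends on the split, the seed, and the assumptions listed, so it should not be expected to reproduce the reported 94.4%.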
