Abstract

Background: Breast cancer is one of the leading causes of mortality among women worldwide. The Breast Cancer Resource Centre (BCRC) of University Malaya Medical Centre (UMMC), Kuala Lumpur, Malaysia, started the Malaysian Breast Cancer Survivorship Cohort (MyBCC) study in 2012. Aim: As a further enhancement of this research, the MyBCC database was developed to make conducting the survey more convenient and to support the use of data science techniques to predict the factors influencing differences in survival rates among patients of multiethnic origin. Methods: The database comprises lifestyle-related data on the patients, including demographic factors, information on other illnesses, clinical factors, quality of life, psychosocial support, physical activity, work-related questions, depression score, family background, type of medication consumed and financial status. This paper presents an approach to building an automated workflow using the MySQL database management system and PHP, integrated with R and HTML for web display. Results: A relational database comprising the data of 816 breast cancer patients was developed for the MyBCC cohort study. This database serves as the backend for the MyBCC application, where researchers can register new patients' records and update and view the information of recruited patients in a more convenient environment than before. In addition, the MyBCC database has been integrated with the R programming environment through the RMySQL package to perform audits. Several important analyses produced with the plotly package, leveraging the integration of R with the database, are presented. Conclusion: In this paper, the development of the MyBCC database is presented, with the aim of automating the manual process of data entry, storage and analysis for performing audits for the breast cancer cohort study.
The integration of the database with R for automated data analysis is also shown, using examples of predictions that can be generated with functions in R. This fully automated workflow reduces the workload and time required compared with performing predictions manually on data stored in flat files.

