Open-source analytical pipeline for robust data analysis, visualizations and sharing in crop breeding

Joie Ramos, Apurva Khanna, Mahender Anumalla, Ma Teresa Sta Cruz, Sankalp Bhosale, Waseem Hussain, Margaret Catolos

doi:10.1186/s13007-022-00845-7

Open Access

https://doi.org/10.1186/s13007-022-00845-7

Journal: Plant Methods
Publication Date: Feb 5, 2022
Citations: 3
License type: open-access

Affiliation: International Rice Research Institute

Abstract

Background: Developing a systematic phenotypic data analysis pipeline, creating enhanced visualizations, and interpreting the results are crucial for extracting meaningful insights from data and making better breeding decisions. Here, we provide an overview of how the Rainfed Rice Breeding (RRB) program at IRRI has leveraged the computational power of R, together with open-source tools such as R Markdown, plotly, LaTeX, and HTML, to develop an open-source, end-to-end data analysis workflow and pipeline, and has re-designed it as a reproducible document for better interpretation, visualization, and easy sharing with collaborators.

Results: We report a state-of-the-art implementation of the phenotypic data analysis pipeline and workflow, embedded in a well-documented report. The analytical pipeline is open-source and demonstrates, with step-by-step instructions, how to analyze phenotypic data in crop breeding programs. It shows how to pre-process phenotypic data and check their quality, perform robust data analysis using modern statistical tools and approaches, and convert the analysis into a reproducible document. Explanatory text, R code, outputs (as text, tables, or graphics), and interpretation of the results are integrated into one unified document. The analysis is highly reproducible and can be regenerated at any time. The source code of the analytical pipeline and the demo data are available at https://github.com/whussain2/Analysis-pipeline.

Conclusion: The analysis workflow and document presented are not limited to IRRI’s RRB program but are applicable to any organization or institute with a full-fledged breeding program. We believe this is a great initiative to modernize the data analysis of IRRI’s RRB program. Further, this pipeline can be easily implemented by plant breeders or researchers, helping and guiding them in analyzing breeding trial data in the best possible way.

Highlights

Developing a systematic phenotypic data analysis pipeline, creating enhanced visualizations, and interpreting the results are crucial for extracting meaningful insights from data and making better breeding decisions
Using the best model, we showed how to extract Best Linear Unbiased Predictors (BLUPs), variance components, heritability, and ANOVA results, together with a variogram to check for spatial trends in the trait
In the multi-environment trial (MET) analysis, in addition to the results mentioned above, we used the ASExtras4 R package to extract further results, including correlation and covariance matrices, G x E BLUPs, a PCA biplot, and latent regression figures to check the stability of genotypes (Fig. 2)
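
As a rough illustration of the heritability quantities named in the highlights, the sketch below computes two commonly used definitions from variance components. It is not the authors' code: the pipeline itself is written in R, and the arithmetic is shown here in Python only because it is language-agnostic. The function names, argument names, and example numbers are all hypothetical. The first function uses the standard entry-mean formula, H² = σ²g / (σ²g + σ²e / r). The second uses a Cullis-type generalized heritability, H² = 1 − v̄ / (2σ²g), where v̄ is the average variance of a difference between two genotype BLUPs; this is assumed to be the kind of quantity cited as [45], which the excerpt does not confirm.

```python
from itertools import combinations
from statistics import mean


def entry_mean_heritability(var_g: float, var_e: float, n_reps: int) -> float:
    """Broad-sense heritability on an entry-mean basis: Vg / (Vg + Ve / r)."""
    if var_g < 0 or var_e < 0 or n_reps < 1:
        raise ValueError("variances must be non-negative and n_reps >= 1")
    phenotypic_var = var_g + var_e / n_reps
    if phenotypic_var == 0:
        raise ValueError("total variance is zero; heritability is undefined")
    return var_g / phenotypic_var


def generalized_heritability(var_g: float, pev: list[list[float]]) -> float:
    """Cullis-type heritability: 1 - vbar / (2 * Vg).

    `pev` is the prediction error variance-covariance matrix of the genotype
    BLUPs (hypothetical input; in practice it would be taken from the fitted
    mixed model). The variance of the difference between BLUPs i and j is
    pev[i][i] + pev[j][j] - 2 * pev[i][j]; vbar averages it over all pairs.
    """
    if var_g <= 0:
        raise ValueError("genotypic variance must be positive")
    n = len(pev)
    if n < 2:
        raise ValueError("at least two genotypes are required")
    pair_vars = [
        pev[i][i] + pev[j][j] - 2 * pev[i][j]
        for i, j in combinations(range(n), 2)
    ]
    return 1 - mean(pair_vars) / (2 * var_g)


# Made-up variance components, for illustration only.
h2_standard = entry_mean_heritability(var_g=4.0, var_e=6.0, n_reps=3)
h2_general = generalized_heritability(
    var_g=4.0,
    pev=[[1.0, 0.2, 0.2], [0.2, 1.0, 0.2], [0.2, 0.2, 1.0]],
)
print(round(h2_standard, 3), round(h2_general, 3))
```

In a balanced trial the two definitions tend to agree closely. The generalized form is typically preferred for unbalanced or spatially adjusted analyses, where a single number of replicates is not well defined.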

Summary

Results

This section demonstrates how to extract the results from the separate analysis or the MET analysis using either the ASReml-R package or the lme R packages. For more details on these results, see the HTML workflows of the ASReml and lme analyses available on GitHub. We demonstrated how to extract the heritability and generalized heritability [45] using different approaches. In the lme R package analysis, ANOVA, variance components, fixed effects as BLUEs, random effects as BLUPs, and heritability were extracted.

2) Any new data and any edits or corrections to the existing pipeline can be incorporated by re-knitting the R Markdown ‘.Rmd’ document (https://rmarkdown.rstudio.com/articles_intro.html). This analytical pipeline avoids manually updating or generating reports or PowerPoint slides, which is otherwise time-consuming and highly prone to errors. Hyperlinks have been embedded in the required sections to help in understand-
