Base editing is an enhanced gene editing approach that enables the precise conversion of single nucleotides and has the potential to cure rare diseases. The design process of base editors is labour-intensive, and outcomes are not easily predictable. For any clinical use, base editing has to be accurate and efficient, so bystander mutations have to be minimized. In recent years, computational models to predict base editing outcomes have been developed. However, the overall robustness and performance of those models are limited. One way to improve performance is to train models on a large, diverse, and feature-rich dataset, which does not yet exist for the base editing field. Hence, we developed BE-dataHIVE, a MySQL database that covers over 460,000 gRNA-target combinations. The current version of BE-dataHIVE consists of data from five studies and is enriched with melting temperatures and energy terms. Furthermore, multiple data structures for machine learning were computed and are directly available. The database can be accessed via our website (https://be-datahive.com/) or through an API and is therefore suitable for both practitioners and machine learning researchers.