A wealth of high-throughput biological data, of which omics data constitute a significant fraction, has been made publicly available in repositories over the past decades. These data come in various formats and cover a range of species and research areas, providing insights into the complexities of biological systems; the public repositories hosting them serve as multifaceted resources. The potentially greater value of these data lies in their secondary use as the deployment of data science and artificial intelligence in biology advances. Here, we critically evaluate challenges in secondary data use, focusing on omics data of human embryonic kidney cell lines available in public repositories. The issues that emerge are obstacles faced by secondary data users across diverse domains, because they concern platforms and repositories that accept data depositions irrespective of species. The evolving landscape of data-driven research in biology prompts a re-evaluation of open-access data curation and submission procedures, so that these challenges do not impede the novel research opportunities opened up by data exploitation. This paper draws attention to widespread issues with data reporting and encourages data owners to curate their submissions meticulously, maximizing not only the immediate research impact of their datasets but also their long-term legacy.