Biological Machine Learning Research Articles

Protein representations from deep language models have yielded state-of-the-art performance across many tasks in computational protein engineering. In recent years, progress has primarily focused on parameter count, with recent models’ capacities surpassing the size of the very datasets they were trained on. Here we propose an alternative direction. We show that large language models trained on codons, instead of amino acid sequences, provide high-quality representations that outperform comparable state-of-the-art models across a variety of tasks. In some tasks, such as species recognition, prediction of protein and transcript abundance or melting point estimation, we show that a language model trained on codons outperforms every other published protein language model, including some that contain over 50 times more parameters. These results indicate that, in addition to commonly studied scale and model complexity, the information content of biological data provides an orthogonal direction to improve the power of machine learning in biology.
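To make the distinction between codon-level and amino-acid-level input concrete, the following is a minimal sketch, not the authors' tokenizer. It assumes a coding DNA sequence and the standard genetic code; the function names `to_codons` and `translate` and the two example sequences are hypothetical and chosen only for illustration. It shows that two sequences differing only in synonymous codons collapse to the same amino acid string, so a codon-level model sees information that an amino-acid-level model cannot.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (an assumption of this sketch;
# other organisms and organelles use variant tables).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def to_codons(cds: str) -> list[str]:
    """Split a coding sequence into codon tokens (hypothetical helper)."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    """Translate a coding sequence into one-letter amino acid codes."""
    return "".join(CODON_TABLE[codon] for codon in to_codons(cds))


# Two illustrative sequences that differ only in synonymous codons.
seq_a = "ATGGCTAAA"
seq_b = "ATGGCCAAG"

print(to_codons(seq_a), translate(seq_a))
print(to_codons(seq_b), translate(seq_b))
```

Both sequences translate to the same protein, while their codon token lists differ at the second and third positions. That difference is the additional signal available to a model trained on codons.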


Machine learning (ML) has become an essential asset for the life sciences and medicine. We selected 250 articles describing ML applications from 17 journals sampling 26 different fields between 2011 and 2016. Independent evaluation by two readers highlighted three results. First, only half of the articles shared software, 64% shared data, and 81% applied any kind of evaluation. Although these practices are crucial for ensuring the validity of ML applications, they were met more often by publications in lower-ranked journals. Second, the authors' scientific backgrounds strongly influenced how technical aspects were addressed: reproducibility and computational evaluation methods were more prominent in articles with computational co-authors, and experimental proofs were more prominent in articles with experimentalists. Third, 73% of the ML applications resulted from interdisciplinary collaborations comprising authors from at least two of the three disciplines: computational sciences, biology, and medicine. The results suggest that collaborations between computational and experimental scientists generate more scientifically sound and impactful work by integrating knowledge from both domains. Although scientifically more valid solutions and collaborations involving diverse expertise did not correlate with impact factors, such collaborations provide opportunities for both sides: computational scientists gain access to novel and challenging real-world biological data, increasing the scientific impact of their research, and experimentalists benefit from more in-depth computational analyses that improve the technical correctness of their work.

Applications of machine learning in the life sciences and medicine require expertise in computational methods and in scientific subject matter. The authors surveyed articles in the life sciences that included machine learning applications and found that interdisciplinary collaborations increased the scientific validity of published research.


Articles published on Biological Machine Learning

Codon language embeddings provide strong signals for use in protein engineering

Current Status of Machine Learning Applications in Molecular Biology and Biological Signal Processing

Artificial intelligence in interdisciplinary life science and drug discovery research

A guide to machine learning for biologists

Systematic auditing is essential to debiasing machine learning in biology

BactClass: Simplifying the Use of Machine Learning in Biology and Medicines

Validity of machine learning in biology and medicine increased through collaborations across fields of expertise

Setting the standards for machine learning in biology
