Abstract

As advancing technology improves the ability to engineer biological systems, the risk of accidental or intentional release of a dangerous genetically modified organism grows. Authorities must be able to attribute the source of such a release. In the absence of evidence that ties a release directly to the individuals responsible, attribution can be carried out in part by identifying the in silico tools used to design the engineered genetic components, since these tools can leave a signature in the DNA of the organism. Previous attribution methods have focused on identifying the laboratory of origin of an engineered organism by applying machine learning to plasmid signatures. The next logical step is to base attribution on signatures from the tools used to create the engineered modifications. A random forest classifier was developed that discriminates between design tools used to optimize coding regions for incorporation into the genome of another organism. To this end, tens of thousands of genes were optimized with four different codon optimization methods, and relevant features were generated from the resulting sequences as input to the classifier. This method achieves more than 97% accuracy in predicting which tools were used to design codon-optimized genes for expression in other organisms. The methods presented here lay the groundwork for effective organism engineering attribution techniques. Such methods can serve both as deterrents against future attempts to create dangerous organisms and as tools for forensic science.
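
The abstract does not say which sequence features were used, so the following is only a minimal sketch of one plausible feature set: the relative frequency of each of the 64 codons in a coding sequence. The function name `codon_frequencies` and the choice of codon frequencies as features are assumptions made for illustration, not the authors' published method.

```python
from collections import Counter
from itertools import product
from typing import List

# All 64 codons in a fixed order, so every gene maps to the same feature layout.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_frequencies(cds: str) -> List[float]:
    """Return the relative frequency of each of the 64 codons in a coding sequence.

    Hypothetical feature extractor: the real study may use different features.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # Count only unambiguous codons (skip anything containing N or other symbols).
    counts = Counter(c for c in codons if c in CODONS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


# Toy example: four codons (ATG, GCT, GCT, TAA).
features = codon_frequencies("ATGGCTGCTTAA")
print(features[CODONS.index("GCT")])  # fraction of codons that are GCT
```

Under these assumptions, one such 64-element vector per optimized gene, labeled with the optimization tool that produced it, could serve as training input to a random forest implementation such as scikit-learn's `RandomForestClassifier`.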
