Abstract

Support vector regression (SVR) models were created and used to predict, with high accuracy, the retention times of oligonucleotides separated by gradient ion-pair chromatography. The experimental dataset consisted of fully phosphorothioated oligonucleotides. Two models were trained and validated using two pseudo-orthogonal gradient modes and three gradient slopes. The results show that the spread in retention time differs between the two gradient modes, which indicates a varying degree of sequence-dependent separation. Peak widths from the experimental dataset were calculated and correlated with the guanine-cytosine content and retention time of the sequence for each gradient slope. These data were used to predict the resolution of the n – 1 impurity among 250 000 random 12- and 16-mer sequences, showing that one of the investigated gradient modes has a much higher probability of exceeding a resolution of 1.5, particularly for the 16-mer sequences. Sequences with a high guanine-cytosine content and a terminal C are more likely to fall short of the critical resolution. The trained SVR models can be used both to identify characteristics of different separation methods and to assist in the choice of method conditions, i.e. to optimize resolution for arbitrary sequences. The methodology presented in this study can be expected to be applicable to predicting retention times of other oligonucleotide synthesis and degradation impurities, provided that enough training data are available.
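The resolution criterion mentioned above can be illustrated with a minimal sketch. The abstract does not state which resolution formula was used; the code assumes the common definition based on baseline peak widths, Rs = 2(t2 - t1)/(w1 + w2). The function names and the numerical values are hypothetical and are not taken from the study.

```python
def resolution(t_r1, t_r2, w_b1, w_b2):
    """Chromatographic resolution of two peaks.

    Assumes the common baseline-width definition
    Rs = 2 * |t_r2 - t_r1| / (w_b1 + w_b2);
    the study may have used a different variant.
    """
    if w_b1 <= 0 or w_b2 <= 0:
        raise ValueError("peak widths must be positive")
    return 2.0 * abs(t_r2 - t_r1) / (w_b1 + w_b2)


def exceeds_critical_resolution(rs, threshold=1.5):
    """True if the resolution reaches the threshold of 1.5 named in the abstract."""
    return rs >= threshold


# Illustrative values only (minutes), not data from the study:
# a hypothetical n - 1 impurity and its full-length product.
rs_example = resolution(t_r1=10.0, t_r2=10.4, w_b1=0.25, w_b2=0.25)
resolved = exceeds_critical_resolution(rs_example)
```

In a workflow like the one described, the retention times would come from the trained SVR models and the peak widths from the fitted width correlation, so that the check can be repeated for each candidate sequence and gradient condition.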
