Abstract

Recombinant proteins and monoclonal antibodies offer great promise as therapeutics for hundreds of diseases. Today, there are almost 400 biotechnology drugs in development for over 200 different conditions. Many of these drugs are glycoproteins, for which correct glycosylation patterns are important to structure and function. Achieving and maintaining proper glycosylation is a major challenge in biotechnology manufacturing. Most recombinant therapeutic glycoproteins are produced in living cells in an attempt to match the glycosylation patterns found in the natural human form of the protein and to achieve optimal in vivo functionality. However, using cell systems to produce glycoproteins requires balancing the cell's ability to produce the protein with its ability to attach the appropriate carbohydrates. One limitation of this approach is that the expression systems do not maintain complete glycosylation under high-volume production conditions. This results in low yields of usable product and contributes to the cost and complexity of producing these drugs. Incorrect glycosylation also affects the half-life of the drug. Low production yields are a significant contributor to the critical worldwide shortage of biotechnology manufacturing capacity. To achieve higher production yields and to meet the quality standards required by health authorities, fast, accurate and preferably inexpensive analytical methods are needed. Today, routine analysis of therapeutic glycoproteins is carried out by analytical HPLC, MS or lectin blotting, in conjunction with chemical derivatization, exoglycosidase treatment and/or other selective chemical cleavage reactions. Because different carbohydrates have very similar molecular weights and physicochemical properties, the analysis of glycosylation is slow and complex.
Conventional glycoanalysis requires multiple steps to obtain the structure, sequence and prevalence of all glycans in a glycoprotein sample. A complete analysis typically takes several days and requires highly trained personnel. More efficient and rapid glycoanalysis methodology is therefore fundamental to the success of biotechnologically produced drugs. With this need in mind, a 13C-NMR spectra analysis system for oligosaccharides based on multiple back-propagation neural networks was developed in this thesis. Before the idea could be realized, some fundamental questions had to be posed: 1. Are the monosaccharide moieties, the anomeric configuration and the substitution pattern of an oligosaccharide reflected in an NMR (13C or 1H) spectrum? 2. Which kind of NMR data (1H or 13C) provides this information better? 3. How can spectroscopic data be processed, compressed and transferred into a neural network? 4. Which neural network architecture, learning algorithm and learning parameters lead to optimal results? Preliminary experiments showed that the six chemical shifts of a monosaccharide moiety (from glucose, galactose and mannose) suffice to identify the monosaccharide itself, the anomeric configuration (if the anomeric carbon atom is substituted) and the substitution position(s). The experiments also revealed that these compounds could be almost completely separated with the help of counter-propagation neural networks. The main goal of the neural network approach was to recognize every single monosaccharide moiety in an oligosaccharide and to train separate, specialized networks for each monosaccharide moiety group. The neural networks were therefore to be trained with the 13C-NMR spectra of these monosaccharide moieties. In the test phase, the whole spectrum of an oligosaccharide is presented to the networks, and each specialized network should recognize only the monosaccharide moieties it was trained for.
Initial attempts to train a back-propagation neural network to identify six methyl pyranoside compounds failed, because the data set was too small and because an uncompressed NMR spectrum requires too many input neurons. The data foundation was therefore changed and enlarged to 535 monosaccharide moieties (mostly galactose, glucose and mannose) taken from the literature, and a dedicated software tool for data compression and parsing (JCAMP-DX for NMR files), called ANN Pattern File Generator, was developed. The entire dataset was normalized and stored in a FileMaker 13C-NMR database. Further experiments with this new dataset, using different back-propagation network layouts and training parameters, still did not achieve the targeted recognition rate for unknown test compounds. The training performance of the neural networks appeared to be insensitive to major changes in the training parameters. Tests with a new and enlarged dataset (1000 oligosaccharides and approx. 2500 monosaccharide moieties) using Kohonen networks showed that separate Kohonen networks for each monosaccharide type yield higher recognition rates than networks that have to deal with all three monosaccharide types at once. This finding was transferred to separate back-propagation networks, which then showed recognition rates higher than 90% for unknown compounds. The separated approach worked very well for disaccharides with two different monosaccharide moieties. Disaccharides with similar or identical moieties, however, cannot be identified, because the designated neural network recognizes only one monosaccharide at a time. To overcome this limitation, the so-called 'ensemble' or 'group of experts' approach was developed. It exploits the fact that no two trained neural networks show exactly the same recognition characteristics: different neural networks respond differently to the same test inputs. Twenty trained neural networks at a time were grouped into ensembles.
All networks in an ensemble are trained to recognize the same monosaccharide moiety. After a test input (e.g. a disaccharide) is presented to this group of experts, one obtains, in the most extreme case, twenty different recognition results, which can then be analyzed statistically. In the case of a disaccharide with two monosaccharide moieties of the same carbohydrate (e.g. α-D-Glcp-1-4-β-D-Glcp-OMe), the analysis delivers both monosaccharide compounds, because some networks recognize one part of the disaccharide and other networks recognize the other. The ensemble approach brought the final breakthrough of this thesis. Disaccharide recognition rates in the range of 85 to 96% (depending on the monosaccharide moiety: glucose, galactose or mannose) demonstrate the feasibility of the approach. The hit rates of the different ensembles can certainly be improved by a more careful choice of the members of each ensemble. An ongoing diploma thesis shows improved recognition along these lines.
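
The statistical analysis of an ensemble's answers can be illustrated with a minimal sketch. It is not the implementation used in the thesis: it assumes that each trained network can be treated as a function returning one moiety label, that the analysis is a simple vote count, and that a label is reported when it reaches a hypothetical vote share `min_share`. The names `ensemble_vote` and `make_stub`, the label strings and the shift values are placeholders introduced only for illustration.

```python
from collections import Counter
from typing import Callable, Dict, Sequence

# Assumption: each trained network is a callable that maps a list of 13C
# chemical shifts (ppm) to a single moiety label. The real networks are not shown.
Network = Callable[[Sequence[float]], str]


def ensemble_vote(networks: Sequence[Network],
                  shifts: Sequence[float],
                  min_share: float = 0.2) -> Dict[str, float]:
    """Return each label's share of the votes, keeping labels at or above min_share.

    min_share is a hypothetical cutoff, not a value taken from the thesis.
    """
    votes = Counter(net(shifts) for net in networks)
    total = sum(votes.values())
    return {label: count / total
            for label, count in votes.items()
            if count / total >= min_share}


def make_stub(label: str) -> Network:
    """Stand-in for a trained network: always answers with one fixed label."""
    return lambda shifts: label


# Twenty stand-in members: 12 answer with one glucose moiety, 8 with the other.
ensemble = [make_stub("alpha-D-Glcp")] * 12 + [make_stub("beta-D-Glcp-OMe")] * 8
dummy_shifts = [100.0, 72.0, 73.0, 70.0, 72.0, 61.0]  # placeholder values, not measured data
result = ensemble_vote(ensemble, dummy_shifts)
print(result)
```

With these stand-in members, both glucose moieties pass the cutoff, which mirrors the case described above where some networks recognize one part of the disaccharide and others recognize the other part. A single network, by contrast, could only ever return one of the two labels.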
