Abstract
Protein domain interactions with short linear peptides, such as those of the Src homology 2 (SH2) domain with phosphotyrosine-containing peptide motifs (pTyr), are ubiquitous and important to many biochemical processes of the cell. The desire to map and quantify these interactions has resulted in the development of high-throughput (HTP) quantitative measurement techniques, such as microarray or fluorescence polarization assays. For example, in the last 15 years, experiments have progressed from measuring single interactions to covering 500,000 of the 5.5 million possible SH2-pTyr interactions in the human proteome. However, high variability in affinity measurements and disagreements about positive interactions between published data sets led us here to reevaluate the analysis methods and raw data of published SH2-pTyr HTP experiments. We identified several opportunities for improving the identification of positive and negative interactions and the accuracy of affinity measurements. We implemented model-fitting techniques that are more statistically appropriate for the nonlinear SH2-pTyr interaction data. We also developed a method to account for protein concentration errors due to impurities and degradation or protein inactivity and aggregation. Our revised analysis increases the reported affinity accuracy, reduces the false-negative rate, and increases the amount of useful data by adding reliable true-negative results. We demonstrate improvement in classification of binding versus nonbinding when using machine-learning techniques, suggesting improved coherence in the reanalyzed data sets. We present revised SH2-pTyr affinity results and propose a new analysis pipeline for future HTP measurements of domain-peptide interactions.
Highlights
Replicates, reflecting random noise and experimental error, tak- est likelihood of the true population value of affinity
Protein concentration errors due to batch impurities or degradation can manifest as a range of Kd values in replicate measurements made from different batches of protein, all of which would be equal to or higher than the true Kd, while simultaneously coming from high-quality, low-noise replicate fits
Because we do not have true information about the batch or the activity of each protein sample, these patterns must be inferred from the data. Although these patterns are difficult to spot due to the nature of the experimental design, we find examples of nonrandom run-dependent variations in affinity in the data (Fig. S11).
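The effect of protein concentration errors described in the highlights above can be illustrated with a minimal sketch. It assumes a simple one-site saturation model in which the protein is titrated in excess of the peptide, and a hypothetical `active_fraction` giving the share of the nominal protein concentration that is actually able to bind. Neither the model nor the function names come from the paper; they are illustrative only. Under these assumptions, a curve measured with partly inactive protein but fitted against the nominal concentration has an apparent Kd of the true Kd divided by the active fraction, which can never fall below the true Kd.

```python
def fraction_bound(active_protein, kd):
    """One-site saturation binding: fraction of peptide bound (assumed model)."""
    return active_protein / (kd + active_protein)


def apparent_kd(true_kd, active_fraction):
    """Kd recovered when a curve is fitted against the nominal concentration.

    active_fraction is a hypothetical parameter: the share of the nominal
    protein concentration that is pure, folded, and able to bind.
    """
    if not 0.0 < active_fraction <= 1.0:
        raise ValueError("active_fraction must be in (0, 1]")
    return true_kd / active_fraction


true_kd = 1.0  # illustrative value in uM, not a measured affinity
nominal_concs = [0.1, 0.5, 1.0, 5.0, 10.0]  # illustrative titration points in uM

for f in (1.0, 0.8, 0.5):
    kd_app = apparent_kd(true_kd, f)
    for conc in nominal_concs:
        # Signal actually produced by the active protein in this batch
        observed = fraction_bound(f * conc, true_kd)
        # Same signal, described as a clean curve over the nominal concentration
        refit = fraction_bound(conc, kd_app)
        assert abs(observed - refit) < 1e-12
    print(f"active fraction {f:.1f}: apparent Kd = {kd_app:.2f} uM")
```

Because the shifted curve is still an exact one-site curve, each batch can yield a low-noise, high-quality fit while reporting a different, inflated Kd.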
Summary
In the process of evaluating published high-throughput data, we found significant disagreement between data sets. Protein concentration errors due to batch impurities or degradation can manifest as a range of Kd values in replicate measurements made from different batches of protein, all of which would be equal to or higher than the true Kd, while simultaneously coming from high-quality, low-noise replicate fits. This exact phenomenon has been demonstrated experimentally [31]. Note that although the minimum of each replicate group was selected as most accurately reflecting the true affinity, our revised affinity values are not all lower than those in the original publication. This is primarily due to significant changes at the replicate level, where some original replicates were removed from consideration by changes in the fitting process, and a number of new replicates were included in each replicate set. The improved average performance and lower variability in our revised results suggest improved coherency in our revised analysis over the original published results.
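The replicate-selection rule described in the summary can be sketched as follows. The function name `select_affinity` and the replicate values are hypothetical and only illustrate the idea: if concentration errors can only raise the fitted Kd, then the smallest Kd among well-fitted replicates is the one least affected, whereas an average is pulled upward by the degraded batches. This is not the authors' implementation, and it assumes that poorly fitted replicates have already been filtered out.

```python
from statistics import mean


def select_affinity(replicate_kds):
    """Return the minimum Kd of a replicate group.

    Assumes every value comes from a replicate that already passed a
    fit-quality filter, so the remaining spread reflects concentration
    error rather than random noise.
    """
    if not replicate_kds:
        raise ValueError("replicate group is empty")
    if any(kd <= 0 for kd in replicate_kds):
        raise ValueError("Kd values must be positive")
    return min(replicate_kds)


# Hypothetical fitted Kd values (uM) for one domain-peptide pair,
# measured with three different protein batches
replicates = [1.0, 1.25, 2.0]

selected = select_affinity(replicates)
average = mean(replicates)
print(f"selected (minimum) Kd: {selected:.2f} uM")
print(f"mean Kd, for comparison: {average:.2f} uM")
```

Adding or removing replicates changes the minimum of the group, which is consistent with the summary's remark that revised values are not uniformly lower than the originally published ones.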