Abstract

Mass spectrometry (MS) data are used to analyze biological phenomena based on chemical species. However, these data often contain unexpected duplicate records and missing values due to technical or biological factors. These ‘dirty data’ problems make MS analyses more difficult because they degrade performance when statistical tests or machine‐learning methods are applied to the data. We have therefore developed missing values preprocessor (mvp), an open‐source software tool for preprocessing data that might include duplicate records and missing values. mvp exploits a property of MS data, namely that identical chemical species present the same or similar values for key identifiers such as the mass‐to‐charge ratio and intensity signal, and it forms cliques via graph theory to process dirty data. We evaluated the validity of the mvp process via quantitative and qualitative analyses and compared the results of a statistical test applied to the original data and to the mvp‐applied data. This analysis showed that using mvp reduces problems associated with duplicate records and missing values. We also examined how using unprocessed data affects statistical tests and how the test results improve when the data are preprocessed with mvp.

Highlights

  • We examined how using unprocessed data affects statistical tests and how the test results improve when the data are preprocessed with the missing values preprocessor (MVP) (doi:10.1002/2211-5463.12247).

  • We used the key identifiers in mass spectrometry (MS) data as the implementation core of the missing values preprocessor (MVP) open-source platform.

  • We assessed the performance of the MVP software (Computational Systems Biology Lab., School of Electrical Engineering and Computer Science (EECS), Gwangju Institute of Science and Technology (GIST), Gwangju, Korea) via quantitative and qualitative analyses.


Summary

METHOD

Mass spectrometry (MS) data are used to analyze biological phenomena based on chemical species. These data often contain unexpected duplicate records and missing values due to technical or biological factors. MVP uses the property of MS data in which identical chemical species present the same or similar values for key identifiers, such as the mass-to-charge ratio and intensity signal, and forms cliques via graph theory to process dirty data. We evaluated the validity of the MVP process via quantitative and qualitative analyses and compared the results from a statistical test that analyzed the original and MVP-applied data. This analysis showed that using MVP reduces problems associated with duplicate records and missing values.
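
The clique idea described above can be illustrated with a minimal sketch. This is not the MVP implementation. The record layout, the m/z tolerance `mz_tol`, the helper names, and the merge rule (averaging observed intensities per sample) are all assumptions made for illustration, and the sketch matches records on m/z only. Records are treated as graph nodes, an edge joins two records whose m/z values are close, and each clique of mutually close records is collapsed into one record so that a value missing in one duplicate can be filled from another.

```python
from itertools import combinations


def build_graph(records, mz_tol):
    """Connect records whose m/z values differ by at most mz_tol (assumed rule)."""
    adj = {i: set() for i in range(len(records))}
    for i, j in combinations(range(len(records)), 2):
        if abs(records[i]["mz"] - records[j]["mz"]) <= mz_tol:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in sorted(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def merge_records(records, members):
    """Collapse duplicate records: mean m/z, mean of observed intensities per sample."""
    n_samples = len(records[members[0]]["intensities"])
    merged = []
    for s in range(n_samples):
        observed = [
            records[i]["intensities"][s]
            for i in members
            if records[i]["intensities"][s] is not None
        ]
        # A sample stays missing only if no duplicate observed it.
        merged.append(sum(observed) / len(observed) if observed else None)
    mz = sum(records[i]["mz"] for i in members) / len(members)
    return {"mz": mz, "intensities": merged}


def preprocess(records, mz_tol=0.01):
    """Merge each clique of near-identical records, largest cliques first."""
    adj = build_graph(records, mz_tol)
    used, cleaned = set(), []
    for clique in sorted(maximal_cliques(adj), key=lambda c: (-len(c), c)):
        # Overlapping cliques: a record is merged only once.
        members = [i for i in clique if i not in used]
        if members:
            used.update(members)
            cleaned.append(merge_records(records, members))
    return sorted(cleaned, key=lambda r: r["mz"])


# Hypothetical input: None marks a missing intensity for that sample.
records = [
    {"mz": 100.001, "intensities": [5.0, None, 7.0]},
    {"mz": 100.003, "intensities": [None, 6.0, 7.0]},
    {"mz": 250.500, "intensities": [1.0, 2.0, None]},
]
cleaned = preprocess(records)
```

In this toy input the first two records fall within the tolerance and are merged into one record with no missing values, while the third record is left as it is because no duplicate is available to fill its gap.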

Materials and methods
Results
Figure panels: (B) Negative ion mode data; (C) Negative ion mode original data
Discussion