Abstract

Omics studies attempt to extract meaningful messages from large-scale, high-dimensional data sets by treating the data sets as a whole. This concept is important at every step of data handling: pre-processing of the data records, statistical analysis and machine learning, translation of the outputs into forms that humans can perceive, and acceptance of the messages with their uncertainty. For pre-processing, methods for controlling data quality and batch effects are discussed. For the main analyses, the approaches are divided into two types, and their basic concepts are discussed. The first type evaluates many items individually and then interprets the individual items in the context of multiple testing and combination. The second type extracts a smaller number of important aspects from the whole set of data records. The outputs of the main analyses are translated into natural language with techniques such as annotation and ontology. Another technique for making the outputs perceptible is visualization. At the end of this review, one of the most important issues in the interpretation of omics data analyses is discussed: omics data sets contain a large amount of information, and every approach reveals only a very restricted aspect of the whole. The understandable messages from these studies therefore carry unavoidable uncertainty.

Highlights

  • In omics studies, a particular type of molecule in samples is measured in terms of character and quantity as a whole, and the patterns and/or relation to the sample attributes are investigated

  • Unlike small-scale experiments, omics requires checking the distribution of all data values and of their quality measures, as well as consideration of batch effects [5], so that records that could be considered to be of poorer quality are included in the analyses with a probabilistic interpretation

  • Once the pre-processing step has been completed, the processed data should be handed to the main analysis step


Summary

Omics studies

In omics studies, a particular type of molecule in samples is measured in terms of character and quantity as a whole, and the patterns and/or relation to the sample attributes are investigated. One omics experimental procedure corresponds to a large number of single experiments conducted simultaneously, and the quality can vary among these experiments (Fig. 1a). This is referred to as intra-experimental quality heterogeneity. A set of data records from one omics experimental procedure is affected by factors shared by all of its records, and a set of data records from another omics procedure is affected differently (Fig. 1b). This is referred to as inter-experimental heterogeneity and the procedure-dependent batch effect. For example, when two NGS runs are performed on two DNA samples, the reads from the first run may tend to be better than those from the second because the DNA sample conditions differ. Two issues are discussed regarding quality measures: (1) absolute quality vs. relative quality and (2) whether the noisiness of quality can itself be the target of study.
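The two kinds of heterogeneity described above can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not a method taken from the review: the function names (`simulate_batch`, `center_batch`), the noise range, and the batch shift of 2.0 are all hypothetical, and per-batch mean-centering is used as a deliberately simple stand-in for real batch-effect correction. Each record receives its own noise level (intra-experimental quality heterogeneity), while all records from one procedure share a common shift (the procedure-dependent batch effect).

```python
import random
import statistics


def simulate_batch(n_records, batch_shift, rng):
    """Simulate one omics procedure (hypothetical toy model).

    Every record shares `batch_shift` (the batch effect), while each
    record has its own noise level, standing in for its quality.
    """
    records = []
    for _ in range(n_records):
        noise_sd = rng.uniform(0.1, 1.0)  # record-specific quality
        value = batch_shift + rng.gauss(0.0, noise_sd)
        records.append((value, noise_sd))
    return records


def batch_mean(records):
    """Mean of the measured values in one batch."""
    return statistics.fmean([value for value, _ in records])


def center_batch(records):
    """Subtract the batch mean from every value (simple mean-centering)."""
    mean = batch_mean(records)
    return [(value - mean, noise_sd) for value, noise_sd in records]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
batch_a = simulate_batch(500, batch_shift=0.0, rng=rng)
batch_b = simulate_batch(500, batch_shift=2.0, rng=rng)

# Difference between batch means before and after centering.
raw_gap = batch_mean(batch_b) - batch_mean(batch_a)
centered_gap = batch_mean(center_batch(batch_b)) - batch_mean(center_batch(batch_a))

print(f"gap between batches before centering: {raw_gap:.3f}")
print(f"gap between batches after centering:  {centered_gap:.3f}")
```

In this toy setting the gap between batches is close to the simulated shift before centering and essentially zero afterwards. Centering removes only the shared shift. The record-specific noise levels remain, which is why the quality of individual records still has to be taken into account in later analyses.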

Absolute and relative quality
Noisiness can be the target of biological studies
Intensity of interests in the heterogeneity varies among studies
Main analyses
Collapse of the whole data set or dimension reduction
Classification of methods from statistics and learning attitudes
Statistical test
Statistical estimation and machine learning of a predictive model
Descriptive statistics and unsupervised learning to identify patterns
Bayesian approach
Translation of the outputs into perceptible forms
Annotation and ontology
Linear methods
Graph or network visualization
How to view paintings of data analysis outputs
Many methods and many tools
Compliance with ethical standards