Outlier Probability Research Articles

The LAMOST spectral analysis pipeline, called the 1D pipeline, classifies and measures the spectra observed in the LAMOST survey. The pipeline assigns observed stellar spectra to subclasses by matching them against template spectra, so the performance of the stellar classification depends strongly on the quality of those templates. In this paper, we construct a new LAMOST stellar spectral classification template library intended to improve the precision and credibility of the current LAMOST stellar classification. About one million spectra are selected from LAMOST Data Release One to construct the new stellar templates, and they are gathered into 233 groups by two criteria: (1) pseudo g – r colors obtained by convolving the LAMOST spectra with the Sloan Digital Sky Survey ugriz filter response curves, and (2) the stellar subclass given by the LAMOST pipeline. In each group, the template spectrum is constructed in three steps. (1) Outliers are excluded using the Local Outlier Probabilities algorithm, and principal component analysis is then applied to the remaining spectra of the group. About 5% of the one million spectra are ruled out as outliers. (2) All remaining spectra are reconstructed using the first principal components of each group. (3) The weighted average spectrum is used as the template spectrum of the group. With these three steps, we initially obtain 216 stellar template spectra. We visually inspect all template spectra and abandon 29 because of low spectral quality. The MK classification of the remaining 187 template spectra is then determined manually by comparison with three template libraries, and 10 template spectra whose subclass is difficult to determine are abandoned. Finally, by combining the result with the current library, we obtain a new template library containing 183 LAMOST template spectra covering 61 different MK classes.
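The three per-group steps described in the abstract (outlier rejection with Local Outlier Probabilities, reconstruction from the leading principal components, weighted averaging) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the function names `loop_scores` and `build_template`, the neighbourhood size `k`, the significance factor `lam`, the rejection threshold, the number of components, and the weights are all assumptions that the abstract does not specify.

```python
import math
import numpy as np


def loop_scores(X, k=10, lam=3.0):
    """Local Outlier Probabilities (LoOP) for the rows of X, in [0, 1]."""
    # Pairwise Euclidean distances; a point is never its own neighbour.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]            # indices of the k nearest neighbours
    nn_d = np.take_along_axis(d, nn, axis=1)     # distances to those neighbours
    # Probabilistic set distance of each point to its neighbourhood.
    pdist = lam * np.sqrt(np.mean(nn_d ** 2, axis=1))
    # Probabilistic local outlier factor: own pdist vs. mean pdist of neighbours.
    plof = pdist / np.mean(pdist[nn], axis=1) - 1.0
    nplof = lam * np.sqrt(np.mean(plof ** 2))    # normalisation over the data set
    erf = np.vectorize(math.erf)
    return np.maximum(0.0, erf(plof / (nplof * math.sqrt(2.0))))


def build_template(spectra, weights, n_components=3, k=10, threshold=0.9):
    """Template for one group: reject outliers, PCA-reconstruct, average."""
    # Step 1: drop spectra whose outlier probability exceeds the
    # (assumed) threshold.
    keep = loop_scores(spectra, k=k) < threshold
    kept = spectra[keep]
    # Step 2: PCA via SVD of the centred spectra, then reconstruct each
    # spectrum from the first n_components principal components.
    mean = kept.mean(axis=0)
    _, _, vt = np.linalg.svd(kept - mean, full_matrices=False)
    basis = vt[:n_components]
    recon = mean + (kept - mean) @ basis.T @ basis
    # Step 3: weighted average of the reconstructed spectra.
    template = np.average(recon, axis=0, weights=weights[keep])
    return template, keep


# Synthetic group: 100 noisy copies of one smooth "spectrum", two of
# which are grossly offset and should be rejected as outliers.
rng = np.random.default_rng(0)
base = np.sin(np.linspace(0.0, 3.0, 40))
spectra = base + 0.05 * rng.normal(size=(100, 40))
spectra[0] += 5.0
spectra[1] -= 5.0
weights = rng.uniform(0.5, 1.5, size=100)        # hypothetical quality weights
scores = loop_scores(spectra)
template, keep = build_template(spectra, weights)
```

The pairwise distance matrix makes this sketch quadratic in the number of spectra, which is acceptable for a small group but would need a neighbour index for groups of realistic size.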

Recent advances in biomedical technology, such as high-throughput microarrays and molecular imaging, make it possible to monitor SNPs, gene and protein expression, and so forth, and thus to characterize individuals exhaustively. In principle, such data sets carry much more information about biological and medical status; viewed as biomarkers in a wide sense, they support identification, association, and prediction studies for phenotypes such as cancer subtypes, prognosis, treatment responsiveness, and adverse reactions in personalized medicine. However, it is frequently difficult to extract only the informative part of data sets that also contain a large mass of uninformative observations. In particular, statistical evidence is hard to confirm in practical applications because the sample size is small relative to the dimension of the data. For example, when machine learning methods are used to predict treatment effects, it is typically difficult to confirm the reproducibility and robustness of their performance. Effective study design is likewise crucial for developing and validating biomarkers. We present eight papers that take on this difficult situation in the statistical analysis of biomarkers, covering statistical estimation, prediction, and hypothesis testing.

The paper “The number of candidate variants in exome sequencing for Mendelian Disease under no genetic heterogeneity,” by J. Nishino and S. Mano, studies the expected number of candidate single nucleotide variants (SNVs) in exome data for autosomal dominant or recessive Mendelian disorders, a setting in which the filtering method is useful under the assumption of “no genetic heterogeneity.”

The paper “Radial basis function-sparse partial least squares for application to brain imaging data,” by H. Yoshida et al., proposes a novel statistical method, called radial basis function sparse partial least squares (RBF-sPLS), for investigating the relationship between clinical characteristics and brain morphology based on three-dimensional MRI data.

The paper “Power analysis of C-TDT for small sample size genome-wide association studies by the joint use of case-parent trios and pairs,” by F. Rajabli et al., addresses family-based genetic association studies in which genotype information is missing for one of the parents. It provides a simulation study of the disequilibrium test statistic across different sample sizes for a family-based genome-wide association study.

The paper “Genomic biomarkers for personalized medicine: development and validation in clinical studies,” by S. Matsui, addresses two critical statistical analyses of high-dimensional genomic data, gene screening and prediction, in the framework of developing and validating genomic biomarkers or signatures, with biomarker-based clinical trial designs used to assess the clinical utility of a biomarker or of a new treatment with a companion biomarker.

The paper “Multiple suboptimal solutions for prediction rules in gene expression data,” by O. Komori et al., reanalyzes gene expression data for pattern recognition of phenotypes in order to extract informative features embedded in the data. It concludes that it is not possible to extract only the informative genes with high performance from all the observed genes, and points out that the mutual coherence among genes is too high to identify a solution consisting of informative genes.

The paper “An empirical Bayes optimal discovery procedure based on semiparametric hierarchical mixture models,” by H. Noma and S. Matsui, proposes an empirical Bayes optimal discovery procedure that uses the smoothing-by-roughening approach in semiparametric hierarchical mixture models. The approach yields an analytically tractable posterior distribution while allowing a flexible prior in genome-wide studies.

The paper “Cancer outlier analysis based on mixture modeling of gene expression data,” by K. Mori et al., proposes a gene-based statistic for gene selection built on the posterior probability that each cancer sample is a cancer outlier, with an application to real data from hematologic malignancies.

The paper “A robust rerank approach for feature selection and its application to pooling-based GWA studies,” by J. Liu et al., proposes a method based on the concept of rank-over-variable. Simulation studies and a real-data analysis of a pooling-based genome-wide association (GWA) study show that the method is less sensitive to the selection of tuning parameters than t-statistics, AUC statistics, and SAM.

Shinto Eguchi, Shigeyuki Matsui, Su-Yun Huang, Chuhsing Kate Hsiao

Articles published on Outlier Probability

Application of the Variable Precision Rough Sets Model to Estimate the Outlier Probability of Each Element

Testing the Presence of Outliers to Assess Misspecification in Regression Models

Large-deviation probabilities for correlated Gaussian processes and intermittent dynamical systems.

Robust AOA based acoustic source localization method with unreliable measurements

Improving Lithium-Ion Battery Pack Diagnostics by Optimizing the Internal Allocation of Demand Current for Parameter Identifiability

Terrestrial Mollusks as Chronological Records in Brazilian Shellmounds

External analysis and moving window LoOP based monitoring for a class of multi-mode time-varying process

An incremental and approximate local outlier probability algorithm for intrusion detection and its evaluation

Outlier detection and robust regression for correlated data

FAST PROBABILISTIC FUSION OF 3D POINT CLOUDS VIA OCCUPANCY GRIDS FOR SCENE CLASSIFICATION

VFA: A Variable-Factor Authentication Framework for Mobile Users

Neighborhood based global coordination for multimode process monitoring

Adaptive Local Outlier Probability for Dynamic Process Monitoring

Novel Outlier-Resistant Extended Kalman Filter for Robust Online Structural Identification

Continuous Outlier Monitoring on Uncertain Data Streams

ON THE CONSTRUCTION OF A NEW STELLAR CLASSIFICATION TEMPLATE LIBRARY FOR THE LAMOST SPECTRAL ANALYSIS PIPELINE

A generalised approach to outlier identification in pavement condition data

Cancer Outlier Analysis Based on Mixture Modeling of Gene Expression Data

Statistical Analysis of Biomarkers for Personalized Medicine

