A critical aspect of dimensionality reduction is to properly assess the quality of selected (or produced) feature subsets. In machine learning, feature subset assessment typically means splitting the data, restricted to a given feature subset, into a training set used to estimate the parameters of a classification model and a test set used to estimate the model's predictive performance. The results of multiple splits are then averaged (i.e., Cross-Validation, CV) to reduce the variance of the estimator. In practice, however, the CV scheme is computationally expensive. In this paper, we propose a new statistical index, called the LW-index, for evaluating feature subsets and dimensionality reduction algorithms in general. The proposed method is a “classical statistics” approach that computes an empirical estimate of the quality of a feature subset directly from the data restricted to that subset. A large number of comparisons with the machine learning approach, conducted on fourteen benchmark collections, show that the LW-index is highly correlated with the external indices (i.e., MacroF1, MicroF1) of an SVM and a Centroid-Based Classifier (CBC) trained under a five-fold CV scheme. Furthermore, the experimental results indicate that the LW-index matches the traditional CV scheme for evaluating dimensionality reduction algorithms while being more computationally efficient. One contribution of this paper is therefore an alternative methodology, based on an internal index typically used in the unsupervised learning context, that is computationally cheaper than the traditional CV methodology. Another contribution is a new internal index that behaves better than other similar indices widely used in clustering and shows a high correlation with the results obtained by the traditional methodology.
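The abstract does not give the LW-index formula, but the traditional CV-based feature subset evaluation it compares against can be sketched as follows. This is a minimal illustration under assumed settings (the dataset, the selected feature indices, and the classifier hyperparameters are hypothetical, not the paper's exact setup); it reports the external indices mentioned above (MacroF1, MicroF1) for an SVM and a centroid-based classifier under five-fold CV.

```python
# Sketch of the traditional five-fold CV evaluation of a feature subset.
# Dataset and feature subset are illustrative assumptions only.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import LinearSVC
from sklearn.neighbors import NearestCentroid
from sklearn.metrics import f1_score

X, y = load_digits(return_X_y=True)
feature_subset = np.arange(32)        # hypothetical selected feature indices
X_sub = X[:, feature_subset]          # data restricted to the feature subset

def cv_f1(clf, X, y, n_splits=5):
    """Average MacroF1 / MicroF1 over a stratified five-fold CV split."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    macro, micro = [], []
    for train_idx, test_idx in skf.split(X, y):
        clf.fit(X[train_idx], y[train_idx])      # estimate model parameters
        pred = clf.predict(X[test_idx])          # estimate predictive performance
        macro.append(f1_score(y[test_idx], pred, average="macro"))
        micro.append(f1_score(y[test_idx], pred, average="micro"))
    return np.mean(macro), np.mean(micro)

for name, clf in [("SVM", LinearSVC(max_iter=5000)),
                  ("CBC", NearestCentroid())]:
    macro_f1, micro_f1 = cv_f1(clf, X_sub, y)
    print(f"{name}: MacroF1={macro_f1:.3f}  MicroF1={micro_f1:.3f}")
```

The cost the paper targets is visible here: each candidate feature subset requires refitting every classifier k times, whereas an internal index such as the LW-index is computed once from the subset itself.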