Cluster-based Classification Research Articles

The environmental purpose is to characterize the watersheds in a region by their vulnerability and resiliency to present and potential degradation of water quality from human impact, based on available spatial information and multidisciplinary expertise. Available information is of six general types: (1) physical and topographic conformation, (2) soil factors, (3) climatic factors, (4) hydrologic characteristics, (5) land-cover/land-use, and (6) prior records of sampling at selected locations for water quality and biological indicators. The strategy is first to develop cluster-based classes of watersheds that are expected to respond similarly to anthropogenic stressors, without using indicators of landscape condition that are directly influenced by local human activity. Watersheds in these classes can then be analyzed for degree of human influence as indicated by land-cover/land-use demographics. Sparser data on water quality and biological indicators at stream sampling locations provide a basis for determining the degradation response to human-induced stressors in each class, along with the potential for remediation. The focus of this paper is on the first task, cluster-based classification. Statistical adaptation comes in combining the empirical objectivity of clustering with interdisciplinary environmental expertise, such that the trajectory of investigation arises from team expertise while the formulation is shaped statistically. Expertise enters initially in recognizing subsets of the available descriptors that characterize different aspects of the watershed context and that need to be explored separately rather than being completely confounded. Reduction of redundancy among available descriptors and removal of outliers are preliminary concerns. Clustering then proceeds through a series of phases using the sets of variables individually and in selected combinations. Contingency of composite clustering relative to separately clustered sets is examined via special cross tabulations in order to elucidate interactions between sets of variables. The spatial nature of the investigation contributes the major contextual capability for exercising team expertise: visualization using geographic information systems (GIS) enhances and integrates insights from clustering, particularly with regard to the spatial distribution of cluster membership.
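The cross tabulation of a composite clustering against a separately clustered variable set can be illustrated with a minimal sketch. The abstract does not specify the "special cross tabulations" used, so this shows only a plain contingency table; the function name `cross_tabulate` and the cluster labels are hypothetical and are not data from the paper.

```python
from collections import Counter


def cross_tabulate(labels_a, labels_b):
    """Count items falling in each (cluster in A, cluster in B) pair."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same watersheds")
    counts = Counter(zip(labels_a, labels_b))
    rows = sorted(set(labels_a))
    cols = sorted(set(labels_b))
    # Rows follow clustering A, columns follow clustering B.
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


# Hypothetical cluster labels for eight watersheds (illustrative only).
soil_clusters = [0, 0, 1, 1, 1, 2, 2, 0]       # clustering on soil factors alone
composite_clusters = [0, 0, 0, 1, 1, 2, 2, 2]  # clustering on combined variable sets

rows, cols, table = cross_tabulate(soil_clusters, composite_clusters)
for r, row in zip(rows, table):
    print(f"soil cluster {r}: {row}")
```

Counts concentrated in one cell per row suggest that the composite clustering largely follows the single variable set; counts spread across a row suggest that other variable sets are reshaping the classes.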

Constantly improving gene expression profiling technologies are expected to provide understanding and insight into cancer-related cellular processes. Gene expression data is also expected to significantly aid in the development of efficient cancer diagnosis and classification platforms. In this work we examine three sets of gene expression data measured across sets of tumor and normal clinical samples. The first set consists of 2,000 genes, measured in 62 epithelial colon samples (Alon et al., 1999). The second consists of approximately 100,000 clones, measured in 32 ovarian samples (unpublished extension of the data set described in Schummer et al., 1999). The third set consists of approximately 7,100 genes, measured in 72 bone marrow and peripheral blood samples (Golub et al., 1999). We examine the use of scoring methods that measure the separation of tissue types (e.g., tumors from normals) using individual gene expression levels. These are then coupled with high-dimensional classification methods to assess the classification power of complete expression profiles. We present results of leave-one-out cross validation (LOOCV) experiments on the three data sets, employing a nearest neighbor classifier, SVM (Cortes and Vapnik, 1995), AdaBoost (Freund and Schapire, 1997), and a novel clustering-based classification technique. As tumor samples can differ from normal samples in their cell-type composition, we also perform LOOCV experiments using appropriately modified sets of genes, attempting to eliminate the resulting bias. We demonstrate a success rate of at least 90% in tumor versus normal classification, using sets of selected genes, with as well as without cellular-contamination-related members. These results are insensitive to the exact selection mechanism, over a certain range.
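As a rough illustration of the LOOCV procedure mentioned above, the sketch below holds out each sample in turn and classifies it with a one-nearest-neighbor rule under Euclidean distance. The distance measure, the function names, and the toy expression values are assumptions made for illustration; the abstract does not state which distance the authors used, and the data here are not from any of the three data sets.

```python
import math


def nearest_neighbor_label(train_x, train_y, query):
    """Return the label of the training sample closest to the query."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best]


def loocv_accuracy(samples, labels):
    """Hold out each sample once, classify it from the rest, report accuracy."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_neighbor_label(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


# Toy expression profiles over two hypothetical genes (illustrative only).
samples = [
    [1.0, 0.9], [1.1, 1.0], [0.9, 1.2],   # labeled "normal"
    [3.0, 3.1], [3.2, 2.9], [2.9, 3.0],   # labeled "tumor"
]
labels = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]

accuracy = loocv_accuracy(samples, labels)
print(f"LOOCV accuracy: {accuracy:.2f}")
```

With real expression data, any gene selection step would need to be repeated inside each held-out fold so that the held-out sample does not influence which genes are chosen.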

Articles published on Cluster-based Classification

Recognition of Hand Written Kannada Numerals Using K-Medoids

MULTICLASS CLASSIFICATION BASED ON META PROBABILITY CODES

Robust formulations for clustering-based large-scale classification

Operative assessment of predicted generalization errors on non-stationary distributions in data-intensive applications

A linguistic approach to classification of bacterial genomes

On-line evolving image classifiers and their application to surface inspection

A fuzzy decision tree method for fault classification in the steam generator of a pressurized water reactor

Cluster-based classification using self-organising maps for medical image databases

Development and validation of a cluster-based classification system to facilitate treatment tailoring

Consensus Clustering of Gene Expression Data and its Application to Gene Function Prediction

Squeezing the last drop: Cluster-based classification algorithm

An incremental cluster-based approach to spam filtering

Contextual clustering for configuring collaborative conservation of watersheds in the Mid-Atlantic Highlands

Tissue classification with gene expression profiles.

SOORLS: A SOFTWARE REUSE APPROACH ON THE WEB

Coarse classification of Chinese characters via stroke clustering method

An examination of cluster-based classification schemes for DUI offenders.
