Implementing Clustering with Weka and R

Parteek Bhatia

doi:10.1017/9781108635592.009

Abstract

Chapter Objectives

✓ To apply the K-means algorithm in Weka and the R language
✓ To interpret the results of clustering
✓ To identify the optimum number of clusters
✓ To apply classification on unlabeled data by using clustering as an intermediate step

Introduction

As discussed earlier, if data is not labeled, we can analyze it by performing a clustering analysis, where clustering refers to the task of grouping a set of objects into classes of similar objects. In this chapter, we will apply clustering to Fisher's Iris dataset. We will use clustering algorithms to group flower samples into clusters with similar flower dimensions. These clusters then become possible ways to group flower samples into species. We will implement a simple k-means algorithm to cluster numerical attributes with the help of Weka and R.

In the case of classification, we know both the attributes and the classes of the instances. For example, the flower dimensions and classes were already known to us for the Iris dataset, and our goal was to predict the class of an unknown sample, as shown in Figure 8.1. Earlier, we used the Weka J48 classification algorithm to build a decision tree on Fisher's Iris dataset from samples with known classes, which helped in predicting the class of unknown samples. The attributes used were the flower's Sepal length and width and its Petal length and width. Based on these flower dimensions and using this tree, we can identify an unknown Iris as one of three species: Setosa, Versicolor, or Virginica.

In clustering, we know the attributes of the instances, but we don't know the classes. For example, we know the flower dimensions for samples of the Iris dataset, but we don't know what classes exist, as shown in Figure 8.2. Therefore, our goal is to group instances into clusters with similar attributes or dimensions and then identify the class.
In this chapter, we will learn what happens if we don't know what classes the samples belong to, how many classes there are, or even what defines a class. Since Fisher's Iris dataset is already labeled, we will first make it unlabeled by removing the class attribute, i.e., the species column. Then we will apply clustering algorithms to cluster this data on the basis of its input attributes, i.e., Sepal length, Sepal width, Petal length, and Petal width.
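
The two steps just described, dropping the species column and then clustering on the four remaining attributes, can be illustrated with a minimal sketch. It is written in plain Python rather than the Weka or R used in the chapter, and it is not the chapter's own implementation. The helper names `strip_class` and `kmeans`, the seed, and the choice of nine sample rows from Fisher's Iris data are assumptions made for illustration only.

```python
import math
import random

# A few rows of Fisher's Iris data:
# (sepal length, sepal width, petal length, petal width, species)
LABELED = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (4.7, 3.2, 1.3, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.4, 3.2, 4.5, 1.5, "versicolor"),
    (6.9, 3.1, 4.9, 1.5, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
    (5.8, 2.7, 5.1, 1.9, "virginica"),
    (7.1, 3.0, 5.9, 2.1, "virginica"),
]


def strip_class(rows):
    """Drop the last column (the species) so the data becomes unlabeled."""
    return [row[:-1] for row in rows]


def kmeans(points, k, seed=0, max_iter=100):
    """Basic k-means (hypothetical helper): returns (assignment, centroids)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct samples picked at random.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return assignment, centroids


unlabeled = strip_class(LABELED)
assignment, centroids = kmeans(unlabeled, k=3)
for row, cluster in zip(unlabeled, assignment):
    print(cluster, row)
```

The cluster numbers printed are arbitrary identifiers, not species names, and the grouping depends on the random initial centroids, so a different seed may give a different result. Comparing the clusters with the withheld species column is one way to interpret them afterwards.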
