Chapter 4 - Unsupervised machine learning: clustering algorithms
- # Local Outlier Factor
- # Isolation Forest
- # Density-based Spatial Clustering Of Applications With Noise
- # K-means Clustering
- # Unsupervised Machine Learning Algorithms
- # Unsupervised Machine Learning
- # Outlier Detection Techniques
- # Elbow Point
- # Scikit-learn Library
- Research Article
- 10.64534/commer.2025.511
- Sep 30, 2025
- Pakistan Journal of Commerce and Social Sciences
The rapid integration of cryptocurrencies into the global financial ecosystem has introduced unprecedented challenges in market surveillance, risk management, and anomaly detection. While conventional statistical models such as ARIMA (Autoregressive Integrated Moving Average) and GARCH (Generalized Autoregressive Conditional Heteroscedasticity) have been widely used for anomaly detection, their reliance on assumptions of normality and stationarity often fails to capture the complexities of high-frequency, non-linear cryptocurrency trading. Furthermore, traditional risk metrics, including down-to-up volatility, negative conditional skewness, and relative frequency, may overlook short-term anomalies due to data aggregation limitations. To address these issues, this paper proposes a machine-learning approach for detecting anomalies in cryptocurrency markets, implemented in Jupyter Notebook. We compare four advanced unsupervised machine learning models, i.e., Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Isolation Forest (iForest), One-Class Support Vector Machine (OC-SVM), and Local Outlier Factor (LOF), for anomaly detection using Monte Carlo simulations. The findings indicate that DBSCAN has the highest precision (79.7%) with the fewest false positives, making it ideal for supervisory monitoring, whereas the high false positive rates of OC-SVM and Isolation Forest limit their use. Using data on six well-known cryptocurrencies at three temporal resolutions (daily, hourly, and 15-minute), the performance of the four unsupervised learning techniques was also examined, confirming that the anomalies identified by DBSCAN are consistent with those of the other three methods. Additionally, as a robustness check, we use UpSet plots to visualize the anomalies shared across the unsupervised learning methods.
The number of anomalies also depends on a cryptocurrency's volatility and sampling interval: more volatile assets and higher-frequency data produce more anomalies. The study presents a sound methodological approach for facilitating financial monitoring and mitigating risks in the cryptocurrency market, and provides useful information for market players, analysts, and policymakers. These results emphasize the importance of choosing algorithms based on specific surveillance targets to promote greater stability in digital asset environments.
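As a minimal illustration of the four detectors compared above (a sketch on synthetic return data, not the paper's actual Monte Carlo setup; all parameter values here are assumptions), scikit-learn exposes each model behind a common `fit_predict` interface where `-1` marks an anomaly:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)
# Simulated log-returns: mostly calm, with a few injected volatility shocks
returns = rng.normal(0, 0.01, size=(500, 1))
returns[::100] = rng.normal(0, 0.15, size=(5, 1))  # anomalous spikes

X = StandardScaler().fit_transform(returns)

# All four detectors share the convention that fit_predict returns -1 for anomalies
flags = {
    "DBSCAN": DBSCAN(eps=0.5, min_samples=5).fit_predict(X) == -1,
    "iForest": IsolationForest(contamination=0.02, random_state=0).fit_predict(X) == -1,
    "LOF": LocalOutlierFactor(n_neighbors=20, contamination=0.02).fit_predict(X) == -1,
    "OC-SVM": OneClassSVM(nu=0.02).fit_predict(X) == -1,
}
for name, mask in flags.items():
    print(f"{name}: {int(mask.sum())} anomalies flagged")
```

With a setup like this, intersecting the boolean masks is exactly the kind of shared-anomaly comparison an UpSet plot visualizes.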
- Book Chapter
20
- 10.1007/978-981-15-5285-4_12
- Jul 26, 2020
Credit card fraud is a socially relevant problem that raises major ethical issues and poses a great threat to businesses all around the world. To detect fraudulent transactions made by wrongdoers, machine learning algorithms are applied. The purpose of this paper is to identify the best-suited algorithm for accurately finding fraud or outliers using supervised and unsupervised machine learning algorithms. The challenge lies in identifying and understanding them accurately. In this paper, an outlier detection approach is put forward to resolve this issue using supervised and unsupervised machine learning algorithms. The effectiveness of four different algorithms, namely local outlier factor, isolation forest, support vector machine, and logistic regression, is measured by obtaining scores of evaluation metrics such as accuracy, precision, recall, F1-score, support, and the confusion matrix, along with three different averages: micro, macro, and weighted. The implementation of the local outlier factor provides an accuracy of 99.7% and the isolation forest provides an accuracy of 99.6% under supervised learning. Similarly, in unsupervised learning, the support vector machine provides an accuracy of 97.2% and logistic regression provides an accuracy of 99.8%. Based on the experimental analysis, both algorithms used in unsupervised machine learning achieve high accuracy. An overall good, as well as balanced, performance is achieved in the evaluation-metric scores of unsupervised learning. Hence, it is concluded that the implementation of unsupervised machine learning algorithms is relatively more suitable for practical applications of fraud and spam identification.
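The evaluation metrics listed above can be reproduced on a toy confusion setup (hypothetical labels, not the paper's dataset); note how the micro, macro, and weighted averages weight the rare fraud class differently:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

# Hypothetical labels: 95 genuine (0) and 5 fraudulent (1) transactions
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 93 + [1] * 2 + [1] * 4 + [0] * 1   # 2 false positives, 1 false negative

acc = accuracy_score(y_true, y_pred)
print("accuracy:", acc)
print("confusion matrix:\n", confusion_matrix(y_true, y_pred))

# Micro pools all decisions; macro averages per-class scores equally;
# weighted averages per-class scores by class support
for avg in ("micro", "macro", "weighted"):
    p = precision_score(y_true, y_pred, average=avg)
    r = recall_score(y_true, y_pred, average=avg)
    f = f1_score(y_true, y_pred, average=avg)
    print(f"{avg}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

On imbalanced fraud data the macro average drops well below accuracy, which is why a single accuracy figure can be misleading.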
- Book Chapter
2
- 10.1007/978-981-15-9492-2_13
- Jan 1, 2021
Structural Health Monitoring (SHM) has become an area of continuous research with the ever-increasing demand for the safety of civil structures. The damage in civil structures can be detected using multimodal data from sensors, which presents instances of both damaged and undamaged data. The availability of damaged data in real life is difficult to obtain from a healthy structure and hence the problem of damage detection needs to be attempted using normal healthy data, and it becomes synonymous with the anomaly or novelty detection. One-Class classifiers work on the principle that the abundance of healthy data can be used to model an envelope of conditions which, if violated by any data instance, can be termed as damage or outlier detection. We have attempted an array of classifiers on a benchmark structure dataset (IASC-ASCE) both from supervised and unsupervised machine learning domain and propose a comparison between their success rates in determining damage in civil structures. We used classical techniques such as One-Class Support Vector Machines (OC-SVM), One-Class Isolation Forest (OC-IF), One-Class K-means clustering (OC-KMC), One-Class K-nearest neighbors (OC-KNN), Density-Based Spatial Clustering of Applications with Noise (DBSCAN), One-Class Principal Component Analysis (OC-PCA), Local Outlier Factor (LOF) and One-Class Gaussian Distribution (OC-GD). These techniques were tested on the IASC (International Association for Structural Control)–ASCE (American Society of Civil Engineers) SHM benchmark problem for a range of noise levels and range of force intensities to cover wide variations in the generated dataset using MATLAB based simulation. 
Our study helps us conclude that OC-SVM, Isolation Forest, and OC-PCA are the most robust algorithms for the anomaly detection task.
Keywords: Anomaly detection; Machine learning; Structural health monitoring; Support vector machines; Isolation forest; K-means clustering; K-nearest neighbors; Density-based spatial clustering of applications with noise; Principal component analysis; Local outlier factor; Gaussian distribution
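The one-class principle described above (train only on healthy data, flag anything outside the learned envelope) can be sketched with scikit-learn's `OneClassSVM`; the sensor features and the shift representing damage are invented for illustration:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Hypothetical baseline sensor features from a healthy structure
healthy = rng.normal(0, 1, size=(300, 2))
# Hypothetical damaged-state response, shifted away from the baseline
damaged = rng.normal(5, 1, size=(20, 2))

# Learn an envelope from healthy data only; nu bounds the training-outlier fraction
clf = OneClassSVM(nu=0.05, gamma="scale").fit(healthy)

pred = clf.predict(damaged)   # -1 = outside the healthy envelope = damage/outlier
print("flagged as damage:", int((pred == -1).sum()), "of", len(damaged))
```

The same pattern applies to the other one-class models in the abstract (OC-IF, OC-KNN, OC-GD, ...): fit on healthy data, then threshold a score on new data.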
- Preprint Article
- 10.5194/egusphere-egu25-19050
- Mar 15, 2025
In a mountainous watershed, there are many confluences at which two or more streams join. Due to inaccessible terrain and the associated costs, river discharge data are collected at only a few confluences. It is, therefore, important to assess which confluence is critical. By critical, we mean the junction that would create maximum fragmentation in a river network. In this study, we analysed river networks with uneven topography in the Alaknanda River basin, which is vulnerable and prone to geo-hydro hazards. We applied Unsupervised Machine Learning (UML) algorithms such as Isolation Forest and Density-Based Spatial Clustering of Applications with Noise (DBSCAN), along with Linear Integer Programming (LIP), to identify critical confluence locations. We compare our results with well-established graph-based centrality metrics (degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality). Our results suggest that DBSCAN outperformed the other approaches in detecting crucial nodes. We obtained better results using LIP than the other techniques, except DBSCAN. The outcome of this study will help the Central Water Commission in deciding which confluence to focus on and in assessing the locations of new gauges.
Keywords: Critical nodes; Alaknanda Basin; Machine Learning; Hazards
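Two of the centrality metrics mentioned above can be computed from the adjacency matrix alone; this NumPy sketch uses a toy confluence network with hypothetical node IDs (the node whose removal fragments the network most is the high-degree junction):

```python
import numpy as np

# Toy confluence network (hypothetical IDs): three branches join at C, then flow via E to F
nodes = ["A", "B", "D", "C", "E", "F"]
edges = [("A", "C"), ("B", "C"), ("D", "C"), ("C", "E"), ("E", "F")]
idx = {n: i for i, n in enumerate(nodes)}
A = np.zeros((6, 6))
for u, v in edges:
    A[idx[u], idx[v]] = A[idx[v], idx[u]] = 1.0

# Degree centrality: fraction of other nodes each node touches
degree = A.sum(axis=0) / (len(nodes) - 1)

# Eigenvector centrality by power iteration (shift by I so iteration converges on a tree)
x = np.ones(6)
for _ in range(200):
    x = (A + np.eye(6)) @ x
    x /= np.linalg.norm(x)

print("degree hub:", nodes[int(degree.argmax())])
print("eigenvector hub:", nodes[int(x.argmax())])
```

Both metrics single out the junction C, matching the intuition that the confluence collecting the most branches is the critical node.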
- Conference Article
8
- 10.1109/csnet50428.2020.9265466
- Oct 21, 2020
Nowadays, complex attacks like Advanced Persistent Threats (APTs) often use tunneling techniques to avoid detection by security systems such as Intrusion Detection Systems (IDS), Security Information and Event Management systems (SIEMs), or firewalls. Companies try to identify these APTs by defining rules on their intrusion detection systems, but it is a hard task that requires a lot of time and effort. In this study, we compare the performance of four unsupervised machine-learning algorithms: K-means, Gaussian Mixture Model (GMM), Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Local Outlier Factor (LOF) on the Boss of the SOC Version 1 (Botsv1) dataset of the Splunk project to detect malicious DNS traffic. We then propose an approach that combines DBSCAN and K-Nearest Neighbors (KNN) to achieve a 100% detection rate with a false-positive rate between 1.6% and 2.3%. A simple post-analysis, consisting of ranking the IP addresses by number of requests or volume of bytes sent, determines the infected machines.
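One plausible reading of the DBSCAN-plus-KNN combination can be sketched as follows (toy two-feature traffic profiles; the features, eps, and k are assumptions, not the paper's configuration): DBSCAN learns the benign clusters, and the distance to the k nearest benign points then scores new traffic.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
# Toy DNS features (hypothetical): e.g. query rate and payload entropy, two benign modes
benign = np.vstack([rng.normal([2, 2], 0.2, (100, 2)),
                    rng.normal([6, 6], 0.2, (100, 2))])
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(benign)

# Fit KNN on DBSCAN's cluster labels (noise dropped), then score unseen traffic
core = labels != -1
knn = KNeighborsClassifier(n_neighbors=5).fit(benign[core], labels[core])

new = np.array([[2.1, 1.9],     # resembles a benign mode
                [10.0, 0.5]])   # tunnel-like, far from every benign mode
dist, _ = knn.kneighbors(new)
print("nearest cluster:", knn.predict(new))
print("mean distance to 5 nearest benign points:", dist.mean(axis=1))
```

A large mean neighbor distance marks the second point as suspicious even though KNN still assigns it to some cluster, which is why the distance, not the label, carries the detection signal.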
- Research Article
9
- 10.1016/j.sciaf.2024.e02386
- Sep 19, 2024
- Scientific African
Anomaly detection using unsupervised machine learning algorithms: A simulation study
- Research Article
4
- 10.1144/geochem2024-009
- Aug 26, 2024
- Geochemistry: Exploration, Environment, Analysis
This paper compares three unsupervised machine-learning algorithms – local outlier factor (LOF), Isolation Forest (iForest) and one-class support vector machine (OCSVM) – for anomaly detection in a multivariate geochemical dataset in northeastern Iran. This area contains several Au, Cu and Pb–Zn mineral occurrences. The methodology incorporates single-element geochemistry, multivariate data analysis and application of the three unsupervised machine-learning algorithms. Principal component analysis unveiled diverse elemental associations for the first seven principal components (PCs): PC1 shows a Co–Cr–Ni–V–Sn association indicating a lithological influence; PC2 shows a Au–Bi–Cu–W association suggesting epithermal Au mineralization; PC3 shows variability in Zn–V–Co–Sb–Cu–Cr; PC4 shows a Au–Cu–Ba–Sr–Ag association indicating Au and polymetallic mineralization; PC5 reflects Zn–Ag–Ni–Pb related to hydrothermal mineralization; and PC6 and PC7 show element associations suggesting epithermal and intrusive-related polymetallic mineralization. It was found that OCSVM performed slightly better than LOF and iForest in detecting anomalies associated with known Cu occurrences, and it successfully delineated dispersion from all known Au occurrences. LOF outperformed iForest and OCSVM in identifying all four Pb–Zn occurrences, and the three methods substantially limited the areas of the anomaly class. The analysis showed that LOF produced a less cluttered anomaly map compared to the isolated patterns in the iForest map. LOF was accurate in identifying anomalies associated with Au–Pb mineralization, while iForest detected anomalies associated with Pb–Zn–Cu occurrences and a neighbouring Pb–Zn occurrence. OCSVM performed similarly in the northern and western areas but displayed unique discrepancies in the SE and west by detecting anomalies associated with two Cu occurrences and a Pb–Cu occurrence.
This study examined the influence of contamination fraction on detection of geochemical anomalies, revealing a noteworthy rise in the count of mineral occurrences delineated by anomalies when the contamination fraction increases from 5 to 10%. However, even with a 35% contamination fraction, some Cu occurrences remained outside the anomaly category, indicating potentially overlooked geochemical signals from mineral occurrences due to sampling schemes.
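The contamination-fraction effect discussed above is easy to see in scikit-learn, where `contamination` directly sets the score threshold and hence the share of samples flagged (random stand-in data here, not the geochemical survey):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)
# Stand-in for multi-element geochemical samples (3 hypothetical element features)
X = rng.normal(size=(400, 3))

counts = []
for c in (0.05, 0.10, 0.35):
    # contamination sets the quantile of anomaly scores used as the decision threshold
    flagged = IsolationForest(contamination=c, random_state=0).fit_predict(X) == -1
    counts.append(int(flagged.sum()))
    print(f"contamination={c:.2f}: {counts[-1]} of {len(X)} samples flagged")
```

Raising the fraction from 5% to 35% widens the anomaly class roughly proportionally, which mirrors the paper's observation that more mineral occurrences fall inside the anomaly class as the fraction grows.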
- Research Article
- 10.3390/app15052621
- Feb 28, 2025
- Applied Sciences
Clustering algorithms are widely used in statistical data analysis as a form of unsupervised machine learning, playing a crucial role in big data mining research for Maritime Intelligent Transportation Systems. Numerous studies have explored methods for optimizing ship trajectory clustering, such as narrowing dynamic time windows to prevent errors in time-warp calculations or employing the Mahalanobis distance; these methods enhance DBSCAN (Density-Based Spatial Clustering of Applications with Noise) by leveraging trajectory-similarity features for clustering. In recent years, machine learning research has rapidly accumulated, and multiple studies have shown that HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) outperforms DBSCAN in achieving accurate and efficient clustering results due to its hierarchical density-based clustering technique, particularly in big data mining. This study focuses on the area near Taichung Port in central Taiwan, a crucial maritime shipping route where ship trajectories naturally exhibit a complex and intertwined distribution. Using ship coordinates and heading, the experiment normalized and transformed them into three-dimensional spatial features, employing the HDBSCAN algorithm to obtain optimal clustering results. These results provided a more nuanced analysis compared to human visual observation. This study also used O notation and execution time to represent the performance of various methods, with the literature review indicating that HDBSCAN has the same time complexity as DBSCAN but outperforms K-means and other methods. This research involved approximately 293,000 real historical data points and further employed the Silhouette Coefficient and Davies–Bouldin Index to objectively analyze the clustering results.
The experiment generated eight clusters with a noise ratio of 12.7%, and the evaluation results consistently demonstrate that HDBSCAN outperforms other methods for big data analysis of ship trajectory clustering.
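The two validation indices used above can be sketched on toy normalized (x, y, heading) features; plain DBSCAN is used here for portability (`sklearn.cluster.HDBSCAN`, available in scikit-learn 1.3+, or the `hdbscan` package would be drop-in alternatives), and the lane positions are invented:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score, davies_bouldin_score

rng = np.random.default_rng(3)
# Toy normalized (x, y, heading) features around two traffic lanes (hypothetical values)
X = np.vstack([rng.normal([0.2, 0.2, 0.1], 0.02, (200, 3)),
               rng.normal([0.8, 0.8, 0.9], 0.02, (200, 3))])
labels = DBSCAN(eps=0.1, min_samples=10).fit_predict(X)

noise_ratio = (labels == -1).mean()
mask = labels != -1                      # score only clustered points; noise has no cluster
sil = silhouette_score(X[mask], labels[mask])          # higher is better, max 1
dbi = davies_bouldin_score(X[mask], labels[mask])      # lower is better, min 0
print(f"clusters={labels.max() + 1}, noise={noise_ratio:.1%}, "
      f"silhouette={sil:.3f}, davies-bouldin={dbi:.3f}")
```

The silhouette rewards tight, well-separated lanes while the Davies–Bouldin index penalizes cluster overlap, so reporting both, as the study does, guards against either metric's blind spots.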
- Research Article
9
- 10.3390/su142013328
- Oct 17, 2022
- Sustainability
To reduce the operating cost and running time of demand-responsive transit between urban and rural areas, a DBSCAN K-means (DK-means) clustering algorithm, based on the density-based spatial clustering of applications with noise (DBSCAN) and K-means clustering algorithms, was proposed to pre-process passenger reservation demand by clustering and to optimize stations, and to design a new variable-route demand-responsive transit service system that can promote urban–rural integration. Firstly, after pre-processing the reservation demand with the DBSCAN clustering algorithm, the K-means clustering algorithm was used to divide fixed sites and alternative sites. Then, a bus scheduling model was established, and a genetic simulated annealing algorithm was proposed to solve it. Finally, the feasibility of the model was validated in the northern area of Yongcheng City, Henan Province, China. The results show that the optimized bus scheduling reduced the operating cost and running time by 9.5% and 9.0%, respectively, compared with those of the regional flexible bus, and by 4.5% and 5.1%, respectively, compared with those of the variable-route demand-responsive transit after K-means clustering for passenger pre-processing.
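The two-stage DK-means idea (DBSCAN first to drop sparse reservation noise, then K-means to place stations) can be sketched as below; the demand pockets, parameters, and station count are all invented for illustration:

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

rng = np.random.default_rng(5)
# Hypothetical reservation coordinates: three demand pockets plus scattered noise
demand = np.vstack([rng.normal(c, 0.05, (60, 2))
                    for c in ([0, 0], [1, 0], [0, 1])])
noise = rng.uniform(-0.5, 1.5, (15, 2))
pts = np.vstack([demand, noise])

# Stage 1: DBSCAN discards sparse, isolated reservations (label -1 = noise)
db = DBSCAN(eps=0.15, min_samples=5).fit_predict(pts)
kept = pts[db != -1]

# Stage 2: K-means on the denoised demand places the candidate stations
stations = KMeans(n_clusters=3, n_init=10, random_state=0).fit(kept).cluster_centers_
print(stations.round(2))
```

Denoising first keeps stray requests from dragging the K-means centers away from the real demand pockets, which is the point of combining the two algorithms.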
- Research Article
- 10.26740/jinacs.v6n02.p532-540
- Jul 18, 2024
- Journal of Informatics and Computer Science (JINACS)
Each semester, the university administers a questionnaire assessing lecturer performance. Lecturer performance evaluation at Universitas Negeri Surabaya is an important process for ensuring that lecturers have fulfilled their duties and responsibilities in delivering quality education to their students. This study uses 22 question instruments on a Likert scale, completed by students to assess lecturer performance. Data on 1,055 lecturers were processed to detect whether lecturer performance conformed to the Semester Learning Plan (Rancangan Pembelajaran Semester, RPS) or whether some lecturers taught out of line with the RPS. Anomaly detection methods were therefore applied to identify lecturer performance that deviates from the usual pattern. For this, the Local Outlier Factor (LOF) and Isolation Forest (IF) algorithms were used because they handle large datasets efficiently and work quickly in the feature space. Since the data were unlabeled, k-means clustering was used to obtain labels for evaluating LOF and IF. K-means produced three clusters: cluster 0 with 279 data points, cluster 1 with 597 data points, and cluster 2 with 179 data points. These cluster results were used to obtain label values for LOF and IF in the comparative evaluation. The LOF algorithm detected 19 lecturers as anomalies and the IF algorithm detected 22. The comparison was evaluated using the Rand index score and the silhouette score. The Rand index was 0.438 for LOF and 0.441 for IF, and the silhouette score was 0.0019 for LOF and 0.0377 for IF. Keywords: lecturer performance, LOF, IF, Rand index, silhouette score
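The evaluation scheme described above (k-means pseudo-labels compared against LOF and IF flags via Rand index and silhouette score) can be sketched on synthetic stand-in data; the 22 Likert features, cluster count, and contamination level are assumptions, not the study's values:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import rand_score, silhouette_score

rng = np.random.default_rng(4)
# Stand-in for per-lecturer scores on 22 Likert-scale items (hypothetical)
X = rng.normal(4.0, 0.4, (300, 22))
X[:6] = rng.normal(2.0, 0.3, (6, 22))   # a few atypical response profiles

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.02).fit_predict(X)
iso = IsolationForest(contamination=0.02, random_state=0).fit_predict(X)

# Agreement between the detectors' -1/1 flags and the k-means pseudo-labels
ri_lof, ri_iso = rand_score(km, lof), rand_score(km, iso)
sil_lof, sil_iso = silhouette_score(X, lof), silhouette_score(X, iso)
print(f"Rand index: LOF={ri_lof:.3f} IF={ri_iso:.3f}")
print(f"Silhouette: LOF={sil_lof:.4f} IF={sil_iso:.4f}")
```

As in the study, the silhouette here scores how well the binary anomaly/normal partition separates the data, which is typically close to zero when anomalies are few.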
- Research Article
- 10.35629/5252-07043443
- Apr 1, 2025
- International Journal of Advances in Engineering and Management
The study applies several unsupervised machine learning (ML) clustering models, namely the K-means clustering model, the hierarchical clustering model, Density-based Spatial Clustering of Applications with Noise (DBSCAN), and the RFM (Recency, Frequency, Monetary) customer segmentation framework, to identify distinct and actionable customer segments based on behavioral, demographic, and transactional characteristics. The traditional RFM framework was included in the analysis because clustering models are not optimization models, and the goodness of unsupervised models can only be evaluated with a practical business approach. The results and discussion highlight customer segmentation based on historical transactional characteristics, evaluate the effectiveness of the different clustering algorithms and the segmentation framework, and outline potential future enhancements. The emphasis on statistical analysis and the evaluation of various clustering techniques provides valuable insights into effective customer segmentation strategies.
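A minimal sketch of RFM-based segmentation with K-means (the RFM table, scales, and segment count are invented; the study's data and choices may differ): build recency/frequency/monetary features, standardize them, and cluster.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(9)
# Hypothetical RFM table: recency (days since last order), frequency (orders), monetary (spend)
rfm = np.column_stack([rng.integers(1, 365, 200),
                       rng.poisson(5, 200),
                       rng.gamma(2.0, 50.0, 200)])

# Standardize so recency's day scale does not dominate the distance metric
X = StandardScaler().fit_transform(rfm)
seg = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print("segment sizes:", np.bincount(seg))
```

Profiling each segment's mean RFM values afterwards is what turns the numeric clusters into actionable labels such as "loyal high spenders" or "lapsed customers".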
- Research Article
4
- 10.1088/1755-1315/1370/1/012005
- Jul 1, 2024
- IOP Conference Series: Earth and Environmental Science
Wind energy has experienced significant growth in recent years thanks to the technological development of wind turbines (WTs). However, one of the main challenges for the wind industry remains the early detection of WT failures. An effective strategy to address this challenge is implementing condition monitoring (CM) to detect changes in WT operation that could indicate the onset of a potential failure. This paper uses data from the SCADA (Supervisory Control and Data Acquisition) system of a wind farm located in Ecuador to test three unsupervised machine learning (ML) methods for detecting anomalies in the data, allowing potential WT failures to be predicted. Evaluation metrics showed that the Mahalanobis Distance (MD) algorithm outperformed Isolation Forest (IF) and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) in anomaly detection, achieving accuracies of 0.94, 0.90, and 0.74, respectively; however, IF more effectively detected the points determined to be anomalies.
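The Mahalanobis-distance approach can be sketched in plain NumPy (the SCADA features, baseline statistics, and fault values below are invented for illustration): fit the mean and covariance on normal operation, then score new points by their covariance-scaled distance from the baseline.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical baseline SCADA features: power output (kW) and rotor speed (rpm), correlated
base = rng.multivariate_normal([1000, 15], [[2500, 30], [30, 1]], 500)

mu = base.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(base, rowvar=False))

def mahalanobis(x):
    """Covariance-scaled distance of a point from the fitted baseline."""
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

normal_pt = np.array([1010.0, 15.2])
fault_pt = np.array([400.0, 15.0])   # power collapse at normal rotor speed
print(mahalanobis(normal_pt), mahalanobis(fault_pt))
```

Because the metric accounts for feature correlation, a point that is plausible in each variable separately but breaks their joint relationship still gets a large distance, which is what makes MD effective for CM data.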
- Conference Article
44
- 10.1109/icesc48915.2020.9155615
- Jul 1, 2020
The development of communication technologies and e-commerce has made the credit card the most common method of payment for both online and in-store purchases, so security in this system is essential to prevent fraudulent transactions. Fraudulent transactions in credit card data are increasing each year. In this direction, researchers are also trying novel techniques to detect and prevent such frauds. However, there is always a need for techniques that detect these frauds precisely and efficiently. This paper proposes a scheme for detecting frauds in credit card data that uses a Neural Network (NN) based unsupervised learning technique. The proposed method outperforms the existing approaches of Auto Encoder (AE), Local Outlier Factor (LOF), Isolation Forest (IF), and K-means clustering. The proposed NN-based fraud detection method performs with 99.87% accuracy, whereas the existing AE, IF, LOF, and K-means methods give 97%, 98%, 98%, and 99.75% accuracy, respectively.
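The paper's NN detector is not reproduced here; as a stand-in, this sketch illustrates the reconstruction-error principle that autoencoder-style detectors rely on, using a linear projection (PCA via SVD) as the simplest "autoencoder": normal data reconstruct well, off-pattern transactions do not. All data and dimensions are synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
# Normal transactions lie near a low-dimensional subspace (hypothetical 10 features, rank 2)
W = rng.normal(size=(2, 10))
normal = rng.normal(size=(500, 2)) @ W
fraud = rng.normal(0, 3, size=(5, 10))   # off-subspace points

# Linear "autoencoder": encode onto the top-2 principal directions, then decode
mu = normal.mean(axis=0)
Vt = np.linalg.svd(normal - mu, full_matrices=False)[2][:2]

def recon_error(X):
    Z = (X - mu) @ Vt.T      # encode to 2-D latent space
    Xh = Z @ Vt + mu         # decode back to 10-D
    return np.linalg.norm(X - Xh, axis=1)

print("normal error:", recon_error(normal).mean())
print("fraud error: ", recon_error(fraud).mean())
```

A nonlinear autoencoder replaces the projection with learned encoder/decoder networks, but the detection rule is the same: threshold the reconstruction error.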
- Research Article
17
- 10.1016/j.datak.2017.12.001
- Dec 18, 2017
- Data & Knowledge Engineering
Spatio-temporal outlier detection algorithms based on computing behavioral outlierness factor
- Research Article
- 10.1109/access.2023.3253022
- Jan 1, 2023
- IEEE Access
With the advent of technology, data and its analysis are no longer just values and attributes strewn across spreadsheets; they are now seen as a stepping stone to bring about revolution in any significant field. Data corruption can be brought about by a variety of unethical and illegal sources, making it crucial to develop a highly effective method to identify and appropriately highlight the corrupted data existing in a dataset. Detecting corrupted data, as well as recovering data from a corrupted dataset, is a challenging problem. It requires utmost attention and, if not addressed at earlier stages, may pose problems in later stages of data processing with machine or deep learning algorithms. In the following work we begin by introducing PAACDA, the Proximity-based Adamic Adar Corruption Detection Algorithm, and consolidating its results, particularly accentuating the detection of corrupted data rather than outliers. Current state-of-the-art models, such as Isolation Forest and DBSCAN (Density-Based Spatial Clustering of Applications with Noise), rely on fine-tuned parameters to provide high accuracy and recall, but they also show a significant level of uncertainty when handling corrupted data. In the present work, the authors look into the performance issues of several unsupervised learning algorithms on linear and clustered corrupted datasets. A novel PAACDA algorithm is proposed which outperforms other unsupervised learning benchmarks on 15 popular baselines, including K-means clustering, Isolation Forest, and LOF (Local Outlier Factor), with an accuracy of 96.35% for clustered data and 99.04% for linear data. This article also conducts a thorough exploration of the relevant literature from the previously stated perspectives. In this research work, we pinpoint the shortcomings of the present techniques and draw directions for future work in this field.