 # Elbow Method for optimal value of k in KMeans

A fundamental step in clustering algorithms such as K-Means is to determine the optimal number of clusters into which the data should be grouped. The Elbow Method is one of the most popular ways to determine this optimal value of k.
We now demonstrate the method with K-Means clustering, using Python's scikit-learn library.

Step 1: Importing the required libraries

```python
from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist
import numpy as np
import matplotlib.pyplot as plt
```

Step 2: Creating and Visualizing the data

```python
# Creating the data
x1 = np.array([3, 1, 1, 2, 1, 6, 6, 6, 5, 6, 7, 8, 9, 8, 9, 9, 8])
x2 = np.array([5, 4, 5, 6, 5, 8, 6, 7, 6, 7, 1, 2, 1, 2, 3, 2, 3])
# Stack the two coordinate arrays into an (n_samples, 2) matrix
X = np.column_stack((x1, x2))

# Visualizing the data
plt.xlim([0, 10])
plt.ylim([0, 10])
plt.title('Dataset')
plt.scatter(x1, x2)
plt.show()
```

From the visualization above, the data appears to form three clusters. However, visual inspection alone cannot always give the right answer, especially for high-dimensional data that cannot be plotted directly, so we proceed with the following steps.
We now define two measures of clustering quality:

1. Distortion: the average of the (non-squared) Euclidean distances from each data point to its closest cluster center.
2. Inertia: the sum of the squared distances from each data point to its closest cluster center.
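To make the two definitions concrete, the following sketch computes both quantities by hand for a single K-Means fit on the dataset from Step 2, and compares the hand-computed inertia with scikit-learn's `inertia_` attribute. The choice of k = 3 and the `n_init=10, random_state=0` settings are illustrative assumptions, not part of the original walkthrough.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

# Same 17 points as in Step 2
X = np.array([[3, 5], [1, 4], [1, 5], [2, 6], [1, 5],
              [6, 8], [6, 6], [6, 7], [5, 6], [6, 7],
              [7, 1], [8, 2], [9, 1], [8, 2], [9, 3], [9, 2], [8, 3]], dtype=float)

# Illustrative assumption: k = 3, fixed seed for reproducibility
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Euclidean distance from every point to every center, shape (n_samples, k)
dists = cdist(X, model.cluster_centers_, "euclidean")
# Distance from each point to its closest center
nearest = dists.min(axis=1)

distortion = nearest.mean()       # average of the non-squared distances
inertia = (nearest ** 2).sum()    # sum of the squared distances

print(f"distortion        = {distortion:.4f}")
print(f"inertia (manual)  = {inertia:.4f}")
print(f"inertia (sklearn) = {model.inertia_:.4f}")
```

Distortion is an average, so it does not grow with the number of points, whereas inertia is a sum of squares and penalizes far-away points more heavily. Both typically shrink as k increases, which is why the elbow, not the minimum, is what matters.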

We iterate over values of k from 1 to 9 and, for each k, compute the distortion and the inertia.

Step 3: Building the clustering model and calculating the values of the Distortion and Inertia:

```python
distortions = []
inertias = []
mapping1 = {}
mapping2 = {}
K = range(1, 10)

for k in K:
    # Building and fitting the model (fixed n_init and random_state for reproducible results)
    kmeanModel = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)

    # Distortion: mean Euclidean distance from each point to its closest center
    distortion = np.min(cdist(X, kmeanModel.cluster_centers_, 'euclidean'), axis=1).mean()
    distortions.append(distortion)
    mapping1[k] = distortion

    # Inertia: sum of squared distances to the closest center, computed by scikit-learn
    inertias.append(kmeanModel.inertia_)
    mapping2[k] = kmeanModel.inertia_
```

Step 4: Tabulating and Visualizing the results
a) Using the different values of Distortion:

```python
# Print the distortion for each value of k
for key, val in mapping1.items():
    print(f'{key} : {val}')

# Plot distortion against k
plt.plot(K, distortions, 'bx-')
plt.xlabel('Values of K')
plt.ylabel('Distortion')
plt.title('The Elbow Method using Distortion')
plt.show()
```

b) Using the different values of Inertia:

```python
# Print the inertia for each value of k
for key, val in mapping2.items():
    print(f'{key} : {val}')

# Plot inertia against k
plt.plot(K, inertias, 'bx-')
plt.xlabel('Values of K')
plt.ylabel('Inertia')
plt.title('The Elbow Method using Inertia')
plt.show()
```

To determine the optimal number of clusters, select the value of k at the “elbow”, i.e., the point after which the distortion/inertia starts decreasing in a roughly linear fashion. For the given data, we conclude that the optimal number of clusters is 3.
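Reading the elbow off a plot is a judgment call. One common heuristic, which is not part of the method described above, is to normalize both axes to [0, 1] and pick the k whose point lies farthest from the straight line joining the first and last points of the curve. The sketch below assumes the same dataset; the helper name `elbow_by_max_distance` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

# Same 17 points as in Step 2
X = np.array([[3, 5], [1, 4], [1, 5], [2, 6], [1, 5],
              [6, 8], [6, 6], [6, 7], [5, 6], [6, 7],
              [7, 1], [8, 2], [9, 1], [8, 2], [9, 3], [9, 2], [8, 3]], dtype=float)

K = list(range(1, 10))
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in K]


def elbow_by_max_distance(ks, values):
    """Hypothetical helper: return the k whose (k, value) point is farthest
    from the line through the first and last points of the curve.
    Assumes the curve is not flat (values are not all equal)."""
    ks = np.asarray(ks, dtype=float)
    vals = np.asarray(values, dtype=float)

    # Normalize both axes to [0, 1] so neither dominates the distance
    x = (ks - ks.min()) / (ks.max() - ks.min())
    y = (vals - vals.min()) / (vals.max() - vals.min())

    # Unit direction vector of the line from the first to the last point
    p0 = np.array([x[0], y[0]])
    p1 = np.array([x[-1], y[-1]])
    direction = (p1 - p0) / np.linalg.norm(p1 - p0)

    # Perpendicular distance of each point = |2D cross product| with the unit direction
    rel = np.column_stack((x, y)) - p0
    dist = np.abs(rel[:, 0] * direction[1] - rel[:, 1] * direction[0])
    return int(ks[np.argmax(dist)])


best_k = elbow_by_max_distance(K, inertias)
print(f"Elbow at k = {best_k}")
```

On this dataset the heuristic should agree with the visual reading of k = 3. On curves without a clear bend its answer can be unstable, so it is best used alongside the plot rather than instead of it.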
The clustered data points for different values of k:

[Figures: k = 1, k = 2, k = 3, k = 4]

Last Updated on October 28, 2021 by admin
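The plots of the clustered points for k = 1 to 4 can be approximated with a sketch like the one below, which colors each point by its assigned cluster label and marks the fitted centers with red crosses. The styling (colormap, marker size, figure size) and the `n_init=10, random_state=0` settings are assumptions and may not match the original figures.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Same 17 points as in Step 2
X = np.array([[3, 5], [1, 4], [1, 5], [2, 6], [1, 5],
              [6, 8], [6, 6], [6, 7], [5, 6], [6, 7],
              [7, 1], [8, 2], [9, 1], [8, 2], [9, 3], [9, 2], [8, 3]], dtype=float)

labels_by_k = {}  # cluster labels for each k, kept for later inspection
fig, axes = plt.subplots(1, 4, figsize=(16, 4))

for ax, k in zip(axes, range(1, 5)):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    labels_by_k[k] = model.labels_

    # Color each point by the cluster it was assigned to
    ax.scatter(X[:, 0], X[:, 1], c=model.labels_, cmap="viridis")
    # Mark the fitted cluster centers with red crosses
    ax.scatter(model.cluster_centers_[:, 0], model.cluster_centers_[:, 1],
               marker="x", s=100, c="red")
    ax.set_title(f"k = {k}")
    ax.set_xlim(0, 10)
    ax.set_ylim(0, 10)

plt.tight_layout()
plt.show()
```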
