# How to scale Pandas DataFrame columns?

When the columns of a dataset hold values on drastically different scales, it becomes hard to analyze trends and patterns or to compare the features (columns) with one another. In such cases, the values need to be transformed so that all columns fall on the same scale. This process is called scaling.

The two most common techniques for scaling the columns of a Pandas DataFrame are Min-Max Normalization and Standardization. Both are discussed below.

Dataset in Use: Iris

### Min-Max Normalization

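Here, all values are scaled into the range [0, 1], where 0 corresponds to the column's minimum value and 1 to its maximum value. The formula for Min-Max Normalization is:

$$x' = \frac{x - x_{min}}{x_{max} - x_{min}}$$

where $x$ is an original value, $x_{min}$ and $x_{max}$ are the minimum and maximum of its column, and $x'$ is the scaled value.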
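
As a quick sanity check of the scaling rule (x - min) / (max - min), the sketch below applies it to a small list of made-up numbers. The values are purely illustrative and are not taken from the Iris dataset.

```python
# Illustrative values only; they are not taken from the Iris dataset.
values = [2.0, 4.0, 6.0, 10.0]

lo, hi = min(values), max(values)

# Apply (x - min) / (max - min) to every value.
scaled = [(v - lo) / (hi - lo) for v in values]

print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

The smallest value maps to 0, the largest to 1, and everything else lands proportionally in between.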

Method 1: Using Pandas

The first way is to calculate the values required by the formula (each column's minimum and maximum) separately and then apply them to the dataset.

Example:

    import seaborn as sns
    import pandas as pd

    # Load the Iris dataset as a DataFrame
    data = sns.load_dataset('iris')
    print('Original Dataset')
    print(data.head())

    # Min-Max Normalization on the numeric columns only
    df = data.drop('species', axis=1)
    df_norm = (df - df.min()) / (df.max() - df.min())

    # Re-attach the species column
    df_norm = pd.concat((df_norm, data.species), axis=1)
    print("Scaled Dataset Using Pandas")
    print(df_norm.head())
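
One caveat with the manual formula: if a column contains a single constant value, max - min is 0 and the division produces NaN. The sketch below shows one possible way to guard against that. The helper name `min_max_scale`, the made-up DataFrame, and the choice to map constant columns to 0.0 are all assumptions made for illustration.

```python
import pandas as pd


def min_max_scale(df: pd.DataFrame) -> pd.DataFrame:
    """Scale each numeric column to [0, 1]; constant columns become 0.0."""
    col_min = df.min()
    col_range = df.max() - col_min
    # Replace a zero range with 1 so the division below never produces NaN.
    safe_range = col_range.replace(0, 1)
    return (df - col_min) / safe_range


# Made-up data: column "b" is constant.
toy = pd.DataFrame({"a": [1.0, 3.0, 5.0], "b": [7.0, 7.0, 7.0]})
scaled = min_max_scale(toy)
print(scaled)
```

Column `a` is scaled to 0.0, 0.5 and 1.0, while the constant column `b` becomes all zeros instead of NaN.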

Method 2: Using MinMaxScaler from sklearn

This is a more straightforward way of doing the same thing. It only requires the MinMaxScaler class to be imported from the sklearn module.

Example:

    import seaborn as sns
    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler

    data = sns.load_dataset('iris')
    print('Original Dataset')
    print(data.head())

    # Keep only the numeric columns; 'species' is a string label
    df = data.drop('species', axis=1)

    # Fit the scaler and transform the values to the [0, 1] range
    scaler = MinMaxScaler()
    df_scaled = scaler.fit_transform(df.to_numpy())
    df_scaled = pd.DataFrame(df_scaled, columns=[
        'sepal_length', 'sepal_width', 'petal_length', 'petal_width'])
    print("Scaled Dataset Using MinMaxScaler")
    print(df_scaled.head())
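
To illustrate that both methods should give the same result, the sketch below compares the manual Pandas formula with `MinMaxScaler` on a small made-up DataFrame. The data and variable names are assumptions for illustration, not part of the Iris example.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up numeric data standing in for the feature columns.
toy = pd.DataFrame({"x": [1.0, 2.0, 4.0], "y": [10.0, 30.0, 20.0]})

# Manual formula, applied column by column.
manual = (toy - toy.min()) / (toy.max() - toy.min())

# Same scaling via scikit-learn; wrap the array back into a DataFrame.
scaled = pd.DataFrame(
    MinMaxScaler().fit_transform(toy),
    columns=toy.columns,
    index=toy.index,
)

# Compare with a tolerance, since the two routes may differ by rounding error.
same = np.allclose(manual.to_numpy(), scaled.to_numpy())
print(same)  # True
```

Passing `columns=toy.columns` avoids retyping the column names by hand when wrapping the scaled array back into a DataFrame.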

### Standardization

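Standardization doesn’t have any fixed minimum or maximum value. Here, the values of each column are scaled so that the column has a mean of 0 and a standard deviation of 1. Because the result is not squeezed into a fixed range defined by the most extreme values, this technique is generally less sensitive to outliers than Min-Max Normalization and is often preferred when outliers are present. Note, however, that the mean and standard deviation are themselves influenced by outliers.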
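
The description above corresponds to the usual z-score formula, written here as a sketch with $\mu$ denoting the column mean and $\sigma$ the column standard deviation (symbols chosen for illustration):

$$z = \frac{x - \mu}{\sigma}$$

Subtracting $\mu$ centers the column at 0, and dividing by $\sigma$ rescales it to unit standard deviation.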

Example:

    import pandas as pd
    import seaborn as sns
    from sklearn.preprocessing import StandardScaler

    data = sns.load_dataset('iris')
    print('Original Dataset')
    print(data.head())

    # Keep only the numeric columns; 'species' is a string label
    df = data.drop('species', axis=1)

    # Scale each column to mean 0 and standard deviation 1
    std_scaler = StandardScaler()
    df_scaled = std_scaler.fit_transform(df.to_numpy())
    df_scaled = pd.DataFrame(df_scaled, columns=[
        'sepal_length', 'sepal_width', 'petal_length', 'petal_width'])
    print("Scaled Dataset Using StandardScaler")
    print(df_scaled.head())
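
Standardization can also be done with plain Pandas, mirroring Method 1 for Min-Max Normalization. The sketch below uses a small made-up DataFrame (an assumption for illustration). Note that `DataFrame.std()` defaults to the sample standard deviation (`ddof=1`), whereas `StandardScaler` uses the population standard deviation, so `ddof=0` is passed explicitly to match it.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up numeric data standing in for the feature columns.
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 6.0], "y": [10.0, 20.0, 20.0, 50.0]})

# ddof=0 gives the population standard deviation, matching StandardScaler.
manual = (toy - toy.mean()) / toy.std(ddof=0)

# Same scaling via scikit-learn (returns a NumPy array).
scaled = StandardScaler().fit_transform(toy)

# Compare with a tolerance, since the two routes may differ by rounding error.
same = np.allclose(manual.to_numpy(), scaled)
print(same)  # True
```

With the default `ddof=1`, the manual result would differ slightly from the scikit-learn output, and the difference shrinks as the number of rows grows.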

Last Updated on October 23, 2021 by admin
