 # One Hot Encoding to treat Categorical data parameters

Datasets often contain columns with categorical features (string values). For example, a Gender parameter has categorical labels such as Male and Female. These labels have no specific order of preference, and because they are strings, most machine learning models cannot work on them directly.

One approach to this problem is label encoding, where we assign a numerical value to each label, for example Male and Female mapped to 0 and 1. But this can add bias to our model: it may treat the numbers as an ordering and give higher preference to the Female label because 1 > 0, whereas ideally both labels are equally important in the dataset. To deal with this issue we use the One Hot Encoding technique.
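The label-encoding step described above can be sketched as follows. This is a minimal illustration on made-up data: the `people` DataFrame, the `label_map` dictionary and the `Gender_encoded` column name are assumptions for the example, not part of the original dataset.

```python
import pandas as pd

# Hypothetical toy data; only the Gender column name comes from the text.
people = pd.DataFrame({"Gender": ["Male", "Female", "Female", "Male"]})

# Label encoding: every label is replaced by an arbitrary integer code.
label_map = {"Male": 0, "Female": 1}
people["Gender_encoded"] = people["Gender"].map(label_map)

print(people)
```

The codes 0 and 1 are arbitrary, yet a model that reads them as magnitudes sees Female as "larger" than Male, which is the bias the paragraph above warns about.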

## One Hot Encoding:

In this technique, a separate column is created for each label of a categorical parameter. For Gender, this gives one Male column and one Female column. Whenever Gender is Male, the Male column holds 1 and the Female column holds 0, and vice versa.
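To make the idea concrete, the indicator columns can be built by hand. The sketch below uses made-up data (the `genders` DataFrame is an assumption for illustration) and plain comparisons rather than a dedicated encoder.

```python
import pandas as pd

# Hypothetical toy data for illustration.
genders = pd.DataFrame({"Gender": ["Male", "Female", "Female", "Male"]})

# Create one 0/1 indicator column per label.
for label in ["Male", "Female"]:
    genders[label] = (genders["Gender"] == label).astype(int)

print(genders)
```

Each row has exactly one 1 across the indicator columns, so no label is numerically "larger" than another.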

Let’s understand with an example:

Consider the data where fruits and their corresponding categorical value and prices are given.

| Fruit | Categorical value of fruit | Price |
|---|---|---|
| apple | 1 | 5 |
| mango | 2 | 10 |
| apple | 1 | 15 |
| orange | 3 | 20 |

The output after one hot encoding the data is as follows:

| apple | mango | orange | price |
|---|---|---|---|
| 1 | 0 | 0 | 5 |
| 0 | 1 | 0 | 10 |
| 1 | 0 | 0 | 15 |
| 0 | 0 | 1 | 20 |
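The fruit example can be reproduced with a few lines of pandas. This is a sketch under the assumption that the data lives in a small DataFrame named `fruits` (a name chosen here for illustration); `dtype=int` is passed so the result shows 0/1 instead of booleans on recent pandas versions.

```python
import pandas as pd

# The fruit data from the example, typed in by hand.
fruits = pd.DataFrame({
    "Fruit": ["apple", "mango", "apple", "orange"],
    "Price": [5, 10, 15, 20],
})

# Encode the Fruit column on its own, then attach the Price column again.
encoded = pd.get_dummies(fruits["Fruit"], dtype=int).join(fruits["Price"])

print(encoded)
```

Encoding the Series directly keeps the new column names equal to the labels (apple, mango, orange), matching the table above.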

Code: Python code implementation of One-Hot Encoding Technique

```python
# Program for demonstration of one hot encoding

# import libraries
import numpy as np
import pandas as pd

# import the data required
data = pd.read_csv("employee_data.csv")
print(data.head())
```

Checking for the labels in the categorical parameters:

```python
print(data['Gender'].unique())
print(data['Remarks'].unique())
```

Output:

```
['Male' 'Female']
['Nice' 'Good' 'Great']
```

Checking for the label counts in the categorical parameters:

```python
print(data['Gender'].value_counts())
print(data['Remarks'].value_counts())
```

Output:

```
Female    7
Male      5
Name: Gender, dtype: int64

Nice     5
Great    4
Good     3
Name: Remarks, dtype: int64
```

One-Hot encoding the categorical parameters using get_dummies():

```python
one_hot_encoded_data = pd.get_dummies(data, columns=['Remarks', 'Gender'])
print(one_hot_encoded_data)
```

We can observe that the encoded data now has 3 Remarks columns and 2 Gender columns. However, if a parameter has n unique labels, n-1 columns are enough to represent it. For example, if we keep only the Gender_Female column and drop the Gender_Male column, we still convey the entire information: a value of 1 means female and 0 means male. This way we can encode the categorical data and reduce the number of parameters as well.
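One way to get the n-1 columns is the `drop_first` parameter of `pd.get_dummies()`. The sketch below uses a small made-up DataFrame named `sample` as a stand-in for employee_data.csv, so the rows are assumptions. Note that `drop_first=True` drops the first label in sorted order, so here it keeps Gender_Male rather than Gender_Female; the information carried is the same.

```python
import pandas as pd

# Hypothetical stand-in for the employee data; only the column names
# and labels come from the article.
sample = pd.DataFrame({
    "Gender": ["Male", "Female", "Female"],
    "Remarks": ["Nice", "Good", "Great"],
})

# drop_first=True removes the first (alphabetically sorted) label of each
# encoded column, leaving n-1 indicator columns per parameter.
reduced = pd.get_dummies(
    sample, columns=["Remarks", "Gender"], drop_first=True, dtype=int
)

print(reduced)
```

A row with 0 in every remaining column of a parameter corresponds to the dropped label (Good for Remarks, Female for Gender).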

Last Updated on October 29, 2021 by admin
