## Groupby without aggregation in Pandas

Pandas is a great python package for manipulating data and some of the tools which we learn as a beginner are an aggregation and group by functions of pandas.

**Groupby()**** **is a function used to split the data in dataframe into groups based on a given condition. **Aggregation **on other hand operates on series, data and returns a numerical summary of the data. There are a lot of aggregation functions as *count(),max(),min(),mean(),std(),describe()*. We can combine both functions to find multiple aggregations on a particular column.

Instead of using groupby aggregation together, we can perform groupby without aggregation which is applicable to aggregate data separately. We will see this with an example where we will take a breast cancer dataset with different numerical features like mean area, worst texture, and many more. The target column has 0 which means the cancer is benign and 1 means the cancer is malignant.

**Example 1:**

- Python3

`# importing python libraries and breast_cancer dataset from sklearn` `import` `numpy as np` `import` `pandas as pd` `from` `sklearn ` `import` `datasets` `from` `sklearn.datasets ` `import` `load_breast_cancer` `# data is loaded in a DataFrame` `cancer_data ` `=` `load_breast_cancer()` `df ` `=` `pd.DataFrame(cancer_data.data, columns` `=` `cancer_data.feature_names)` `df[` `'target'` `] ` `=` `pd.Series(cancer_data.target)` `df.head()` |

**Output:**

Thus, we can visualize the data which have all the columns, but all the columns are in numerical form and there are no categorical data instead of only the target column, so let’s have a look in target and another column named *‘worst texture’.*

- Python3

`print` `(df[` `'target'` `].describe(), df[` `'worst texture'` `].describe())` |

**Output:**

count 569.000000 mean 0.627417 std 0.483918 min 0.000000 25% 0.000000 50% 1.000000 75% 1.000000 max 1.000000 Name: target, dtype: float64 count 569.000000 mean 25.677223 std 6.146258 min 12.020000 25% 21.080000 50% 25.410000 75% 29.720000 max 49.540000 Name: worst texture, dtype: float64

Here we can see the summary of* target* and* worst texture* column, we take only these columns to understand the groupby aggregate functions better.

- Python3

`df1 ` `=` `df[[` `'worst texture'` `, ` `'worst area'` `, ` `'target'` `]]` `gr1 ` `=` `df1.groupby(df1[` `'target'` `]).mean()` `gr1` |

**Output:**

So here we see the mean of worst texture and worst area grouped around benign and malignant cancer, now the normal data has been interfered by this method, and we have to add them separately there’s why groupby without aggregation becomes handy.

- Python3

`# function to take the data as group and perform aggregation` `def` `meanofTargets(group1):` ` ` ` ` `wt ` `=` `group1[` `'worst texture'` `].agg(` `'mean'` `)` ` ` `wa ` `=` `group1[` `'worst area'` `].agg(` `'mean'` `)` ` ` `group1[` `'Mean worst texture'` `] ` `=` `wt` ` ` `group1[` `'Mean worst area'` `] ` `=` `wa` ` ` `return` `group1` `df2 ` `=` `df1.groupby(` `'target'` `).` `apply` `(meanofTargets)` `df2` |

**Output:**

Thus, in the above dataset, we are able to join the mean of the worst area and worst texture in a separate column, and we do it with groupby method of the target column where it grouped ‘1’s and 0’s separately.

**Example 2:**

Similarly, let’s see another example of using groupby without aggregation. But as there is no categorical column we will have to make a categorical column myself. For this, let’s choose mean area which have a max value of 2500 and min value of 150, so we will classify them into 6 groups of range 400 using the pandas cut method to convert continuous to categorical. As this doesn’t concern subject of the article refer to the GitHub repo here for more info.

Thus, we make a categorical column ‘Cat_mean_area’ and we can perform the groupby aggregation method here too. But instead of grouping the whole dataset we can use some specific columns like *mean area* and *target *only.

- Python3

`# dataframe df_3 to contain only mean_area,Cat_mean_area and target` `df_3 ` `=` `df_2[[` `'mean area'` `, ` `'Cat_mean_area'` `, ` `'target'` `]]` `# applying groupby sum` `gr2 ` `=` `df_3.groupby(df_2[` `'Cat_mean_area'` `]).` `sum` `()` `gr2` |

**Output:**

Thus, by the steps mentioned above, we perform groupby without aggregation.

- Python3

`# function to take the data as group and perform aggregation` `def` `totalTargets(group):` ` ` `g ` `=` `group[` `'target'` `].agg(` `'sum'` `)` ` ` `group[` `'Total_targets'` `] ` `=` `g` ` ` `return` `group` `df_4 ` `=` `df_3.groupby(df_3[` `'Cat_mean_area'` `]).` `apply` `(totalTargets)` `df_4` |

**Output:**

Last Updated on October 19, 2021 by admin