How to count unique values in a Pandas Groupby object?




As the name suggests, groupby splits the rows of a DataFrame into groups that share the same value in one or more columns. We can count the unique values within each group of a pandas GroupBy object using the groupby(), agg(), and reset_index() methods. This article shows how to retrieve the number of unique values of one column for each group defined by another column.

Functions Used

  • groupby() – groupby() function is used to split the data into groups based on some criteria. pandas objects can be split on any of their axes.

Syntax: DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, **kwargs)

Parameters :

  • by : mapping, function, label, or list of labels. Determines how the rows are grouped.
  • axis : int, default 0. Split along rows (0) or columns (1).
  • level : int, level name, or sequence of such. If the axis is a MultiIndex (hierarchical), group by a particular level or levels.
  • as_index : bool, default True. For aggregated output, return an object with the group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output.
  • sort : bool, default True. Sort group keys. Turning this off can improve performance. It does not affect the order of observations within each group; groupby preserves the order of rows within each group.
  • group_keys : bool, default True. When calling apply, add group keys to the index to identify pieces.
  • squeeze : bool, default False. Reduce the dimensionality of the return type if possible, otherwise return a consistent type. Deprecated in pandas 1.1 and removed in pandas 2.0, so omit it in current code.

Returns : GroupBy object
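
As a small illustration of the parameters above, the sketch below groups a toy frame and shows how as_index and sort change the shape and order of the result. The column names team and score are made up for this example, and the behaviour shown assumes a reasonably recent pandas version.

```python
import pandas as pd

# Toy data; the column names are illustrative only
df = pd.DataFrame({"team": ["b", "a", "b", "a"],
                   "score": [1, 2, 3, 4]})

# groupby() alone is lazy: it returns a GroupBy object, not a result
grouped = df.groupby("team")
print(type(grouped).__name__)   # DataFrameGroupBy

# Default: group labels become the index, sorted by key
by_index = df.groupby("team")["score"].sum()
print(by_index)

# as_index=False keeps the labels as a regular column ("SQL-style");
# sort=False keeps the groups in order of first appearance
sql_style = df.groupby("team", as_index=False, sort=False)["score"].sum()
print(sql_style)
```

Nothing is computed until an aggregation such as sum() or agg() is called on the GroupBy object.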

  • agg() – agg() applies one or more aggregation functions to a Series, a DataFrame, or each group of a GroupBy object. When a list of functions is passed, agg() returns one result per function.

Syntax: DataFrame.aggregate(func, axis=0, *args, **kwargs)

Parameters:

  • func : callable, string, dictionary, or list of string/callables. Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply. For a DataFrame, can pass a dict, if the keys are DataFrame column names.
  • axis : (default 0) {0 or ‘index’, 1 or ‘columns’} 0 or ‘index’: apply function to each column. 1 or ‘columns’: apply function to each row.

Returns: Aggregated DataFrame
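
To make the different forms of func concrete, here is a minimal sketch that passes a single function name, a list of names, and a dict to agg() on a grouped frame. It reuses the toy data from the examples in this article; the variable names are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"Col_1": ["a", "b", "c", "b", "a", "d"],
                   "Col_2": [1, 2, 3, 3, 2, 1]})

# A single function name gives one value per group
single = df.groupby("Col_1")["Col_2"].agg("nunique")
print(single)

# A list of functions gives one output column per function
multi = df.groupby("Col_1")["Col_2"].agg(["nunique", "count", "max"])
print(multi)

# A dict maps column names to the function(s) to apply to each column
per_column = df.groupby("Col_1").agg({"Col_2": "nunique"})
print(per_column)
```

The dict form is the one used in the examples in this article, because it states explicitly which column is being counted.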

  • reset_index() – reset_index() resets the index of a DataFrame, replacing it with the default integer index that runs from 0 to len(df) - 1.

Syntax: DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')

Parameters:

  • level: int, str, or list. The index level(s) to remove from the index; all levels are removed by default.
  • drop: bool, default False. If False, the old index is added to the data as a column; if True, it is discarded.
  • inplace: bool, default False. If True, modify the original DataFrame itself instead of returning a new one.
  • col_level: Select in which column level to insert the labels.
  • col_fill: Object, to determine how the other levels are named.

Return type: DataFrame
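
The effect of drop is easiest to see on an aggregated result, where the group labels sit in the index. The following is a short sketch on the same toy data used in the examples; the variable names are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"Col_1": ["a", "b", "c", "b", "a", "d"],
                   "Col_2": [1, 2, 3, 3, 2, 1]})

# After aggregation the group labels form the index
counts = df.groupby("Col_1").agg({"Col_2": "nunique"})
print(counts.index.tolist())          # ['a', 'b', 'c', 'd']

# drop=False (default): the old index becomes a regular column
as_column = counts.reset_index()
print(as_column.columns.tolist())     # ['Col_1', 'Col_2']

# drop=True: the old index is discarded entirely
dropped = counts.reset_index(drop=True)
print(dropped.columns.tolist())       # ['Col_2']
```

For counting unique values per group, the default drop=False is usually what you want, since dropping the index would lose the group labels.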

Approach:

  • Import libraries
  • Make data
  • Group Data
  • Use aggregate function
  • Reset Index
  • Print Data
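
The steps above can also be chained into a single expression. The sketch below wraps them in a helper called count_unique, which is a hypothetical name introduced here for illustration and not part of pandas, and compares the result with the shortcut of calling nunique() directly on the selected column.

```python
import pandas as pd


def count_unique(frame, group_col, value_col):
    """Hypothetical helper: distinct values of value_col per group_col."""
    return (
        frame.groupby(group_col)            # group data
        .agg({value_col: "nunique"})        # use aggregate function
        .reset_index()                      # reset index
    )


# Make data
df = pd.DataFrame({"Col_1": ["a", "b", "c", "b", "a", "d"],
                   "Col_2": [1, 2, 3, 3, 2, 1]})

# Print data
result = count_unique(df, "Col_1", "Col_2")
print(result)

# Selecting the column and calling nunique() directly is an equivalent shortcut
shortcut = df.groupby("Col_1")["Col_2"].nunique().reset_index()
print(result.equals(shortcut))
```

Note that nunique ignores missing values (NaN) by default, so they are not counted as a distinct value.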

Example 1:

 

 

# import pandas
import pandas as pd

# create the dataframe
df = pd.DataFrame({'Col_1': ['a', 'b', 'c', 'b', 'a', 'd'],
                   'Col_2': [1, 2, 3, 3, 2, 1]})

# print the original dataframe
# (print() works everywhere; display() is only available in Jupyter/IPython)
print("original dataframe:")
print(df)

# group the rows by the values in Col_1
grouped = df.groupby("Col_1")

# count the distinct Col_2 values within each group
result = grouped.agg({"Col_2": "nunique"})

# turn the group labels (Col_1) back into a regular column
result = result.reset_index()

# print the final dataframe
print("final dataframe:")
print(result)

Output (final dataframe):

  Col_1  Col_2
0     a      2
1     b      2
2     c      1
3     d      1

Example 2:

# import pandas
import pandas as pd

# create the dataframe
df = pd.DataFrame({'Col_1': ['a', 'b', 'c', 'b', 'a', 'd'],
                   'Col_2': [1, 2, 3, 3, 2, 1]})

# print the original dataframe
# (print() works everywhere; display() is only available in Jupyter/IPython)
print("original dataframe:")
print(df)

# group the rows by the values in Col_2
grouped = df.groupby("Col_2")

# count the distinct Col_1 values within each group
result = grouped.agg({"Col_1": "nunique"})

# turn the group labels (Col_2) back into a regular column
result = result.reset_index()

# print the final dataframe
print("final dataframe:")
print(result)

Output (final dataframe):

   Col_2  Col_1
0      1      2
1      2      2
2      3      2

 

Last Updated on October 23, 2021 by admin
