How to count unique values in a Pandas Groupby object?
As the name suggests, groupby groups rows that share the same value in one or more columns. The number of unique values within each group of a pandas GroupBy object can be counted using the groupby(), agg(), and reset_index() methods. This article shows how to retrieve the count of unique values of a column in a DataFrame using pandas.
Functions Used
- groupby() – groupby() function is used to split the data into groups based on some criteria. pandas objects can be split on any of their axes.
Syntax: DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, **kwargs)
Parameters :
- by : mapping, function, str, or iterable used to determine the groups
- axis : int, default 0. The axis to split along (0 for rows, 1 for columns)
- level : If the axis is a MultiIndex (hierarchical), group by a particular level or levels
- as_index : For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output
- sort : Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. groupby preserves the order of rows within each group.
- group_keys : When calling apply, add group keys to index to identify pieces
- squeeze : Reduce the dimensionality of the return type if possible, otherwise return a consistent type. Deprecated since pandas 1.1 and removed in pandas 2.0
Returns : GroupBy object
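To make the as_index and sort parameters described above more concrete, here is a minimal sketch. The DataFrame and its column names (team, score) are hypothetical and chosen only for illustration; the behaviour shown assumes a reasonably recent pandas version.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "team": ["b", "a", "b", "a", "c"],
    "score": [10, 20, 10, 30, 40],
})

# Defaults (as_index=True, sort=True): group labels become the index,
# and the groups are sorted by key
by_index = df.groupby("team").agg({"score": "nunique"})

# as_index=False keeps the group labels as a regular column ("SQL-style");
# sort=False keeps the groups in order of first appearance
sql_style = df.groupby("team", as_index=False, sort=False).agg({"score": "nunique"})

print(by_index)
print(sql_style)
```

With as_index=False the result already has a plain integer index, so a later reset_index() call is not needed in that case.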
- agg() – agg() applies one or more aggregation functions to a Series, a DataFrame, or a grouped object. When a list of functions is passed, agg() returns one result per function.
Syntax: DataFrame.aggregate(func, axis=0, *args, **kwargs)
Parameters:
- func : callable, string, dictionary, or list of string/callables. Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply. For a DataFrame, can pass a dict, if the keys are DataFrame column names.
- axis : (default 0) {0 or ‘index’, 1 or ‘columns’} 0 or ‘index’: apply function to each column. 1 or ‘columns’: apply function to each row.
Returns: Aggregated DataFrame
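The different forms of func can be sketched as follows. The data and the column names (key, val) are made up for this example; only the agg() calls themselves are the point.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "key": ["x", "x", "y", "y", "y"],
    "val": [1, 1, 2, 3, 3],
})

grouped = df.groupby("key")

# A dict maps a column name to the function applied to that column
unique_counts = grouped.agg({"val": "nunique"})

# A list of functions produces one result column per function
several = grouped["val"].agg(["nunique", "count"])

print(unique_counts)
print(several)
```

Here "nunique" counts distinct values per group, while "count" counts all non-null values, so the two columns differ whenever a group contains repeated values.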
- reset_index() – reset_index() resets the index of a DataFrame. It replaces the current index with the default integer index, ranging from 0 to the number of rows minus 1.
Syntax: DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')
Parameters:
- level: int, string, or list. Removes only the given levels from the index; by default all levels are removed.
- drop: Boolean value. If False, the old index is added to the data as a column; if True, it is discarded.
- inplace: Boolean value, make changes in the original data frame itself if True.
- col_level: Select in which column level to insert the labels.
- col_fill: Object, to determine how the other levels are named.
Return type: DataFrame
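The effect of the drop parameter can be seen in a short sketch. The DataFrame below, including the names group and n_unique, is hypothetical and stands in for the kind of result an aggregation produces.

```python
import pandas as pd

# Hypothetical aggregated result with the group labels as the index
counts = pd.DataFrame(
    {"n_unique": [2, 1]},
    index=pd.Index(["a", "b"], name="group"),
)

# drop=False (default): the old index becomes a regular column named "group"
kept = counts.reset_index()

# drop=True: the old index is discarded entirely
dropped = counts.reset_index(drop=True)

print(kept)
print(dropped)
```

Because inplace defaults to False, counts itself is left unchanged and each call returns a new DataFrame.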
Approach:
- Import libraries
- Make data
- Group Data
- Use aggregate function
- Reset Index
- Print Data
Example 1:

    # import pandas
    import pandas as pd

    # create dataframe
    df = pd.DataFrame({
        'Col_1': ['a', 'b', 'c', 'b', 'a', 'd'],
        'Col_2': [1, 2, 3, 3, 2, 1]})

    # print original dataframe
    print("original dataframe:")
    print(df)

    # group the rows by Col_1
    grouped = df.groupby("Col_1")

    # count the unique Col_2 values in each group
    result = grouped.agg({"Col_2": "nunique"})

    # turn the group labels back into a regular column
    result = result.reset_index()

    # print final dataframe
    print("final dataframe:")
    print(result)

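Output: the final dataframe has one row per distinct Col_1 value, where Col_2 holds the number of unique values in that group (a: 2, b: 2, c: 1, d: 1).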
Example 2:

    # import pandas
    import pandas as pd

    # create dataframe
    df = pd.DataFrame({
        'Col_1': ['a', 'b', 'c', 'b', 'a', 'd'],
        'Col_2': [1, 2, 3, 3, 2, 1]})

    # print original dataframe
    print("original dataframe:")
    print(df)

    # group the rows by Col_2
    grouped = df.groupby("Col_2")

    # count the unique Col_1 values in each group
    result = grouped.agg({"Col_1": "nunique"})

    # turn the group labels back into a regular column
    result = result.reset_index()

    # print final dataframe
    print("final dataframe:")
    print(result)

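Output: the final dataframe has one row per distinct Col_2 value, where Col_1 holds the number of unique values in that group (1: 2, 2: 2, 3: 2).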
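As a possible shortcut, the same counts can be obtained without a dict passed to agg(). The sketch below reuses the data from the examples and shows two variants: calling nunique() directly on the selected column, and named aggregation. The result column name unique_col_2 is a hypothetical choice, and named aggregation assumes pandas 0.25 or later.

```python
import pandas as pd

# Same data as in the examples above
df = pd.DataFrame({
    "Col_1": ["a", "b", "c", "b", "a", "d"],
    "Col_2": [1, 2, 3, 3, 2, 1],
})

# Select the column on the grouped object and call nunique() directly
counts = df.groupby("Col_1")["Col_2"].nunique().reset_index()

# Named aggregation: give the result column a descriptive name
# ("unique_col_2" is an illustrative name, not part of the original article)
named = df.groupby("Col_1").agg(unique_col_2=("Col_2", "nunique")).reset_index()

print(counts)
print(named)
```

Both variants return one row per Col_1 value; the second makes it clearer that the column now holds a count rather than the original values.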
Last Updated on October 23, 2021 by admin