How to Apply a Function to Single or Selected Columns or Rows in a Pandas DataFrame
Pandas is a powerful library in Python that is widely used for data manipulation and analysis. One of the common tasks in data analysis is applying a function to a subset of data, either a single column or row, or a selection of them. Pandas provides several ways to achieve this.
Syntax: Series.apply(func, convert_dtype=True, args=())
DataFrame.apply() accepts func and args as well, but takes an axis parameter instead of convert_dtype.
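Parameters: This method takes the following parameters:
func: The function to apply to each value of the pandas Series.
convert_dtype: If True (the default), try to find a better dtype for the results of the function; if False, leave the result as dtype object. This parameter is deprecated as of pandas 2.1.
args=(): Additional positional arguments passed to the function after the Series value.
Return Type: A pandas Series with the function applied (or a DataFrame, if the function itself returns a Series).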
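As a minimal sketch of how args works, the example below passes one extra positional argument to the function on every call. The data, the function name add_tax, and the rate value are hypothetical and only serve as illustration.

```python
import pandas as pd

# Hypothetical data: three prices.
prices = pd.Series([100, 250, 400])

def add_tax(price, rate):
    """Return the price increased by the given tax rate."""
    return price * (1 + rate)

# Each value in the Series is passed as `price`;
# the tuple in args supplies `rate` for every call.
with_tax = prices.apply(add_tax, args=(0.5,))
print(with_tax)
```

Note that args must be a tuple, so a single extra argument needs a trailing comma.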
Using apply() method
The apply() method is used to apply a function along an axis of the DataFrame. By default, it applies the function to each column of the DataFrame, passing each column to the function as a Series. For instance, let’s say we have a DataFrame with two columns, ‘A’ and ‘B’, and we want to compute the sum of each of these columns. We can define a function that takes a single column as input and returns its sum:
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
def sum_columns(column):
    # column is a Series holding one column (or one row when axis=1)
    return column.sum()
result = df.apply(sum_columns)
print(result)
This will output:
A     6
B    15
dtype: int64
To apply the function to each row instead of each column, pass axis=1:
result = df.apply(sum_columns, axis=1)
print(result)
This will output:
0    5
1    7
2    9
dtype: int64
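To make the effect of the axis parameter easier to compare, here is a small sketch that runs the same function in both directions. The helper value_range is a hypothetical name introduced for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

def value_range(values):
    """Return the spread (max minus min) of a Series."""
    return values.max() - values.min()

# axis=0 (the default): the function receives each column in turn,
# so the result has one entry per column.
per_column = df.apply(value_range)

# axis=1: the function receives each row in turn,
# so the result has one entry per row.
per_row = df.apply(value_range, axis=1)

print(per_column)
print(per_row)
```

In both cases the result is a Series: it is indexed by column name for axis=0 and by row label for axis=1.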
Using applymap() method
The applymap() method is used to apply a function to each element of the DataFrame. (In pandas 2.1 and later, applymap() is deprecated in favor of DataFrame.map(), which does the same thing.) For instance, let’s say we have a DataFrame with two columns, ‘A’ and ‘B’, and we want to compute the square of each element. We can define a function that takes a scalar value as input and returns its square:
def square(x):
    return x ** 2
result = df.applymap(square)
print(result)
This will output:
   A   B
0  1  16
1  4  25
2  9  36
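Because the element-wise method has a different name depending on the pandas version, a version-tolerant sketch could pick whichever one is available. This assumes that DataFrame.map() exists from pandas 2.1 onward and that older releases only provide applymap(); check this against your installed version.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

def square(x):
    """Return the square of a single scalar value."""
    return x ** 2

# Assumption: DataFrame.map is present on pandas 2.1+,
# while older versions only offer DataFrame.applymap.
if hasattr(pd.DataFrame, 'map'):
    result = df.map(square)
else:
    result = df.applymap(square)

print(result)
```

For simple arithmetic like this, the vectorized expression df ** 2 gives the same result without calling a Python function for every element.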
Using map() method
The map() method is a Series method used to replace each value in a column with another value. For instance, let’s say we have a DataFrame with a column ‘A’ that contains country codes, and we want to replace each country code with its full name. We can define a dictionary that maps each country code to its full name. Values that have no matching key in the dictionary become NaN:
df = pd.DataFrame({'A': ['US', 'UK', 'FR']})
country_codes = {'US': 'United States', 'UK': 'United Kingdom', 'FR': 'France'}
df['A'] = df['A'].map(country_codes)
print(df)
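One thing to watch with a dictionary lookup is what happens to values that are not in the dictionary. The sketch below uses a hypothetical code 'DE' that has no entry and shows one possible way to fall back to the original value; the column name country is also just an illustration.

```python
import pandas as pd

# 'DE' is deliberately missing from the dictionary below.
df = pd.DataFrame({'A': ['US', 'UK', 'DE']})
country_codes = {'US': 'United States', 'UK': 'United Kingdom', 'FR': 'France'}

# Codes without a dictionary entry are mapped to a missing value (NaN).
mapped = df['A'].map(country_codes)

# Fall back to the original code wherever the lookup found nothing.
df['country'] = mapped.fillna(df['A'])

print(df)
```

Whether to keep the original code, use a placeholder, or leave the value missing depends on what the downstream analysis expects.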
Applying a Function to Selected Columns
Sometimes, you may only want to apply a function to specific columns of a dataframe. You can do this by selecting the columns of interest and calling the apply() method on that selection.
import pandas as pd

# create a dataframe
data = {'name': ['Alice', 'Bob', 'Charlie'], 'age': [25, 30, 35], 'income': [50000, 60000, 70000]}
df = pd.DataFrame(data)

# define a function to add a bonus to the income column
def add_bonus(income):
    return income + 10000

# apply the function to the income column
df['income'] = df['income'].apply(add_bonus)

# print the updated dataframe
print(df)
Output:
      name  age  income
0    Alice   25   60000
1      Bob   30   70000
2  Charlie   35   80000
In this example, we defined a function called add_bonus() that adds a bonus of 10000 to the income column. We then passed only the income column to the apply() method and assigned the returned values to the same column.
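The same idea extends to several columns at once: select them with a list of names and call apply() on the resulting DataFrame. The sketch below rescales two numeric columns to the 0 to 1 range; the function min_max_scale and the variable names are hypothetical and only illustrate the pattern.

```python
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie'],
        'age': [25, 30, 35],
        'income': [50000, 60000, 70000]}
df = pd.DataFrame(data)

def min_max_scale(column):
    """Rescale a numeric column so its values run from 0 to 1."""
    return (column - column.min()) / (column.max() - column.min())

# Select only the numeric columns; the function receives each one as a Series.
numeric_cols = ['age', 'income']
scaled = df[numeric_cols].apply(min_max_scale)

print(scaled)
```

The name column is left out of the selection because the arithmetic in the function is not defined for strings.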
Applying a Function to Selected Rows
You can also apply a function to specific rows of a dataframe by using the loc[] indexer to select the rows of interest.
import pandas as pd

# create a dataframe
data = {'name': ['Alice', 'Bob', 'Charlie'], 'age': [25, 30, 35], 'income': [50000, 60000, 70000]}
df = pd.DataFrame(data)

# define a function to add a bonus to the income column
def add_bonus(income):
    return income + 10000

# apply the function to the income of the second row (index label 1)
df.loc[[1], 'income'] = df.loc[[1], 'income'].apply(add_bonus)

# print the updated dataframe
print(df)
Output:
      name  age  income
0    Alice   25   50000
1      Bob   30   70000
2  Charlie   35   70000
In this example, we applied the add_bonus() function only to the second row of the dataframe by using the loc[] indexer to select the row. The selection is limited to the income column, because calling apply() on the whole row would also pass the name 'Bob' to add_bonus() and raise a TypeError. We then assigned the returned value back to the same location using loc[].
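Rows can also be selected by a condition rather than by label. As a sketch under the assumption that only people aged 30 or over should receive the bonus (a rule made up for this illustration), a boolean mask can be passed to loc[]:

```python
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie'],
        'age': [25, 30, 35],
        'income': [50000, 60000, 70000]}
df = pd.DataFrame(data)

def add_bonus(income):
    """Return the income increased by a fixed bonus."""
    return income + 10000

# Boolean mask: True for the rows that should be changed.
mask = df['age'] >= 30

# Apply the function only to the income values of the selected rows
# and write the results back to the same rows.
df.loc[mask, 'income'] = df.loc[mask, 'income'].apply(add_bonus)

print(df)
```

Rows where the mask is False are left untouched, so Alice's income stays the same here.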
Apply custom function to a selected subset of columns
import pandas as pd

# Create a sample dataframe
df = pd.DataFrame({'Name': ['John', 'Mary', 'Mark', 'Emma'],
                   'Age': [28, 25, 32, 27],
                   'Salary': [50000, 60000, 75000, 45000],
                   'Bonus': [10000, 12000, 15000, 9000]})

# Define a custom function to add a percentage sign to values
def add_percent(val):
    return str(val) + '%'

# Apply the function to selected columns using the applymap method
df[['Salary', 'Bonus']] = df[['Salary', 'Bonus']].applymap(add_percent)

# Print the updated dataframe
print(df)
The example above creates a dataframe with columns for name, age, salary, and bonus. The custom function add_percent takes a value and returns a string with a percent sign appended. The function is applied to the ‘Salary’ and ‘Bonus’ columns using the applymap method. In the resulting dataframe, the values in those two columns are strings ending in a percent sign, while the other columns are unchanged.
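For a simple transformation like appending a character, an element-wise function is not strictly necessary. As an alternative sketch (not the approach used in the example above), the selected columns can be converted to strings and concatenated in one step; the variable names cols and formatted are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Mary', 'Mark', 'Emma'],
                   'Age': [28, 25, 32, 27],
                   'Salary': [50000, 60000, 75000, 45000],
                   'Bonus': [10000, 12000, 15000, 9000]})

cols = ['Salary', 'Bonus']

# Convert the selected columns to strings, then append '%' to every value.
formatted = df[cols].astype(str) + '%'

print(formatted)
```

This keeps the original dataframe intact; assign formatted back to df[cols] if the change should be stored in place.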
Apply a function to rows based on a conditional statement
import pandas as pd

# Create a sample dataframe
df = pd.DataFrame({'Name': ['John', 'Mary', 'Mark', 'Emma'],
                   'Age': [28, 25, 32, 27],
                   'Salary': [50000, 60000, 75000, 45000],
                   'Bonus': [10000, 12000, 15000, 9000]})

# Define a custom function to add a bonus to employees under age 30
def add_bonus(row):
    if row['Age'] < 30:
        row['Bonus'] += 5000
    return row

# Apply the function to rows using the apply method
df = df.apply(add_bonus, axis=1)

# Print the updated dataframe
print(df)
The example above creates a dataframe with columns for name, age, salary, and bonus. The custom function add_bonus takes a row and adds 5000 to the bonus of employees who are under 30 years old. The function is applied to each row using the apply method with axis=1. The resulting dataframe shows the updated bonus values for employees under 30.
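When the condition and the update are both simple, the same result can usually be reached without a row-wise function. The following sketch is an alternative to the apply() version above: it selects the matching rows with a boolean mask and updates only the Bonus column.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Mary', 'Mark', 'Emma'],
                   'Age': [28, 25, 32, 27],
                   'Salary': [50000, 60000, 75000, 45000],
                   'Bonus': [10000, 12000, 15000, 9000]})

# Select the rows where Age is under 30 and add 5000 to their Bonus.
df.loc[df['Age'] < 30, 'Bonus'] += 5000

print(df)
```

Mark is 32, so his bonus is unchanged, while the other three employees each gain 5000. On large dataframes this form tends to be faster than calling a Python function once per row.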
Last Updated on May 11, 2023 by admin