Apply a function to single or selected columns or rows in a Pandas DataFrame




Pandas is a powerful Python library widely used for data manipulation and analysis. A common task in data analysis is applying a function to a subset of the data: a single column or row, or a selection of them. Pandas provides several ways to do this, chiefly the apply(), applymap() and map() methods.

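Syntax:
Series.apply(func, convert_dtype=True, args=(), **kwargs)
DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs)

Parameters: These methods take the following main parameters:
func: The function to apply. Series.apply() calls it once per value; DataFrame.apply() calls it once per column (or once per row with axis=1), passing that column or row as a Series.
axis: (DataFrame.apply() only) 0 or 'index' applies func to each column; 1 or 'columns' applies it to each row.
convert_dtype: (Series.apply() only) Try to find a better dtype for the results; if False, leave them as dtype=object. Deprecated since pandas 2.1.
args=(): A tuple of extra positional arguments passed to func after the value, column or row.
**kwargs: Extra keyword arguments passed to func.

Return Type: A Series or a DataFrame, depending on the object apply() is called on and on what func returns.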
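As a quick illustration of how args and extra keyword arguments reach the function, here is a minimal sketch; the add_fee function and the prices data are hypothetical:

```python
import pandas as pd

# Hypothetical data: three prices
prices = pd.Series([100, 250, 80])

# A function that needs one extra argument besides the value itself
def add_fee(value, fee):
    return value + fee

# args=(5,) is unpacked after each value, i.e. add_fee(value, 5)
with_fee = prices.apply(add_fee, args=(5,))

# Extra keyword arguments are forwarded the same way, i.e. add_fee(value, fee=5)
with_fee_kw = prices.apply(add_fee, fee=5)

print(with_fee.tolist())     # [105, 255, 85]
print(with_fee_kw.tolist())  # [105, 255, 85]
```

Both forms are equivalent here; keyword arguments tend to read more clearly when the function takes several extra parameters.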

Using apply() method

The apply() method applies a function along an axis of the DataFrame. By default (axis=0) it calls the function once for each column, passing that column as a Series. For instance, say we have a DataFrame with two columns, ‘A’ and ‘B’, and we want the sum of each column. We can define a function that takes a column (a Series) and returns its sum:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# sum_columns receives one column at a time, as a Series
def sum_columns(col):
    return col.sum()

result = df.apply(sum_columns)

print(result)

This will output:

A     6
B    15
dtype: int64
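To see exactly what apply() hands to the function on each call, a small diagnostic sketch can report the type, label and contents of each piece; the describe function is hypothetical and exists only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Report the type, label and contents of whatever apply() passes in
def describe(part):
    return f"{type(part).__name__} {part.name}: {part.tolist()}"

by_column = df.apply(describe)          # one call per column (axis=0)
by_row = df.apply(describe, axis=1)     # one call per row (axis=1)

print(by_column['A'])  # Series A: [1, 2, 3]
print(by_row[0])       # Series 0: [1, 4]
```

In both directions the function receives a Series; with axis=1 its name is the row's index label rather than a column name.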

By default, the apply() method applies the function to each column. To apply it to each row instead, pass axis=1, so the same function now returns row sums:

result = df.apply(sum_columns, axis=1)

print(result)

This will output:

0    5
1    7
2    9
dtype: int64
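A common use of axis=1 is building a new column from several values in the same row. The sketch below does this with a lambda and, for comparison, with plain column arithmetic; the column names total and total_vectorized are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Row-wise: the lambda receives each row as a Series and reads it by column name
df['total'] = df.apply(lambda row: row['A'] + row['B'], axis=1)

# Vectorized equivalent: add the two columns directly, no per-row Python call
df['total_vectorized'] = df['A'] + df['B']

print(df)
```

For simple arithmetic like this the vectorized form is typically much faster, so row-wise apply() is best kept for logic that cannot be expressed with column operations.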

Using applymap() method

The applymap() method applies a function to each individual element of the DataFrame. In pandas 2.1 and later, applymap() is deprecated in favor of DataFrame.map(), which works the same way. For instance, to square every element of the DataFrame with columns ‘A’ and ‘B’, we can define a function that takes a scalar value and returns its square:

def square(x):
    return x ** 2

result = df.applymap(square)

print(result)

This will output:

   A   B
0  1  16
1  4  25
2  9  36
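Because applymap() raises a FutureWarning on pandas 2.1+ while DataFrame.map() does not exist on older versions, code that must run on both can choose whichever method is available. A minimal sketch, assuming only that the installed pandas falls into one of those two cases:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

def square(x):
    return x ** 2

# DataFrame.map is the pandas >= 2.1 name for element-wise application;
# fall back to applymap on older versions that lack it
if hasattr(pd.DataFrame, 'map'):
    result = df.map(square)
else:
    result = df.applymap(square)

print(result)
```

For plain arithmetic, df ** 2 gives the same numbers without calling a Python function per element.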

Using map() method

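The Series.map() method substitutes each value in a Series (such as a single DataFrame column) using a dictionary, another Series, or a function. Values with no matching key become NaN. For instance, suppose we have a DataFrame with a column ‘A’ that contains country codes, and we want to replace each country code with its full name. We can define a dictionary that maps each country code to its full name:

# a separate DataFrame: the df used above holds numbers, not country codes
codes_df = pd.DataFrame({'A': ['US', 'FR', 'UK']})

country_codes = {'US': 'United States', 'UK': 'United Kingdom', 'FR': 'France'}
codes_df['A'] = codes_df['A'].map(country_codes)

print(codes_df)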
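The NaN behavior for unmatched keys is easy to miss. This sketch shows it with a hypothetical code 'DE' that the dictionary does not cover, one way to keep the original value instead (fillna with the source Series, an illustrative choice), and map() with a function:

```python
import pandas as pd

codes = pd.Series(['US', 'FR', 'DE'])
country_codes = {'US': 'United States', 'UK': 'United Kingdom', 'FR': 'France'}

# 'DE' has no entry in the dictionary, so map() returns NaN for it
mapped = codes.map(country_codes)

# Keep the original code wherever the dictionary has no match
# (illustrative choice; a placeholder such as 'Unknown' would also work)
full_names = codes.map(country_codes).fillna(codes)

# map() also accepts a function, which is called on every value
lowercase = codes.map(str.lower)

print(mapped, full_names, lowercase, sep='\n\n')
```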


Applying a Function to Selected Columns

Sometimes, you may want to apply a function to specific columns of a DataFrame only. You can do this by selecting the columns of interest and calling apply() on the selection.

import pandas as pd

# create a dataframe
data = {'name': ['Alice', 'Bob', 'Charlie'],
        'age': [25, 30, 35],
        'income': [50000, 60000, 70000]}

df = pd.DataFrame(data)

# define a function to add a bonus to the income column
def add_bonus(income):
    return income + 10000

# apply the function to the income column
df['income'] = df['income'].apply(add_bonus)

# print the updated dataframe
print(df)

Output:

      name  age  income
0    Alice   25   60000
1      Bob   30   70000
2  Charlie   35   80000

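In this example, we defined a function called add_bonus() that adds 10000 to a single income value. We then called apply() on the income column only, which runs add_bonus() once per value, and assigned the returned Series back to the same column. For simple arithmetic like this, df['income'] + 10000 gives the same result without apply().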
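The same pattern works for several columns at once: select them with a list and call apply(), which then calls the function once per selected column. A minimal sketch using min-max scaling as the (hypothetical) column function:

```python
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie'],
        'age': [25, 30, 35],
        'income': [50000, 60000, 70000]}
df = pd.DataFrame(data)

# Scale a numeric column to the 0-1 range (min-max scaling)
def min_max(col):
    return (col - col.min()) / (col.max() - col.min())

numeric_cols = ['age', 'income']
# apply() on the two-column selection calls min_max once per column;
# 'name' is untouched because it is not part of the selection
df[numeric_cols] = df[numeric_cols].apply(min_max)

print(df)
```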

Applying a Function to Selected Rows

You can also apply a function to specific rows of a DataFrame by using the .loc indexer to select the rows (and, usually, the columns) of interest.

import pandas as pd

# create a dataframe
data = {'name': ['Alice', 'Bob', 'Charlie'],
        'age': [25, 30, 35],
        'income': [50000, 60000, 70000]}

df = pd.DataFrame(data)

# define a function to add a bonus to the income column
def add_bonus(income):
    return income + 10000

# apply the function to the income value of the second row (index label 1) only
df.loc[[1], 'income'] = df.loc[[1], 'income'].apply(add_bonus)

# print the updated dataframe
print(df)

Output:

      name  age  income
0    Alice   25   50000
1      Bob   30   70000
2  Charlie   35   70000

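In this example, we used the .loc indexer to select only the income value of the second row (index label 1), applied add_bonus() to it, and assigned the result back to the same location. Selecting the income column matters: calling apply() on the whole row, as in df.loc[1].apply(add_bonus), would also pass the string 'Bob' to add_bonus() and raise a TypeError.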
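Rows are often chosen by a condition rather than a fixed label. A boolean mask works with .loc in the same way; in this sketch the age threshold of 30 is an arbitrary example:

```python
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie'],
        'age': [25, 30, 35],
        'income': [50000, 60000, 70000]}
df = pd.DataFrame(data)

def add_bonus(income):
    return income + 10000

# Select rows with a boolean mask instead of a fixed index label
mask = df['age'] >= 30
# Apply add_bonus only to the income values of the matching rows
df.loc[mask, 'income'] = df.loc[mask, 'income'].apply(add_bonus)

print(df)
```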

Applying a Custom Function to a Selected Subset of Columns

import pandas as pd

# Create a sample dataframe
df = pd.DataFrame({'Name': ['John', 'Mary', 'Mark', 'Emma'],
                   'Age': [28, 25, 32, 27],
                   'Salary': [50000, 60000, 75000, 45000],
                   'Bonus': [10000, 12000, 15000, 9000]})

# Define a custom function to add a percentage sign to values
def add_percent(val):
    return str(val) + '%'

# Apply the function to each element of the selected columns with applymap
# (on pandas 2.1+, DataFrame.map does the same without a deprecation warning)
df[['Salary', 'Bonus']] = df[['Salary', 'Bonus']].applymap(add_percent)

# Print the updated dataframe
print(df)

The example above creates a DataFrame with columns for name, age, salary, and bonus. The custom function add_percent takes a value and returns it as a string with a percent sign appended. The function is applied element-wise to the ‘Salary’ and ‘Bonus’ columns with applymap(), so those two columns now hold strings such as '50000%' while ‘Name’ and ‘Age’ are unchanged.
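For simple string formatting, a vectorized alternative avoids calling add_percent once per element: convert the selected columns to strings and concatenate the suffix across the whole selection. A sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Mary', 'Mark', 'Emma'],
                   'Age': [28, 25, 32, 27],
                   'Salary': [50000, 60000, 75000, 45000],
                   'Bonus': [10000, 12000, 15000, 9000]})

cols = ['Salary', 'Bonus']
# Convert the selected columns to strings, then append '%' to every element;
# the string concatenation is broadcast over the whole two-column selection
df[cols] = df[cols].astype(str) + '%'

print(df)
```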

Applying a Function to Rows Based on a Conditional Statement

import pandas as pd

# Create a sample dataframe
df = pd.DataFrame({'Name': ['John', 'Mary', 'Mark', 'Emma'],
                   'Age': [28, 25, 32, 27],
                   'Salary': [50000, 60000, 75000, 45000],
                   'Bonus': [10000, 12000, 15000, 9000]})

# Define a custom function to add a bonus to employees under age 30
def add_bonus(row):
    # Work on a copy: pandas advises against mutating the object passed to apply()
    row = row.copy()
    if row['Age'] < 30:
        row['Bonus'] += 5000
    return row

# Apply the function to rows using the apply method
df = df.apply(add_bonus, axis=1)

# Print the updated dataframe
print(df)

The example above creates a DataFrame with columns for name, age, salary, and bonus. The custom function add_bonus takes a row and adds 5000 to the bonus of employees who are under 30. Passing axis=1 makes apply() call the function once per row, and the returned rows are combined into a new DataFrame in which John, Mary and Emma have higher bonuses.
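The same conditional update can also be written without apply(), by building a boolean mask and updating only the matching cells through .loc. This is usually simpler and faster for a single-column condition; a sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Mary', 'Mark', 'Emma'],
                   'Age': [28, 25, 32, 27],
                   'Salary': [50000, 60000, 75000, 45000],
                   'Bonus': [10000, 12000, 15000, 9000]})

# Boolean mask of employees under 30
under_30 = df['Age'] < 30
# Add 5000 to Bonus only where the mask is True; other rows keep their value
df.loc[under_30, 'Bonus'] += 5000

print(df)
```

Row-wise apply() remains useful when the rule depends on several columns in ways that are awkward to express as masks.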

Last Updated on May 11, 2023 by admin
