Pandas DataFrame.isin



When working with Pandas DataFrames, there are often cases where we need to check if a value is present in a column or not. One way to achieve this is to use the isin() method of a DataFrame, which returns a Boolean mask indicating whether each element in the DataFrame is contained in a set of values.

isin() Method

The isin() method checks whether each element of a DataFrame (or of a single column, which is a Series) is contained in a given set of values. It does not filter anything by itself. Instead, it returns a Boolean mask of the same shape, with True where the element is in the set and False otherwise, and you can then use that mask to select or drop rows.

Here is a basic example that checks the values of a single column:

		import pandas as pd

		# create a sample DataFrame
		df = pd.DataFrame({
		    'Name': ['John', 'Mary', 'Peter', 'Jane'],
		    'Age': [25, 18, 32, 27],
		    'Gender': ['Male', 'Female', 'Male', 'Female']
		})

		# Boolean mask: True where Age is either 18 or 27
		filter_age = df['Age'].isin([18, 27])
		print(filter_age)
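
Because isin() returns a mask rather than a filtered DataFrame, the usual next step is boolean indexing. The sketch below reuses the sample data above; the names mask, selected and remaining are only illustrative.

```python
import pandas as pd

# same sample data as the example above
df = pd.DataFrame({
    'Name': ['John', 'Mary', 'Peter', 'Jane'],
    'Age': [25, 18, 32, 27],
    'Gender': ['Male', 'Female', 'Male', 'Female']
})

# Boolean mask: True where Age is 18 or 27
mask = df['Age'].isin([18, 27])

# keep only the rows where the mask is True
selected = df[mask]
print(selected)

# invert the mask with ~ to drop those rows instead
remaining = df[~mask]
print(remaining)
```

Here selected holds the Mary and Jane rows and remaining holds the John and Peter rows.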
	

Using isin() with Multiple Columns

The isin() method can also check several columns at once. To do this, pass a dictionary whose keys are column names and whose values are the values to look for in each column. The result is still an element-wise Boolean DataFrame, and any column not named in the dictionary is False in every row. To get a single True/False per row that says whether the row matches all of the conditions, select the checked columns and reduce them with .all(axis=1).

		# build a row mask: Age is 18 or 27 AND Gender is Female
		conditions = {'Age': [18, 27], 'Gender': ['Female']}
		# df.isin() checks every column; columns missing from the dict
		# ('Name' here) come back all False, so only reduce over the
		# columns that were actually checked
		filter_age_gender = df.isin(conditions)[list(conditions)].all(axis=1)
		print(filter_age_gender)


Output

Each example above prints a Boolean Series with one value per row, indicating whether that row matches the specified conditions. The output for the first example will be:

		0    False
		1     True
		2    False
		3     True
		Name: Age, dtype: bool

The output for the second example will be:

		0    False
		1     True
		2    False
		3     True
		dtype: bool

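Checking All Columns Against a List

Passing a plain list instead of a dictionary checks every element of the DataFrame against the same set of values:

		import pandas as pd

		data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
		        'age': [25, 30, 35, 40],
		        'city': ['New York', 'Paris', 'London', 'Tokyo']}

		df = pd.DataFrame(data)

		# check every element against one list of values
		print(df.isin(['Bob', 30, 'London']))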


The output of the code above is a DataFrame with the same shape as the original, where each cell is True if the corresponding element appears in the input list. For example, the first row is False in every column, because 'Alice', 25 and 'New York' are all absent from the list. The second row is True for name and age (since 'Bob' and 30 are in the list) but False for city, and the third row is True only for city ('London').
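
To turn this element-wise result into a row filter, one option is to reduce each row with any(axis=1), which keeps rows where at least one cell matched. The following is a short sketch on the same data; matches and row_mask are illustrative names.

```python
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
        'age': [25, 30, 35, 40],
        'city': ['New York', 'Paris', 'London', 'Tokyo']}
df = pd.DataFrame(data)

# element-wise check: a Boolean DataFrame with the same shape as df
matches = df.isin(['Bob', 30, 'London'])

# collapse each row to one Boolean: True if any cell in the row matched
row_mask = matches.any(axis=1)

# rows that contain at least one of the values
print(df[row_mask])
```

With this data only the Bob and Charlie rows are kept. Using all(axis=1) instead would require every cell in a row to match.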

Using a Dictionary to Check Multiple Columns

The isin() method can also be used with a dictionary to check multiple columns at once. The keys of the dictionary represent the column names, and the values represent the list of values to check for each column. Here’s an example:


		import pandas as pd

		data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
		        'age': [25, 30, 35, 40],
		        'city': ['New York', 'Paris', 'London', 'Tokyo'],
		        'state': ['NY', 'PA', 'UK', 'JP']}

		df = pd.DataFrame(data)

		# check each column named in the dictionary against its own list
		print(df.isin({'name': ['Alice', 'Bob'],
		               'state': ['NY', 'PA']}))


The output of the code above is a DataFrame of the same shape as the original. Columns named in the dictionary are checked against their own lists, while columns not in the dictionary (age and city here) are False throughout. For example, the first row is True for name, because 'Alice' is in the list of names, and also True for state, because 'NY' is in the list of states. The third and fourth rows are False for both columns, since 'Charlie', 'David', 'UK' and 'JP' do not appear in the lists.
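
Because columns left out of the dictionary are all False, calling all(axis=1) on the full result would reject every row. The sketch below restricts the reduction to the checked columns. The names conditions, naive and row_mask are illustrative.

```python
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
        'age': [25, 30, 35, 40],
        'city': ['New York', 'Paris', 'London', 'Tokyo'],
        'state': ['NY', 'PA', 'UK', 'JP']}
df = pd.DataFrame(data)

conditions = {'name': ['Alice', 'Bob'], 'state': ['NY', 'PA']}
mask = df.isin(conditions)

# naive reduction over every column: 'age' and 'city' are all False,
# so no row passes
naive = mask.all(axis=1)

# reduce only over the columns named in the dictionary
row_mask = mask[list(conditions)].all(axis=1)
print(df[row_mask])
```

Here naive is False for every row, while row_mask keeps the Alice and Bob rows.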


Last Updated on May 16, 2023 by admin
