When working with pandas DataFrames, we often need to check whether a value is present in a column. One way to do this is the isin() method, available on both Series and DataFrame, which returns a Boolean mask indicating whether each element is contained in a set of values.
isin() Method
The isin() method tests each element of a Series or DataFrame for membership in a given set of values. It does not filter anything by itself: it returns a Boolean mask with the same shape as the input, and indexing the DataFrame with that mask is what selects the matching rows.
The following example applies isin() to a single column:
    import pandas as pd

    # create a sample DataFrame
    df = pd.DataFrame({
        'Name': ['John', 'Mary', 'Peter', 'Jane'],
        'Age': [25, 18, 32, 27],
        'Gender': ['Male', 'Female', 'Male', 'Female']
    })

    # filter rows where Age is either 18 or 27
    filter_age = df['Age'].isin([18, 27])
    print(filter_age)
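Since isin() only produces a mask, a short sketch of how that mask is typically used may help. The variable names mask, matching and not_matching are illustrative and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Mary', 'Peter', 'Jane'],
    'Age': [25, 18, 32, 27],
    'Gender': ['Male', 'Female', 'Male', 'Female'],
})

# Boolean mask: True where Age is 18 or 27
mask = df['Age'].isin([18, 27])

# Index the DataFrame with the mask to keep only the matching rows
matching = df[mask]

# Invert the mask with ~ to keep the rows that do NOT match
not_matching = df[~mask]

print(matching)
print(not_matching)
```

Here matching holds the rows for Mary and Jane, and not_matching holds the rows for John and Peter.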
Using isin() with Multiple Columns
The isin() method can also check several columns at once. To do this, call isin() on the DataFrame and pass a dictionary whose keys are column names and whose values are the lists of values to look for in each column. The result is a Boolean DataFrame, not a per-row mask, and any column that is not named in the dictionary is False throughout. To get one Boolean per row, select only the columns being checked and reduce the result with .all(axis=1).

    # filter rows where Age is either 18 or 27 and Gender is Female
    filter_age_gender = df[['Age', 'Gender']].isin({'Age': [18, 27], 'Gender': ['Female']}).all(axis=1)
    print(filter_age_gender)

Output
The first example prints a Boolean Series with one entry per row, True where Age is 18 or 27:

    0    False
    1     True
    2    False
    3     True
    Name: Age, dtype: bool

The second example prints a Boolean Series that is True only for rows that satisfy both conditions:

    0    False
    1     True
    2    False
    3     True
    dtype: bool
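A multi-column condition can also be written by building one mask per column and combining them with the & operator. The sketch below reuses the sample data; the names age_ok, gender_ok and row_mask are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Mary', 'Peter', 'Jane'],
    'Age': [25, 18, 32, 27],
    'Gender': ['Male', 'Female', 'Male', 'Female'],
})

# One Boolean Series per condition
age_ok = df['Age'].isin([18, 27])
gender_ok = df['Gender'].isin(['Female'])

# Element-wise AND: True only where both conditions hold
row_mask = age_ok & gender_ok

print(df[row_mask])
```

This form makes it easy to mix conditions, for example replacing & with | when a row should match either condition rather than both.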
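isin() can also be called on a whole DataFrame with a plain list, in which case every element of every column is tested against that list:

    import pandas as pd

    data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
            'age': [25, 30, 35, 40],
            'city': ['New York', 'Paris', 'London', 'Tokyo']}
    df = pd.DataFrame(data)

    # check if values are in a list
    print(df.isin(['Bob', 30, 'London']))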
The output of the code above is a DataFrame with the same shape as the original DataFrame, but with Boolean values indicating whether each element in the original DataFrame is contained in the input list. For example, the first row in the output DataFrame is False in every column, because neither the name ‘Alice’, the age 25, nor the city ‘New York’ is in the input list. The second row has True for the columns name and age, because ‘Bob’ and 30 are in the input list, and False for the column city, because ‘Paris’ is not.
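An element-wise mask like this can be reduced to one value per row. As a minimal sketch, .any(axis=1) marks the rows in which at least one column holds a listed value; the names element_mask and rows_with_match are illustrative.

```python
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
        'age': [25, 30, 35, 40],
        'city': ['New York', 'Paris', 'London', 'Tokyo']}
df = pd.DataFrame(data)

# Element-wise membership test against a single list
element_mask = df.isin(['Bob', 30, 'London'])

# True for rows where at least one column contains a listed value
rows_with_match = element_mask.any(axis=1)

print(df[rows_with_match])
```

With this data, the rows for Bob and Charlie are selected: Bob matches on name and age, and Charlie matches on city.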
Using a Dictionary to Check Multiple Columns
The isin() method can also be used with a dictionary to check multiple columns at once. The keys of the dictionary represent the column names, and the values represent the list of values to check for each column. Here’s an example:
    data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
            'age': [25, 30, 35, 40],
            'city': ['New York', 'Paris', 'London', 'Tokyo'],
            'state': ['NY', 'PA', 'UK', 'JP']}
    df = pd.DataFrame(data)

    # check if values are in a dictionary
    print(df.isin({'name': ['Alice', 'Bob'], 'state': ['NY', 'PA']}))
The output of the code above is a DataFrame with Boolean values indicating whether each element in the specified columns is contained in the corresponding list. For example, the first row in the output DataFrame has True for the column name because the name ‘Alice’ is in the list of names to check, and True for the column state because the state ‘NY’ is in the list of states to check. The columns age and city do not appear in the dictionary, so they are False in every row.
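Because columns that are missing from the dictionary come back as all False, a per-row result should be computed only over the columns that were actually checked. The following sketch shows one way to do that; the names conditions, element_mask and row_mask are illustrative.

```python
import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
        'age': [25, 30, 35, 40],
        'city': ['New York', 'Paris', 'London', 'Tokyo'],
        'state': ['NY', 'PA', 'UK', 'JP']}
df = pd.DataFrame(data)

conditions = {'name': ['Alice', 'Bob'], 'state': ['NY', 'PA']}
element_mask = df.isin(conditions)

# Columns not named in the dictionary contain no True values
print(element_mask['age'].any())

# Reduce over the checked columns only, so the all-False
# columns do not force every row to False
row_mask = element_mask[list(conditions)].all(axis=1)

print(df[row_mask])
```

Calling .all(axis=1) on element_mask directly would return False for every row here, since age and city are never True.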
Last Updated on May 16, 2023 by admin