Pandas: Extracting Rows Using .loc[]



Pandas is a popular data manipulation library in Python. It provides many useful methods to extract, filter, and manipulate data in a DataFrame. In this article, we will discuss how to extract rows from a DataFrame using the .loc[] method.

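What Is the .loc[] Method?

The .loc[] method (strictly speaking an indexer, which is why it is used with square brackets rather than parentheses) selects rows and columns from a DataFrame based on their labels. It accepts up to two selectors separated by a comma; the column selector is optional: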

  1. Label of the row(s) to be selected.
  2. Label of the column(s) to be selected.

Syntax of the .loc[] method:

df.loc[row_label(s), column_label(s)]
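
As a minimal sketch of this syntax, the snippet below selects by row label alone, by row and column label together, and by lists of labels. The DataFrame, its column names, and its values are made up purely for illustration.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the syntax
df = pd.DataFrame(
    {"name": ["Alice", "Bob", "Charlie"], "age": [25, 30, 35]},
    index=["a", "b", "c"],
)

# Row label only: returns the whole row as a Series
row_b = df.loc["b"]

# Row label and column label: returns a single value
age_b = df.loc["b", "age"]

# Lists of labels: returns a smaller DataFrame
subset = df.loc[["a", "c"], ["name"]]

print(row_b)
print(age_b)
print(subset)
```

Passing a list, even a one-element list, keeps the result as a DataFrame, while passing a single label reduces it by one dimension.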

Extracting Rows with .loc[]

To extract rows from a DataFrame using the .loc[] method, we need to specify the label(s) of the row(s) we want to select. The label(s) can be either a single value or a list of values. We can also use slicing to select a range of labels; unlike ordinary Python slices, a .loc[] slice includes both the start and the end label.

Let’s consider the following example:

import pandas as pd

# Create a DataFrame
data = {'Name': ['John', 'Sarah', 'David', 'Lily', 'Bob'],
        'Age': [28, 25, 22, 31, 19],
        'Gender': ['M', 'F', 'M', 'F', 'M'],
        'Country': ['USA', 'Canada', 'Australia', 'USA', 'Canada']}
df = pd.DataFrame(data)

# Print the DataFrame
print(df)

Output:

    Name  Age Gender    Country
0   John   28      M        USA
1  Sarah   25      F     Canada
2  David   22      M  Australia
3   Lily   31      F        USA
4    Bob   19      M     Canada

Now, let’s extract some rows using the .loc[] method:

# Select a single row using a label
row = df.loc[2]
print(row)

# Select multiple rows using a list of labels
rows = df.loc[[1, 3]]
print(rows)

# Select a range of rows using slicing
rows = df.loc[1:3]
print(rows)

Output:

Name           David
Age               22
Gender             M
Country    Australia
Name: 2, dtype: object
    Name  Age Gender Country
1  Sarah   25      F  Canada
3   Lily   31      F     USA
    Name  Age Gender    Country
1  Sarah   25      F     Canada
2  David   22      M  Australia
3   Lily   31      F        USA

Extracting Rows Using Slicing

Slicing with the .loc[] method works on any index labels, not only the default integers. You specify the range with the syntax start:end, and both the start and the end label are included in the result.

import pandas as pd

# Create a sample DataFrame with a custom index
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dave'],
    'age': [25, 30, 35, 40],
}, index=['a', 'b', 'c', 'd'])

# Use slicing to extract a range of rows
result = df.loc['b':'d']

# Display the result
print(result)

Output:

      name  age
b      Bob   30
c  Charlie   35
d     Dave   40
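
To make the inclusive end label concrete, here is a short sketch that compares the label-based slice with a position-based slice on the same data. The comparison with .iloc[] is an addition for illustration; .iloc[] selects by integer position and follows normal Python slicing rules.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Alice", "Bob", "Charlie", "Dave"], "age": [25, 30, 35, 40]},
    index=["a", "b", "c", "d"],
)

# Label-based slice: both 'b' and 'd' are included
by_label = df.loc["b":"d"]

# Position-based slice: the end position 3 is excluded, as in a Python list
by_position = df.iloc[1:3]

print(list(by_label.index))     # ['b', 'c', 'd']
print(list(by_position.index))  # ['b', 'c']
```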

Extracting Rows with Multiple Conditions

Sometimes, you may need to extract rows from a DataFrame based on multiple conditions. For example, you may want to extract all rows where the “score” column is greater than 90 and the “grade” column is “A”.

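You can do this using the ampersand (&) operator to combine the conditions within the .loc[] method. Each condition must be wrapped in its own parentheses, because & binds more tightly than comparison operators such as > and ==. For example: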

# Extract rows where score is greater than 90 and grade is "A"

df.loc[(df['score'] > 90) & (df['grade'] == 'A')]

This will return a DataFrame with all rows where the score is greater than 90 and the grade is “A”.
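
The snippet above assumes a DataFrame that already has "score" and "grade" columns. A self-contained sketch might look like the following; the student data is hypothetical, and the | (or) and ~ (not) operators are shown alongside & for comparison.

```python
import pandas as pd

# Hypothetical scores, used only for illustration
df = pd.DataFrame({
    "student": ["Ann", "Ben", "Cara", "Dan"],
    "score": [95, 88, 92, 97],
    "grade": ["A", "B", "A", "B"],
})

# AND: both conditions must hold; each condition has its own parentheses
high_a = df.loc[(df["score"] > 90) & (df["grade"] == "A")]

# OR: at least one condition must hold
high_or_a = df.loc[(df["score"] > 90) | (df["grade"] == "A")]

# NOT: ~ inverts a condition
not_a = df.loc[~(df["grade"] == "A")]

print(high_a)
print(high_or_a)
print(not_a)
```

Python's and, or, and not keywords do not work element-wise on a Series, which is why the &, |, and ~ operators are used here.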

Extracting Rows with Partial String Matches

You can also use the .loc[] method to extract rows based on partial string matches. For example, if you have a column of names and you want to extract all rows where the name contains the string “John”, you can do this as follows:

# Extract rows where name contains "John"

df.loc[df['name'].str.contains('John')]

This will return a DataFrame with all rows where the “name” column contains the string “John”.
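
As a rough sketch with made-up names, the example below also shows two optional parameters of str.contains that often matter in practice: na=False, which treats missing values as non-matches so the mask stays purely boolean, and case=False, which ignores letter case.

```python
import pandas as pd

# Hypothetical names, including one missing value
df = pd.DataFrame({
    "name": ["John Smith", "Johnny Ray", "Alice Jones", None, "john doe"],
})

# na=False counts the missing name as "no match" instead of NaN
matches = df.loc[df["name"].str.contains("John", na=False)]

# case=False additionally matches "john doe"
matches_any_case = df.loc[
    df["name"].str.contains("John", case=False, na=False)
]

print(matches)
print(matches_any_case)
```

Note that str.contains interprets the pattern as a regular expression by default; passing regex=False makes it a literal substring search.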

Extracting Rows with Regular Expressions

If you need even more advanced string matching capabilities, you can use regular expressions with the .loc[] method. For example, if you have a column of phone numbers and you want to extract all rows where the phone number is in the format “(123) 456-7890”, you can do this as follows:

# Extract rows where phone number matches pattern

df.loc[df['phone'].str.match(r'\(\d{3}\) \d{3}-\d{4}')]

This will return a DataFrame with all rows where the “phone” column matches the specified pattern.
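
A small sketch with invented phone numbers shows how such a pattern behaves. It assumes the pattern r'\(\d{3}\) \d{3}-\d{4}' for the "(123) 456-7890" format. str.match only requires the pattern to match at the start of the string, so the example also uses str.fullmatch (available in pandas 1.1 and later), which requires the whole string to match.

```python
import pandas as pd

# Hypothetical phone numbers in a mix of formats
df = pd.DataFrame({
    "phone": [
        "(123) 456-7890",
        "123-456-7890",
        "(555) 010-9999 ext. 2",
        "(987) 654-3210",
    ],
})

# Literal parentheses are escaped; \d{3} means exactly three digits
pattern = r"\(\d{3}\) \d{3}-\d{4}"

# str.match: the pattern must match at the start of the string
starts_with = df.loc[df["phone"].str.match(pattern)]

# str.fullmatch: the pattern must match the entire string
exact = df.loc[df["phone"].str.fullmatch(pattern)]

print(starts_with)
print(exact)
```

Here the row with the trailing "ext. 2" is kept by str.match but dropped by str.fullmatch.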

Extracting Rows Using Boolean Indexing

Boolean indexing is a powerful technique for filtering rows in a Pandas DataFrame based on a condition. It involves creating a boolean mask that specifies the rows that meet the condition, and then passing this mask to the .loc[] method. Here’s an example:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dave'],
    'age': [25, 30, 35, 40],
    'gender': ['F', 'M', 'M', 'M']
})

# Create a boolean mask based on a condition
mask = df['age'] >= 30

# Use the boolean mask to extract the relevant rows
result = df.loc[mask]

# Display the result
print(result)

Output:

      name  age gender
1      Bob   30      M
2  Charlie   35      M
3     Dave   40      M
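
A boolean mask can also be combined with a column selector, and it does not have to come from a comparison. The sketch below reuses the same sample data; the use of isin() to build a mask from a list of allowed values is an addition for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Dave"],
    "age": [25, 30, 35, 40],
    "gender": ["F", "M", "M", "M"],
})

mask = df["age"] >= 30

# Rows where the mask is True, restricted to the "name" column
names = df.loc[mask, "name"]

# isin() builds a mask from a list of allowed values
selected = df.loc[df["name"].isin(["Alice", "Dave"])]

print(names)
print(selected)
```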

Conclusion

The .loc[] method is a powerful tool for extracting rows from a Pandas DataFrame. It selects rows by label, by list of labels, or by label slice, and it accepts boolean masks built from comparisons, combined conditions, partial string matches, and regular expressions, so the same syntax covers most row-filtering tasks.

Last Updated on May 16, 2023 by admin
