How to select multiple columns in a pandas dataframe



When working with data in Python using the pandas library, you often need to select specific columns from a DataFrame. pandas provides several ways to select multiple columns: by name, by position, by data type, by condition, or by a pattern in the column names. This article walks through each technique.

Selecting Columns by Name

If you know the names of the columns you want to select, you can use the square bracket notation to specify them as a list:

import pandas as pd

# Create a sample dataframe
data = {'Name': ['John', 'Emma', 'David'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Paris', 'London'],
        'Salary': [50000, 60000, 70000]}

df = pd.DataFrame(data)

# Select multiple columns by name
selected_columns = df[['Name', 'Age', 'Salary']]

The selected_columns variable will now contain a new dataframe with only the specified columns.
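As a quick check of what the bracket notation returns, the sketch below rebuilds the sample dataframe and inspects the result. The variable names subset, ages, and reordered are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'David'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London'],
    'Salary': [50000, 60000, 70000],
})

# A list of names (double brackets) returns a DataFrame
subset = df[['Name', 'Age', 'Salary']]
print(type(subset).__name__)    # DataFrame
print(list(subset.columns))     # ['Name', 'Age', 'Salary']

# A single name (single brackets) returns a Series instead
ages = df['Age']
print(type(ages).__name__)      # Series

# Column order follows the list, not the original dataframe
reordered = df[['Salary', 'Name']]
print(list(reordered.columns))  # ['Salary', 'Name']
```

Requesting a name that does not exist in the dataframe raises a KeyError, so it can help to check df.columns first.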

Selecting Columns by Index

If you prefer to select columns based on their zero-based index position, you can use the iloc indexer:

# Select multiple columns by index
selected_columns = df.iloc[:, [0, 2, 3]]

In this example, :, [0, 2, 3] selects all rows and the columns at index positions 0, 2, and 3.
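The following sketch shows which columns those positions map to in the sample dataframe, plus two related conveniences: negative positions and looking up the position of a label. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'David'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London'],
    'Salary': [50000, 60000, 70000],
})

# Positions are zero-based: 0 -> Name, 2 -> City, 3 -> Salary
by_position = df.iloc[:, [0, 2, 3]]
print(list(by_position.columns))  # ['Name', 'City', 'Salary']

# Negative positions count from the end
last_col = df.iloc[:, [-1]]
print(list(last_col.columns))     # ['Salary']

# Find the position of a known label
pos = df.columns.get_loc('City')
print(pos)                        # 2
```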

Selecting Columns by Data Type

If you want to select columns based on their data type, you can use the select_dtypes method:

# Select columns by data type
selected_columns = df.select_dtypes(include=['object'])

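This code snippet selects all columns with the data type ‘object’, which is how pandas has traditionally stored string columns. Depending on the pandas version, string columns may be reported with a dedicated string dtype instead, so it is worth checking df.dtypes before relying on ‘object’.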
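A variation that does not depend on how strings are stored is to select or exclude numeric columns. This is a minimal sketch using the sample dataframe; the variable names numeric and non_numeric are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'David'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London'],
    'Salary': [50000, 60000, 70000],
})

# Inspect the dtypes first; the dtype shown for string columns
# can differ between pandas versions
print(df.dtypes)

# Keep only numeric columns
numeric = df.select_dtypes(include='number')
print(list(numeric.columns))      # ['Age', 'Salary']

# Keep everything that is not numeric
non_numeric = df.select_dtypes(exclude='number')
print(list(non_numeric.columns))  # ['Name', 'City']
```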

Selecting Columns by Condition

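You can also select columns based on a condition. The boolean mask passed as the column indexer must contain one True/False value per column, so the condition has to be computed per column rather than per row:

# Select numeric columns whose maximum value exceeds 1000
numeric = df.select_dtypes(include='number')
selected_columns = numeric.loc[:, numeric.max() > 1000]

This code keeps only the numeric columns whose largest value is greater than 1000, which here is ‘Salary’. A row-wise condition such as df['Age'] > 30 has one value per row, so it belongs in the row position of loc (df.loc[df['Age'] > 30]) and filters rows, not columns.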
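A column mask needs one boolean per column, while a condition on a column's values produces one boolean per row. The sketch below contrasts the two on the sample dataframe; the thresholds and variable names are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'David'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London'],
    'Salary': [50000, 60000, 70000],
})

# Column mask: one boolean per numeric column
numeric = df.select_dtypes(include='number')
mask = numeric.max() > 1000
high_value = numeric.loc[:, mask]
print(list(high_value.columns))   # ['Salary']

# Column mask over every column: keep columns with no missing values
complete = df.loc[:, df.notna().all()]
print(list(complete.columns))     # ['Name', 'Age', 'City', 'Salary']

# Row mask: a condition on 'Age' filters rows, and the
# column list picks which columns to keep for those rows
older = df.loc[df['Age'] > 30, ['Name', 'Age']]
print(older['Name'].tolist())     # ['David']
```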

Selecting Columns by Regular Expression

If you have a large number of columns and want to select them based on a pattern or regular expression, you can use the filter method:

# Select columns using regular expression
selected_columns = df.filter(regex='^S')

This code snippet selects all columns that start with the letter ‘S’.
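The filter method also accepts like (substring match) and items (explicit labels) in place of regex. A short sketch on the sample dataframe, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'David'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London'],
    'Salary': [50000, 60000, 70000],
})

# regex: column names starting with 'S'
starts_with_s = df.filter(regex='^S')
print(list(starts_with_s.columns))  # ['Salary']

# like: column names containing the lowercase substring 'a'
contains_a = df.filter(like='a')
print(list(contains_a.columns))     # ['Name', 'Salary']

# items: explicit labels; labels that do not exist are skipped
known_only = df.filter(items=['Name', 'Missing'])
print(list(known_only.columns))     # ['Name']
```

Only one of items, like, and regex can be passed in a single call.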

Selecting Columns by Column Labels

You can use the loc indexer to select columns by their labels, which also lets you choose rows in the same expression:

# Select columns by column labels
selected_columns = df.loc[:, ['Name', 'City']]

This code selects all rows and the columns with labels ‘Name’ and ‘City’.
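Besides a list of labels, loc accepts a label slice, and unlike positional slicing it includes both endpoints. A minimal sketch on the sample dataframe, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'David'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London'],
    'Salary': [50000, 60000, 70000],
})

# List of labels
name_city = df.loc[:, ['Name', 'City']]
print(list(name_city.columns))     # ['Name', 'City']

# Label slice: both 'Name' and 'City' are included
name_to_city = df.loc[:, 'Name':'City']
print(list(name_to_city.columns))  # ['Name', 'Age', 'City']

# A single label (not in a list) returns a Series
city = df.loc[:, 'City']
print(type(city).__name__)         # Series
```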

Selecting Columns by Position Range

If you want to select a contiguous range of columns based on their position, you can pass a slice (start:stop) to iloc:

# Select columns by position range
selected_columns = df.iloc[:, 1:3]

In this example, :, 1:3 selects all rows and the columns at positions 1 and 2 (‘Age’ and ‘City’); the stop position 3 is excluded.
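Positional slices follow ordinary Python slicing rules, so open-ended, negative, and stepped slices work as well. A brief sketch on the sample dataframe, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'David'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London'],
    'Salary': [50000, 60000, 70000],
})

# Positions 1 and 2; the stop position 3 is excluded
middle = df.iloc[:, 1:3]
print(list(middle.columns))       # ['Age', 'City']

# Last two columns
last_two = df.iloc[:, -2:]
print(list(last_two.columns))     # ['City', 'Salary']

# Every second column, starting at position 0
every_other = df.iloc[:, ::2]
print(list(every_other.columns))  # ['Name', 'City']
```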

Selecting Columns by Callable Function

If you need to select columns based on a custom condition, you can apply a function to each column name with df.columns.map and pass the resulting boolean mask to loc:

# Define a custom condition
def custom_condition(column_name):
    return column_name.startswith('S')

# Select columns using a callable function
selected_columns = df.loc[:, df.columns.map(custom_condition)]

This code selects columns where the column name starts with the letter ‘S’ by applying the custom_condition function to each column name.
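The loc indexer can also take a callable directly: pandas calls it with the dataframe and uses the returned mask as the indexer. The sketch below shows that form alongside a plain list comprehension; both are alternatives to the approach above, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'David'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London'],
    'Salary': [50000, 60000, 70000],
})

# Callable passed to loc: receives the dataframe, returns a boolean mask
via_callable = df.loc[:, lambda d: d.columns.str.startswith('S')]
print(list(via_callable.columns))  # ['Salary']

# Equivalent selection with a list comprehension over the column names
wanted = [c for c in df.columns if c.startswith('S')]
via_list = df[wanted]
print(list(via_list.columns))      # ['Salary']
```

The callable form is convenient in method chains, where the intermediate dataframe has no variable name to refer to.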

Selecting Columns by Regular Expression Match

If you want to select columns based on a regular expression match, you can use the filter method with a regular expression pattern:

# Select columns using regular expression match
selected_columns = df.filter(regex='e$')

This code selects all columns that end with the letter ‘e’.
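Anchors and inline flags change what a pattern matches. The sketch below tries a few patterns on the sample dataframe; the patterns themselves are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'David'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London'],
    'Salary': [50000, 60000, 70000],
})

# '$' anchors the match at the end of the column name
ends_with_e = df.filter(regex='e$')
print(list(ends_with_e.columns))   # ['Name', 'Age']

# '^...$' with alternation matches whole names only
exact = df.filter(regex='^(Name|City)$')
print(list(exact.columns))         # ['Name', 'City']

# The inline flag (?i) makes the match case-insensitive
starts_with_a = df.filter(regex='(?i)^a')
print(list(starts_with_a.columns)) # ['Age']
```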

Last Updated on May 18, 2023 by admin
