Split a column in Pandas dataframe and get part of it




When only part of a column's values matters, for example the text before a delimiter, we can split the column and keep just the part we need.

We can use the Pandas .str accessor, which provides vectorized string operations on a Series (a single DataFrame column). One of its methods is str.split, which splits each string on a delimiter and returns a Series of lists. To get the nth part of each string, first split the column by the delimiter and then index the result through .str again with str[n-1], i.e. Dataframe.columnName.str.split(" ").str[n-1]. Indexing is zero-based, so str[0] is the first part.
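The split-then-index pattern described above can be sketched as a small function. This is a minimal illustration, not part of pandas: the helper name `nth_part` and the sample Series `ids` are hypothetical, and the last sample value deliberately has no delimiter to show what happens when the requested part does not exist.

```python
import pandas as pd


def nth_part(series, delimiter, n):
    """Return the n-th (1-based) piece of each string in `series`.

    Hypothetical helper for illustration; not part of pandas.
    """
    # .str.split gives a Series of lists; .str[n - 1] picks one item per list
    return series.str.split(delimiter).str[n - 1]


# Sample data (assumed); the last value contains no underscore
ids = pd.Series(['Geek1_id', 'Geek2_id', 'Geek3'])

first = nth_part(ids, '_', 1)
second = nth_part(ids, '_', 2)

print(first.tolist())
print(second.tolist())
```

For rows where the requested part is missing, as in the last value here, the result is a missing value (NaN) rather than an error, so it may be worth checking for missing values after extracting.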

 

Let’s make it clear by examples.

Code #1: Print the Series holding the first part of the split column.

import pandas as pd
import numpy as np
df = pd.DataFrame({'Geek_ID': ['Geek1_id', 'Geek2_id', 'Geek3_id',
                               'Geek4_id', 'Geek5_id'],
                   'Geek_A': [1, 1, 3, 2, 4],
                   'Geek_B': [1, 2, 3, 4, 6],
                   'Geek_R': np.random.randn(5)})

#     Geek_ID  Geek_A  Geek_B         Geek_R
# 0  Geek1_id       1       1  random number
# 1  Geek2_id       1       2  random number
# 2  Geek3_id       3       3  random number
# 3  Geek4_id       2       4  random number
# 4  Geek5_id       4       6  random number

# Split each ID on '_' and keep the first part
print(df.Geek_ID.str.split('_').str[0])

Output:

0    Geek1
1    Geek2
2    Geek3
3    Geek4
4    Geek5
Name: Geek_ID, dtype: object

Code #2: Print the first parts as a Python list.

import pandas as pd
import numpy as np
df = pd.DataFrame({'Geek_ID':['Geek1_id', 'Geek2_id', 'Geek3_id',
                                         'Geek4_id', 'Geek5_id'],
                'Geek_A': [1, 1, 3, 2, 4],
                'Geek_B': [1, 2, 3, 4, 6],
                'Geek_R': np.random.randn(5)})
 
#     Geek_ID  Geek_A  Geek_B         Geek_R
# 0  Geek1_id       1       1  random number
# 1  Geek2_id       1       2  random number
# 2  Geek3_id       3       3  random number
# 3  Geek4_id       2       4  random number
# 4  Geek5_id       4       6  random number
 
print(df.Geek_ID.str.split('_').str[0].tolist())

Output:

['Geek1', 'Geek2', 'Geek3', 'Geek4', 'Geek5']

Code #3: Print the second parts as a Python list.

import pandas as pd
import numpy as np
 
df = pd.DataFrame({'Geek_ID':['Geek1_id', 'Geek2_id', 'Geek3_id',
                                         'Geek4_id', 'Geek5_id'],
                'Geek_A': [1, 1, 3, 2, 4],
                'Geek_B': [1, 2, 3, 4, 6],
                'Geek_R': np.random.randn(5)})
 
#     Geek_ID  Geek_A  Geek_B         Geek_R
# 0  Geek1_id       1       1  random number
# 1  Geek2_id       1       2  random number
# 2  Geek3_id       3       3  random number
# 3  Geek4_id       2       4  random number
# 4  Geek5_id       4       6  random number
 
print(df.Geek_ID.str.split('_').str[1].tolist())

Output:

['id', 'id', 'id', 'id', 'id']
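The examples above only print the extracted part. To keep it alongside the original data, one option is to assign it to a new column. The sketch below assumes a cut-down version of the same DataFrame; the column name `Geek_Name` and the variable `parts` are hypothetical. It also shows the `expand=True` option of str.split, which returns the pieces as separate columns instead of lists.

```python
import pandas as pd

df = pd.DataFrame({'Geek_ID': ['Geek1_id', 'Geek2_id', 'Geek3_id',
                               'Geek4_id', 'Geek5_id'],
                   'Geek_A': [1, 1, 3, 2, 4]})

# Store the part before the underscore in a new column
# ('Geek_Name' is a hypothetical column name)
df['Geek_Name'] = df['Geek_ID'].str.split('_').str[0]

# expand=True returns a DataFrame with one column per piece,
# labelled 0, 1, ... by position
parts = df['Geek_ID'].str.split('_', expand=True)

print(df)
print(parts)
```

Indexing with str[0] is convenient when a single part is needed; expand=True may be the better fit when several parts should become columns at once.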

Last Updated on October 18, 2021 by admin
