Python – Pandas DataFrame.astype()



In this article, we will explore the DataFrame.astype() method in the Python pandas library. The astype() method is used to change the data type of one or more columns in a DataFrame. It allows us to convert the data to a desired data type, such as converting numeric data to strings or strings to integers. Let’s dive into the details.

Syntax

The syntax of the astype() method is as follows:

dataframe.astype(dtype, copy=True, errors='raise')

Parameters

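  • dtype: The data type to convert to. Pass a single type (for example int, 'float64', or 'category') to convert every column, or a dictionary mapping column names to types to convert only selected columns.
  • copy: Optional. Specifies whether to return a copy of the data. The default value is True. Use copy=False with care, because changes to the result may then propagate to the original object.
  • errors: Optional. Specifies how to handle errors. The default value is 'raise', which raises an exception when the conversion fails. The only other option is 'ignore', which suppresses the exception and returns the original object unchanged. Note that astype() has no 'coerce' option; to replace invalid values with NaN, use pd.to_numeric() with errors='coerce'.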
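
As a minimal sketch of the two forms the dtype argument can take, the snippet below converts a small DataFrame once with a single type and once with a dictionary. The column names "a" and "b" are made up for illustration.

```python
import pandas as pd

# Hypothetical data: both columns hold numbers stored as strings.
df = pd.DataFrame({"a": ["1", "2"], "b": ["3.5", "4.5"]})

# A single dtype is applied to every column.
all_float = df.astype("float64")

# A dictionary converts only the listed columns; "b" is left untouched.
mixed = df.astype({"a": "int64"})

print(all_float.dtypes)
print(mixed.dtypes)
```

The dictionary form is usually the safer choice when a DataFrame mixes text and numeric columns, since a single dtype would be applied to all of them.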

Example 1: Converting Numeric Data to Strings

Let’s say we have a DataFrame with a column containing numeric data, and we want to convert it to strings:

import pandas as pd

# Create a DataFrame
data = {'Number': [10, 20, 30, 40]}
df = pd.DataFrame(data)

# Convert the 'Number' column to strings
df['Number'] = df['Number'].astype(str)

print(df)

The output will be:

  Number
0     10
1     20
2     30
3     40
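
The printed table looks the same whether the column holds numbers or strings, so it can help to inspect the values directly. The following is a small sketch of one way to check, using the same data as above; the zero-padding step is only an illustration of a string-only operation.

```python
import pandas as pd

df = pd.DataFrame({"Number": [10, 20, 30, 40]})
converted = df["Number"].astype(str)

# The values are now Python strings rather than integers.
print(converted.tolist())

# String methods such as zero-padding are available after the conversion.
padded = converted.str.zfill(4)
print(padded.tolist())
```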

Example 2: Converting Strings to Integers

Now, let’s assume we have a DataFrame with a column containing strings representing numbers, and we want to convert them to integers:

import pandas as pd

# Create a DataFrame
data = {'Number': ['10', '20', '30', '40']}
df = pd.DataFrame(data)

# Convert the 'Number' column to integers
df['Number'] = df['Number'].astype(int)

print(df)

The output will be:

   Number
0      10
1      20
2      30
3      40
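
To confirm that the conversion really produced integers, a quick check like the one sketched below can be used. It passes 'int64' instead of int so that the resulting width does not depend on the platform; that choice is an assumption of this sketch, not something the example above requires.

```python
import pandas as pd

df = pd.DataFrame({"Number": ["10", "20", "30", "40"]})
numbers = df["Number"].astype("int64")

# The column now has an integer dtype, so arithmetic works as expected.
print(pd.api.types.is_integer_dtype(numbers))
total = numbers.sum()
print(total)
```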

Example 3: Handling Errors

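The astype() method does not support errors='coerce'; its errors parameter accepts only 'raise' and 'ignore'. When a column contains values that cannot be parsed as numbers, astype(int) raises a ValueError. To replace the invalid values with NaN instead, use pd.to_numeric() with errors='coerce':

import pandas as pd

# Create a DataFrame
data = {'Number': ['10', '20', '30', '40', 'abc']}
df = pd.DataFrame(data)

# Convert the 'Number' column to numbers, replacing invalid values with NaN
df['Number'] = pd.to_numeric(df['Number'], errors='coerce')

print(df)

The output will be:

   Number
0    10.0
1    20.0
2    30.0
3    40.0
4     NaN

In this example, errors='coerce' turns the invalid value “abc” into NaN. Because NaN is a floating-point value, the resulting column has the float64 data type rather than an integer type.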
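
The sketch below shows one possible way to handle invalid values end to end: it first demonstrates that astype() raises on a value it cannot parse, then coerces with pd.to_numeric(), and finally converts to the nullable 'Int64' type so the valid entries stay integers. Using 'Int64' here is a suggestion of this sketch, not part of the example above.

```python
import pandas as pd

s = pd.Series(["10", "20", "abc"], name="Number")

# astype() raises because "abc" cannot be parsed as an integer.
try:
    s.astype("int64")
except ValueError as exc:
    print(f"astype failed: {exc}")

# pd.to_numeric() replaces unparseable values with NaN (float64 result).
numeric = pd.to_numeric(s, errors="coerce")

# The nullable integer dtype keeps whole numbers and marks gaps as <NA>.
nullable = numeric.astype("Int64")
print(nullable)
```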

Example 4: Converting Data Types for Multiple Columns

The `astype()` method can convert several columns at once when you pass a dictionary that maps column names to data types. The dictionary values must be data types, not functions such as str.upper, so changing the case of the strings is a separate step that uses the .str accessor. Let’s consider a scenario where we have a DataFrame with multiple columns, and we want to convert two columns to different data types:

import pandas as pd

# Create a DataFrame
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': ['25', '30', '35']}
df = pd.DataFrame(data)

# Convert the 'Age' column to integers and the 'Name' column to strings
df = df.astype({'Age': int, 'Name': str})

# Uppercase the names in a separate step
df['Name'] = df['Name'].str.upper()

print(df)

The output will be:

    Name  Age
0   JOHN   25
1  ALICE   30
2    BOB   35
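
When a dictionary is passed, every key has to be an existing column name. The sketch below uses a deliberately misspelled, hypothetical key ("Agee") to show that pandas raises a KeyError in that case rather than silently skipping it.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["John", "Alice"], "Age": ["25", "30"]})

failed = False
try:
    # "Agee" is a deliberate typo; no such column exists.
    df.astype({"Agee": "int64"})
except KeyError as exc:
    failed = True
    print(f"Unknown column in dtype mapping: {exc}")
```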

Example 5: Converting to Categorical Data Type

The `astype()` method can also be used to convert a column to the categorical data type. The categorical data type is useful when a column holds a small set of repeated values, and it can reduce memory usage because each distinct value is stored only once. Let’s consider an example:

import pandas as pd

# Create a DataFrame
data = {'Category': ['A', 'B', 'A', 'C']}
df = pd.DataFrame(data)

# Convert 'Category' column to categorical data type
df['Category'] = df['Category'].astype('category')

print(df.dtypes)

The output will be:

Category    category
dtype: object

In this example, the Category column is converted to the categorical data type.
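
To make the memory benefit concrete, the following sketch compares the memory footprint of a column before and after conversion. The data is artificial (the same labels repeated 1000 times), and the exact byte counts will vary by pandas version and platform.

```python
import pandas as pd

# Artificial data: three labels repeated many times.
labels = pd.Series(["A", "B", "A", "C"] * 1000)
as_category = labels.astype("category")

# deep=True includes the memory used by the string values themselves.
plain_bytes = labels.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
print(plain_bytes, category_bytes)

# The distinct values are stored once as categories.
print(list(as_category.cat.categories))
```

The saving grows with the number of repeated values; for a column where almost every value is unique, the categorical type offers little or no benefit.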

Example 6: Converting to DateTime Data Type

The `astype()` method can also be used to convert a column of date strings to the DateTime data type by passing 'datetime64[ns]'. This is useful when working with date and time values. For strings in non-standard formats, pd.to_datetime() offers more control, for example through its format argument. Let’s consider an example:

import pandas as pd

# Create a DataFrame
data = {'Date': ['2023-01-01', '2023-02-01', '2023-03-01']}
df = pd.DataFrame(data)

# Convert 'Date' column to DateTime data type
df['Date'] = df['Date'].astype('datetime64[ns]')

print(df.dtypes)

The output will be:

Date    datetime64[ns]
dtype: object
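
Both astype('datetime64[ns]') and pd.to_datetime() can parse ISO-formatted date strings. The sketch below runs the two side by side on the same data and checks that they agree; the explicit format string is an assumption chosen to match these sample dates.

```python
import pandas as pd

dates = pd.Series(["2023-01-01", "2023-02-01", "2023-03-01"])

# Route 1: astype() with an explicit datetime dtype.
via_astype = dates.astype("datetime64[ns]")

# Route 2: pd.to_datetime() with an explicit format string.
via_to_datetime = pd.to_datetime(dates, format="%Y-%m-%d")

# Once converted, the .dt accessor exposes date components.
print(via_astype.dt.month.tolist())
print((via_astype == via_to_datetime).all())
```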

Last Updated on May 18, 2023 by admin