10 Tips and Tricks for Pandas

Ten quick and useful tips and tricks for Pandas

DZ
4 min readAug 15, 2023
Photo by Ddou Dou from Pexels

Explode

The explode method transforms each element in a list-like into a new row. While doing so, it also replicates the indexes.

import pandas as pd

df = pd.DataFrame(
{"Continent": ["Europe", "America"],
"Countries": [["Britain", "France", "Spain", "Italy"], ["US", "Canda", "Mexico"]]}
)

In the “Countries” column, the elements are lists with names of countries. We can use the explode method to make a new row for each country.

df_explode = df.explode('Countries')

We can go back to the original data frame by using:

df_reversed = df_explode.reset_index().groupby(['index', 'Continent']).agg(list).reset_index(level=1)

Clean Data Using Regular Expression

Suppose we have a column of values of purchases, but some of them are integers, others are float, and also there are some strings with non-numeric characters like “$” or “,”.

df = pd.DataFrame(
{"Name": ["Alice", "Bob", "Charlie", "Dona"],
"Purchase": [7543.23, 8000, '$9500.0', '€10,000']}
)

We want a clean column of float numbers. We can do that using

df['Purchase_clean'] = df['Purchase'].replace('[^0-9\.]', '', regex=True).astype(float)

The code search for non-numeric characters (that means not digits and not “.”) and remove them by replacing them with an empty string. Finally, it converts the whole series into float.

Use Numpy r_ for Multiple Slices

Suppose you want to get columns [1,2,3,4,7,8,9,10]. There is no one range that returns exactly these numbers. Instead of asking for each of the numbers individually, we can use Numpy r_.

df.iloc[:, np.r_[1:5, 7:11]]. # return data frame with columns 1,2,3,4,7,8,9,10

Making Plots Direct from the Data Frame

You can use the plot method directly from a data frame object:

df = pd.DataFrame({'x': [1,2,3,4,5], 'y': [2, 4 , 6, 8, 10]})
df.plot(x='x', y='y', kind='line')

Add Prefix and Suffix to Column names

You can use the methods add_prefix and add_suffix to add prefix or suffix to the names of the columns (or rows).

df = pd.DataFrame({'x': [1,2,3,4,5], 'y': [2, 4 , 6, 8, 10]})
df.add_prefix('position_')

Select Columns Based on Type

Pandas let you select columns by type. For example, you can ask for all the numeric columns.

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
'B': [True, False] * 3,
'C': ['a', 'b', 'c', 'd', 'e', 'f']})

df.select_dtypes(include='number') # return column A
df.select_dtypes(include='bool') # return column B
df.select_dtypes(exclude='object') # return columns A and b

Map Values into New Values

The map method can replace values with new predefined values.

df = pd.DataFrame({'Color': ['red', 'green', 'blue'], 
'Type': [1, 2, 3]})
df['Color'] = df['Color'].map({'red': 'R', 'green': 'G', 'blue': 'B'})

Encode Objects as an Enumerated Type

The method factorize can obtain a numeric representation for objects.

df = pd.DataFrame({'Name': ["Santa's Little Helper", "Snow Ball I", "Snow Ball II", "Stampy"], 
'Animal': ['Dog', 'Cat', 'Cat', 'Elephant']})
df['Animal_code'] = df['Animal'].factorize()[0]

Read Table from the Web

We can use read_html to get a table from the web. This method returns a list of data frames with tables it has found. We can direct it using a regular expression with the keyword match.

df_list = pd.read_html('https://en.wikipedia.org/wiki/Harry_Potter_and_the_Chamber_of_Secrets', 
match='Harry Potter and the Chamber of Secrets')

Filter Data Frame According to Values

We can filter a data frame by checking if a certain entry is contained in a list of values.

df = pd.DataFrame({'Animal': ['Lion', 'Tuna', 'Dolphin', 'Frog', 'Dove'],
'Type': ['Mammal', 'Fish', 'Mammal', 'Amphibian', 'Bird']})
df[df['Type'].isin(['Mammal'])]

--

--

No responses yet