Handling Missing Data in Pandas by Jaume Boguñá
Dive into Python
Missing data occurs when no value is recorded for one or more fields of a record. pandas shows such entries as NaN or None.
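The slides only show the printed tables, not how they are built. A minimal sketch, assuming the sample data is typed in by hand with `np.nan` for the missing number and `None` for the missing strings (the construction itself is an assumption, not the author's code):

```python
import numpy as np
import pandas as pd

# Hypothetical construction of the sample table; the source only shows its printed form.
df = pd.DataFrame({
    "Name": ["Jaume", "Paula", "David", "Berta"],
    "Age": [25, np.nan, 43, 17],  # a NaN in a numeric column makes it float64
    "City": ["Madrid", "Valencia", None, "Sevilla"],
    "Profession": ["Engineer", "Doctor", "Teacher", None],
})

print(df)
print(df.dtypes)

# Both markers are recognised as missing by pandas.
print(pd.isna(np.nan), pd.isna(None))
```

Whichever marker is used on input, `isnull()`, `dropna()` and `fillna()` treat it the same way.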
df
    Name   Age      City  Profession
0  Jaume  25.0    Madrid    Engineer
1  Paula   NaN  Valencia      Doctor
2  David  43.0      None     Teacher
3  Berta  17.0   Sevilla        None

df.isnull()
    Name    Age   City  Profession
0  False  False  False       False
1  False   True  False       False
2  False  False   True       False
3  False  False  False        True
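The boolean table returned by `isnull()` is usually summarised rather than read cell by cell. A short sketch, assuming the same sample table (rebuilt here by hand), that counts missing values per column and per row:

```python
import numpy as np
import pandas as pd

# Sample table rebuilt by hand (assumed construction).
df = pd.DataFrame({
    "Name": ["Jaume", "Paula", "David", "Berta"],
    "Age": [25, np.nan, 43, 17],
    "City": ["Madrid", "Valencia", None, "Sevilla"],
    "Profession": ["Engineer", "Doctor", "Teacher", None],
})

# True counts as 1, so summing the mask counts the missing cells.
missing_per_column = df.isnull().sum()
missing_per_row = df.isnull().sum(axis=1)

# A single flag: is anything missing anywhere in the table?
any_missing = bool(df.isnull().any().any())

print(missing_per_column)
print(missing_per_row)
print(any_missing)
```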
df.notnull()
    Name    Age   City  Profession
0   True   True   True        True
1   True  False   True        True
2   True   True  False        True
3   True   True   True       False
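`notnull()` is the inverse of `isnull()`, which makes it handy as a row filter. A small sketch on the same (hand-built, assumed) table that keeps only the people whose age is known:

```python
import numpy as np
import pandas as pd

# Sample table rebuilt by hand (assumed construction).
df = pd.DataFrame({
    "Name": ["Jaume", "Paula", "David", "Berta"],
    "Age": [25, np.nan, 43, 17],
    "City": ["Madrid", "Valencia", None, "Sevilla"],
    "Profession": ["Engineer", "Doctor", "Teacher", None],
})

# Boolean mask: True where Age is present.
has_age = df["Age"].notnull()

# Keep only the rows where the mask is True.
with_age = df[has_age]

# Number of non-missing values per column; df.count() reports the same figures.
non_missing_counts = df.notnull().sum()

print(with_age)
print(non_missing_counts)
```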
df.dropna()
    Name   Age    City  Profession
0  Jaume  25.0  Madrid    Engineer
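By default `dropna()` removes every row with at least one missing value, which here leaves a single row. The `how` and `thresh` parameters relax that rule. A sketch on the same assumed table (the threshold of 3 is an arbitrary choice for illustration):

```python
import numpy as np
import pandas as pd

# Sample table rebuilt by hand (assumed construction).
df = pd.DataFrame({
    "Name": ["Jaume", "Paula", "David", "Berta"],
    "Age": [25, np.nan, 43, 17],
    "City": ["Madrid", "Valencia", None, "Sevilla"],
    "Profession": ["Engineer", "Doctor", "Teacher", None],
})

# Default: drop a row if ANY value is missing.
strict = df.dropna()

# Drop a row only if ALL of its values are missing.
only_empty_rows = df.dropna(how="all")

# Keep rows that have at least 3 non-missing values (illustrative threshold).
mostly_complete = df.dropna(thresh=3)

print(len(strict), len(only_empty_rows), len(mostly_complete))

# dropna() returns a new DataFrame; the original is left untouched.
print(len(df))
```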
df.dropna(axis=1)
    Name
0  Jaume
1  Paula
2  David
3  Berta
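Dropping every column that has a single gap can discard a lot of data. One possible alternative, sketched below, is to drop only the columns whose share of missing values exceeds a limit. The name `max_missing_share` and its value are hypothetical, chosen only for this example:

```python
import numpy as np
import pandas as pd

# Sample table rebuilt by hand (assumed construction).
df = pd.DataFrame({
    "Name": ["Jaume", "Paula", "David", "Berta"],
    "Age": [25, np.nan, 43, 17],
    "City": ["Madrid", "Valencia", None, "Sevilla"],
    "Profession": ["Engineer", "Doctor", "Teacher", None],
})

# axis=1 (or axis="columns") drops columns instead of rows.
complete_columns = df.dropna(axis="columns")

# Fraction of missing values in each column (mean of the boolean mask).
missing_share = df.isnull().mean()

# Hypothetical limit: keep columns with at most 25% missing values.
max_missing_share = 0.25
kept = df.loc[:, missing_share <= max_missing_share]

print(list(complete_columns.columns))
print(missing_share)
print(list(kept.columns))
```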
df.dropna(subset=['Age', 'Profession'])
    Name   Age    City  Profession
0  Jaume  25.0  Madrid    Engineer
2  David  43.0    None     Teacher
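With `subset`, only the listed columns decide whether a row is dropped, which is why David stays even though his city is missing. A sketch on the same assumed table showing how `subset` combines with `how`:

```python
import numpy as np
import pandas as pd

# Sample table rebuilt by hand (assumed construction).
df = pd.DataFrame({
    "Name": ["Jaume", "Paula", "David", "Berta"],
    "Age": [25, np.nan, 43, 17],
    "City": ["Madrid", "Valencia", None, "Sevilla"],
    "Profession": ["Engineer", "Doctor", "Teacher", None],
})

# Drop rows where Age OR Profession is missing (default how="any").
either_missing = df.dropna(subset=["Age", "Profession"])

# Drop rows only where Age AND Profession are both missing.
both_missing = df.dropna(subset=["Age", "Profession"], how="all")

# A single column works too: drop rows without a city.
with_city = df.dropna(subset=["City"])

print(either_missing.index.tolist())
print(both_missing.index.tolist())
print(with_city.index.tolist())
```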
df
       Maths  Science  French
Joan     8.0      9.0     NaN
Nadia    7.0      NaN     8.0
Elsa     NaN      6.0     5.0
Mario    6.0      7.0     7.0
df.fillna(value=round(df.mean(),1))
       Maths  Science  French
Joan     8.0      9.0     6.7
Nadia    7.0      7.3     8.0
Elsa     7.0      6.0     5.0
Mario    6.0      7.0     7.0
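`df.mean()` returns one value per column, and `fillna()` matches those values to columns by name, so each gap is filled with the mean of its own subject. A sketch that rebuilds the grades table by hand (an assumed construction) and also shows the median as one possible alternative statistic:

```python
import numpy as np
import pandas as pd

# Grades table rebuilt by hand (assumed construction).
df = pd.DataFrame(
    {
        "Maths": [8, 7, np.nan, 6],
        "Science": [9, np.nan, 6, 7],
        "French": [np.nan, 8, 5, 7],
    },
    index=["Joan", "Nadia", "Elsa", "Mario"],
)

# Column means; NaN values are skipped by default.
column_means = df.mean().round(1)

# Each missing grade is replaced by the mean of its own column.
filled_mean = df.fillna(column_means)

# Same idea with the median, which is less sensitive to extreme grades.
filled_median = df.fillna(df.median())

print(column_means)
print(filled_mean)
print(filled_median)
```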
df.replace(to_replace=np.nan, value=0)
       Maths  Science  French
Joan     8.0      9.0     0.0
Nadia    7.0      0.0     8.0
Elsa     0.0      6.0     5.0
Mario    6.0      7.0     7.0
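For a plain constant, `replace(np.nan, 0)` and `fillna(0)` give the same table. The sketch below checks that on a hand-built copy of the grades table (assumed construction) and shows one side effect worth keeping in mind: a filled-in 0 counts as a real grade and pulls the averages down.

```python
import numpy as np
import pandas as pd

# Grades table rebuilt by hand (assumed construction).
df = pd.DataFrame(
    {
        "Maths": [8, 7, np.nan, 6],
        "Science": [9, np.nan, 6, 7],
        "French": [np.nan, 8, 5, 7],
    },
    index=["Joan", "Nadia", "Elsa", "Mario"],
)

replaced = df.replace(to_replace=np.nan, value=0)
filled = df.fillna(0)

# Both calls produce the same result for a scalar fill value.
print(replaced.equals(filled))

# The Maths mean ignores the gap before filling, but includes the 0 afterwards.
maths_mean_before = df["Maths"].mean()
maths_mean_after = filled["Maths"].mean()
print(maths_mean_before, maths_mean_after)
```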
df.replace(to_replace=np.nan, value=df.min())
       Maths  Science  French
Joan     8.0      9.0     5.0
Nadia    7.0      6.0     8.0
Elsa     6.0      6.0     5.0
Mario    6.0      7.0     7.0
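`df.min()` holds one minimum per subject, so every gap receives the lowest grade of its own column. `fillna()` accepts the same per-column Series; the sketch below uses it on a hand-built copy of the grades table (assumed construction). As a variation that is not in the slides, it also fills each gap with the lowest grade of the same student, row by row:

```python
import numpy as np
import pandas as pd

# Grades table rebuilt by hand (assumed construction).
df = pd.DataFrame(
    {
        "Maths": [8, 7, np.nan, 6],
        "Science": [9, np.nan, 6, 7],
        "French": [np.nan, 8, 5, 7],
    },
    index=["Joan", "Nadia", "Elsa", "Mario"],
)

# Per-column fill: each gap gets the minimum of its own subject.
filled_by_subject = df.fillna(df.min())

# Per-row fill (illustrative variation): each gap gets that student's own minimum.
filled_by_student = df.apply(lambda row: row.fillna(row.min()), axis=1)

print(filled_by_subject)
print(filled_by_student)
```

The per-column version penalises a missing grade with the class minimum; the per-row version uses the student's own weakest result instead. Which one is appropriate depends on what the gap means in the data.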
Jaume Boguñá
Aerospace Engineer | Data Scientist