Write a program in Python to find which column has the minimum number of missing values in a given dataframe
Assume you have a dataframe like the one below. The program should report the column with the fewest missing values:
DataFrame is:
     Id    Salary   Age
0   1.0   20000.0  22.0
1   2.0       NaN  23.0
2   3.0   50000.0   NaN
3   NaN   40000.0  25.0
4   5.0   80000.0   NaN
5   6.0       NaN  25.0
6   7.0  350000.0  26.0
7   8.0   55000.0  27.0
8   9.0   60000.0   NaN
9  10.0   70000.0  24.0
lowest missing value column is: Id
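The Id values in the dataframe above print as 1.0, 2.0 and so on, even though they are entered as integers. A small sketch, assuming a shortened five-row Id column chosen only for illustration, shows why: pandas stores a numeric list that contains np.nan as float64, because NaN is a floating-point value.

```python
import numpy as np
import pandas as pd

# An integer list containing np.nan is stored as float64, since NaN is a float value
df = pd.DataFrame({'Id': [1, 2, 3, np.nan, 5]})

print(df['Id'].dtype)     # float64
print(df['Id'].tolist())  # [1.0, 2.0, 3.0, nan, 5.0]
```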
To solve this, we will follow the steps given below −
Solution
Define a dataframe with three columns: Id, Salary and Age.
Use df.apply() with a lambda function that counts the null values in each column. With axis=0, apply() passes the dataframe to the lambda one column at a time.
df = df.apply(lambda x: x.isnull().sum(), axis=0)
Finally, print the column with the smallest count using idxmin(). It returns the index label of the minimum value (here, a column name), not the value itself.
df.idxmin()
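The steps above can be checked side by side. The sketch below, a minimal illustration rather than the article's own method, rebuilds the same dataframe, compares the apply() counts with the vectorized df.isnull().sum() (which also sums column by column by default), and contrasts min(), which returns the smallest count, with idxmin(), which returns the column label holding it. The names counts_apply and counts are chosen for illustration. If several columns tie for the minimum, idxmin() reports only the first one.

```python
import numpy as np
import pandas as pd

# Same data as the example in this article
df = pd.DataFrame({'Id': [1, 2, 3, np.nan, 5, 6, 7, 8, 9, 10],
                   'Salary': [20000, np.nan, 50000, 40000, 80000, np.nan, 350000, 55000, 60000, 70000],
                   'Age': [22, 23, np.nan, 25, np.nan, 25, 26, 27, np.nan, 24]})

# Counting step as written: apply() calls the lambda once per column (axis=0)
counts_apply = df.apply(lambda x: x.isnull().sum(), axis=0)

# Vectorized equivalent: isnull() gives a boolean frame, sum() totals each column
counts = df.isnull().sum()
print(counts.tolist() == counts_apply.tolist())  # True
print(counts.tolist())                            # [1, 2, 3]

# min() returns the smallest count; idxmin() returns the label where it occurs
print(counts.min())     # 1
print(counts.idxmin())  # Id
```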
Example
Let’s look at the following code to get a better understanding −
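import pandas as pd
import numpy as np

# Build a dataframe whose columns contain some missing (NaN) values
df = pd.DataFrame({'Id': [1, 2, 3, np.nan, 5, 6, 7, 8, 9, 10],
                   'Salary': [20000, np.nan, 50000, 40000, 80000, np.nan, 350000, 55000, 60000, 70000],
                   'Age': [22, 23, np.nan, 25, np.nan, 25, 26, 27, np.nan, 24]
                   })
print("DataFrame is:\n", df)

# Count the NaN values in each column (axis=0 applies the lambda column by column)
df = df.apply(lambda x: x.isnull().sum(), axis=0)

# idxmin() returns the label (column name) of the smallest count
print("lowest missing value column is:", df.idxmin())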
Output
DataFrame is:
     Id    Salary   Age
0   1.0   20000.0  22.0
1   2.0       NaN  23.0
2   3.0   50000.0   NaN
3   NaN   40000.0  25.0
4   5.0   80000.0   NaN
5   6.0       NaN  25.0
6   7.0  350000.0  26.0
7   8.0   55000.0  27.0
8   9.0   60000.0   NaN
9  10.0   70000.0  24.0
lowest missing value column is: Id
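In this example only Id has the fewest missing values, so idxmin() gives a complete answer. When several columns tie, idxmin() returns just the first of them. A minimal sketch, using a hypothetical three-column dataframe built only to create a tie, keeps every column whose count equals the minimum through boolean selection on the counts Series (the names counts and tied are illustrative).

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe in which columns A and B share the fewest missing values
df = pd.DataFrame({'A': [1, np.nan, 3],
                   'B': [np.nan, 5, 6],
                   'C': [np.nan, np.nan, 9]})

counts = df.isnull().sum()

# idxmin() reports only the first column with the minimum count
print(counts.idxmin())  # A

# Boolean selection keeps every column whose count equals the minimum
tied = counts[counts == counts.min()].index.tolist()
print(tied)  # ['A', 'B']
```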