Cleaning data
Let's move on to the 3-cleaning_data.ipynb notebook for our discussion of data cleaning. As usual, we will begin by importing pandas and reading in our data. For this section, we will be using the nyc_temperatures.csv file, which contains the maximum (TMAX), minimum (TMIN), and average (TAVG) daily temperatures from the LaGuardia Airport station in New York City for October 2018:
>>> import pandas as pd
>>> df = pd.read_csv('data/nyc_temperatures.csv')
>>> df.head()
We retrieved long-format data from the API. Our analysis calls for wide-format data, but we will address that in the Pivoting DataFrames section later in this chapter:
Figure 3.12 – NYC temperature data
For now, we will focus on making little tweaks to the data that will make it easier for us to use: renaming columns, converting each column into the most appropriate data type, sorting...
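As a rough preview of the first three tweaks named above (renaming, type conversion, and sorting), here is a minimal sketch. It runs on a small hand-made DataFrame rather than on nyc_temperatures.csv; the column names (date, datatype, value), the new name temperature, and the values are all illustrative assumptions, so the real file may need different arguments:

```python
import pandas as pd

# Hypothetical long-format sample. The column names and values here are
# assumptions for illustration, not the contents of nyc_temperatures.csv.
sample = pd.DataFrame({
    'date': ['2018-10-02T00:00:00', '2018-10-01T00:00:00', '2018-10-01T00:00:00'],
    'datatype': ['TMAX', 'TMIN', 'TMAX'],
    'value': ['25.0', '17.2', '25.6'],  # numbers stored as strings
})

cleaned = (
    sample
    # 1. Rename a column to something more descriptive.
    .rename(columns={'value': 'temperature'})
    # 2. Convert columns to more appropriate data types.
    .assign(
        date=lambda x: pd.to_datetime(x['date']),
        temperature=lambda x: x['temperature'].astype(float),
    )
    # 3. Sort by date, then by measurement type, and renumber the rows.
    .sort_values(['date', 'datatype'])
    .reset_index(drop=True)
)

print(cleaned.dtypes)
print(cleaned)
```

Each method returns a new DataFrame, so the steps can be chained without modifying sample. Whether to chain them or apply them one at a time is a matter of taste.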