Python Pandas - Flatten nested JSON
Last Updated :
05 Dec, 2023
It is general practice to convert the JSON data structure to a Pandas Dataframe as it can help to manipulate and visualize the data more conveniently. In this article, let us consider different nested JSON data structures and flatten them using inbuilt and custom-defined functions.
Python Pandas.json_normalize() Syntax
Pandas have a nice inbuilt function called json_normalize() to flatten the simple to moderately semi-structured nested JSON structures to flat tables.
Syntax: pandas.json_normalize(data, errors='raise', sep='.', max_level=None)
Parameters:
- data - dict or list of dicts
- errors - {‘raise’, ‘ignore’}, default ‘raise’
- sep - str, default ‘.’ nested records will generate names separated by a specified separator.
- max_level - int, default None. Max number of levels(depth of dict) to normalize.
Python Pandas Flatten Nested JSON Examples
Most of the data extracted from the web through scraping are in the form of JSON datatype because JSON is a preferred datatype for transmitting data in web applications. The reason JSON is preferred is that it's extremely lightweight to send back and forth in HTTP requests and responses due to the small file size. Below are the examples by which we can flatten nested json in Python:
Example 1: Pandas json_normalize
Function
Consider a list of nested dictionaries that contains details about the students and their marks as shown. Use pandas json_normalize on this JSON data structure to flatten it to a flat table as shown
Python3
import pandas as pd
data = [
{"Roll no": 1,
"student": {"first_name": "Ram", "last_name": "kumar"}
},
{"student": {"English": "95", "Math": "88"}
},
{"Roll no": 2,
"student": {"first_name": "Joseph", "English": "90", "Science": "82"}
},
{"Roll no": 3,
"student": {"first_name": "abinaya", "last_name": "devi"},
"student": {"English": "91", "Math": "98"}
},
]
df = pd.json_normalize(data)
print(df)
Output:
Roll no student.first_name student.last_name student.English student.Math student.Science
0 1 Ram kumar NaN NaN NaN
1 NaN NaN NaN 95 88 NaN
2 2 Joseph NaN 90 NaN 82
3 3 abinaya devi 91 98 NaN
Example 2: Flatten Nested Json in Pandas using max_level=0
Now let us make use of the max_level option to flatten a slightly complicated JSON structure to a flat table. For this example, we have considered the max_level of 0, which means flattening only the first level of JSON and can experiment with the results.
Here, we have considered an example of the health records of different individuals in JSON format.
Python3
import pandas as pd
data = [
{
"id": 1,
"candidate": "Roberto mathews",
"health_index": {"bmi": 22, "blood_pressure": 130},
},
{"candidate": "Shane wade", "health_index": {"bmi": 28, "blood_pressure": 160}},
{
"id": 2,
"candidate": "Bruce tommy",
"health_index": {"bmi": 31, "blood_pressure": 190},
},
]
pd.json_normalize(data, max_level=0)
Output:
As we have used only 1 level of flattening, The second level is retained as a key-value pair as shown
id candidate health_index.bmi health_index.blood_pressure
0 1 Roberto mathews 22 130
1 NaN Shane wade 28 160
2 2 Bruce tommy 31 190
Example 3: Flatten Nested Json in Pandas using max_level=0
Now let us use the same JSON data structure as above, with max_level of 1, which means flattening the first two levels of JSON and can experiment with the results.
Python3
import pandas as pd
data = [
{
"id": 1,
"candidate": "Roberto mathews",
"health_index": {"bmi": 22, "blood_pressure": 130},
},
{"candidate": "Shane wade", "health_index": {"bmi": 28, "blood_pressure": 160}},
{
"id": 2,
"candidate": "Bruce tommy",
"health_index": {"bmi": 31, "blood_pressure": 190},
},
]
pd.json_normalize(data, max_level=1)
Output:
Here, unlike the before example we flatten the first two rows of the JSON which gives a complete structure for the flat table.
id candidate health_index.bmi health_index.blood_pressure
0 1 Roberto mathews 22 130
1 NaN Shane wade 28 160
2 2 Bruce tommy 31 190
Example 4: Passing Meta Arguments to json_normalize
Finally, let us consider a deeply nested JSON structure that can be converted to a flat table by passing the meta arguments to the json_normalize function as shown below.
Here, in the below code, we have passed the chronological order in which the JSON has to be parsed to a flat table. In the below code, we first suggest the department key to be parsed followed by the company and tagline, then, we pass the management and CEO key as a nested list indicating that they have to be parsed as a single field.
Python3
import pandas as pd
data = [
{
"company": "Google",
"tagline": "Dont be evil",
"management": {"CEO": "Sundar Pichai"},
"department": [
{"name": "Gmail", "revenue (bn)": 123},
{"name": "GCP", "revenue (bn)": 400},
{"name": "Google drive", "revenue (bn)": 600},
],
},
{
"company": "Microsoft",
"tagline": "Be What's Next",
"management": {"CEO": "Satya Nadella"},
"department": [
{"name": "Onedrive", "revenue (bn)": 13},
{"name": "Azure", "revenue (bn)": 300},
{"name": "Microsoft 365", "revenue (bn)": 300},
],
},
]
result = pd.json_normalize(
data, "department", ["company", "tagline", ["management", "CEO"]]
)
result
Output:
name revenue (bn) company tagline management.CEO
0 Gmail 123 Google Dont be evil Sundar Pichai
1 GCP 400 Google Dont be evil Sundar Pichai
2 Google drive 600 Google Dont be evil Sundar Pichai
3 Onedrive 13 Microsoft Be What's Next Satya Nadella
4 Azure 300 Microsoft Be What's Next Satya Nadella
5 Microsoft 365 300 Microsoft Be What's Next Satya Nadella
Similar Reads
Python Tutorial | Learn Python Programming Language
Python Tutorial â Python is one of the most popular programming languages. Itâs simple to use, packed with features and supported by a wide range of libraries and frameworks. Its clean syntax makes it beginner-friendly.Python is:A high-level language, used in web development, data science, automatio
10 min read
Python Interview Questions and Answers
Python is the most used language in top companies such as Intel, IBM, NASA, Pixar, Netflix, Facebook, JP Morgan Chase, Spotify and many more because of its simplicity and powerful libraries. To crack their Online Assessment and Interview Rounds as a Python developer, we need to master important Pyth
15+ min read
Non-linear Components
In electrical circuits, Non-linear Components are electronic devices that need an external power source to operate actively. Non-Linear Components are those that are changed with respect to the voltage and current. Elements that do not follow ohm's law are called Non-linear Components. Non-linear Co
11 min read
Python OOPs Concepts
Object Oriented Programming is a fundamental concept in Python, empowering developers to build modular, maintainable, and scalable applications. By understanding the core OOP principles (classes, objects, inheritance, encapsulation, polymorphism, and abstraction), programmers can leverage the full p
11 min read
Python Projects - Beginner to Advanced
Python is one of the most popular programming languages due to its simplicity, versatility, and supportive community. Whether youâre a beginner eager to learn the basics or an experienced programmer looking to challenge your skills, there are countless Python projects to help you grow.Hereâs a list
10 min read
Python Exercise with Practice Questions and Solutions
Python Exercise for Beginner: Practice makes perfect in everything, and this is especially true when learning Python. If you're a beginner, regularly practicing Python exercises will build your confidence and sharpen your skills. To help you improve, try these Python exercises with solutions to test
9 min read
Python Programs
Practice with Python program examples is always a good choice to scale up your logical understanding and programming skills and this article will provide you with the best sets of Python code examples.The below Python section contains a wide collection of Python programming examples. These Python co
11 min read
Spring Boot Tutorial
Spring Boot is a Java framework that makes it easier to create and run Java applications. It simplifies the configuration and setup process, allowing developers to focus more on writing code for their applications. This Spring Boot Tutorial is a comprehensive guide that covers both basic and advance
10 min read
Class Diagram | Unified Modeling Language (UML)
A UML class diagram is a visual tool that represents the structure of a system by showing its classes, attributes, methods, and the relationships between them. It helps everyone involved in a projectâlike developers and designersâunderstand how the system is organized and how its components interact
12 min read
Steady State Response
In this article, we are going to discuss the steady-state response. We will see what is steady state response in Time domain analysis. We will then discuss some of the standard test signals used in finding the response of a response. We also discuss the first-order response for different signals. We
9 min read