
Data Wrangling & Visualization

Dr Tilottama Goswami
Professor
Department of Artificial Intelligence, Anurag University
tilottamagoswami.co.in
TEXT BOOK
Wes McKinney. Python for Data Analysis: Data Wrangling with pandas, NumPy and IPython, O'Reilly, 2017, 2nd Edition

TOTAL 10 CLASSES & 3 LABS


Agenda – Part 1 (Refer Ch5 from TB)
1. Series Data Structure
2. DataFrame Data Structure
3. Index Objects
4. Why PANDAS for Data Analysis

Projects for Assignments


Viva & Interview Questions
PANDAS DATA STRUCTURE
• SERIES
• A Series is a one-dimensional array-like object containing a
sequence of values

• DATA FRAME
• A DataFrame represents a rectangular table of data and
contains an ordered collection of columns, each of which can
be a different value type (numeric, string, boolean, etc.).
To install pandas, the command is:

pip install pandas


SERIES
DEFINITION
Create Series Data Structure with customized label/index name
SELECT VALUE(S) USING LABELS
Filter/ Apply Math Function
SERIES AS ORDERED DICTIONARY & Vice Versa
How to Override the Sorted Order in Dictionary
How to Check the Missing Data? Method 1

How to Check the Missing Data? Method 2

How to Check the Missing Data? Method 3


Arithmetic Operations with Series Data – Automatic Index alignment
Series & Index – NAME attribute
INDEX modification using Assignment
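A minimal sketch tying together the Series operations listed above; the student names and marks below are made-up illustration data, not taken from the slides:

import pandas as pd
import numpy as np

# Create a Series with customized labels/index names
marks = pd.Series([5, 3, 8, np.nan], index=['Ram', 'Abir', 'Anaya', 'Zoya'])

# Select value(s) using labels
print(marks['Ram'])              # single label
print(marks[['Ram', 'Anaya']])   # list of labels

# Filter / apply a math function
print(marks[marks > 4])          # boolean filtering
print(np.sqrt(marks))            # NumPy functions work element-wise

# Series as ordered dictionary and vice versa
d = marks.to_dict()              # Series -> dict
s = pd.Series(d)                 # dict -> Series
s = pd.Series(d, index=['Zoya', 'Anaya', 'Abir', 'Ram'])  # override the sorted order

# Check for missing data (three ways)
print(marks.isnull())
print(pd.isnull(marks))
print(marks.notnull())

# Arithmetic with automatic index alignment
bonus = pd.Series([1, 1], index=['Ram', 'Anaya'])
print(marks + bonus)             # labels missing on either side become NaN

# name attribute and index modification by assignment
marks.name = 'mid_marks'
marks.index.name = 'student'
marks.index = ['R1', 'R2', 'R3', 'R4']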
Q/A Part- A

• Q1. What do you mean by Pandas in Python?


• Q2. Name three data structures available in Pandas.
• Q3. What do you mean by Series in Python?
• Q4. Create an Example of a series containing names of students
• Q5. Write a program in Python to create series of vowels.
• Q6. T/F
• A Pandas Series is like a column in a table.
• It is a one-dimensional array holding data of any type.
• Q7. Write the output of the following:
import pandas as pd
S1 = pd.Series(12, index=[4, 6, 8])
print(S1)
Q/A – Part A

Q1. The pandas name itself is derived from panel data, an econometrics term for multidimensional structured datasets.

Q4.
import pandas as pd
S1 = pd.Series(["Ram", "Abir", "Anaya"])
print(S1)

Q5.
import pandas as pd
S1 = pd.Series(["a", "e", "i", "o", "u"])
print(S1)

Q6.
A Pandas Series is like a column in a table. (T)
It is a one-dimensional array holding data of any type. (T)
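Q7. (Answer not shown on the slide.) Passing a scalar with an index broadcasts the value 12 across every label, so the output is:

4    12
6    12
8    12
dtype: int64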
References
Text Books

1. Wes McKinney. Python for Data Analysis: Data Wrangling with pandas, NumPy and IPython, O'Reilly, 2017, 2nd Edition
2. Jacqueline Kazil and Katharine Jarmul, Data Wrangling with Python,
O'Reilly, 2016

VIVA QUESTIONS
https://csiplearninghub.com/pandas-series-class-12-ip-important-questions/

Exercises
https://www.w3schools.com/python/pandas/pandas_series.asp
https://towardsdatascience.com/a-practical-introduction-to-pandas-series-
9915521cdc69
DATA FRAME
A DataFrame represents a rectangular table of data and contains an ordered collection
of columns, each of which can be a different value type (numeric, string,
boolean, etc.).

The DataFrame has both a row and column index;

The DataFrame can be thought of as a dict of Series all sharing the same index.

The data is stored as one or more two-dimensional blocks rather than a list, dict, or some
other collection of one-dimensional arrays.
How to construct a Data Frame?
If a Data Frame is huge, how to view parts?
What happens if the column name is not in the dictionary?
Can we assign a column that doesn’t exist?
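A minimal sketch of these points, based on the textbook's state/year/pop example data (as shown below, columns not in the dict appear as NaN, and assigning to a new column name creates it):

import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2001, 2002],
        'pop': [1.5, 1.7, 2.4, 2.9]}
frame = pd.DataFrame(data)

frame.head()   # view the first rows of a huge DataFrame
frame.tail()   # view the last rows

# A column name that is not in the dict appears filled with NaN
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'])

# Assigning to a column that doesn't exist creates it
frame2['debt'] = 16.5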

Add a new column "eastern" of boolean values where the state column equals 'Ohio'.

New columns cannot be created with the frame2.eastern syntax.
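For example (continuing with the hypothetical frame2 defined above):

# Works: bracket indexing creates the new boolean column
frame2['eastern'] = frame2.state == 'Ohio'

# Does NOT create a column, it only sets an attribute on the object:
# frame2.eastern = frame2.state == 'Ohio'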


How to retrieve a Column from Data Frame?

Retrieved as Series

frame2[column] works for any column name, but frame2.column only works when the column name is a valid Python variable name.
Retrieved as Attribute
How to retrieve a Row from Data Frame?

Rows can also be retrieved by position or name with the special loc attribute
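For example (again with the hypothetical frame2 above, which has a default integer index):

frame2['state']   # retrieve a column as a Series (works for any column name)
frame2.state      # attribute access; only for valid Python variable names

frame2.loc[1]     # retrieve a row by index label
frame2.iloc[1]    # retrieve a row by integer position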
How to modify column values ?

Assign a scalar value

Assign an array of values


What care must be taken to assign a series or an array to a column in a DataFrame?

I) When you are assigning lists or arrays to a column, the value's length must match the length of the DataFrame.

II) If you assign a Series, its labels will be realigned exactly to the DataFrame's index, inserting missing values in any holes.
The column returned from indexing a DataFrame is a view on the underlying data, not a
copy. Thus, any in-place modifications to the Series will be reflected in the DataFrame. The
del keyword can then be used to remove this column.
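A short sketch of these rules (frame2 as above; the values are illustrative):

import numpy as np

frame2['debt'] = 16.5                        # scalar: broadcast to every row
frame2['debt'] = np.arange(4.)               # array: length must match the DataFrame

val = pd.Series([-1.2, -1.5], index=[0, 2])  # Series: aligned on index labels,
frame2['debt'] = val                         # missing labels become NaN

frame2['eastern'] = frame2.state == 'Ohio'
del frame2['eastern']                        # remove the column again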
Create DataFrame from nested dict of dicts
If the nested dict is passed to the DataFrame,
pandas will interpret
the outer dict keys as the columns
and
the inner keys as the row indices:

Transpose a DataFrame
values attribute returns the data contained in the DataFrame as a two-dimensional ndarray
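A sketch based on the textbook's nested-dict example:

pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame3 = pd.DataFrame(pop)   # outer keys -> columns, inner keys -> row index

frame3.T        # transpose (swap rows and columns)
frame3.values   # data as a two-dimensional ndarray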
INDEX OBJECTS
INDEX DEFINITION
Properties of Index objects

Index objects are immutable and thus can’t be modified by the user:

Unlike Python sets, a pandas Index can contain duplicate labels:

Selections with duplicate labels will select all occurrences of that label.
How to create index objects and assign them to Series
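A small sketch of these Index properties (the labels are chosen for illustration):

labels = pd.Index(['a', 'b', 'c'])
obj = pd.Series([1.5, -2.5, 0], index=labels)

# Index objects are immutable
# labels[1] = 'd'   # would raise TypeError

# Unlike Python sets, an Index can contain duplicate labels
dup = pd.Index(['foo', 'foo', 'bar'])
print(dup.is_unique)        # False

obj2 = pd.Series(range(3), index=dup)
print(obj2['foo'])          # selects all occurrences of the duplicated label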
WHY PANDAS FOR DATA ANALYSIS
Hence pandas is popularly used for Data Analysis
Exercise
• https://csiplearninghub.com/important-pandas-dataframe-questions-12-ip/
LAB PROJECT 1-W1,W2
(TWO TASKS )
PROJECT- Part 1
Attendance
MID 1 Marks

Residence and Gender
PROJECT- Part 1
All Mid Marks
Residence and Gender

Grp - Roll
G1: 1-10
G2: 11-20
G3: 21-30
G4: 31-40
G5: 41-50
G6: 51-60
---------
61-G1, 62-G2, 63-G3, 64-G4, 65-G5, 66-G6
PROJECT- Part 1
Residence and Gender
TASKS- PART 1 [10 Marks]
Write Roll number, Grp Number, Section, comments for
questions and upload the .ipynb in Google Classroom
1. Create Series for each column
2. Create Dictionary for each column and make a series from it
3. Customize the index to AU Roll numbers
4. Select the values of Mid greater than 4 / Select the values of assignment greater than 2
5. Select students from Villages / Select the DayScholars who are girls
6. Check if there is any missing data
7. Create a Data frame for the given input using arrays/ dictionaries
8. Add a Name Column for the given input
9. Create an index object of your choice and customize the data structure given to you
10. Check if the index has any duplicate values
Agenda –Part2 (Refer Ch6 from TB)
Data Loading, Storage and File Formats
1. Read Text File into DataFrame
2. Read Text Files in Pieces
3. Write Data to Text Format
4. Working with Delimited Formats
5. JSON Data

Projects for Assignments


Viva & Interview Questions
DATA LOADING
STORAGE
FILE FORMATS https://realpython.com/pandas-read-write-files/#write-a-csv-file
Pandas provides many options for reading data into a DataFrame.

XML and JSON are considered semi-structured file formats because both represent data in a hierarchical (tree-like) structure, whereas CSV and Excel files hold tabular data.

Accessing data is a necessary first step for using most of the tools in data analysis.

Data input and output using pandas is the focus here; numerous tools in other libraries also help with reading and writing data in various formats.

Input and output typically fall into a few main categories:


Reading text files and other more efficient on-disk formats
Loading data from databases
Interacting with network sources like web APIs.
Why around 50 parameters?
Real-world data is messy, so the reading functions take roughly 50 optional parameters to cope with it.
One key feature is type inference.
CONVERT TEXT DATA INTO DATAFRAME

Categories of optional arguments for the functions mentioned on the previous page
Hierarchical Data Formats
The Hierarchical Data Format version 5 (HDF5) is an open-source file format that supports large, complex, heterogeneous data.

Feather
Feather is a fast, lightweight, and easy-to-use binary file format for storing data frames. Feather provides binary columnar serialization for data frames.

JavaScript Object Notation (JSON) – a computer data interchange format
Message Pack (MsgPack)
Example : Read .csv file
Contents

a,b,c,d,message
1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo
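A minimal sketch, assuming the contents above are saved as ex1.csv (the file name is illustrative):

import pandas as pd
df = pd.read_csv('ex1.csv')              # comma-delimited, first row used as header
# equivalently, with the more general function used in the textbook:
df = pd.read_table('ex1.csv', sep=',')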
Customize File Header = Column Names

Allow pandas to assign default column names

Customize the column Names


Customize the Index Column using another column (index_col option)
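For example (ex2.csv is assumed to be a headerless version of the file above):

# Let pandas assign default column names (0, 1, 2, ...)
pd.read_csv('ex2.csv', header=None)

# Supply your own column names
names = ['a', 'b', 'c', 'd', 'message']
pd.read_csv('ex2.csv', names=names)

# Use one of the columns as the row index
pd.read_csv('ex2.csv', names=names, index_col='message')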
Hierarchical Index from Multiple Columns
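For example (assuming a file csv_mindex.csv with columns key1 and key2, as in the textbook):

# Passing a list of column names to index_col builds a hierarchical (MultiIndex) index
parsed = pd.read_csv('csv_mindex.csv', index_col=['key1', 'key2'])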
How to read a table with a variable amount of whitespace as the delimiter? Use a REGULAR EXPRESSION
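For example (ex3.txt is an assumed file name):

# sep can be a regular expression; \s+ matches one or more whitespace characters
result = pd.read_table('ex3.txt', sep=r'\s+')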
SKIP ROWS WHILE READING A CSV FILE

Skip the first, third, and fourth rows of a file with skiprows
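For example (ex4.csv is an assumed file name):

pd.read_csv('ex4.csv', skiprows=[0, 2, 3])   # skip the first, third and fourth rows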
How to Handle Missing Values in a CSV file?

What is missing data?
1) Not present (empty string), ''
2) Marked by some sentinel value, e.g. NA

By default, pandas uses a set of commonly occurring sentinels, such as NA and NULL.
The na_values option can take either a list or set of strings to consider as missing values.
Different NA sentinels can be specified for each column by passing a dictionary to na_values; a few more related options exist as well.
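A sketch of both forms (the file ex5.csv and its column names follow the textbook example and are assumptions here):

# Treat these strings as missing, in addition to the default sentinels
result = pd.read_csv('ex5.csv', na_values=['NULL'])

# Different sentinels per column, via a dict
sentinels = {'message': ['foo', 'NA'], 'something': ['two']}
result = pd.read_csv('ex5.csv', na_values=sentinels)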
Processing Large Text Files (e.g., 10000 rows x 5 columns)

1) Read in only a small piece of the file, or
2) Iterate through the file in smaller chunks.

One-time pandas settings (e.g., limiting how many rows are displayed) make the output easier to inspect.

TextParser is also equipped with a get_chunk method that enables you to read pieces of an arbitrary size.

The object returned by a chunked read is not a DataFrame but an iterator; to get the data, you need to iterate through this object.
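A sketch of both approaches (the file ex6.csv and its key column follow the textbook example and are assumptions here):

# 1) Read in only a small piece of the file
pd.read_csv('ex6.csv', nrows=5)

# 2) Iterate in chunks of 1000 rows; the result is a TextParser iterator
chunker = pd.read_csv('ex6.csv', chunksize=1000)
tot = pd.Series([], dtype='float64')
for piece in chunker:
    tot = tot.add(piece['key'].value_counts(), fill_value=0)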

Exercise
https://www.geeksforgeeks.org/how-to-load-a-massive-file-as-small-chunks-in-pandas/
TEXTPARSER
Writing Data to Text Format Using DataFrame's to_csv method

Delimiters: comma-separated by default, or another character such as |

Represent missing values by empty strings or other sentinel values.

*Use the cat or type command to display the file, depending on Linux/Windows.


Writing Text Data to the Console (sys.stdout) Using DataFrame's to_csv method

Delimiters: comma-separated by default, or another character such as |

Represent missing values by empty strings or other sentinel values.

Row and Column Labels Can Be Disabled


Choose Subset of Columns from a DataFrame

Write only a subset of the columns,


and in an order of your choosing
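A sketch of these to_csv options (ex5.csv and the column names a, b, c are assumptions carried over from the earlier example):

import sys
data = pd.read_csv('ex5.csv')

data.to_csv('out.csv')                                        # comma-separated by default
data.to_csv(sys.stdout, sep='|')                              # another delimiter, written to the console
data.to_csv(sys.stdout, na_rep='NULL')                        # represent missing values by a sentinel
data.to_csv(sys.stdout, index=False, header=False)            # disable row and column labels
data.to_csv(sys.stdout, index=False, columns=['a', 'b', 'c']) # subset of columns, in chosen order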
Series also has a to_csv method
Use csv.Dialect

read_table() may fail when a file has one or more malformed lines. CSV files come in many different flavors. To define a new format with a different delimiter, string quoting convention, or line terminator, we define a simple subclass of csv.Dialect:

class my_dialect(csv.Dialect):
    lineterminator = '\n'
    delimiter = ';'
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL

reader = csv.reader(f, dialect=my_dialect)

# Individual format options can also be given directly as keyword arguments,
# without defining a dialect:
reader = csv.reader(f, delimiter='|')

To write delimited files manually, you can use csv.writer. It accepts an open,
writable file object and the same dialect and format options as csv.reader:

with open('mydata.csv', 'w') as f:
    writer = csv.writer(f, dialect=my_dialect)
    writer.writerow(('one', 'two', 'three'))
    writer.writerow(('1', '2', '3'))
    writer.writerow(('4', '5', '6'))
    writer.writerow(('7', '8', '9'))
JSON
JSON (short for JavaScript Object Notation) has become one of the standard formats for sending data by
HTTP request between web browsers and other applications. It is a much more free-form data format than a
tabular text form like CSV.

•JSON stands for JavaScript Object Notation


•JSON is a lightweight data-interchange format
•JSON is plain text written in JavaScript object notation
•JSON is used to send data between computers
•JSON is language independent *
•JSON is a text format for storing and transporting data

The file type for JSON files is ".json"


The MIME type for JSON text is "application/json"
JSON syntax is derived from JavaScript object notation
syntax:
•Data is in name/value pairs
•Data is separated by commas
•Curly braces hold objects
•Square brackets hold arrays
JSON defines only two data structures: objects and arrays.

An object is a set of name-value pairs,


and
An array is a list of values.

JSON defines seven value types:


string, number, object, array, true, false, and null.

Application

Commonly used for transmitting data in web applications (e.g., sending some
data from the server to the client, so it can be displayed on a web page, or
vice versa)

JavaScript Object Notation (JSON) is unstructured, flexible, and readable


by humans. Basically, you can dump data into the database however it
comes, without having to adapt it to any specialized database language (like
SQL)
IMPORTANT: import json

1) Convert a JSON string to a Python dictionary: json.loads(); json.load() takes a file object and returns the JSON object
2) Convert a Python object to JSON: json.dumps()
3) Convert JSON into a Series or DataFrame: pandas.read_json(); convert a JSON object or list of objects to a DataFrame: pd.DataFrame(...)
4) Export data from a pandas DataFrame (df) to JSON: df.to_json() or df.to_json(orient='records')
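A minimal sketch of these conversions (the JSON string and DataFrame contents are made up for illustration):

import json
import pandas as pd

obj = '{"name": "Wes", "places_lived": ["United States", "Spain"]}'

result = json.loads(obj)            # JSON string  -> Python dict
back_to_json = json.dumps(result)   # Python object -> JSON string

# JSON file -> Series/DataFrame (assumes a suitable layout such as an array of records)
# data = pd.read_json('example.json')

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df.to_json()                        # column-wise (the default orientation)
df.to_json(orient='records')        # record-wise (one object per row)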
CONVERT CSVDATAFRAME JSON IN TWO WAYS – COLUMNWISE & RECORDWISE
df=pd.read_csv(r"C:\Users\Tilottama\OneDrive\DataWrangling\Lab\tg1.csv",header=None)
tg1.csv df

df.to_json(r'C:\Users\Tilottama\OneDrive\Data Wrangling\Lab\tg1json.json') STORE JSON COLUMNWISE


tg1json.json

df.to_json(r'C:\Users\Tilottama\OneDrive\Data Wrangling\Lab\tg1Recordjson.json', orient="records")


STORE JSON RECORDWISE
tg1Recordjson.json
CONVERT JSON file DATAFRAME CSV
Convert JSON String to Python Form : json.loads()
json.load() takes a file object and returns the json object
import json
res=open(r'C:\Users\Tilottama\OneDrive\Data Wrangling\Lab\tg1json.json')
print(type(res))
ogdata = json.load(res)
print(ogdata)
print(type(ogdata))

<class '_io.TextIOWrapper'>
{'0': {'0': 1, '1': 2}, '1': {'0': ' Pushpa', '1': ' Flower'}, '2': {'0': ' H1', '1': ' H2'}, '3': {'0': ' HRN1',
'1': ' HRN2'}}
<class 'dict'>

import pandas as pd
dforg=pd.DataFrame(ogdata)
print(dforg)
print(type(dforg))
dforg.to_csv(r"C:\Users\Tilottama\OneDrive\Data Wrangling\Lab\tg1backtoCSV.csv")
dforg.to_csv(r"C:\Users\Tilottama\OneDrive\Data Wrangling\Lab\tg1backtoCSVNoHeader.csv",header=None)
dforg.to_csv(r"C:\Users\Tilottama\OneDrive\Data Wrangling\Lab\tg1backtoCSVNoHeaderNoIndex.csv",header=None,index=None)
JSON LOAD() Vs LOADS()
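In short (tg1json.json is the file created above):

import json

# json.loads() parses a JSON *string*
d = json.loads('{"x": 1}')

# json.load() parses JSON from an open *file object*
with open('tg1json.json') as f:
    d = json.load(f)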
Further Reading
• http://www.datasciencelovers.com/tag/read-file/
Exercises
• Python JSON Exercise with Solution (pynative.com)
Q/A Part 2
Q1) List any 5 functions a Data Analyst should know, to read and save data in a
particular format
Q2) Why Pandas for Reading file formats?
Q3) What are the issues with dot notation – df.name? Instead we can use df['name']
Q4) What is a file format?
Q5) Why should a data scientist understand different file formats?
Q6) Compare and Contrast CSV and JSON file formats
Q7) Write JSON for:
Q8) a)Write a csv file for :

b) Write JSON file for the above table.


Answer 1: (not shown on the slide; typical examples are pd.read_csv(), df.to_csv(), pd.read_json(), df.to_json(), pd.read_excel())
Answer 2:
• Pandas provides many options for reading data into a DataFrame
• Pandas comes with 18 readers for different sources of data. They include readers for CSV,
JSON, Parquet files and ones that support reading from SQL databases or even HTML
documents.

https://gretel.ai/blog/a-guide-to-load-almost-anything-into-a-dataframe

Answer 3:
Issues with the dot notation
There are three issues with using dot notation. It doesn’t work in the following situations:
• When there are spaces in the column name, e.g. df.favorite food; instead use df['favorite food']
• When the column name is the same as a DataFrame method, e.g. df.count; use df['count']
• When the column name is stored in a variable, e.g.
  col = 'height'
  df[col]
Ans 4) A file format is a standard way in which information is encoded for storage in a file.
First, the file format specifies whether the file is a binary or ASCII file. Second, it shows
how the information is organized. For example, comma-separated values (CSV) file format
stores tabular data in plain text

Ans 5) Usually, the files you will come across will depend on the application you are
building. As a data scientist, you need to understand the underlying structure of various file
formats, their advantages and disadvantages. Unless you understand the underlying
structure of the data, you will not be able to explore it. Also, at times you need to make
decisions about how to store data. Choosing the optimal file format for storing data can
improve the performance of your models in data processing. For example, in an image
processing system, you need image files as input and output. So you will mostly see files in
jpeg, gif or png format.
Ans 6)
SR.NO | JSON | CSV
1. | JSON stands for JavaScript Object Notation. | CSV stands for Comma Separated Values.
2. | It is used as the syntax for storing and exchanging data. | It is a plain text format with a series of values separated by commas.
3. | A JSON file is saved with the extension .json. | A CSV file is saved with the extension .csv.
4. | It is more versatile. | It is less versatile.
5. | It is used for key-value storage and supports arrays and objects as values. | It is a standard for saving tabular information in a delimited text file.
6. | It mainly uses the JavaScript data types. | It does not have any data types.
7. | It is less secure. | It is more secure.
8. | It consumes more memory as compared to CSV. | It consumes less memory.
9. | It supports a lot of scalability in terms of adding and editing content. | It does not support a lot in terms of scalability.
10. | It is less compact as compared to a CSV file. | It is more compact than other file formats.
Ans 7) JSON
Data Wrangling & Visualization
PROJECT-2
Dr Tilottama Goswami
Professor
Department of Artificial Intelligence, Anurag University
tilottamagoswami.co.in
Agenda
PART 1
1. Read CSV File
2. Clean Data – Missing Values
3. Filter Data – Relevant information to be captured
4. Write CSV File
5. Convert the given CSV File to JSON File
Rules

• Sets of Tasks given each week
• According to Roll Number the student is assigned to the Task Set
• Each Week – assigned 10 marks
• Upload the code file to Google Classroom within the due date
• Viva will be conducted at the end of each Project Part

PROJECT GROUPS (Grp - Roll)
G1: 1-10
G2: 11-20
G3: 21-30
G4: 31-40
G5: 41-50
G6: 51-60
---------
61-G1, 62-G2, 63-G3, 64-G4, 65-G5, 66-G6
PROJECTS II – PART A

Project 1: Residence and Gender – RESIDENCE-SET-1.csv (Groups G1, G2)
Project 2: Residence and Gender – RESIDENCE-SET-2.csv (Groups G3, G4)
Project 3: Residence and Gender – RESIDENCE-SET-3.csv (Groups G5, G6)

Snapshot of the csv file. The file(s) are uploaded in Google Classroom.
TASKS- for All the Projects[1,2,3][10 Marks]
Write Roll number, Grp Number, Project number, Section, comments
for questions and upload the .ipynb in Google Classroom
PROJECT 1 : RESIDENCE-SET-1.csv
PROJECT 2 : RESIDENCE-SET-2.csv
PROJECT 3 : RESIDENCE-SET-3.csv
1. Read the csv file
2. Data Clean- Remove the rows with missing data
3. Store the clean data in CleanDataResidence.json file
4. Consider the clean data and create a csv files based on gender basis
5. Consider the clean data and create a csv file for girls students who are from villages
6. Find the count of girls and boys from villages, and save it in json file
PROJECTS II – PART B
All Mid Marks

Project 4: All Mid Marks – MIDMARKS-SET-1.csv (Groups G1, G2)
Project 5: All Mid Marks – MIDMARKS-SET-2.csv (Groups G3, G4)
Project 6: All Mid Marks – MIDMARKS-SET-3.csv (Groups G5, G6)

Snapshot of the csv file. The file(s) are uploaded in Google Classroom.
TASKS- for All the Projects[4,5,6][10 Marks]
Write Roll number, Grp Number, Project number, Section, comments
for questions and upload the .ipynb in Google Classroom
PROJECT 4 : MIDMARKS-SET-1.csv
PROJECT 5 : MIDMARKS-SET-2.csv
PROJECT 6 : MIDMARKS-SET-3.csv
1. Read the csv file
2. Data Clean- Remove the rows with missing data
3. Store the clean data in CleanDataResidence.json file
4. Consider the clean data and create csv files based on subject wise for all mids.
5. Find the number of students who got less than 10 in all subjects, and also more than 14 in all subjects, in Mid 1; save the result in MidDataAnalysis.json file

Count | CS1 | DM | DS | PP | JP | P&S
<10   |     |    |    |    |    |
>14   |     |    |    |    |    |
If you don’t reveal some insights soon, I’m
going to be forced to slice, dice and drill !!
