Pandas 2


from pandas import DataFrame

# Create DataFrame
cart = {'Product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
        'Price': [20000, 28000, 22000, 19000, 45000],
        'Year': [2014, 2015, 2016, 2017, 2018]}
df = DataFrame(cart, columns=['Product', 'Price', 'Year'])

# Original DataFrame
print("Original DataFrame:\n", df)

Output:
Original DataFrame:
   Product  Price  Year
0   Mobile  20000  2014
1       AC  28000  2015
2   Mobile  22000  2016
3     Sofa  19000  2017
4   Laptop  45000  2018

Get the Descriptive Statistics for Pandas DataFrame
The following examples show how to compute descriptive statistics in Pandas:
• Descriptive Statistics in Pandas of Price Column
• Descriptive Statistics in Pandas of Year Column
• Descriptive Statistics of Whole DataFrame
• Descriptive Statistics in Pandas of Data Individually
Descriptive Statistics in Pandas of Price Column
In this example, descriptive statistics of the ‘Price’ column of the
DataFrame created above, including the count, mean, and standard
deviation, are computed and displayed using the describe() method.

# Describing descriptive statistics of Price

print("\nDescriptive statistics of Price:\n")

stats = df['Price'].describe()

print(stats)

Output:

Descriptive statistics of Price:
count        5.000000
mean     26800.000000
std      10756.393448
min      19000.000000
25%      20000.000000
50%      22000.000000
75%      28000.000000
max      45000.000000
Name: Price, dtype: float64
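The std row reported by describe() is the sample standard deviation (divisor n - 1). The following is a minimal sketch that checks this by hand; it rebuilds the Price values so that it runs on its own, and the names prices, squared_devs, and manual_std are illustrative only.

```python
import math
import pandas as pd

# Same Price values as in the cart DataFrame above
prices = pd.Series([20000, 28000, 22000, 19000, 45000], name='Price')

# Sample standard deviation by hand: sum of squared deviations over n - 1
mean = prices.mean()
squared_devs = [(p - mean) ** 2 for p in prices]
manual_std = math.sqrt(sum(squared_devs) / (len(prices) - 1))

# describe() reports the same statistic in its 'std' row
stats = prices.describe()
print(round(manual_std, 2))
print(round(stats['std'], 2))
```

Both printed values should agree, since describe() and the manual formula use the same divisor.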
Descriptive Statistics in Pandas of Year Column
In this example, a DataFrame is created to represent products with
their prices and respective years. The descriptive statistics, such as
count, mean, and standard deviation of the ‘Year’ column, are
computed and printed.

# Describing descriptive statistics of Year

print("\nDescriptive statistics of year:\n")

stats = df['Year'].describe()

print(stats)

Output:
Descriptive statistics of year:
count 5.000000
mean 2016.000000
std 1.581139
min 2014.000000
25% 2015.000000
50% 2016.000000
75% 2017.000000
max 2018.000000
Name: Year, dtype: float64
Descriptive Statistics of Whole DataFrame
In this example, a DataFrame is constructed with product details,
prices, and years. The entire DataFrame’s descriptive statistics,
encompassing all columns, are computed and displayed, including
count, unique values, top value, and frequency for categorical
columns, and mean, standard deviation, and quartile information for
numerical columns.

# Describing descriptive statistics of whole dataframe

print("\nDescriptive statistics of whole dataframe:\n")

stats = df.describe(include='all')

print(stats)

Output:
Descriptive statistics of whole dataframe:
       Product         Price         Year
count        5      5.000000     5.000000
unique       4           NaN          NaN
top     Mobile           NaN          NaN
freq         2           NaN          NaN
mean       NaN  26800.000000  2016.000000
std        NaN  10756.393448     1.581139
min        NaN  19000.000000  2014.000000
25%        NaN  20000.000000  2015.000000
50%        NaN  22000.000000  2016.000000
75%        NaN  28000.000000  2017.000000
max        NaN  45000.000000  2018.000000
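describe() can also be restricted to certain column types and asked for other percentiles. The sketch below, which rebuilds the same DataFrame so it is self-contained, limits the summary to numeric columns and requests the 10th and 90th percentiles. The variable name numeric_stats is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
    'Price': [20000, 28000, 22000, 19000, 45000],
    'Year': [2014, 2015, 2016, 2017, 2018],
})

# include=[np.number] keeps only numeric columns;
# percentiles are given as fractions between 0 and 1
numeric_stats = df.describe(include=[np.number], percentiles=[0.1, 0.9])
print(numeric_stats)
```

Because the Product column is excluded, the unique, top, and freq rows do not appear in this summary.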
Descriptive Statistics in Pandas of Data Individually
Let’s print all the descriptive statistical data individually. In this
example, a DataFrame named df is created containing product
names, their respective prices, and purchase years. Various
statistics related to the ‘Price’ column, such as count, mean,
maximum value, and standard deviation, are calculated and printed.

# Count of Price

print("\nCount of Price:")

counts = df['Price'].count()

print(counts)

# Mean of Price

print("\nMean of Price:")

m = df['Price'].mean()

print(m)

# Maximum value of Price

print("\nMaximum value of Price:")

mx = df['Price'].max()

print(mx)

# Standard deviation of Price (sample standard deviation, rounded to 2 decimals)

print("\nStandard deviation of Price:")

sd = df['Price'].std()

print(round(sd, 2))

Output:
Count of Price:
5
Mean of Price:
26800.0
Maximum value of Price:
45000
Standard deviation of Price:
10756.39
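The individual statistics can also be collected in one call. This is a small sketch using Series.agg() with a list of statistic names. It rebuilds the Price values so it runs on its own, and the name summary is illustrative.

```python
import pandas as pd

prices = pd.Series([20000, 28000, 22000, 19000, 45000], name='Price')

# agg() with a list returns a Series indexed by the statistic names
summary = prices.agg(['count', 'mean', 'max', 'std'])
print(summary.round(2))
```

This gives the same numbers as the separate count(), mean(), max(), and std() calls, but in a single labeled result.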

How to Read and Write Files Using Pandas



Pandas is a very popular Python library that offers functions and data
structures for efficient data analysis. The Pandas package is mainly used
for data pre-processing tasks such as data cleaning, manipulation, and
transformation, which makes it a very handy tool for data scientists and
analysts. Let’s find out how to read and write files using pandas.
We will cover the following sections:
• Data Structures in Pandas
• Writing a File Using Pandas
• Reading a File Using Pandas
• Importing a CSV File into the DataFrame
• Endnotes
Data Structures in Pandas
There are two main types of data structures in Pandas:
• Pandas Series: 1D labeled homogeneous array, size-immutable
• Pandas DataFrame: 2D labeled tabular structure, size-mutable
Mutability is the ability to be changed after creation. A size-mutable
structure can grow or shrink (for example, columns can be added to a
DataFrame), while a size-immutable structure keeps its length.

#Importing Pandas Library


import pandas as pd

Creating a Pandas DataFrame

#Creating a Sample DataFrame


data = pd.DataFrame({
'id': [ 1, 2, 3, 4, 5, 6, 7],
'age': [ 27, 32, 23, 41, 37, 31, 49],
'gender': [ 'M', 'F', 'F', 'M', 'M', 'M', 'F'],
'occupation': [ 'Salesman', 'Doctor', 'Manager', 'Teacher', 'Mechanic', 'Lawyer', 'Nurse']
})

data

Writing a File Using Pandas


Save the DataFrame we created above as a CSV file using pandas .to_csv() function,
as shown:


#Writing to CSV file


data.to_csv('data.csv')
We can also save the DataFrame as an Excel file using pandas .to_excel() function, as
shown:

#Writing to Excel file


data.to_excel('data2.xlsx')
Save the DataFrame we created above as a Text file using the same function that we
use for CSV files:

#Writing to Text file


data.to_csv('data3.txt')
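
By default, to_csv() also writes the row index as an extra first column. The sketch below writes a smaller version of the sample DataFrame as tab-separated text without the index and reads it back. It assumes an in-memory buffer (io.StringIO) in place of a file on disk so that it leaves no files behind; the names buffer and restored are illustrative.

```python
import io
import pandas as pd

data = pd.DataFrame({
    'id': [1, 2, 3],
    'age': [27, 32, 23],
    'occupation': ['Salesman', 'Doctor', 'Manager'],
})

# Write to an in-memory text buffer; a file path such as 'data3.txt' works the same way
buffer = io.StringIO()
data.to_csv(buffer, index=False, sep='\t')   # index=False omits the row labels

# Rewind the buffer and read it back with the matching separator
buffer.seek(0)
restored = pd.read_csv(buffer, sep='\t')
print(restored)
```

Note that to_excel() additionally needs an Excel writer package such as openpyxl to be installed.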

NumPy ufuncs | Universal functions


Last Updated : 01 Feb, 2024



NumPy universal functions (ufuncs for short) are mathematical
functions that operate on ndarray (N-dimensional array) objects
element by element.
They support array broadcasting, type casting, and several other
standard features. NumPy provides many universal functions, such
as the standard trigonometric functions and functions for
arithmetic operations and for handling complex numbers, alongside
related statistical functions.
Characteristics of NumPy ufuncs
• These functions operate on ndarray (N-dimensional array), i.e.
NumPy’s array class.
• They perform fast element-wise array operations.
• They support various features like array broadcasting, type casting,
etc.
• NumPy universal functions are objects that belong
to the numpy.ufunc class.
• Python functions can also be turned into universal functions using
the np.frompyfunc() function.
• Some ufuncs are called automatically when the corresponding
arithmetic operator is used on arrays. For example, when two
arrays are added element-wise using the ‘+’
operator, np.add() is called internally.
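
The characteristics above can be checked with a short sketch. The helper add_then_double and the name double_sum are hypothetical and exist only for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# np.add is an instance of the numpy.ufunc class
print(isinstance(np.add, np.ufunc))          # True

# The + operator on arrays gives the same result as np.add()
print(np.array_equal(a + b, np.add(a, b)))   # True

# Turn a plain Python function into a ufunc: 2 inputs, 1 output
def add_then_double(x, y):
    return 2 * (x + y)

double_sum = np.frompyfunc(add_then_double, 2, 1)
result = double_sum(a, b)                    # element-wise, dtype is object
print(result)                                # [22 44 66]
```

Functions created with np.frompyfunc() return object arrays and run at Python speed, so they are convenient rather than fast.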

NumPy ufuncs are functions that operate on ndarray objects
in an element-by-element fashion. They provide a way to
execute mathematical, logical, and other operations on
arrays efficiently. Ufuncs support a wide range of arithmetic
operations such as addition, subtraction, multiplication,
division, and more.

Statistical functions

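These functions calculate the mean, median, variance, minimum, and similar
statistics of array elements, and are used for statistical analysis of arrays.
Strictly speaking, they are ordinary NumPy functions rather than instances of
numpy.ufunc, although they are often listed alongside ufuncs.
They include functions like: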
ufunc’s Statistical Functions in NumPy

Function                  Description
amin, amax                returns minimum or maximum of an array or along an axis
ptp                       returns range of values (maximum-minimum) of an array or along an axis
percentile(a, p, axis)    calculate the pth percentile of the array or along a specified axis
median                    compute the median of data along a specified axis
mean                      compute the mean of data along a specified axis
std                       compute the standard deviation of data along a specified axis
var                       compute the variance of data along a specified axis
average                   compute the average of data along a specified axis

import numpy as np

# construct a weight array

weight = np.array([50.7, 52.5, 50, 58, 55.63, 73.25, 49.5, 45])

# minimum and maximum

print('Minimum and maximum weight of the students: ')

print(np.amin(weight), np.amax(weight))

# range of weight i.e. max weight-min weight

print('Range of the weight of the students: ')

print(np.ptp(weight))

# percentile

print('Weight below which 70 % student fall: ')

print(np.percentile(weight, 70))

# mean

print('Mean weight of the students: ')

print(np.mean(weight))

# median

print('Median weight of the students: ')

print(np.median(weight))

# standard deviation

print('Standard deviation of weight of the students: ')

print(np.std(weight))

# variance

print('Variance of weight of the students: ')

print(np.var(weight))

# average

print('Average weight of the students: ')

print(np.average(weight))

Output
Minimum and maximum weight of the students:
45.0 73.25
Range of the weight of the students:
28.25
Weight below which 70 % student fall:
55.317
Mean weight of the students:
54.3225
Median weight of the students:
51.6
Standard deviation of weight of the students:
8.05277397857
Variance of weight of the students:
64.84716875
Average weight of the students:
54.3225
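
The table mentions computing statistics "along an axis". The sketch below illustrates that on a small 2D array, and also shows that np.std() divides by n by default (ddof=0), unlike the pandas std() method, which divides by n - 1. The arrays scores and values are made-up illustration data.

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

col_means = np.mean(scores, axis=0)   # one value per column
row_means = np.mean(scores, axis=1)   # one value per row
row_ranges = np.ptp(scores, axis=1)   # max - min within each row
print(col_means)    # [2.5 3.5 4.5]
print(row_means)    # [2. 5.]
print(row_ranges)   # [2 2]

values = np.array([2, 4, 4, 4, 5, 5, 7, 9])
population_std = np.std(values)        # divides by n (ddof=0)
sample_std = np.std(values, ddof=1)    # divides by n - 1, as pandas does
print(population_std)                  # 2.0
print(sample_std > population_std)     # True
```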
Simple Arithmetic

ufuncs in NumPy allow for performing simple arithmetic operations on arrays
efficiently.
import numpy as np
array_a = np.array([10, 20, 30, 38])
array_b = np.array([2, 4, 6, 8])
# Addition using the add() ufunc
addition_result = np.add(array_a, array_b)
# Subtraction using the subtract() ufunc
subtraction_result = np.subtract(array_a, array_b)
# Multiplication using the multiply() ufunc
multiplication_result = np.multiply(array_a, array_b)
# Division using the divide() ufunc
division_result = np.divide(array_a, array_b)
# Finding power using the power() ufunc
power_result = np.power(array_a, array_b)

# Finding remainder using the mod() and remainder() ufunc
mod_result = np.mod(array_a, array_b)
remainder_result = np.remainder(array_a, array_b)
# Finding both the quotient and the remainder using the divmod() ufunc
quotient_result = np.divmod(array_a, array_b)
print("Array A:", array_a)
print("Array B:", array_b)
print("Addition Result:", addition_result)
print("Subtraction Result:", subtraction_result)
print("Multiplication Result:", multiplication_result)
print("Division Result:", division_result)
print("Power Result:", power_result)
print("Mod Result:", mod_result)
print("Remainder Result:", remainder_result)
print("Quotient Result:", quotient_result)
Output:

Array A: [10 20 30 38]

Array B: [2 4 6 8]

Addition Result: [12 24 36 46]

Subtraction Result: [ 8 16 24 30]

Multiplication Result: [ 20 80 180 304]

Division Result: [5. 5. 5. 4.75]

Power Result: [ 100 160000 729000000 4347792138496]

Mod Result: [0 0 0 6]

Remainder Result: [0 0 0 6]

Quotient Result: (array([5, 5, 5, 4]), array([0, 0, 0, 6]))
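
Two properties of these arithmetic ufuncs are worth a quick check: a scalar argument is broadcast across the whole array, and the two arrays returned by np.divmod() satisfy quotient * divisor + remainder == dividend. This is a small sketch using the same arrays as above; the name reconstructed is illustrative.

```python
import numpy as np

array_a = np.array([10, 20, 30, 38])
array_b = np.array([2, 4, 6, 8])

# A scalar is broadcast to every element of the array
shifted = np.add(array_a, 5)
print(shifted)                                   # [15 25 35 43]

# divmod() returns the quotient and remainder arrays as a tuple
quotient, remainder = np.divmod(array_a, array_b)
reconstructed = quotient * array_b + remainder
print(np.array_equal(reconstructed, array_a))    # True
```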

NumPy unique()

The Numpy unique() function is used to return the sorted unique elements
of an array. It can also optionally return the indices of the input array that
give the unique values and the counts of each unique value.

This function is useful for removing duplicates from an array and
understanding the frequency of elements.

Syntax
Following is the syntax of Numpy unique() function −

numpy.unique(ar, return_index=False, return_inverse=False, return_counts=False)

Parameters
Following are the parameters of the Numpy unique() function −

• ar: The input array. It is flattened if it is not already a 1-D array.
• return_index: If True, also returns the indices of the first occurrence of
each unique value in the input array.
• return_inverse: If True, also returns the indices into the unique array
that can be used to reconstruct the input array.
• return_counts: If True, also returns the number of times each unique
value appears in the input array.

Example 1
Following is an example of the Numpy unique() function that creates an
array of the unique values in the given input array −

import numpy as np

# Create a 1D array
a = np.array([5, 2, 6, 2, 7, 5, 6, 8, 2, 9])

print('First array:')
print(a)
print('\n')

# Get unique values in the array


print('Unique values of first array:')
u = np.unique(a)
print(u)
print('\n')

Output

First array:
[5 2 6 2 7 5 6 8 2 9]

Unique values of first array:


[2 5 6 7 8 9]
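
The optional return values described under Parameters can be sketched with the same input array. The names first_idx, inverse, and counts are illustrative.

```python
import numpy as np

a = np.array([5, 2, 6, 2, 7, 5, 6, 8, 2, 9])

# Request all three optional outputs at once
u, first_idx, inverse, counts = np.unique(
    a, return_index=True, return_inverse=True, return_counts=True)

print(u)           # [2 5 6 7 8 9]
print(first_idx)   # [1 0 2 4 7 9]  position of each value's first occurrence
print(counts)      # [3 2 2 1 1 1]  how often each unique value appears

# Indexing the unique array with the inverse indices rebuilds the input
print(np.array_equal(u[inverse], a))   # True
```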

NumPy GCD (Greatest Common Divisor)


GCD, or "Greatest Common Divisor," also known as the greatest common factor or
highest common factor, represents the largest positive integer that can evenly divide
two or more integers without resulting in a remainder.
import numpy as np
n1 = 6
n2 = 9
gcd = np.gcd(n1, n2)
print(gcd) # Output: 3
To find the GCD of all values in an array, the reduce() method can be used.

import numpy as np
arr = np.array([20, 8, 32, 36, 16])
gcd = np.gcd.reduce(arr)
print(gcd) # Output: 4
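
Because np.gcd() is a ufunc, it also works element by element on two arrays. The following sketch shows that, and cross-checks np.gcd.reduce() against Python's own math.gcd folded over the same values. The arrays x and y are made-up illustration data.

```python
import math
from functools import reduce

import numpy as np

# Element-wise GCD of two arrays
x = np.array([12, 18, 100])
y = np.array([8, 27, 75])
pairwise = np.gcd(x, y)
print(pairwise)                  # [ 4  9 25]

# reduce() folds the ufunc over all values of one array
arr = np.array([20, 8, 32, 36, 16])
numpy_gcd = np.gcd.reduce(arr)
python_gcd = reduce(math.gcd, arr.tolist())
print(numpy_gcd, python_gcd)     # 4 4
```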

NumPy Set Operations


A set is a collection of unique data. That is, elements of a set cannot be
repeated.

NumPy set operations perform mathematical set operations on arrays like
union, intersection, difference, and symmetric difference.

Set Union Operation in NumPy


The union of two sets A and B includes all the elements of set A and set B.

Set Union in NumPy

In NumPy, we use the np.union1d() function to perform the set union
operation on arrays. For example,
import numpy as np

A = np.array([1, 3, 5])
B = np.array([0, 2, 3])
# union of two arrays
result = np.union1d(A, B)

print(result)

# Output: [0 1 2 3 5]

In this example, we have used the np.union1d(A, B) function to compute the
union of two arrays: A and B.
Here, the function returns the unique elements from both arrays.

Note: np.union1d(A,B) is equivalent to A ⋃ B set operation.

Set Intersection Operation in NumPy


The intersection of two sets A and B includes the elements common to both
set A and set B.

Set Intersection in NumPy

We use the np.intersect1d() function to perform the set intersection
operation on arrays. For example,
import numpy as np

A = np.array([1, 3, 5])
B = np.array([0, 2, 3])

# intersection of two arrays


result = np.intersect1d(A, B)

print(result)

# Output: [3]

Note: np.intersect1d(A,B) is equivalent to A ⋂ B set operation.

Set Difference Operation in NumPy


The difference between two sets A and B includes the elements of set A that
are not present in set B.

Set Difference in NumPy

We use the np.setdiff1d() function to compute the difference between two
arrays. For example,
import numpy as np

A = np.array([1, 3, 5])
B = np.array([0, 2, 3])

# difference of two arrays


result = np.setdiff1d(A, B)

print(result)

# Output: [1 5]

Note: np.setdiff1d(A,B) is equivalent to A - B set operation.

Set Symmetric Difference Operation in NumPy
The symmetric difference between two sets A and B includes all elements
of A and B without the common elements.

Set Symmetric Difference in NumPy

In NumPy, we use the np.setxor1d() function to compute the symmetric
difference between two arrays. For example,
import numpy as np

A = np.array([1, 3, 5])
B = np.array([0, 2, 3])

# symmetric difference of two arrays


result = np.setxor1d(A, B)

print(result)

# Output: [0 1 2 5]
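
The four operations are related: the symmetric difference equals the union with the common elements removed. The sketch below checks that relationship on the same arrays, and shows that duplicates in the inputs are dropped from the result. The names union, common, and sym_diff are illustrative.

```python
import numpy as np

A = np.array([1, 3, 5])
B = np.array([0, 2, 3])

union = np.union1d(A, B)          # [0 1 2 3 5]
common = np.intersect1d(A, B)     # [3]

# Union minus intersection gives the symmetric difference
sym_diff = np.setdiff1d(union, common)
print(sym_diff)                                       # [0 1 2 5]
print(np.array_equal(sym_diff, np.setxor1d(A, B)))    # True

# Repeated values in the inputs appear only once in the result
deduplicated = np.union1d([1, 1, 2], [2, 2, 3])
print(deduplicated)                                   # [1 2 3]
```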

Unique Values From a NumPy Array


To select the unique elements from a NumPy array, we use
the np.unique() function. It returns the sorted unique elements of an array. It
can also be used to create a set out of an array.
Let's see an example.

import numpy as np

array1 = np.array([1,1, 2, 2, 4, 7, 7, 3, 5, 2, 5])

# unique values from array1


result = np.unique(array1)
print(result)

# Output: [1 2 3 4 5 7]

Here, the resulting array [1 2 3 4 5 7] contains only the unique elements of
the original array array1.
