Pandas 2
from pandas import DataFrame

# Create DataFrame
cart = {'Product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
        'Price': [20000, 28000, 22000, 19000, 45000],
        'Year': [2014, 2015, 2016, 2017, 2018]}
df = DataFrame(cart, columns=['Product', 'Price', 'Year'])

# Original DataFrame
print("Original DataFrame:")
print(df)
Output:
Original DataFrame:
  Product  Price  Year
0  Mobile  20000  2014
1      AC  28000  2015
2  Mobile  22000  2016
3    Sofa  19000  2017
4  Laptop  45000  2018
Get the Descriptive Statistics for Pandas DataFrame
The examples below show how to compute descriptive statistics
for a Pandas DataFrame in Python:
Descriptive Statistics in Pandas of Price Column
Descriptive Statistics in Pandas of Year Column
Descriptive Statistics of Whole DataFrame
Descriptive Statistics in Pandas of Data Individually
Descriptive Statistics in Pandas of Price Column
In this example, a DataFrame is created with product details, prices,
and years. Descriptive statistics, including count, mean, and
standard deviation of the ‘Price’ column, are then computed and
displayed using describe() method.
Python3
stats = df['Price'].describe()
print("Descriptive statistics of Price:")
print(stats)
Output:
Descriptive statistics of Price:
count        5.000000
mean     26800.000000
std      10756.393448
min      19000.000000
25%      20000.000000
50%      22000.000000
75%      28000.000000
max      45000.000000
Name: Price, dtype: float64
Descriptive Statistics in Pandas of Year Column
In this example, a DataFrame is created to represent products with
their prices and respective years. The descriptive statistics, such as
count, mean, and standard deviation of the ‘Year’ column, are
computed and printed.
Python3
stats = df['Year'].describe()
print("Descriptive statistics of Year:")
print(stats)
Output:
Descriptive statistics of Year:
count       5.000000
mean     2016.000000
std         1.581139
min      2014.000000
25%      2015.000000
50%      2016.000000
75%      2017.000000
max      2018.000000
Name: Year, dtype: float64
Descriptive Statistics of Whole DataFrame
In this example, a DataFrame is constructed with product details,
prices, and years. The entire DataFrame’s descriptive statistics,
encompassing all columns, are computed and displayed, including
count, unique values, top value, and frequency for categorical
columns, and mean, standard deviation, and quartile information for
numerical columns.
Python3
# Descriptive statistics of the whole DataFrame
stats = df.describe(include='all')
print("Descriptive statistics of whole dataframe:")
print(stats)
Output:
Descriptive statistics of whole dataframe:
       Product         Price         Year
count        5      5.000000     5.000000
unique       4           NaN          NaN
top     Mobile           NaN          NaN
freq         2           NaN          NaN
mean       NaN  26800.000000  2016.000000
std        NaN  10756.393448     1.581139
min        NaN  19000.000000  2014.000000
25%        NaN  20000.000000  2015.000000
50%        NaN  22000.000000  2016.000000
75%        NaN  28000.000000  2017.000000
max        NaN  45000.000000  2018.000000
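As a small sketch of what the include argument changes, the snippet below rebuilds the same cart data so that it runs on its own. The variable names numeric_only and everything are illustrative and not part of the original example.

```python
import pandas as pd

cart = {'Product': ['Mobile', 'AC', 'Mobile', 'Sofa', 'Laptop'],
        'Price': [20000, 28000, 22000, 19000, 45000],
        'Year': [2014, 2015, 2016, 2017, 2018]}
df = pd.DataFrame(cart)

# Default call: only the numeric columns are summarised
numeric_only = df.describe()
print(list(numeric_only.columns))   # ['Price', 'Year']

# include='all': the text column gets unique/top/freq rows as well
everything = df.describe(include='all')
print(everything.loc[['unique', 'top', 'freq'], 'Product'])
```

Rows that do not apply to a column's type (for example mean for Product) are filled with NaN, which is why the combined table contains so many NaN entries.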
Descriptive Statistics in Pandas of Data Individually
Let’s print all the descriptive statistical data individually. In this
example, a DataFrame named df is created containing product
names, their respective prices, and purchase years. Various
statistics related to the ‘Price’ column, such as count, mean,
maximum value, and standard deviation, are calculated and printed.
Python3
# Count of Price
print("\nCount of Price:")
counts = df['Price'].count()
print(counts)
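# Mean of Price
print("\nMean of Price:")
m = df['Price'].mean()
print(m)
# Maximum value of Price
print("\nMaximum value of Price:")
mx = df['Price'].max()
print(mx)
# Standard deviation of Price (sample standard deviation, rounded for display)
print("\nStandard deviation of Price:")
sd = df['Price'].std()
print(round(sd, 2))
Output:
Count of Price:
5
Mean of Price:
26800.0
Maximum value of Price:
45000
Standard deviation of Price:
10756.39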
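The same individual statistics can also be collected in one call. The following is a minimal sketch using Series.agg() on the Price values from the example; the names prices and summary are illustrative.

```python
import pandas as pd

prices = pd.Series([20000, 28000, 22000, 19000, 45000], name='Price')

# Ask for several statistics at once; the result is a Series
# indexed by the names of the requested statistics
summary = prices.agg(['count', 'mean', 'max', 'std'])
print(summary)
```

This keeps the results together in one labelled object, which can be convenient when only a few statistics are needed rather than the full describe() table.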
Pandas is a popular Python library whose functions and data structures make
data analysis more efficient. The package is mainly used for data
pre-processing tasks such as data cleaning, manipulation, and transformation,
which makes it a handy tool for data scientists and analysts. Let’s find out
how to read and write files using pandas.
We will cover the following sections:
Data Structures in Pandas
Writing a File Using Pandas
Reading a File Using Pandas
Importing a CSV File into the DataFrame
Endnotes
Data Structures in Pandas
There are two main types of data structures in Pandas:
Pandas Series: 1D labeled homogeneous array, size-immutable
Pandas DataFrame: 2D labeled tabular structure, size-mutable
Mutability refers to the ability to change. A mutable value can be changed
after it is created; size-mutable means that rows or columns can be added or
removed.
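The two structures can be sketched as follows. The product names and values are sample data chosen for illustration only.

```python
import pandas as pd

# Series: one-dimensional and labelled, with a single dtype for all values
prices = pd.Series([20000, 28000, 22000], index=['Mobile', 'AC', 'Sofa'])
print(prices['AC'])      # label-based access: 28000

# DataFrame: two-dimensional table whose columns can have different dtypes
df = pd.DataFrame({'Product': ['Mobile', 'AC', 'Sofa'],
                   'Price': [20000, 28000, 22000]})

# Adding a column after creation illustrates that a DataFrame is size-mutable
df['Year'] = [2014, 2015, 2017]
print(df.shape)          # (3, 3)
```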
NumPy universal functions (ufuncs for short) are simple
mathematical functions that operate on an ndarray (N-dimensional
array) in an element-wise fashion.
They support array broadcasting, type casting, and several other
standard features. NumPy provides various universal functions,
such as standard trigonometric functions and functions for
arithmetic operations, handling complex numbers, statistics, etc.
Characteristics of NumPy ufuncs
These functions operate on ndarray (N-dimensional array), i.e.
NumPy’s array class.
They perform fast element-wise array operations.
They support features such as array broadcasting and type casting.
NumPy universal functions are objects that belong
to the numpy.ufunc class.
A Python function can also be turned into a universal function using
the np.frompyfunc() library function.
Some ufuncs are called automatically when the corresponding
arithmetic operator is used on arrays. For example, when two arrays
are added element-wise using the ‘+’ operator, np.add() is called
internally.
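The characteristics listed above can be checked with a short sketch. The arrays and the helper clip_to_two are hypothetical, introduced here only for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# The + operator on arrays gives the same result as the np.add ufunc
print(np.array_equal(a + b, np.add(a, b)))   # True
print(isinstance(np.add, np.ufunc))          # True

# Broadcasting: the scalar 2 is applied to every element of the array
print(np.multiply(a, 2))                     # [2 4 6]

# Wrapping a plain Python function as a ufunc (hypothetical helper)
def clip_to_two(x):
    return min(x, 2)

clip_ufunc = np.frompyfunc(clip_to_two, 1, 1)   # 1 input, 1 output
clipped = clip_ufunc(a)                          # result array has dtype=object
print(clipped)
```

Functions created with np.frompyfunc() always return object arrays and run at Python speed, so they are mainly a convenience for getting broadcasting behaviour rather than a way to gain performance.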
Statistical functions
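These functions calculate the mean, median, variance, minimum, etc. of array
elements and are used for statistical analysis of array data.
They include the functions used in the example that follows:
Function              Description
np.amin, np.amax      Minimum and maximum of the array elements
np.ptp                Range of values (maximum minus minimum)
np.percentile         q-th percentile of the array elements
np.mean               Arithmetic mean
np.median             Median
np.std                Standard deviation
np.var                Variance
np.average            Weighted average (equals the mean when no weights are given)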
ufunc’s Statistical Functions in NumPy
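import numpy as np

# Sample weights: illustrative values, since the array definition is missing
# from the source; chosen to agree with the minimum and maximum in the output
weight = np.array([50.7, 52.5, 50, 58, 55.63, 73.25, 49.5, 45])

# minimum and maximum
print('Minimum and maximum weight of the students:')
print(np.amin(weight), np.amax(weight))
# range (maximum minus minimum)
print(np.ptp(weight))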
# percentile
print(np.percentile(weight, 70))
# mean
print(np.mean(weight))
# median
print(np.median(weight))
# standard deviation
print(np.std(weight))
# variance
print(np.var(weight))
# average
print(np.average(weight))
Output
Minimum and maximum weight of the students:
45.0 73.25
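One detail worth keeping in mind when comparing these results with pandas: np.std() and np.var() divide by n by default (ddof=0), whereas Series.std() in pandas divides by n - 1 (ddof=1). The sketch below uses the Price values from the earlier pandas example; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

prices = [20000, 28000, 22000, 19000, 45000]

population_std = np.std(prices)          # ddof=0: divide by n
sample_std = np.std(prices, ddof=1)      # ddof=1: divide by n - 1
pandas_std = pd.Series(prices).std()     # pandas uses ddof=1 by default

print(round(population_std, 2))
print(round(sample_std, 2), round(pandas_std, 2))
```

Passing ddof=1 to np.std() makes the NumPy result agree with the pandas one.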
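import numpy as np

# The start of this example is missing from the source, so the setup below is
# a reconstruction. array_a holds assumed values chosen to agree with the
# output shown further down; array_b is taken from that output.
array_a = np.array([10, 20, 30, 14])
array_b = np.array([2, 4, 6, 8])

# Element-wise arithmetic using the corresponding ufuncs
addition_result = np.add(array_a, array_b)
subtraction_result = np.subtract(array_a, array_b)
multiplication_result = np.multiply(array_a, array_b)
division_result = np.divide(array_a, array_b)
power_result = np.power(array_a, array_b)

# Finding the remainder using the mod() and remainder() ufuncs
mod_result = np.mod(array_a, array_b)
remainder_result = np.remainder(array_a, array_b)
# Finding both the quotient and the remainder using the divmod() ufunc
# (the result is a tuple of two arrays)
quotient_result = np.divmod(array_a, array_b)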
print("Array A:", array_a)
print("Array B:", array_b)
print("Addition Result:", addition_result)
print("Subtraction Result:", subtraction_result)
print("Multiplication Result:", multiplication_result)
print("Division Result:", division_result)
print("Power Result:", power_result)
print("Mod Result:", mod_result)
print("Remainder Result:", remainder_result)
print("Quotient Result:", quotient_result)
Output:
Array B: [2 4 6 8]
Mod Result: [0 0 0 6]
Remainder Result: [0 0 0 6]
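As a brief sketch of how these three ufuncs relate, np.divmod() returns a pair of arrays, and np.remainder() produces the same result as np.mod(). The input values below are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical inputs, not taken from the example above
a = np.array([7, 9, 14])
b = np.array([2, 4, 8])

# divmod returns a tuple: (quotient array, remainder array)
quotient, remainder = np.divmod(a, b)
print(quotient)    # [3 2 1]
print(remainder)   # [1 1 6]

# remainder() and mod() give identical results
print(np.array_equal(np.mod(a, b), np.remainder(a, b)))   # True
# divmod agrees with floor division and mod computed separately
print(np.array_equal(quotient, np.floor_divide(a, b)))    # True
```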
sort
The Numpy unique() function is used to return the sorted unique elements
of an array. It can also optionally return the indices of the input array that
give the unique values and the counts of each unique value.
Syntax
Following is the syntax of the NumPy unique() function −
numpy.unique(ar, return_index=False, return_inverse=False, return_counts=False, axis=None)
Parameters
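Following are the parameters of the NumPy unique() function −
ar: the input array; it is flattened unless axis is specified.
return_index: if True, also return the indices of the first occurrences of the unique values in the input array.
return_inverse: if True, also return the indices that reconstruct the input array from the unique array.
return_counts: if True, also return the number of times each unique value appears.
axis: the axis to operate on; if None (the default), the array is flattened.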
Example 1
Following is an example of the NumPy unique() function. It starts by
creating a 1D input array that contains repeated values −
import numpy as np
# Create a 1D array
a = np.array([5, 2, 6, 2, 7, 5, 6, 8, 2, 9])
print('First array:')
print(a)
print('\n')
Output
First array:
[5 2 6 2 7 5 6 8 2 9]
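Continuing with the same input array, the following sketch shows the call to np.unique() itself, together with the optional return_index and return_counts outputs described earlier. The names values, first_index and counts are illustrative.

```python
import numpy as np

a = np.array([5, 2, 6, 2, 7, 5, 6, 8, 2, 9])

# Sorted unique values of the array
values = np.unique(a)
print(values)        # [2 5 6 7 8 9]

# Also request the index of each value's first occurrence and its count
values, first_index, counts = np.unique(a, return_index=True, return_counts=True)
print(first_index)   # [1 0 2 4 7 9]
print(counts)        # [3 2 2 1 1 1]
```

Indexing the original array with first_index (a[first_index]) gives back the unique values.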
import numpy as np
arr = np.array([20, 8, 32, 36, 16])
gcd = np.gcd.reduce(arr)
print(gcd) # Output: 4
Set Union Operation in NumPy
import numpy as np
A = np.array([1, 3, 5])
B = np.array([0, 2, 3])
# union of two arrays
result = np.union1d(A, B)
print(result)
# Output: [0 1 2 3 5]
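Set Intersection Operation in NumPy
import numpy as np
A = np.array([1, 3, 5])
B = np.array([0, 2, 3])
# intersection of two arrays
result = np.intersect1d(A, B)
print(result)
# Output: [3]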
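Set Difference Operation in NumPy
import numpy as np
A = np.array([1, 3, 5])
B = np.array([0, 2, 3])
# difference of two arrays (elements of A that are not in B)
result = np.setdiff1d(A, B)
print(result)
# Output: [1 5]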
Set Symmetric Difference Operation in NumPy
The symmetric difference between two sets A and B includes all elements
of A and B without the common elements.
import numpy as np
A = np.array([1, 3, 5])
B = np.array([0, 2, 3])
# symmetric difference of two arrays
result = np.setxor1d(A, B)
print(result)
# Output: [0 1 2 5]
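The definition above can be checked directly: removing the common elements (the intersection) from the union gives the symmetric difference. This is a small sketch using the same A and B; the names sym_diff and via_union are illustrative.

```python
import numpy as np

A = np.array([1, 3, 5])
B = np.array([0, 2, 3])

# Symmetric difference computed directly
sym_diff = np.setxor1d(A, B)

# The same set built as (A union B) minus (A intersection B)
via_union = np.setdiff1d(np.union1d(A, B), np.intersect1d(A, B))

print(sym_diff)                              # [0 1 2 5]
print(np.array_equal(sym_diff, via_union))   # True
```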
import numpy as np
# Output: [1 2 3 4 5 7]