PP Unit 4 Q&A
1A)
a)
NumPy, which stands for Numerical Python, is a library consisting of multidimensional array objects
and a collection of routines for processing those arrays. Using NumPy, mathematical and logical
operations on arrays can be performed.
Using NumPy, a developer can perform the following operations:
• Mathematical and logical operations on arrays.
• Fourier transforms and routines for shape manipulation.
• Operations related to linear algebra. NumPy has built-in functions for linear algebra
and random number generation.
We will discuss four operations that can be performed on NumPy arrays:
add(): -
The add() function is used when we want to calculate the element-wise sum of two NumPy arrays.
Example:
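The original example is not reproduced here. A minimal sketch, using two small illustrative arrays (the values and variable names are assumptions):

```python
import numpy as np

# Illustrative sample arrays (hypothetical values)
a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# Element-wise sum; equivalent to a + b
result = np.add(a, b)
print(result)  # values: 11, 22, 33
```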
subtract(): -
The subtract() function is used when we want to calculate the element-wise difference between two NumPy arrays.
Example:
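A minimal sketch of subtract(), again with illustrative arrays (values are assumptions):

```python
import numpy as np

# Illustrative sample arrays (hypothetical values)
a = np.array([10, 20, 30])
b = np.array([1, 2, 3])

# Element-wise difference; equivalent to a - b
result = np.subtract(a, b)
print(result)  # values: 9, 18, 27
```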
multiply(): -
The multiply() function is used when we want to calculate the element-wise product of two NumPy
arrays. This is not the same as matrix multiplication.
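To make the difference from matrix multiplication concrete, here is a short sketch with two illustrative 2x2 arrays (values are assumptions), computing both products:

```python
import numpy as np

# Illustrative 2x2 arrays (hypothetical values)
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = np.multiply(a, b)   # same as a * b: [[5, 12], [21, 32]]
matrix_product = np.matmul(a, b)  # same as a @ b: [[19, 22], [43, 50]]
```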
divide(): -
The divide() function is used when we want to divide two NumPy arrays element-wise.
Example:
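A minimal sketch of divide() with illustrative arrays (values are assumptions). Note that np.divide performs true division, so the result is a float array even for integer inputs:

```python
import numpy as np

# Illustrative sample arrays (hypothetical values)
a = np.array([10, 20, 30])
b = np.array([2, 4, 5])

# Element-wise true division; equivalent to a / b
result = np.divide(a, b)
print(result)  # values: 5.0, 5.0, 6.0
```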
b)
Universal functions (ufuncs) in NumPy are functions that operate element-wise on arrays, with
support for broadcasting and type casting. NumPy provides many universal functions that cover a
wide variety of operations.
These include standard trigonometric functions, functions for arithmetic operations, functions for
handling complex numbers, statistical functions, etc.
np.deg2rad() converts degrees to radians and np.sin() calculates the sine values of the array.
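A small sketch combining the two functions, using an illustrative array of angles in degrees (the values are assumptions):

```python
import numpy as np

# Illustrative angles in degrees (hypothetical values)
degrees = np.array([0, 30, 90, 180])

radians = np.deg2rad(degrees)  # convert degrees to radians
sines = np.sin(radians)        # sine of each element
print(np.round(sines, 3))      # sin(180 deg) is a tiny float, ~1e-16, before rounding
```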
np.amin() returns the smallest element in the array and np.amax() returns the largest element in
the array.
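A short sketch of np.amin() and np.amax() on an illustrative 2D array (values are assumptions), including the optional axis argument:

```python
import numpy as np

# Illustrative 2D array (hypothetical values)
arr = np.array([[4, 9, 1],
                [7, 2, 8]])

smallest = np.amin(arr)          # 1, over the whole array
largest = np.amax(arr)           # 9, over the whole array
col_min = np.amin(arr, axis=0)   # minimum of each column: [4, 2, 1]
```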
np.bitwise_and() performs a bitwise AND on the corresponding elements of two integer arrays. This is shown in the following example:
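The original example is missing; a minimal sketch with illustrative integer arrays (values are assumptions), with the binary forms shown in comments:

```python
import numpy as np

# Illustrative integer arrays (hypothetical values)
a = np.array([12, 10, 7])  # binary: 1100, 1010, 0111
b = np.array([10, 6, 3])   # binary: 1010, 0110, 0011

# Element-wise bitwise AND; equivalent to a & b
result = np.bitwise_and(a, b)  # binary: 1000, 0010, 0011 -> 8, 2, 3
```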
2. a) Explain in detail about aggregation with examples.
b) Explain about fancy indexing with examples.
2A)
a)
The NumPy module provides many aggregate (statistical) functions that work on one-dimensional
or multi-dimensional arrays. Common NumPy aggregate functions include sum, min, max, mean,
average, prod, etc.
sum(): -
The NumPy sum() function calculates the sum of the values in an array.
Syntax: - array.sum()
Example:
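A minimal sketch of sum() on an illustrative 2D array (values are assumptions), with and without the axis argument:

```python
import numpy as np

# Illustrative 2D array (hypothetical values)
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

total = arr.sum()           # 21: sum of every element
col_sums = arr.sum(axis=0)  # column sums: [5, 7, 9]
row_sums = arr.sum(axis=1)  # row sums: [6, 15]
```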
average(): -
The NumPy average() function returns the average of the values in a given array.
Syntax: - np.average(array)
Example:
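A minimal sketch of np.average() on an illustrative array (values are assumptions). It also shows the optional weights argument, which is what distinguishes average() from mean():

```python
import numpy as np

# Illustrative array (hypothetical values)
arr = np.array([2, 4, 6, 8])

plain = np.average(arr)                            # 5.0, same as arr.mean()
weighted = np.average(arr, weights=[1, 1, 1, 5])   # (2 + 4 + 6 + 40) / 8 = 6.5
```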
min(): -
The NumPy min() function returns the minimum value in an array, or along a given axis.
Syntax: - array.min()
Example:
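A minimal sketch of min() on an illustrative 2D array (values are assumptions):

```python
import numpy as np

# Illustrative 2D array (hypothetical values)
arr = np.array([[3, 7, 5],
                [8, 1, 6]])

overall = arr.min()           # 1
per_column = arr.min(axis=0)  # [3, 1, 5]
per_row = arr.min(axis=1)     # [3, 1]
```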
max(): -
The NumPy max() function returns the maximum value in a given array.
Syntax: - array.max()
Example:
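A minimal sketch of max() on the same kind of illustrative array (values are assumptions):

```python
import numpy as np

# Illustrative 2D array (hypothetical values)
arr = np.array([[3, 7, 5],
                [8, 1, 6]])

overall = arr.max()         # 8
per_row = arr.max(axis=1)   # largest value in each row: [7, 8]
```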
b)
Fancy indexing is like the simple indexing we’ve already seen, but we pass arrays of indices in place
of single scalars. This allows us to very quickly access and modify complicated subsets of an array’s
values.
With fancy indexing, the shape of the result reflects the shape of the index arrays rather than the
shape of the array being indexed.
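The program described below can be sketched as follows. The seed, the array size, and the specific index values are assumptions made so the run is repeatable:

```python
import numpy as np

np.random.seed(0)  # fixed seed (an assumption) so the output is repeatable
array_1 = np.random.randint(0, 100, size=10)  # 10 random ints in [0, 100)

# Index values in 2x2 form: the result takes this shape, not array_1's shape
ind = np.array([[3, 7],
                [4, 5]])
selected = array_1[ind]
print(selected)

# Fancy indexing can also modify values in place
array_1[[0, 1]] = -1
```

Because fancy indexing returns a copy, the later assignment to array_1 does not change selected.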
In this program, we first define a NumPy array of random integers using np.random.randint() and
assign it to the variable ‘array_1’.
Then, we assign a NumPy array of index values in 2x2 matrix form to the variable ‘ind’. When we
print array_1[ind], it returns the values of the array at those indices, also in 2x2 matrix form.
This is how fancy indexing works.
3A)
a)
In some cases, we require a sorted array for computation. For this purpose, the numpy module of
Python provides a function called numpy.sort(). This function gives a sorted copy of the source array
or input array.
Syntax:
numpy.sort(a, axis=-1, kind=None, order=None)
Parameters:
a: array
The array to be sorted.
axis: int or None (optional)
This parameter defines the axis along which the sorting is performed. If this parameter is None, the
array is flattened before sorting. By default, it is set to -1, which sorts the array along the last axis.
kind: 'quicksort', 'mergesort', 'heapsort' or 'stable' (optional)
This parameter defines the sorting algorithm. By default, the sorting is performed using 'quicksort'.
order: str or list of str (optional)
When the array has fields (a structured array), this parameter specifies which fields to compare
first, second, and so on. A single field can be given as a string, and not all fields need to be
specified. However, the unspecified fields are still used, in the order in which they appear in the
dtype, to break ties.
This function returns a sorted copy of the source array, which has the same shape and type as the
source array.
Example:
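A minimal sketch of np.sort() with the axis and order parameters. The arrays, field names, and records are illustrative assumptions:

```python
import numpy as np

# Illustrative 2D array (hypothetical values)
a = np.array([[12, 15],
              [10, 1]])

last_axis = np.sort(a)            # default axis=-1: [[12, 15], [1, 10]]
first_axis = np.sort(a, axis=0)   # column-wise:     [[10, 1], [12, 15]]
flat = np.sort(a, axis=None)      # flattened:       [1, 10, 12, 15]

# 'order' on a structured array (hypothetical fields and records)
dtype = [('name', 'U10'), ('age', int)]
people = np.array([('Ravi', 25), ('Asha', 21), ('Kiran', 25)], dtype=dtype)
by_age = np.sort(people, order='age')  # ties on age are broken by 'name'
```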
b)
NumPy’s structured array is similar to a struct in C. It is used for grouping data of different types
and sizes. A structured array stores its data in named fields, and each field can hold data of any
type and size. Fields are accessed by indexing with the field name (for example data['name']);
record arrays (np.recarray) additionally allow access with dot notation.
Example:
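The original program is not preserved; the sketch below follows the steps described next. The field names, types, and records are illustrative assumptions:

```python
import numpy as np

# Field names and their data types as a list of (name, dtype) tuples (hypothetical fields)
data_type = [('name', 'U10'), ('age', 'i4'), ('marks', 'f8')]

# Values follow the same order as the fields in data_type (hypothetical records)
data = np.array([('Ravi', 20, 85.5), ('Asha', 19, 91.0)], dtype=data_type)

print(data)          # printed like a list of tuples, one per record
print(data['name'])  # a whole field, accessed by name
print(data[0])       # a single record
```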
In this example, we define each field name and its data type as a list of tuples and assign it to the
variable ‘data_type’.
We then define a NumPy array whose values follow the order in which we defined the fields, and
pass ‘data_type’ to the dtype parameter of np.array().
When we print the data, the output looks like a list of tuples, one tuple per record. This is how
structured arrays work.
4A)
a)
A DataFrame is a two-dimensional data structure in which data is stored in tabular form, arranged
in rows and columns, and each column can hold a different type of data. We can perform various
operations on a DataFrame, such as selecting, adding, and deleting rows and columns, as well as
arithmetic operations.
An empty dataframe
We can create a basic empty DataFrame by calling the DataFrame constructor, pd.DataFrame(),
with no arguments.
Example:
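A minimal sketch of creating an empty DataFrame:

```python
import pandas as pd

df = pd.DataFrame()  # constructor called with no data
print(df)
print(df.empty)      # True: no rows and no columns
```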
A dataframe from a dict of ndarrays/lists
A dict of ndarrays/lists can be used to create a DataFrame; all the ndarrays must be of the same
length. By default, the index is range(n), where n is the array length.
Example:
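A minimal sketch using a dict of lists; the column names and values are illustrative assumptions. Since no index is passed, it defaults to range(3):

```python
import pandas as pd

# Hypothetical data: every list must have the same length
data = {'Name': ['Tom', 'Jack', 'Steve'], 'Age': [28, 34, 29]}

df = pd.DataFrame(data)  # keys become column names; index defaults to 0, 1, 2
print(df)
```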
b)
The Python and NumPy indexing operator "[ ]" and the attribute operator "." provide quick and easy
access to Pandas data structures across a wide range of use cases. In addition, Pandas supports two
types of multi-axes indexing (older versions also had a third, .ix, which has since been removed):
1. .loc[]: label based
2. .iloc[]: integer (position) based
.loc[]
Pandas provides various ways to perform purely label-based indexing with .loc[]. When slicing, both
the start and the stop bounds are included. Integers are valid labels, but they refer to the label and
not to the position.
.loc[] accepts several kinds of input:
• A single scalar label
• A list of labels
• A slice of labels
• A Boolean array
Example:
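A minimal sketch of the four kinds of .loc[] input on an illustrative DataFrame (labels and values are assumptions). Note that the label slice includes its end label:

```python
import pandas as pd

# Hypothetical DataFrame with string row labels
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]},
                  index=['a', 'b', 'c', 'd'])

single = df.loc['b']              # single label -> row 'b' as a Series
subset = df.loc[['a', 'c'], 'A']  # list of row labels, one column
sliced = df.loc['b':'d']          # label slice: 'd' IS included
mask = df.loc[df['A'] > 2]        # Boolean array selects rows 'c' and 'd'
```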
.iloc[]
Pandas provides various ways to perform purely integer-based (positional) indexing with .iloc[]. As
in Python and NumPy, positions are 0-based.
The various access methods are as follows:
• An integer
• A list of integers
• A slice (range) of positions
Example:
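A minimal sketch of .iloc[] on the same kind of illustrative DataFrame (values are assumptions). Unlike a .loc[] label slice, a positional slice excludes its stop position:

```python
import pandas as pd

# Hypothetical DataFrame; the string labels are ignored by .iloc[]
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]},
                  index=['a', 'b', 'c', 'd'])

first_row = df.iloc[0]     # an integer -> first row
picked = df.iloc[[0, 2]]   # a list of integers -> rows at positions 0 and 2
middle = df.iloc[1:3]      # a slice: stop position 3 is excluded
cell = df.iloc[2, 1]       # row position 2, column position 1
```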
5. a) Explain data indexing & data selection for DataFrame object in Pandas.
b) Explain various universal functions performed on Series object in Pandas.
5A)
a)
The Python and NumPy indexing operator "[ ]" and the attribute operator "." provide quick and easy
access to Pandas data structures across a wide range of use cases. In addition, Pandas supports two
types of multi-axes indexing (older versions also had a third, .ix, which has since been removed):
1. .loc[]: label based
2. .iloc[]: integer (position) based
.loc[]
Pandas provides various ways to perform purely label-based indexing with .loc[]. When slicing, both
the start and the stop bounds are included. Integers are valid labels, but they refer to the label and
not to the position.
.loc[] accepts several kinds of input:
• A single scalar label
• A list of labels
• A slice of labels
• A Boolean array
Example:
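A sketch of DataFrame selection with "[ ]", the attribute operator ".", and .loc[] on rows and columns together. The student names, column names, and marks are illustrative assumptions:

```python
import pandas as pd

# Hypothetical marks table
df = pd.DataFrame({'math': [90, 75, 60], 'physics': [80, 65, 95]},
                  index=['Ana', 'Ben', 'Cy'])

col = df['math']                  # [ ] with a column name -> Series
same_col = df.math                # attribute access (column name must be a valid identifier)
cols = df[['math', 'physics']]    # list of column names -> DataFrame
value = df.loc['Ben', 'physics']  # row label + column label -> scalar
high_math = df.loc[df['math'] >= 75, 'physics']  # Boolean rows, one column
```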
.iloc[]
Pandas provides various ways to perform purely integer-based (positional) indexing with .iloc[]. As
in Python and NumPy, positions are 0-based.
The various access methods are as follows:
• An integer
• A list of integers
• A slice (range) of positions
Example:
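A sketch of positional selection on both axes with .iloc[], on an illustrative DataFrame (values are assumptions):

```python
import pandas as pd

# Hypothetical marks table
df = pd.DataFrame({'math': [90, 75, 60], 'physics': [80, 65, 95]},
                  index=['Ana', 'Ben', 'Cy'])

last_row = df.iloc[-1]       # negative positions count from the end
first_col = df.iloc[:, 0]    # all rows, column position 0
block = df.iloc[[0, 2], 1]   # rows 0 and 2, column position 1
```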
b)
One of the essential features of NumPy is the ability to perform quick element-wise operations, both
basic arithmetic and more sophisticated operations. Pandas inherits much of this functionality from
NumPy, in particular its ufuncs.
When these operations are used on a Pandas Series, they follow certain principles, namely index
preservation and index alignment.
Index Preservation
When we apply a NumPy ufunc to a Pandas Series, the result is another Series containing the
transformed values, and the index labels from before the operation are preserved after it.
Example:
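A minimal sketch with an illustrative Series (labels and values are assumptions): applying np.sqrt returns a Series that keeps the original index:

```python
import numpy as np
import pandas as pd

# Hypothetical Series with string labels
ser = pd.Series([1, 4, 9, 16], index=['a', 'b', 'c', 'd'])

roots = np.sqrt(ser)  # NumPy ufunc applied to a Series
print(roots)          # still a Series, with index a, b, c, d
```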
Index Alignment:
For binary operations on two Series objects, Pandas will align indices in the process of performing
the operation. This is very convenient when you are working with incomplete data.
Example:
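The original example is missing; the sketch below reconstructs the situation described next (values are illustrative assumptions). The last line, using Series.add() with fill_value, is an extra showing how to treat a missing side as 0 instead of producing NaN:

```python
import pandas as pd

# Hypothetical Series: s1 lacks index 3, s2 lacks index 0
s1 = pd.Series([10, 20, 30], index=[0, 1, 2])
s2 = pd.Series([1, 2, 3], index=[1, 2, 3])

total = s1 + s2   # result index is the union 0..3; 0 and 3 become NaN
print(total)

filled = s1.add(s2, fill_value=0)  # missing side treated as 0
```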
In the example, the first Series doesn’t contain index ‘3’ and the second Series doesn’t contain index
‘0’. Yet when we add the two Series, no error is raised; instead, the result contains NaN at index ‘0’
and index ‘3’, because those labels have a value in only one of the two Series.
Similarly, many arithmetic, trigonometric, statistical, and other operations can be used on Pandas
Series objects.
6A)
a)
One of the essential features of NumPy is the ability to perform quick element-wise operations, both
basic arithmetic and more sophisticated operations. Pandas inherits much of this functionality from
NumPy, in particular its ufuncs.
When these operations are used on Pandas DataFrames, they follow certain principles, namely
index preservation and index alignment.
Index Preservation
When we apply a NumPy ufunc to a Pandas DataFrame, the result is another DataFrame containing
the transformed values, and both the index and the column labels from before the operation are
preserved after it.
Example:
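A minimal sketch with an illustrative DataFrame (labels and values are assumptions): np.abs returns a DataFrame with the same index and columns:

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with labelled rows and columns
df = pd.DataFrame({'A': [-1, 2], 'B': [3, -4]}, index=['r1', 'r2'])

result = np.abs(df)  # NumPy ufunc applied to a DataFrame
print(result)        # same index (r1, r2) and columns (A, B)
```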
Index Alignment:
For binary operations on two DataFrame objects, Pandas will align indices in the process of
performing the operation. This is very convenient when you are working with incomplete data.
Example:
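The original example is missing; this sketch reconstructs the situation described next, where the first DataFrame has no row ‘C’ and no column ‘C’ (values are illustrative assumptions):

```python
import pandas as pd

# Hypothetical frames: df1 has rows/columns A, B; df2 has A, B, C
df1 = pd.DataFrame([[1, 2], [3, 4]], index=['A', 'B'], columns=['A', 'B'])
df2 = pd.DataFrame([[1, 1, 1], [1, 1, 1], [1, 1, 1]],
                   index=['A', 'B', 'C'], columns=['A', 'B', 'C'])

result = df1 + df2  # aligned on both index and columns
print(result)       # row 'C' and column 'C' are entirely NaN
```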
In the example, the first DataFrame doesn’t contain row ‘C’ or column ‘C’. After the addition, no error
is raised; instead, every value in row ‘C’ and in column ‘C’ of the result is NaN.
Similarly, many arithmetic, trigonometric, statistical, and other operations can be used on Pandas
DataFrame objects.
b)
The difference between data found in many tutorials and data in the real world is that real-world
data is rarely clean and homogeneous. In particular, many interesting datasets will have some
amount of data missing. To make matters even more complicated, different data sources may
indicate missing data in different ways.
To check for missing values in a Pandas DataFrame, we use the functions isnull() and notnull(). Both
functions check, element by element, whether a value is missing (NaN): isnull() returns True for
missing values and notnull() returns True for present ones. These functions can also be used on a
Pandas Series to find null values in it.
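A minimal sketch of isnull() and notnull() on an illustrative DataFrame and Series (values are assumptions). Summing the Boolean result is a common way to count missing values per column:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column
df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [None, 'x', 'y']})

missing = df.isnull()              # True where a value is missing
present = df.notnull()             # True where a value is present
per_column = df.isnull().sum()     # count of missing values per column

ser = pd.Series([1.0, None])
ser_missing = ser.isnull()         # works the same way on a Series
```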
We use functions such as fillna() and replace() to handle missing data in Pandas.
fillna()
This function fills the missing values with the value we pass to it as an argument.
Example:
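A minimal sketch of fillna() on an illustrative column (values are assumptions), filling with a constant and with the column mean:

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value
df = pd.DataFrame({'score': [10.0, np.nan, 30.0]})

filled = df.fillna(0)                                  # NaN -> 0
mean_filled = df['score'].fillna(df['score'].mean())  # NaN -> mean of 10 and 30 = 20
```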
replace()
The replace() function takes two arguments, ‘to_replace’ and ‘value’. ‘to_replace’ specifies which
value is to be replaced, and ‘value’ specifies what it should be replaced with.
Example:
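A minimal sketch of replace() with both keyword arguments (values are assumptions). The second call shows the reverse direction, turning a hypothetical sentinel value (-1) into NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical column: NaN is missing, -1 is a sentinel for "no data"
df = pd.DataFrame({'score': [10.0, np.nan, -1.0]})

cleaned = df.replace(to_replace=np.nan, value=0)   # NaN -> 0
as_nan = df.replace(to_replace=-1.0, value=np.nan) # sentinel -> NaN
```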
7. a) In hierarchical indexing what are the various methods of Multi-index Creation?
b) Explain in brief about concat and append methods in regards to combining datasets.
7A)
a)
To enhance data-processing capabilities, we need indexing that organizes data by more than one
label. Hierarchical indexing (MultiIndex) is an essential feature of pandas that lets us use multiple
index levels on a single axis.
Example:
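The original example is missing. A sketch of two ways to create a MultiIndex from arrays: passing a list of arrays directly as the index, and the explicit pd.MultiIndex.from_arrays() constructor. The cities, years, values, and level names are illustrative assumptions:

```python
import pandas as pd

# Hypothetical data: outer level = city, inner level = year
cities = ['Delhi', 'Delhi', 'Mumbai', 'Mumbai']
years = [2020, 2021, 2020, 2021]
values = [100, 120, 200, 210]

# 1) A list of arrays passed as the index creates a MultiIndex automatically
ser = pd.Series(values, index=[cities, years])
print(ser['Mumbai'])  # all entries under the outer label 'Mumbai'

# 2) The explicit constructor, with level names
idx = pd.MultiIndex.from_arrays([cities, years], names=['city', 'year'])
ser2 = pd.Series(values, index=idx)
```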
Creating multi-index using list of tuples:
For more flexibility in how the index is constructed, you can instead use the class method
constructors available in pd.MultiIndex. For example, pd.MultiIndex.from_tuples() builds the index
from a list of tuples, where each tuple gives the index values of one data point at every level.
Example:
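A sketch of pd.MultiIndex.from_tuples() on illustrative data (the tuples, values, and level names are assumptions). The related pd.MultiIndex.from_product() constructor is shown as an extra: it builds the same index as the Cartesian product of the level values:

```python
import pandas as pd

# Hypothetical (city, year) pairs, one tuple per data point
tuples = [('Delhi', 2020), ('Delhi', 2021), ('Mumbai', 2020), ('Mumbai', 2021)]
index = pd.MultiIndex.from_tuples(tuples, names=['city', 'year'])
ser = pd.Series([100, 120, 200, 210], index=index)

# Same index built as a Cartesian product of the two levels
same_index = pd.MultiIndex.from_product([['Delhi', 'Mumbai'], [2020, 2021]],
                                        names=['city', 'year'])
```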
b)
Pandas can combine Series and DataFrame objects (older versions also supported the now-removed
Panel object) using various kinds of set logic for the indexes and relational-algebra functionality.
concat()
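The concat() function performs a concatenation operation along an axis on Series or DataFrame
objects.
Syntax:
pd.concat(objs, axis=0, ignore_index=False)
Parameters:
objs: A sequence or mapping of Series or DataFrame objects.
axis: The axis to concatenate along; 0 (the default) stacks the objects as rows, 1 places them side by
side as columns.
ignore_index: If True, the original index values along the concatenation axis are discarded and the
result is labelled 0, 1, ..., n-1. The default is False.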
Example:
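A minimal sketch of pd.concat() on two illustrative DataFrames (values are assumptions), showing the effect of ignore_index and axis:

```python
import pandas as pd

# Hypothetical frames with the same columns
df1 = pd.DataFrame({'id': [1, 2], 'name': ['A', 'B']})
df2 = pd.DataFrame({'id': [3, 4], 'name': ['C', 'D']})

stacked = pd.concat([df1, df2])                        # index: 0, 1, 0, 1
renumbered = pd.concat([df1, df2], ignore_index=True)  # index: 0, 1, 2, 3
side_by_side = pd.concat([df1, df2], axis=1)           # 2 rows, 4 columns
```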
append()
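The append() method was a convenient shortcut for concatenating Series and DataFrame objects
along the rows.
Syntax:
dataframe_1.append(dataframe_2)
Note: DataFrame.append() and Series.append() were deprecated in pandas 1.4 and removed in
pandas 2.0; pd.concat([dataframe_1, dataframe_2]) is the replacement.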
Example:
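Because append() no longer exists in current pandas, this sketch shows the equivalent pd.concat() call instead; it is a substitute, not the removed method itself. The frames and values are illustrative assumptions:

```python
import pandas as pd

# Hypothetical frames
dataframe_1 = pd.DataFrame({'x': [1, 2]})
dataframe_2 = pd.DataFrame({'x': [3]})

# Equivalent of the removed dataframe_1.append(dataframe_2, ignore_index=True)
combined = pd.concat([dataframe_1, dataframe_2], ignore_index=True)
```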
8. a) Explain in brief about merge and join methods in regards to combining datasets.
b) Explain in brief about aggregation and grouping.
8A)
a)
Pandas DataFrame.merge()
Pandas merge() is defined as the process of bringing the two datasets together into one and aligning
the rows based on the common attributes or columns. It is an entry point for all standard database
join operations between DataFrame objects:
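Syntax:
DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False,
right_index=False, suffixes=('_x', '_y'))
(The function form pd.merge(left, right, ...) accepts the same parameters.)
Parameters:
how: The type of merge to perform: 'inner' (the default), 'left', 'right', 'outer', or 'cross'.
on: Column or index level name(s) to join on. They must be found in both the left and right
DataFrames. If on is None and not merging on indexes, this defaults to the intersection
of the columns in both DataFrames.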
Example:
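A minimal sketch of merge() on a shared 'id' column, using illustrative DataFrames (column names and values are assumptions):

```python
import pandas as pd

# Hypothetical tables sharing an 'id' column
left = pd.DataFrame({'id': [1, 2, 3], 'name': ['Ana', 'Ben', 'Cy']})
right = pd.DataFrame({'id': [2, 3, 4], 'marks': [70, 85, 90]})

inner = pd.merge(left, right, on='id')               # only ids in both: 2, 3
outer = pd.merge(left, right, on='id', how='outer')  # ids 1-4, NaN where missing
also_inner = left.merge(right, on='id')              # method form, same result
```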
Pandas DataFrame.join()
Another way to combine DataFrames is to match rows on values they have in common. Combining
DataFrames on common fields is called "joining", and the DataFrame method for it is join(). By
default, join() matches rows on the index labels of both DataFrames; the on parameter lets it use a
column of the calling DataFrame instead. The values used for matching are called the "join key".
Syntax:
DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
Parameters:
lsuffix: A string, with default value ''. The suffix added to the left frame's overlapping column
names.
rsuffix: A string, with default value ''. The suffix added to the right frame's overlapping column
names.
Example:
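A minimal sketch of join() on the index with overlapping column names (frames and values are illustrative assumptions). Without lsuffix/rsuffix, pandas raises a ValueError here because both frames have a 'score' column:

```python
import pandas as pd

# Hypothetical frames that both have a 'score' column
left = pd.DataFrame({'score': [10, 20]}, index=['a', 'b'])
right = pd.DataFrame({'score': [30, 40]}, index=['b', 'c'])

# Default how='left': keep all rows of 'left', matched on the index
joined = left.join(right, lsuffix='_left', rsuffix='_right')
print(joined)  # row 'a' has NaN in score_right
```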
b)
Aggregation in Pandas
Aggregation in pandas provides various functions that perform a mathematical or logical operation
on our dataset and return a summary of the result. Aggregation can be used to summarize the
columns of our dataset, for example to get the sum, minimum, or maximum of a particular column.
The method used for aggregation is agg(); its argument is the function (or list of functions) we want
to apply.
Example:
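A minimal sketch of agg() on an illustrative marks table (the 'Maths'/'Science' columns and values are assumptions), passing a list of function names and a single one:

```python
import pandas as pd

# Hypothetical marks table
df = pd.DataFrame({'Maths': [80, 90, 70], 'Science': [60, 75, 85]})

summary = df.agg(['sum', 'min', 'max'])  # one row per function, one column per column
maths_mean = df['Maths'].agg('mean')     # 80.0
```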
Grouping in Pandas
Grouping is used to group the data in a dataset according to some criteria. It follows the split-apply-
combine strategy: split the data into groups, apply a function to each group, and combine the results.
We use the groupby() function to group the data by the values in the “Maths” column. It returns a
GroupBy object rather than a DataFrame. To view the resulting groups, we can call first(), which
returns the first non-null value of each column within each group.
Example:
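A minimal sketch of grouping on a “Maths” column, using an illustrative DataFrame (names and marks are assumptions). first() views the groups, and mean() is an example of the apply and combine steps:

```python
import pandas as pd

# Hypothetical student data
df = pd.DataFrame({'Name': ['Ana', 'Ben', 'Cy', 'Dev'],
                   'Maths': [80, 90, 80, 90],
                   'Science': [60, 75, 80, 95]})

grouped = df.groupby('Maths')              # split rows by their Maths value
first_rows = grouped.first()               # first entry per group: Ana (80), Ben (90)
mean_science = grouped['Science'].mean()   # apply + combine: 80 -> 70.0, 90 -> 85.0
```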