R23 2-1 Python Lab 4 J5
PYTHON LAB

UNIT – IV
Sample Experiments:
1) Write a program to sort words in a file and put them in another file. The output file
should have only lower-case words, so any upper-case words from the source must be
converted to lower case?
Program :
# Function to sort words in a file and write to another file
def sort_words_in_file(input_file, output_file):
    try:
        # Open the input file and read the content
        with open(input_file, 'r') as infile:
            # Read all lines, split into words, and convert to lowercase
            words = infile.read().split()
            words = [word.lower() for word in words]

        # Sort the words alphabetically
        words.sort()

        # Open the output file and write the sorted words
        with open(output_file, 'w') as outfile:
            for word in words:
                outfile.write(word + '\n')

        print(f"Words have been sorted and written to {output_file}")

    except FileNotFoundError:
        print(f"The file {input_file} does not exist.")
    except Exception as e:
        print(f"An error occurred: {e}")

# Example usage
input_file = 'source.txt'           # Input file containing words
output_file = 'sorted_words.txt'    # Output file to store sorted words
sort_words_in_file(input_file, output_file)

Explanation:
1. Reading the File: The program reads all the words from the input_file, splits them into
a list of words, and converts them to lowercase.
2. Sorting: The list of words is sorted alphabetically.
3. Writing to Output File: The sorted words are written to the output_file, one word per
line.
4. Error Handling: The program checks for file not found errors and handles other
potential exceptions.
You can modify the file paths (input_file, output_file) to match your setup.

2) Python program to print each line of a file in reverse order?


Program:
# Function to print each line of a file in reverse order
def print_reverse_lines(file_name):
    try:
        # Open the file in read mode
        with open(file_name, 'r') as file:
            # Read all lines from the file
            lines = file.readlines()

        # Iterate through each line
        for line in lines:
            # Strip newline characters and reverse the line
            reversed_line = line.rstrip()[::-1]
            print(reversed_line)

    except FileNotFoundError:
        print(f"The file {file_name} does not exist.")
    except Exception as e:
        print(f"An error occurred: {e}")

# Example usage
file_name = 'source.txt'  # Input file name
print_reverse_lines(file_name)
Explanation:
1. Reading the File: The program reads the file line by line using readlines().
2. Reversing the Line: For each line, it removes the newline character (rstrip()) and
reverses the string using slicing ([::-1]).
3. Printing: The reversed line is printed out.
4. Error Handling: It handles the case where the file may not exist and catches any
other errors.
To use this program, set the file_name variable to the path of your input file.
3) Python program to compute the number of characters, words and lines in a file?
Program:
# Function to compute characters, words, and lines in a file
def count_file_details(file_name):
    try:
        # Initialize counters for characters, words, and lines
        num_characters = 0
        num_words = 0
        num_lines = 0

        # Open the file in read mode
        with open(file_name, 'r') as file:
            # Iterate through each line in the file
            for line in file:
                # Increment line count
                num_lines += 1

                # Count characters in the line (including spaces and newline characters)
                num_characters += len(line)

                # Split the line into words and count them
                words = line.split()
                num_words += len(words)

        # Print the results
        print(f"Number of characters: {num_characters}")
        print(f"Number of words: {num_words}")
        print(f"Number of lines: {num_lines}")

    except FileNotFoundError:
        print(f"The file {file_name} does not exist.")
    except Exception as e:
        print(f"An error occurred: {e}")

# Example usage
file_name = 'source.txt'  # Input file name
count_file_details(file_name)
Explanation:
1. Reading the File: The file is read line by line using a for loop.
2. Character Counting: The length of each line (including spaces and newline
characters) is added to num_characters.
3. Word Counting: Each line is split into words using split(), and the number of words is
updated.
4. Line Counting: For each line in the file, num_lines is incremented.
5. Error Handling: The code handles cases where the file may not exist.
To run the program, set file_name to the path of the file you want to analyze.
4) Write a program to create, display, append, insert and reverse the order of the items
in the array?
Program:
from array import array

# Function to create and return an array
def create_array(elements):
    arr = array('i', elements)  # 'i' represents an array of integers
    return arr

# Function to display array
def display_array(arr):
    print("Array Contents: ", arr.tolist())

# Function to append an item to the array
def append_item(arr, item):
    arr.append(item)
    print(f"Appended {item} to the array.")
    return arr

# Function to insert an item at a specific position in the array
def insert_item(arr, index, item):
    arr.insert(index, item)
    print(f"Inserted {item} at index {index}.")
    return arr

# Function to reverse the order of items in the array
def reverse_array(arr):
    arr.reverse()
    print("Array reversed.")
    return arr

# Example usage
if __name__ == "__main__":
    # Create an array with initial values
    elements = [10, 20, 30, 40, 50]
    arr = create_array(elements)

    # Display the array
    display_array(arr)

    # Append an item to the array
    arr = append_item(arr, 60)
    display_array(arr)

    # Insert an item at a specific position in the array
    arr = insert_item(arr, 2, 25)  # Insert 25 at index 2
    display_array(arr)

    # Reverse the array
    arr = reverse_array(arr)
    display_array(arr)
Explanation:
1. create_array(): Creates an array of integers ('i' denotes integer type) from a list of
elements.
2. display_array(): Displays the contents of the array.
3. append_item(): Appends a new element to the end of the array.
4. insert_item(): Inserts an element at a specified index in the array.
5. reverse_array(): Reverses the order of the elements in the array.
Output:
Array Contents: [10, 20, 30, 40, 50]
Appended 60 to the array.
Array Contents: [10, 20, 30, 40, 50, 60]
Inserted 25 at index 2.
Array Contents: [10, 20, 25, 30, 40, 50, 60]
Array reversed.
Array Contents: [60, 50, 40, 30, 25, 20, 10]

This program handles integer arrays and provides the basic operations required. You
can easily modify it to work with other data types if needed.
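For instance (an illustrative sketch, not part of the original exercise), the same operations work on a
floating-point array simply by swapping the type code 'i' for 'd':
from array import array

# 'd' creates an array of double-precision floats instead of integers
float_arr = array('d', [1.5, 2.5, 3.5])
float_arr.append(4.5)       # add 4.5 at the end
float_arr.insert(1, 1.75)   # insert 1.75 at index 1
float_arr.reverse()         # reverse the order of items
print(float_arr.tolist())   # [4.5, 3.5, 2.5, 1.75, 1.5]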

5) Write a python program to create a class that represents a shape. Include methods to
calculate its area and perimeter. Implement subclasses for different shapes like circle, triangle
and square?

Program:

import math

# Base class for shapes
class Shape:
    def area(self):
        pass

    def perimeter(self):
        pass

# Subclass for Circle
class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

    def perimeter(self):
        return 2 * math.pi * self.radius

# Subclass for Triangle
class Triangle(Shape):
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

    def area(self):
        # Using Heron's formula to calculate area
        s = (self.a + self.b + self.c) / 2
        return math.sqrt(s * (s - self.a) * (s - self.b) * (s - self.c))

    def perimeter(self):
        return self.a + self.b + self.c

# Subclass for Square
class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2

    def perimeter(self):
        return 4 * self.side

# Example usage
if __name__ == "__main__":
    # Create a Circle object
    circle = Circle(5)
    print(f"Circle: Area = {circle.area():.2f}, Perimeter = {circle.perimeter():.2f}")

    # Create a Triangle object
    triangle = Triangle(3, 4, 5)
    print(f"Triangle: Area = {triangle.area():.2f}, Perimeter = {triangle.perimeter():.2f}")

    # Create a Square object
    square = Square(4)
    print(f"Square: Area = {square.area():.2f}, Perimeter = {square.perimeter():.2f}")

Explanation:
1. Shape (Base Class): This is the base class that defines the structure for other shapes.
It has two methods, area() and perimeter(), which are placeholders and do nothing in
the base class.
2. Circle (Subclass): Inherits from Shape. It overrides the area() and perimeter()
methods to calculate the area and perimeter of a circle, given its radius.
3. Triangle (Subclass): Inherits from Shape. It implements area() using Heron’s
formula, which calculates the area based on the lengths of the three sides, and
perimeter() calculates the sum of the sides.
4. Square (Subclass): Inherits from Shape. It implements area() and perimeter() for a
square given its side length.
Output:
Circle: Area = 78.54, Perimeter = 31.42
Triangle: Area = 6.00, Perimeter = 12.00
Square: Area = 16.00, Perimeter = 16.00
You can create more subclasses for other shapes by extending the Shape class and
implementing their specific formulas for area and perimeter.
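For example, a Rectangle subclass (an illustrative addition, not part of the original exercise) can reuse
the same structure:
# Hypothetical Rectangle subclass extending the same Shape base class
class Rectangle(Shape):
    def __init__(self, length, width):
        self.length = length
        self.width = width

    def area(self):
        return self.length * self.width

    def perimeter(self):
        return 2 * (self.length + self.width)

rect = Rectangle(3, 5)
print(f"Rectangle: Area = {rect.area():.2f}, Perimeter = {rect.perimeter():.2f}")
# Rectangle: Area = 15.00, Perimeter = 16.00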
UNIT – V
INTRODUCTION TO DATA SCIENCE
Functional Programming
Functional programming is a paradigm in software development that treats computation as
the evaluation of mathematical functions and avoids changing-state or mutable data. In the
context of data science, functional programming offers several advantages, such as cleaner
code, fewer bugs, and easier parallelization, making it particularly suited for handling large-
scale data analysis.

Key Concepts of Functional Programming in Data Science


1. Pure Functions: A core principle of functional programming is the use of pure
functions, which always produce the same output for the same input and have no side
effects (i.e., they do not modify variables outside their scope). This predictability is
beneficial in data science, where complex transformations often require traceability
and reproducibility.
Example in Python:
# Pure function to normalize a dataset column
def normalize(column):
    min_value = min(column)
    max_value = max(column)
    return [(x - min_value) / (max_value - min_value) for x in column]
2. First-Class and Higher-Order Functions: In functional programming, functions are
treated as "first-class citizens," meaning they can be passed around like any other value. A
higher-order function is a function that takes other functions as arguments or returns a
function as a result.
In data science, this is useful for operations like applying transformations to datasets, filtering
data, or aggregating results.
Example using map and filter:
data = [1, 2, 3, 4, 5, 6]

# Using map to apply a function to all elements
squares = list(map(lambda x: x ** 2, data))

# Using filter to keep only the even numbers
even_numbers = list(filter(lambda x: x % 2 == 0, data))

print(squares)        # Output: [1, 4, 9, 16, 25, 36]
print(even_numbers)   # Output: [2, 4, 6]
3. Immutability: Functional programming emphasizes immutability, meaning data structures
are never modified once created. Instead, new data structures are created. This ensures that
functions do not have unintended side effects, making the code more reliable and easier to
debug.
In data science, immutability is critical when dealing with parallel or distributed
computations (e.g., in Hadoop or Spark), where mutating shared data can lead to
inconsistencies and hard-to-detect bugs.
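Example (an illustrative sketch; Python lists are themselves mutable, so the functional style simply
returns a new list instead of modifying the input):
# Immutable style: build a new list instead of changing the original
def add_bonus(salaries, bonus):
    return [s + bonus for s in salaries]  # the input list is left untouched

salaries = [50000, 60000, 70000]
updated = add_bonus(salaries, 5000)
print(salaries)  # [50000, 60000, 70000] -- unchanged
print(updated)   # [55000, 65000, 75000]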
4. Recursion: Functional programming often uses recursion instead of iterative loops. While
recursion can be less efficient due to stack size limits, functional languages optimize tail-
recursive calls to mitigate this. In data science, recursion can be used in algorithms like tree
traversals or divide-and-conquer techniques (e.g., quicksort, merge sort).
Example:
# Recursive function to calculate factorial
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)

print(factorial(5))  # Output: 120


5. Lazy Evaluation: Functional programming languages often employ lazy evaluation, where
expressions are not evaluated until their values are needed. This can improve performance,
particularly when working with large datasets, as it prevents unnecessary computations.
In Python, this can be done using generators, which are iterators that yield values one at a
time and only when required.
Example:
# Generator to produce squares of numbers lazily
def lazy_squares(n):
    for i in range(n):
        yield i ** 2

squares = lazy_squares(10)
for square in squares:
    print(square)
6. Functional Libraries for Data Science:
 Pandas: While not purely functional, many Pandas methods allow for functional-style
operations, such as using apply, map, and transform to work on DataFrames.
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

df['A_normalized'] = df['A'].apply(lambda x: (x - df['A'].min()) / (df['A'].max() - df['A'].min()))
print(df)

 Toolz and CyToolz: These are Python libraries that bring many functional
programming tools, such as curry, compose, and partial, which help in chaining and
composing functions.

from toolz import curry

@curry
def add(x, y):
    return x + y

add_five = add(5)
print(add_five(10))  # Output: 15

Advantages of Functional Programming in Data Science


1. Modularity and Reusability: Functional code is modular, and functions are reusable.
This makes it easier to test, maintain, and scale data science projects.
2. Parallelization: Since functional programming avoids mutable state, it’s easier to
parallelize computations, which is useful for big data applications like machine
learning and distributed computing.
3. Ease of Testing: Pure functions are easier to test since they are deterministic. The
same input always gives the same output, so unit tests can be written more
confidently.
4. Data Transformation: Functional programming excels at transforming data through
maps, filters, and reduces. This paradigm aligns well with data processing workflows
in data science.
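As an illustrative sketch of the map-filter-reduce pattern mentioned in point 4 (the order values below
are made up for the example; reduce comes from Python's built-in functools module):
from functools import reduce

orders = [120, 45, 300, 80, 210]

# map: apply a 10% discount, filter: keep orders above 100, reduce: sum the result
discounted = map(lambda x: x * 0.9, orders)
large = filter(lambda x: x > 100, discounted)
total = reduce(lambda acc, x: acc + x, large, 0)
print(total)  # Output: 567.0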
Functional Programming in Tools like Spark
In big data frameworks like Apache Spark, the functional programming paradigm is
embraced. For example, Spark’s RDD (Resilient Distributed Dataset) API heavily
uses functional constructs like map, filter, and reduce for distributed data processing:

# Spark example using map and filter
from pyspark import SparkContext

sc = SparkContext()
data = sc.parallelize([1, 2, 3, 4, 5])

# Using map and filter
squared = data.map(lambda x: x ** 2)
filtered = squared.filter(lambda x: x > 10)

print(filtered.collect())  # Output: [16, 25]

Functional programming is highly applicable in data science due to its emphasis on immutability, pure
functions, and higher-order functions, which lead to more predictable and easier-to-reason-about code.
By integrating functional techniques, data scientists can write efficient, modular, and parallelizable
code, which is crucial for analyzing large datasets and building scalable data pipelines.

JSON and XML with Python, Numpy with Python, Pandas


JSON and XML in Data Science with Python
In data science, JSON (JavaScript Object Notation) and XML (eXtensible Markup
Language) are widely used formats for data exchange and storage. Python provides powerful
libraries like json for JSON handling and xml.etree.ElementTree for XML parsing, making it
easy to work with structured data.
JSON in Python
JSON is a lightweight, human-readable data format often used for data interchange in web
applications. Python’s json module makes it simple to work with JSON data.
 Loading JSON from a file:

import json

# Reading JSON data from a file
with open('data.json', 'r') as file:
    data = json.load(file)
print(data)

 Converting a dictionary to a JSON string:

# Creating a dictionary
person = {
    "name": "Alice",
    "age": 30,
    "city": "New York"
}

# Convert to JSON string
json_str = json.dumps(person)
print(json_str)

 Writing JSON data to a file:

# Writing JSON to a file
with open('output.json', 'w') as file:
    json.dump(person, file)
XML in Python
XML is used to store and transport data, often in a hierarchical structure. Python’s
xml.etree.ElementTree library provides tools to parse XML documents.
 Parsing an XML file:

import xml.etree.ElementTree as ET

# Parsing XML data from a file
tree = ET.parse('data.xml')
root = tree.getroot()

# Printing the root tag
print(root.tag)

# Iterating through the XML elements
for child in root:
    print(child.tag, child.attrib)

 Creating and writing an XML file:

# Creating an XML structure
root = ET.Element("person")
name = ET.SubElement(root, "name")
name.text = "Alice"
age = ET.SubElement(root, "age")
age.text = "30"

# Converting to a string and saving to a file
tree = ET.ElementTree(root)
tree.write("output.xml")

NumPy in Python
NumPy is the fundamental package for scientific computing in Python. It adds support for
large, multi-dimensional arrays and matrices, along with a large collection of mathematical
functions to operate on these arrays.
 Creating a NumPy array:

import numpy as np

# Creating an array from a list
arr = np.array([1, 2, 3, 4])
print(arr)

 Array operations:

arr = np.array([1, 2, 3, 4])

# Adding 10 to each element
arr = arr + 10
print(arr)  # Output: [11 12 13 14]

 Basic operations with arrays:

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Element-wise addition
c = a + b
print(c)  # Output: [5 7 9]

 Array reshaping:

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Reshape into 3x2
reshaped = arr.reshape(3, 2)
print(reshaped)

 Matrix multiplication:

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Matrix multiplication
result = np.dot(a, b)
print(result)

Pandas in Python
Pandas is a powerful data manipulation library in Python that provides data structures
like DataFrames and Series for working with structured data.

 Creating a DataFrame:

import pandas as pd

# Creating a DataFrame from a dictionary
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "San Francisco", "Los Angeles"]
}

df = pd.DataFrame(data)
print(df)

 Reading data from a CSV file:

# Reading a CSV file into a DataFrame
df = pd.read_csv('data.csv')
print(df.head())  # Print first 5 rows

 Filtering DataFrame:

# Filter rows where Age is greater than 30
filtered_df = df[df['Age'] > 30]
print(filtered_df)

 Writing DataFrame to a CSV file:

# Write DataFrame to CSV
df.to_csv('output.csv', index=False)

 Handling missing data:

# Replace NaN values with a specific value
df.fillna(0, inplace=True)
print(df)

 Basic operations:

# Adding a new column
df['Salary'] = [50000, 60000, 70000]

# Calculating the mean age
mean_age = df['Age'].mean()
print(f"Mean Age: {mean_age}")

Combining Pandas, NumPy, JSON, and XML

 Reading JSON into a Pandas DataFrame:

df = pd.read_json('data.json')
print(df)

 Converting a DataFrame to JSON:

json_data = df.to_json(orient='records')
print(json_data)

 Using NumPy with Pandas:

df['Age'] = np.log(df['Age'])  # Apply NumPy function to a DataFrame column
print(df)

 Exporting DataFrame to XML:

import dicttoxml

# Convert DataFrame to dictionary
data_dict = df.to_dict(orient='records')

# Convert dictionary to XML
xml_data = dicttoxml.dicttoxml(data_dict)
print(xml_data)

 JSON and XML are essential formats for data interchange in data science. Python
offers robust support for both using the json and xml modules.
 NumPy is a key library for numerical operations, making it invaluable in data
analysis and machine learning.
 Pandas is the go-to library for data manipulation and analysis, providing flexible
data structures for handling diverse types of data.
Visual Aids for Exploratory Data Analysis (EDA)
Exploratory Data Analysis (EDA) is a crucial step in data analysis that allows data scientists
and analysts to summarize the main characteristics of the data, often using visual methods.
Visualizations help uncover patterns, trends, relationships, and anomalies in the data. This
guide provides an overview of technical requirements, various types of visualizations, and
considerations for choosing the best chart for EDA using the Seaborn library in Python.
Technical Requirements
To perform EDA with visualizations using Seaborn, ensure you have the following libraries
installed:
1. Python: A programming language that you’ll be using for data analysis.
2. Pandas: For data manipulation and analysis.
3. Matplotlib: A plotting library for creating static, animated, and interactive
visualizations.
4. Seaborn: A statistical data visualization library based on Matplotlib that provides a
high-level interface for drawing attractive graphics.
You can install these libraries using pip:
pip install pandas matplotlib seaborn
Visualizations in EDA
1. Line Chart
Description: A line chart is used to display data points over time, showcasing trends in a time
series dataset.
When to Use: Ideal for visualizing continuous data, especially time series data.
Example:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Sample data
data = {'Year': [2017, 2018, 2019, 2020, 2021],
'Sales': [200, 250, 300, 350, 400]}
df = pd.DataFrame(data)
# Line chart
sns.lineplot(x='Year', y='Sales', data=df)
plt.title('Sales Over Years')
plt.xlabel('Year')
plt.ylabel('Sales')
plt.show()
2. Bar Chart
Description: A bar chart represents categorical data with rectangular bars, where the length
of each bar is proportional to the value it represents.
When to Use: Suitable for comparing different categories.
Example:
# Sample data
data = {'Product': ['A', 'B', 'C', 'D'],
'Sales': [150, 200, 250, 300]}
df = pd.DataFrame(data)

# Bar chart
sns.barplot(x='Product', y='Sales', data=df)
plt.title('Sales by Product')
plt.xlabel('Product')
plt.ylabel('Sales')
plt.show()
3. Scatter Plot
Description: A scatter plot displays values for typically two variables for a set of data,
showing the relationship between them.
When to Use: Useful for identifying relationships or correlations between two numerical
variables.
Example:
# Sample data
data = {'Height': [5.1, 5.5, 6.0, 5.8, 5.7],
'Weight': [100, 150, 180, 175, 160]}
df = pd.DataFrame(data)

# Scatter plot
sns.scatterplot(x='Height', y='Weight', data=df)
plt.title('Height vs. Weight')
plt.xlabel('Height (inches)')
plt.ylabel('Weight (lbs)')
plt.show()
4. Polar Chart
Description: A polar chart represents data in a circular format, where each point is
determined by an angle and a radius.
When to Use: Ideal for visualizing data with a cyclical nature, such as wind direction or
periodic functions.
Example:
import numpy as np

# Sample data
categories = ['A', 'B', 'C', 'D']
values = [4, 2, 5, 3]

# Polar chart
angles = np.linspace(0, 2 * np.pi, len(categories), endpoint=False).tolist()
values += values[:1]
angles += angles[:1]

fig, ax = plt.subplots(figsize=(6, 6), subplot_kw=dict(polar=True))

ax.fill(angles, values, color='blue', alpha=0.25)
ax.set_yticklabels([])
ax.set_xticks(angles[:-1])
ax.set_xticklabels(categories)
plt.title('Polar Chart Example')
plt.show()
5. Histogram
Description: A histogram is a graphical representation of the distribution of numerical data,
showing the number of data points that fall within specified ranges (bins).
When to Use: Useful for understanding the distribution and frequency of numerical data.
Example:
# Sample data
data = {'Values': [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6]}
df = pd.DataFrame(data)

# Histogram
sns.histplot(df['Values'], bins=5, kde=True)
plt.title('Value Distribution')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.show()
Choosing the Best Chart
Selecting the appropriate chart depends on the data type and the analysis objective:
1. Line Chart: Use for continuous data, especially time series.
2. Bar Chart: Use for comparing categories or discrete data.
3. Scatter Plot: Use for examining relationships between two quantitative variables.
4. Polar Chart: Use for cyclical data or displaying relationships in a circular format.
5. Histogram: Use for displaying the distribution of a single quantitative variable.
Visualization plays a critical role in EDA, enabling better understanding and interpretation of
data. Tools like Seaborn provide a powerful way to create meaningful visualizations with just
a few lines of code. By using the appropriate type of visualization, analysts can effectively
communicate their findings and insights derived from the data.

Sample Experiments:
1) Python program to check whether a JSON string contains complex object or not?
Program:
import json

def is_complex_object(obj):
    """Return True if the parsed JSON value contains a complex (nested) structure."""
    # A JSON array is treated as complex; a JSON object is complex only if
    # at least one of its values is itself an object or an array
    if isinstance(obj, list):
        return True
    if isinstance(obj, dict):
        return any(isinstance(value, (dict, list)) for value in obj.values())
    return False

def check_json_complexity(json_string):
    """Check if a JSON string contains complex objects."""
    try:
        # Parse the JSON string
        parsed_json = json.loads(json_string)

        # Check if the parsed JSON contains complex objects
        return is_complex_object(parsed_json)
    except json.JSONDecodeError:
        print("Invalid JSON string.")
        return None

# Example usage
json_string_1 = '{"name": "John", "age": 30, "city": "New York"}'  # Simple object
json_string_2 = '{"name": "John", "age": 30, "address": {"street": "123 Main St", "city": "New York"}}'  # Complex object
json_string_3 = '[1, 2, 3, 4]'  # List (complex object)
json_string_4 = '"Just a string"'  # Simple string

print(check_json_complexity(json_string_1))  # Output: False
print(check_json_complexity(json_string_2))  # Output: True
print(check_json_complexity(json_string_3))  # Output: True
print(check_json_complexity(json_string_4))  # Output: False
Explanation
1. Function is_complex_object:
o This function checks whether the parsed value contains nested structure: a JSON
array, or a JSON object whose values include another object or array. Such values
are considered complex JSON objects.
2. Function check_json_complexity:
o This function takes a JSON string as input.
o It attempts to parse the JSON string using json.loads().
o If parsing is successful, it checks whether the resulting object is complex using
the is_complex_object function.
o If the JSON string is invalid, it catches the JSONDecodeError and prints a
message.
3. Example Usage:
o Several JSON strings are provided as examples to demonstrate the
functionality, checking both simple and complex objects.

Output

When you run the program, it will print:

False

True

True

False

This indicates whether each JSON string contains complex objects.


2) Python program to demonstrate Numpy arrays creation using array() function?
Program:
import numpy as np

# 1. Creating a 1D NumPy array
array_1d = np.array([1, 2, 3, 4, 5])
print("1D Array:")
print(array_1d)

# 2. Creating a 2D NumPy array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("\n2D Array:")
print(array_2d)

# 3. Creating a 3D NumPy array
array_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print("\n3D Array:")
print(array_3d)

# 4. Creating an array with different data types
array_float = np.array([1, 2, 3], dtype=float)
print("\nArray with float data type:")
print(array_float)

# 5. Creating an array of zeros
array_zeros = np.array([[0, 0, 0], [0, 0, 0]])
print("\nArray of Zeros:")
print(array_zeros)

# 6. Creating an array of ones
array_ones = np.array([[1, 1, 1], [1, 1, 1]])
print("\nArray of Ones:")
print(array_ones)

# 7. Creating an array with a specific data type
array_string = np.array(['a', 'b', 'c'], dtype=str)
print("\nArray with string data type:")
print(array_string)

# 8. Creating an empty array
array_empty = np.empty((2, 3))  # shape (2, 3)
print("\nEmpty Array:")
print(array_empty)
Explanation
1. Import NumPy: The program starts by importing the NumPy library as np.
2. Creating 1D Array: A one-dimensional array is created using a list of integers.
3. Creating 2D Array: A two-dimensional array is created using a list of lists.
4. Creating 3D Array: A three-dimensional array is created using nested lists.
5. Creating Array with Different Data Types: An array is created with a specified data
type (float in this case).
6. Creating Array of Zeros: An array filled with zeros is created.
7. Creating Array of Ones: An array filled with ones is created.
8. Creating an Array with a Specific Data Type: An array with string data type is
created.
9. Creating an Empty Array: An empty array is created, which will contain
uninitialized data.
Output
When you run the program, it will produce output similar to the following:
1D Array:
[1 2 3 4 5]

2D Array:
[[1 2 3]
[4 5 6]]

3D Array:
[[[1 2]
[3 4]]

[[5 6]
[7 8]]]

Array with float data type:


[1. 2. 3.]

Array of Zeros:
[[0 0 0]
[0 0 0]]

Array of Ones:
[[1 1 1]
[1 1 1]]

Array with string data type:


['a' 'b' 'c']

Empty Array:
[[0. 0. 0.]
[0. 0. 0.]]
This output demonstrates the different types of NumPy arrays that can be created using the
array() function.
3) Python program to demonstrate use of ndim, shape, size, dtype?
Program:
import numpy as np

# Creating a 1D NumPy array
array_1d = np.array([1, 2, 3, 4, 5])
print("1D Array:")
print(array_1d)
print("Number of dimensions (ndim):", array_1d.ndim)
print("Shape:", array_1d.shape)
print("Size:", array_1d.size)
print("Data type (dtype):", array_1d.dtype)

# Creating a 2D NumPy array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("\n2D Array:")
print(array_2d)
print("Number of dimensions (ndim):", array_2d.ndim)
print("Shape:", array_2d.shape)
print("Size:", array_2d.size)
print("Data type (dtype):", array_2d.dtype)

# Creating a 3D NumPy array
array_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print("\n3D Array:")
print(array_3d)
print("Number of dimensions (ndim):", array_3d.ndim)
print("Shape:", array_3d.shape)
print("Size:", array_3d.size)
print("Data type (dtype):", array_3d.dtype)

# Creating an array with a specific data type
array_float = np.array([1, 2, 3], dtype=float)
print("\nArray with float data type:")
print(array_float)
print("Number of dimensions (ndim):", array_float.ndim)
print("Shape:", array_float.shape)
print("Size:", array_float.size)
print("Data type (dtype):", array_float.dtype)
# Creating a multi-dimensional array of zeros
array_zeros = np.zeros((2, 3)) # shape (2, 3)
print("\nArray of Zeros:")
print(array_zeros)
print("Number of dimensions (ndim):", array_zeros.ndim)
print("Shape:", array_zeros.shape)
print("Size:", array_zeros.size)
print("Data type (dtype):", array_zeros.dtype)
Explanation
1. Import NumPy: The program starts by importing the NumPy library as np.
2. Creating Arrays:
o A 1D array is created and its properties are printed.
o A 2D array is created and its properties are printed.
o A 3D array is created and its properties are printed.
o An array with a specific data type (float) is created, and its properties are
printed.
o An array of zeros with a specified shape (2x3) is created, and its properties
are printed.
3. Accessing Properties:
o ndim: Returns the number of dimensions of the array.
o shape: Returns a tuple representing the dimensions of the array.
o size: Returns the total number of elements in the array.
o dtype: Returns the data type of the array's elements.
Output
When you run the program, it will produce output similar to the following:
1D Array:
[1 2 3 4 5]
Number of dimensions (ndim): 1
Shape: (5,)
Size: 5
Data type (dtype): int64
2D Array:
[[1 2 3]
[4 5 6]]
Number of dimensions (ndim): 2
Shape: (2, 3)
Size: 6
Data type (dtype): int64

3D Array:
[[[1 2]
[3 4]]

[[5 6]
[7 8]]]
Number of dimensions (ndim): 3
Shape: (2, 2, 2)
Size: 8
Data type (dtype): int64

Array with float data type:


[1. 2. 3.]
Number of dimensions (ndim): 1
Shape: (3,)
Size: 3
Data type (dtype): float64

Array of Zeros:
[[0. 0. 0.]
[0. 0. 0.]]
Number of dimensions (ndim): 2
Shape: (2, 3)
Size: 6
Data type (dtype): float64
This output demonstrates the different properties of the NumPy arrays that can be accessed
using ndim, shape, size, and dtype.
4) Python program to demonstrate basic slicing, integer, and boolean indexing?
Program:
import numpy as np

# Creating a 1D NumPy array
array_1d = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
print("1D Array:")
print(array_1d)

# Basic slicing
print("\nBasic Slicing:")
print("Elements from index 2 to 5:", array_1d[2:6])  # Slicing from index 2 to 5

# Integer indexing
print("\nInteger Indexing:")
indices = [0, 2, 4]
print("Elements at indices 0, 2, 4:", array_1d[indices])  # Fetching elements at specified indices

# Creating a 2D NumPy array
array_2d = np.array([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]])
print("\n2D Array:")
print(array_2d)

# Basic slicing for 2D array
print("\nBasic Slicing for 2D Array:")
print("First two rows and all columns:")
print(array_2d[:2, :])  # Slicing first two rows and all columns

# Integer indexing in 2D array
print("\nInteger Indexing in 2D Array:")
print("Elements at positions (0, 1), (1, 2):")
print(array_2d[[0, 1], [1, 2]])  # Fetching elements at specified row and column indices

# Boolean indexing
print("\nBoolean Indexing:")
boolean_mask = (array_2d > 5)  # Create a boolean mask where elements greater than 5 are True
print("Original Array:")
print(array_2d)
print("Boolean Mask:")
print(boolean_mask)
print("Elements greater than 5:")
print(array_2d[boolean_mask])  # Fetching elements that satisfy the boolean condition
Explanation
1. Import NumPy: The program starts by importing the NumPy library as np.
2. Creating Arrays:
o A 1D array is created with integer values.
o A 2D array is created as a 3x3 matrix.
3. Basic Slicing:
o For the 1D array, it demonstrates slicing by selecting elements from index 2 to
5 (inclusive of 2 and exclusive of 6).
o For the 2D array, it slices the first two rows and all columns.
4. Integer Indexing:
o It retrieves specific elements from the 1D array using an array of indices.
o In the 2D array, it uses integer indexing to fetch elements at specific row and
column indices.
5. Boolean Indexing:
o A boolean mask is created by checking which elements in the 2D array are
greater than 5.
o It then uses this boolean mask to retrieve only those elements from the array
that satisfy the condition.
Output
1D Array:
[10 20 30 40 50 60 70 80 90]

Basic Slicing:
Elements from index 2 to 5: [30 40 50 60]

Integer Indexing:
Elements at indices 0, 2, 4: [10 30 50]

2D Array:
[[1 2 3]
[4 5 6]
[7 8 9]]

Basic Slicing for 2D Array:


First two rows and all columns:
[[1 2 3]
[4 5 6]]

Integer Indexing in 2D Array:


Elements at positions (0, 1), (1, 2):
[2 6]
Boolean Indexing:
Original Array:
[[1 2 3]
[4 5 6]
[7 8 9]]
Boolean Mask:
[[False False False]
[False False True]
[ True True True]]
Elements greater than 5:
[6 7 8 9]
This output demonstrates how to perform basic slicing, integer indexing, and boolean
indexing on NumPy arrays.
5) Create a dictionary with at least five keys, where each key's value is a list containing at
least ten values. Convert this dictionary into a pandas DataFrame and explore the data
through the DataFrame as follows:
a) Apply the head() function to the pandas DataFrame
b) Perform various data selection operations on the DataFrame
Program:
import pandas as pd

# Create a dictionary with five keys and each key has a list of ten values
data_dict = {
    'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'B': [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    'C': [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
    'D': [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
    'E': [41, 42, 43, 44, 45, 46, 47, 48, 49, 50]
}

# Convert the dictionary into a Pandas DataFrame
df = pd.DataFrame(data_dict)

# Display the DataFrame
print("DataFrame:")
print(df)

# a) Apply head() function to the DataFrame
print("\nHead of the DataFrame:")
print(df.head())  # Display the first 5 rows

# b) Perform various data selection operations on DataFrame

# Selecting a specific column
print("\nSelecting column 'B':")
print(df['B'])

# Selecting multiple columns
print("\nSelecting columns 'A' and 'C':")
print(df[['A', 'C']])

# Selecting a specific row by index
print("\nSelecting row with index 2:")
print(df.iloc[2])  # Select the third row (index starts from 0)

# Selecting multiple rows by index
print("\nSelecting rows with index 1 to 3:")
print(df.iloc[1:4])  # Select rows from index 1 to 3 (exclusive)

# Selecting rows based on a condition
print("\nSelecting rows where values in column 'A' are greater than 5:")
print(df[df['A'] > 5])  # Select rows where values in column 'A' are greater than 5
Explanation
1. Import Pandas: The program starts by importing the Pandas library.
2. Create a Dictionary: A dictionary named data_dict is created with five keys (A, B, C,
D, E), each containing a list of ten integers.
3. Convert to DataFrame: The dictionary is converted to a Pandas DataFrame using
pd.DataFrame(data_dict).
4. Display the DataFrame: The DataFrame is printed to the console.
5. Using head() Function: The program uses the head() function to display the first five
rows of the DataFrame.
6. Data Selection Operations:
o Select a Specific Column: It demonstrates how to select a single column (B).
o Select Multiple Columns: It shows how to select multiple columns (A and C).
o Select a Specific Row: It selects a specific row by its index (the third row).
o Select Multiple Rows: It selects a range of rows (index 1 to 3).
o Select Rows Based on Condition: It demonstrates how to select rows based
on a condition (where values in column A are greater than 5).
Output
DataFrame:
A B C D E
0 1 11 21 31 41
1 2 12 22 32 42
2 3 13 23 33 43
3 4 14 24 34 44
4 5 15 25 35 45
5 6 16 26 36 46
6 7 17 27 37 47
7 8 18 28 38 48
8 9 19 29 39 49
9 10 20 30 40 50
Head of the DataFrame:
A B C D E
0 1 11 21 31 41
1 2 12 22 32 42
2 3 13 23 33 43
3 4 14 24 34 44
4 5 15 25 35 45

Selecting column 'B':


0 11
1 12
2 13
3 14
4 15
5 16
6 17
7 18
8 19
9 20
Name: B, dtype: int64

Selecting columns 'A' and 'C':


A C
0 1 21
1 2 22
2 3 23
3 4 24
4 5 25
5 6 26
6 7 27
7 8 28
8 9 29
9 10 30

Selecting row with index 2:


A 3
B 13
C 23
D 33
E 43
Name: 2, dtype: int64

Selecting rows with index 1 to 3:


A B C D E
1 2 12 22 32 42
2 3 13 23 33 43
3 4 14 24 34 44

Selecting rows where values in column 'A' are greater than 5:


A B C D E
5 6 16 26 36 46
6 7 17 27 37 47
7 8 18 28 38 48
8 9 19 29 39 49
9 10 20 30 40 50
This output demonstrates how to create a DataFrame from a dictionary and perform various
data selection operations on it.
6) Apply different visualization techniques using sample dataset
a) Line Chart b) Bar Chart c) Scatter Plot d) Bubble Plot
Program:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Sample dataset
data = {
    'Year': [2015, 2016, 2017, 2018, 2019, 2020],
    'Sales': [150, 200, 250, 300, 350, 400],
    'Profit': [30, 50, 70, 90, 120, 150],
    'Expenses': [120, 150, 180, 210, 230, 250]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Set the style of seaborn
sns.set(style='whitegrid')

# a) Line Chart
plt.figure(figsize=(10, 6))
plt.plot(df['Year'], df['Sales'], marker='o', label='Sales', color='blue')
plt.plot(df['Year'], df['Profit'], marker='o', label='Profit', color='green')
plt.title('Sales and Profit Over Years')
plt.xlabel('Year')
plt.ylabel('Amount')
plt.legend()
plt.grid()
plt.show()

# b) Bar Chart
plt.figure(figsize=(10, 6))
plt.bar(df['Year'], df['Sales'], color='blue', alpha=0.6, label='Sales')
plt.bar(df['Year'], df['Expenses'], color='red', alpha=0.6, label='Expenses')
plt.title('Sales and Expenses Over Years')
plt.xlabel('Year')
plt.ylabel('Amount')
plt.legend()
plt.show()

# c) Scatter Plot
plt.figure(figsize=(10, 6))
plt.scatter(df['Sales'], df['Profit'], color='purple', s=100)
plt.title('Sales vs Profit')
plt.xlabel('Sales')
plt.ylabel('Profit')
plt.grid()
plt.show()

# d) Bubble Plot
plt.figure(figsize=(10, 6))
plt.scatter(df['Sales'], df['Profit'], s=df['Expenses'], color='orange', alpha=0.5,
edgecolor='black')
plt.title('Bubble Plot of Sales vs Profit')
plt.xlabel('Sales')
plt.ylabel('Profit')
plt.grid()
plt.show()
Explanation
1. Import Libraries: The program imports the necessary libraries: pandas,
matplotlib.pyplot, and seaborn.
2. Create Sample Dataset: A sample dataset is created using a dictionary, which
includes 'Year', 'Sales', 'Profit', and 'Expenses'. This dataset is then converted into a
Pandas DataFrame.
3. Line Chart:
o A line chart is created to show the trend of sales and profit over the years.
o plt.plot() is used to plot the data, and markers are added to indicate data points.
4. Bar Chart:
o A bar chart is created to compare sales and expenses over the years.
o Two bar plots are drawn on the same axes using plt.bar().
5. Scatter Plot:
o A scatter plot is created to visualize the relationship between sales and profit.
o The size of each point is fixed, and the points are colored in purple.
6. Bubble Plot:
o A bubble plot is created to show the relationship between sales and profit,
where the size of each bubble represents the expenses.
o The s parameter in plt.scatter() controls the size of the bubbles.
Output
When you run the program, it will generate four different plots:
1. Line Chart: Displays sales and profit trends over the years.
2. Bar Chart: Compares sales and expenses for each year.
3. Scatter Plot: Shows the relationship between sales and profit.
4. Bubble Plot: Displays the relationship between sales and profit, with bubble sizes
representing expenses.
Note:
To run this code, you will need to have pandas, matplotlib, and seaborn installed in your
Python environment. You can install them using pip if they are not already installed:
pip install pandas matplotlib seaborn
7) Generate Scatter Plot using seaborn library for iris dataset?
Program:
import seaborn as sns
import matplotlib.pyplot as plt

# Load the Iris dataset
iris = sns.load_dataset('iris')

# Display the first few rows of the dataset (optional)
print(iris.head())

# Create a scatter plot
plt.figure(figsize=(10, 6))
sns.scatterplot(data=iris, x='sepal_length', y='sepal_width', hue='species',
                style='species', palette='deep', s=100)

# Set plot title and labels
plt.title('Scatter Plot of Sepal Length vs Sepal Width')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.grid()
plt.legend(title='Species')
plt.show()
Explanation
Import Libraries: The code imports the necessary libraries: seaborn for data visualization
and matplotlib.pyplot for additional plotting functions.
1. Load the Iris Dataset: The Iris dataset is loaded using sns.load_dataset('iris'). This
dataset contains the following columns:
o sepal_length: Length of the sepal (in cm)
o sepal_width: Width of the sepal (in cm)
o petal_length: Length of the petal (in cm)
o petal_width: Width of the petal (in cm)
o species: Species of the iris (setosa, versicolor, virginica)
2. Optional Display of Dataset: The first few rows of the dataset are printed using
iris.head(). This is optional and can be removed if you only want the plot.
3. Create a Scatter Plot:
o A scatter plot is created using sns.scatterplot(), where:
 data=iris: The data source.
 x='sepal_length': X-axis variable (sepal length).
 y='sepal_width': Y-axis variable (sepal width).
 hue='species': Points are colored based on the species of iris.
 style='species': Points are styled based on the species of iris.
 palette='deep': The color palette for different species.
 s=100: The size of the points.
4. Set Plot Title and Labels: Titles and labels for the axes are set using plt.title(),
plt.xlabel(), and plt.ylabel().
5. Show Plot: Finally, the plot is displayed using plt.show().
Output
When you run this program, you should see a scatter plot displaying the relationship between
sepal length and sepal width, with different colors and styles representing the three species of
iris.
Note:
To run this code, make sure you have Seaborn and Matplotlib installed in your Python
environment. If you haven't installed them yet, you can do so using pip:
pip install seaborn matplotlib
8) Apply following visualization techniques for a sample dataset
a) Area Plot b) Stacked Plot c) Pie Chart d) Table Chart
Program:
import pandas as pd
import matplotlib.pyplot as plt

# Sample dataset
data = {
    'Category': ['A', 'B', 'C', 'D'],
    '2020': [10, 20, 30, 40],
    '2021': [15, 25, 35, 45],
    '2022': [20, 30, 25, 50]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Set the category as the index
df.set_index('Category', inplace=True)

# Plotting
plt.figure(figsize=(12, 8))

# a) Area Plot (pass the subplot axes to pandas so all charts share one figure)
ax1 = plt.subplot(2, 2, 1)
df.plot.area(alpha=0.5, ax=ax1)
plt.title('Area Plot')
plt.ylabel('Values')
plt.xlabel('Categories')
plt.grid()

# b) Stacked Plot
ax2 = plt.subplot(2, 2, 2)
df.plot(kind='bar', stacked=True, ax=ax2)
plt.title('Stacked Bar Plot')
plt.ylabel('Values')
plt.xlabel('Categories')
plt.grid()

# c) Pie Chart
ax3 = plt.subplot(2, 2, 3)
df['2022'].plot.pie(autopct='%1.1f%%', startangle=90, colormap='viridis', ax=ax3)
plt.title('Pie Chart for Year 2022')
plt.ylabel('')  # Hide the y-label

# d) Table Chart
plt.subplot(2, 2, 4)
plt.axis('tight')
plt.axis('off')
table = plt.table(cellText=df.values, colLabels=df.columns, cellLoc='center', loc='center')
table.auto_set_font_size(False)
table.set_fontsize(12)
table.scale(1.2, 1.2)
plt.title('Table Chart')

plt.tight_layout()
plt.show()
Explanation
1. Import Libraries: The code imports the necessary libraries: pandas for data
manipulation and matplotlib.pyplot for plotting.
2. Sample Dataset: A sample dataset is created with categories (A, B, C, D) and values
for the years 2020, 2021, and 2022.
3. Create a DataFrame: A pandas DataFrame is created from the sample dataset, and
the 'Category' column is set as the index.
4. Plotting:
o Figure Size: The figure size is set to (12, 8) for better visibility.
o Area Plot:
 A subplot is created for the area plot, where the plot.area() method is
used to create an area plot for the DataFrame.
 The plot is labeled and gridded.
o Stacked Plot:
 Another subplot is created for the stacked bar plot, where
plot(kind='bar', stacked=True) is used to generate a stacked bar chart.
 The plot is labeled and gridded.
o Pie Chart:
 A subplot is created for the pie chart using the values for the year 2022.
The plot.pie() method is used, and percentages are displayed on the
chart with autopct='%1.1f%%'.
o Table Chart:
 A subplot is created for the table chart using plt.table() to display the
DataFrame values in a tabular format. The axis is turned off for better
appearance.
5. Display Plots: The plt.tight_layout() method is called to adjust the spacing between
plots, and plt.show() is used to display the plots.
Output
When you run the above program, it will display four visualizations in a single window:
 An area plot representing the values over the years.
 A stacked bar plot to show how values accumulate in each category.
 A pie chart illustrating the distribution of values for the year 2022.
 A table chart showing the values in a tabular format.
Note:
To run this code, make sure you have pandas and matplotlib installed in your Python
environment. You can install them using pip if you haven't done so:
pip install pandas matplotlib
