
Python Unit 2 Question Bank (2)

The document is a question bank for a Python unit focusing on Pandas, covering topics such as data structures, data importing and exporting, data cleaning, data manipulation, and working with dates and times. It includes theoretical questions and practical coding exercises to assess understanding of key concepts and functionalities in Pandas. The questions range from basic definitions to more complex operations and data handling techniques.

Uploaded by Sneha Rawat
Copyright © All Rights Reserved

Python UNIT 2 QUESTION BANK

1. Introduction to Pandas
Understanding Data Structures: Series and DataFrames

1. What is a Pandas Series? How is it different from a Python list or NumPy array?
2. Write Python code to create a Series with numbers from 1 to 5 and custom labels ['a',
'b', 'c', 'd', 'e'].
3. What is a DataFrame in Pandas? How is it structured?
4. Create a DataFrame from the following dictionary and display its structure:

```python
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
```

5. How can you access:
o a single column
o multiple columns
o a specific row
o a specific value in a DataFrame?
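One possible set of answers to questions 2, 4 and 5, as a short sketch. It reuses the dictionary from question 4; the variable names (s, df, single_col, and so on) are chosen here for illustration only.

```python
import pandas as pd

# Q2: a Series holding 1..5 with custom string labels as its index
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Q4: build a DataFrame from the dictionary given in the question
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
}
df = pd.DataFrame(data)
df.info()          # prints column names, non-null counts and dtypes
print(df.shape)    # (rows, columns)

# Q5: common ways to access parts of the DataFrame
single_col = df['Age']              # one column -> Series
multi_cols = df[['Name', 'City']]   # list of columns -> DataFrame
first_row = df.iloc[0]              # one row by position -> Series
value = df.loc[1, 'City']           # one value by row label and column name
```

Note that a Series keeps its labels: s['c'] returns 3 regardless of position, which is one way it differs from a plain Python list.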

📥 2. Data Importing and Exporting
Reading and Writing Files

6. How do you read a CSV file named sales_data.csv into a Pandas DataFrame? What
parameters can you use to handle headers or missing values?
7. How do you read an Excel file with multiple sheets? Provide example code to read a
specific sheet.
8. What are some common parameters in pd.read_csv() that are useful for reading messy
files (e.g., sep, header, skiprows, na_values)?
9. Write code to export a DataFrame named df to a file called output.xlsx without writing
the index.
10. What file formats can Pandas handle for reading and writing data?
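A minimal sketch of the read_csv() parameters from questions 6 and 8, assuming a small hypothetical "messy" file. An in-memory io.StringIO stands in for sales_data.csv so the example runs anywhere; with a real file you would pass the file name instead. The export line writes CSV text because to_excel('output.xlsx', index=False) needs an Excel engine such as openpyxl installed.

```python
import io
import pandas as pd

# Hypothetical file contents: a title line to skip, semicolon separators,
# and the word "missing" used as a placeholder for absent values.
raw = io.StringIO(
    "Sales report (generated)\n"
    "region;month;amount\n"
    "North;Jan;100\n"
    "South;Jan;missing\n"
    "North;Feb;250\n"
)

sales = pd.read_csv(
    raw,
    sep=';',                # field delimiter
    skiprows=1,             # skip the title line before the header
    header=0,               # first remaining line holds the column names
    na_values=['missing'],  # extra strings to treat as NaN
)

# Export without the index column. df.to_excel('output.xlsx', index=False)
# follows the same pattern but requires an Excel engine (e.g. openpyxl).
csv_text = sales.to_csv(index=False)
```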
🧹 3. Data Cleaning
Handling Missing Data and Duplicates

11. Write code to:
o Check for missing values in all columns.
o Count missing values per column.
o Drop rows with missing values.
o Fill missing values in a column with a specific value (e.g., 0).
12. Explain the difference between dropna(), fillna(), and isnull().
13. How do you identify and remove duplicate rows from a DataFrame? What if you want to
keep the last duplicate instead of the first?
14. How can you replace all blank string entries in a DataFrame with NaN?
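A sketch covering questions 11, 13 and 14 on a small hypothetical DataFrame (the column names name and score are invented for illustration). isnull() is an alias of isna(), so either spelling works for question 12.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with gaps, a blank string and a repeated row
df = pd.DataFrame({
    'name':  ['Alice', 'Bob', '  ', 'Bob', 'Dan'],
    'score': [90, np.nan, 70, np.nan, 85],
})

# Q14: turn empty or whitespace-only strings into NaN
df = df.replace(r'^\s*$', np.nan, regex=True)

# Q11: detect, count, drop and fill missing values
has_missing = df.isna().any()      # True/False per column
missing_counts = df.isna().sum()   # number of NaN per column
no_missing = df.dropna()           # drop rows containing any NaN
filled = df.fillna({'score': 0})   # fill NaN in one column only

# Q13: flag repeated rows, then drop them keeping the last occurrence
dup_mask = df.duplicated()         # True for repeats of an earlier row
deduped = df.drop_duplicates(keep='last')
```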

Filtering Data

15. Given a DataFrame df with columns ['name', 'age', 'salary'], write code to filter:
o All rows where age is above 30.
o All rows where salary is not null and greater than 50,000.
o All rows where name is either “Alice” or “Bob”.
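One way to write the three filters in question 15, using hypothetical values for the columns the question names. Conditions are combined with & rather than Python's and, and each comparison is wrapped in parentheses because & binds more tightly than >.

```python
import pandas as pd

# Hypothetical data with the columns named in Q15
df = pd.DataFrame({
    'name':   ['Alice', 'Bob', 'Cara', 'Dev'],
    'age':    [28, 35, 42, 31],
    'salary': [52000, None, 61000, 48000],
})

over_30 = df[df['age'] > 30]

# Non-null AND greater than 50,000
high_paid = df[df['salary'].notna() & (df['salary'] > 50000)]

# Membership test against a list of names
alice_or_bob = df[df['name'].isin(['Alice', 'Bob'])]
```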

🛠️ 4. Data Manipulation
Sorting, Indexing, Grouping, Merging, Concatenating

16. How do you sort a DataFrame by multiple columns (e.g., sort by age ascending and
salary descending)?
17. What is the difference between .loc[] and .iloc[]? Provide examples for both.
18. What does reset_index() do, and when would you use it?
19. Write code to group a dataset by the department column and calculate the average
salary.
20. What is the difference between groupby() and pivot_table() in Pandas?
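A compact sketch for questions 16 to 20 on a hypothetical employee table. The string index labels e1 to e4 are chosen so that .loc (labels) and .iloc (positions) visibly differ.

```python
import pandas as pd

# Hypothetical employee data with string row labels
df = pd.DataFrame({
    'name':       ['Ann', 'Ben', 'Cal', 'Dee'],
    'department': ['IT', 'HR', 'IT', 'HR'],
    'age':        [30, 25, 30, 40],
    'salary':     [70000, 50000, 80000, 60000],
}, index=['e1', 'e2', 'e3', 'e4'])

# Q16: age ascending, then salary descending among equal ages
sorted_df = df.sort_values(by=['age', 'salary'], ascending=[True, False])

# Q17: .loc selects by label, .iloc by integer position
by_label = df.loc['e2', 'salary']   # row label 'e2', column 'salary'
by_position = df.iloc[1, 3]         # 2nd row, 4th column: the same cell

# Q18: move the current index into a column named 'index' and use 0..n-1
flat = sorted_df.reset_index()

# Q19: mean salary per department (returns a Series)
avg_salary = df.groupby('department')['salary'].mean()

# Q20: pivot_table gives the same aggregation laid out as a DataFrame
pivot = df.pivot_table(values='salary', index='department', aggfunc='mean')
```

reset_index() is handy after sorting or filtering, when the old labels are no longer meaningful; pivot_table() becomes more useful than groupby() when you also want a columns= dimension in the output table.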

Merging and Concatenating

21. Given two DataFrames with a common column employee_id, how do you merge them
using an inner join?
22. How do you concatenate two DataFrames vertically and horizontally?
23. What is the purpose of the on, how, and suffixes parameters in the merge() function?
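A sketch for questions 21 to 23, assuming two hypothetical tables that share employee_id and both carry a score column, so the suffixes parameter has something to rename.

```python
import pandas as pd

# Two hypothetical tables sharing the key column employee_id
staff = pd.DataFrame({'employee_id': [1, 2, 3],
                      'name': ['Ann', 'Ben', 'Cal'],
                      'score': [7, 8, 9]})
reviews = pd.DataFrame({'employee_id': [2, 3, 4],
                        'score': [4.5, 3.9, 4.1]})

# Q21/Q23: on= names the key, how= the join type,
# suffixes= renames overlapping non-key columns
merged = staff.merge(reviews, on='employee_id', how='inner',
                     suffixes=('_staff', '_review'))

# Q22: vertical concatenation stacks rows; ignore_index renumbers them
more_staff = pd.DataFrame({'employee_id': [5], 'name': ['Dee'], 'score': [6]})
stacked = pd.concat([staff, more_staff], ignore_index=True)

# Q22: horizontal concatenation puts columns side by side, aligned on the index
side_by_side = pd.concat([staff, reviews.add_prefix('r_')], axis=1)
```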
🗓️ 5. Working with Dates and Times
Date Handling and Time Operations

24. How do you convert a column with date strings to actual datetime objects in Pandas?
25. Write code to extract the following from a datetime column:
o Year
o Month
o Day of the week
26. How would you filter rows that fall within a specific date range?
27. What method do you use to set a datetime column as the DataFrame index? Why is this
useful?
28. How do you generate a date range from January 1 to January 10, 2023, with daily
frequency?

Some Theory Questions

Answers should be elaborated according to the marks allotted in the question paper.

1. Introduction to Pandas

Q1: What are the key differences between a Pandas Series and a DataFrame?
A: A Series is a one-dimensional labeled array that can hold any data type, while a DataFrame is a two-
dimensional labeled data structure with columns of potentially different types, similar to a table.

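Q2: Explain the importance of index labels in Pandas data structures.
A: Index labels identify the rows of a Series or DataFrame, and a DataFrame's column labels form a second index for its columns. Pandas uses these labels to align data automatically in arithmetic and assignment, and for selection (.loc[]), joins, and filtering. Labels do not have to be unique, but a unique index makes label lookups unambiguous.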
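A small illustration of label alignment, using two hypothetical Series whose labels only partly overlap. Values are matched by label, not by position, and labels present in only one operand give NaN unless a fill_value is supplied.

```python
import pandas as pd

# Hypothetical stock counts with partly overlapping labels
q1 = pd.Series({'apples': 10, 'pears': 4, 'plums': 7})
q2 = pd.Series({'pears': 1, 'apples': 5, 'kiwis': 3})

# Aligned by label: apples + apples, pears + pears; kiwis and plums -> NaN
total = q1 + q2

# Treat a missing label as 0 instead of producing NaN
filled_total = q1.add(q2, fill_value=0)
```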

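Q3: How does Pandas differ from native Python data structures like lists and dictionaries?
A: Pandas provides labeled, column-oriented data structures built on NumPy arrays and optimized for tabular data. Operations are vectorized over whole columns instead of being written as Python loops, and Pandas adds features that lists and dictionaries lack, such as automatic label alignment, built-in missing-value handling, grouping, merging, and file I/O.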

2. Data Importing and Exporting

Q1: What are the advantages of using Pandas for reading and writing data files?
A: Pandas simplifies file I/O with a read_*/to_* function pair for each format, supports many formats (CSV, Excel, JSON, SQL, etc.), infers column types, and returns the data directly as a DataFrame. Large files can be read in pieces with the chunksize parameter of read_csv().

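Q2: How does Pandas handle missing values during data import?
A: By default, read functions such as read_csv() treat empty fields and common markers such as "NA", "N/A", and "NaN" as NaN. The na_values parameter adds further markers, and keep_default_na=False turns off the built-in list. After import, missing values can be handled with methods such as fillna() and dropna().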
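A sketch of how the default missing-value markers interact with na_values and keep_default_na, using hypothetical in-memory CSV text. The "-" placeholder is not in Pandas' default list, so it is only treated as missing when added through na_values.

```python
import io
import pandas as pd

text = "city,temp\nOslo,NA\nLima,\nRome,-\nNice,N/A\n"

# Default: 'NA', 'N/A' and the empty field become NaN; '-' stays a string
default = pd.read_csv(io.StringIO(text))

# na_values adds '-' to the recognised markers
custom = pd.read_csv(io.StringIO(text), na_values=['-'])

# keep_default_na=False drops the built-in list, so 'NA' and 'N/A' stay
# as text and only the na_values markers become NaN
only_dash = pd.read_csv(io.StringIO(text), keep_default_na=False,
                        na_values=['-'])
```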
Q3: Compare the functions read_csv() and read_excel(). When would you use each?
A: read_csv() reads delimited plain-text files (comma-separated by default) and is generally faster, so it suits CSV/TSV exports and large text data. read_excel() reads .xlsx/.xls workbooks, can pick a sheet with sheet_name, and requires an engine library such as openpyxl (for .xlsx) or xlrd (for legacy .xls).

3. Data Cleaning

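Q1: What methods does Pandas offer to detect and handle missing values?
A: Detect them with isna()/isnull() and notna()/notnull(); handle them with dropna() to remove rows or columns and fillna() to substitute values; or propagate neighboring values with forward fill (ffill()) and backward fill (bfill()).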
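A short illustration of forward and backward fill on a hypothetical series of sensor readings. A trailing gap has no later value to pull back, so bfill() leaves it as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with gaps in the middle and at the end
readings = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

forward = readings.ffill()    # carry the last valid value forward
backward = readings.bfill()   # pull the next valid value backward
```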

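Q2: Explain how the drop_duplicates() function works and when you might use it.
A: It removes rows that repeat an earlier row. By default all columns are compared; the subset parameter restricts the comparison to specific columns, and keep ('first', 'last', or False) decides which copy survives. It is useful in data cleaning to ensure uniqueness and integrity, for example after combining data from several sources.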
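A sketch of how the subset parameter changes what counts as a duplicate, using a hypothetical signup table. Comparing all columns finds no repeats here, while comparing only email keeps the latest signup per address.

```python
import pandas as pd

# Hypothetical signups: the same email appears twice with different plans
signups = pd.DataFrame({
    'email': ['a@x.com', 'b@x.com', 'a@x.com'],
    'plan':  ['free', 'pro', 'pro'],
})

# All columns compared: no two rows are identical, nothing is dropped
all_cols = signups.drop_duplicates()

# Only 'email' compared: keep the last (most recent) row per address
per_email = signups.drop_duplicates(subset=['email'], keep='last')
```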

Q3: Why is data filtering important before analysis, and how is it achieved in Pandas?
A: Filtering ensures that only relevant, clean data is analyzed. It is achieved with boolean indexing (df[condition]), the query() method, or the .loc[] indexer combined with a boolean condition.
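The three filtering styles side by side, on a hypothetical product table. The boolean mask and the query() string select the same rows; .loc[] additionally lets you pick columns in the same step.

```python
import pandas as pd

# Hypothetical product data
df = pd.DataFrame({'product': ['pen', 'ink', 'pad'],
                   'price': [1.5, 7.0, 3.2],
                   'stock': [0, 12, 40]})

# Boolean indexing
mask_result = df[(df['price'] > 2) & (df['stock'] > 0)]

# The same filter written as a query string
query_result = df.query('price > 2 and stock > 0')

# .loc[] with a mask for rows and a label for the column
names = df.loc[df['stock'] > 0, 'product']
```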

4. Data Manipulation

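Q1: Describe how indexing enhances data retrieval in Pandas.
A: Indexing lets you select, slice, and label rows and columns by meaningful labels (.loc[]) or by position (.iloc[]). Looking up rows by index label is typically faster than scanning a column with a boolean condition, and label-based code is easier to read.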

Q2: What is the difference between merge() and concat() in Pandas?
A: merge() combines DataFrames by matching values in one or more key columns or indexes (like SQL joins). concat() stacks objects vertically (axis=0) or side by side (axis=1); it does not match key values and only aligns on the labels of the other axis.
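A small contrast between the two functions, using hypothetical tables whose id values are in different orders. merge() pairs rows by the id values, whereas concat(axis=1) pairs rows by index position label, so Ann ends up next to the HR row even though the ids differ.

```python
import pandas as pd

# Hypothetical tables with the same ids in a different order
left = pd.DataFrame({'id': [1, 2], 'name': ['Ann', 'Ben']})
right = pd.DataFrame({'id': [2, 1], 'dept': ['HR', 'IT']})

# merge matches rows by the values in 'id'
joined = left.merge(right, on='id')

# concat(axis=1) pairs index label 0 with 0 and 1 with 1, ignoring 'id'
pasted = pd.concat([left, right], axis=1)
```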

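Q3: How does the groupby() function work, and what types of operations can it perform?
A: It follows a split-apply-combine pattern: it splits the data into groups based on one or more keys, applies a function to each group, and combines the results. The applied step can be an aggregation (sum(), mean()), a transformation (transform()), or filtering of whole groups (filter()).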
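A sketch of the three kinds of groupby operations on a hypothetical sales table: aggregation returns one row per group, transformation returns one value per original row, and filtering keeps or drops whole groups.

```python
import pandas as pd

# Hypothetical sales by region
sales = pd.DataFrame({
    'region': ['N', 'N', 'S', 'S', 'S'],
    'amount': [10, 30, 5, 5, 20],
})

g = sales.groupby('region')['amount']

# Aggregation: one value per group
summary = g.agg(['sum', 'mean'])

# Transformation: result aligned to the original rows
sales['share_of_region'] = g.transform(lambda s: s / s.sum())

# Filtering: keep only groups whose total exceeds 35
big_regions = sales.groupby('region').filter(lambda grp: grp['amount'].sum() > 35)
```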

5. Working with Dates and Times

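Q1: How does Pandas store and represent date and time data?
A: Dates and times are stored in the datetime64 dtype (nanosecond resolution by default), whose single values are Timestamp objects, and durations in the timedelta64 dtype, whose single values are Timedelta objects. Strings and other inputs are usually converted with pd.to_datetime() for consistency and performance.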

Q2: What are the common operations you can perform on datetime objects in Pandas?
A: Extracting parts (dt.year, dt.month), date arithmetic (subtracting two dates gives a Timedelta; adding a Timedelta or DateOffset to a date gives a new date), and resampling or frequency conversion in time series.
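A sketch of date arithmetic and resampling with hypothetical dates. A fixed Timedelta adds an exact duration, while DateOffset(months=1) moves by calendar months; resample('W') groups daily values into weeks ending on Sunday.

```python
import pandas as pd

start = pd.Timestamp('2023-03-01')
end = pd.Timestamp('2023-03-15')

gap = end - start                              # Timestamp - Timestamp -> Timedelta
due = start + pd.Timedelta(days=30)            # exact 30-day duration
next_month = start + pd.DateOffset(months=1)   # calendar-aware offset

# Resampling: hypothetical daily values summed into weekly totals
daily = pd.Series(range(14),
                  index=pd.date_range('2023-03-01', periods=14, freq='D'))
weekly = daily.resample('W').sum()
```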

Q3: Why is it important to correctly handle date and time data in time series analysis?
A: Correct datetime handling ensures accurate sorting, filtering, resampling, and analysis of trends over
time.
