Python Unit 2 Question Bank (2)
1. Introduction to Pandas
Understanding Data Structures: Series and DataFrames
1. What is a Pandas Series? How is it different from a Python list or NumPy array?
2. Write Python code to create a Series with numbers from 1 to 5 and custom labels ['a',
'b', 'c', 'd', 'e'].
3. What is a DataFrame in Pandas? How is it structured?
4. Create a DataFrame from the following dictionary and display its structure:
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
6. How do you read a CSV file named sales_data.csv into a Pandas DataFrame? What
parameters can you use to handle headers or missing values?
7. How do you read an Excel file with multiple sheets? Provide example code to read a
specific sheet.
8. What are some common parameters in pd.read_csv() that are useful for reading messy
files (e.g., sep, header, skiprows, na_values)?
9. Write code to export a DataFrame named df to a file called output.xlsx without writing
the index.
10. What file formats can Pandas handle for reading and writing data?
🧹 3. Data Cleaning
Handling Missing Data and Duplicates
Filtering Data
15. Given a DataFrame df with columns ['name', 'age', 'salary'], write code to filter:
   - All rows where age is above 30.
   - All rows where salary is not null and greater than 50,000.
   - All rows where name is either “Alice” or “Bob”.
16. How do you sort a DataFrame by multiple columns (e.g., sort by age ascending and
salary descending)?
17. What is the difference between .loc[] and .iloc[]? Provide examples for both.
18. What does reset_index() do, and when would you use it?
19. Write code to group a dataset by the department column and calculate the average
salary.
20. What is the difference between groupby() and pivot_table() in Pandas?
21. Given two DataFrames with a common column employee_id, how do you merge them
using an inner join?
22. How do you concatenate two DataFrames vertically and horizontally?
23. What is the purpose of the on, how, and suffixes parameters in the merge() function?
🗓️ 5. Working with Dates and Times
Date Handling and Time Operations
24. How do you convert a column with date strings to actual datetime objects in Pandas?
25. Write code to extract the following from a datetime column:
   - Year
   - Month
   - Day of the week
26. How would you filter rows that fall within a specific date range?
27. What method do you use to set a datetime column as the DataFrame index? Why is this
useful?
28. How do you generate a date range from January 1 to January 10, 2023, with daily
frequency?
1. Introduction to Pandas
Q1: What are the key differences between a Pandas Series and a DataFrame?
A: A Series is a one-dimensional labeled array that can hold any data type, while a DataFrame is a two-dimensional labeled data structure whose columns can have different types, similar to a table. Each column of a DataFrame is itself a Series.
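As a minimal sketch of this difference (the variable names and sample values are invented for illustration), a Series and a DataFrame can be built and compared like this:

```python
import pandas as pd

# A Series: one dimension, one dtype, one index of labels.
ages = pd.Series([25, 30, 35], index=["alice", "bob", "charlie"], name="age")

# A DataFrame: two dimensions; each column may have its own dtype.
people = pd.DataFrame(
    {"age": [25, 30, 35], "city": ["New York", "Los Angeles", "Chicago"]},
    index=["alice", "bob", "charlie"],
)

print(ages.ndim, people.ndim)          # number of dimensions of each object
print(people.shape)                    # (rows, columns)
print(type(people["age"]).__name__)    # selecting one column gives a Series
```

Selecting a single column with people["age"] returns a Series that shares the DataFrame's row labels.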
Q3: How does Pandas differ from native Python data structures like lists and dictionaries?
A: Pandas provides labeled, typed data structures built on NumPy arrays and optimized for tabular data. They support vectorized operations, automatic index alignment, and missing-data handling, which plain Python lists and dictionaries do not offer.
Q1: What are the advantages of using Pandas for reading and writing data files?
A: Pandas offers one-line reader functions and matching writer methods for many formats (CSV, Excel, JSON, SQL, etc.), infers column types on load, and returns the data as a DataFrame ready for analysis. Large files can be processed in pieces, for example with the chunksize parameter of read_csv().
Q2: How does Pandas handle missing values during data import?
A: By default, the read functions convert empty fields and marker strings such as NA, N/A, NaN, and null to NaN. The na_values parameter adds custom markers, and fillna() or dropna() handle the missing values after loading.
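The effect of na_values can be sketched with an in-memory CSV. The sample text and the "missing" marker below are assumptions made up for the example; io.StringIO stands in for a real file such as sales_data.csv.

```python
import io
import pandas as pd

# Hypothetical CSV content: one empty age, one custom marker, one empty salary.
csv_text = "name,age,salary\nAlice,25,50000\nBob,,missing\nCharlie,35,\n"

# Default behavior: empty fields become NaN, but "missing" stays a string.
default_df = pd.read_csv(io.StringIO(csv_text))

# na_values adds "missing" to the list of strings treated as NaN.
custom_df = pd.read_csv(io.StringIO(csv_text), na_values=["missing"])

print(default_df["salary"].isna().sum())  # missing salaries with defaults
print(custom_df["salary"].isna().sum())   # missing salaries with na_values
print(custom_df.dtypes)                   # salary can now be parsed as numeric
```

Because the custom marker is recognized, the salary column is read as numbers rather than as text.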
Q3: Compare the functions read_csv() and read_excel(). When would you use each?
A: read_csv() reads delimited text files (comma-separated by default) and is usually the faster of the two. read_excel() reads Excel workbooks, can select a sheet with the sheet_name parameter, and requires an additional engine library such as openpyxl (for .xlsx) or xlrd (for legacy .xls).
3. Data Cleaning
Q1: What methods does Pandas offer to detect and handle missing values?
A: Use isna() and notna() to detect missing values, fillna() and dropna() to fill or remove them, and ffill() and bfill() to propagate the previous or next valid value (forward and backward fill).
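A short sketch of these methods on a small, invented DataFrame (the column names and values are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, np.nan, 35, np.nan],
    "city": ["NY", "LA", None, "SF"],
})

# Detect: count missing values in each column.
missing_per_column = df.isna().sum()

# Handle: drop rows containing any missing value.
complete_rows = df.dropna()

# Handle: fill each column with its own replacement value.
filled = df.fillna({"age": df["age"].mean(), "city": "unknown"})

# Propagate the previous (ffill) or next (bfill) valid value.
forward = df["age"].ffill()
backward = df["age"].bfill()

print(missing_per_column)
print(filled)
```

Note that bfill() leaves a trailing missing value in place, because there is no later valid value to copy from.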
Q2: Explain how the drop_duplicates() function works and when you might use it.
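A: It returns a new DataFrame with duplicate rows removed. By default it compares all columns and keeps the first occurrence; the subset parameter limits the comparison to specific columns, and keep controls which occurrence is retained. It is useful in data cleaning to ensure data uniqueness and integrity.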
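For illustration, the following sketch applies drop_duplicates() to an invented employee table (the data is hypothetical), first across all columns and then on a single column:

```python
import pandas as pd

df = pd.DataFrame({
    "employee_id": [1, 2, 2, 3],
    "name": ["Alice", "Bob", "Bob", "Charlie"],
    "city": ["NY", "LA", "LA", "NY"],
})

# Compare all columns; keep the first of each set of identical rows.
exact = df.drop_duplicates()

# Compare only the city column; keep the last row seen for each city.
by_city = df.drop_duplicates(subset=["city"], keep="last")

print(exact)
print(by_city)
```

The original df is left unchanged; each call returns a new DataFrame that keeps the original row labels.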
Q3: Why is data filtering important before analysis, and how is it achieved in Pandas?
A: Filtering ensures only relevant, clean data is analyzed. It's achieved using boolean indexing or
methods like query() and loc[].
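The three approaches can be sketched on a DataFrame with the name, age, and salary columns used in the questions above. The rows themselves are invented for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Dana"],
    "age": [28, 34, 45, 31],
    "salary": [48000.0, 62000.0, np.nan, 75000.0],
})

# Boolean indexing: rows where age is above 30.
over_30 = df[df["age"] > 30]

# Combine conditions with & and parentheses: salary present and above 50,000.
well_paid = df[df["salary"].notna() & (df["salary"] > 50000)]

# loc[] with isin(): rows where name is Alice or Bob.
alice_or_bob = df.loc[df["name"].isin(["Alice", "Bob"])]

# query(): the same kind of filter written as a string expression.
via_query = df.query("age > 30 and salary > 50000")

print(over_30["name"].tolist())
print(well_paid["name"].tolist())
```

A comparison against NaN evaluates to False, so the row with a missing salary is excluded from well_paid and via_query.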
4. Data Manipulation
Q3: How does the groupby() function work, and what types of operations can it perform?
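A: It splits the data into groups by one or more keys, applies a function to each group, and combines the results. Supported operations include aggregation (sum(), mean()), transformation (transform()), and filtering (filter()) of each group.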
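A minimal sketch of aggregation, transformation, and filtering with groupby(), using invented department and salary data:

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["HR", "HR", "IT", "IT", "IT", "Ops"],
    "salary": [40000, 50000, 60000, 70000, 80000, 55000],
})

# Aggregation: one value per group.
avg_salary = df.groupby("department")["salary"].mean()

# Several aggregations at once.
summary = df.groupby("department")["salary"].agg(["sum", "mean", "count"])

# Transformation: result has the same length as the input,
# so it can be stored as a new column.
df["dept_avg"] = df.groupby("department")["salary"].transform("mean")

# Filtering: keep only the rows of groups with at least two members.
larger_depts = df.groupby("department").filter(lambda g: len(g) >= 2)

print(avg_salary)
print(summary)
```

Aggregation shrinks the data to one row per group, while transform() returns a value for every original row.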
5. Working with Dates and Times
Q1: How does Pandas store and represent date and time data?
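A: Date columns use the NumPy-backed datetime64 dtype, with individual values exposed as Timestamp objects; durations use timedelta64 and Timedelta. Date strings are usually converted with pd.to_datetime() for consistency and performance.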
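As a sketch of the conversion (the order_date column and its values are assumptions for illustration), date strings become datetime values like this:

```python
import pandas as pd

df = pd.DataFrame({"order_date": ["2023-01-01", "2023-01-05", "2023-01-10"]})

# Before conversion the column holds plain strings.
print(df["order_date"].dtype)

# Convert the strings to datetime values.
df["order_date"] = pd.to_datetime(df["order_date"])
print(df["order_date"].dtype)

# Subtracting two datetime values gives a Timedelta.
gap = df["order_date"].iloc[2] - df["order_date"].iloc[0]
print(gap)
```

The exact dtype printed can vary between Pandas versions, so the sketch prints it rather than assuming a specific value.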
Q2: What are the common operations you can perform on datetime objects in Pandas?
A: Extracting parts (dt.year, dt.month), date arithmetic (adding/subtracting dates), and resampling or
frequency conversion in time series.
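These operations can be sketched on a daily date range from January 1 to January 10, 2023. The sales column and the seven-day offset are invented for the example.

```python
import pandas as pd

# Daily dates from January 1 to January 10, 2023.
dates = pd.date_range(start="2023-01-01", end="2023-01-10", freq="D")
df = pd.DataFrame({"date": dates, "sales": list(range(1, 11))})

# Extract parts with the .dt accessor.
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["weekday"] = df["date"].dt.day_name()

# Date arithmetic: add a fixed duration.
df["due"] = df["date"] + pd.Timedelta(days=7)

# Filter rows within a date range (both ends included).
in_range = df[(df["date"] >= "2023-01-03") & (df["date"] <= "2023-01-05")]

# Resample: set the datetime column as the index, then sum per week.
weekly = df.set_index("date")["sales"].resample("W").sum()

print(df.head())
print(weekly)
```

Setting the datetime column as the index with set_index() is what makes resample() and label-based date slicing available.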
Q3: Why is it important to correctly handle date and time data in time series analysis?
A: Correct datetime handling ensures accurate sorting, filtering, resampling, and analysis of trends over
time.