IS5312 Mini Project-2

This project involves analyzing an HR dataset containing employee information using Python. Students will practice data manipulation, exploratory data analysis, and predictive modeling skills. The tasks include reading data files, descriptive statistics, data visualization, and partitioning the data for attrition prediction modeling. Students are to submit a Jupyter Notebook with all code and outputs as well as a 4 page report on the analysis results. Completing optional tasks can provide a 10% bonus to the final grade.


IS5312 Analytical Programming with Python

Project Description

Updated on October 24, 2023

1 Project Objectives
In this project, you will practice manipulating data files, processing data, conducting
exploratory data analysis, and making predictions based on data. This project has two
objectives:
Ø By conducting this project, students can review and comprehensively practice
most Python programming skills learned from this course:

• Numbers and variables

• Input and output
• Relational operations
• Strings
• List and tuple
• Set and dictionary
• If-else control flow
• For and while loop control flow
• File processing
• Functions
• Module/Package
• Class
• Numpy
• Pandas
• Visualization/Matplotlib

Ø We also include tasks that go slightly beyond the points listed above but remain
of reasonable difficulty, specifically Task 3.3. The rationale is that programmers
routinely encounter problems they have never seen before, especially given the
rapid development of programming technology and tools. Hence, students need
practice in solving new problems creatively. Task 3.3 is designed to exercise
creative problem-solving skills on unfamiliar programming tasks. With the help
of references from books, papers, and the Internet, students can solve these
tasks successfully.

2 Data Description
This project has two data files. The first, variables.txt, contains detailed
definitions of the variables in the data set. The second, HR_Analytics.csv,
contains a comprehensive set of records on an organization's employees, with
1470 observations and 35 variables. The variables fall into four types:
Personal factors, Financial factors, Job-related factors, and Attrition factors.
The Personal factors consist of demographic factors such as age, gender, and so
on. The Financial factors include employees’ salary-related factors like Monthly
Income, Hourly Rate, etc. The Job-related factors include variables related to
the job characteristics, while the Attrition factors only include
NumCompaniesWorked and Attrition.

3 Tasks
You need to write Python programs to finish the following tasks; manual
manipulations do not count toward your score. Tasks labeled Optional may be done
or skipped at your discretion. The optional tasks do not count toward the total
score, but each optional task completed successfully earns a grade bonus of 10%.

3.1 Read data from txt & csv file (30%)

Ø (6%) Please write code to read data stored in the file named variables.txt.

Ø (10%) Please write code to delete the brief introduction in the first several
lines of the file and the Definition and Types columns, keeping only the
Variable Names as a list.

Ø (8%) Please read HR_Analytics.csv as a dataframe and add column names to the
dataframe using the above Variable Names list.

Ø (6%) Please store the combined dataset into a new CSV file named
dataforanalysis.csv.
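
The steps in Task 3.1 can be sketched as follows. This is only an illustration under stated assumptions: the layout of variables.txt is not shown in this description, so the sketch assumes two introduction lines followed by a tab-separated table with a header row, and it assumes HR_Analytics.csv has no header row. The in-memory sample data and the helper name read_variable_names are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for variables.txt: two intro lines, a header row,
# then tab-separated rows. The real file layout may differ.
variables_txt = (
    "HR Analytics variable list\n"
    "Each row below describes one column of the data set.\n"
    "Variable Names\tDefinition\tTypes\n"
    "Age\tAge of the employee\tPersonal\n"
    "Attrition\tWhether the employee left\tAttrition\n"
    "MonthlyIncome\tMonthly salary\tFinancial\n"
)
# Hypothetical stand-in for HR_Analytics.csv, assumed to have no header row.
hr_csv = "41,Yes,5993\n49,No,5130\n37,Yes,2090\n"


def read_variable_names(handle, intro_lines=2):
    """Skip the introduction lines, then keep only the Variable Names column."""
    table = pd.read_csv(handle, sep="\t", skiprows=intro_lines)
    return table["Variable Names"].tolist()


names = read_variable_names(io.StringIO(variables_txt))

# Read the data without a header and attach the names from the list.
df = pd.read_csv(io.StringIO(hr_csv), header=None, names=names)

# Write the combined data as CSV (here to memory, to keep the sketch
# free of side effects).
buffer = io.StringIO()
df.to_csv(buffer, index=False)

print(names)
print(buffer.getvalue().splitlines()[0])
```

With the real files, the StringIO objects would be replaced by the paths variables.txt and HR_Analytics.csv, and the result would be written with df.to_csv("dataforanalysis.csv", index=False). The number of introduction lines and the separator should be checked against the actual file first.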

3.2 Exploratory data analysis (60%)

To get an overview of the distribution of data in the dataset, you need to
compute descriptive statistics:

Ø (10%) Please find all numerical variables, conduct descriptive statistics and
draw histograms of the variables.

Ø (8%) Please find and print all the values of all categorical variables, as
Figure 1 shows (partial example).
Figure 1 Partial example of categorical variables
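
One possible way to separate numerical from categorical variables is by column dtype, as in the sketch below. The sample dataframe and its column names are assumptions made for illustration; the real data has 35 variables, and some integer-coded columns may be better treated as categorical.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample; the column names are assumptions.
df = pd.DataFrame({
    "Age": [41, 49, 37, 33, 27],
    "MonthlyIncome": [5993, 5130, 2090, 2909, 3468],
    "Department": ["Sales", "R&D", "R&D", "R&D", "Sales"],
    "Attrition": ["Yes", "No", "Yes", "No", "No"],
})

# Split the columns by dtype.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Descriptive statistics for the numerical variables.
print(df[numeric_cols].describe())

# One histogram per numerical variable.
df[numeric_cols].hist(bins=5, figsize=(8, 3))
plt.tight_layout()

# Distinct values of each categorical variable, in order of appearance.
categorical_values = {col: list(df[col].unique()) for col in categorical_cols}
for col, values in categorical_values.items():
    print(f"{col}: {values}")
```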

You are also required to extract valuable information from the dataset, such
as:

Ø (6%) Please calculate the average monthly income for each education level and
draw a line chart to show how average monthly income varies with educational
attainment.

Ø (8%) Please calculate the turnover rates by department and gender, print the
results in a table, and draw a bar chart to show the turnover of employees of
different genders in different departments.

Ø (8%) Please calculate the number of employees in each department with
monthly salary higher than the average monthly income of the whole company,
and draw a pie chart to show the distribution.

Ø (8%) Please create a pie chart to show the proportion of different levels of
monthly income in the attrition group, like Figure 2. You can choose any color
combination for the chart.
Figure 2 Attrition by income group
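
The grouped summaries requested above (average income by education level, turnover rate by department and gender, and the count of above-average earners per department) can be sketched with pandas groupby. The sample rows and the column names Department, Gender, Education, MonthlyIncome, and Attrition are assumptions, as is reading the turnover rate as the share of employees whose Attrition value is "Yes".

```python
import pandas as pd

# Hypothetical sample; the column names and values are assumptions.
df = pd.DataFrame({
    "Department": ["Sales", "Sales", "R&D", "R&D", "R&D", "HR"],
    "Gender": ["Female", "Male", "Male", "Female", "Male", "Female"],
    "Education": [1, 2, 2, 3, 3, 1],
    "MonthlyIncome": [3000, 5000, 4000, 8000, 6000, 4000],
    "Attrition": ["Yes", "No", "No", "Yes", "No", "No"],
})

# Average monthly income per education level.
income_by_education = df.groupby("Education")["MonthlyIncome"].mean()

# Turnover rate: share of employees with Attrition == "Yes" in each
# department and gender combination, laid out as a table.
left = df["Attrition"].eq("Yes")
turnover = (
    left.groupby([df["Department"], df["Gender"]]).mean().unstack("Gender")
)

# Employees per department earning more than the company-wide average.
company_mean = df["MonthlyIncome"].mean()
above_average = df.loc[
    df["MonthlyIncome"] > company_mean, "Department"
].value_counts()

print(income_by_education)
print(turnover)
print(above_average)
```

Each result can then be charted with its plot method, for example income_by_education.plot(kind="line"), turnover.plot(kind="bar"), and above_average.plot(kind="pie"). Combinations with no employees appear as NaN in the turnover table.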

Ø (12%) Given an age group list [‘<25’, ‘25-35’, ‘35-45’, ‘45-60’], please add a
column named age_group to the dataframe, assigning each employee to a group from
the age group list, then draw a pie chart to display the distribution of
employee counts by age group. Calculate and print a table showing, within each
age group, the number of employees who leave within one year if they did not get
a promotion.
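
Binning a numerical column into labeled groups can be sketched with pd.cut. The bin edges below are an assumption, since the task does not say which group the boundary ages 25, 35, and 45 belong to; here each bin includes its lower edge, and 60 falls in the last group. The sample data is hypothetical, and the leaver count uses only Attrition == "Yes" as a placeholder filter, because the exact condition for "leave within one year without a promotion" depends on how the dataset's columns are interpreted.

```python
import pandas as pd

age_labels = ["<25", "25-35", "35-45", "45-60"]
# Assumption: left-closed bins [0, 25), [25, 35), [35, 45), [45, 61).
age_edges = [0, 25, 35, 45, 61]

# Hypothetical sample; the column names are assumptions.
df = pd.DataFrame({
    "Age": [22, 25, 34, 35, 44, 45, 60],
    "Attrition": ["Yes", "No", "Yes", "No", "No", "Yes", "No"],
})

# Assign each employee to an age group.
df["age_group"] = pd.cut(
    df["Age"], bins=age_edges, labels=age_labels, right=False
)

# Employee counts per age group, in label order (empty groups count as 0).
group_counts = df.groupby("age_group", observed=False).size()

# Placeholder filter: replace with the condition chosen for the task.
leavers_by_group = (
    df[df["Attrition"] == "Yes"].groupby("age_group", observed=False).size()
)

print(group_counts)
print(leavers_by_group)
```

The same approach applies to grouping monthly income into levels for the Figure 2 pie chart, with bin edges and labels of your own choosing.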

3.3 Partitioning data and predicting attrition (10%+20% bonus optional)

Ø (Optional, 10%) To make the subsequent analysis more convenient, transform
categorical variables to numerical type and print the first 5 lines (you can
try LabelEncoder).

Ø (10%) Partition the data set into a train data set and a test data set. The
train data set should be about 80% of all data points and the test data set
about 20%. Print their rows.

Ø (Optional, 10%) Predict the attrition based on other variables in the file. The
prediction model can be a decision tree or any other feasible one. Evaluate the
performance and print the accuracy (you can try accuracy_score).
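
A minimal sketch of the Task 3.3 workflow using scikit-learn is shown below. It is one feasible approach, not a prescribed solution: the sample data, the column names, the decision tree model, the fixed random_state, and the stratified split are all assumptions chosen for illustration.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical sample of 20 rows; the column names are assumptions.
df = pd.DataFrame({
    "Age": [25, 32, 41, 29, 50] * 4,
    "Department": ["Sales", "R&D", "HR", "R&D", "Sales"] * 4,
    "OverTime": ["Yes", "No", "No", "Yes", "No"] * 4,
    "Attrition": ["Yes", "No", "No", "Yes", "No"] * 4,
})

# Encode every non-numerical column as integer labels.
encoded = df.copy()
for col in encoded.select_dtypes(exclude="number").columns:
    encoded[col] = LabelEncoder().fit_transform(encoded[col])
print(encoded.head())

# Split into features and target, then into 80% train and 20% test.
X = encoded.drop(columns="Attrition")
y = encoded["Attrition"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape[0], X_test.shape[0])

# Fit a decision tree and report accuracy on the held-out test set.
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy: {accuracy:.2f}")
```

LabelEncoder assigns integer codes in sorted order of the values, which is convenient for tree models. For models that treat numbers as magnitudes, one-hot encoding may be a better fit.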

4 Submission Files
To receive a score for the project, you need to submit these files:

Ø An exported .html file containing all your source code and the running results,
in the order of the tasks, from the Jupyter Notebook. To name this file, please
follow this format: studentNameStudentNumberprojectcode.html. For example, if you
are CHAN Wai Ting and your student No. is 55664332, then your submitted
source code file should be named CHANWaiTing55664332projectcode.html.
Please put all source code into one file for ease of grading.

Ø The CSV file generated during Task 3.1.

Ø A report on the results of exploratory data analysis (Task 3.2) and attrition
prediction (Task 3.3), with four pages at most. To name this file, please follow
this format: studentNameStudentNumberprojectreport.pdf. For example, if you
are CHAN Wai Ting and your student No. is 55664332, then your submitted
report file should be named as CHANWaiTing55664332projectreport.pdf.
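
The naming format above can be checked with a small helper. The function name submission_name and its kind parameter are hypothetical, introduced only to illustrate the pattern.

```python
def submission_name(student_name, student_number, kind):
    """Build a submission file name; kind is 'code' (HTML) or 'report' (PDF)."""
    extensions = {"code": "html", "report": "pdf"}
    # Remove the spaces from the name, keeping its capitalization.
    compact_name = "".join(student_name.split())
    return f"{compact_name}{student_number}project{kind}.{extensions[kind]}"


print(submission_name("CHAN Wai Ting", "55664332", "code"))
print(submission_name("CHAN Wai Ting", "55664332", "report"))
```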

References
Karanth, M. (2020). Tabular summary of HR analytics dataset. [Data set]. Zenodo.
https://doi.org/10.5281/zenodo.4088439
