Case Study Real Estate

Uploaded by

Vishwas tomar

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

43 views4 pages

Case Study Real Estate

Uploaded by

Vishwas tomar

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 4

Business understanding and Data understanding are very critical first couple of steps for any

data science project. Read the information given below and also refer to the data dictionary
provided separately in an excel file to build your understanding.

Problem Statement:

A banking institution requires actionable insights from the perspective of Mortgage-Backed

Securities, Geographic Business Investment and Real Estate Analysis.

The objective is to identify white spaces/potential business in the mortgage loan. The
mortgage bank would like to identify potential monthly mortgage expenses for each of
region based on factors which are primarily monthly family income in a region and rented
value of the real estate. Some of the regions are growing rapidly and Competitor banks are
selling mortgage loans to subprime customers at a lower interest rate. The bank is
strategizing for better market penetration and targeting new customers. A statistical model
needs to be created to predict the potential demand in dollars amount of loan for each of
the region in the USA. Also, there is a need to create a dashboard which would refresh
periodically post data retrieval from the agencies. This would help to monitor the key
metrics and trends.

The dashboard must demonstrate relationships and trends for the key metrics as follows:
number of loans, average rental income, monthly mortgage and owner’s cost, family income
vs mortgage cost comparison across different regions. The metrics are described not to limit
the dashboard to these few only.

Dataset Description
Following are the themes the fields fall under Home Owner Costs: Sum of utilities, property
taxes.

 Second Mortgage: Households with a second mortgage statistics.

 Home Equity Loan: Households with a Home equity Loan statistics.
 Debt: Households with any type of debt statistics.
 Mortgage Costs: Statistics regarding mortgage payments, home equity loans, utilities
and property taxes
 Home Owner Costs: Sum of utilities, property taxes statistics
 Gross Rent: Contract rent plus the estimated average monthly cost of utility features
 Gross Rent as Percent of Income Gross rent as the percent of income very interesting
 High school Graduation: High school graduation statistics.
 Population Demographics: Population demographic statistics.
 Age Demographics: Age demographic statistics.
 Household Income: Total income of people residing in the household.
 Family Income: Total income of people related to the householder.
Approach :

Following pointers will be helpful to structure your findings.

1. Import data

2. Figure out the primary key and look for the requirement of indexing

3. Gauge the fill rate of the variables and devise plans for missing value treatment. Please
explain explicitly the reason for the treatment chosen for each variable.

(Note: If you are using SAS, please ignore the portion of the question where visualization is
required. Please write a macro for all data wrangling steps instead.)

4. Understanding homeowner costs are incredibly valuable because it is positively

correlated to consumer spending which drives the economy through disposable income.
Perform debt analysis. You may want to follow the following steps:

 Explore the top 2,500 locations where the percentage of households with a second
mortgage is the highest and percent ownership is above 10%. Visualize using geo-
map. You may keep the upper limit for the percent of households with a second
mortgage to roughly 50%.
 Bad debt is the debt you should avoid at all costs such as a second mortgage or
home equity loan. Conversely, Good debt is all other debt not including second
mortgage or home equity loan.
Bad Debt Equation:
Bad Debt = P (Second Mortgage ∩ Home Equity Loan)
Bad Debt = second_mortgage + home_equity - home_equity_second_mortgage
 Create pie charts (Venn diagram) to show overall debt (% bad and good debt) and
bad debt (2 mortgage and home equity loan).
 Create Box and whisker plot and analyze the distribution for 2nd mortgage, home
equity, good debt and bad debt for different cities.
 Create a collated income distribution chart for family income, house hold income
and remaining income.

5. Perform EDA and come out with insights into population density and age. You may
require deriving new fields (Make sure to weight averages for accurate measurements):

 Population density (hint-use ‘pop’ and ‘Aland’ to calculate)

 median age (hint-use the variables ‘male_age_median’, ‘female_age_median’,
‘male_pop’, ‘female_pop’)
Visualize the findings using appropriate chart type.

6. Create bins for population into a new variable by selecting appropriate class interval so
that the no of categories(bins) don’t exceed 5 for the ease of analysis. Analyze the
married, separated and divorced population for these population brackets. Visualize
using appropriate chart type.
7. Please detail your observations for rent as a percentage of income at an overall level and
for different states.

8. Perform correlation analysis for all the relevant variables by creating a heatmap.
Describe your findings.

9. The economic multivariate data has a significant number of measured variables. The
goal is to find where the measured variables depend on a number of smaller unobserved
common factors or latent variables. Each variable is assumed to depend on a linear
combination of the common factors, and the coefficients are known as loadings. Each
measured variable also includes a component due to independent random variability,
known as "specific variance" because it is specific to one variable. Obtain the common
factors and then plot the loadings. Use factor analysis to find latent variables in our
dataset and gain insight into the linear relationships in the data
 Highschool graduation rates
 Median population age
 Second Mortgage Statistics
 Percent Own
 Bad Debt Expense

10. Build a linear Regression model to predict the total monthly expenditure for home
mortgages loan; please refer - ‘deplotment_RE.xlsx’.
Column hc_mortgage_mean is predicted variable. This is mean monthly mortgage and
owner costs of specified geographical location.
Note: Exclude loans from prediction model which have NaN values for
hc_mortgage_mean. NaN represents not a number/missing values.

 Run a model at a Nation level. If the accuracy levels and R square are not
satisfactory proceed to below step
 Run another model at State level. There are 52 states in USA.

Considerations: Keep below considerations while building a linear regression model

 Variables should have significant impact on predicting Monthly mortgage and owner
costs
 Utilize all predictor variable to start with initial hypothesis
 R square of 60% and above should be achieved
 Ensure Multi-collinearity does not exist in dependent variables
 Test if predicted variable is normally distributed

11. Create a dashboard in tableau by choosing appropriate chart types and metrics useful
for the business. The dashboard must entail the following:

a) Box plot of distribution of average rent by type of place (Village, urban, town etc.)
b) Pie charts (Venn diagram) to show overall debt (% bad and good debt) and bad debt
(2 mortgage and home equity loan)
c) Explore the top 2,500 locations where the percentage of households with a second
mortgage is the highest and percent ownership is above 10%. Visualize using geo-
map.
d) Heat map for correlation matrix
e) Pie chart to show the population distribution across different types of places (Village,
urban, town etc.)

Linear Regression Assignment
0% (2)
Linear Regression Assignment
8 pages
Thera Bank-Project
100% (12)
Thera Bank-Project
26 pages
Census Income Project
No ratings yet
Census Income Project
4 pages
Group 5 Dseb64a Report
No ratings yet
Group 5 Dseb64a Report
10 pages
Data Mining Case Study PDF
100% (1)
Data Mining Case Study PDF
21 pages
Data Mining Case Study PDF
No ratings yet
Data Mining Case Study PDF
21 pages
Module 2notes
No ratings yet
Module 2notes
44 pages
Making predictions
No ratings yet
Making predictions
13 pages
Real Estate
No ratings yet
Real Estate
10 pages
Prediction
100% (1)
Prediction
10 pages
Explatory data Analysis
No ratings yet
Explatory data Analysis
18 pages
Mini Project-Data Mining
No ratings yet
Mini Project-Data Mining
25 pages
Dawit House
No ratings yet
Dawit House
49 pages
Assignment 01
No ratings yet
Assignment 01
2 pages
Economic Data Analysis (Finance Analyst)
No ratings yet
Economic Data Analysis (Finance Analyst)
38 pages
Business Report Project Data Mining
No ratings yet
Business Report Project Data Mining
50 pages
Internship_Entrance_Test
No ratings yet
Internship_Entrance_Test
2 pages
Module 2
No ratings yet
Module 2
20 pages
MA SecA Group7
No ratings yet
MA SecA Group7
20 pages
Project 1
No ratings yet
Project 1
3 pages
Linear Models Reading
No ratings yet
Linear Models Reading
26 pages
EDA Group Case Study
No ratings yet
EDA Group Case Study
33 pages
AML-2203 Advanced Python AI and ML Tools Assignment
No ratings yet
AML-2203 Advanced Python AI and ML Tools Assignment
19 pages
Capstone_Project
No ratings yet
Capstone_Project
33 pages
Summary and Context
No ratings yet
Summary and Context
51 pages
Project On Data Mining-Raveendra Babu Gaddam
No ratings yet
Project On Data Mining-Raveendra Babu Gaddam
29 pages
Capastone Project Taiwan Customer Default
67% (3)
Capastone Project Taiwan Customer Default
36 pages
Kritika Sejwal 24MCI10023 ML Lab Project Report
No ratings yet
Kritika Sejwal 24MCI10023 ML Lab Project Report
10 pages
Data Science Assignment
No ratings yet
Data Science Assignment
2 pages
Reading Material - Module-5 - Introduction To Special Topics
No ratings yet
Reading Material - Module-5 - Introduction To Special Topics
27 pages
FML PROJECT diya (1) (1)
No ratings yet
FML PROJECT diya (1) (1)
9 pages
Course Project - Credit Risk Analyser
No ratings yet
Course Project - Credit Risk Analyser
8 pages
Week2 Excel Problem Statement Real Estate-1
No ratings yet
Week2 Excel Problem Statement Real Estate-1
2 pages
Cart Project
75% (4)
Cart Project
17 pages
Intro To R Introspection
No ratings yet
Intro To R Introspection
24 pages
Scalable Data Processing in R
No ratings yet
Scalable Data Processing in R
8 pages
Assignment 1
No ratings yet
Assignment 1
4 pages
House Prices
No ratings yet
House Prices
5 pages
Data Preparation
No ratings yet
Data Preparation
4 pages
PFDA(programming for Data analysis) APU
No ratings yet
PFDA(programming for Data analysis) APU
60 pages
Credit EDA Case Study
No ratings yet
Credit EDA Case Study
42 pages
ml project part a 1
No ratings yet
ml project part a 1
6 pages
Customer
No ratings yet
Customer
20 pages
Regression Dataset
No ratings yet
Regression Dataset
3 pages
Project1 Report
No ratings yet
Project1 Report
21 pages
DM Assignment - Thena Bank
No ratings yet
DM Assignment - Thena Bank
39 pages
Machine Learning Paper BD
No ratings yet
Machine Learning Paper BD
16 pages
Bank Loan Case Study
No ratings yet
Bank Loan Case Study
34 pages
Project Presentation On House Price Prediction System: Presented by Name: Simran B Solanki Roll No: 19020
100% (1)
Project Presentation On House Price Prediction System: Presented by Name: Simran B Solanki Roll No: 19020
32 pages
Drop - Merge Merge 1:1 Aacode Using Ehs08 - Satis Drop - Merge
No ratings yet
Drop - Merge Merge 1:1 Aacode Using Ehs08 - Satis Drop - Merge
3 pages
ads
No ratings yet
ads
8 pages
House Price Prediction 1
No ratings yet
House Price Prediction 1
27 pages
MGT515AC SP24 GroupAssignment Case
No ratings yet
MGT515AC SP24 GroupAssignment Case
2 pages
Spark Python Course APPLY Project Problem Statement
No ratings yet
Spark Python Course APPLY Project Problem Statement
3 pages
Bank_Loan_ppt
No ratings yet
Bank_Loan_ppt
45 pages
module_2
No ratings yet
module_2
35 pages
AIMLlatestmodule 2Notes Removed
No ratings yet
AIMLlatestmodule 2Notes Removed
33 pages
Tutorial Worksheet wk6
No ratings yet
Tutorial Worksheet wk6
3 pages
2 Agency Theory 1
No ratings yet
2 Agency Theory 1
23 pages
9 Stock Market Efficiency
No ratings yet
9 Stock Market Efficiency
19 pages
Risk Management in Banks
No ratings yet
Risk Management in Banks
11 pages
Sem 1
No ratings yet
Sem 1
2 pages
Game Theory1
No ratings yet
Game Theory1
15 pages
Duality
No ratings yet
Duality
7 pages
Scan Mar 30, 2023 PDF
No ratings yet
Scan Mar 30, 2023 PDF
23 pages