Sas Baseball PROJECT

The document describes analyzing a baseball player performance dataset from 1986 using SAS. It includes: 1) Importing the data and generating descriptive statistics; 2) Finding the top 5 home run players and top 5 highest paid players; 3) Using linear regression to analyze the impact of home runs on salary and identifying other significant factors; 4) Calculating performance scores using a formula and analyzing its impact on salary.

Uploaded by

muskaan bhadada
PROJECT: Baseball Player Performance

The Baseball dataset contains details of baseball players in the year 1986. The data also has
parameters depicting performance of the players and their career records.

Do the following using SAS:

a) Import the data in SAS.

Solution:
proc import datafile="/folders/myfolders/baseball.xlsx"
    out=work.baseball
    dbms=xlsx
    replace;
run;

proc print data=work.baseball;
run;

b) Generate Descriptive Statistics of the entire data.

Solution:
proc means data=work.baseball;
run;
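
By default PROC MEANS summarizes only the numeric variables and does not report missing values. The sketch below is one way to widen the summary, assuming the work.baseball data set created in step a); the choice of statistics is illustrative. The NMISS column matters later because PROC REG leaves out any observation with a missing value in a model variable.

```sas
/* list variable names, types and lengths */
proc contents data=work.baseball;
run;

/* numeric summary with missing-value counts and medians */
proc means data=work.baseball n nmiss mean median std min max maxdec=2;
run;
```
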
c) Generate a list of the top 5 Home Run Players.

Solution:
proc sort data=work.baseball
    out=baseball_data;
    by descending nHome;
run;

data top_5H;
    set baseball_data (obs=5);
run;

title "Top 5 Home Run Hitters";
proc print data=top_5H;
run;

d) Generate a list of the top 5 paid Players.

Solution:
proc sort data=work.baseball
    out=baseball2;
    by descending Salary;
run;

data Top_paid;
    set baseball2 (obs=5);
run;

title "Top 5 Paid Players";
proc print data=Top_paid;
run;
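
Both top-5 steps follow the same sort-then-subset pattern. As a shorter variant, the OBS= data set option can be applied directly in PROC PRINT, which removes the intermediate DATA step. This is a sketch assuming work.baseball from step a); the data set name by_homers is hypothetical. Note that OBS=5 simply keeps the first five rows, so players tied with the fifth row are not shown.

```sas
proc sort data=work.baseball out=by_homers;
    by descending nHome;
run;

title "Top 5 Home Run Hitters";
proc print data=by_homers(obs=5);
run;
title;  /* clear the title so it does not carry over to later output */
```

The same pattern works for Salary by changing the BY variable and the title.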

e) Find the impact of Home Runs on Salary using Linear Regression.

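Solution:
title "Regression analysis (Salary ~ nHome)";
proc reg data=work.baseball;
    model Salary = nHome;
    /* save the fitted values to a new data set */
    output out=Predicted predicted=Pred_Salary;
run;
quit;  /* PROC REG is interactive; QUIT ends the procedure */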
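
A quick visual check can complement the regression table. The sketch below assumes the work.baseball data set from step a) and uses the REG statement of PROC SGPLOT to overlay a fitted line on the scatter of Salary against nHome; the title text is illustrative.

```sas
title "Salary vs. Home Runs";
proc sgplot data=work.baseball;
    /* scatter plot with an ordinary least squares line */
    reg x=nHome y=Salary;
run;
title;  /* clear the title afterwards */
```
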
f) Add more explanatory variables: nAtBat, nHits, nHome, nRuns, nRBI, nBB, nOuts, nError.

Solution:
title "Regression analysis 2";
proc reg data=work.baseball;
    model Salary = nHome nAtBat nHits nRuns nRBI nBB nOuts nError;
    /* save residuals and fitted values */
    output out=Pred_Salary residual=resid predicted=Pred;
run;
quit;
g) Identify from the results, which factors have high impact on Salary in comparison to Home
Runs.

Solution: In the results above, nHits, nBB, nOuts and nAtBat are statistically significant
predictors of Salary, since their p-values are below 0.05. The p-value for nHome is 0.7838
(> 0.05), so nHome is not significant once the other variables are in the model. nRuns, nRBI
and nError also have p-values above 0.05 and are likewise not significant. nHits, nBB, nOuts
and nAtBat therefore have a stronger association with Salary than nHome.
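
P-values show whether an effect is distinguishable from zero, not how large it is. To compare the size of the effects across variables measured on different scales, one option is to request standardized estimates, and since counts such as nAtBat and nHits tend to move together, variance inflation factors can flag collinearity. The sketch below assumes work.baseball from step a) and uses the STB and VIF options of the MODEL statement.

```sas
proc reg data=work.baseball;
    /* STB: standardized estimates; VIF: variance inflation factors */
    model Salary = nHome nAtBat nHits nRuns nRBI nBB nOuts nError / stb vif;
run;
quit;
```

A larger absolute standardized estimate suggests a stronger effect per standard deviation of the predictor; high VIF values suggest the individual p-values should be read with caution.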

h) Calculate performance scores (ps) by applying the following formula:


ps= 3*nHome + 0.5*nHits + 1*nRuns +1* nAtBat - 1*nRBI + 0.3*nBB + 2*nOuts - 1*nError

Solution:
data Performance_score;
    set work.baseball;
    /* plain assignment; no DO/END block is needed */
    ps = 3*nHome + 0.5*nHits + 1*nRuns + 1*nAtBat - 1*nRBI + 0.3*nBB
         + 2*nOuts - 1*nError;
run;

proc print data=Performance_score;
run;
i) Calculate the impact of Performance Scores (ps) on Salary.

Solution:
proc reg data=performance_score;
    model Salary = ps;
    /* note: this overwrites performance_score, adding the fitted values */
    output out=performance_score predicted=Pred;
run;
quit;
j) Explain the results.

Solution: In the results above, ps is statistically significant: its p-value (<0.0001) is below
0.05. However, the adjusted R-square is only 0.1573, well below 0.7, so the model fits
poorly. Salary is associated with ps, but ps explains only about 16% of the variability in
Salary.
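
For reference, the adjusted R-square follows the standard definition below, where n is the number of observations used in the fit and p is the number of predictors (here p = 1). The symbols are generic and are not values taken from the output.

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
```

When n is large relative to p, the correction factor is close to 1, so an adjusted R-square of 0.1573 means that ps accounts for roughly 16% of the variance in Salary.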
