RM Lab Kartik Bba (B&i)
PRACTICAL FILE
Submitted by
NAME : KARTIK RATERIA
ENROLLMENT NO. : 01417701817
INDEX
Topic Page No.
Functions
Count 06
CountA 07
Count Blank 08
Sum 08
Max 09
Min 09
Average 10
CountIf 11
Average If 12
Sum If 13
Concatenate 14
VlookUp 15
HLookUp 16
Vlookup+ Dropdown 17
Other tools
Transpose table 20
Text to Column 21
Conditional Formatting – Highlight Cell rules (greater than, less than, between, 23
equal to, text that contains, a date occurring, duplicate values)
Conditional Formatting – Top/ Bottom rules 28
Conditional Formatting – Color Scales 29
Conditional Formatting – Data Bars 30
Format as Tables 30
Format Cells – Number, Alignment, Font, Border, Fill 31
Cell Styles 34
Data validation – settings ( any value, number , custom) 35
Data validation – input message 38
Data validation – error alert 42
Customization – quick access toolbar 45
Customization- ribbon 46
backstage view 47
save as adobe pdf 47
Data Visualization and Analysis
Frequency 51
Relative frequency 51
Percentage frequency 52
Bar Graph 52
Histogram using Graph tab 53
Pivot Table and its tools 54
Pivot Chart and its tools 57
Histogram frequency distribution 58
Histogram – Chart output 60
Histogram – Pareto (sorted diagram) 61
Histogram – Cumulative percentage 61
Descriptive statistics 62
Descriptive statistics for various scales 63
Correlation 66
Hypothesis Testing
One sample t test using dummy (one-tailed) 69
One sample t test using dummy (two-tailed) 70
One sample t test using test average (one- tailed) 71
One sample t test using test average (two- tailed) 73
t test using function (all combinations) 74
Two sample - Independent sample t test 75
Two sample - Paired Sample t test 78
One sample z test 79
Two sample z test 80
ANOVA – Single Factor 82
ANOVA – Two Factor without replication 84
ANOVA – Two Factor with replication 88
F test 89
Chi square test 94
Introduction to R
Downloading R 99
Four Panes in R 102
Import of Data Sheet in Excel 103
Descriptive Statistics 106
Correlation 108
Hypothesis Testing 109
One Sample T Test 110
Two Sample- Independent Sample T Test 110
Two Sample- Paired Sample T Test 114
One Way ANOVA 119
F Test 122
Chi Square Test 124
FUNCTIONS
Application of basic functions in MS Excel.
Considering this Data
1. Count- The COUNT function counts the number of cells that contain numbers, and counts numbers
within the list of arguments. For example, you can enter the following formula to count the
numbers in the range D2:D18: =COUNT(D2:D18). In this example, 14 of the cells in the
range contain numbers, so the result is 14.
Syntax
COUNT(value1, [value2], ...)
Value1 Required. The first item, cell reference, or range within which you want to
count numbers.
2. CountA- To count the number of cells in a range that are not empty.
Syntax
COUNTA(value1, [value2], ...)
3. Count Blank- To count the number of cells in a range that are blank.
Syntax
COUNTBLANK(range)
So, the result of this function is 3, as 3 cells in the score column are blank.
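The difference between the three counting functions can be sketched outside Excel. The snippet below is only an illustration: the `scores` list is hypothetical, blank cells are modelled as `None`, and the helper names mirror the Excel functions but are not Excel itself (for instance, Excel treats formulas that return an empty string in its own way).

```python
# Hypothetical score column: numbers, one text entry and three blank cells (None).
scores = [72, 85, None, "absent", 90, None, 64, None]

def count(cells):
    """Like COUNT: count only the cells that hold numbers."""
    return sum(
        1 for c in cells
        if isinstance(c, (int, float)) and not isinstance(c, bool)
    )

def counta(cells):
    """Like COUNTA: count the cells that are not empty."""
    return sum(1 for c in cells if c is not None and c != "")

def countblank(cells):
    """Like COUNTBLANK: count the cells that are blank."""
    return sum(1 for c in cells if c is None or c == "")

print(count(scores), counta(scores), countblank(scores))
```

Here the numeric cells, the non-empty cells and the blank cells are counted separately, so the COUNTA and COUNTBLANK results always add up to the size of the range.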
5. Maximum Function- Gives the maximum value from the range.
Syntax
MAX( number1,[number2],...[number n])
7. Average- Gives the average value of the range.
Syntax
AVERAGE(number1, [number2], ...)
Number1 Required. The first number, cell reference, or range for which you want
the average.
Number2 Optional. Additional numbers, cell references or ranges for which you
want the average, up to a maximum of 255.
8. CountIf Function- To count the number of cells that meet a criterion, here the number of
students in “Saffron” house.
So, according to this function, 5 students belong to Saffron house.
9. Average If Function- The Microsoft Excel AVERAGEIF function returns the average
(arithmetic mean) of all numbers in a range of cells that meet a given criterion.
Syntax
AVERAGEIF(range, criteria, [average_range])
For this we will add one more score column to use as a reference.
10. Sum If Function- SUMIF is a function to sum cells that meet a single criterion. SUMIF can be
used to sum cells based on dates, numbers, and text that match specific criteria. SUMIF
supports logical operators (>,<,<>,=) and wildcards (*,?) for partial matching.
Syntax
SUMIF(range, criteria, [sum_range])
For this function, we will first add the two scores using the Sum function.
To do this, we will insert another column using the Insert option on the Home
tab.
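The three conditional functions share one idea: test each cell of a criteria range and act on the matching cells of a second range. A minimal Python sketch follows; the student rows are hypothetical, and only exact-match criteria are handled (operators and wildcards are left out).

```python
# Hypothetical rows: (student, house, score).
rows = [
    ("Asha", "Saffron", 80),
    ("Ravi", "Green", 70),
    ("Meena", "Saffron", 90),
    ("John", "Blue", 60),
]
houses = [r[1] for r in rows]
scores = [r[2] for r in rows]

def countif(criteria_range, criterion):
    """Like COUNTIF with an exact-match criterion."""
    return sum(1 for c in criteria_range if c == criterion)

def sumif(criteria_range, criterion, sum_range):
    """Like SUMIF: add the sum_range cells whose criteria cell matches."""
    return sum(v for c, v in zip(criteria_range, sum_range) if c == criterion)

def averageif(criteria_range, criterion, average_range):
    """Like AVERAGEIF: mean of the matching cells."""
    matched = [v for c, v in zip(criteria_range, average_range) if c == criterion]
    if not matched:
        # Excel reports #DIV/0! when no cell matches.
        raise ZeroDivisionError("no cells match the criterion")
    return sum(matched) / len(matched)

print(countif(houses, "Saffron"))
print(sumif(houses, "Saffron", scores))
print(averageif(houses, "Saffron", scores))
```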
11. Concatenate Function- This function is used to join the text of two or more cells.
Syntax
CONCATENATE(text1, [text2], ...)
12. VLookUp Function- VLOOKUP (short for 'vertical' lookup) is a built-in
Excel function that is designed to work with data that is organised into columns.
For a specified value, the function finds (or 'looks up') the value in one column
of data, and returns the corresponding value from another column.
Syntax
=VLOOKUP(value, table, col_index, [range_lookup])
Definitions-
13. HLookUp Function- HLOOKUP is an Excel function to look up and retrieve data from a
specific row in a table. The "H" in HLOOKUP stands for "horizontal", where lookup values appear in
the first row of the table, moving horizontally to the right. HLOOKUP supports approximate and
exact matching, and wildcards (* ?) for finding partial matches.
Syntax
=HLOOKUP(value, table, row_index, [range_lookup])
Lookup_value Required. The value to be found in the first row of the table. Lookup_value
can be a value, a reference, or a text string.
o The values in the first row of table_array can be text, numbers, or logical values.
o If range_lookup is TRUE, the values in the first row of table_array must be placed in
ascending order: ...-2, -1, 0, 1, 2,... , A-Z, FALSE, TRUE; otherwise, HLOOKUP may not give the
correct value. If range_lookup is FALSE, table_array does not need to be sorted.
o Sort the values in ascending order, left to right. For more information, see Sort data in
a range or table.
Row_index_num Required. The row number in table_array from which the matching value
will be returned. A row_index_num of 1 returns the first row value in table_array, a row_index_num
of 2 returns the second row value in table_array, and so on. If row_index_num is less than 1,
HLOOKUP returns the #VALUE! error value; if row_index_num is greater than the number of rows
in table_array, HLOOKUP returns the #REF! error value.
Now, we can add a dropdown list through data validation as shown below
This function helps to extract the data of a specific person or thing from a large data set with a single
click. It also saves writing the syntax again and again.
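The lookup behind the dropdown can be pictured as a search down the first column of a table. The sketch below assumes exact matching (range_lookup = FALSE); the three rows are illustrative only, and `vlookup` is a hypothetical helper, not an Excel API.

```python
# Illustrative table: (name, total, overall). The first column is the lookup column.
table = [
    ("Sonia", 83, 160),
    ("Kriti", 86, 164),
    ("Priya", 90, 150),
]

def vlookup(value, table, col_index):
    """Exact-match lookup: find `value` in the first column and
    return the cell in column `col_index` (1-based, as in Excel)."""
    for row in table:
        if row[0] == value:
            return row[col_index - 1]
    return "#N/A"  # Excel shows #N/A when no exact match exists

selected = "Kriti"  # the name picked from the dropdown list
print(vlookup(selected, table, 2), vlookup(selected, table, 3))
```

Changing `selected` plays the role of choosing another name in the dropdown: the same formula returns the values for the new name.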
OTHER TOOLS
TRANSPOSE TABLE
If you have a worksheet with data in columns that you need to rotate to rearrange it in rows, use
the Transpose feature. With it, you can quickly switch data from columns to rows, or vice versa.
For example, if your data looks like this, with Sales Regions in the column headings and Quarters
along the left side:
Go to Paste Special and select Transpose.
Here’s how to do it:
1. Select the range of data you want to rearrange, including any row or column labels,
and press Ctrl+C.
2. Choose a new location in the worksheet where you want to paste the transposed table,
ensuring that there is plenty of room to paste your data. The new table that you paste there
will entirely overwrite any data / formatting that’s already there.
3. Right-click the top-left cell of where you want to paste the transposed table, then
choose Transpose.
4. After rotating the data successfully, you can delete the original table and the data in
the new table will remain intact.
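What Transpose does to a table can be shown in a few lines of Python. The sales figures below are hypothetical; the point is only that rows become columns and columns become rows.

```python
# Hypothetical table: quarters down the left side, sales regions across the top.
table = [
    ["Quarter", "North", "South"],
    ["Q1", 100, 120],
    ["Q2", 110, 130],
]

# zip(*table) groups the first cells of every row, then the second cells, and so on,
# which is exactly a row/column swap.
transposed = [list(column) for column in zip(*table)]

for row in transposed:
    print(row)
```

Transposing twice returns the original layout, which is a quick way to check that no cell was lost.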
TEXT TO COLUMN:-
How to Use Text-to-Columns in Excel
1. Open Excel and start a new Blank workbook.
2. Add entries to the first column and select them all.
3. Choose the Data tab atop the ribbon.
4. Select Text to Columns.
5. Ensure Delimited is selected and click Next.
6. Clear each box in the Delimiters section and instead choose Comma and Space.
7. Click Finish.
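The effect of choosing Comma and Space as delimiters can be sketched as a string split. The entries below are hypothetical, and the sketch assumes consecutive delimiters are treated as one (a separate checkbox in the Excel wizard).

```python
import re

# Hypothetical first-column entries typed as "Surname, Name".
entries = ["Smith, John", "Brown, Helen", "Li, Emily"]

def text_to_columns(text):
    """Split one cell on commas and spaces, merging runs of delimiters."""
    return re.split(r"[, ]+", text.strip(", "))

columns = [text_to_columns(entry) for entry in entries]
for row in columns:
    print(row)
```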
Conditional Formatting:-
Highlight Cell rules (greater than, less than, between, equal to, text that contains, a date occurring,
duplicate values)
Name Accounting Communication Computer Law Maths Total Average Viva-Voce Overall
Sonia 16 19 17 15 16 83 16.6 77 160
Kriti 17 19 18 17 15 86 17.2 78 164
Charu 20 15 17 19 16 87 17.4 65 152
Monika 17 17 20 18 15 87 17.4 81 168
Pooja 20 17 15 20 16 88 17.6 75 163
Poonam 17 16 18 15 19 85 17 85 170
Priya 16 19 18 19 18 90 18 60 150
Garima 15 16 15 17 17 80 16 74 154
Charu 19 18 16 20 19 92 18.4 74 166
Sakshi Kakkar 17 20 15 16 17 85 17 84 169
Garima Batra 19 19 17 19 18 92 18.4 61 153
Deepika Jain 19 19 19 20 20 97 19.4 61 158
Soniya 15 15 20 20 19 89 17.8 78 167
Shalini 16 18 16 17 16 83 16.6 73 156
Mona 16 16 20 20 19 91 18.2 82 173
Pooja 18 19 18 16 17 88 17.6 78 166
Anita 19 19 19 18 19 94 18.8 59 153
Divya Gandhi 16 19 15 20 16 86 17.2 78 164
Seema 17 18 19 17 19 90 18 73 163
Taruna Gosain 20 19 20 20 15 94 18.8 75 169
Taruna 15 16 16 16 16 79 15.8 60 139
Sheetal 19 20 18 18 17 92 18.4 73 165
Mona 20 19 17 16 18 90 18 73 163
Megha Gupta 16 16 20 16 16 84 16.8 85 169
Kamna 20 18 18 18 16 90 18 61 151
Payal Ahuja 18 18 15 15 20 86 17.2 74 160
Pooja 19 15 20 17 20 91 18.2 73 164
Kriti Khera 19 18 19 19 19 94 18.8 84 178
Anju 19 19 20 18 17 93 18.6 86 179
Bhawna 19 15 19 15 19 87 17.4 73 160
Monika 16 18 20 15 18 87 17.4 65 152
Sunita 16 18 20 20 18 92 18.4 82 174
Khushboo 19 19 19 20 20 97 19.4 73 170
Heena 15 16 20 20 15 86 17.2 58 144
Charu 18 16 16 19 17 86 17.2 89 175
Sonal 15 19 17 19 19 89 17.8 85 174
Sapna Kharab 18 20 16 17 20 91 18.2 58 149
Deepika 19 20 17 17 20 93 18.6 56 149
Himani Hans 16 17 17 16 19 85 17 59 144
Savina 18 16 16 17 17 84 16.8 92 176
1. Count the number of students who have got overall marks more than
165.
Less than
Between
Equal to
Duplicate
2. Highlight cells with green colour where total number of marks
is more than or equal to 90.
3. Highlight cells with blue where the names of the students start with S.
4. Top/Bottom Rules
5. Color Scales
6. Data Bars
FORMAT AS TABLES
To format data as a table:
1. Select the cells you want to format as a table.
2. From the Home tab, click the Format as Table command in the Styles group.
3. Select a table style from the drop-down menu.
A dialog box will appear, confirming the selected cell range for the table
FORMAT CELLS
To apply number formatting:
1. Select the cell(s) you want to modify.
2. Click the drop-down arrow next to the Number Format command on the Home tab.
3. The Number formatting drop-down menu will appear.
4. Select the desired formatting option.
5. The selected cells will change to the new formatting style
Following is the list-
1. Number
2. Alignment
3. Font
4. Border
5. Fill.
NUMBER
ALIGNMENT
FONT
BORDER
FILL
CELL STYLES
To apply a cell style:
Select the cell(s) you want to modify.
Click the Cell Styles command on the Home tab, and then choose the
desired style from the drop-down menu. In our example, we'll choose Accent 1.
The selected cell style will appear.
DATA VALIDATION
5. On the Settings tab, under Allow, select an option:
6. Set the other required values, based on what you chose for Allow and Data. For
example, if you select between, then select the Minimum: and Maximum: values for the
cell(s).
7. Select the Ignore blank checkbox if you want to ignore blank spaces.
8. If you want to add a Title and message for your rule, select the Input Message tab,
and then type a title and input message.
9. Select the Show input message when cell is selected checkbox to display the
message when the user selects or hovers over the selected cell(s).
10. Select OK.
Now, if the user tries to enter a value that is not valid, a pop-up appears with the message,
“This value doesn’t match the data validation restrictions for this cell.”
Each worksheet is listed below, along with what kind of Data Validation you'll find
CUSTOM
INPUT MESSAGE
Data Validation Messages
With the options available in data validation, you can display messages to give instructions to
the people who use your spreadsheet. There are two types of data validation messages:
1. Select the cells in which you want to apply data validation
2. On the Ribbon, click the Data tab, and click Data Validation
3. (optional) On the Settings tab, choose the data validation settings
4. Click on the Input Message tab, and add a check mark to Show input message when
cell is selected
5. Type your message heading text in the Title box. This text will appear in bold print at
the top of the message.
6. Type a short message in the Input message box. Press the Enter key, to create line
breaks, if you want them.
NOTE: The limit is 255 characters
Input Message Size
Although there are 255 characters allowed in the Input Message box, the box has a maximum
height and width, and all the characters might not fit.
NOTE: The size of the message box cannot be changed -- it is automatically set by Excel.
For example, in the message box below, there are 254 "i" characters, with an "X" at the end.
However, in the message box below, there are 254 "W" characters, with an "X" at the end.
Only 126 of the characters appear in full, and the remaining characters are cut off, or not
visible.
If the cell is close to the right side of the Excel window, the right border of the input message
will start at the Excel window border.
If there is not enough room below the cell, the input message appears at the right side of the
cell, if there is enough room there.
If there is not enough room below the cell, or to the right, the input message appears at the
left side of the cell.
If there is a comment in the cell, the input message appears below the cell, with the right edge
of the message at the middle point of the cell's width. This can cause problems in column A,
where there is no room at the left, and the data validation message is cut off.
The location is only temporary -- the message box will return to its original position,
when you close and reopen the workbook.
ALL input messages on that worksheet will appear in that location, until the
workbook is closed and reopened.
Create an Error Alert
When you add data validation to a cell, the Error Alert feature is automatically turned on. It
blocks the users from entering invalid data in the cell.
You can turn Error Alert off, to allow people to enter invalid data. Or, change the type of
Error Alert, by following the instructions below.
overtyped.
If the Cancel button is clicked, the invalid entry is deleted, and the cell's
original content is restored.
The user cannot leave the invalid entry in the cell
6. Type your message heading text in the Title box. This text will appear in bold print at
the top of the message.
7. Type a short message in the Error message box. The limit is 225 characters
8. Click OK
Suppose a column requires dates, but some rows contain text or numbers. If
we import an Excel file with such inconsistent types of data, it may cause errors on the other end.
Hence, data validation plays an important role in preventing these types of errors.
Data Validation in Excel lets you control the data that can be entered in a cell. You can
restrict the user to enter only a specified range of numbers or text or date.
You can also use data validation functionality to create an Excel drop down list (which is
definitely one of the coolest and most powerful features in Excel)
Add a command to the Quick Access Toolbar
On the ribbon, click the appropriate tab or group to display the command that you
want to add to the Quick Access Toolbar.
Right-click the command, and then click Add to Quick Access Toolbar on the
shortcut menu.
CUSTOMIZATION- RIBBON
Click the Office Button;
Click the Excel Options button at the bottom to open the Excel Options
window;
Click the Popular button at the left;
Under Top options for working with Excel, check the Show Developer tab in the
Ribbon option.
Click the OK button to finish editing.
BACKSTAGE VIEW
Backstage View allows you to manipulate aspects of a file. Backstage View is accessible by
clicking on the "File" tab near the top of the application window. The backstage view gives access to
saving, opening, info about the open file (Permissions, Sharing, and Versions), creating a new file,
printing, and recently opened files.
DATA VISUALISATION AND ANALYSIS
DATA TO BE CONSIDERED
Student Score
Rhea Madsen 69
Jennifer Mendez 81
Brett Broyles 69
Shirley Smith 81
John Brown 100
Michael G. Welch 81
Donald Tse 100
Madeline Stevens 82
Howard Porter 81
Helen Craven 81
Lillie Schultz 69
Emily Li 78
Michael Long 69
Chris Herrman 88
Marshall Sherman 100
William Grindle 82
Pauline Haun 69
Lydia J. Evans 81
James Weaver 28
1. FREQUENCY
The Microsoft Excel FREQUENCY function returns how often values occur within a set of data. It
returns a vertical array of numbers. The FREQUENCY function is a built-in function in Excel that is
categorized as a Statistical Function. It can be used as a worksheet function (WS) in Excel.
Syntax
=FREQUENCY (data_array, bins_array)
2. RELATIVE FREQUENCY
Relative frequency is the share of the total number of observations that falls in a
specific class, i.e. the class frequency divided by the total frequency.
3. PERCENTAGE FREQUENCY
A percentage frequency distribution is a display of data that specifies the percentage of observations that exist for
each data point or grouping of data points, i.e. the relative frequency multiplied by 100. It is a particularly
useful method of expressing the relative frequency of survey responses and other data.
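The three measures build on each other, which a short Python sketch can make concrete. It uses the student scores listed under DATA TO BE CONSIDERED; the bin limits are an assumption chosen for illustration, and `frequency` is a hypothetical helper that imitates how FREQUENCY counts values up to each bin limit, with one extra bucket for values above the last bin.

```python
from bisect import bisect_left

# Scores from the data table above.
scores = [69, 81, 69, 81, 100, 81, 100, 82, 81, 81,
          69, 78, 69, 88, 100, 82, 69, 81, 28]
bins = [60, 70, 80, 90, 100]  # assumed upper limits of the classes

def frequency(data, bins):
    """Count values per class: a value x falls in the first bin with x <= limit;
    values above the last limit go to an extra overflow bucket."""
    counts = [0] * (len(bins) + 1)
    for x in data:
        counts[bisect_left(bins, x)] += 1
    return counts

freq = frequency(scores, bins)
total = sum(freq)
relative = [f / total for f in freq]       # class frequency / total frequency
percentage = [100 * r for r in relative]   # relative frequency * 100

for limit, f, r, p in zip(bins + ["more"], freq, relative, percentage):
    print(limit, f, round(r, 3), round(p, 1))
```

The relative frequencies always sum to 1 and the percentage frequencies to 100, which is a useful check on a hand-built table.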
4.BAR GRAPH
Open Excel. Locate and open the spreadsheet from which you want to make a bar chart.
Select all the data that you want included in the bar chart.
Be sure to include the column and row headers, which will become the labels in the bar chart. If you want
different labels, type them in the appropriate header cells.
Click on the Insert tab and then on the Insert Column or Bar Chart button in the Charts group. You'll see many
options when you select this button, such as 2-D columns and 3-D columns, as well as 2-D and 3-D bars. For
these purposes, we're selecting 2-D columns.
The chart will appear. You'll also see horizontal bars giving the names of your headers at the bottom of your
graph.
Next, give your chart a name. Click on the Chart Title section at the top of the graph and the section
becomes editable.
Decide where to place the bar chart. It can be placed on a separate sheet or it can be embedded in the
spreadsheet. Then save it.
If you want to delete the chart and start all over again, place your cursor on the edge of the chart (you'll get a
pop-up that says "chart area") and press your Delete key.
5.Histogram using graph tab
A histogram is a specific use of a column chart where each column represents the frequency
of elements in a certain range. In other words, a histogram graphically displays the number of
elements within the consecutive non-overlapping intervals, or bins.
6.PIVOT TABLE
A pivot table is a program tool that allows you to reorganize and summarize selected
columns and rows of data in a spreadsheet or database table to obtain a desired report. ... For
example, a store owner might list monthly sales totals for a large number of merchandise
items in an Excel spreadsheet.
The following dialog box appears. Excel automatically selects the data for you. The default
location for a new pivot table is New Worksheet.
Click OK.
Consider the data given for Sales report
Date Salesperson Company Product Sales Value
1/31/2010 JJJ North Rental 29,546.00
3/10/2010 BBB North Flexelease 20,132.00
9/6/2010 GGG South Operating Lease 42,214.00
1/10/2010 EEE South Operating Lease 30,123.00
6/10/2010 BBB North Contract Hire 42,939.00
3/1/2010 DDD South Flexelease 68,804.00
1/9/2010 KKK North Contract Hire 41,979.00
6/2/2010 AAA North Contract Hire 41,485.00
2/10/2010 EEE South Capital Lease 63,237.00
7/10/2010 AAA North Operating Lease 66,944.00
1/10/2010 DDD South Rental 32,445.00
10/10/2010 FFF South Flexelease 41,345.00
1/10/2010 EEE South Rental 62,493.00
10/6/2010 GGG South Flexelease 27,628.00
5/1/2010 GGG South Capital Lease 55,421.00
10/10/2010 FFF South Contract Hire 40,622.00
4/10/2010 CCC North Contract Hire 36,208.00
8/10/2010 CCC North Flexelease 33,299.00
12/10/2010 DDD South Capital Lease 36,286.00
4/10/2010 JJJ North Rental 30,289.00
8/10/2010 HHH South Rental 20,805.00
3/10/2010 FFF South Contract Hire 60,837.00
8/10/2010 KKK North Capital Lease 47,350.00
11/10/2010 KKK North Operating Lease 49,368.00
9/10/2010 AAA North Operating Lease 39,292.00
8/8/2010 JJJ North Flexelease 38,261.00
4/10/2010 BBB North Flexelease 72,022.00
9/10/2010 EEE South Capital Lease 59,960.00
3/10/2010 AAA North Rental 71,212.00
8/10/2010 DDD South Contract Hire 58,338.00
1/10/2010 CCC North Flexelease 37,862.00
2/28/2010 AAA North Flexelease 52,639.00
9/10/2010 JJJ North Rental 61,021.00
6/10/2010 EEE South Capital Lease 64,552.00
1/10/2010 DDD South Capital Lease 51,404.00
12/10/2010 CCC North Rental 68,183.00
3/7/2010 JJJ North Operating Lease 74,061.00
12/10/2010 GGG South Capital Lease 65,538.00
5/10/2010 AAA North Rental 52,173.00
3/10/2010 KKK North Operating Lease 40,175.00
6/10/2010 JJJ North Capital Lease 54,463.00
6/10/2010 CCC North Contract Hire 42,500.00
7/10/2010 HHH South Rental 35,866.00
9/10/2010 GGG South Capital Lease 72,784.00
6/8/2010 CCC North Contract Hire 64,475.00
12/10/2010 AAA North Capital Lease 22,924.00
2/10/2010 KKK North Contract Hire 24,145.00
11/10/2010 HHH South Contract Hire 54,353.00
2/10/2010 BBB North Contract Hire 31,127.00
Now drag the “Product” title from Row Labels to Column Labels.
7.Pivot Chart
A pivot chart is especially useful when dealing with large amounts of data. For
example, suppose a society with a large number of employees records the working hours of
each employee in Excel, so that at the end of each month the employee with the highest
number of working hours can be given a bonus for sincerity and devotion to the
society. Working through the complete list of employees would be very time
consuming and error-prone; a pivot table, or a pivot chart for that matter,
allows quickly reorganizing and visualizing the data in an understandable manner and facilitates
the entire process.
8.Histogram using frequency distribution
1. On the Data tab, in the Analysis group, click the Data Analysis button.
To do this, you can place the cursor in the box, and then simply select the corresponding
range on your worksheet using the mouse. Alternatively, you can click the Collapse
Dialog button, select the range on the sheet, and then click the Collapse
Dialog button again to return to the Histogram dialog box.
Tip. If you included column headers when selecting the input data and bin range, select
the Labels check box.
o Select the Output options.
To place the histogram on the same sheet, click Output Range, and then enter the upper-
left cell of the output table.
To paste the output table and histogram in a new sheet or a new workbook, select New
Worksheet Ply or New Workbook, respectively.
PARETO
DESCRIPTIVE STATISTICS
Descriptive statistics are one of the fundamental “must knows” with any set of data. It gives you a general idea
of trends in your data including:
Using vlookup, find gender code
Repeat the same with qualification and work experience
8.CORRELATION
We usually use the correlation coefficient (a value between -1 and 1) to show how strongly two variables are related to each other. In
Excel, we can use the CORREL function to find the correlation coefficient between two variables.
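The coefficient that CORREL reports is the Pearson correlation, which can be computed by hand as a check. The sketch below uses hypothetical data; `correl` is an illustrative helper written for this example.

```python
from math import sqrt

def correl(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: marks rise in step with hours studied.
hours = [1, 2, 3, 4, 5]
marks = [2, 4, 6, 8, 10]
print(correl(hours, marks))        # perfectly positive relationship
print(correl(hours, marks[::-1]))  # perfectly negative relationship
```

Values near 1 or -1 indicate a strong linear relationship, and values near 0 a weak one.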
HYPOTHESIS TESTING
STEPS :
1) Consider the data given of work experiences of 24 people.
2) Create a second column under the name dummy and fill in values as 0
3) Go to Data Analysis > t-Test: Two-Sample Assuming Unequal Variances.
4) Select work experiences as input range 1 and dummy column as input range 2.
5) Enter Hypothesised mean difference as ‘20’ and Alpha as ‘0.05’. Select the output range and click
‘OK’.
For one tailed testing, highlight the one tailed p value and strike through the two tailed p value.
HYPOTHESIS –
H0 : μ ≤ 20
H1 : μ > 20
DECISION RULE:
1) If t stat is greater than t critical, reject Null.
2) If P value < alpha, reject Null.
INFERENCE: Since the absolute value of t stat is greater than t critical, we reject the null hypothesis. Therefore,
we can say that μ > 20.
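The dummy-column method works because a column of zeros has mean 0 and variance 0, so the two-sample statistic with a hypothesised difference of 20 reduces to the ordinary one-sample t statistic with n - 1 degrees of freedom. A minimal sketch of that statistic follows; the sample is hypothetical (the 24 work-experience values are not reproduced here).

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: mean = mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # sample standard deviation / sqrt(n)
    return (mean(sample) - mu0) / standard_error, n - 1

# Hypothetical work-experience values, tested against a mean of 20.
experience = [22, 24, 26, 28]
t_stat, df = one_sample_t(experience, 20)
print(round(t_stat, 3), df)
```

The t stat is then compared with the critical value for the chosen alpha and degrees of freedom, exactly as in the decision rule above.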
Therefore, reject null.
STEPS :
1) Consider the data given of work experiences of 24 people.
2) Create a second column under the name dummy and fill in values as 0
3) Go to Data Analysis > t-Test: Two-Sample Assuming Unequal Variances.
4) Select work experiences as input range 1 and dummy column as input range 2.
5) Enter Hypothesised mean difference as ‘20’ and Alpha as ‘0.05’. Select the output range and click
‘OK’.
For two tailed testing, highlight the two tailed p value and strike through the one tailed p value.
HYPOTHESIS:-
H0 : µ = 20
H1 : µ ≠ 20
DECISION RULE:
1) If the absolute value of t stat is greater than t critical (two tail), reject Null.
2) If P value < alpha, reject Null.
INFERENCE:-
Since the p value is smaller than alpha, reject null, i.e. accept H1 : µ > 20: the mean experience is greater
than 20.
HYPOTHESIS - H0 : µ ≤ 20
H1 : µ > 20
DECISION RULE:
1) If t stat is greater than t critical, reject Null.
2) If P value < alpha, reject Null.
INFERENCE: P value is greater than alpha, so we’ll accept Null.
ONE SAMPLE T – TEST using Test average (TWO TAILED)
PROBLEM STATEMENT:- To know whether the average work experience of workers differs from 20
(is less than or more than 20), taking the test average as 20.
STEPS:-
1. Go to Data Analysis > t-Test: Two-Sample Assuming Unequal Variances.
2. Select the input and output range, take alpha as 0.05 and hypothesised mean difference as
0.
3. Hit OK.
HYPOTHESIS.
H0= time spent by full time students in studying statistics is not less than part time students
H1= time spent by full time students in studying statistics is less than part time students
DECISION RULE:- If p value is less than alpha, reject null, and vice versa.
INFERENCE:- since p value is more than alpha, so accept null. That is, time spent by full
time students in studying statistics is not less than part time students.
t-Test: Two-Sample Assuming Equal Variances
Independent samples
PROBLEM STATEMENT:-
To determine if there is a significant difference between the mean marks in different subjects.
HYPOTHESIS:-
H0= There is no significant difference between the mean marks in two subjects.
H1= There is a significant difference between the mean marks scored in two subjects.
Consider the following data
STEPS:-
1. Go to Data Analysis > t-Test: Two-Sample Assuming Equal Variances.
2. Select input range and output range. Select hypothesised mean difference as 0 and alpha
as 0.05.
3. Click OK.
Repeat the steps for other combinations as well.
TWO SAMPLE : PAIRED SAMPLE T – TEST
RESEARCH PROBLEM : Determine whether the weight loss diet was effective or not, given the
weights before and after the diet.
BEFORE AFTER
162 168
170 136
184 147
164 159
172 143
176 161
159 143
170 145
Hypothesis :
H0 : µbefore - µafter ≤ 0
H1 : µbefore - µafter > 0
Alpha : 0.05
Hypothesized mean difference : 0
Therefore, as per the rule, we reject H0,
i.e. the diet was effective.
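The paired test works on the differences between each person's two weights. The sketch below recomputes the t statistic from the before/after table above; it is a hand calculation for checking, not the Excel tool, and the critical value quoted in the comment is the standard one-tail t-table value for 7 degrees of freedom at alpha 0.05.

```python
from math import sqrt
from statistics import mean, stdev

# Weights from the table above.
before = [162, 170, 184, 164, 172, 176, 159, 170]
after = [168, 136, 147, 159, 143, 161, 143, 145]

# A positive difference means weight was lost.
diffs = [b - a for b, a in zip(before, after)]

n = len(diffs)
mean_diff = mean(diffs)
t_stat = mean_diff / (stdev(diffs) / sqrt(n))  # hypothesised mean difference is 0
df = n - 1

t_critical = 1.895  # one-tail, alpha = 0.05, df = 7 (t-table value)
print(mean_diff, round(t_stat, 3), df, t_stat > t_critical)
```

The mean loss is positive and the t stat exceeds the one-tail critical value, which agrees with rejecting H0.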
Z-TEST
A Z-test is a hypothesis test based on the Z-statistic, which follows the standard normal distribution
under the null hypothesis.
INFERENCE:- Since P value is more than alpha, accept null.
There is no significant evidence that the population mean age differs from 23.
Alpha= .01
Sol:
The parameter to be tested is the difference between two means µ1 - µ2.
The hypothesis to be tested is that the mean annual net return from directly purchased mutual funds
(µ1) is larger than the mean of broker purchased funds (µ2). Hence the hypotheses are
H0 : µ1 - µ2 ≤ 0
H1 : µ1 - µ2 > 0
If z stat is greater than z critical (one tail), reject null.
If z stat is less than z critical (one tail), accept null.
Null- directly purchased mutual funds do not outperform.
Alternate- directly purchased mutual funds outperform.
Inference :- Since the p value is less than alpha, the final answer is to reject null.
Ans- Directly purchased mutual funds outperform broker purchased mutual funds.
The value of the test statistic is 2.29 and the one tail p value is 0.0110.
We observe that the p value of the test is small (and the test statistic falls into the rejection region).
As a result we conclude that there is sufficient evidence to infer that on average directly purchased
mutual funds outperform broker purchased mutual funds.
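The one tail p value quoted above follows directly from the standard normal distribution: it is the area to the right of the z statistic. A short check in Python, using only the reported z stat of 2.29:

```python
from statistics import NormalDist

z_stat = 2.29  # test statistic reported above

# Upper-tail area of the standard normal distribution beyond z_stat.
p_one_tail = 1 - NormalDist().cdf(z_stat)
p_two_tail = 2 * p_one_tail  # what a two-tailed test would use instead

print(round(p_one_tail, 4), round(p_two_tail, 4))
```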
ANOVA TEST
Analysis of Variance (ANOVA) is a statistical method used to test differences between two or more
means.
ANOVA is used to test general rather than specific differences among means.
PROBLEM STATEMENT:
Here you can find the marks of students in economics, science and history. Determine whether
the mean marks are equal or not.
STUDENT ECONOMICS SCIENCE HISTORY
A 42 69 35
B 53 54 40
C 49 58 53
D 53 64 42
E 43 64 50
Hypothesis Testing:
H1:At least one of the means is different.
H0 :μ1 = μ2 = μ3
STEPS:-
1. Go to data analysis > anova single factor
2. Put table as Input.
3. Keep alpha as 0.05. Click ok.
DECISION RULE:
1) If f stat is greater than f critical, reject Null.
2) If P value < alpha, reject Null.
INFERENCE:
F > F crit , So we will reject Null. This implies that mean marks of all subjects are not equal.
However, this does not tell us the subjects in which the mean marks are different, so for this we will
conduct 3 pairs of t-test assuming equal variances between each pair of subject so as to know the
subjects in which mean marks are different.
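The F stat behind this inference can be rebuilt by hand from the marks table: it is the variance between the subject means divided by the variance within the subjects. The sketch below is a manual calculation for checking the Excel output; `anova_single_factor` is an illustrative helper.

```python
from statistics import mean

# Marks of students A to E, taken from the table above.
groups = {
    "Economics": [42, 53, 49, 53, 43],
    "Science": [69, 54, 58, 64, 64],
    "History": [35, 40, 53, 42, 50],
}

def anova_single_factor(samples):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for s in samples for x in s]
    grand_mean = mean(all_values)
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

f_stat, df_b, df_w = anova_single_factor(list(groups.values()))
print(round(f_stat, 2), df_b, df_w)
```

A large F means the subject means differ by more than the spread inside each subject would explain.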
Problem statement:
To test whether or not marks of students differ with respect to student and subject both.
Hypothesis Testing:
H0 - Row wise: There is no significant difference in marks of students.
H0 - Column wise: There is no significant difference in marks for three subjects- Economics, Science and
History.
H1 - Row wise: There is significant difference in marks of students.
H1 - Column wise: There is significant difference in marks for three subjects- Economics, Science and
History.
STEPS:-
1. Go to data analysis> anova: Two factor without replication
2. Put table as Input.
3. Keep alpha as 0.05. click ok.
Result:
B 3 147 49 61
D 3 159 53 121
Economics 5 240 48 28
ANOVA
Source of Variation SS df MS F P-value F crit
Rows 60.93333 4 15.23333 0.300263 0.869889 3.837853
Columns 872.1333 2 436.0667 8.595269 0.010172 4.45897
Error 405.8667 8 50.73333
Total 1338.933 14
DECISION RULE:
Inference:
Row wise:
Here, F Stat is 0.30 and F critical is 3.84, so Null hypothesis is accepted.
Here, P value is 0.87 which is greater than alpha (5%). Therefore, Null hypothesis is accepted.
Column wise:
Here, F Stat is (8.595) and F critical is (4.459), so Null hypothesis is rejected.
Here, P value is (0.010) which is less than alpha (5%). Therefore, Null hypothesis is rejected.
Conclusion:
Row wise: There is not enough evidence that the marks of the students differ significantly.
Column wise: There is enough evidence that marks for three subjects- Economics, Science
and History differ.
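The SS and F columns of the ANOVA table can be reproduced from the marks table with a short calculation. This is a hand check of the Excel output; the variable names are illustrative.

```python
from statistics import mean

# Rows are students A to E; columns are Economics, Science, History.
marks = [
    [42, 69, 35],
    [53, 54, 40],
    [49, 58, 53],
    [53, 64, 42],
    [43, 64, 50],
]
r, c = len(marks), len(marks[0])
grand_mean = mean(x for row in marks for x in row)
row_means = [mean(row) for row in marks]
col_means = [mean(col) for col in zip(*marks)]

ss_rows = c * sum((m - grand_mean) ** 2 for m in row_means)
ss_cols = r * sum((m - grand_mean) ** 2 for m in col_means)
ss_total = sum((x - grand_mean) ** 2 for row in marks for x in row)
ss_error = ss_total - ss_rows - ss_cols  # what rows and columns leave unexplained

ms_error = ss_error / ((r - 1) * (c - 1))
f_rows = (ss_rows / (r - 1)) / ms_error
f_cols = (ss_cols / (c - 1)) / ms_error

print(round(ss_rows, 4), round(ss_cols, 4), round(ss_error, 4))
print(round(f_rows, 4), round(f_cols, 4))
```

The results agree with the Rows, Columns and Error lines of the ANOVA table in the Result section.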
ANOVA: two factor with replication
Problem statement:-
To check if there is a significant relation between area and tests by using two factor anova.
STEPS:-
1. Go to data analysis > anova Two factor with replication
2. Put table as Input.
3. In rows, write total rows per sample.
4. Keep alpha as 0.05. click ok.
HYPOTHESIS:-
H0 = there is no significant relation between area and tests.
H1 = there is a significant relation between area and tests.
Rules:-
1. If p value is less than alpha, reject null.
2. If F value is greater than F critical, reject null.
Inference:-
Since p value is more than alpha, so accept null. That is, there is no significant relation
between area and tests by using two factor anova.
F TEST
The objective of the test is to determine the likelihood of a value in a sample, given that the null
hypothesis is true. An F test is a statistical test that compares the variances of two samples so
as to test the hypothesis that the samples have been taken from populations with different
variances. Its basic purpose is to check for differences between sample variances.
Any statistical test in which the test statistic has an F distribution under
null hypothesis is called an F test. The F test distribution is named after
R.A Fisher, the famous statistician.
Data
Direct Broker
9.33 3.24
6.94 -6.76
16.17 12.8
16.97 11.1
5.94 2.73
12.61 -0.13
3.33 18.22
16.13 -0.8
11.2 -5.75
1.14 2.59
4.68 3.71
3.09 13.15
7.26 11.05
2.05 -3.12
13.07 8.94
0.59 2.74
13.57 4.07
0.35 5.6
2.69 -0.85
18.45 -0.28
4.23 16.4
10.28 6.39
7.1 -1.9
-3.09 9.49
5.6 6.7
5.27 0.19
8.09 12.39
15.05 6.54
13.21 10.92
1.72 -2.15
14.69 4.36
-2.97 -11.07
10.37 9.24
-0.63 -2.67
-0.15 8.97
0.27 1.87
4.59 -1.53
6.38 5.23
-0.24 6.87
10.32 -1.69
10.29 9.43
4.39 8.31
-2.06 -3.99
7.66 -4.44
10.83 8.63
14.48 7.06
4.8 1.57
13.12 -8.44
-6.54 -5.72
-1.06 6.95
Two sample - testing of Variance - F test - One tailed
PROBLEM STATEMENT : Can we conclude at the 5% level that the variance of returns of directly
purchased mutual funds is higher than that of mutual funds bought through brokers?
H0 : Variance of directly purchased mutual funds is less than or equal to that of mutual funds bought through brokers.
H1 : Variance of directly purchased mutual funds is higher than that of mutual funds bought through brokers.
Rules:-
1. If p value is less than alpha, reject null.
2. If F value is greater than F critical, reject null.
P value is higher than alpha, so we do not reject null, i.e.:
Variance of directly purchased mutual funds is less than or equal to that of mutual funds bought through brokers.
STEPS:-
1. Go to Data > Data Analysis > F-Test Two-Sample for Variances.
2. Select Variable 1 Range as Direct and Variable 2 Range as Broker.
3. Set Alpha as 0.05.
4. Click OK.
The value of the test statistic is F = 0.86499. Excel outputs the one-tail p value.
For a two-tail test, we double the one-tail p value.
So : 2*0.30684438 = 0.61368876
INFERENCE:- P value is higher than alpha, so we do not reject null, i.e. there is no significant difference in
the variances.
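The doubling of the one-tail p value can be checked in R with var.test, which is the function this file uses later for the F test. This is only a sketch: the Direct and Broker columns from the data table above are typed in as two vectors (the names direct and broker are illustrative), and it is assumed that Excel's one-tail p value corresponds to the smaller of the two tail probabilities.

```r
# Returns (%) copied from the Direct and Broker columns of the data table.
direct <- c(9.33, 6.94, 16.17, 16.97, 5.94, 12.61, 3.33, 16.13, 11.2, 1.14,
            4.68, 3.09, 7.26, 2.05, 13.07, 0.59, 13.57, 0.35, 2.69, 18.45,
            4.23, 10.28, 7.1, -3.09, 5.6, 5.27, 8.09, 15.05, 13.21, 1.72,
            14.69, -2.97, 10.37, -0.63, -0.15, 0.27, 4.59, 6.38, -0.24, 10.32,
            10.29, 4.39, -2.06, 7.66, 10.83, 14.48, 4.8, 13.12, -6.54, -1.06)
broker <- c(3.24, -6.76, 12.8, 11.1, 2.73, -0.13, 18.22, -0.8, -5.75, 2.59,
            3.71, 13.15, 11.05, -3.12, 8.94, 2.74, 4.07, 5.6, -0.85, -0.28,
            16.4, 6.39, -1.9, 9.49, 6.7, 0.19, 12.39, 6.54, 10.92, -2.15,
            4.36, -11.07, 9.24, -2.67, 8.97, 1.87, -1.53, 5.23, 6.87, -1.69,
            9.43, 8.31, -3.99, -4.44, 8.63, 7.06, 1.57, -8.44, -5.72, 6.95)

two_sided <- var.test(direct, broker)                        # H1: ratio != 1
lower     <- var.test(direct, broker, alternative = "less")  # H1: ratio < 1
upper     <- var.test(direct, broker, alternative = "greater") # H1: ratio > 1

# The test statistic is the ratio of the two sample variances.
f_ratio <- unname(two_sided$statistic)
cat("F =", f_ratio, "\n")
cat("one-tail p (less)    =", lower$p.value, "\n")
cat("one-tail p (greater) =", upper$p.value, "\n")
cat("two-tail p           =", two_sided$p.value, "\n")
```

The two-tail p value printed at the end equals twice the smaller one-tail p value, which is the doubling step described above.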
FINAL ANSWER-
The Chi-square test is intended to test how likely it is that an observed distribution
is due to chance. It is also called a "goodness of fit" statistic, because it measures
how well the observed distribution of data fits with the distribution that is expected
if the variables are independent.
STEPS :
1) Calculate the row totals and column totals. Also find the grand total.
2) Calculate the expected value for each observation using the following formula :
expected = (row total * column total) / grand total
3) Calculate Observed - Expected (O-E).
4) Calculate (O-E)^2/E.
5) Calculate the sum of (O-E)^2/E.
6) Calculate the p value using the formula "=CHITEST(A15:A26, B15:B26)" and then press ENTER.
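The six steps can also be followed by hand in R. The sketch below assumes the 4 x 3 age group by brand table that appears in the R part of this file (it is not the Excel sheet in A15:B26, whose values are not shown); the object names are illustrative.

```r
# Observed counts: rows are age groups a-d, columns are brands 1-3.
observed <- rbind(c(65, 76, 72), c(60, 40, 64), c(45, 52, 50), c(55, 65, 60))

# Step 1: row totals, column totals and grand total.
row_total   <- rowSums(observed)
col_total   <- colSums(observed)
grand_total <- sum(observed)

# Step 2: expected = (row total * column total) / grand total.
expected <- outer(row_total, col_total) / grand_total

# Steps 3 to 5: sum of (O - E)^2 / E over all cells.
chi_sq <- sum((observed - expected)^2 / expected)

# Step 6: p value from the chi-square distribution,
# with (rows - 1) * (columns - 1) degrees of freedom.
dof <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(chi_sq, dof, lower.tail = FALSE)

# Cross-check against R's built-in test.
builtin <- chisq.test(observed)
cat("chi-square =", chi_sq, " df =", dof, " p =", p_value, "\n")
```

For this table the hand calculation and chisq.test agree, and the result matches the R output reported later in this file (X-squared = 7.3726, df = 6, p-value = 0.2878).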
DECISION RULE :
If p value is less than alpha, reject null.
HYPOTHESIS
Null : There is no association between brand preference and age.
Alternate : There is an association between brand preference and age.
pvalue:- 0.768154
INTRODUCTION TO R
WHAT IS R ?
R is a widely used data analytics tool: it is open-source, flexible, offers many packages and
has a large community. Apart from being used for analytics, R is also a programming language.
DOWNLOADING R
To download R, go to-
https://cran.r-project.org/bin/windows/base/
To download RStudio Desktop for Windows, go to-
www.rstudio.com
OR:-
3. Open the downloaded .exe file and Install R.
FOUR PANES IN R
When we open RStudio, we see the four panes. We can change the order of the windows
under RStudio preferences. We can also change their shape by either clicking the minimize
or maximize buttons on the top right of each panel, or by clicking and dragging the middle
of the borders of the windows.
The RStudio interface consists of four main panes, or windows.
Source :
Top left: text editor or script window. This is where you can save and edit collections of
commands.
Console :
Bottom left: console or command window. Here you can type any valid R command after
the > prompt followed by Enter and R will execute that command.
Environment / History :
Top right: environment & history window. The environment window contains objects (data, values,
functions) R has currently stored in its memory. The history window shows all commands that were
executed in the console.
Files / Plots / Packages / Help / Viewer :
Bottom right: files, plots, packages, help, & viewer pane. Here you can open files, view
plots, install and load packages, read man pages, and view markdown and other documents in
the viewer tab.
2. In the File tab, go to Import Dataset and select the "From Excel" option.
3. A pop-up window for Import Excel Data will appear.
4. Browse through the files and open the required Excel file.
5. Click on Import after previewing the data.
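The same import can be done from the console. RStudio's "From Excel" option relies on the readxl package, so a minimal sketch could look as follows. The helper name import_excel and the file name rm_lab.xlsx are assumptions made for illustration; only the object name rm_lab comes from this file.

```r
# Hypothetical helper: read the first sheet of an Excel workbook.
# Returns NULL when the readxl package or the file is not available.
import_excel <- function(path) {
  if (!requireNamespace("readxl", quietly = TRUE) || !file.exists(path)) {
    return(NULL)
  }
  readxl::read_excel(path)
}

rm_lab <- import_excel("rm_lab.xlsx")   # hypothetical file name

# Show the first rows only if the import succeeded.
if (!is.null(rm_lab)) print(head(rm_lab))
```

Column names that contain spaces, such as Group A, are then accessed with backticks, as in the commands used in the rest of this file.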
TO ADD VARIABLE IN R:-
a<-c(5,6,24,16,17,10,23,11,17,3,21,18,18,12,12,17,10,3,7,13,23,9,22,8)
DESCRIPTIVE STATISTICS
Data:
Group A
76
87
98
45
66
78
76
88
78
87
54
65
76
89
65
78
54
87
45
Mean 73.26315789
Standard Error 3.530023363
Median 76
Mode 76
Standard Deviation 15.38701511
Sample Variance 236.7602339
Kurtosis -0.589566068
Skewness -0.521029258
Range 53
Minimum 45
Maximum 98
Sum 1392
Count 19
Using R Studio:
Mean:
> mean(rm_lab$`Group A`)
[1] 73.26316
Median:
> median(rm_lab$`Group A`)
[1] 76
Standard Deviation:
> sd(rm_lab$`Group A`)
[1] 15.38702
Minimum:
> min(rm_lab$`Group A`)
[1] 45
Maximum:
> max(rm_lab$`Group A`)
[1] 98
Sum:
> sum(rm_lab$`Group A`)
[1] 1392
Range:
> range(rm_lab$`Group A`)
[1] 45 98
Sample Variance:
> var(rm_lab$`Group A`)
[1] 236.7602
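The commands above need the imported rm_lab sheet. As a self-contained sketch, the 19 Group A values listed under Data can be typed in directly; this also allows the Standard Error and Range rows of the Excel output to be reproduced. The names group_a, std_error and value_range are illustrative.

```r
# Group A values copied from the Data listing above.
group_a <- c(76, 87, 98, 45, 66, 78, 76, 88, 78, 87,
             54, 65, 76, 89, 65, 78, 54, 87, 45)

n <- length(group_a)

# Standard error of the mean = standard deviation / sqrt(count).
std_error <- sd(group_a) / sqrt(n)

# range() returns minimum and maximum; Excel's Range is their difference.
value_range <- diff(range(group_a))

summary_stats <- c(Mean = mean(group_a), StdError = std_error,
                   Median = median(group_a), StdDev = sd(group_a),
                   Variance = var(group_a), Range = value_range,
                   Min = min(group_a), Max = max(group_a),
                   Sum = sum(group_a), Count = n)
print(round(summary_stats, 4))
```

Note that range() in R returns the pair 45 98, whereas Excel reports the single difference 53.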
CORRELATION
To find the correlation between returns of mutual funds bought through brokers and those purchased directly
SYNTAX:-
cor.test(direct_broker$Direct,direct_broker$Broker)
Type the command in the console
Hit enter.
Pearson's product-moment correlation
SYNTAX
> t.test(rm_lab$`FULL TIME`,rm_lab$`PART TIME`)
Rules:-
1. If p value is less than alpha, reject null.
2. If the test statistic is more extreme than the critical value, reject null.
Following is the output received:-
INFERENCE
P value is greater than alpha, so do not reject null, i.e. the mean time spent by full time
and part time students does not differ significantly.
HYPOTHESIS
H0 : Mean time spent by full time students is less than or equal to that of part time students.
H1 : Mean time spent by full time students is higher than that of part time students.
SYNTAX
t.test(rm_lab$`FULL TIME`,rm_lab$`PART TIME`,alternative = "greater")
Rules:-
1. If p value is less than alpha, reject null.
2. If the test statistic is more extreme than the critical value, reject null.
Welch Two Sample t-test
INFERENCE:- Since p value is more than alpha, do not reject null, i.e. mean time spent by full time
students is less than or equal to that of part time students.
Alternative:-
Rules:-
1. If p value is less than alpha, reject null.
2. If the test statistic is more extreme than the critical value, reject null.
INFERENCE:-
Since p value is greater than alpha, do not reject null, i.e. the alternative hypothesis about the
mean time spent by full time and part time students is not supported.
Independent sample t-test
Problem: To analyse whether the time spent by full time students in studying statistics is different
from the time spent by part time students.
HYPOTHESIS
H0 = time spent by full time students in studying statistics is not less than that of part time students
H1 = time spent by full time students in studying statistics is less than that of part time students
DECISION RULE:- If p value is less than alpha, reject null, and vice versa.
One sample t test (two tailed)
H0 = µ is equal to 20
H1 = µ is not equal to 20
SYNTAX:
> t.test(a,mu=20)
DECISION RULE:-
If p value is less than alpha, reject null. If p value is greater than alpha, do not reject null.
data: a
t = -4.8471, df = 23, p-value = 6.817e-05
alternative hypothesis: true mean is not equal to 20
95 percent confidence interval:
10.78539 16.29794
sample estimates:
mean of x
13.54167
INFERENCE:- p value is less than alpha, so reject null. Therefore the mean is not equal to 20.
One sample t test (one tailed)
PROBLEM STATEMENT:- To establish that the mean work experience is greater than 20
HYPOTHESIS
H0 = µ is less than or equal to 20
H1 = µ is greater than 20
SYNTAX:-
> t.test(a,mu=20,alternative="greater")
data: a
t = -4.8471, df = 23, p-value = 1
alternative hypothesis: true mean is greater than 20
95 percent confidence interval:
11.25811 Inf
sample estimates:
mean of x
13.54167
DECISION RULE:-
If p value is less than alpha, reject null. If p value is greater than alpha, do not reject null.
INFERENCE:-
P value is greater than alpha, so do not reject null.
By default conf.level = 0.95 (95%).
To change it, syntax:
> t.test(a,mu=20,conf.level=0.99)
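The one sample tests can be run end to end with the vector a defined earlier in this file. The sketch below repeats that vector so that it runs on its own; the result names (two_tailed, one_tailed, wide_ci) are illustrative.

```r
# Vector a as entered under "TO ADD VARIABLE IN R".
a <- c(5, 6, 24, 16, 17, 10, 23, 11, 17, 3, 21, 18,
       18, 12, 12, 17, 10, 3, 7, 13, 23, 9, 22, 8)

# Two tailed: H0: mu = 20, H1: mu != 20.
two_tailed <- t.test(a, mu = 20)

# One tailed: H0: mu <= 20, H1: mu > 20.
one_tailed <- t.test(a, mu = 20, alternative = "greater")

# Same two tailed test with a 99% confidence interval instead of 95%.
wide_ci <- t.test(a, mu = 20, conf.level = 0.99)

cat("t =", two_tailed$statistic, " df =", two_tailed$parameter, "\n")
cat("two tailed p =", two_tailed$p.value, "\n")
cat("one tailed p =", one_tailed$p.value, "\n")
cat("95% CI:", two_tailed$conf.int, "\n")
cat("99% CI:", wide_ci$conf.int, "\n")
```

The t statistic and degrees of freedom are the same in all three calls; only the p value and the confidence interval change with the alternative and the confidence level.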
Adding Data:-
SYNTAX:-
> Group1=c(42,53,49,53,43,44,45,52,54)
> Group2=c(69,54,58,64,64,55,56,0,0)
> Group3=c(35,40,53,42,50,39,55,39,40)
> combinedgroup=data.frame(cbind(Group1,Group2,Group3))
> summary(combinedgroup)
> stack(combinedgroup)
To run the one way ANOVA test, type the following commands:
> stackedgroup=stack(combinedgroup)
> anovaresults=aov(values~ind,data=stackedgroup)
> summary(anovaresults)
Hypothesis Testing:
H0: μ1 = μ2 = μ3
H1: At least one of the means is different.
Following output is received:-
Inference:
Since F stat (0.189) is less than F-critical (3.402), do not reject the Null hypothesis.
Since P (0.829) is greater than alpha (0.05), do not reject the Null hypothesis.
Conclusion:
There is not enough evidence that the population means differ.
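The F stat, p value and F-critical used in the inference can be pulled out of the ANOVA result instead of being read from the printed output. A minimal sketch, using the three groups exactly as entered above; the names anova_table, f_stat, p_value and f_crit are illustrative.

```r
Group1 <- c(42, 53, 49, 53, 43, 44, 45, 52, 54)
Group2 <- c(69, 54, 58, 64, 64, 55, 56, 0, 0)
Group3 <- c(35, 40, 53, 42, 50, 39, 55, 39, 40)

# Stack the three columns into one 'values' column and one 'ind' factor.
combinedgroup <- data.frame(Group1, Group2, Group3)
stackedgroup <- stack(combinedgroup)

anovaresults <- aov(values ~ ind, data = stackedgroup)

# summary() returns a list; its first element is the ANOVA table.
anova_table <- summary(anovaresults)[[1]]
f_stat  <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]

# F-critical at alpha = 0.05 with the table's degrees of freedom.
df_between <- anova_table[["Df"]][1]
df_within  <- anova_table[["Df"]][2]
f_crit <- qf(0.95, df_between, df_within)

cat("F =", f_stat, " p =", p_value, " F crit =", f_crit, "\n")
```

R does not print F-critical in the aov summary, so qf is used here to obtain the value quoted in the inference.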
F TEST (Two tailed)
PROBLEM STATEMENT:- Determine whether or not there is a significant difference
between variances of two data sets.
Syntax
var.test(file_name$variable,file_name$variable)
HYPOTHESIS:-
H0- True ratio of variances is equal to 1
H1- True ratio of variances is not equal to 1
INFERENCE:-
P value is greater than alpha, so do not reject null, i.e.
the variances of directly purchased mutual funds and mutual funds bought through brokers do not differ significantly.
HYPOTHESIS TESTING
H0 : Variance of directly purchased mutual funds is less than or equal to that of mutual funds bought through brokers.
H1 : Variance of directly purchased mutual funds is higher than that of mutual funds bought through brokers.
DECISION RULE
If p value is less than alpha, reject null, and vice versa.
SYNTAX
var.test(direct_broker$Direct,direct_broker$Broker,alternative = "greater")
Following is the OUTPUT received:-
INFERENCE:-
P value is greater than alpha, so do not reject null, i.e.
Variance of directly purchased mutual funds is less than or equal to that of mutual funds bought through brokers.
SYNTAX
> age=rbind(c(65,76,72),c(60,40,64),c(45,52,50),c(55,65,60))
> dimnames(age)<-list(agegroup=c("a","b","c","d"),brand=c("1","2","3"))
> age
> chisq.test(age)
Hypothesis Testing:
H0: Not Associated
H1: Associated
Decision Rule:
If chi value is greater than table value reject null.
If p value is less than α then reject null.
Following output is received:
data: age
X-squared = 7.3726, df = 6, p-value = 0.2878
Since p value is greater than alpha, do not reject null. That is, there is no evidence of an association between age group
and brand preference.