Knime Project Report


Uploaded by

Ansh Rohatgi


Business Intelligence and Data Visualization Lab

Manual
CSL 232

Knime Project Report

Faculty name: Dr. Poonam Chaudhary

Student name: Ansh and Rishabh

Roll No.: 20csu169 & 20csu373

Semester: 5th

Group: DS B

Department of Computer Science and Engineering

The NorthCap University, Gurugram- 122001, India
Session 2022-23

BIDV Lab Manual (CSL 232) | 1

2022-23

Table of Contents

1. Project Description
2. Problem Statement
3. Analysis
   3.1 Hardware Requirements
   3.2 Software Requirements
4. Design
5. Implementation and Testing (stage/module wise)
6. Output (Screenshots)
7. Conclusion and Future Scope

1. Project Description
Diabetes is a common chronic disease, and predicting it at an early stage can
lead to improved treatment. Data mining techniques are widely used to predict
diseases early. In this report, diabetes is predicted using significant
attributes, and the relationships among these attributes are also
characterized. Significant attributes were selected using principal component
analysis (PCA). Our findings indicate a strong association of diabetes with
body mass index (BMI) and with glucose level; this association was extracted
using the Apriori method. K-nearest neighbours (KNN), random forest (RF) and
K-means clustering techniques were implemented for the prediction of diabetes.
The Decision Tree model provided the best accuracy, 75.7%, and may be useful
in assisting medical professionals with treatment decisions.

About Dataset

The dataset contains 768 rows and 9 columns, including Glucose, Insulin,
Pregnancies, BMI and Outcome. Given these attributes, the task is to predict
whether a patient is diabetic or not.

2. Problem Statement
Predict whether a person is diabetic or not using supervised learning models
such as KNN and Decision Tree, and identify the model that gives the best
accuracy.

3. Analysis

3.1. Hardware Requirements

A 64-bit operating system with at least 32 GB of RAM and 8 CPU cores.

3.2. Software Requirements

KNIME Analytics Platform

4. Design
The following steps were taken to get the best model accuracy:

• Importing the Excel dataset
• Removing unnecessary columns
• Removing duplicate rows
• Normalizing the dataset
• Splitting the data into training and test sets
• Training the models with learner nodes
• Predicting on the test data
• Checking model accuracy
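
The column-filter and duplicate-removal steps above can be sketched in plain Python as follows. This only illustrates what those two steps do and is not the KNIME workflow itself: the function name `filter_and_deduplicate` is hypothetical, the rows are dictionaries of made-up toy values, and the report does not say which columns were removed.

```python
from typing import Dict, List, Sequence

Row = Dict[str, float]


def filter_and_deduplicate(rows: Sequence[Row], keep: Sequence[str]) -> List[Row]:
    """Keep only the `keep` columns, then drop exact duplicate rows.

    The first occurrence of each duplicate is kept and row order is preserved.
    """
    seen = set()
    result: List[Row] = []
    for row in rows:
        # Column filter: project the row onto the columns we keep.
        projected = {col: row[col] for col in keep}
        # Duplicate filter: the values in column order form a hashable key.
        key = tuple(projected[col] for col in keep)
        if key not in seen:
            seen.add(key)
            result.append(projected)
    return result


# Toy rows (made-up values): the first two differ only in the "Id" column.
rows = [
    {"Id": 1, "Glucose": 120, "BMI": 30.0, "Outcome": 1},
    {"Id": 2, "Glucose": 120, "BMI": 30.0, "Outcome": 1},
    {"Id": 3, "Glucose": 90, "BMI": 25.0, "Outcome": 0},
]
print(filter_and_deduplicate(rows, ["Glucose", "BMI", "Outcome"]))
```

Because the column filter runs first, rows that differed only in a removed column (here the hypothetical `Id`) become duplicates and are collapsed; removing duplicates before filtering columns would keep both rows.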

5. Implementation and Testing (stage/module wise)

a) Excel Reader
Reading the Excel file containing the dataset.

b) Column Filter
Removing unnecessary columns

c) Normalizer
Normalizing the data using min-max normalization
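
Min-max normalization rescales each numeric column x to x' = (x - min) / (max - min), so every value falls in [0, 1]. A minimal Python sketch, assuming a target range of [0, 1] and that the Outcome label is left unscaled; the function name `min_max_normalize` is hypothetical, and this is not KNIME's Normalizer implementation.

```python
from typing import Dict, List, Sequence

Row = Dict[str, float]


def min_max_normalize(rows: Sequence[Row], columns: Sequence[str]) -> List[Row]:
    """Rescale the given columns to [0, 1]; other columns are copied unchanged."""
    lows = {c: min(r[c] for r in rows) for c in columns}
    highs = {c: max(r[c] for r in rows) for c in columns}
    result: List[Row] = []
    for r in rows:
        scaled = dict(r)
        for c in columns:
            span = highs[c] - lows[c]
            # A constant column has span 0; map it to 0.0 instead of dividing by zero.
            scaled[c] = (r[c] - lows[c]) / span if span else 0.0
        result.append(scaled)
    return result


# Toy rows (made-up values); Outcome is the label and is not rescaled.
rows = [
    {"Glucose": 100, "BMI": 20.0, "Outcome": 1},
    {"Glucose": 150, "BMI": 30.0, "Outcome": 0},
    {"Glucose": 200, "BMI": 40.0, "Outcome": 1},
]
print(min_max_normalize(rows, ["Glucose", "BMI"]))
```

In the workflow described here, normalization runs before partitioning, so the minimum and maximum are computed over all rows, including those that later form the test set. A common alternative is to compute them on the training partition only and apply the same scaling to the test partition, which keeps test data out of the preprocessing.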

d) Partitioning
Dividing the dataset into two parts: 80% for training and 20% for testing.
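
A random 80/20 split can be sketched as: shuffle the rows with a fixed seed, then cut the list at 80% of its length. The function name `partition`, the seed value and the rounding rule are assumptions for illustration; for 768 rows this sketch yields 614 training rows and 154 test rows, and KNIME's Partitioning node may round or sample differently.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def partition(
    rows: Sequence[T], train_fraction: float = 0.8, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle the rows with a fixed seed and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Row indices stand in for the 768 patient records.
train, test = partition(list(range(768)))
print(len(train), len(test))
```

Fixing the seed makes the split reproducible, so both learners are trained and scored on the same rows.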

e) Logistic Learner (Regression)

Applying logistic regression to the training data to train the model, with
Outcome (diabetic or not) as the target variable.
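
The logistic regression step can be illustrated with a from-scratch sketch that fits the weights by batch gradient descent on the mean log-loss. This swaps in a deliberately simple optimizer: KNIME's logistic regression learner uses its own solver and settings, which are not reproduced here. The learning rate `lr`, the number of `epochs`, the 0.5 decision threshold and all function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.5, epochs: int = 2000
) -> Tuple[List[float], float]:
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_logistic(
    w: Sequence[float], b: float, X: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[int]:
    """Predict class 1 when the estimated probability reaches the threshold."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold else 0
        for xi in X
    ]


# Toy, already-normalized single-feature data (made-up values).
X = [[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
print(predict_logistic(w, b, X))
```

With min-max-normalized inputs, all features share the [0, 1] scale, which tends to make a single fixed learning rate like this one work reasonably for every weight.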

f) Decision Tree Learner (Regression)

Training a decision tree on the training data and applying the model to the
test data.
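
A decision tree learner repeatedly splits the training rows on the feature and threshold that most reduce class impurity. The sketch below grows a small classification tree using the Gini index. The report does not give the Decision Tree Learner's settings, so the quality measure, `max_depth` and `min_samples` are assumptions, and KNIME's node has its own split and pruning options that this does not reproduce. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple, Union

Tree = Union[int, Dict[str, object]]


def gini(labels: Sequence[int]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y) -> Optional[Tuple[float, int, float]]:
    """Return (weighted Gini, feature index, threshold) of the best binary split."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold halfway between neighbours
            left = [yi for xi, yi in zip(X, y) if xi[f] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, max_depth: int = 3, min_samples: int = 2) -> Tree:
    """Grow a classification tree recursively; leaves hold the majority label."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(y) < min_samples or gini(y) == 0.0:
        return majority
    split = best_split(X, y)
    if split is None or split[0] >= gini(y):  # stop if no split reduces impurity
        return majority
    _, f, t = split
    left = [i for i, xi in enumerate(X) if xi[f] <= t]
    right = [i for i, xi in enumerate(X) if xi[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1, min_samples),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1, min_samples),
    }


def predict_tree(tree: Tree, x: Sequence[float]) -> int:
    """Follow the split tests from the root down to a leaf."""
    node = tree
    while isinstance(node, dict):
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node


# Toy two-feature data (made-up values); feature 0 separates the classes.
X = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.2], [0.7, 0.3], [0.8, 0.7], [0.9, 0.1]]
y = [0, 0, 0, 1, 1, 1]
tree = build_tree(X, y)
print([predict_tree(tree, x) for x in X])
```

Limiting depth is one simple guard against overfitting a small dataset; the learned thresholds also show which attributes drive the prediction, which is part of why trees are easy to explain to non-specialists.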

g) Numeric Scorer (applied to both models)

Finding the accuracy of each model on the test data.
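
Accuracy is the fraction of test rows whose predicted class matches the true class. A minimal sketch that also counts the confusion-matrix cells, assuming 1 marks a diabetic patient; the function names are hypothetical. Accuracy is a classification measure: as far as we know, KNIME's Numeric Scorer node reports regression error measures (such as R² and mean absolute error), while its Scorer node reports accuracy from a confusion matrix.

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that equal the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives, treating 1 as the positive class."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for t, p in zip(y_true, y_pred):
        if p == 1:
            counts["TP" if t == 1 else "FP"] += 1
        else:
            counts["TN" if t == 0 else "FN"] += 1
    return counts


# Toy labels (made-up values), not results from this project.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(accuracy(y_true, y_pred), confusion_counts(y_true, y_pred))
```

Here accuracy equals (TP + TN) / total = (3 + 3) / 8 = 0.75. The confusion counts matter for a medical use case because a false negative (a missed diabetic patient) is usually costlier than a false positive, something a single accuracy figure does not show.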

6. Output (Screenshots)
File Table

Normalized table

Partitioning

Test data

Statistics:

Random Forest Learner

7. Conclusion

First, we applied both techniques (Logistic Regression and Decision Tree) to
our dataset without normalization. The accuracy was:

After normalization, the accuracy changed to:

Logistic Learner: 76%
Decision Tree: 78%

After normalization, the Decision Tree outperformed the Logistic Learner by two
percentage points (78% vs. 76%).
