Knime Project Report

The document is a lab manual report for a project that aims to predict diabetes using machine learning models. It includes: 1) An overview of the diabetes prediction problem using data mining techniques and the dataset containing 768 patients. 2) The analysis outlines the hardware and software requirements. 3) The design section describes the data preprocessing steps and models tested - KNN, random forest, decision tree. 4) The results show the decision tree model provided the best accuracy of 78% for predicting diabetes, which could help medical professionals.

Uploaded by

Ansh Rohatgi
Business Intelligence and Data Visualization Lab Manual
CSL 232

Knime Project Report

Faculty name: Dr. Poonam Chaudhary

Student name: Ansh and Rishabh

Roll No.: 20csu169 & 20csu373

Semester: 5th

Group: DS B

Department of Computer Science and Engineering


The NorthCap University, Gurugram- 122001, India
Session 2022-23

BIDV Lab Manual (CSL 232) | 1


2022-23

Table of Contents

1. Project Description
2. Problem Statement
3. Analysis
   3.1 Hardware Requirements
   3.2 Software Requirements
4. Design
5. Implementation and Testing (stage/module wise)
6. Output (Screenshots)
7. Conclusion and Future Scope


1. Project Description
Diabetes is a common, chronic disease. Predicting diabetes at an early stage
can lead to improved treatment, and data mining techniques are widely used for
such early prediction. In this report, diabetes is predicted using significant
attributes, and the relationships among the attributes are also characterized.
Significant attributes were selected via principal component analysis. Our
findings indicate a strong association of diabetes with body mass index (BMI)
and with glucose level, which was extracted via the Apriori method. K-nearest
neighbours (KNN), random forest (RF) and K-means clustering techniques were
implemented for the prediction of diabetes. The Decision Tree model provided
the best accuracy of 75.7%, and may be useful to assist medical professionals
with treatment decisions.

About Dataset

The dataset contains 768 rows and 9 columns, including Glucose, Insulin,
Pregnancies, BMI and Outcome. Given these details, we have to predict
whether the patient is diabetic or not.
2. Problem Statement
Predicting whether a person is diabetic or not using supervised learning
models such as KNN and Decision Tree, and selecting the one with the best accuracy.
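
As a rough illustration of how a supervised model such as KNN assigns a class, the sketch below takes a majority vote among the k nearest training points. The function name `knn_predict`, the value k = 3 and the two-feature sample points are assumptions made for illustration; they are not taken from the project workflow or the dataset.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbours."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Count the labels of the k closest points and return the most frequent one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up, already normalized feature pairs (not real patient records).
train_X = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.8), (0.85, 0.7)]
train_y = [0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (0.15, 0.15)))
print(knn_predict(train_X, train_y, (0.9, 0.9)))
```

Because KNN relies on distances, features on very different scales would dominate the vote, which is one reason normalization matters for this kind of model.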

3. Analysis

3.1. Hardware Requirements


A 64-bit operating system with a minimum of 32 GB of RAM and 8 CPU cores

3.2. Software Requirements


KNIME Analytics Platform

4. Design
The following steps were taken to get the best model accuracy:

• Importing the Excel dataset
• Removing unnecessary columns
• Removing duplicate rows
• Normalizing the dataset
• Splitting the data into train and test data
• Using a model learner
• Model prediction
• Checking model accuracy
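
The two cleaning steps in the list above (removing unnecessary columns and removing duplicate rows) can be sketched in plain Python as follows. This is only an illustration of the idea, not the KNIME nodes themselves; the helper names, the `Notes` column and the sample values are hypothetical.

```python
def drop_columns(rows, unwanted):
    """Return copies of the rows without the unwanted columns."""
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]


def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Made-up rows for illustration; "Notes" is a hypothetical unnecessary column.
rows = [
    {"Glucose": 120, "BMI": 30.0, "Notes": "a", "Outcome": 1},
    {"Glucose": 90, "BMI": 24.5, "Notes": "b", "Outcome": 0},
    {"Glucose": 120, "BMI": 30.0, "Notes": "c", "Outcome": 1},
]

cleaned = drop_duplicates(drop_columns(rows, {"Notes"}))
print(len(cleaned))  # 2
```

Dropping the column first matters here: the first and third rows only become duplicates once the differing `Notes` values are gone.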
5. Implementation and Testing (stage/module wise)

a) Excel Reader
Reading the Excel file with this node.

b) Column Filter
Removing unnecessary columns

c) Normalizer
Normalizing the data using min-max normalization
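
Min-max normalization rescales each column so that its smallest value maps to 0 and its largest to 1. A minimal sketch of the calculation is shown below; the function name and the sample glucose values are illustrative assumptions, and the target range [0, 1] is assumed as a typical default.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so that min(values) -> new_min and max(values) -> new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every value to new_min.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


# Made-up glucose readings for illustration.
glucose = [80, 100, 120, 180]
print(min_max_normalize(glucose))
```

Each column is normalized independently, so a feature with a wide range (such as Insulin) ends up on the same scale as one with a narrow range (such as BMI).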

d) Partitioning
Dividing the dataset into two parts: 80% training data and 20% test data
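
The 80/20 split can be sketched as a shuffle followed by a cut. The function name, the fixed seed and the rounding-down of the training size are assumptions made for this illustration; the KNIME node may sample or round differently.

```python
import random


def partition(rows, train_fraction=0.8, seed=42):
    """Shuffle the rows reproducibly and split them into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * train_fraction)  # training size, rounded down
    return shuffled[:cut], shuffled[cut:]


# Row indices stand in for the 768 records of the dataset.
train, test = partition(range(768))
print(len(train), len(test))  # 614 154
```

Shuffling before cutting avoids a split that simply follows the order of the file, and the test part is kept aside so that accuracy is measured on rows the model has not seen.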

e) Logistic Learner (Regression)

Training the model on the training dataset with the logistic regression learner.
The Outcome column is taken as the target variable.
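
For orientation, a trained logistic model turns a weighted sum of the input features into a probability and then applies a threshold to obtain a class. The sketch below shows only this prediction step; the weights, bias and 0.5 threshold are hypothetical values chosen for illustration, not coefficients learned in this project.

```python
import math


def predict_proba(features, weights, bias):
    """Return the logistic (sigmoid) probability of the positive class."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def predict_label(features, weights, bias, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical coefficients for two normalized features.
weights = [2.0, 1.0]
bias = -1.5

print(predict_label([0.9, 0.6], weights, bias))
print(predict_label([0.1, 0.2], weights, bias))
```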

f) Decision Tree Learner (Regression)

Applying the model to the test data.

g) Numeric Scorer (applied to both models)

Finding the accuracy of the model
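
Accuracy here means the share of test rows whose predicted class matches the actual class. A minimal sketch of that calculation, with a hypothetical function name and made-up labels, is:

```python
def accuracy(actual, predicted):
    """Return the fraction of predictions that match the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one label is required")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Made-up labels: 4 of the 5 predictions are correct.
actual = [1, 0, 1, 1, 0]
predicted = [1, 0, 0, 1, 0]
print(accuracy(actual, predicted))  # 0.8
```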
6. Output (Screenshots)
File Table

Normalized table

Partitioning

Test data

Statistics:

Random Forest Learner



7. Conclusion

First, we applied both techniques (Logistic Regression and Decision Tree)
to our dataset without normalization. The accuracy was:

After normalization, the accuracy changed to:

• Logistic Learner: 76%
• Decision Tree: 78%

These accuracy scores show that the Decision Tree performs better than the
Logistic Learner on this dataset.
