Exploring, Transforming, And Summarizing Input Datasets for Building Classification Models

INSTITUTE - UIE

DEPARTMENT- ACADEMIC UNIT-2

Bachelor of Engineering (Computer Science & Engineering)
SUBJECT NAME: IT, HW AND AI WORKSHOP
SUBJECT CODE: 24ECP-102
Prepared By: Dr. Rachit Manchanda
Exploring, Transforming, and Summarizing Input Datasets for Building Classification Models
DISCOVER . LEARN . EMPOWER
Course Objectives
S. No. Objectives

1 To develop an understanding of the building blocks of AI.

2 To build awareness of Data Science/Analytics.

3 To provide knowledge about data processing.

4 To build familiarity with AI/ML algorithms.

5 To give a brief overview of IT, HW, and AI.

Course Outcomes
CO Number: Title (Level)

CO1: Recognise the characteristics of disruptive technologies and understand the building blocks of data science, artificial intelligence, and machine learning. (Level: Remember)
CO2: Describe AI/ML algorithms and techniques to demonstrate their applications. (Level: Understand)
CO3: Experiment with effective data visualizations, and explain how to work with data through the entire data science process. (Level: Apply)
CO4: Analyse and evaluate solutions to address real-time problems using AI/ML for different applications. (Level: Analyze and evaluate)
CO5: Design, formulate, and work in a team to propose a solution for the selected domain. (Level: Create)

Exploring, Transforming, and Summarizing Input Datasets for Building Classification Models
A Comprehensive Guide

Machine Learning
• Machine learning (ML) is a subdomain of artificial intelligence (AI) that focuses on developing
systems that learn from data and improve their performance with it.
• Artificial intelligence is a broad term that refers to systems or machines that mimic human
intelligence.
• A crucial distinction is that, while all machine learning is AI, not all AI is machine learning.
Machine learning is one of the main ways AI is achieved in practice.
Features of Machine Learning
• Machine Learning is the field of study that gives computers the capability to learn without being
explicitly programmed.
• It is similar to data mining, as both deal with substantial amounts of data.
• For large organizations, where branding is crucial, ML makes it easier to target a relevant customer base.
• Given a dataset, ML can detect various patterns in the data.
• Machines can learn from past data and automatically improve their performance.
• Machine learning is a data-driven technology. Organizations generate large amounts of data
daily, and ML helps them identify notable relationships in that data and make better decisions.
Introduction to Classification Models
• Classification models are used to categorize data into predefined classes or categories.
• Common algorithms: Logistic Regression, Decision Trees, Random Forest, k-Nearest
Neighbors (KNN), etc.
• Building a classification model requires a good understanding of the dataset before
training.

Dataset Exploration - The First Step
• Exploration is essential for understanding the dataset, identifying potential issues, and
gaining insights.
Key steps include:
• Checking for missing values
• Exploring basic statistics
• Visualizing the data
• Libraries used: Pandas, Matplotlib, Seaborn
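
The exploration steps above can be sketched with Pandas roughly as follows. This is a minimal illustration, not the code from the original slides: the toy DataFrame and its column names (age, income, purchased) are assumptions made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; the column names are illustrative only.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 29],
    "income": [40000, 52000, 61000, np.nan, 45000],
    "purchased": ["no", "yes", "yes", "no", "no"],
})

# Checking for missing values: count of NaN entries in each column.
missing_counts = df.isna().sum()
print(missing_counts)

# Exploring basic statistics: mean of each numeric column (NaN is skipped).
numeric_means = df.mean(numeric_only=True)
print(numeric_means)

# How many rows fall into each class of the (assumed) target column.
class_counts = df["purchased"].value_counts()
print(class_counts)
```

Checking the class counts early is useful for classification because a heavily imbalanced target affects how the data should later be split and evaluated.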

Data Import and Initial Inspection
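
A minimal sketch of importing a dataset and taking a first look at it, assuming Pandas is used. The inline CSV text and its column names are hypothetical stand-ins for a real file, so that the example runs on its own; with a real dataset, a file path would be passed to pd.read_csv instead.

```python
import io

import pandas as pd

# Inline CSV text stands in for a real file so the sketch is self-contained.
csv_text = """sepal_length,sepal_width,species
5.1,3.5,setosa
4.9,3.0,setosa
6.3,3.3,virginica
5.8,2.7,virginica
"""

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows of the table
print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # data type of each column
df.info()          # non-null counts and memory usage per column
```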

Data Cleaning and Transformation
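
One possible cleaning and transformation pass is sketched below, assuming Pandas. The dataset, the column names, and the choice of median/mode imputation are illustrative assumptions, not steps taken from the original slides; other strategies (dropping rows, model-based imputation) may suit a given dataset better.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset with one missing numeric and one missing categorical value.
df = pd.DataFrame({
    "age": [25.0, np.nan, 41.0, 29.0],
    "city": ["Delhi", "Mumbai", None, "Delhi"],
    "purchased": ["no", "yes", "yes", "no"],
})

# Numeric column: fill missing values with the column median.
df["age"] = df["age"].fillna(df["age"].median())

# Categorical column: fill missing values with the most frequent category.
df["city"] = df["city"].fillna(df["city"].mode()[0])

# Binary target: map the text labels to 0/1.
df["purchased"] = df["purchased"].map({"no": 0, "yes": 1})

# Categorical feature: one-hot encode into indicator columns.
df_encoded = pd.get_dummies(df, columns=["city"])
print(df_encoded)
```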

Feature Engineering
• Feature engineering can improve model accuracy by creating new features or modifying
existing ones.
Common techniques:
• Feature scaling (e.g., normalization, standardization)
• Creating interaction features
• Polynomial features
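
The techniques listed above can be sketched with scikit-learn's preprocessing module. This is a small illustration on a made-up 3x2 array, assuming scikit-learn is available; the variable names are arbitrary.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures, StandardScaler

# Made-up feature matrix: 3 samples, 2 features (x0, x1).
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Normalization: rescale each column to the range [0, 1].
X_norm = MinMaxScaler().fit_transform(X)

# Standardization: rescale each column to zero mean and unit variance.
X_std = StandardScaler().fit_transform(X)

# Interaction features: keeps x0, x1 and adds the product x0 * x1.
interactions = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_inter = interactions.fit_transform(X)

# Polynomial features of degree 2: x0, x1, x0^2, x0 * x1, x1^2.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

print(X_norm)
print(X_std)
print(X_inter.shape, X_poly.shape)
```

In a real workflow the scalers would be fitted on the training split only and then applied to the test split, so that no information from the test data leaks into training.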

Summarizing Data - Descriptive Statistics
• Use descriptive statistics to summarize the dataset.

• A summary call such as the Pandas DataFrame.describe() method provides statistics such as
mean, standard deviation, min, max, and quartiles for numerical features.
• Visualize data distribution using histograms or box plots.
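
As a small illustration of the summary statistics described above, the sketch below calls the Pandas describe() method on a made-up DataFrame. The column names and values are assumptions for the example only.

```python
import pandas as pd

# Made-up numeric dataset.
df = pd.DataFrame({
    "height_cm": [150, 160, 170, 180, 190],
    "weight_kg": [50, 60, 65, 80, 95],
})

# describe() reports count, mean, std, min, quartiles (25%, 50%, 75%) and max
# for every numeric column.
summary = df.describe()
print(summary)
```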

Data Visualization for Classification
• Visualization helps in understanding the distribution and relationships of the data.
Examples of plots to visualize data:
• Histograms for feature distributions
• Pair plots for visualizing relationships
• Box plots for detecting outliers
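
The three plot types above can be sketched as follows. To keep the example dependent on Pandas and Matplotlib only, the pair plot uses pandas.plotting.scatter_matrix; Seaborn's pairplot function is a common alternative. The synthetic data and the column names are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data: two numeric features drawn from normal distributions.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature_a": rng.normal(0, 1, 100),
    "feature_b": rng.normal(5, 2, 100),
})

# Histograms: one panel per feature, showing its distribution.
hist_axes = df.hist(bins=20, figsize=(8, 3))

# Pair plot: scatter plot for every pair of features, histograms on the diagonal.
pair_axes = pd.plotting.scatter_matrix(df, figsize=(6, 6))

# Box plots: points drawn beyond the whiskers are candidate outliers.
fig, box_ax = plt.subplots()
df.boxplot(ax=box_ax)

# In an interactive session, plt.show() would display the three figures.
```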

Splitting Data into Training and Test Sets
• Before building a classification model, split the data into training and test sets to evaluate
model performance.
• Use train_test_split from scikit-learn:
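
A minimal sketch of such a split is shown below. The arrays, the 80/20 ratio, the fixed random_state, and the use of stratify are illustrative assumptions, not values taken from the original slides.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 10 samples with 2 features each, and a balanced binary label.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# Hold out 20% of the rows for testing. stratify=y keeps the class ratio
# the same in both splits; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```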

Summary of Key Steps
• Exploring: Load and inspect the dataset, check for missing values and basic statistics.
• Transforming: Handle missing values, encode categorical features, and scale data.
• Summarizing: Use descriptive statistics and visualization to understand data distribution
and relationships.
• Model Training: Split the data into training and test sets before training the classification
model.
• Next Steps: Apply a classification algorithm (e.g., Logistic Regression, Decision Trees,
etc.) to train the model on the processed data.
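
The next step named above might look like the following sketch, which trains a Logistic Regression classifier on scikit-learn's bundled Iris dataset. The dataset, the scaling step, and the hyperparameters (max_iter, test_size, random_state) are assumptions chosen for illustration rather than anything prescribed by the slides.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small, well-known classification dataset (150 samples, 3 classes).
X, y = load_iris(return_X_y=True)

# Split first, so the scaler is fitted on the training data only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Pipeline: standardize the features, then fit the classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Evaluate on the held-out test set.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
```

Wrapping the scaler and the classifier in one pipeline means the same transformation is applied automatically at prediction time.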

Learning Outcomes
On completion of the experiment, students will have achieved the following:
Understanding Data Preprocessing
• Students will learn the importance of cleaning, transforming, and preparing data to ensure accurate and efficient
model performance.
Proficiency in Dataset Exploration
• Learners will acquire the ability to inspect datasets, identify patterns, and detect issues such as missing values,
outliers, and inconsistencies.
Applying Data Transformation Techniques
• Participants will gain hands-on experience in applying techniques like feature scaling, encoding categorical
variables, and handling missing data.
Visualizing and Summarizing Data
• Students will understand how to summarize datasets using statistical metrics and visualize distributions and
relationships to derive insights.
Preparing Data for Classification Models
• Learners will be able to split datasets into training and test sets effectively and prepare them for classification tasks
by applying necessary transformations.

Viva Voce Questions
• What are the different types of machine learning algorithms?
• What is supervised learning?
• What is unsupervised learning?
• What are the ‘training set’ and the ‘test set’ in a machine learning model?
• How much data will you allocate to your training, validation, and test sets?

THANK YOU

For queries
Email: [email protected]
