Exploring, Transforming, And Summarizing Input Datasets for Building Classification Models


INSTITUTE - UIE

DEPARTMENT- ACADEMIC UNIT-2


Bachelor of Engineering (Computer Science & Engineering)
SUBJECT NAME:- IT, HW AND AI WORKSHOP
SUBJECT CODE- 24ECP-102
Prepared By: Dr. Rachit Manchanda
Exploring, Transforming, and Summarizing Input Datasets for Building Classification Models
DISCOVER . LEARN . EMPOWER
Course Objectives
S. No. Objectives
1 To develop an understanding of the building blocks of AI.
2 To create awareness of Data Science/Analytics.
3 To provide knowledge about data processing.
4 To familiarize students with AI/ML algorithms.
5 To give a brief overview of IT, HW, and AI.

Course Outcomes
CO Number | Title | Level
CO1 | Recognise the characteristics of disruptive technologies and understand building blocks of data science, artificial intelligence, and machine learning. | Remember
CO2 | Describe AI/ML algorithms and techniques to demonstrate their applications. | Understand
CO3 | Experiment with effective data visualizations, and explain how to work with data through the entire data science process. | Apply
CO4 | Analyse and evaluate solutions to address real-time problems using AI/ML for different applications. | Analyze and evaluate
CO5 | Design, formulate, and integrate as a team to propose a solution for a selected domain. | Create

Exploring, Transforming, and Summarizing Input Datasets for Building Classification Models
A Comprehensive Guide

Machine Learning
• Machine learning (ML) is a subdomain of artificial intelligence (AI) that focuses on developing
systems that learn from data or improve their performance based on it.
• Artificial intelligence is a broad term that refers to systems or machines that mimic human
intelligence.
• A crucial distinction is that, while all machine learning is AI, not all AI is machine learning.
Machine learning is the main approach used today to achieve AI.
Features of Machine Learning
• Machine Learning is the field of study that gives computers the capability to learn without being
explicitly programmed.
• It is similar to data mining, as both deal with substantial amounts of data.
• For large organizations, where branding is crucial, ML makes it easier to target a relatable customer base.
• Given a dataset, ML can detect various patterns in the data.
• Machines can learn from past data and automatically improve their performance.
• Machine learning is a data-driven technology. A large amount of data is generated by organizations
daily, enabling them to identify notable relationships and make better decisions.
Introduction to Classification Models
• Classification models are used to categorize data into predefined classes or categories.
• Common algorithms: Logistic Regression, Decision Trees, Random Forest, k-Nearest
Neighbors (KNN), etc.
• Building a classification model requires a good understanding of the dataset before
training.
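
A minimal sketch of the algorithms listed above, assuming scikit-learn is installed. The synthetic dataset from make_classification, the 75/25 split and the hyperparameters are illustrative assumptions, not part of the original slides.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class dataset (an assumption, used only for illustration).
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# One instance of each algorithm named on the slide.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "k-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
}

# Every scikit-learn classifier shares the same fit/score interface.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on held-out data
    print(f"{name}: test accuracy = {scores[name]:.2f}")
```

Because all four models expose the same fit and score methods, swapping one algorithm for another usually needs no change to the surrounding code.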

Dataset Exploration - The First Step
• Exploration is essential for understanding the dataset, identifying potential issues, and
gaining insights. Key steps include:
• Checking for missing values
• Exploring basic statistics
• Visualizing the data
• Libraries used: Pandas, Matplotlib, Seaborn
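
The first two exploration steps can be sketched with Pandas as follows. The small DataFrame and its column names (age, income, city, purchased) are hypothetical and exist only to make the example runnable.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; "purchased" plays the role of the class label.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 29, 35],
    "income": [40000, 52000, 61000, np.nan, 45000, 58000],
    "city": ["Delhi", "Mumbai", "Delhi", None, "Pune", "Mumbai"],
    "purchased": [0, 1, 1, 0, 0, 1],
})

# Step 1: count missing values per column.
missing_counts = df.isna().sum()
print(missing_counts)

# Step 2: basic statistics for the numerical columns.
print(df.describe())

# For classification, also check how balanced the classes are.
class_counts = df["purchased"].value_counts()
print(class_counts)
```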

Data Import and Initial Inspection
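
A minimal sketch of importing a dataset and inspecting it with Pandas. The CSV content is a hypothetical stand-in read from an in-memory string so the example runs without a file; with a real dataset, a file path would be passed to read_csv instead.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice use pd.read_csv("your_file.csv").
csv_text = """age,income,city,purchased
25,40000,Delhi,0
32,52000,Mumbai,1
,61000,Delhi,1
41,,Pune,0
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)    # (number of rows, number of columns)
print(df.head())   # first rows of the table
print(df.dtypes)   # data type inferred for each column
df.info()          # non-null counts and memory usage
```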

Data Cleaning and Transformation
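
A minimal sketch of common cleaning and transformation steps: removing duplicate rows, filling missing values, and encoding a categorical column. The dataset and the choice of median and mode as fill values are assumptions for illustration; other strategies may suit a real dataset better.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; the last row duplicates the one before it.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 29, 29],
    "income": [40000, 52000, 61000, np.nan, 45000, 45000],
    "city": ["Delhi", "Mumbai", "Delhi", None, "Pune", "Pune"],
    "purchased": [0, 1, 1, 0, 0, 0],
})

# Remove exact duplicate rows.
clean = df.drop_duplicates().copy()

# Fill missing numerical values with the column median.
clean["age"] = clean["age"].fillna(clean["age"].median())
clean["income"] = clean["income"].fillna(clean["income"].median())

# Fill missing categorical values with the most frequent category.
clean["city"] = clean["city"].fillna(clean["city"].mode()[0])

# One-hot encode the categorical column into 0/1 indicator columns.
encoded = pd.get_dummies(clean, columns=["city"], dtype=int)
print(encoded)
```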

Feature Engineering
• Feature engineering can improve model accuracy by creating new features or modifying
existing ones. Common techniques:
• Feature scaling (e.g., normalization, standardization)
• Creating interaction features
• Polynomial features

Summarizing Data - Descriptive Statistics
• Use descriptive statistics to summarize the dataset, for example with the Pandas describe() method.
• This provides statistics such as mean, standard deviation, min, max, and quartiles for
numerical features.
• Visualize data distribution using histograms or box plots.
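
A minimal sketch of summarizing a dataset with Pandas. The DataFrame and its columns are hypothetical. The last step applies the 1.5 x IQR rule, the same convention a box plot commonly uses to mark outliers, as one assumed way of turning the quartiles into a check.

```python
import pandas as pd

# Hypothetical toy dataset with one unusually large height value.
df = pd.DataFrame({
    "height_cm": [150, 160, 165, 170, 175, 180, 210],
    "weight_kg": [50, 55, 60, 68, 72, 80, 95],
    "label": ["A", "A", "A", "B", "B", "B", "B"],
})

# count, mean, std, min, quartiles and max for each numerical column.
summary = df.describe()
print(summary)

# Per-class means show whether a feature differs between classes.
per_class_mean = df.groupby("label")[["height_cm", "weight_kg"]].mean()
print(per_class_mean)

# Flag values further than 1.5 * IQR outside the quartiles.
q1 = df["height_cm"].quantile(0.25)
q3 = df["height_cm"].quantile(0.75)
iqr = q3 - q1
outliers = df[(df["height_cm"] < q1 - 1.5 * iqr) | (df["height_cm"] > q3 + 1.5 * iqr)]
print(outliers)
```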

Data Visualization for Classification
• Visualization helps in understanding the distribution and relationships of the data.
Examples of plots to visualize data:
• Histograms for feature distributions
• Pair plots for visualizing relationships
• Box plots for detecting outliers
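
The three plot types above can be sketched with Matplotlib and Pandas. The random data and the column names are assumptions. The pair plot uses pandas.plotting.scatter_matrix as a stand-in, since Seaborn's pairplot produces a similar grid but needs an extra dependency. Figures are saved to a temporary directory so the script also runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical dataset: two numerical features and a binary target.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature_1": rng.normal(0, 1, 100),
    "feature_2": rng.normal(5, 2, 100),
    "target": rng.integers(0, 2, 100),
})
features = ["feature_1", "feature_2"]
out_dir = tempfile.mkdtemp()

# Histograms: distribution of each feature.
hist_axes = df[features].hist(bins=15, figsize=(8, 3))
hist_path = os.path.join(out_dir, "histograms.png")
hist_axes.flat[0].figure.savefig(hist_path)

# Pair plot: pairwise relationships, points coloured by class.
pair_axes = scatter_matrix(df[features], c=df["target"], figsize=(6, 6))
pair_path = os.path.join(out_dir, "pair_plot.png")
pair_axes[0, 0].figure.savefig(pair_path)

# Box plot: one box per class, points beyond the whiskers are outlier candidates.
fig, ax = plt.subplots(figsize=(6, 3))
groups = [df.loc[df["target"] == k, "feature_1"] for k in (0, 1)]
ax.boxplot(groups)
ax.set_xticklabels(["target = 0", "target = 1"])
ax.set_ylabel("feature_1")
box_path = os.path.join(out_dir, "box_plot.png")
fig.savefig(box_path)

plt.close("all")
print("Figures written to", out_dir)
```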

Splitting Data into Training and Test Sets
• Before building a classification model, split the data into training and test sets to evaluate
model performance.
• Use train_test_split from scikit-learn:
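
A minimal sketch of the split, assuming an 80/20 ratio and a fixed random_state for reproducibility. Both values, and the toy arrays X and y, are illustrative choices rather than values given on the slide.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 samples, 2 features, two balanced classes.
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 10 + [1] * 10)

# Hold out 20% of the rows for testing; stratify keeps class proportions equal
# in both subsets, which matters for imbalanced classification problems.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```

The model is then fitted on X_train and y_train only, and X_test and y_test are used once at the end to estimate performance on unseen data.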

Summary of Key Steps
• Exploring: Load and inspect the dataset, check for missing values and basic statistics.
• Transforming: Handle missing values, encode categorical features, and scale data.
• Summarizing: Use descriptive statistics and visualization to understand data distribution
and relationships.
• Model Training: Split the data into training and test sets before training the classification
model.
• Next Steps: Apply a classification algorithm (e.g., Logistic Regression, Decision Trees,
etc.) to train the model on the processed data.
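
The key steps above can be combined in a single scikit-learn pipeline. This is a sketch under stated assumptions: the generated dataset, its column names, the rule used to create the purchased label, and the choice of Logistic Regression are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical dataset with numerical and categorical features.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 65, n).astype(float),
    "income": rng.normal(50000, 12000, n),
    "city": rng.choice(["Delhi", "Mumbai", "Pune"], n),
})
df["purchased"] = (df["income"] + 500 * df["age"] > 70000).astype(int)
df.loc[::25, "age"] = np.nan  # inject a few missing values

# Split first, so preprocessing is learned from the training data only.
X = df.drop(columns="purchased")
y = df["purchased"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Transforming: impute and scale numbers, one-hot encode categories.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric, ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# Model training: preprocessing and classifier fitted together.
model = Pipeline([
    ("preprocess", preprocess),
    ("classify", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Wrapping the transformations and the classifier in one Pipeline means the same preprocessing is applied automatically at prediction time.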

Learning Outcomes
On completion of the experiment, students will have achieved the following:
Understanding Data Preprocessing
• Students will learn the importance of cleaning, transforming, and preparing data to ensure accurate and efficient
model performance.
Proficiency in Dataset Exploration
• Learners will acquire the ability to inspect datasets, identify patterns, and detect issues such as missing values,
outliers, and inconsistencies.
Applying Data Transformation Techniques
• Participants will gain hands-on experience in applying techniques like feature scaling, encoding categorical
variables, and handling missing data.
Visualizing and Summarizing Data
• Students will understand how to summarize datasets using statistical metrics and visualize distributions and
relationships to derive insights.
Preparing Data for Classification Models
• Learners will be able to split datasets into training and test sets effectively and prepare them for classification tasks
by applying necessary transformations.

Viva Voce Questions
• What are the different types of machine learning algorithms?
• What is supervised learning?
• What is unsupervised learning?
• What are the ‘training set’ and ‘test set’ in a machine learning model?
• How much data will you allocate for your training, validation, and test sets?

THANK YOU

For queries
Email: [email protected]
