DM Assignment 2

Assignment Machine Learning Algorithm

Uploaded by

Memoona Ishfaq

COMSATS UNIVERSITY ISLAMABAD

Department of Computer Science


Assignment No. 2

Course: Data Mining (DSC306)        Total marks: 10

[CLO 2: Apply preprocessing and classification techniques to solve classification problems of
moderate complexity.]

Applying Pre-processing and Classification Techniques


Objective:
The purpose of this assignment is to apply pre-processing and classification techniques to solve
classification problems of moderate complexity. Students will gain hands-on experience with data
preparation, feature selection, model training, and evaluation.
1. Data Selection:
• Choose a dataset that presents a classification problem of moderate complexity. This could
be from sources like UCI Machine Learning Repository, Kaggle, or any other relevant
source.
• Provide a brief description of the dataset, including the number of instances, features, and
the target variable.
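
As a rough illustration of the dataset description asked for above, the sketch below counts instances, features, classes, and missing values. The breast cancer dataset bundled with scikit-learn is only a stand-in (an assumption, not a required choice), and `describe_dataset` is a hypothetical helper name.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer


def describe_dataset(df: pd.DataFrame, target: str) -> dict:
    """Summarize instance count, feature count, class balance, and missing values."""
    return {
        "n_instances": len(df),
        "n_features": df.shape[1] - 1,  # every column except the target
        "target": target,
        "class_counts": df[target].value_counts().to_dict(),
        "n_missing": int(df.isna().sum().sum()),
    }


# Stand-in dataset (assumption): replace with the dataset you actually chose.
data = load_breast_cancer(as_frame=True)
df = data.frame  # feature columns plus a "target" column
summary = describe_dataset(df, target="target")
print(summary)
```

The same helper should work on any DataFrame loaded with `pd.read_csv`, provided the target column name is passed in.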
2. Data Pre-processing:
• Data Cleaning: Handle missing values, remove duplicates, and correct inconsistencies in
the dataset.
• Data Transformation: Normalize or standardize the data as necessary. Convert categorical
variables into numerical format using techniques such as one-hot encoding or label
encoding.
• Feature Selection: Identify and select relevant features that contribute to the classification
task. You can use techniques like correlation analysis, recursive feature elimination, or
feature importance from tree-based models.
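
The three pre-processing steps above could be sketched roughly as follows. The toy table, its column names, and the choice of median imputation are all assumptions made for illustration. The feature ranking uses scikit-learn's bundled wine dataset as a stand-in, because a six-row table is too small for meaningful importances.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one missing value, one duplicate row, one inconsistent spelling.
raw = pd.DataFrame({
    "age":    [25, 32, np.nan, 32, 47, 51],
    "income": [30000, 52000, 41000, 52000, 88000, 61000],
    "city":   ["Lahore", "Karachi", "lahore ", "Karachi", "Islamabad", "Lahore"],
    "label":  [0, 1, 0, 1, 1, 0],
})

# Data cleaning: normalize category spellings, then drop exact duplicate rows.
clean = raw.copy()
clean["city"] = clean["city"].str.strip().str.title()
clean = clean.drop_duplicates().reset_index(drop=True)

X = clean.drop(columns="label")
y = clean["label"]

# Data transformation: impute and standardize numeric columns, one-hot encode categoricals.
preprocess = ColumnTransformer(
    [
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
        ]), ["age", "income"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ],
    sparse_threshold=0,  # always return a dense array
)
X_ready = preprocess.fit_transform(X)
print(X_ready.shape)

# Feature selection: rank features by importance from a tree-based model.
wine = load_wine(as_frame=True)
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(wine.data, wine.target)
importances = pd.Series(forest.feature_importances_, index=wine.data.columns)
top_features = importances.sort_values(ascending=False).head(5).index.tolist()
print(top_features)
```

In a real notebook the transformer would normally be fitted on the training split only, so that statistics from the test set do not leak into the scaling and imputation.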
3. Model Selection and Training:
• Apply at least two classification models chosen from Decision Trees, Random Forest,
Support Vector Machines, or Neural Networks.
• Split your dataset into training and testing sets (e.g., 80/20 split).
• Train the selected models on the training set.
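
A minimal sketch of the split-and-train step, assuming the breast cancer dataset bundled with scikit-learn as a placeholder. The two models, the `max_depth` value, and the random seed are arbitrary illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# 80/20 split; stratify keeps the class proportions the same in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=42),
    # SVMs are sensitive to feature scale, so scaling is part of the pipeline.
    "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)  # train on the training set only
    test_accuracy[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy = {test_accuracy[name]:.3f}")
```

Putting the scaler inside the pipeline means it is fitted on the training data alone, which avoids leaking test-set statistics into training.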
4. Model Evaluation:
• Evaluate the performance of your models using appropriate metrics such as accuracy,
precision, recall, F1-score, and ROC-AUC.
• Create confusion matrices for each model to visualize performance.
• Discuss the strengths and weaknesses of each model based on the evaluation metrics.
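
The metrics listed above might be computed along these lines. This is a sketch on a stand-in dataset with a Random Forest chosen arbitrarily; the positive class is label 1, which is scikit-learn's default for binary precision, recall, and F1.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)               # hard class labels
y_score = model.predict_proba(X_test)[:, 1]  # probability of class 1, needed for ROC-AUC

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_score),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

For the visual version, `ConfusionMatrixDisplay.from_predictions(y_test, y_pred)` from `sklearn.metrics` draws the same matrix as a plot when matplotlib is installed.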
5. Hyperparameter Tuning:
• For one of the models, perform hyperparameter tuning using techniques like Grid Search
or Random Search to optimize performance.
• Report the best parameters and the resulting performance metrics.
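
One possible shape for the tuning step, using Grid Search over an SVM pipeline. The parameter grid, the F1 scoring choice, and the 5-fold setting are assumptions for illustration, not prescribed values.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# Step-name prefixes ("svc__") route each parameter to the right pipeline step.
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01, 0.001],
}

# 5-fold cross-validation on the training set only; the test set stays untouched.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print(f"best cross-validated F1: {search.best_score_:.3f}")

# With the default refit=True, predict() uses the best model refitted on all training data.
print(classification_report(y_test, search.predict(X_test)))
```

`RandomizedSearchCV` from the same module takes a similar form for Random Search, sampling a fixed number of combinations (`n_iter`) instead of trying every one.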
6. Conclusion:
• Summarize your findings, including which model performed best and why.
• Discuss any challenges faced during the pre-processing and modeling phases and how you
overcame them.

Deliverables:
• A well-documented Jupyter Notebook containing:
    • Code for each step of the assignment.
    • Visualizations where applicable (e.g., plots for data distribution, confusion matrices).
    • Comments explaining your thought process and decisions made throughout the
      assignment.
• (Optional) A written report (2-3 pages) summarizing your approach, findings, and conclusions.

Evaluation Criteria:

Your assignment will be evaluated based on the following criteria:


1. Dataset Selection and Description (1 point):
• Appropriateness of the chosen dataset for a moderate complexity classification problem.
• Clarity and completeness of the dataset description.
2. Data Pre-processing (1 point):
• Effectiveness of data cleaning methods applied.
• Appropriateness of data transformation techniques used.
• Justification for feature selection methods and the relevance of selected features.
3. Model Selection and Training (3 points):
• Justification for the choice of classification algorithms.
• Correct implementation of data splitting and model training.
4. Model Evaluation (3 points):
• Use of appropriate evaluation metrics and clarity in presenting results.
• Quality of confusion matrices and analysis of model performance.
• Depth of discussion regarding the strengths and weaknesses of each model.
5. Hyperparameter Tuning (2 points):
• Effectiveness of the hyperparameter tuning process.
• Clarity in reporting the best parameters and their impact on model performance.
6. Conclusion and Reporting:
• Clarity and depth of the summary of findings.
• Insightfulness in discussing challenges faced and solutions implemented.
• Overall organization and professionalism of the written report and code documentation.
