DataScience - ML DEEP LEARNING - LPEI - 120 Days

The document outlines a comprehensive 120-day Data Science training program, covering topics from Python programming and exploratory data analysis to advanced machine learning and deep learning techniques. It includes weekly themes, practical assignments, and capstone projects to reinforce learning. The program also emphasizes model deployment and interview preparation, ensuring participants are industry-ready.

Uploaded by

engineeringbaby4
Copyright
© All Rights Reserved

Data Science LPEI – 120 DAYS

Week 1: Introduction to Data Science and Python Programming


● Overview of Data Science and Data Science Workflow.
● Python Basics: Variables, Data Types, and Basic Operations.
● Python Control Structures: Conditionals and Loops.
● Python Functions, Lambda Functions, and Error Handling.
● Working with Python Modules and Packages.
● Introduction to NumPy for Numerical Data.
● Introduction to Pandas for Data Manipulation.

Week 2: Exploratory Data Analysis (EDA) & Visualization


● Introduction to EDA and Descriptive Statistics.
● Data Visualization Basics using Matplotlib.
● Advanced Visualization with Seaborn: Pair Plots, Heatmaps.
● Analyzing and Visualizing Categorical Data.
● Analyzing Relationships between Variables.
● Handling Missing Data in EDA.
● Outlier Detection and Treatment.

Week 3: Data Cleaning and Feature Engineering


● Data Cleaning: Removing Duplicates and Handling Null Values.
● Data Transformation and Encoding Categorical Variables.
● Feature Scaling and Normalization.
● Feature Engineering: Creating New Features.
● Feature Selection Techniques.
● Handling Imbalanced Data.
● Data Preparation Pipeline.
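
As a rough illustration of the feature scaling topic listed above, the sketch below implements min-max scaling and standardization in plain Python. The function names `min_max_scale` and `standardize` and the sample `ages` list are assumptions made for this example only; in coursework a library such as scikit-learn would more likely be used.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # Guard against a constant column, where the span is zero.
    return [(v - lo) / span if span else 0.0 for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


# Hypothetical feature column used only for this sketch.
ages = [20, 30, 40, 50, 60]

scaled = min_max_scale(ages)
z_scores = standardize(ages)

print(scaled)
print(z_scores)
```

Min-max scaling preserves the shape of the distribution but is sensitive to outliers, while standardization centres the data without bounding it.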

Week 4: Statistics for Data Science


● Introduction to Descriptive Statistics.
● Probability Distributions: Normal, Binomial, Poisson.
● Confidence Intervals and Hypothesis Testing.
● T-tests and ANOVA.
● Chi-Square Test for Categorical Data.
● Linear Regression Model: Theory and Implementation.
● Correlation Analysis and Interpretation.

Week 5: Advanced SQL for Data Science


● Introduction to SQL for Data Science: Basic Queries.
● Joins and Subqueries in SQL.
● Aggregation and Grouping in SQL.
● Window Functions and Partitioning Data.
● Optimizing SQL Queries for Performance.
● Advanced SQL Analytics: CTEs and Recursive Queries.
● SQL Case Study: Data Analysis Project.
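
The joins, aggregation, and CTE topics above can be tried without a database server. The sketch below uses Python's built-in `sqlite3` module with an in-memory database instead of MySQL, purely for convenience. The `customers` and `orders` tables and their rows are hypothetical and exist only for this example.

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 70.0), (3, 2, 30.0);
    """
)

# The CTE aggregates orders per customer; the outer query joins the totals to names.
query = """
WITH totals AS (
    SELECT customer_id, SUM(amount) AS total, COUNT(*) AS n_orders
    FROM orders
    GROUP BY customer_id
)
SELECT c.name, t.total, t.n_orders
FROM customers AS c
JOIN totals AS t ON t.customer_id = c.id
ORDER BY t.total DESC;
"""

rows = conn.execute(query).fetchall()
conn.close()

for name, total, n_orders in rows:
    print(name, total, n_orders)
```

The same query shape should carry over to MySQL 8 or later, which also supports `WITH` clauses, though syntax details can differ between engines.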

Week 6: Supervised Machine Learning Algorithms (Associate Level)


● Introduction to Supervised Learning and Machine Learning Workflow.
● Logistic Regression: Implementation and Evaluation.
● Decision Trees: Theory and Implementation.
● Cross-Validation and Overfitting Prevention.
● Ensemble Learning: Bagging and Boosting Techniques.
● Random Forests: Bagging Technique.
● Model Evaluation Techniques: Precision, Recall, F1-Score.
● Hyperparameter Tuning with GridSearchCV.
● Support Vector Machine (SVM) Classifier.
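
To make the evaluation metrics above concrete, here is a minimal sketch that computes precision, recall, and F1-score from scratch for a binary classifier. The function name `precision_recall_f1` and the label lists are assumptions for illustration; scikit-learn offers equivalent ready-made functions.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Fall back to 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical ground-truth labels and model predictions.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Here there are 3 true positives, 1 false positive, and 1 false negative, so all three metrics come out at 0.75.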

Week 7: Advanced Supervised Learning Techniques


● K-Nearest Neighbors (KNN) Classifier.
● Time Series Analysis: Decomposing Trends and Seasonality.
● ARIMA Model for Time Series Forecasting.
● Introduction to Deep Learning: Neural Networks (ANN)
● PowerBI for Data Visualization and Dashboard Creation.

Week 8: Unsupervised Learning and Dimensionality Reduction


● Introduction to Unsupervised Learning: Clustering Algorithms.
● K-Means Clustering: Theory and Implementation.
● Hierarchical Clustering and Dendrograms.
● DBSCAN Algorithm for Noise and Outlier Detection.
● Introduction to Dimensionality Reduction: PCA.
● Feature Selection Techniques for Model Optimization.

Week 9: Advanced Machine Learning Techniques


● Introduction to Ensemble Learning: Bagging and Boosting.
● XGBoost: Theory and Implementation.
● Stacking and Blending Models for Better Performance.
● Introduction to Reinforcement Learning: Concepts and Applications.
● Hyperparameter Optimization with Bayesian Optimization.
● Building and Deploying a Machine Learning API using Flask.

Week 10: Deep Learning and Neural Networks


● Introduction to Deep Learning: Neural Network Fundamentals.
● Convolutional Neural Networks (CNN): Theory and Applications.
● Transfer Learning with Pre-trained Models (VGG16 / Facenet512 / ResNet).
● Long Short-Term Memory (LSTM) Networks for Time Series.
● Introduction to NLP (Natural Language Processing): Text Preprocessing (TextBlob)

Week 11: Web Frameworks


● Introduction to HTML, Common Tags and Elements.
● Introduction to the Flask framework: setting up the Flask environment and project structure (apps, templates, and database).
● Introduction to Streamlit as an alternative to HTML and Flask.

Week 12: Model Deployment and Productionisation


● Introduction to Cloud Services: AWS, GCP.
● Setting up AWS EC2 Instance for deployment
● Model Deployment on AWS EC2 Instance
● Streamlit for Interactive Data Science Applications.
● End-to-End Machine Learning Project: Model Deployment Case Study.

Weeks 13-15: Three Capstone Projects with Industry-Level Practices and AWS Deployment

● Final Project Kickoff: Industry-Level Capstone Project.
● Capstone Project Presentation and Introduction.
● Project Presentation and Review

Week 16: Mock Tests and Interview Preparation


● Data Science Mock Tests and Interviews.
● Assistance in Resume Building and Soft Skills.

Sample Assignments:
Python Assignments

List & tuples: (The same questions can be asked about tuples.)


1. X = [10,20,30]: print the last 2 values using the : operator.
2. X = [10,20,30]: print all elements using the : operator.
3. X = [10,20,30,40...100]: print 10 30 50 and so on (alternate elements) without using a for loop. Hint: use the :: operator.
4. Store 10,20,30,....980,990,1000 in a list. Hint: use range.
5. Take a list of 10 elements and print the elements in reverse order. Hint: use the :: operator.
6. Take a list of 10 elements and print the elements in reverse order using a for loop.
7. Take a list of 10 elements and print all elements using a for loop.
8. Show one example demonstrating that a list is mutable.
9. Replace the element at index 2 of the list with 300.
10. Which method do you use to check whether a given element is in the list? Show an example of how to use it. What happens if the element is not in the list?
11. Which method do you use to delete an element from the list? Show one example of how to use it.
12. Insert a new element before index 1 of the list using a list method. Show an example of that method.
13. Which method do you use to delete the element at index 1 of the list? Show the code.
14. x = [(10,20), (30,40), (50,60)]: print the element 40.
15. In the above example, what happens if we call print(x[-1][-1])?
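
The list questions above mostly exercise slicing, mutation, and a handful of list methods. The sketch below shows those operations on an illustrative list; the variable names and values are chosen for this example and are not part of the assignment.

```python
# Illustrative list built with range: [10, 20, ..., 100].
nums = list(range(10, 101, 10))

last_two = nums[-2:]        # slice from the second-to-last element to the end
every_other = nums[::2]     # step of 2 picks alternate elements
reversed_copy = nums[::-1]  # negative step walks the list backwards

# Lists are mutable: assigning by index changes the list in place.
nums[2] = 300

# `in` tests membership; list.index raises ValueError when the value is absent.
has_300 = 300 in nums
try:
    nums.index(999)
    missing_raises = False
except ValueError:
    missing_raises = True

nums.insert(1, 15)     # insert before index 1
removed = nums.pop(1)  # remove and return the element at index 1
nums.remove(300)       # remove the first occurrence of a value

# Nested indexing into a list of tuples.
pairs = [(10, 20), (30, 40), (50, 60)]
forty = pairs[1][1]
last_of_last = pairs[-1][-1]

print(last_two, every_other, forty, last_of_last)
```

Slicing works the same way on tuples, but the mutating calls (`insert`, `pop`, `remove`, index assignment) do not, because tuples are immutable.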

Set:
1. Create an empty set.
2. Store 10,20,30 in a set and print 10 using an index.
3. Print the set elements using a for loop.
4. A set stores elements in insertion order. [T/F]
5. Sets are faster than lists. [T/F]
6. Which method do you use to add a new element to a set? Show an example.
7. Which method do you use to delete an element from a set? Show an example.
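
A brief sketch of the set behaviour these questions probe follows. The variable names are illustrative assumptions; the point is that sets are created with `set()`, cannot be indexed, and are modified with methods such as `add`, `discard`, and `remove`.

```python
# set() creates an empty set; {} would create an empty dict instead.
empty = set()

values = {10, 20, 30}

# Sets are unordered collections and do not support indexing.
try:
    values[0]
    indexable = True
except TypeError:
    indexable = False

values.add(40)       # add a new element
values.discard(99)   # no error if the element is absent
values.remove(10)    # raises KeyError if the element is absent

# Iterating works, but the iteration order should not be relied on.
total = 0
for v in values:
    total += v

print(type(empty).__name__, indexable, sorted(values), total)
```

Membership tests (`x in some_set`) are typically much faster on large sets than on large lists, because sets are hash-based.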

Dictionary:
1. Take a dictionary and insert a new pair k=30.
2. Print the value of key i.
3. In the dictionary, update the value of key j to 20.
4. x = {'i': 10, 'i': 20}: what happens if we call print(x)?
5. Does a dictionary allow duplicate values?
6. Can we get a key based on its value? Why?
7. print(x['i']): assume that key i is not available. What happens?
8. Print the dictionary keys and values using a for loop.
9. How will you print only the keys present in the dictionary?
10. How will you print only the values present in the dictionary?
11. Which dictionary method do you use to find out whether a given key is available in the dictionary? Show an example.
12. Which method do you use to get the value of a given key from a dictionary? Show an example.
13. Which method do you use to delete a pair from the dictionary based on the given key? Show an example.
14. X = {'i': [10,20], 'j': [40,50]}: print 10,20.
15. X = {'i': [10,20], 'j': [40,50]}: print 20.
16. X = {'i': [10,20], 'j': [40,50]}: print 50.
17. X = {'a': {'i': 10, 'j': 20}, 'b': {'i': 100, 'j': 200}}: print 10.
18. X = {'a': {'i': 10, 'j': 20}, 'b': {'i': 100, 'j': 200}}: print 200.
19. X = {'a': {'i': 10, 'j': 20}, 'b': {'i': 100, 'j': 200}}: print the dictionary a present in dictionary X.
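
The dictionary operations these questions touch on can be sketched as follows. The starting values and the `"missing"` key are assumptions made for the example.

```python
x = {"i": 10, "j": 15}

x["k"] = 30        # insert a new key/value pair
x["j"] = 20        # update an existing key
value_i = x["i"]   # look up a value by key

# A repeated key in a literal keeps only the last value.
dup = {"i": 10, "i": 20}

# Indexing a missing key raises KeyError; get() returns None (or a default).
try:
    x["missing"]
    missing_raises = False
except KeyError:
    missing_raises = True
fallback = x.get("missing")

has_i = "i" in x       # membership test on keys
removed = x.pop("k")   # delete a pair by key and return its value

# Iterate over keys and values together.
pairs = [(key, value) for key, value in x.items()]
keys = list(x.keys())
values = list(x.values())

# Values can themselves be lists or dictionaries.
lists = {"i": [10, 20], "j": [40, 50]}
nested = {"a": {"i": 10, "j": 20}, "b": {"i": 100, "j": 200}}
fifty = lists["j"][1]
two_hundred = nested["b"]["j"]

print(pairs, dup, fifty, two_hundred)
```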

Functions (default, *args, and **kwargs parameters):
1. The trainer may ask output-prediction questions on default parameters.
2. The trainer may ask how to call a given function that has a default parameter.
3. What is the data type of args internally?
4. What is the data type of kwargs internally?
5. How many times can we use *args in a given function?
6. How many times can we use **kwargs in a given function?
7. How many default parameters can we use in a given function?
8. def f1(a,b,*c,d=10,**e): pass. Call this function and pass 1 to a, 2 to b, 3,4,5 to c, 6 to d, and hno=10, street=btm to e.
9. def f1(x,y,*z=10): pass. What happens if we call f1(10)? What will the values of x, y, and z be?
10. def f1(x,y,*z=10): pass. What happens if we call f1(10,20,30)? What will the values of x, y, and z be?
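
The following sketch shows how positional, variadic, keyword-only, and keyword-variadic parameters bind in a call. Unlike the assignment's `f1`, which only contains `pass`, this version returns its arguments so the binding can be inspected; that change is an assumption made for illustration. The last part uses `compile` to check whether a default value on a starred parameter is accepted by the parser.

```python
def f1(a, b, *c, d=10, **e):
    """Return the bound arguments so the binding can be inspected."""
    return a, b, c, d, e


# d follows *c, so it is keyword-only and must be passed by name.
a, b, c, d, e = f1(1, 2, 3, 4, 5, d=6, hno=10, street="btm")

# Extra positional arguments arrive as a tuple, extra keywords as a dict.
c_type = type(c).__name__
e_type = type(e).__name__

# Omitting d falls back to its default value.
default_d = f1(1, 2)[3]

# A starred parameter cannot take a default value; the parser rejects it.
try:
    compile("def f1(x, y, *z=10): pass", "<sketch>", "exec")
    star_default_ok = True
except SyntaxError:
    star_default_ok = False

print(a, b, c, d, e, c_type, e_type, default_d, star_default_ok)
```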

List comprehension:
1. Take a list l1 and copy all elements of l1 into l2.
2. Take a list l1 and copy double of each element of l1 into l2.
3. Take a list l1 and copy the square of each element of l1 into l2.
4. Take a list l1 and copy each element+1 into l2.
5. Take a list l1 and copy all even numbers into l2.
6. Take name='palle' and copy all letters that are not vowels into l2.
7. Copy the even numbers from 0 to 11 into a list using a comprehension.
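
A short sketch of the comprehension patterns involved: plain copy, transformation, and filtering. The contents of `l1` are assumed for the example.

```python
l1 = [1, 2, 3, 4, 5, 6]

copied = [x for x in l1]                 # element-by-element copy
doubled = [x * 2 for x in l1]            # transform each element
squared = [x ** 2 for x in l1]
plus_one = [x + 1 for x in l1]
evens = [x for x in l1 if x % 2 == 0]    # keep only elements passing a test

# Comprehensions work over any iterable, including strings and ranges.
consonants = [ch for ch in "palle" if ch not in "aeiou"]
evens_to_11 = [n for n in range(12) if n % 2 == 0]

print(doubled, squared, evens, consonants, evens_to_11)
```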

Lambda expression:
1. Write a lambda expression which takes one parameter and returns element power 2, call
that function and print the returned value
2. Write a lambda expression which takes 3 parameters and returns a sum. Call and print.
3. Write a lambda expression to find the biggest of 2 numbers. Call and print
4. Take a list l1 copy each element+1 into l2 using map() function
5. Take a list l1 copy all even numbers into l2 using filter() function
6. Take a list l1 copy all elements which are greater than 10 into l2 using filter function
7. Take a list l1 copy half of each element into l2 using map() function
8. Take a list l1 copy all odd numbers into l2 using filter() function
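
The sketch below illustrates lambda expressions on their own and combined with `map()` and `filter()`. The names `square`, `add3`, `bigger` and the contents of `l1` are assumptions for the example.

```python
square = lambda x: x ** 2
add3 = lambda a, b, c: a + b + c
bigger = lambda a, b: a if a > b else b

l1 = [4, 11, 7, 20, 15]

# map() applies a function to every element; filter() keeps elements
# for which the function returns a truthy value. Both return lazy
# iterators, so list() is used to materialise the results.
plus_one = list(map(lambda x: x + 1, l1))
halves = list(map(lambda x: x / 2, l1))
evens = list(filter(lambda x: x % 2 == 0, l1))
odds = list(filter(lambda x: x % 2 != 0, l1))
over_ten = list(filter(lambda x: x > 10, l1))

print(square(5), add3(1, 2, 3), bigger(3, 9))
print(plus_one, halves, evens, odds, over_ten)
```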

NumPy Assignments
Given a NumPy array mat as below, perform the matrix operations using slicing and broadcasting techniques.
mat = np.arange(1,26).reshape(5,5)
1. # WRITE CODE THAT REPRODUCES THE OUTPUT OF THE CELL BELOW
array([[12, 13, 14, 15],
[17, 18, 19, 20],
[22, 23, 24, 25]])
2. # WRITE CODE HERE THAT REPRODUCES THE OUTPUT OF THE CELL BELOW
20

3. # WRITE CODE HERE THAT REPRODUCES THE OUTPUT OF THE CELL BELOW
array([[ 2],
[ 7],
[12]])
4. # WRITE CODE HERE THAT REPRODUCES THE OUTPUT OF THE CELL BELOW
array([21, 22, 23, 24, 25])
5. # WRITE CODE HERE THAT REPRODUCES THE OUTPUT OF THE CELL BELOW
325
6. # WRITE CODE HERE THAT REPRODUCES THE OUTPUT OF THE CELL BELOW
7.2111025509279782
7. # WRITE CODE HERE THAT REPRODUCES THE OUTPUT OF THE CELL BELOW
array([55, 60, 65, 70, 75])
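
One possible way to reproduce the outputs above is sketched here; other slices or calls may work equally well. The variable names are illustrative, and the final line is an added broadcasting example that is not one of the numbered exercises.

```python
import numpy as np

mat = np.arange(1, 26).reshape(5, 5)

block = mat[2:, 1:]          # rows 2 onward, columns 1 onward
single = mat[3, 4]           # row 3, column 4
column = mat[:3, 1:2]        # slicing with 1:2 keeps the result 2-D
last_row = mat[4]            # the final row
total = mat.sum()            # sum of all elements
spread = mat.std()           # population standard deviation
col_sums = mat.sum(axis=0)   # sum down each column

# Broadcasting: a length-5 vector of column means is subtracted from every row.
centered = mat - mat.mean(axis=0)

print(block, single, column, last_row, total, spread, col_sums, sep="\n")
```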

Pandas Assignments
1. Write a Pandas program to add, subtract, multiply and divide two Pandas Series.
Sample Series: [2, 4, 6, 8, 10], [1, 3, 5, 7, 9]

2. Write a Pandas program to convert a NumPy array to a Pandas series.


Sample NumPy array: d1 = [10, 20, 30, 40, 50]

3. Write a Pandas program to get the first 3 rows of a given DataFrame.


Sample Python dictionary data and list labels:

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

Expected Output:

First three rows of the data frame:

   attempts       name qualify  score
a         1  Anastasia     yes   12.5
b         3       Dima      no    9.0
c         2  Katherine     yes   16.5

4. Write a Pandas program to select the rows where the number of attempts in the examination is less than 2 and the score is greater than 15.

Sample Python dictionary data and list labels:

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

5. Write a Pandas program to calculate the mean score for each different student in data frame.
Sample DataFrame:

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
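
The Pandas exercises above can be approached along the following lines. This is a sketch rather than a model answer: the variable names are assumptions, and for the mean-score exercise it computes the overall mean of the `score` column, which is only one reading of the question.

```python
import numpy as np
import pandas as pd

# Element-wise arithmetic on two Series aligned by index.
s1 = pd.Series([2, 4, 6, 8, 10])
s2 = pd.Series([1, 3, 5, 7, 9])
added, subtracted = s1 + s2, s1 - s2
multiplied, divided = s1 * s2, s1 / s2

# A NumPy array converts directly to a Series.
converted = pd.Series(np.array([10, 20, 30, 40, 50]))

exam_data = {
    "name": ["Anastasia", "Dima", "Katherine", "James", "Emily",
             "Michael", "Matthew", "Laura", "Kevin", "Jonas"],
    "score": [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
    "attempts": [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    "qualify": ["yes", "no", "yes", "no", "no", "yes", "yes", "no", "no", "yes"],
}
labels = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
df = pd.DataFrame(exam_data, index=labels)

first_three = df.head(3)

# Boolean masks are combined with & and each condition needs parentheses.
selected = df[(df["attempts"] < 2) & (df["score"] > 15)]

# mean() skips NaN values by default.
mean_score = df["score"].mean()

print(first_three)
print(selected)
print(mean_score)
```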

Exploratory Data Analysis (EDA) & Visualization Assignments


1. Load a CSV file (e.g., COVID-19 Data) and perform basic data analysis.
2. Calculate summary statistics for the Iris Dataset using Pandas.
3. Plot histograms and scatter plots using the Mall Customer Segmentation Dataset.
4. Create pair plots and heatmaps for the Flight Delay Dataset
5. Use bar plots and count plots to analyze the Titanic Dataset
6. Perform correlation analysis on the Wine Quality Dataset and visualize it using a heatmap
7. Clean and handle missing values in the Housing Prices Dataset
8. Detect and treat outliers in the Boston Housing Dataset

Data Cleaning and Feature Engineering Assignments


1. Clean a raw dataset like the Credit Card Dataset for analysis
2. Perform one-hot encoding on the Customer Churn Dataset
3. Apply Min-Max scaling and Standardization on the Diabetes Dataset
4. Create new features for the Loan Prediction Dataset
5. Use feature importance scores from a Random Forest model to select features for the Breast Cancer Dataset
6. Implement SMOTE to handle imbalanced data in the Fraud Detection Dataset
7. Create a data preprocessing pipeline using scikit-learn for the Heart Disease Dataset

Statistics for Data Science Assignments


1. Calculate measures of central tendency and variability for the Olympics Dataset
2. Plot and analyze different probability distributions using the Insurance Dataset
3. Perform hypothesis testing on the Heart Failure Dataset
4. Conduct T-tests and ANOVA on the Student Performance Dataset
5. Apply Chi-square test on the Census Income Dataset
6. Implement linear regression on the Advertising Dataset
7. Analyze the correlation between variables in the NBA Players Dataset

MySQL Assignments
1. Write SQL queries to explore the World Population Dataset
2. Use various JOINs and subqueries on the Chinook Database
3. Perform aggregation tasks using the IMDB Movies Dataset
4. Apply window functions on the Airbnb Listings Dataset
5. Optimize complex queries for the COVID-19 Cases Dataset
6. Write CTEs and recursive queries for the Employee Database
7. Conduct a full analysis using SQL on the Sales Data

Supervised Machine Learning Algorithms (Associate Level) Assignments


1. Implement a simple regression model using the Advertising Dataset
2. Apply logistic regression on the Diabetes Dataset.
3. Build a decision tree classifier using the Wine Quality Dataset
4. Use Random Forest to classify the Breast Cancer Dataset
5. Evaluate a classification model using precision, recall, and F1-score on the Heart Disease Dataset.
6. Perform hyperparameter tuning on the Car Evaluation Dataset.
7. Implement k-fold cross-validation on the Housing Prices Dataset.

Advanced Supervised Learning Techniques Assignments


1. Train an SVM model on the Spam Email Dataset.
2. Apply KNN on the Glass Identification Dataset
3. Implement AdaBoost and XGBoost on the Wine Dataset.
4. Perform time series decomposition on the Airline Passengers Dataset.
5. Build an ARIMA model to forecast sales using the Retail Sales Dataset
6. Build a simple neural network for regression using the Concrete Strength Dataset.
7. Implement a CNN to classify images from the CIFAR-10 Dataset.
8. Create an interactive dashboard using the Superstore Dataset.

Unsupervised Learning and Dimensionality Reduction Assignments


1. Perform clustering using K-Means on the Customer Segmentation Dataset.
2. Analyze the clusters formed using K-Means on the Wholesale Customers Dataset.
3. Apply hierarchical clustering on the Country Clusters Dataset and visualize the dendrogram.
4. Use DBSCAN to identify outliers in the Credit Card Fraud Dataset.
5. Apply Principal Component Analysis (PCA) on the Wine Quality Dataset to reduce dimensions and visualize the data.
6. Compare t-SNE and UMAP visualizations on the Fashion MNIST Dataset.
7. Use Recursive Feature Elimination (RFE) for feature selection on the Diabetes Dataset.

Advanced Machine Learning Techniques Assignments


1. Implement Bagging and Boosting techniques using the Heart Disease Dataset.
2. Build an XGBoost model for the Loan Prediction Dataset.
3. Create a stacked model using the Boston Housing Dataset.
4. Implement a simple Q-learning algorithm for a tic-tac-toe game simulation.
5. Perform Bayesian optimization on the Forest Cover Type Dataset
6. Use SHAP values to interpret predictions from a Random Forest model on the California Housing Dataset.
7. Create a Flask API to serve predictions from a model trained on the Iris Dataset

Deep Learning and Neural Networks Assignments


1. Build a simple feedforward neural network for the Banknote Authentication Dataset
2. Train a CNN model for image classification using the CIFAR-10 Dataset.
3. Use transfer learning for image classification with the Flowers Recognition Dataset
4. Build an RNN model to predict the next word in a sentence using the Shakespeare Text Dataset.
5. Implement an LSTM model to forecast stock prices using the Apple Stock Price Dataset
6. Create a simple GAN to generate new handwritten digits based on the MNIST Dataset
7. Develop an image classification model on the Chest X-ray Dataset.

Advanced Deep Learning Techniques Assignments


1. Perform text preprocessing (tokenization, stopword removal, stemming) on the Amazon Reviews Dataset.
2. Use Word2Vec to create word embeddings for the Movie Reviews Dataset.
3. Build an LSTM model to perform sentiment analysis on the Twitter Sentiment Analysis Dataset.
4. Implement an attention-based LSTM model on the Fake News Detection Dataset.
5. Fine-tune a BERT model for text classification on the Quora Insincere Questions Dataset.
6. Use neural style transfer to apply artistic styles to images from the COCO Dataset.
7. Develop an image segmentation model using the Oxford Pets Dataset.

Model Deployment and Productionization Assignments


1. Create a basic API with Flask to serve predictions from a trained model.
2. Dockerize a machine learning application.
3. Set up an EC2 instance on AWS and deploy a simple model API.
4. Store a dataset in AWS S3 and access it in a Jupyter notebook for analysis
5. Deploy a Flask-based machine learning model using Elastic Beanstalk.
6. Build a Streamlit application to visualize and predict outcomes.
7. Complete an end-to-end deployment of a designed model.

Capstone Projects and Industry-Level Practices


1. Start by performing EDA and initial model selection.
2. Complete the capstone projects, prepare a detailed report, and present your findings. Include all aspects, from data analysis and model building to evaluation and deployment.
