Bca Ctis Sem-5 Introduction To Data Science

Uploaded by

Vikas Sharma

Swarrnim School of Computing & IT

Course Dossier

Prof. Vikas Chandra Sharma

Bachelor of Computer Application- CTIS

Semester: V
Subject Name: INTRODUCTION TO DATA SCIENCE
Subject Code: 14110503
SWARRNIM STARTUP & INNOVATION UNIVERSITY
Swarrnim School of Business (BCA -CTIS)
Open Elective-I
Introduction to Data Science
Semester: V
Code: ________

Teaching & Evaluation Scheme:-

Teaching Scheme        Credits   Evaluation Scheme
Th   Tu   Pr   Total             Internal      External      Total
                                 Th    Pr      Th    Pr
2    -    -    2       2         30    -       70    -       100

Objectives: -

● Apply quantitative modeling and data analysis techniques to solve real-world
business problems, communicate findings, and present results effectively using data
visualization techniques.
● Recognize and analyze ethical issues in business related to intellectual property, data
security, integrity, and privacy.
Prerequisites:- NA

Course outline:-
Sr. No.   Course Contents   Number of Hours
1 Data Science- An Overview 6
Introduction to Data Science, Definition and description of Data Science,
history and development of Data Science, terminologies related with Data
Science, basic framework and architecture, difference between Data Science
and business analytics, importance of Data Science in today’s business world,
primary components of Data Science, users of Data Science and its hierarchy,
overview of different Data Science techniques, challenges and opportunities in
business analytics, different industrial application of Data Science techniques.
2 Mathematics and Statistics in Data Science 6
Role of mathematics in Data Science, importance of probability and statistics
in Data Science, important types of statistical measures in Data Science :
Descriptive, Predictive and prescriptive statistics, introduction to statistical
inference and its usage in Data Science, application of statistical techniques in
Data Science, overview of linear algebra : matrix and vector theory, role of
linear algebra in Data Science, exploratory data analysis and visualization
techniques, difference between exploratory and descriptive statistics, EDA and
visualization as key component of Data Science
3 Machine Learning in Data Science 6
Role of machine learning in Data Science, different types of machine learning
techniques and its broad scope in Data Science : Supervised, unsupervised,
reinforcement and deep learning, difference between different machine
learning techniques, brief introduction to machine learning algorithms,
importance of machine learning in today’s business, difference between
machine learning classification and prediction.
4 Computers in Data Science 6
Role of computer science in Data Science, various components of computer
science being used for Data Science, role of relational database systems in Data
Science: SQL, NoSQL, role of data warehousing in Data Science, terms
related with data warehousing techniques, importance of operating system concepts
and memory management, various freely available software tools used in
Data Science: R, Python, important proprietary software tools, different
business intelligence tools and their crucial role in Data Science project
presentation.
5 Data Science Project Management 6
Data Science project framework, execution flow of a Data Science project,
various components of Data Science projects, stakeholders of Data Science
project, industry use cases of Data Science implementation, challenges and
scope of Data Science project management, process evaluation model,
comparison of Data Science project methods, improvement in success of Data
Science project models.

Learning Outcomes:-

On successful completion of the course, students will be able to,


● Understand the process and components of a Data Science project.

● Learn the importance of probability and statistics in Data Science.

● Understand the role of machine learning in today’s business world.

● Understand the various components of computer science being used for Data Science.

● Understand the execution flow of a Data Science project.

Teaching & Learning Methodology:-

● The class will be taught using theory and case-based methods. In addition to assigning
case studies, the course instructor will spend considerable time on understanding the
concept of innovation through the eyes of the consumer. The instructor will also cover
ways to think innovatively, making liberal use of thinking techniques.

Books Recommended:-
Text Books:

1. Data Science from Scratch: First Principles with Python, 1st Edition, by Joel Grus
2. Principles of Data Science by Sinan Ozdemir (2016), Packt
   "... of Database Systems", Fourth Edition, Pearson/Addison Wesley, 2007
3. Data Science for Dummies by Lillian Pierson (2015)

Reference Books:

1. Data Science for Business: What You Need to Know about Data Mining and
Data-Analytic Thinking by Foster Provost, Tom Fawcett
2. Data Smart: Using Data Science to Transform Information into Insight 1st Edition by
John W. Foreman. (2015) Wiley Publication
MODULE 1: Data Science - An Overview

1. Introduction to Data Science:


- Definition and description of Data Science.
- The interdisciplinary nature of Data Science, combining domains
such as statistics, computer science, and domain knowledge.
- The role of Data Scientists in extracting insights from data to drive
decision-making.

2. History and Development of Data Science:


- Overview of the evolution of Data Science from early statistics to
the current era.
- Key milestones and breakthroughs in the field.
- The role of technological advancements in shaping Data Science.

3. Terminologies Related to Data Science:


- Explanation of terms commonly used in Data Science, such as Big
Data, Machine Learning, Artificial Intelligence, Data Mining, etc.
- Understanding the significance and context of each term.

4. Basic Framework and Architecture:


- Overview of the typical Data Science workflow, including data
collection, data cleaning, data exploration, feature engineering,
modelling, evaluation, and deployment.
- Explanation of various tools and technologies used in Data Science
pipelines.

5. Difference Between Data Science and Business Analytics:


- Distinctions between Data Science and Business Analytics in terms
of goals, methodologies, and application.
- Understanding how the two fields complement each other.

6. Importance of Data Science in Today's Business World:


- The growing significance of Data Science in decision-making and
strategy development across industries.
- Examples of successful applications of Data Science in real-world
scenarios.

7. Primary Components of Data Science:


- Explanation of the core components of Data Science, such as data
collection methods, data storage and management, data analysis
techniques, and data visualization.
- Understanding the role of each component in the overall process.

8. Users of Data Science and Its Hierarchy:


- Identification of various stakeholders who benefit from Data
Science insights, including executives, managers, data analysts, and
researchers.
- Understanding the hierarchy of roles in a typical Data Science team.

9. Overview of Different Data Science Techniques:


- Introduction to various Data Science techniques, such as statistical
analysis, machine learning algorithms, natural language processing,
and data visualization.
- Examples of how these techniques are applied in different domains.

10. Challenges and Opportunities in Business Analytics:


- Identification of challenges faced in implementing Data Science
solutions, such as data quality issues, data privacy concerns, and
model interpretability.
- Exploration of the opportunities for businesses to gain a
competitive advantage through Data Science.

11. Different Industrial Applications of Data Science Techniques:


- Case studies and examples of Data Science applications in diverse
industries, such as healthcare, finance, marketing, and
transportation.
- Understanding the impact of Data Science on business processes
and decision-making.
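
The workflow named in item 4 (data collection, data cleaning, data exploration, feature engineering, modelling, evaluation) can be illustrated with a very small sketch. Everything in it is an assumption made for illustration: the six customer records, the field names (`visits`, `spend`, `churned`), and the one-feature threshold "model" are hypothetical and only show how the stages hand data to one another. It uses only the Python standard library.

```python
from statistics import mean

# Data collection: hypothetical customer records (illustrative values only).
raw = [
    {"visits": 12, "spend": 240.0, "churned": 0},
    {"visits": 1, "spend": 15.0, "churned": 1},
    {"visits": None, "spend": 90.0, "churned": 0},  # record with a missing value
    {"visits": 2, "spend": 20.0, "churned": 1},
    {"visits": 9, "spend": 180.0, "churned": 0},
    {"visits": 3, "spend": 30.0, "churned": 1},
]


def clean(records):
    """Data cleaning: drop records that contain a missing field."""
    return [r for r in records if all(v is not None for v in r.values())]


def explore(records):
    """Data exploration: mean spend for each class label."""
    return {
        label: mean(r["spend"] for r in records if r["churned"] == label)
        for label in (0, 1)
    }


def add_features(records):
    """Feature engineering: derive spend per visit for each record."""
    return [{**r, "spend_per_visit": r["spend"] / r["visits"]} for r in records]


def fit_threshold(records):
    """Modelling: place a cut-off on 'visits' halfway between the class means."""
    stay = mean(r["visits"] for r in records if r["churned"] == 0)
    leave = mean(r["visits"] for r in records if r["churned"] == 1)
    return (stay + leave) / 2


def predict(threshold, record):
    """Predict churn (1) when the customer visits less often than the cut-off."""
    return 1 if record["visits"] < threshold else 0


def accuracy(threshold, records):
    """Evaluation: share of records whose prediction matches the label."""
    hits = sum(predict(threshold, r) == r["churned"] for r in records)
    return hits / len(records)


cleaned = clean(raw)
summary = explore(cleaned)
featured = add_features(cleaned)
threshold = fit_threshold(featured)
# Scored on the same records used for fitting, only to keep the sketch short.
acc = accuracy(threshold, featured)
print(len(cleaned), summary, threshold, acc)
```

In practice the evaluation step would use data held out from fitting, and the deployment step (not shown) would expose `predict` to other systems.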
Assignment: An Overview of Data Science and its Applications

Objective:
The objective of this assignment is to provide students with a
comprehensive understanding of Data Science, its history,
development, and the fundamental components involved in the Data
Science process. The assignment also aims to highlight the
importance of Data Science in today's business world, explore
different Data Science techniques, and analyze their applications
across various industries.

Assignment Questions:

1. Introduction to Data Science:


a) Define Data Science and describe its interdisciplinary nature.
b) How do Data Scientists contribute to decision-making in various
domains?

2. History and Development of Data Science:


a) Briefly explain the historical evolution of Data Science, citing key
milestones.
b) How have technological advancements influenced the growth of
Data Science?

3. Terminologies Related to Data Science:


a) Define the following terms:
- Big Data
- Machine Learning
- Artificial Intelligence
- Data Mining
b) Provide examples of how each of these terms is applied in real-
world scenarios.

4. Basic Framework and Architecture:


a) Outline the typical Data Science workflow, including the main
steps involved.
b) Explain the significance of each step in the Data Science process.

5. Difference Between Data Science and Business Analytics:


a) Compare and contrast Data Science and Business Analytics in
terms of objectives, methodologies, and applications.
b) How can businesses benefit from integrating both Data Science
and Business Analytics?

6. Importance of Data Science in Today's Business World:


a) Discuss the importance of Data Science in decision-making and its
impact on business strategies.
b) Provide specific examples of successful Data Science applications
in different industries.

7. Primary Components of Data Science:


a) Identify and explain the key components of Data Science,
including data collection methods, data storage and management,
data analysis techniques, and data visualization.
b) How do these components work together to derive insights from
data?

8. Users of Data Science and Its Hierarchy:


a) Identify the different stakeholders who benefit from Data Science
insights in an organization.
b) Describe the hierarchy of roles in a typical Data Science team.

9. Overview of Different Data Science Techniques:


a) Provide an overview of various Data Science techniques, such as
statistical analysis, machine learning algorithms, natural language
processing, and data visualization.
b) Give examples of how each technique can be applied to solve real-
world problems.
10. Challenges and Opportunities in Business Analytics:
a) Identify the challenges faced in implementing Data Science
solutions in organizations.
b) Discuss the opportunities for businesses to leverage Data Science
for competitive advantage.

11. Different Industrial Applications of Data Science Techniques:


a) Select two industries (e.g., healthcare, finance, marketing,
transportation) and explain how Data Science techniques have been
applied in each of them.
b) Assess the impact of Data Science on the efficiency and
effectiveness of these industries.

Submission Guidelines:
- The assignment should be typed and submitted as a document
(e.g., MS Word or PDF).
- Clearly label each question and provide the corresponding answers.
- Use appropriate headings, subheadings, and bullet points for
clarity.
- Cite your sources whenever you use external references or
examples.

Class Test: An Overview of Data Science and its Applications

Section A: Multiple Choice Questions (10 points, 2 points each)

1. Data Science is an interdisciplinary field that combines knowledge
from which domains?
a) Biology and Chemistry
b) Statistics and Computer Science
c) History and Literature
d) Geology and Geography
2. Which of the following terms is used to refer to the large and
complex datasets that cannot be processed using traditional data
processing techniques?
a) Big Data
b) Data Mining
c) Artificial Intelligence
d) Machine Learning

3. The primary goal of Data Science is to:


a) Collect as much data as possible.
b) Extract insights and knowledge from data.
c) Store and manage data efficiently.
d) Create visually appealing data charts.

4. What is the key difference between Data Science and Business
Analytics?
a) Data Science deals with structured data, while Business Analytics
deals with unstructured data.
b) Data Science focuses on historical data analysis, while Business
Analytics focuses on real-time data.
c) Data Science uses statistical techniques, while Business Analytics
uses machine learning algorithms.
d) Data Science aims to generate insights and predictions, while
Business Analytics focuses on operational and strategic decisions.

5. Data Science techniques are commonly used in which of the
following industries?
a) Agriculture and Farming
b) Fashion and Entertainment
c) Healthcare and Medicine
d) Construction and Real Estate

Section B: Short Answer Questions (15 points, 3 points each)


6. Define Data Science and explain its significance in today's business
world.

7. Briefly describe the historical development of Data Science and
provide two key milestones in its evolution.

8. Outline the typical Data Science workflow and explain the
importance of data visualization in the process.

9. Differentiate between supervised and unsupervised learning
techniques used in Data Science.

10. Identify two challenges faced in implementing Data Science
solutions in organizations and propose potential solutions for each
challenge.

Section C: Application and Analysis (25 points, 5 points each)

11. Choose one industry of your choice and describe how Data
Science techniques have been applied to solve real-world problems
in that industry. Provide specific examples to support your answer.

12. Imagine you are part of a Data Science team working for a retail
company. Explain how you would use Data Science techniques to
analyze customer purchase patterns and recommend personalized
products to customers.

CASE STUDIES
Case Study 1: Healthcare Industry - Predictive Analytics for Patient
Readmission

Background:
A leading hospital chain aims to reduce patient readmissions, which
not only impact the quality of care but also result in increased
healthcare costs. The hospital management wants to leverage Data
Science techniques to predict which patients are at high risk of
readmission, enabling timely interventions and personalized care
plans.

Objective:
Develop a predictive analytics model using Data Science techniques
to identify patients at high risk of readmission.

Data:
The hospital has collected electronic health records (EHR) of patients
over the past five years, including demographics, medical history,
diagnostic tests, medications, and previous hospitalization details.

Challenges:
1. Dealing with imbalanced data where the majority of patients do
not experience readmission.
2. Ensuring privacy and compliance with patient data while
performing analysis.

Solution:
1. Data Preprocessing: Clean and preprocess the EHR data, handling
missing values and converting categorical variables into numerical
formats.

2. Feature Engineering: Extract relevant features from the EHR data,
such as the number of previous hospitalizations, specific medical
conditions, medications prescribed, and length of hospital stay.

3. Model Selection: Experiment with different classification
algorithms, such as logistic regression, random forests, and support
vector machines, to find the best-performing model.

4. Addressing Class Imbalance: Apply techniques like oversampling,
undersampling, or synthetic data generation to address the class
imbalance issue.

5. Model Evaluation: Split the data into training and testing sets, and
evaluate the model's performance using metrics like precision, recall,
and F1-score.

6. Deployment: Integrate the predictive model into the hospital's
electronic health record system to provide real-time risk scores for
patients.
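
Three of the steps above (converting categorical variables in step 1, oversampling in step 4, and the metrics in step 5) can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the hospital's actual method: the function names, the admission types, and the label vectors are all hypothetical, and random duplication of minority rows stands in for whichever oversampling technique a real project would choose.

```python
import random


def one_hot(values):
    """Step 1: turn a categorical column into one-hot rows (one column per category)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


def random_oversample(rows, labels, seed=0):
    """Step 4: duplicate randomly chosen minority-class rows until classes are balanced."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(labels))) + extra
    return [rows[i] for i in idx], [labels[i] for i in idx]


def precision_recall_f1(y_true, y_pred):
    """Step 5: precision, recall and F1 for the positive class (1 = readmitted)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical admission types for three patients.
encoded, categories = one_hot(["ward", "icu", "ward"])

# Hypothetical imbalanced training labels: 5 not readmitted, 2 readmitted.
rows = [[i] for i in range(7)]
labels = [0, 0, 0, 0, 0, 1, 1]
bal_rows, bal_labels = random_oversample(rows, labels)

# Hypothetical true labels and model predictions on a held-out test set.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(categories, encoded, sum(bal_labels), len(bal_labels), precision, recall, f1)
```

Resampling is normally applied to the training split only, so that the test set keeps the real class proportions and the reported precision and recall are not inflated by duplicated rows.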

Case Study 2: Marketing Industry - Customer Segmentation using
Clustering

Background:
A large e-commerce company wants to understand its customer base
better and tailor marketing strategies to different customer
segments. They have vast amounts of transactional data but lack
insights into customer behavior and preferences.

Objective:
Segment customers based on their buying behavior and preferences
to create targeted marketing campaigns.

Data:
The company has a database of historical transactions, including
customer IDs, products purchased, transaction dates, and order
values.

Challenges:
1. Dealing with high-dimensional and noisy transaction data.
2. Identifying an optimal number of customer segments for effective
targeting.

Solution:
1. Data Preparation: Transform transactional data into a
customer-product matrix with customer IDs as rows and products as
columns, and populate it with binary values (1 for purchased, 0 for
not purchased).

2. Dimensionality Reduction: Apply techniques like Principal
Component Analysis (PCA) or t-distributed Stochastic Neighbor
Embedding (t-SNE) to reduce the dimensionality of the data while
preserving essential information.

3. Clustering: Use clustering algorithms such as K-means or
hierarchical clustering to group customers based on their purchasing
patterns.

4. Evaluation: Assess the quality of the clusters using metrics like
silhouette score or within-cluster sum of squares.

5. Customer Profiling: Analyze the characteristics of each customer
segment to understand their preferences and behavior.

6. Marketing Strategies: Develop targeted marketing campaigns for
each customer segment, focusing on products or offers that align
with their preferences.
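
The data preparation, clustering, and evaluation steps above can be sketched as follows. This is a small illustration under stated assumptions, using only the Python standard library: the transactions, customer IDs, and product names are hypothetical, the K-means routine is a plain version of Lloyd's algorithm with hand-picked starting centroids, and the dimensionality-reduction step and the silhouette score are left out to keep the example short.

```python
def customer_product_matrix(transactions):
    """Step 1: binary matrix with customers as rows and products as columns."""
    customers = sorted({c for c, _ in transactions})
    products = sorted({p for _, p in transactions})
    bought = set(transactions)
    matrix = [[1 if (c, p) in bought else 0 for p in products] for c in customers]
    return customers, products, matrix


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, init_indices, max_iter=50):
    """Step 3: Lloyd's algorithm, starting from the rows listed in init_indices."""
    centroids = [list(points[i]) for i in init_indices]
    k = len(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids


def wcss(points, labels, centroids):
    """Step 4: within-cluster sum of squares (lower means tighter clusters)."""
    return sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))


# Hypothetical (customer_id, product) purchase records.
transactions = [
    ("c1", "laptop"), ("c1", "mouse"),
    ("c2", "laptop"), ("c2", "mouse"), ("c2", "keyboard"),
    ("c3", "shampoo"), ("c3", "soap"),
    ("c4", "soap"), ("c4", "shampoo"), ("c4", "towel"),
]

customers, products, matrix = customer_product_matrix(transactions)
labels, centroids = kmeans(matrix, init_indices=(0, 2))
score = wcss(matrix, labels, centroids)
print(dict(zip(customers, labels)), score)
```

To address the second challenge (choosing the number of segments), one common approach is to repeat the clustering for several values of k and compare the resulting within-cluster sum of squares or silhouette scores.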
