1. Introduction of Subject

The document outlines a course on Big Data Analytics, detailing its objectives, syllabus, and examination scheme. It covers topics such as the evolution of Big Data, analytics techniques, Hadoop, NoSQL databases, and various algorithms for data analysis. Additionally, it includes practical lab sessions and case studies demonstrating the application of Big Data in industries like retail, transportation, and entertainment.

Uploaded by

Pranshav Patel

Introduction

2CS702 Big Data Analytics

Dr Jigna Patel
N-407
[email protected]
9898942993
Course Outcomes
After successful completion of this course, students will be able to:
1. outline the significance and challenges of big data
2. model big data using different tools and frameworks
3. apply big data techniques for useful business analytic applications
4. design algorithms for mining the data from large volumes
Syllabus
Unit I
Introduction to Big Data: Evolution of Big Data, Types of Digital Data, Classification of Digital Data, Structured Data, Semi-Structured Data, Unstructured Data, Definition of Big Data, Challenges of Conventional Systems, Big Data Platforms and Data Storage

Unit II
Big Data Analytics: Importance of Big Data Analytics, Classification of Analytics, Top Challenges Facing Big Data, Technologies to Meet the Challenges Posed by Big Data, Terminologies Used in Big Data Environment

Unit III
Hadoop: Introducing Hadoop, Comparison of RDBMS and Hadoop, Distributed Computing Challenges, Hadoop Overview, Business Value of Hadoop, Hadoop Distributed File System, Processing Data with Hadoop, Working with MapReduce, Hadoop YARN, Hadoop in the Cloud, Applications on Big Hadoop Ecosystem, Fundamentals of Pig, Hive, HBase and ZooKeeper, Basic Concepts of Apache Spark

Unit IV
The Big Data Technology Landscape: CAP Theorem - BASE Concept, NoSQL, Types of NoSQL Databases, Introduction to MongoDB, Data Types in MongoDB, CRUD, Apache Cassandra, Features of Cassandra, CRUD

Unit V
Big Data Analytics Algorithms: Applying Linear Regression, Clustering, Association Rule Mining, and Decision Trees on Big Data.
Self-study: Frameworks: Applications on Big Data Using Pig and Hive
References:
1. Michael Berthold, David J. Hand, Intelligent Data Analysis, Springer
2. Tom White, Hadoop: The Definitive Guide, Third Edition, O’Reilly Media
3. Chris Eaton, Dirk deRoos, Tom Deutsch, George Lapis, Paul Zikopoulos, Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data, McGraw Hill Publishing
4. Anand Rajaraman and Jeffrey David Ullman, Mining of Massive Datasets, Cambridge University Press
5. Bill Franks, Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics, John Wiley & Sons
6. Glenn J. Myatt, Making Sense of Data, John Wiley & Sons
7. Pete Warden, Big Data Glossary, O’Reilly
8. Jiawei Han, Micheline Kamber, Data Mining: Concepts and Techniques, Second Edition, Elsevier
9. Da Ruan, Guoqing Chen, Etienne E. Kerre, Geert Wets, Intelligent Data Mining, Springer
10. Paul Zikopoulos, Dirk deRoos, Krishnan Parasuraman, Thomas Deutsch, James Giles, David Corrigan, Harness the Power of Big Data: The IBM Big Data Platform, Tata McGraw Hill Publications
11. Michael Minelli, Michele Chambers, Ambiga Dhiraj, Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses, Wiley Publications
12. Paul Zikopoulos, Chris Eaton, Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data, Tata McGraw Hill Publications
13. Seema Acharya and Subhashini C, Big Data and Analytics, Wiley India
Examination Scheme
Component: CE, SEE, LPW
Exam Duration: CE – Continuous Evaluation; SEE – 3.0 Hrs; LPW – Continuous Evaluation + 2 hrs Semester End LPW Exam
Component Weightage: CE – 0.4; SEE – 0.4; LPW – 0.2

Innovative/Special Assignment
• Project Work
• Tasks and their timeline
• Group size: maximum 3 students – 5/08/2024
• Identification of dataset – 10/08/2024
• Problem formulation – 20/08/2024
• Proposed methodology/approach to solve it – 1/10/2024
• Deployment: on your own/private cluster, or using cloud services like the GCP Dataproc service for the Hadoop ecosystem, MongoDB Atlas, etc. – 18/10/2024
• Submit report of the project work carried out – 25/10/2024
Lab session
Phase-1 – Big Data in different domains
Sr. No., Practical Title (Hours, CLO)
1. Study and explore various applications of big data in different domains. Choose one of them, study it in detail, and write a report on the different types of digital data generated in the selected application (02 hours, CLO 1). For example:
• Big Data in Retail
• Big Data in Healthcare
• Big Data in Education
• Big Data in E-commerce
• Big Data in Media and Entertainment
• Big Data in Finance
• Big Data in Travel Industry
• Big Data in Telecom
• Big Data in Automobile
2. Learn the limitations of data analytics by applying machine learning techniques to a large amount of data. Write a program that reads a dataset from an online website, an Excel file, and a CSV file, and performs ML tasks such as (02 hours, CLO 3):
a) Linear regression and logistic regression on the iris dataset.
b) K-means clustering.
• Students will learn the limitations of the platform and the algorithm.
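For the K-means task in practical 2, the following is a minimal sketch in plain Python, assuming a small in-memory dataset and caller-supplied starting centroids. The function name `kmeans` and its parameters are illustrative and not part of the course material; a real run on the iris dataset would normally use a library instead.

```python
from math import dist

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # no centroid moved, so the assignment is stable
        centroids = new_centroids
    return centroids, clusters
```

Every iteration scans all points against all centroids in one process, which is the kind of single-machine limitation the practical asks students to observe as the data grows.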
Lab session
Phase-2 – Hadoop Ecosystem
Sr. No., Practical Title (Hours, CLO)
3. Install and configure a single-node Hadoop cluster (02 hours, CLO 3).
4. Perform HDFS commands for the following categories (02 hours, CLO 3):
• User Commands
o hdfs dfs – runs filesystem commands on the HDFS
o hdfs fsck – runs an HDFS filesystem checking command
• Administration Commands
o hdfs dfsadmin – runs HDFS administration commands
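The three command groups in practical 4 can be collected into a small Python helper, sketched below. The paths (`/user/student/input`, `data.csv`) and the names `HDFS_COMMANDS` and `run_hdfs` are placeholders chosen for illustration; the helper assumes the `hdfs` client is on the PATH and does nothing otherwise.

```python
import shutil
import subprocess

# Typical invocations of hdfs dfs, hdfs fsck and hdfs dfsadmin.
# All paths and file names here are placeholders.
HDFS_COMMANDS = {
    "make directory": ["hdfs", "dfs", "-mkdir", "-p", "/user/student/input"],
    "upload file": ["hdfs", "dfs", "-put", "data.csv", "/user/student/input"],
    "list directory": ["hdfs", "dfs", "-ls", "/user/student/input"],
    "check health": ["hdfs", "fsck", "/user/student", "-files", "-blocks"],
    "cluster report": ["hdfs", "dfsadmin", "-report"],
}

def run_hdfs(name):
    """Run one named command; return None when no Hadoop client is installed."""
    if shutil.which("hdfs") is None:
        return None
    return subprocess.run(HDFS_COMMANDS[name], capture_output=True, text=True)
```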
5. Apply MapReduce algorithms to perform analytics on a single-node cluster (any one) (02 hours, CLO 4):
a) Find phrase frequency in a given dataset.
b) Search records with matching criteria.
Prepare a report to guide the design of the mapper and reducer.
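To guide the mapper and reducer design for task (a), here is a minimal single-process sketch of the map, shuffle and reduce stages for phrase frequency. It only imitates the data flow in plain Python and is not Hadoop code; the function names and the choice of two-word phrases as the default are assumptions made for illustration.

```python
from collections import defaultdict

def mapper(line, n=2):
    """Emit (phrase, 1) for every run of n consecutive words in a line."""
    words = line.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n]), 1

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    """Sum the counts emitted for one phrase."""
    return key, sum(values)

def phrase_frequency(lines, n=2):
    """Chain the three stages over an iterable of text lines."""
    pairs = (pair for line in lines for pair in mapper(line, n))
    return dict(reducer(k, v) for k, v in shuffle(pairs).items())
```

Because the mapper only sees one line at a time, phrases that span a line break are not counted; that is a design decision worth recording in the report.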
6. Analyse the impact of different numbers of mappers and reducers on the same definition as practical 4 (02 hours, CLO 4).
• Prepare a conclusive report on the analysis.
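One effect of the reducer count is how intermediate keys are spread across reducers. Hadoop's default partitioner assigns a key by its hash modulo the number of reduce tasks; the sketch below imitates that idea with CRC32 as a stand-in hash (an assumption, chosen so results are repeatable in Python). The names `partition` and `reducer_load` are illustrative.

```python
import zlib
from collections import Counter

def partition(key, num_reducers):
    """Pick a reducer for a key; CRC32 keeps the result stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def reducer_load(keys, num_reducers):
    """Count how many records each reducer would receive."""
    load = Counter(partition(k, num_reducers) for k in keys)
    return [load.get(r, 0) for r in range(num_reducers)]
```

Calling `reducer_load` on the same key list with different reducer counts shows whether the work is balanced or whether a few frequent keys overload one reducer.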
Lab session
Phase-3 – NoSQL and Analytics algorithm
Sr. No., Practical Title (Hours, CLO)
7. Set up a MongoDB environment on your system. Import the Restaurant dataset and perform CRUD operations (02 hours, CLO 2).
8. Set up a Cassandra environment on your system and apply Create, Update, Read and Delete operations (02 hours, CLO 2).
9. Case study: Use the following platforms for solving any big data analytics problem of your choice: (1) Amazon Web Services, (2) Microsoft Azure, (3) Google App Engine (02 hours, CLO 2).
10. Implement any one of the following analytics algorithms using PySpark and MLlib for larger datasets in main memory (machine learning application) (04 hours, CLO 4):
• Regression
• K-means Clustering
• Association Rule Mining Algorithm
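Before moving to PySpark, it can help to see the two measures behind association rule mining, support and confidence, computed by hand. The sketch below is plain Python rather than PySpark or MLlib, it only considers rules between single items, and its function names are illustrative assumptions.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def pair_rules(transactions, min_support, min_confidence):
    """Return rules (A, B, support, confidence) for single-item rules A -> B."""
    transactions = [set(t) for t in transactions]
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s = support({a, b}, transactions)
        if s < min_support:
            continue  # infrequent pairs cannot yield a rule
        for lhs, rhs in ((a, b), (b, a)):
            # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
            conf = s / support({lhs}, transactions)
            if conf >= min_confidence:
                rules.append((lhs, rhs, s, conf))
    return rules
```

This version rescans every transaction for every candidate pair, which is exactly the cost that distributed implementations are designed to spread over a cluster.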
Lab session
Practice Purpose
Sr. No., Practical Title (Hours, CLO)
11* Extend MongoDB functionality for MapReduce on a document collection (02 hours, CLO 3).
12* Extend Cassandra functionality for MapReduce on the restaurant dataset (02 hours, CLO 3).
InClassQuestion#1

• What is the need to learn this subject?

Big Data popular case study

Reference : https://data-flair.training/blogs/big-data-case-studies/
• Walmart is the largest retailer in the world and the world’s largest company by revenue,
• with more than 2.1 million employees
• 10,586 stores and clubs in 24 countries
• More than 2 million employees and 20000 stores in 28 countries
• Major problems are:
1. Inventory Management: Ensuring shelves are stocked with the right products at the right time.
2. Customer Insights: Understanding and predicting customer behavior to improve sales.
3. Supply Chain Optimization: Managing a vast network of suppliers and logistics.

1. Inventory Management:
Tools: Apache Hadoop, Spark
Algorithms: Predictive analytics, machine learning
Solution: Real-time monitoring of inventory levels and predictive algorithms help in anticipating demand and automating restocking processes.

2. Customer Insights:
Tools: Data lakes, Tableau
Algorithms: Clustering, recommendation engines
Solution: Analysis of customer purchase data to identify trends and preferences, enabling personalized marketing and optimized product placement.

3. Supply Chain Optimization:
Tools: SAP HANA, IBM Watson
Algorithms: Optimization algorithms, route planning
Solution: Streamlining the supply chain through advanced analytics to improve delivery times and reduce costs.
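The inventory management solution above combines demand prediction with automated restocking. A deliberately simple sketch of that idea follows, assuming a moving-average forecast and a fixed supplier lead time. It is a teaching example, not a description of the retailer's actual system, and all names and parameters are hypothetical.

```python
def forecast_demand(daily_sales, window=7):
    """Forecast next-day demand as the mean of the last `window` days."""
    recent = daily_sales[-window:]
    return sum(recent) / len(recent)

def units_to_reorder(on_hand, daily_sales, lead_time_days, safety_stock=0, window=7):
    """Order enough to cover forecast demand during the supplier lead time."""
    reorder_point = forecast_demand(daily_sales, window) * lead_time_days + safety_stock
    return max(0, round(reorder_point - on_hand))
```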
• Uber is the first choice for people around the world when they think of moving people and making deliveries.
• It uses users’ personal data to closely monitor which features of the service are used most, to analyse usage patterns, and to determine where the service should be more focused.
• Uber tracks the supply of and demand for its services, which causes the prices of those services to change. One of Uber’s biggest uses of data is therefore surge pricing.
1. Dynamic Pricing: Adjusting prices in real-time based on supply and demand.
2. Route Optimization: Finding the most efficient routes for drivers.
3. Customer Satisfaction: Ensuring a high level of service for riders and drivers.
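To make the dynamic pricing problem concrete, the sketch below scales a fare by the ratio of ride requests to available drivers. The formula, the cap of 3.0, and the function names are assumptions for illustration only; they are not Uber's actual pricing algorithm.

```python
def surge_multiplier(ride_requests, available_drivers, cap=3.0):
    """Scale the fare by the demand/supply ratio, never below 1.0 or above cap."""
    if available_drivers == 0:
        return cap  # no supply at all: apply the maximum multiplier
    ratio = ride_requests / available_drivers
    return min(cap, max(1.0, ratio))

def fare(base_fare, ride_requests, available_drivers):
    """Apply the current multiplier to a base fare, rounded to two decimals."""
    return round(base_fare * surge_multiplier(ride_requests, available_drivers), 2)
```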

1. Dynamic Pricing:
Tools: Apache Kafka, Cassandra
Algorithms: Real-time analytics, dynamic pricing algorithms
Solution: Adjusting prices based on real-time data on rider demand and driver availability.

2. Route Optimization:
Tools: MapReduce, Google Maps API
Algorithms: Shortest path algorithms, machine learning
Solution: Providing drivers with optimal routes using GPS data and traffic patterns to reduce travel time and fuel consumption.

3. Customer Satisfaction:
Tools: SQL, NoSQL databases
Algorithms: Sentiment analysis, predictive analytics
Solution: Analyzing feedback and ride data to improve service quality and address issues promptly.
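Route optimization above names shortest path algorithms. One standard choice is Dijkstra's algorithm, sketched here on a small road graph whose edge weights are assumed travel times in minutes. The graph format and function name are illustrative; a production router would also account for live traffic.

```python
import heapq

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm on {node: {neighbour: travel_minutes}}."""
    queue = [(0, start, [start])]  # (cost so far, current node, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue  # a cheaper route to this node was already expanded
        visited.add(node)
        for neighbour, minutes in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + minutes, neighbour, path + [neighbour]))
    return float("inf"), []  # goal is unreachable from start
```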
• Netflix is a popular American entertainment company specializing in online on-demand streaming video. It uses Big Data to predict what exactly its customers will enjoy watching.

1. Content Recommendation: Providing personalized content recommendations to users.

2. Content Creation: Deciding which new shows and movies to produce.
3. Streaming Quality: Ensuring a seamless streaming experience across different devices and networks.

1. Content Recommendation:
Tools: Apache Spark, Hadoop
Algorithms: Collaborative filtering, deep learning
Solution: Delivering personalized content suggestions by analyzing viewing habits and preferences.

2. Content Creation:
Tools: Python, R
Algorithms: Predictive analytics, machine learning
Solution: Identifying trends and preferences to inform content production decisions.

3. Streaming Quality:
Tools: Amazon Web Services (AWS), Akamai
Algorithms: Adaptive bitrate streaming, predictive analytics
Solution: Optimizing streaming quality by predicting and managing network congestion.
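Collaborative filtering, listed under content recommendation, can be sketched in its user-based form: find users with similar ratings and suggest what they rated highly. The example below assumes positive ratings (for instance 1 to 5) held in nested dictionaries; the data layout and function names are hypothetical and unrelated to any real recommender system.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity over the titles both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[t] * v[t] for t in shared)
    norm_u = sqrt(sum(u[t] ** 2 for t in shared))
    norm_v = sqrt(sum(v[t] ** 2 for t in shared))
    return dot / (norm_u * norm_v)

def recommend(ratings, user, top_n=3):
    """Rank unseen titles by similarity-weighted ratings of other users."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for title, rating in other_ratings.items():
            if title not in ratings[user]:
                scores[title] = scores.get(title, 0.0) + sim * rating
                weights[title] = weights.get(title, 0.0) + sim
    ranked = sorted(scores, key=lambda t: scores[t] / weights[t], reverse=True)
    return ranked[:top_n]
```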
• A big technical challenge for eBay, as a data-intensive business, is to exploit a system that can rapidly analyze and act on data as it arrives (streaming data).
• There are many rapidly evolving methods to support streaming data analysis.
• eBay is working with several tools, including Apache Spark, Storm, and Kafka.
• This allows the company’s data analysts to search for information tags that have been associated with the data (metadata) and make it consumable to as many people as possible with the right level of security and permissions (data governance).
• The company has been at the forefront of using big data solutions and actively contributes its knowledge back to the open-source community.
• P&G is a 179-year-old company.
• The company has recognized the potential of Big Data and put it to use in business units around the globe.
• P&G has put a strong emphasis on using big data to make better, smarter, real-time business decisions.
• The Global Business Services organization has developed tools, systems, and processes to provide managers with direct access to the latest data and advanced analytics. As a result, P&G, one of the oldest companies, still holds a large share of the market despite many emerging competitors.
InClassQuestion#2
• How can we apply Big Data Analytics in the education sector?
• Personalizing Learning: Tailoring educational content and approaches to meet individual student needs and learning styles.
• Predicting Student Performance: Using data to identify at-risk students and intervene early to improve outcomes.
• Enhancing Curriculum Development: Analyzing data on student engagement and success to refine and improve curriculum content.
• Optimizing Resource Allocation: Efficiently distributing resources like faculty, funding, and facilities based on data insights.
• Improving Administrative Efficiency: Streamlining operations and decision-making processes using data-driven insights.
Reference : https://www.bigdataframework.org/short-history-of-big-data/
Evolution of Technology

Reference : https://www.youtube.com/watch?v=zez2Tv-bcXY
Internet of Things

Reference : https://www.edureka.co/blog/big-data-tutorial
Conclusion

• With the different technologies it holds, Big Data assists almost every company or sector that aims to grow.
• Evaluating large datasets associated with a company's operations can give it the insight to increase customer satisfaction.
