
1. Introduction of Subject

The document outlines a course on Big Data Analytics, detailing its objectives, syllabus, and examination scheme. It covers topics such as the evolution of Big Data, analytics techniques, Hadoop, NoSQL databases, and various algorithms for data analysis. Additionally, it includes practical lab sessions and case studies demonstrating the application of Big Data in industries like retail, transportation, and entertainment.

Uploaded by

Pranshav Patel
Copyright: © All Rights Reserved

Introduction

2CS702 Big Data Analytics

Dr Jigna Patel
N-407
[email protected]
9898942993
Course Outcomes
After successful completion of this course, the student will be able to:
1. outline the significance and challenges of big data
2. model big data using different tools and frameworks
3. apply big data techniques for useful business analytic applications
4. design algorithms for mining the data from large volumes
Syllabus
Unit I
Introduction to Big Data: Evolution of Big Data, Types of Digital Data, Classification of Digital Data, Structured Data, Semi-Structured Data, Unstructured Data, Definition of Big Data, Challenges of Conventional Systems, Big data platforms and data storage

Unit II
Big Data Analytics: Importance of Big data analytics, Classification of Analytics, Top Challenges Facing Big Data, Technologies to
meet the Challenges Posed by Big Data, Terminologies Used in Big Data Environment

Unit III
Hadoop: Introducing Hadoop, comparisons of RDBMS and Hadoop, Distributed Computing Challenges, Hadoop Overview,
Business Value of Hadoop, Hadoop Distributed File System, Processing Data with Hadoop, working with Map Reduce,
Hadoop YARN, Hadoop in the Cloud, Applications on Big Hadoop Ecosystem, Fundamentals of Pig, Hive, HBase and ZooKeeper,
Basic concepts of Apache Spark

Unit IV
The Big data technology landscape: CAP Theorem - BASE Concept, NoSQL, Types of NoSQL databases, Introduction to
MongoDB, Data Types in MongoDB, CRUD, Apache Cassandra, Features of Cassandra, CRUD

Unit V
Big data analytics Algorithm: Applying Linear Regression, Clustering, Association rule mining, Decision tree on Big Data.
Self-study: Frameworks: Applications on Big Data Using Pig and Hive
References:
1. Michael Berthold, David J. Hand, Intelligent Data Analysis, Springer
2. Tom White, Hadoop: The Definitive Guide, Third Edition, O’Reilly Media
3. Chris Eaton, Dirk DeRoos, Tom Deutsch, George Lapis, Paul Zikopoulos, Understanding Big Data: Analytics
for Enterprise Class Hadoop and Streaming Data, McGraw Hill Publishing
4. Anand Rajaraman and Jeffrey David Ullman, Mining of Massive Datasets, Cambridge University Press
5. Bill Franks, Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced
Analytics, John Wiley & sons
6. Glenn J. Myatt, Making Sense of Data, John Wiley & Sons
7. Pete Warden, Big Data Glossary, O’Reilly
8. Jiawei Han, Micheline Kamber, Data Mining Concepts and Techniques, Second Edition, Elsevier
9. Da Ruan, Guoqing Chen, Etienne E. Kerre, Geert Wets, Intelligent Data Mining, Springer
10. Paul Zikopoulos, Dirk deRoos, Krishnan Parasuraman, Thomas Deutsch, James Giles, David Corrigan,
Harness the Power of Big Data The IBM Big Data Platform, Tata McGraw Hill Publications
11. Michael Minelli, Michele Chambers, Ambiga Dhiraj, Big Data, Big Analytics: Emerging Business Intelligence
and Analytic Trends for Today's Businesses, Wiley Publications
12. Zikopoulos, Paul, Chris Eaton, Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming
Data, Tata McGraw Hill Publications
13. Seema Acharya and Subhashini C, Big Data and Analytics, Wiley India
Examination Scheme
                     CE                     SEE      LPW
Exam Duration        Continuous Evaluation  3.0 Hrs  Continuous Evaluation + 2 Hrs Semester End LPW Exam
Component Weightage  0.4                    0.4      0.2


Innovative/Special Assignment
• Project Work
• Tasks and their timeline
• Group size: maximum 3 students – 5/08/2024
• Identification of dataset – 10/08/2024
• Problem formulation – 20/08/2024
• Proposed methodology/approach to solve it – 1/10/2024
• Deployment: on your own/private cluster, or on cloud services such as GCP Dataproc (Hadoop ecosystem), MongoDB Atlas, etc. – 18/10/2024
• Submit the report of the project work carried out – 25/10/2024
Lab session
Phase-1 – Big Data in different domains
Sr. No. Practical Title [Hours, CLO]
1. Study and explore various applications of big data in different domains. Choose one of them and study it in detail. Also write a report on the different types of digital data generated in the selected application. [02 Hours, CLO 1] For example:
• Big Data in Retail
• Big Data in Healthcare
• Big Data in Education
• Big Data in E-commerce
• Big Data in Media and Entertainment
• Big Data in Finance
• Big Data in Travel Industry
• Big Data in Telecom
• Big Data in Automobile
2. Learn the limitations of data analytics by applying machine learning techniques to a large amount of data. Write a program to read a dataset from any online website, an Excel file and a CSV file, and to perform ML tasks such as: [02 Hours, CLO 3]
a) Linear regression and logistic regression on the iris dataset.
b) K-means clustering.
• Students will learn the limitations of the platform and the algorithm.
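As a starting point for practical 2, the sketch below reads a small CSV, fits a one-feature linear regression by ordinary least squares, and runs k-means (Lloyd's algorithm). It is a minimal illustration, not the prescribed solution: it uses only the Python standard library, the inline rows are a tiny excerpt of iris-style measurements chosen so the example runs without a download, and the function names (`load_points`, `fit_line`, `kmeans`) are hypothetical. Logistic regression is not covered here.

```python
import csv
import io

# A few iris-style rows embedded inline so the sketch runs without a download.
# A real run would open a CSV file or fetch the dataset instead.
IRIS_CSV = """petal_length,petal_width,species
1.4,0.2,setosa
1.4,0.2,setosa
1.3,0.2,setosa
4.7,1.4,versicolor
4.5,1.5,versicolor
4.9,1.5,versicolor
6.0,2.5,virginica
5.1,1.9,virginica
5.9,2.1,virginica
"""


def load_points(text):
    """Parse CSV text into a list of (petal_length, petal_width) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(float(r["petal_length"]), float(r["petal_width"])) for r in reader]


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    # Deterministic initialisation: k distinct points spread over the sorted data.
    unique = sorted(set(points))
    step = max(1, (len(unique) - 1) // max(1, k - 1))
    centroids = unique[::step][:k]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


points = load_points(IRIS_CSV)
slope, intercept = fit_line(points)
centroids, clusters = kmeans(points, k=2)
print(f"petal_width ~ {slope:.3f} * petal_length + {intercept:.3f}")
print("cluster sizes:", sorted(len(c) for c in clusters))
```

Everything here runs in the memory of one process, which is exactly the limitation the practical asks students to observe: the same code stops being practical once the dataset no longer fits on a single machine.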
Lab session
Phase-2 – Hadoop Ecosystem
Sr. No. Practical Title [Hours, CLO]
3. Install and configure a single-node Hadoop cluster. [02 Hours, CLO 3]
4. Perform HDFS commands for the following categories: [02 Hours, CLO 3]
• User Commands
o hdfs dfs – runs filesystem commands on HDFS
o hdfs fsck – runs an HDFS filesystem checking command
• Administration Commands
o hdfs dfsadmin – runs HDFS administration commands
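The three command families in practical 4 can be exercised with a few representative invocations. The sketch below collects one or two per category and prints them; it only executes them when explicitly asked and when the `hdfs` CLI is on the PATH. The paths (`/user/student`, `data.csv`) and the helper name `run_all` are hypothetical placeholders, not part of the course material.

```python
import shutil
import subprocess

# Representative invocations per category; paths are hypothetical placeholders.
HDFS_COMMANDS = {
    "user: list a directory": ["hdfs", "dfs", "-ls", "/"],
    "user: create a directory": ["hdfs", "dfs", "-mkdir", "-p", "/user/student"],
    "user: upload a local file": ["hdfs", "dfs", "-put", "data.csv", "/user/student/"],
    "user: check filesystem health": ["hdfs", "fsck", "/", "-files", "-blocks"],
    "admin: cluster report": ["hdfs", "dfsadmin", "-report"],
}


def run_all(commands, execute=False):
    """Print each command; run it only when execute=True and hdfs is installed."""
    can_run = execute and shutil.which("hdfs") is not None
    shown = []
    for label, argv in commands.items():
        line = " ".join(argv)
        print(f"[{label}] {line}")
        shown.append(line)
        if can_run:
            # check=False: report failures without aborting the remaining commands.
            subprocess.run(argv, check=False)
    return shown


shown = run_all(HDFS_COMMANDS)  # dry run: prints the commands only
```

Calling `run_all(HDFS_COMMANDS, execute=True)` on a machine with the single-node cluster from practical 3 would run the same commands against HDFS.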
5. Apply MapReduce algorithms to perform analytics on a single-node cluster (any one): [02 Hours, CLO 4]
a) Find phrase frequency from a given dataset.
b) Search records with matching criteria.
Prepare a report to guide the design of the mapper and reducer.
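To guide the mapper and reducer design for option (a), the sketch below simulates the map, shuffle and reduce phases in plain Python. It is an in-process illustration of the pattern, not a Hadoop job: "phrase" is assumed to mean a sequence of n consecutive words (n = 2 by default), phrases are not formed across line boundaries, and the sample lines are made up.

```python
from collections import defaultdict


def mapper(line, n=2):
    """Emit (phrase, 1) for every n-word phrase in one input line."""
    words = line.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n]), 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(phrase, counts):
    """Sum the partial counts for one phrase."""
    return phrase, sum(counts)


def phrase_frequency(lines, n=2):
    """Run map, shuffle and reduce over an iterable of text lines."""
    mapped = (pair for line in lines for pair in mapper(line, n))
    return dict(reducer(key, values) for key, values in shuffle(mapped).items())


lines = ["big data is big", "big data analytics"]  # made-up sample input
result = phrase_frequency(lines)
print(result)
```

For option (b), the same skeleton applies with a mapper that emits only the records satisfying the search criteria and a reducer that passes them through.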
6. Analyse the impact of different numbers of mappers and reducers on the same problem definition as practical 5. [02 Hours, CLO 4]
• Prepare a conclusive report on the analysis.
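One way to reason about practical 6 before running jobs is to look at how work is divided. The sketch below splits input records across a chosen number of mappers and assigns keys to reducers with a hash partitioner, then prints the resulting reducer loads. It is a simplified model under stated assumptions: `zlib.crc32` stands in for the key hash used by the real framework, the contiguous split is a simplification of input splits, and the keys are synthetic.

```python
import zlib
from collections import Counter


def partition(key, num_reducers):
    """Assign a key to a reducer; crc32 is a stand-in for the framework's key hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_loads(keys, num_reducers):
    """Count how many records each reducer would receive."""
    loads = Counter(partition(k, num_reducers) for k in keys)
    return [loads.get(r, 0) for r in range(num_reducers)]


def split_input(records, num_mappers):
    """Split records into num_mappers contiguous chunks of near-equal size."""
    size, extra = divmod(len(records), num_mappers)
    chunks, start = [], 0
    for i in range(num_mappers):
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


keys = [f"word{i % 50}" for i in range(1000)]  # synthetic: 50 distinct keys
for num_reducers in (1, 2, 4, 8):
    print(num_reducers, reducer_loads(keys, num_reducers))
print([len(chunk) for chunk in split_input(keys, 3)])
```

Uneven reducer loads in the printed lists hint at skew, which is one of the effects worth measuring and reporting when the real job is rerun with different reducer counts.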
Lab session
Phase-3 – NoSQL and Analytics algorithm
Sr. No. Practical Title [Hours, CLO]
7. Set up a MongoDB environment on your system. Import the Restaurant dataset and perform CRUD operations. [02 Hours, CLO 2]
8. Set up a Cassandra environment on your system and apply Create, Update, Read and Delete operations. [02 Hours, CLO 2]
9. Case study: Use the following platforms for solving any big data analytics problem of your choice: (1) Amazon Web Services, (2) Microsoft Azure, (3) Google App Engine. [02 Hours, CLO 2]

10. Implement any one of the following analytic algorithms using PySpark and MLlib for larger datasets in main memory (machine learning application): [04 Hours, CLO 4]
• Regression
• K-means Clustering
• Association Rule Mining Algorithm
Lab session
Practice Purpose
Sr. No. Practical Title [Hours, CLO]
11* Extend MongoDB functionality for MapReduce on a document collection. [02 Hours, CLO 3]
12* Extend Cassandra functionality for MapReduce on the restaurant dataset. [02 Hours, CLO 3]
InClassQuestion#1

• What is the need to learn this subject?


Big Data popular case study

Reference : https://data-flair.training/blogs/big-data-case-studies/
• Walmart is the largest retailer in the world and the world’s largest company by revenue, with more than 2.1 million employees
• 10,586 stores and clubs in 24 countries
• More than 2 million employees and 20000 stores in 28 countries
• Major Problems are :
1. Inventory Management: Ensuring shelves are stocked with the right products at the right time.
2. Customer Insights: Understanding and predicting customer behavior to improve sales.
3. Supply Chain Optimization: Managing a vast network of suppliers and logistics.

1. Inventory Management:
Tools: Apache Hadoop, Spark
Algorithms: Predictive analytics, machine learning
Solution: Real-time monitoring of inventory levels and predictive algorithms help in anticipating demand and automating restocking processes.

2. Customer Insights:
Tools: Data lakes, Tableau
Algorithms: Clustering, recommendation engines
Solution: Analysis of customer purchase data to identify trends and preferences, enabling personalized marketing and optimized product placement.

3. Supply Chain Optimization:


Tools: SAP HANA, IBM Watson
Algorithms: Optimization algorithms, route planning
Solution: Streamlining the supply chain through advanced
analytics to improve delivery times and reduce costs.
• Uber is the first choice for people around the world when they think of moving people and making deliveries.
• It uses the personal data of the user to closely monitor which features of the service are mostly used, to analyse
usage patterns and to determine where the services should be more focused.
• Uber tracks the supply of and demand for its services, and the prices of those services change accordingly.
Therefore one of Uber’s biggest uses of data is surge pricing.
1. Dynamic Pricing: Adjusting prices in real-time based on supply and demand.
2. Route Optimization: Finding the most efficient routes for drivers.
3. Customer Satisfaction: Ensuring a high level of service for riders and drivers.

1. Dynamic Pricing:
Tools: Apache Kafka, Cassandra
Algorithms: Real-time analytics, dynamic pricing algorithms
Solution: Adjusting prices based on real-time data on rider demand and driver availability.

2. Route Optimization:
Tools: MapReduce, Google Maps API
Algorithms: Shortest path algorithms, machine learning
Solution: Providing drivers with optimal routes using GPS data and traffic patterns to reduce travel time and fuel consumption.
3. Customer Satisfaction:
Tools: SQL, NoSQL databases
Algorithms: Sentiment analysis, predictive analytics
Solution: Analyzing feedback and ride data to improve service
quality and address issues promptly.
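The route optimization item above names shortest path algorithms. As one concrete instance, the sketch below implements Dijkstra's algorithm on a small weighted graph. It illustrates the general technique only and makes no claim about how Uber computes routes: the road network, node names and travel times are hypothetical.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a dict-of-dicts graph with non-negative weights.

    Returns (total_cost, path), or (inf, []) if the target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale queue entry for an already settled node
        visited.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in dist:
        return float("inf"), []
    # Walk the predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical road network: edge weights are travel times in minutes.
roads = {
    "pickup": {"A": 4, "B": 2},
    "A": {"dropoff": 5},
    "B": {"A": 1, "dropoff": 8},
}
cost, path = shortest_path(roads, "pickup", "dropoff")
print(cost, path)
```

In a live setting the edge weights would be refreshed from GPS and traffic data, so the same query can return different routes at different times of day.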
• Netflix is an American entertainment company specializing in online on-demand
streaming video for its customers. It uses Big Data to predict what exactly its
customers will enjoy watching.

1. Content Recommendation: Providing personalized content recommendations to users.


2. Content Creation: Deciding which new shows and movies to produce.
3. Streaming Quality: Ensuring a seamless streaming experience across different devices and networks.

1. Content Recommendation:
Tools: Apache Spark, Hadoop
Algorithms: Collaborative filtering, deep learning
Solution: Delivering personalized content suggestions by analyzing viewing habits and preferences.

2. Content Creation:
Tools: Python, R
Algorithms: Predictive analytics, machine learning
Solution: Identifying trends and preferences to inform content production decisions.

3. Streaming Quality:
Tools: Amazon Web Services (AWS), Akamai
Algorithms: Adaptive bitrate streaming, predictive analytics
Solution: Optimizing streaming quality by predicting and
managing network congestion.
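Collaborative filtering, listed under content recommendation above, can be illustrated with a small user-based variant: a user's rating for an unseen title is predicted as the similarity-weighted average of other users' ratings. This is a toy sketch of the general idea, not Netflix's actual system; the users, titles and ratings are hypothetical, and cosine similarity is one of several possible similarity measures.

```python
import math

# Hypothetical ratings: user -> {title: rating on a 1-5 scale}.
ratings = {
    "ana": {"Drama A": 5, "Thriller B": 4, "Comedy C": 1},
    "ben": {"Drama A": 4, "Thriller B": 5, "Sci-fi D": 5},
    "cara": {"Comedy C": 5, "Sci-fi D": 2},
}


def cosine(u, v):
    """Cosine similarity between two rating dicts (unrated titles count as 0)."""
    dot = sum(u[t] * v[t] for t in set(u) & set(v))
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict(user, title):
    """Similarity-weighted average of other users' ratings for the title."""
    weighted_sum = weight_total = 0.0
    for other, their_ratings in ratings.items():
        if other == user or title not in their_ratings:
            continue
        sim = cosine(ratings[user], their_ratings)
        weighted_sum += sim * their_ratings[title]
        weight_total += sim
    return weighted_sum / weight_total if weight_total else None


def recommend(user):
    """Unseen titles for the user, best predicted rating first."""
    seen = set(ratings[user])
    unseen = {t for r in ratings.values() for t in r} - seen
    scored = [(t, predict(user, t)) for t in unseen]
    scored = [(t, s) for t, s in scored if s is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(recommend("ana"))
```

At production scale the rating matrix is far too large for pairwise loops like these, which is where distributed tools such as Spark come in.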
• A big technical challenge for eBay, as a data-intensive business, is to exploit a system
that can rapidly analyze and act on data as it arrives (streaming data).
• There are many rapidly evolving methods to support streaming data analysis.
• eBay is working with several tools including Apache Spark, Storm, Kafka.
• It allows the company’s data analysts to search for information tags that have
been associated with the data (metadata) and make it consumable to as many
people as possible with the right level of security and permissions (data
governance).
• The company has been at the forefront of using big data solutions and actively
contributes its knowledge back to the open-source community
• P&G is a 179-year-old company.
• It has recognized the potential of Big Data and put it to use in
business units around the globe.
• P&G has put a strong emphasis on using big data to make better, smarter,
real-time business decisions.
• The Global Business Services organization has developed tools, systems, and
processes to provide managers with direct access to the latest data and advanced
analytics. As a result P&G, one of the older companies in the market, still holds a
large market share despite many emerging competitors.
InClassQuestion#2
• How can we apply Big data Analytics in Education Sector?
• Personalizing Learning: Tailoring educational content and approaches to meet
individual student needs and learning styles.
• Predicting Student Performance: Using data to identify at-risk students and
intervene early to improve outcomes.
• Enhancing Curriculum Development: Analyzing data on student engagement
and success to refine and improve curriculum content.
• Optimizing Resource Allocation: Efficiently distributing resources like faculty,
funding, and facilities based on data insights.
• Improving Administrative Efficiency: Streamlining operations and decision-
making processes using data-driven insights.
Reference : https://www.bigdataframework.org/short-history-of-big-data/
Evolution of Technology

Reference : https://www.youtube.com/watch?v=zez2Tv-bcXY
Internet of Things

Reference : https://www.edureka.co/blog/big-data-tutorial
Conclusion

• With the different technologies it holds, Big Data assists almost every company or sector that aims to grow.

• Evaluating the large datasets associated with a company's operations can give it the insight needed to increase customer satisfaction.
