Module 2 - BigData Fundamentals
©lnthanh
Big Data Platform
Comprehensive, enterprise-ready, integrated
Main tasks in Big data
o Aggregation
o Manipulation
o Analysis
o Visualization
Big data is multidisciplinary
o Technologies applied to Big data should include
• Massively parallel processing databases
• Distributed databases
• Data mining grids
• Distributed filesystems
• Scalable storage systems
• Cloud computing platforms
• Strong internet
A Big data platform should offer
o Comprehensive
• Every dimension of the Big data challenge is addressed.
o Enterprise-ready
• Performance, security, usability, and reliability features are included.
o Integrated
• The introduction of Big data technologies into the enterprise should be
simplified and accelerated.
• Integration with the information supply chain, including databases, data
warehouses, and business intelligence applications.
o Moreover, a Big Data platform should also offer
• An open-source base, low-latency reads and updates, ad-hoc queries,
scalability, extensibility, robust fault tolerance, and minimal maintenance.
IBM Big data platform
Components in a Big data platform
Digging into Big Data Technology
Digging deeper, better insights
Big data technology stack
[Figure: the stack has five layers, numbered from Layer 0 at the bottom to Layer 4 at the top.]
Layer 0: Redundant physical infrastructure
Where did most of this begin?
o A prioritized list of these principles should include
statements about the following:
• Flexibility
• Performance
• Cost
• Availability
• Scalability
It grows bigger...
...then very big
Why redundant?
o Most big data implementations need to be highly available.
o That is, networks, servers, and physical storage must be both
resilient and redundant.
o A system is resilient to failure or changes when sufficient
redundant resources are in place, ready to jump into action.
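A minimal sketch of what "redundant resources, ready to jump into action" can mean in practice. It assumes a hypothetical in-memory Replica class and a hypothetical read_with_failover helper; neither comes from any particular product. A read is retried against the next replica whenever the current one is down.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical stand-in for one redundant server holding the same data."""
    name: str
    healthy: bool = True

    def read(self, key: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{key} served by {self.name}"


def read_with_failover(replicas, key):
    """Try each replica in order and return the first successful answer."""
    errors = []
    for replica in replicas:
        try:
            return replica.read(key)
        except ConnectionError as exc:
            # This replica failed; remember why and fall through to the next one.
            errors.append(str(exc))
    raise RuntimeError("all replicas failed: " + "; ".join(errors))


# The first node is down, so the request is answered by the second one.
replicas = [Replica("node-a", healthy=False), Replica("node-b"), Replica("node-c")]
print(read_with_failover(replicas, "order-42"))
```

With a single replica the same failure would make the data unavailable, which is why redundancy is a precondition for resilience here.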
Layer 1: Security infrastructure
o Durability: after the data from the transaction is written to the database, it stays
there “forever.”
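The durability property can be illustrated with a small sketch, assuming Python's built-in sqlite3 module as a stand-in for an operational database. The orders table and the helper names write_order and read_order are made up for this example. Data committed in one connection is still there when a completely new connection opens the same file.

```python
import os
import sqlite3
import tempfile


def write_order(db_path, order_id, amount):
    """Insert one row inside a transaction and commit it."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, amount REAL)"
            )
            conn.execute(
                "INSERT INTO orders (id, amount) VALUES (?, ?)", (order_id, amount)
            )
    finally:
        conn.close()


def read_order(db_path, order_id):
    """Open a fresh connection and look the row up again."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT amount FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        return None if row is None else row[0]
    finally:
        conn.close()


# Hypothetical database file in a temporary directory.
db_path = os.path.join(tempfile.mkdtemp(), "orders.db")
write_order(db_path, 1, 19.99)
print(read_order(db_path, 1))  # the committed row survives the closed connection
```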
Layer 3: Organizing Data Services and Tools
Hadoop, MapReduce and Big Table
o New technologies to store, access, and analyze huge amounts of data.
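To give a feel for the MapReduce programming model, here is a single-process sketch in plain Python. It does not use Hadoop or any of its APIs; the function names map_phase, shuffle, reduce_phase and word_count are chosen for this illustration only. A real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


docs = ["big data is big", "data science uses big data"]
counts = word_count(docs)
print(counts["big"], counts["data"])  # prints: 3 3
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread over many workers, which is what makes the model suitable for huge amounts of data.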
o What does your business do with all of its data, in all its forms,
to make sense of it?
• Managing big data holistically requires many different analysis
approaches, depending on the problem being solved, to help the
business plan successfully for the future.
• Some analyses will use a traditional data warehouse, while
others will take advantage of advanced predictive analytics.
o Key techniques include analytical data warehouses and data
marts, Big data analytics, reporting and visualization, and
Big data applications.
Big data platform and analytics software
o Features of Big data platform and analytics software
Source: https://www.xenonstack.com/blog/big-data-engineering/iot-analytics-platform-solutions/
Source: http://mattturck.com/bigdata2018/, updated 15/07/2018
Big Data and Data Science
…
What is Data science?
o Data science is the process of distilling insights from
data to inform decisions.
Data science process for Big data
o The data science process for Big data could include
the following steps:
Data scientist vs. Data analyst
Source: https://www.edureka.co/blog/difference-between-data-scientist-and-data-analyst/