
Module 2 - BigData Fundamentals

The document outlines the fundamentals of Big Data, including its platform, technologies, and the distinction between data analysts and data scientists. It emphasizes the importance of an integrated, enterprise-ready Big Data platform that addresses performance, security, and usability. Additionally, it discusses the Big Data technology stack and the various layers involved in managing and analyzing large datasets.

BIG DATA FUNDAMENTALS

Presented by: Le Ngoc Thanh


Outline
o Big Data Platform and Technologies
• IBM Big Data Platform
o Digging into Big Data Technology
• Big Data Technology Stack
• Big Data Analytics Platforms and Software
• Big Data Landscape 2018
o Big Data and Data Science
• The Data Process for Big Data
• Data Analyst vs. Data Scientist

©lnthanh
Big Data Platform
Comprehensive, enterprise-ready, integrated

Main tasks in Big data
o Aggregation
o Analysis
o Manipulation
o Visualization

Big data is multidisciplinary
o Technologies applied to Big data should include:
• Massively parallel processing databases
• Distributed databases
• Data mining grids
• Strong Internet connectivity
• Scalable storage systems
• Distributed filesystems
• Cloud computing platforms
o These can be drawn from several fields such as Statistics, Computer science, Applied mathematics, Economics, etc.
IBM Big data platform

o Gives a solution designed specifically with the needs of the enterprise in mind.

A Big data platform should offer
o Comprehensive
• Every dimension of the Big data challenge is addressed.
o Enterprise-ready
• Performance, security, usability, and reliability features are included.
o Integrated
• Introducing Big data technologies into the enterprise should be simplified and accelerated.
• It integrates with the information supply chain, including databases, data warehouses, and business intelligence applications.
o Moreover, a Big Data platform should also offer
• An open-source base, low-latency reads/updates, ad hoc queries, scalability, extensibility, robust fault tolerance, and minimal maintenance.

IBM Big data platform
[Figures: IBM Big data platform diagrams]

Components in a Big data platform
[Figures: components in a Big data platform]
Digging into Big Data Technology
Digging deeper, better insights

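Big data technology stack
4. Traditional and advanced analytics
3. Organizing data services and tools
2. Operational databases
1. Security infrastructure
0. Redundant physical infrastructure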
Layer 0: Redundant physical infrastructure

• The physical infrastructure is the lowest level.
• Hardware, network, etc.
o Your company might already have a data center or have made investments in physical infrastructure.
o Hence, you may want to find a way to utilize existing assets.

Where did most of this begin?
o A prioritized list of design principles for the physical infrastructure should include statements about the following:
• Flexibility
• Performance
• Cost
• Availability
• Scalability
It grows bigger...

...then very big
Why redundant?
o Most big data implementations need to be highly available.
o That is, networks, servers, and physical storage must be both
resilient and redundant.
o A system is resilient to failure or changes when sufficient
redundant resources are in place, ready to jump into action.
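Redundancy pays off because independent failures rarely coincide. As a rough model (assuming each replica is up with the same probability p and that replicas fail independently, which real data centers only approximate), a service with n redundant replicas is down only when all n are down at once, so its availability is 1 - (1 - p)^n. The function name below is hypothetical.

```python
def system_availability(p: float, n: int) -> float:
    """Availability of n redundant replicas, ASSUMING each replica is up
    with probability p and that replica failures are independent."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    # The system is down only if every replica is down at the same time.
    return 1.0 - (1.0 - p) ** n


for replicas in (1, 2, 3):
    print(f"{replicas} replica(s): {system_availability(0.99, replicas):.6f}")
```

With p = 0.99, going from one to three replicas raises the modeled availability from 99% to 99.9999%. Correlated failures (a shared power feed or network switch) break the independence assumption, which is why redundancy is applied to networks, servers, and storage alike.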

Layer 1: Security infrastructure

o Security and privacy requirements for big data are similar to those for conventional data environments.
o They have to be closely aligned to specific business needs:
• Data access: the data should be available only to those who have a legitimate business need for examining or interacting with it.
• Application access: most APIs offer protection from unauthorized usage or access.
• Data encryption: the most challenging aspect, since encryption places extreme stress on the systems' resources; encrypt only the data elements that require this level of security.
• Threat detection: the inclusion of mobile devices and social networks exponentially increases both the amount of data and the opportunities for security threats.
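As an illustration of the data access principle, the sketch below returns only the fields a user's role is permitted to see and masks the rest. The role names, field names, and the FIELD_POLICY table are hypothetical, and masking here stands in for access control only; it is not encryption.

```python
from dataclasses import dataclass

# Hypothetical policy: the roles with a legitimate business need for each field.
FIELD_POLICY = {
    "customer_id": {"analyst", "support", "auditor"},
    "purchase_total": {"analyst", "auditor"},
    "card_number": {"auditor"},
}


@dataclass(frozen=True)
class User:
    name: str
    role: str


def view_record(user: User, record: dict) -> dict:
    """Return a copy of record with fields the user's role may not see masked."""
    visible = {}
    for field, value in record.items():
        # Fields missing from the policy are denied by default.
        allowed_roles = FIELD_POLICY.get(field, set())
        visible[field] = value if user.role in allowed_roles else "***"
    return visible


record = {"customer_id": 42, "purchase_total": 99.5,
          "card_number": "0000-1111-2222-3333", "notes": "vip"}
print(view_record(User("ana", "analyst"), record))
```

Denying by default keeps newly added fields hidden until someone explicitly grants access, which matches the "legitimate business need" rule.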
Layer 2: Operational databases
o At the core of any Big data environment are the database engines that hold the collections of data elements relevant to a business.
o The ACID properties of database transactions:
• Atomicity: if any part of the transaction or the underlying system fails, the entire transaction fails.
• Consistency: only transactions with valid data will be performed.
• Isolation: multiple simultaneous transactions do not interfere with each other. All valid transactions will execute until completed and in the order they were submitted for processing.
• Durability: after the data from the transaction is written to the database, it stays there “forever.”
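To make atomicity and consistency concrete, here is a minimal sketch using Python's built-in sqlite3 module (chosen only for illustration; the slides do not name a database). The hypothetical accounts table uses a CHECK constraint as the consistency rule. The transfer credits first and debits second, so when the debit fails, the already-applied credit must be undone. Isolation and durability are not exercised: the database lives in memory and there is only one connection.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as a single transaction."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        # Credit first, then debit, so a failing debit must undo the credit.
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


conn = sqlite3.connect(":memory:")
# Consistency rule: a balance may never go negative.
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])

transfer(conn, "alice", "bob", 30)  # valid: committed

try:
    transfer(conn, "bob", "alice", 500)  # the debit breaks the CHECK rule
except sqlite3.IntegrityError as exc:
    print("transaction rolled back:", exc)

print(balances(conn))
```

After the failed transfer, alice still has 70 and bob 80: the 500 credited to alice was rolled back together with the rejected debit, so no partial transaction survives.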
Layer 3: Organizing Data Services and Tools

o Organizing data services are, in reality, an ecosystem of tools and technologies that can be used to gather and assemble data in preparation for further processing.
o Technologies in this layer include the following:
• A distributed file system
• Serialization services
• Coordination services
• Extract, transform, and load (ETL) tools
• Workflow services
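The extract, transform, and load (ETL) step from the list above can be sketched in a few lines of Python. This is a single-machine illustration under assumed inputs: the CSV text, column names, and the orders table are hypothetical, and sqlite3 stands in for the target store. Real ETL tools add scheduling, error reporting, and distributed execution.

```python
import csv
import io
import sqlite3

# Hypothetical raw input with inconsistent formatting and one bad row.
RAW_CSV = """order_id,region,amount
1, north ,120.50
2,South,80
3,north,not_a_number
4,SOUTH,45.25
"""


def extract(text):
    """Extract: parse raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: validate amounts and normalize region names."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows that fail validation
        clean.append((int(row["order_id"]), row["region"].strip().lower(), amount))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    with conn:
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
for region, total in conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
):
    print(region, total)
```

Keeping the three stages as separate functions mirrors how ETL tools let each stage be tested, swapped, or scaled independently.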

Hadoop, MapReduce and Bigtable
o New technologies to store, access, and analyze huge amounts of data.
• They proved to be the sparks that led to a new generation of data management.
• They address one of the most fundamental problems: processing massive amounts of data efficiently, cost-effectively, and in a timely fashion.
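The MapReduce idea can be illustrated with the classic word-count example. The sketch below runs the map, shuffle, and reduce phases in a single Python process; it is not the Hadoop API, and the function names are hypothetical. In Hadoop, the framework would run many mappers and reducers in parallel across a cluster and perform the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group intermediate values by key, as the framework does
    between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key."""
    return key, sum(values)


def word_count(splits):
    mapped = chain.from_iterable(map_phase(split) for split in splits)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data big insights", "Data at scale"]))
```

Because each map call sees only its own split and each reduce call sees only one key, both phases can be distributed without shared state, which is what lets the model scale to massive datasets.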
Layer 4: Traditional and advanced analytics

o What does your business do with all this data, in all its forms, to make sense of it?
• Managing big data holistically requires many different analysis approaches, depending on the problem being solved, to help the business successfully plan for the future.
• Some analyses will use a traditional data warehouse, while others will take advantage of advanced predictive analytics.
o Key techniques: analytical data warehouses and data marts, Big data analytics, reporting and visualization, Big data applications, etc.

Big data platform and
analytics software
o Features of Big data platform and analytics software:
• Data ingestion, data management, ETL and warehouse, Hadoop system, and stream computing
• Analytics/machine learning, content management, data integration and governance
• Efficiency in the workplace
• Accurate data
• Answers to complex questions
• Security
Source: https://www.predictiveanalyticstoday.com/bigdata-platforms-bigdata-analytics-software/
Big data analytics platform tools
o Several key Big data analytics platform tools are available for enterprise use.
Reference for more: https://www.predictiveanalyticstoday.com/bigdata-platforms-bigdata-analytics-software/
Example of an analytics platform for real-time data ingestion and streaming analytics
Source: https://www.xenonstack.com/blog/big-data-engineering/iot-analytics-platform-solutions/

Big Data Landscape 2018
Source: http://mattturck.com/bigdata2018/, updated 15/07/2018
Big Data and Data Science

What is Data science?
o Data science is the process of distilling insights from
data to inform decisions.

What is Data science?

o In data science, the size of the data is less important.
o Data of all sizes (small, medium, or big) can be used, as long as it is related to a business or scientific case.

Data science process for Big data
o The data science process for Big data could include
the following steps:

Data scientist vs. Data analyst

Data scientist vs. Data analyst

Job trends of Data analysts (left) and Data scientists (right)

Source: https://www.edureka.co/blog/difference-between-data-scientist-and-data-analyst/