Unit-6: Data Visualization and Hadoop
● Introduction to Data Visualization
● Types of data visualization
● Data Visualization Techniques
Data Visualization
● Data visualization is a graphical representation of any data or information.
● Visual elements such as charts, graphs, and maps are a few of the data visualization
tools that give viewers an easy and accessible way of understanding the
represented information.
● Data visualization enables you, or the decision-makers of any enterprise or industry,
to look into analytical reports and understand concepts that might otherwise be
difficult to grasp.
Introduction
● Data visualization tools provide an accessible way to understand data through visual elements: charts, graphs, and maps.
Its Need
● Data is more valuable when it is visualized.
● Charts and graphs make communicating data findings easier.
● Without a visual representation of the insights, it can be hard for the audience to grasp the true meaning of the findings.
Outline
● Visual noise
● Information loss
● Large image perception
Visual noise
Most objects in the dataset are too closely related to each other, so users cannot
distinguish them as separate objects on the screen.
Information loss
Reducing the visible data set can help, but it leads to information loss.
Users observing the data cannot react to the number of data changes or their intensity on the display.
Solution
● Meeting speed
● Understanding data
● Displaying meaningful results
Types of Data Visualization
1. Table
2. Histogram
3. Scatter Plot
4. Various Charts
5. Timeline
6. Various Diagrams
Conventional data visualization methods
● Table
● Histogram
● Scatter plot
● Map: a dot symbol represents a feature on the map
● Pie chart
● Tree diagram (hierarchical)
● Timeline
● Venn diagram
● Parallel coordinates
● Treemap
● Semantic network
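As an illustration of how a histogram summarizes data, the sketch below groups values into equal-width bins and draws one text row per bin. It is a minimal example, not a charting tool: the function names, the bin width, and the exam scores are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def histogram_counts(values, bin_width):
    """Group integer values into equal-width bins and count each bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def render_text_histogram(counts, bin_width):
    """Draw one row of '#' marks per bin, a text stand-in for a bar chart."""
    lines = []
    for start, count in counts.items():
        lines.append(f"{start:>3}-{start + bin_width - 1:<3} {'#' * count}")
    return "\n".join(lines)


# Hypothetical exam scores, used only for illustration.
scores = [35, 42, 47, 51, 55, 58, 61, 64, 66, 68, 72, 75, 79, 83, 91]
counts = histogram_counts(scores, bin_width=10)
print(render_text_histogram(counts, bin_width=10))
```

The longest row marks the bin that holds the most values, which is the same information a bar height carries in a drawn histogram.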
Visualization Techniques/Methods
1. Data Visualization
2. Information Visualization
3. Concept Visualization
4. Strategic Visualization
5. Metaphor Visualization
6. Compound Visualization
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.259.4640&rep=rep1&type=pdf
Big data visualization Challenge
Information visualization
Concept visualizations
These methods are used to elaborate ideas, plans, and concepts and to analyze them easily,
e.g. mind map, layer chart, concentric circles, decision tree, PERT chart.
Strategic visualization
Metaphor visualization
Compound visualization
Dundas BI
Infogram
Power BI
Google Charts
FineReport
Whatagraph
Grafana
Tableau
What is Tableau
• Tableau is a software company that produces interactive data visualization products with a focus on business intelligence.
Tableau Online
• Tableau Online is a secure, cloud-based solution for sharing, distributing, and
collaborating on Tableau views and dashboards.
• Tableau Online puts the flexibility and ease of a powerful cloud-based data visualization
solution to work for you, without servers, server software, or IT support.
Tableau Public
• Tableau Public is free software that lets anyone connect to a spreadsheet or file
and create interactive data visualizations for the web.
• It is delivered as a service that lets the user be up and running overnight.
• With Tableau Public, users can build interactive visuals and publish them
quickly, without the help of programmers or IT.
• It is designed for organizations that want to enhance their websites with interactive data
visualizations. There are higher limits on the size of data you can work with and, among
other features, you can keep your underlying data hidden.
Tableau
• Tableau allows customers to spend more time on data analysis and less on data wrangling.
What is Power BI
• Microsoft Power BI is a business intelligence (BI) platform that provides nontechnical business users
with tools for aggregating, analyzing, visualizing and sharing data.
• Power BI's user interface is fairly intuitive for users familiar with Excel, and its deep
integration with other Microsoft products makes it a versatile self-service tool that requires
little upfront training.
Who uses Power BI?
• Though Power BI is a self-service BI tool that brings data analytics to
employees, it's mostly used by data analysts and BI professionals who create
the data models before disseminating reports throughout the organization.
• However, those without an analytical background can still navigate Power BI
and create reports.
Key features of Power BI
Some of the most important features are the following:
• Artificial intelligence. Users can access image recognition and text analytics in Power BI, create machine
learning models using automated ML capabilities and integrate with Azure Machine Learning.
• Hybrid deployment support. This feature provides built-in connectors that allow Power BI tools to
connect with a number of different data sources from Microsoft, Salesforce and other vendors.
• Quick Insights. This feature allows users to create subsets of data and automatically apply analytics to
that information.
• Common data model support. Power BI's support for the common data model allows the use of a
standardized and extensible collection of data schemas (entities, attributes and relationships).
• Cortana integration. This feature, which is especially popular on mobile devices, allows users to
verbally query data using natural language and access results using Cortana, Microsoft's digital assistant.
• Customization. This feature allows developers to change the appearance of default visualization and
reporting tools and import new tools into the platform.
Content Beyond Syllabus: Dashboard
• A dashboard is a presentation of information in visual form.
• Related topics: data accuracy; challenges for organizations.
Outline
• The Hadoop Ecosystem is a framework and suite of tools that tackle the many challenges of working with big data.
• Although Hadoop has been on the decline for some time, there are organizations
like LinkedIn where it has become a core technology.
• Some of the popular tools that help scale and improve functionality are Pig,
Hive, Oozie, and Spark.
• Spark has developed legs of its own and has become an ecosystem unto itself,
where add-ons like Spark MLlib turn it into a machine learning platform that
supports Hadoop, Kubernetes, and Apache Mesos.
• Most of the tools in the Hadoop Ecosystem revolve around the four core
technologies, which are YARN, HDFS, MapReduce, and Hadoop Common.
• All these components or tools work together to provide services such as
ingestion, storage, analysis, and maintenance of big data, and much more.
Hadoop ecosystem, MapReduce, Pig, Hive
• Hadoop is an open-source framework from the Apache Software Foundation.
• It is capable of processing large amounts of heterogeneous data sets in a
distributed fashion across clusters of commodity computers and hardware
using a simplified programming model.
• Hadoop provides a reliable shared storage and analysis system.
• The NameNode stores metadata, and the DataNodes store the actual data.
• The client interacts with the NameNode in the cluster to perform a task.
• Each DataNode keeps sending a heartbeat to the NameNode to indicate that it is
alive.
https://www.cloudduggu.com/hadoop/hdfs/
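The heartbeat idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not HDFS code: the `NameNode` class, its method names, the node identifiers, and the 30-second timeout are all hypothetical and exist only to show how a missed heartbeat marks a node as no longer alive.

```python
class NameNode:
    """Toy model: tracks the last heartbeat time reported by each DataNode."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds  # hypothetical threshold
        self.last_heartbeat = {}

    def receive_heartbeat(self, datanode_id, now):
        """Record that a DataNode reported in at time `now` (seconds)."""
        self.last_heartbeat[datanode_id] = now

    def live_datanodes(self, now):
        """Return DataNodes whose last heartbeat is within the timeout."""
        return sorted(
            node for node, seen in self.last_heartbeat.items()
            if now - seen <= self.timeout_seconds
        )


namenode = NameNode(timeout_seconds=30)
namenode.receive_heartbeat("datanode-1", now=0)
namenode.receive_heartbeat("datanode-2", now=0)
namenode.receive_heartbeat("datanode-1", now=25)  # datanode-2 goes silent
print(namenode.live_datanodes(now=40))  # ['datanode-1']
```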
MapReduce
● Map step: applies an operation to a piece of data.
● Reduce step: consolidates the intermediate outputs from the map steps.
● A typical MapReduce program in Java consists of three classes: Driver, Mapper, and Reducer.
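The division of work between a driver, a mapper, and a reducer can be sketched in plain Python with a word count. This is a single-process illustration only, not the Hadoop Java API: the function names and input lines are hypothetical, and an in-memory dictionary stands in for the grouping that a cluster performs between the map and reduce steps.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def reducer(word, counts):
    """Reduce step: consolidate all intermediate counts for one word."""
    return word, sum(counts)


def driver(lines):
    """Driver: run the map step, group pairs by key, then run the reduce step."""
    grouped = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            grouped[word].append(count)  # in-memory stand-in for the shuffle
    return dict(reducer(word, counts) for word, counts in sorted(grouped.items()))


# Hypothetical input lines, used only for illustration.
result = driver(["big data needs big tools", "data tools"])
print(result)  # {'big': 2, 'data': 2, 'needs': 1, 'tools': 2}
```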
Hadoop MapReduce programming language options
● Java API
● Hadoop Streaming: requires knowledge of Python, C, or Ruby
● Hadoop Pipes: uses C++ code
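With Hadoop Streaming, the mapper and reducer are ordinary programs that read lines from standard input and write lines to standard output. The sketch below imitates that style locally in Python. The function names and sample text are hypothetical, in-memory streams replace stdin and stdout, and `sorted()` stands in for the sort that happens between the two phases, so it is an approximation of the flow rather than a job you can submit as written.

```python
import io
from itertools import groupby


def run_mapper(stdin, stdout):
    """Mapper: read raw text lines and write one 'word<TAB>1' line per word."""
    for line in stdin:
        for word in line.split():
            stdout.write(f"{word}\t1\n")


def run_reducer(stdin, stdout):
    """Reducer: read key-sorted 'word<TAB>count' lines and total each word."""
    pairs = (line.rstrip("\n").split("\t") for line in stdin)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        stdout.write(f"{word}\t{total}\n")


# Local simulation: in-memory streams replace stdin/stdout, and sorted()
# replaces the sort that happens between the map and reduce phases.
mapped = io.StringIO()
run_mapper(io.StringIO("hive pig hive\npig hive\n"), mapped)
sorted_lines = sorted(mapped.getvalue().splitlines(keepends=True))
reduced = io.StringIO()
run_reducer(io.StringIO("".join(sorted_lines)), reduced)
print(reduced.getvalue(), end="")
```

The reducer relies on its input being sorted by key, which is why the grouping can be done with a single pass over the lines.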
● Using Pig involves entering the Pig execution environment by typing pig at the command prompt
and then entering a sequence of Pig instructions at the grunt prompt.
● Example:
$ pig
grunt> records = LOAD '/user/customer.txt' AS (cust_id:INT,
first_name:CHARARRAY, last_name:CHARARRAY,
email_address:CHARARRAY);
grunt> filtered_records = FILTER records BY email_address matches
'.*@isp.com';
grunt> STORE filtered_records INTO '/user/isp_customers';
grunt> quit
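For readers more familiar with Python, the FILTER step in the Pig example corresponds roughly to the sketch below. It is an analogy, not Pig code: the function name and the two customer records are hypothetical, and only the field names and the pattern are taken from the example.

```python
import re

# Same pattern as the Pig example. Pig's `matches` must match the whole
# string, so re.fullmatch is used here rather than re.search.
ISP_PATTERN = re.compile(r".*@isp.com")


def filter_isp_customers(records):
    """Keep only records whose email_address matches the pattern."""
    return [r for r in records if ISP_PATTERN.fullmatch(r["email_address"])]


# Hypothetical customer records, used only for illustration.
records = [
    {"cust_id": 1, "first_name": "Ana", "last_name": "Rao", "email_address": "ana@isp.com"},
    {"cust_id": 2, "first_name": "Ben", "last_name": "Lee", "email_address": "ben@other.org"},
]
filtered_records = filter_isp_customers(records)
print([r["cust_id"] for r in filtered_records])  # [1]
```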
● Pig built-in function categories: Eval, Load/Store, Math, String, DateTime
● Apache Hive enables users to process data without explicitly writing MapReduce code.
● Hive's language, HiveQL (Hive Query Language), resembles Structured Query Language
(SQL).
● A Hive table structure consists of rows and columns.
● The rows typically correspond to some record, transaction, or particular entity (for
example, customer) detail.
● The values of the corresponding columns represent the various attributes or characteristics
for each row.
● Additionally, a user may consider using Hive if the user has experience with SQL and
the data is already in HDFS.
● Hive is not intended for real-time querying.
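The row-and-column table model and the SQL-like query style can be tried locally with Python's built-in `sqlite3` module. This is only a stand-in for illustration: SQLite is not Hive, the query is ordinary SQL rather than HiveQL, and the three customer rows are hypothetical. The column names follow the customer example used elsewhere in this unit.

```python
import sqlite3

# sqlite3 is only a local stand-in here; Hive itself runs queries over data in HDFS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cust_id INTEGER, first_name TEXT, "
    "last_name TEXT, email_address TEXT)"
)
# Hypothetical rows: each row is one customer, each column one attribute.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        (1, "Ana", "Rao", "ana@isp.com"),
        (2, "Ben", "Lee", "ben@other.org"),
        (3, "Caz", "Kim", "caz@isp.com"),
    ],
)
rows = conn.execute(
    "SELECT cust_id, first_name FROM customer "
    "WHERE email_address LIKE '%@isp.com' ORDER BY cust_id"
).fetchall()
print(rows)  # [(1, 'Ana'), (3, 'Caz')]
conn.close()
```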
● To load the customer table with the contents of the HDFS file customer.txt:
hive> LOAD DATA INPATH '/user/customer.txt' INTO TABLE customer;
● Exploratory or ad-hoc analysis of HDFS data: Data can be queried, transformed, and
exported to analytical tools, such as R.
● Extracts or data feeds to reporting systems, dashboards, or data repositories such as
HBase:
Hive queries can be scheduled to provide such periodic feeds.
● Combining external structured data to data already residing in HDFS: Hadoop is
excellent for processing unstructured data, but often there is structured data residing in
an RDBMS, such as Oracle or SQL Server, that needs to be joined with the data residing
in HDFS.
● The data from an RDBMS can be periodically added to Hive tables for querying with
existing data in HDFS.
Thank You