DWDV UNIT-3 PPT
UNIT - III
Syllabus:
3.1. Classification of visualization systems,
3.2. Interaction and visualization techniques misleading,
3.3. Visualization of one, two and multi-dimensional data,
3.4. text and text documents.
3.1 Classification of visualization systems
• There are several ways to categorize and think about different kinds of
visualizations. Here are four of the most useful.
• The first two are unrelated to the others; the last two are related to each other.
(i) Complexity
• One way to classify a data visualization is by counting how many different data
dimensions it represents.
• By this we mean the number of discrete types of information that are visually
encoded in a diagram.
• For example, a simple line graph may show the price of a company's stock on
different days: that's two data dimensions. If multiple companies are shown (and
therefore compared), there are now three dimensions; if trading volume per day
is added to the graph, there are four.
• The figure above shows four data dimensions in one graph. Adding
more points within any of these dimensions won’t change the graph’s
complexity.
• This count of the number of data dimensions can be described as the
level of complexity of the visualization.
• As visualizations become more complex, they are more challenging to
design well, and can be more difficult to learn from.
• For that reason, visualizations with no more than three or four
dimensions of data are the most common, though visualizations with
six, seven, or more dimensions can be found.
• The way to succeed in the face of this challenge is to be intentional
about which property to use for each dimension, and iterate or
change encodings as the design evolves.
• The second challenge for designing more complex visualizations is that
there are relatively few well-known conventions, metaphors, defaults,
and best practices to rely on.
• Because the safety net of convention may not exist, there is more of a
burden on the designer to make good choices that can be easily
understood by the reader.
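The dimension count described under Complexity can be sketched in a few lines. This is a minimal illustration, not a standard metric: the records, their field names, and the helper `complexity` are hypothetical, and each field is assumed to be visually encoded in the graph.

```python
# Hypothetical records for the stock example: each key is one data dimension.
records = [
    {"company": "A", "day": 1, "price": 10.0, "volume": 1200},
    {"company": "A", "day": 2, "price": 10.5, "volume": 900},
    {"company": "B", "day": 1, "price": 20.0, "volume": 400},
]

def complexity(rows):
    """Count the distinct fields (data dimensions) that the rows carry."""
    fields = set()
    for row in rows:
        fields.update(row)  # adds the row's keys
    return len(fields)

before = complexity(records)

# Adding more points within the same dimensions leaves the count unchanged.
records.append({"company": "B", "day": 2, "price": 19.5, "volume": 650})
after = complexity(records)

print(before, after)
```

Dropping the volume field from every record would lower the count to three, which matches the line graph with several companies but no trading volume.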
(ii) Infographics versus Data Visualization
• You may have heard the terms infographics and data visualization used in different ways, or
interchangeably in different contexts, or even casually by the same person in a single sentence.
• Some people use infographic to refer to representations of information perceived as casual, funny,
or frivolous, and visualization to refer to designs perceived to be more serious, rigorous, or academic.
• The truth is, even though the art of representing statistical information visually is hundreds of
years old, the vocabulary of the field is still evolving and settling.
• Among the general public, there is still confusion over what these two terms mean, but within the
information design community, definitions for these terms are solidifying.
• In short: the distinction between infographics and data visualizations (or information
visualizations) is based on both form and origin (see the figure below).
• Figure: The difference between infographics and data visualization may be loosely determined by
the method of generation, the quantity of data represented, and the degree of aesthetic treatment
applied.
Infographics
• The term infographics is useful for referring to any visual representation of data that is:
  • manually drawn (and therefore a custom treatment of the information);
  • specific to the data at hand (and therefore nontrivial to recreate with different data);
  • aesthetically rich (strong visual content meant to draw the eye and hold interest); and
  • relatively data-poor (because each piece of information must be manually encoded).
Data Visualization
• By contrast, we suggest that the terms data visualization and information visualization
(casually, data viz and info viz) are useful for referring to any visual representation of data that is:
  • algorithmically drawn (may have custom touches but is largely rendered with the help of
  computerized methods);
  • easy to regenerate with different data (the same form may be repurposed to represent
  different datasets with similar dimensions or characteristics);
  • often aesthetically barren (data is not decorated); and
  • relatively data-rich (large volumes of data are welcome and viable, in contrast to infographics).
(iii) Exploration versus Explanation
• There are two categories of data visualization: exploration and explanation. The two serve different
purposes, and so there are tools and approaches that may be appropriate only for one and not the
other.
• For this reason, it is important to understand the distinction, so that you can be sure you are using
tools and approaches appropriate to the task at hand.
• Exploration
• Exploratory data visualizations are appropriate when you have a whole bunch of data and
you're not sure what's in it.
• When you need to get a sense of what's inside your data set, translating it into a visual medium can
help you quickly identify its features, including interesting curves, lines, trends, or anomalous
outliers.
• Exploration is generally best done at a high level of granularity. There may be a whole lot of noise
in your data, but if you oversimplify or strip out too much information, you could end up missing
something important.
• This type of visualization is typically part of the data analysis phase, and is used to find the story
the data has to tell you.
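The warning about oversimplifying during exploration can be made concrete with a small sketch. The hourly readings, the `threshold` value, and the helper `outliers` are hypothetical choices for illustration only; the point is that a coarse daily average keeps only a hint of an anomaly that the fine-grained view shows directly.

```python
from statistics import mean, median

# Hypothetical hourly readings for two days; day2 contains one anomaly.
hourly = {
    "day1": [20, 21, 19, 20, 22, 20],
    "day2": [20, 21, 95, 20, 19, 21],
}

# Coarse view: one number per day. The anomaly only nudges the average.
daily_mean = {day: mean(values) for day, values in hourly.items()}

def outliers(values, threshold=30):
    """Return readings that sit more than `threshold` away from the median."""
    centre = median(values)
    return [v for v in values if abs(v - centre) > threshold]

# Fine-grained view: the individual anomalous reading is still visible.
for day, values in hourly.items():
    print(day, round(daily_mean[day], 1), outliers(values))
```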
Explanation
• By contrast, explanatory data visualization is appropriate when you already know what
the data has to say, and you are trying to tell that story to somebody else. It could be
the head of your department, a grant committee, or the general public.
• Whoever your audience is, the story you are trying to tell (or the answer you are trying
to share) is known to you at the outset, and therefore you can design to specifically
accommodate and highlight that story.
• In other words, you’ll need to make certain editorial decisions about which information
stays in, and which is distracting or irrelevant and should come out.
• This is a process of selecting focused data that will support the story you are trying to
tell.
• If exploratory data visualization is part of the data analysis phase, then explanatory
data visualization is part of the presentation phase.
• Such a visualization may stand on its own, or may be part of a larger presentation,
such as a speech, a newspaper article, or a report. In these scenarios, there is some
supporting narrative, written or verbal, that further explains things.
(iv) Informative versus Persuasive versus Visual Art
• There are three main categories of explanatory visualizations based on the
relationships between the three necessary players: the designer, the reader, and the
data.
• This classification applies to designing visualizations of data with known parameters and stories.
The Designer-Reader-Data Trinity
• It is useful to think of an effective explanatory data visualization as being supported
by a three-legged stool consisting of the designer, the reader, and the data.
• Each of these legs exerts a force, or contributes a separate perspective, that must
be taken into consideration for a visualization to be stable and successful.
• Each of the three legs of the stool has a unique relationship to the other two.
• While it is necessary to account for the needs and perspective of all three in each
visualization project, the dominant relationship will ultimately determine which
category of visualization is needed (see the figure below).
Informative
• An informative visualization primarily serves the relationship between the reader and the
data. It aims for a neutral presentation of the facts in such a way that will educate the
reader (though not necessarily persuade him).
• Informative visualizations are often associated with broad data sets, and seek to distill the
content into a manageably consumable form.
• Ideally, they form the bulk of visualizations that the average person encounters on a
day-to-day basis, whether that's at work, in the newspaper, or on a service provider's website.
The Burning Man Infographic is an example of informative visualization.
Figure: The nature of the visualization depends on which relationship (between two of the
three components) is dominant.
Persuasive
• A persuasive visualization primarily serves the relationship between the designer
and the reader.
• It is useful when the designer wishes to change the reader's mind about
something.
• It represents a very specific point of view, and advocates a change of opinion or
action on the part of the reader.
• In this category of visualization, the data represented is specifically chosen for the
purpose of supporting the designer's point of view, and is presented carefully so
as to convince the reader of the same. See also: propaganda.
• A good example of persuasive visualization is the Joint Economic Committee
minority's rendition of the proposed Democratic health care plan in 2010.
Visual Art
• The third category, visual art, primarily serves the relationship between the designer and the
data.
• Visual art is unlike the previous two categories in that it often entails unidirectional encoding of
information, meaning that the reader may not be able to decode the visual presentation to
understand the underlying information.
• Whereas both informative and persuasive visualizations are meant to be easily decodable
(bidirectional in their encoding), visual art merely translates the data into a visual form.
• The designer may intend only to condense it, translate it into a new medium, or make it beautiful;
she may not intend for the reader to be able to extract anything from it other than enjoyment.
• This category of visualization is sometimes more easily recognized than others. For example, Nora
Ligorano and Marshall Reese designed a project that converts Twitter streams into a woven
fiber-optic tapestry. A worthy pursuit in its own right, perhaps, but better clearly labelled as
visual art, and not confused with informative visualization.
3.2 Interaction and visualization techniques misleading
• Data visualization is a critical aspect of data analysis, as it helps organizations
make sense of large amounts of data and gain insights that are not immediately
obvious. However, data visualization can also be misleading if not done
correctly. Misleading data visualizations can lead to incorrect conclusions,
misinterpretations, and ultimately, poor decision-making.
• Understanding the factors that contribute to misleading data visualizations is
critical for organizations that want to gain meaningful insights from their data and
make informed decisions. By avoiding these examples of misleading data
visualization, organizations can ensure that their data visualizations are accurate,
meaningful, and actionable.
• We will explore five common examples of misleading data visualization and provide
guidelines for avoiding these pitfalls. Data visualization is the graphical
representation of data in the form of charts, graphs, maps, and other interactive
visual elements. The purpose of data visualization is to help users understand,
analyze, and communicate data insights more effectively.
• By converting raw data into a visual format, data visualization enables users to
identify patterns, trends, and relationships in the data, making it easier to identify
key insights and make informed decisions.
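One widely cited way a chart can mislead, offered here as an illustration and not drawn from the text above, is a value axis that does not start at zero. The sketch below compares the true ratio of two values with the ratio of the bar lengths a reader would actually see. The sales figures, the baseline of 90, and the helper `apparent_ratio` are all hypothetical.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar lengths for a and b when the axis starts at baseline."""
    if baseline >= min(a, b):
        raise ValueError("baseline must be below both values")
    return (a - baseline) / (b - baseline)

# Hypothetical figures: a 10% increase from one year to the next.
sales_2023, sales_2024 = 100.0, 110.0

true_ratio = apparent_ratio(sales_2024, sales_2023)               # axis from 0
drawn_ratio = apparent_ratio(sales_2024, sales_2023, baseline=90)  # axis from 90

print(true_ratio, drawn_ratio)
```

With the axis starting at 90, the second bar is drawn twice as long as the first even though the underlying value grew by only a tenth.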
Data Visualization Dashboard
3.3 Visualization of one, two and multi-dimensional data
• Bar Plot with Two Axes: Shows two variables where one axis represents
categories and the other represents the numerical values.
• Use Case: Comparing the revenue and profit of companies.
Examples:
• Weight vs. Height: Visualized using a scatter plot to show the relationship
between these two variables.
• Temperature Across Regions: Visualized using a heatmap where regions are on
one axis and time on another.
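The heatmap layout described above (regions on one axis, time on the other) can be sketched without a plotting library. The readings, the character ramp, and the helpers `to_grid` and `shade` are hypothetical; a real heatmap would map values to colours instead of characters.

```python
# Hypothetical (region, month, temperature) readings.
readings = [
    ("North", "Jan", 2), ("North", "Feb", 4),
    ("South", "Jan", 15), ("South", "Feb", 17),
]

def to_grid(rows):
    """Pivot (region, month, value) rows into a region-by-month grid."""
    regions = sorted({region for region, _, _ in rows})
    months = []
    for _, month, _ in rows:
        if month not in months:
            months.append(month)  # keep first-seen order, not alphabetical
    lookup = {(region, month): value for region, month, value in rows}
    grid = [[lookup.get((r, m)) for m in months] for r in regions]
    return regions, months, grid

def shade(value, lo, hi, ramp=" .:*#"):
    """Pick a character from ramp; later characters stand for higher values."""
    if hi == lo:
        return ramp[-1]
    return ramp[round((value - lo) / (hi - lo) * (len(ramp) - 1))]

regions, months, grid = to_grid(readings)
values = [v for row in grid for v in row if v is not None]
lo, hi = min(values), max(values)
for region, row in zip(regions, grid):
    # Missing region/month combinations are shown as "?".
    cells = "".join("?" if v is None else shade(v, lo, hi) for v in row)
    print(f"{region:<6}|{cells}|")
```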
3. Multi-Dimensional Data Visualization
• Scatter Plot Matrix (Pair Plot): Displays all pairs of variables in a multi-
dimensional dataset as individual scatter plots in a grid.
• Use Case: Visualizing relationships between multiple numerical variables in a dataset.
Examples:
• Iris Dataset (4D): Visualized using a parallel coordinates plot or PCA to reduce
dimensions and create a 2D scatter plot.
• Customer Segmentation: Visualized using t-SNE to reduce high-dimensional
customer features into 2D space for clustering.
• Multi-factor Stock Analysis: Use of a radar chart to show stock performance
based on different factors like price-to-earnings ratio, dividend yield, etc.
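A scatter plot matrix needs one panel per pair of variables, so the number of panels grows quickly with the number of dimensions. The sketch below lists the distinct pairs for the four Iris measurements; the snake_case column names and the helper `pair_panels` are assumptions made for illustration.

```python
from itertools import combinations

def pair_panels(columns):
    """Return every unordered pair of columns, one per off-diagonal panel."""
    return list(combinations(columns, 2))

# The four Iris measurements (column names are illustrative).
iris_columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

panels = pair_panels(iris_columns)
for x, y in panels:
    print(f"{x} vs {y}")

# d variables give d * (d - 1) / 2 distinct pairs.
print(len(panels))
```

Four variables give six distinct pairs. A full grid usually draws each pair twice (mirrored) plus a diagonal, which is why pair plots become hard to read beyond a handful of variables.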
One-Dimensional
• Techniques: Histograms, Line Charts, Bar Charts
• Use cases: Distribution of scores, time-series data, sales by category
Two-Dimensional
• Techniques: Scatter Plot, Heatmap, Bubble Chart, Bar Plot with Two Axes
• Use cases: Correlation between variables, comparisons across categories
3.4 Text and text documents
• Text and text documents in data visualization refer to the process of transforming unstructured
textual data into visual formats that help users understand the content, patterns, and
relationships within the text. Since text data is vast and complex, visualizing it can provide
significant insights that are otherwise difficult to derive from plain text. Below are some common
techniques and approaches for text visualization.
1. Word Cloud
• Description: A word cloud (or tag cloud) is a visual representation of word frequency in a text,
where the size of each word indicates its frequency or importance in the document.
• Use Case: Quickly summarizing key themes or topics in a document, such as analyzing customer
reviews, social media posts, or research papers.
• Strengths: Simple to create, gives a quick snapshot of frequently occurring terms.
• Limitations: Doesn't show the relationship between words or the context in which they appear.
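The size-by-frequency rule behind a word cloud can be sketched as follows. The sample text, the point-size range, and the helper `word_sizes` are hypothetical, and no stop-word filtering is applied, so a real input would need that step first.

```python
import re
from collections import Counter

def word_sizes(text, min_pt=10, max_pt=40):
    """Map each word to a font size scaled linearly by its frequency."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    if not counts:
        return {}
    low, top = min(counts.values()), max(counts.values())
    span = (top - low) or 1  # avoid dividing by zero when all counts match
    return {
        word: min_pt + (count - low) * (max_pt - min_pt) / span
        for word, count in counts.items()
    }

# Hypothetical review text.
sizes = word_sizes("great phone great battery poor screen great price")
for word, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(f"{word}: {size:.0f}pt")
```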
2. Word Tree
• Description: A word tree shows a hierarchical view of words, focusing on how specific words (usually the root word) are
followed by other words in a sequence.
• Use Case: Useful for understanding phrases, common word associations, and exploring repeated patterns or themes in
documents.
• Strengths: Displays context around keywords, shows relationships between words.
• Limitations: Limited to analyzing short phrases and small-scale text.
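The structure behind a word tree is a nested mapping from the root word to the words that follow it. The sketch below builds that mapping; the sentences, the `depth` parameter, and the helper `word_tree` are hypothetical, and tokenization is a plain whitespace split.

```python
def word_tree(sentences, root, depth=2):
    """Nest the words that follow each occurrence of root, up to depth levels."""
    tree = {}
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            if word == root:
                node = tree
                for following in words[i + 1 : i + 1 + depth]:
                    node = node.setdefault(following, {})  # shared prefixes merge
    return tree

# Hypothetical sentences sharing the root word "battery".
sentences = [
    "the battery lasts long",
    "the battery drains fast",
    "the battery lasts forever",
]
tree = word_tree(sentences, "battery")
print(tree)
```

Both sentences containing "battery lasts" share one branch, which is what lets a word tree show context around a keyword.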
3. Document Term Matrix (Heatmap)
• Description: A Document Term Matrix (DTM) is a table where each row corresponds to a document, and
each column corresponds to a term, showing the frequency of each term in each document. Visualizing this
matrix as a heatmap highlights word usage across multiple documents.
• Use Case: Analyzing the frequency and distribution of specific terms across a large set of documents, such as
comparing themes across different research papers or news articles.
• Strengths: Highlights frequent terms and compares term occurrence across multiple documents.
• Limitations: Does not account for semantics, limited by the number of terms and documents.
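A Document Term Matrix can be built from raw counts before any heatmap is drawn. In this sketch the two documents and the helper `document_term_matrix` are hypothetical, and terms are split on whitespace with no stemming or stop-word removal.

```python
from collections import Counter

def document_term_matrix(docs):
    """Return (terms, matrix): one row per document, one column per term."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    terms = sorted(set().union(*counts))  # union of every document's vocabulary
    matrix = [[count[term] for term in terms] for count in counts]
    return terms, matrix

# Hypothetical documents.
docs = ["data visualization helps", "text data and more data"]
terms, matrix = document_term_matrix(docs)

print(terms)
for row in matrix:
    print(row)
```

Each cell of `matrix` is the value that a heatmap would colour: darker cells for terms used more often in that document.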
4. Topic Modeling (LDA) Visualization
• Description: Latent Dirichlet Allocation (LDA) is a popular topic modeling algorithm that identifies topics in a
set of documents. LDA visualizations often present these topics as clusters or show how different topics are
related.
• Use Case: Analyzing large collections of text to discover underlying themes or topics without manually
reading all the content, such as discovering topics in a large collection of news articles or product reviews.
• Strengths: Shows hidden structure and thematic relationships in large sets of unstructured text.
• Limitations: Requires tuning, may not work well with small datasets.
5. Sentiment Analysis Visualization
• Description: Sentiment analysis visualizes the emotional tone of text data by assigning sentiment
scores (e.g., positive, negative, or neutral) to documents, sentences, or phrases. The results are
often visualized through line charts (over time), pie charts (distribution), or bar charts.
• Use Case: Tracking customer sentiment in social media posts, reviews, or survey responses.
• Strengths: Helps gauge overall mood or opinion from a large collection of text.
• Limitations: Sentiment detection can be inaccurate due to sarcasm, ambiguity, or language
nuances.
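The sentiment distribution that feeds a pie or bar chart can be sketched with a toy lexicon. The word lists, the reviews, and the helper `label` are hypothetical and far too small for real use; this simple counting approach also shows why sarcasm and nuance cause errors, since it looks at words in isolation.

```python
from collections import Counter

# Toy lexicons, assumed for illustration only.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "poor", "hate"}

def label(text):
    """Label text by the balance of positive and negative words it contains."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical reviews.
reviews = ["great phone love it", "poor battery", "it arrived on monday"]
distribution = Counter(label(review) for review in reviews)

# A text bar chart of the distribution.
for sentiment in ("positive", "neutral", "negative"):
    print(f"{sentiment:<9}{'#' * distribution[sentiment]}")
```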
6. Network Diagrams (Text Relationship Networks)
• Description: Network diagrams visualize relationships between words or topics by treating them
as nodes connected by edges, which represent associations or co-occurrences in text.
• Use Case: Mapping relationships between entities in a document, such as exploring character
interactions in literature or tracking frequently mentioned products or terms in news articles.
• Strengths: Highlights connections and dependencies between terms.
• Limitations: May become overly complex for large datasets.
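The edges of a text relationship network can be derived from co-occurrence counts. In this sketch two words are linked when they appear in the same sentence; the sentences and the helper `cooccurrence_edges` are hypothetical, and the sentence-level window is one of several possible choices.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(sentences):
    """Count how often each unordered pair of words shares a sentence."""
    edges = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.lower().split()))  # sorted gives stable pair order
        edges.update(combinations(words, 2))
    return edges

# Hypothetical sentences about three characters.
sentences = ["alice meets bob", "bob meets carol", "alice meets carol"]
edges = cooccurrence_edges(sentences)

for (a, b), weight in sorted(edges.items()):
    print(f"{a} -- {b} (weight {weight})")
```

The edge weights would typically set line thickness in the drawn network, and the number of pairs grows quickly with vocabulary size, which is the complexity limitation noted above.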
7. Text Summarization Visualization
• Description: Automatic text summarization tools extract the most important sentences or phrases from a
document, which can be visually presented to highlight key points, either in a condensed list form or as a
visual timeline of document events.
• Use Case: Summarizing long reports, news articles, or academic papers to quickly understand the most
critical points.
• Strengths: Reduces the need to read large amounts of text.
• Limitations: May miss nuances or important details.
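Extractive summarization of the kind described above can be sketched by scoring sentences on word frequency. This is one simple heuristic, not the method of any particular tool; the sample text and the helper `summarize` are hypothetical, and the raw sum favours long sentences.

```python
import re
from collections import Counter

def summarize(text, n=1):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        # Sum of document-wide frequencies of the sentence's words.
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    top = sorted(sentences, key=score, reverse=True)[:n]
    return [s for s in sentences if s in top]

# Hypothetical report text.
text = (
    "Sales rose in March. "
    "Sales of phones and sales of tablets rose sharply. "
    "The office was repainted."
)
print(summarize(text, n=1))
```

The selected sentences are what a summary view would highlight, for example by bolding them in the full document.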
8. Timeline Visualization (for Documents)
• Description: Timelines can be used to track and visualize key events, discussions, or changes in sentiment
over time in text data, such as social media posts, news reports, or journal entries.
• Use Case: Monitoring the progression of a specific topic or issue over time, such as the unfolding of a
political debate or a brand’s reputation.
• Strengths: Shows temporal patterns in data, such as trends and shifts in tone or frequency.
• Limitations: Limited to datasets with clear time markers.
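The data behind a document timeline is a count of matching items per time bucket. The sketch below groups keyword mentions by month; the posts, the keyword, and the helper `mentions_per_month` are hypothetical, and every item is assumed to carry a date, which is the time-marker requirement noted above.

```python
from collections import Counter
from datetime import date

# Hypothetical dated posts.
posts = [
    (date(2024, 1, 3), "Launch announced"),
    (date(2024, 1, 20), "Launch delayed"),
    (date(2024, 2, 5), "Launch day"),
    (date(2024, 2, 9), "Quarterly results"),
]

def mentions_per_month(items, keyword):
    """Count items mentioning keyword, grouped by (year, month), in time order."""
    counts = Counter()
    for day, text in items:
        if keyword in text.lower():
            counts[(day.year, day.month)] += 1
    return sorted(counts.items())

timeline = mentions_per_month(posts, "launch")
for (year, month), count in timeline:
    print(f"{year}-{month:02d} {'#' * count}")
```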
9. N-gram Analysis
• Description: N-gram visualizations display sequences of "n" words that occur together in text,
typically shown in charts or graphs that highlight frequent word combinations.
• Use Case: Analyzing common phrases or word combinations in text documents (e.g., common
product features in reviews, frequent phrases in customer complaints).
• Strengths: Reveals patterns in word usage that can indicate key themes or topics.
• Limitations: Works best for shorter text fragments or corpora.
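N-gram counts are the input to the charts described above. In this sketch the text and the helper `ngrams` are hypothetical, and words are split on whitespace without removing punctuation.

```python
from collections import Counter

def ngrams(text, n=2):
    """Count every run of n consecutive words in text."""
    words = text.lower().split()
    # zip over n shifted copies of the word list yields each n-word window.
    return Counter(zip(*(words[i:] for i in range(n))))

# Hypothetical complaint text.
text = "battery life is short and battery life is poor"

bigrams = ngrams(text, n=2)
trigrams = ngrams(text, n=3)

for gram, count in bigrams.most_common(3):
    print(" ".join(gram), count)
```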
10. Hierarchical Document Visualization (Tree-based)
• Description: Hierarchical text visualizations use tree structures to represent the structure of a document or
collection of documents. For instance, large text collections (e.g., books, reports) can be visualized as
hierarchical trees, where nodes represent chapters, sections, or topics.
• Use Case: Visualizing the structure of a long document (e.g., books, legal documents) to understand its
organization or topic hierarchy.
• Strengths: Useful for visualizing and navigating large, complex documents.
• Limitations: Can be difficult to understand with very large datasets or poorly structured documents.
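A document hierarchy can be held as nested (title, children) pairs and rendered as an indented tree. The outline below and the helper `render` are hypothetical; a graphical tool would draw the same structure as a node-link tree or treemap.

```python
def render(node, indent=0):
    """Return one indented line per node, parents before their children."""
    title, children = node
    lines = ["  " * indent + title]
    for child in children:
        lines.extend(render(child, indent + 1))
    return lines

# Hypothetical report outline: (title, list of child nodes).
document = (
    "Report",
    [
        ("1 Introduction", []),
        ("2 Methods", [("2.1 Data", []), ("2.2 Model", [])]),
    ],
)

tree_lines = render(document)
print("\n".join(tree_lines))
```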