INTRODUCTION and M1-CH-1
VISUALIZATION
21CS644
Module-1
INTRODUCTION
Data… Data… Everywhere!
Purchases at department/grocery stores
Bank/credit card transactions
Social networks
DATA….
INFORMATION…
HOW MUCH DATA?
▪ Google processes 20 PB a day (2017)
▪ Wayback Machine has 70 PB + 100 TB/month (12/2020)
▪ Facebook has 4 PB of user data + 500 TB/day (1/2021)
▪ eBay has 100 PB of user data + 50 TB/day (4/2014)
▪ CERN’s Large Hadron Collider (LHC) generates 15 PB a year
CERN’S LARGE HADRON COLLIDER: LARGEST MACHINE IN THE WORLD
<rec><name>Prashant Rao</name><gender>Male</gender><age>35</age></rec>
<rec><name>Seema R.</name><gender>Female</gender><age>41</age></rec>
<rec><name>Satish Mane</name><gender>Male</gender><age>29</age></rec>
<rec><name>Subrato Roy</name><gender>Male</gender><age>26</age></rec>
What to do with these data?
Aggregation and Statistics
Data Mining
Statistical Modeling
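As a minimal sketch of the "Aggregation and Statistics" step, the four sample records shown earlier can be parsed and summarized with the Python standard library. The wrapping <recs> root element and the helper name load_records are assumptions added for illustration; they are not part of the original records.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from statistics import mean

# The four sample records from the slide, wrapped in a hypothetical
# <recs> root element so that they form one well-formed XML document.
RAW = """<recs>
<rec><name>Prashant Rao</name><gender>Male</gender><age>35</age></rec>
<rec><name>Seema R.</name><gender>Female</gender><age>41</age></rec>
<rec><name>Satish Mane</name><gender>Male</gender><age>29</age></rec>
<rec><name>Subrato Roy</name><gender>Male</gender><age>26</age></rec>
</recs>"""


def load_records(xml_text):
    """Parse <rec> elements into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    return [
        {
            "name": rec.findtext("name"),
            "gender": rec.findtext("gender"),
            "age": int(rec.findtext("age")),
        }
        for rec in root.iter("rec")
    ]


records = load_records(RAW)

# Aggregation: count the records per gender.
gender_counts = Counter(rec["gender"] for rec in records)

# Statistics: average age over all records.
average_age = mean(rec["age"] for rec in records)

print(gender_counts)
print(average_age)
```

Data mining and statistical modeling would build on the same parsed records, but need far more data than four rows to be meaningful.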
WHAT IS BIG DATA?
Big Data refers to data sets so large, complex, and unorganized that they defy the
conventional data management methods that were designed and used before this
rise in data.
Big data sets cannot be processed with traditional database management systems and tools.
Data Analysts
analyze and interpret data, visualize it, and build reports to help make better business decisions.
Data Scientists
mine data by assessing data sources and applying algorithms and machine learning techniques.
Data Architects
design database systems and tools.
Database Managers
control database system performance, perform troubleshooting, and upgrade hardware and software.
Big Data Engineers
design, maintain, and support Big Data solutions.
DATA SCIENCE
Over the past few years, there’s been a lot of hype in the media about
“data science” and “Big Data.”
Today, data rules the world. This has resulted in a huge demand for
data scientists.
A data scientist helps companies make data-driven
decisions that improve their business.
Data science is a field that deals with structured,
semi-structured, and unstructured data.
It involves practices like data cleansing, data preparation,
data analysis, and much more.
Data science combines statistics, mathematics,
programming, and problem-solving with
capturing data in ingenious ways.
What is Datafication?
• Risk forecasting and mitigation: Data analytics can help identify risks and predict future
trends, allowing organizations to take steps to mitigate those risks before they occur.
• Innovation: Datafication can provide valuable insights for innovation and the
development of new products and services. That is, new products can emerge based on user
behavior and the identification of users' real needs.
The Current Landscape
In Academics:
An academic data scientist is a scientist, trained in anything from social
science to biology, who works with large amounts
of data and must grapple with computational problems posed by the
structure, size, messiness, complexity, and nature of the data,
while simultaneously solving a real-world problem.
Define Data Science by Usage
In Industry:
Data scientists often work with a team to complete projects. Typical activities include:
Design, develop, and maintain machine learning and other data models
Select, use, and debug existing data models
Perform statistical and data analyses, often to make decisions about products
Conduct research to learn more about the field and to improve model accuracy,
including meeting with and interviewing experts
Define Data Science by Usage
In a general sense:
A data scientist is someone:
Whose work involves data collection, cleaning, and munging, which requires
persistence, statistics, and software engineering skills, including skills for understanding biases
in data and for debugging and logging output from code.
Define Data Science by Usage
In a general sense:
A data scientist is someone (contd.):
Who performs exploratory data analysis (EDA), which is used to analyze and
investigate data sets and summarize their main characteristics, often employing data visualization
methods.
Who finds patterns and builds models and algorithms for different purposes.
Who designs experiments, a critical part of data-driven decision
making.
Who communicates with team members, engineers, and leadership in clear
language and with data visualizations.
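The exploratory data analysis step mentioned above can be sketched roughly as follows, using only the Python standard library. The function names summarize and text_histogram are hypothetical helpers, and the ages simply reuse the four sample records from earlier in the deck; a real analysis would work on a much larger data set and would typically use a plotting library.

```python
from statistics import mean, median, pstdev


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "std": pstdev(values),  # population standard deviation
    }


def text_histogram(values, bin_width):
    """Count values per bin; keys are the lower edge of each bin."""
    bins = {}
    for value in values:
        start = (value // bin_width) * bin_width
        bins[start] = bins.get(start, 0) + 1
    return dict(sorted(bins.items()))


# Ages taken from the sample records shown earlier in the deck.
ages = [35, 41, 29, 26]

summary = summarize(ages)
histogram = text_histogram(ages, 10)

print(summary)
# A crude text "visualization": one '#' per value in each bin.
for start, count in histogram.items():
    print(f"{start}-{start + 9}: {'#' * count}")
```

Summaries and quick plots like these are usually the first look at a data set, before any model is built.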
END OF M1-CH1