INTRODUCTION TO DATA SCIENCE

BSD1313
DR. MOHD KHAIRUL BAZLI BIN MOHD AZIZ
CENTRE FOR MATHEMATICAL SCIENCES, UNIVERSITI MALAYSIA PAHANG
CHAPTER 1: INTRODUCTION TO DATA SCIENCE

• 1.1 Definition of Data Science
• 1.2 History of Data Science
• 1.3 The Importance of Data Science
• 1.4 Data Scientist Skill Set
WHAT IS DATA SCIENCE?

• Data science is the comprehensive study of 'big data', or huge amounts of data, to obtain meaningful insights.
WHAT IS BIG DATA?

• The term “big data” refers to data that is so large, fast or complex that it’s difficult or impossible to process using traditional methods. The act of accessing and storing large amounts of information for analytics has been around a long time. But the concept of big data gained momentum in the early 2000s when industry analyst Doug Laney articulated the now-mainstream definition of big data as the three V’s:
  • Volume
  • Velocity
  • Variety
Meet Katie Bouman, the woman behind the first-ever image of a black hole

She led the development of a computer program which eventually put all the pieces together.

“We’re a melting pot of astronomers, physicists, mathematicians, and engineers, and that’s what it took to achieve something once thought impossible.”
WHAT IS DATA SCIENCE?

• Data science requires a powerful combination of various disciplines, including mathematics and statistics, computer science and domain expertise. Hence, it is an interdisciplinary field.
WHY DATA SCIENCE?
Data Growth

• Due to advances in digital technology, an abundance of data is generated endlessly every second: an explosion of our digital footprint.
• The amount of data is growing at such an explosive rate that we have moved beyond the everyday units of the decimal system. Today, American organisations such as the NSA and FBI talk about yottabytes regarding the information they hold on citizens. In the (near) future we will talk about brontobytes regarding sensory data.
• Therefore, new terms have been defined for the data flood that is expected in the next five years.
WHY DATA SCIENCE?
Data Growth

Data grew from 0.7 zettabytes in 2009 to 35 zettabytes in 2020, a 500% increase over 2015. Note that 1 zettabyte = 10^21 bytes and 1 yottabyte = 10^24 bytes.
For example:
• Sensors used in shopping malls to gather shoppers’ information
• Posts on social media platforms
• Digital pictures and videos captured on our phones
• Purchase transactions made through e-commerce
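The unit definitions and growth figures above can be checked with a few lines of arithmetic. The Python sketch below uses only the 0.7 ZB (2009) and 35 ZB (2020) figures and the powers of ten quoted on the slide; the compound annual growth rate is a derived quantity added for illustration, not a figure from the source.

```python
ZETTABYTE = 10**21  # bytes, as defined on the slide
YOTTABYTE = 10**24  # bytes

data_2009_zb = 0.7
data_2020_zb = 35

# How many times larger the 2020 volume is than the 2009 volume.
growth_factor = data_2020_zb / data_2009_zb

# Compound annual growth rate over the 11 years from 2009 to 2020.
years = 2020 - 2009
cagr = growth_factor ** (1 / years) - 1

print(f"35 ZB = {data_2020_zb * ZETTABYTE:.2e} bytes")
print(f"1 YB = {YOTTABYTE // ZETTABYTE} ZB")
print(f"growth factor 2009->2020: {growth_factor:.1f}x")
print(f"compound annual growth: {cagr:.1%}")
```

A growth factor of k corresponds to a percentage increase of (k - 1) × 100%.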
[Figure: Global social media statistics]
WHY DATA SCIENCE?
Job Demand

• The 10 most in-demand jobs in 2020, as reported by Guthrie Jensen Global Training Consultants from the World Economic Forum (WEF, 2018).
WHY DATA SCIENCE?
2022 Job Skills

• According to the World Economic Forum (WEF, 2018), new skills are required to support high-speed mobile internet, artificial intelligence, big data analytics and cloud technology, which will spearhead companies' adoption of new technologies between 2018 and 2022.
WHY DATA SCIENCE?
Skills Outlook & Soft Skills

• The World Economic Forum's Future of Jobs report (2019) concluded that “human” skills like originality, initiative and critical thinking are likely to increase in value as technology and automation advance.
• “Strengthening a soft skill is one of the best investments you can make in your career, as they never go out of style,” LinkedIn Learning Editor Paul Petrone wrote in a blog post. “Plus, the rise of AI is only making soft skills increasingly important, as they are precisely the type of skills robots can't automate.” (LinkedIn, 2020).
WHY DATA SCIENCE?
Job Demand Trend

• The Harvard Business Review (2012) has tagged data scientist as the sexiest job of the 21st century. It is 'sexy' because it is in high demand and requires rare qualities.
WHY DATA SCIENCE?
Job Demand Trend

• Explosive growth in the volume of data has impacted every business sector, driving up the number of data science professionals required to help companies uncover the insights they need to stay competitive (July 2019).
WHY DATA SCIENCE?
High Salary
WHY DATA SCIENCE?
Industrial Revolution, IR4.0

• Germany introduced the IR 4.0 concept at the World Economic Forum (2015). IR 4.0 focuses on interconnectivity between physical space and cyberspace (cyber-physical systems).
• According to the World Economic Forum, an estimated 65% of children entering primary school today will end up working in jobs that haven’t been created yet.
WHY DATA SCIENCE?
Made in China 2025

• In 2015, China launched the 'Made in China (MIC) 2025' document, a state-led industrial policy to shift China from being a low-end manufacturer to becoming a high-end producer of goods.
WHY DATA SCIENCE?
Industrial Revolution – Society 5.0
1.1 DEFINITION OF DATA SCIENCE

Data science is the science concerned with:
• the 'discovery' of information from large data sets.
• manipulating the data to find interesting insights.
• visualizing the data to obtain a better perspective.
• understanding the data to make better decisions.
• analysing the data to predict future outcomes.
In short, data science is all about making the data talk to us.
DATA SCIENCE
DISCOVERY OF DATA INSIGHT

• Netflix mines movie-viewing patterns to understand what drives viewer interest and uses that to decide which Netflix original series to produce.
DATA SCIENCE
DISCOVERY OF DATA INSIGHT

• Google developed its spell checker by suggesting corrections to misspelled searches and observing what users click in response, which made it much more accurate. Google builds a dictionary of common misspellings, their corrections, and the contexts in which they occur.
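The idea of learning corrections from user clicks can be sketched in a few lines. The Python example below is a simplified illustration, not Google's actual system: it assumes a hypothetical log of (typed query, clicked suggestion) pairs and keeps, for each misspelling, the suggestion users clicked most often. It ignores the context information mentioned above.

```python
from collections import Counter, defaultdict

# Hypothetical click log: (what the user typed, the suggestion the user clicked).
click_log = [
    ("recieve", "receive"),
    ("recieve", "receive"),
    ("recieve", "relieve"),
    ("definately", "definitely"),
    ("definately", "definitely"),
    ("teh", "the"),
]

def build_correction_dictionary(log):
    """Map each misspelling to the suggestion users clicked most often."""
    clicks = defaultdict(Counter)
    for typed, clicked in log:
        clicks[typed][clicked] += 1
    # most_common(1) returns a one-element list [(suggestion, count)].
    return {typed: counts.most_common(1)[0][0] for typed, counts in clicks.items()}

corrections = build_correction_dictionary(click_log)
print(corrections["recieve"])  # receive
```

The more users click the same suggestion, the more confident the dictionary becomes, which is why usage data makes the checker more accurate over time.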
DATA SCIENCE
DISCOVERY OF DATA INSIGHT

• Gmail's spam filter is a data product: an algorithm behind the scenes processes incoming mail and determines whether a message is junk or not.
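Gmail does not publish its filter, so the sketch below swaps in a classic textbook technique, a naive Bayes classifier with add-one smoothing, to show how an algorithm can learn "junk or not" from labelled examples. The four training messages are hypothetical toy data.

```python
import math
from collections import Counter

# Hypothetical labelled training messages.
training = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting moved to monday", "ham"),
    ("lecture notes for monday", "ham"),
]

def train(messages):
    """Count words per class and messages per class."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in messages:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocabulary

def classify(text, word_counts, class_counts, vocabulary):
    """Return the label with the highest naive Bayes log-probability."""
    total_messages = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / total_messages)  # log prior
        n_words = sum(word_counts[label].values())
        for word in text.split():
            # Add-one (Laplace) smoothing so unseen words do not give log(0).
            likelihood = (word_counts[label][word] + 1) / (n_words + len(vocabulary))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)

model = train(training)
print(classify("free prize", *model))      # spam
print(classify("monday meeting", *model))  # ham
```

Working with log-probabilities avoids multiplying many small numbers together, which would underflow on long messages.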
DATA SCIENCE
DISCOVERY OF DATA INSIGHT

• During the swine flu epidemic of 2009, Google was able to track the progress of the epidemic by following searches for flu-related topics.
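A first step in this kind of tracking is checking how closely search activity moves with reported cases. The sketch below computes the Pearson correlation coefficient between two weekly series; both series are hypothetical numbers invented for illustration, and this is not Google's actual method.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical weekly data: flu-related searches and reported flu cases.
weekly_searches = [120, 150, 300, 520, 610, 400]
reported_cases = [10, 14, 33, 55, 70, 41]

r = pearson(weekly_searches, reported_cases)
print(f"correlation: {r:.2f}")
```

A value of r close to 1 means the two series rise and fall together, so search volume could act as an early signal; correlation alone does not guarantee the relationship will hold in future seasons.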
DATA SCIENCE
DISCOVERY OF DATA INSIGHT

• Apple and Google announced a system for tracking the spread of the new coronavirus, allowing users to share data through Bluetooth Low Energy (BLE) transmissions and approved apps from health organizations.
DATA SCIENCE
DISCOVERY OF DATA INSIGHT

• Uniqlo identifies the major customer segments within its customer base and the unique shopping behaviors within those segments, which helps to guide messaging to different market audiences.
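Customer segmentation of this kind is often done with a clustering algorithm. The sketch below is a plain-Python k-means with two clusters, not Uniqlo's method; the customers, the two features (visits per month and average spend per visit) and the starting centroids are all hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Group 2-D points around k centroids (k = len(centroids))."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for x, y in points:
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[distances.index(min(distances))].append((x, y))
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical customers: (visits per month, average spend per visit).
customers = [(1, 20), (2, 25), (1, 22), (8, 90), (9, 100), (10, 95)]

centroids, clusters = kmeans(customers, centroids=[customers[0], customers[3]])
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

In practice the features would usually be scaled first; otherwise the feature with the larger range (here, spend) dominates the distance.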
DATA SCIENCE
DISCOVERY OF DATA INSIGHT

• Procter & Gamble uses time series models to understand future demand more clearly, which helps it plan production levels more optimally.
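Procter & Gamble's actual models are not described here, so the sketch below uses one of the simplest time series forecasting methods, simple exponential smoothing, as a stand-in. The monthly demand figures and the smoothing parameter alpha = 0.5 are hypothetical.

```python
def exponential_smoothing_forecast(demand, alpha):
    """One-step-ahead forecast using simple exponential smoothing.

    Each new level is a weighted average of the latest observation (weight alpha)
    and the previous level (weight 1 - alpha).
    """
    level = demand[0]  # start the level at the first observation
    for observed in demand[1:]:
        level = alpha * observed + (1 - alpha) * level
    return level

# Hypothetical monthly demand (units) for one product.
monthly_units = [100, 110, 105, 120, 125, 130]

forecast = exponential_smoothing_forecast(monthly_units, alpha=0.5)
print(f"forecast for next month: {forecast:.1f} units")
```

A larger alpha makes the forecast react faster to recent changes, while a smaller alpha smooths out noise; real demand planning would typically also model trend and seasonality.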
DATA SCIENCE
DISCOVERY OF DATA INSIGHT

• Amazon's recommendation engines suggest items for you to buy, as determined by their algorithms.
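A very simple way to produce "customers who bought this also bought" suggestions is to count how often items appear in the same basket. The sketch below does that; it is a simplified stand-in for illustration, not Amazon's production algorithm, and the purchase baskets are hypothetical.

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase history: one set of items per order.
baskets = [
    {"laptop", "mouse", "laptop bag"},
    {"laptop", "mouse"},
    {"laptop", "mouse", "keyboard"},
    {"laptop", "laptop bag"},
    {"phone", "phone case"},
]

# Count how often each ordered pair (a, b) appears in the same basket.
co_counts = Counter()
for basket in baskets:
    for a, b in permutations(basket, 2):
        co_counts[(a, b)] += 1

def recommend(item, k=2):
    """Return up to k items most often bought together with `item`."""
    scores = Counter({b: n for (a, b), n in co_counts.items() if a == item})
    return [b for b, _ in scores.most_common(k)]

print(recommend("laptop"))  # ['mouse', 'laptop bag']
```

An item that has never been bought simply gets no recommendations, one reason real systems combine several signals.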
DATA SCIENCE
DISCOVERY OF DATA INSIGHT

• Computer vision used for self-driving cars is also a data product: machine learning algorithms are able to recognize traffic lights, other cars on the road, pedestrians, etc.
1.2 HISTORY OF DATA SCIENCE

• Although data science isn’t a new profession, it has evolved considerably over the last 50 years. A trip into the history of data science reveals a long and winding path that began as early as 1962, when mathematician John W. Tukey predicted the effect of modern-day electronic computing on data analysis as an empirical science.
1.2 HISTORY OF DATA SCIENCE

1962: John W. Tukey writes in “The Future of Data Analysis”:

“For a long time I thought I was a statistician, interested in inferences from the particular to the general. But as I have watched mathematical statistics evolve, I have had cause to wonder and doubt… I have come to feel that my central interest is in Data Analysis.”
1.2 HISTORY OF DATA SCIENCE

• Data science has been officially accepted as a field of study since 2011; different or related names had been in use since 1962.
• In 2008, DJ Patil and Jeff Hammerbacher introduced the term 'data scientist'.

Jeff Hammerbacher, Chief Scientist, Cloudera, and DJ Patil, Entrepreneur-in-Residence, Greylock Ventures
Hammerbacher and Patil coined the term “data scientist.” Now it’s Silicon Valley's hottest job title. The two built the first formal data science teams at Facebook and LinkedIn, respectively. Now at Cloudera, Hammerbacher has been key to driving the success of Hadoop as a standard tool for processing large, unstructured data sets with a network of commodity computers. As Data Scientist in Residence at Greylock, Patil is seeking out the next generation of hot data-driven startups.
1.2 HISTORY OF DATA SCIENCE
DATA SCIENCE TASK

• Task 1: Research the history of data science. Then draw a diagram, picture or mind map to summarise it. (Individual)
• Submit through Google Classroom by the next class.


1.3 THE IMPORTANCE OF DATA SCIENCE

Data science is needed for:
• Better decisions: whether A or B?
• Predictive analysis: what will happen next?
• Pattern discovery: is there any hidden information in the data?
• Reducing costs
• Development of enabling technology
• Fulfilling customers' expectations
• Effective use of resources
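The question "whether A or B?" is often answered with an A/B test. The sketch below applies a standard two-proportion z-test to compare the conversion rates of two options; the visitor and conversion counts are hypothetical, and the 5% significance level is a common convention rather than a rule from this chapter.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for H0: both options have the same rate."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the null hypothesis that A and B perform equally.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 1000 visitors saw each version of a web page.
z, p = two_proportion_z_test(successes_a=120, n_a=1000, successes_b=160, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
if p < 0.05:
    print("Evidence of a difference: prefer the option with the higher rate.")
else:
    print("No clear difference at the 5% level.")
```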
1.3 THE IMPORTANCE OF DATA SCIENCE

For an industry, data science is needed for:
• Reducing costs
• Recognising and penetrating new market opportunities
• Identifying new demographics
• Gauging the effectiveness of a marketing campaign
• Increasing its competitive advantage
• Launching an innovative new product or service
1.3 THE IMPORTANCE OF DATA SCIENCE

For an organisation, data science is needed for:
• Empowering management and officers to make better decisions
• Directing actions based on trends, which in turn help to define goals
• Challenging staff to adopt best practices and focus on issues that matter
• Identifying opportunities
• Making decisions with quantifiable, data-driven evidence
• Testing these decisions
• Identifying and refining target audiences
• Recruiting the right talent for the organization
DATA SCIENCE
GOOGLE

• Google is an expert in big data.
• Many open source tools and technologies have been developed and are widely used in the big data ecosystem.
• Many different big data techniques are used to process millions of websites and petabytes of data and provide the right answer within milliseconds.
• So far, Google is the biggest employer of trained data scientists and offers the best salaries.
DATA SCIENCE
AMAZON

• Amazon is a global e-commerce and cloud computing company.
• It hires data scientists on a large scale.
• Its data scientists explore the customer mindset and enhance the geographical reach of both its e-commerce and cloud domains.
DATA SCIENCE
VISA

• Visa is an online gateway for most companies and processes transactions worth hundreds of millions in a single day.
• Customer-centricity is the new policy, and customers have the opportunity to shop anywhere and anytime.
• Hence, data scientists are in huge demand at Visa to generate more revenue, check for fraudulent transactions and design products and services to fulfil customer needs.
1.4 DATA SCIENTIST SKILL SET

• According to Roland Van Loon (2019) from Simplilearn, a data scientist needs both technical and non-technical skills.
1.4 DATA SCIENTIST SKILL SET
WHO IS A DATA SCIENTIST?

• A data scientist is a person who is multi-skilled in mathematics/statistics and computer programming and has domain expertise.
1.4 DATA SCIENTIST SKILL SET
WHO IS A DATA SCIENTIST?

• According to Alison Doyle (2019), a data scientist is a multi-skilled person with analytical skills, mathematics, programming, open-mindedness and communication.

• Analytical Skills: Big Data; Data Analysis; Data Analytics; Data Science; Predictive Modelling; Data Mining; Data Visualization
• Open-Mindedness: Adaptability; Decision Making; Critical Thinking; Logical Thinking; Problem Solving
• Communication: Assertiveness; Collaboration; Consulting; Consensus; Facilitating; Leadership; Professionalism; Verbal/Written Communications
• Mathematics: Statistics; Construct Algorithms; Linear Algebra; Machine Learning; Multivariable Calculus
• Programming and Technical Proficiencies: Microsoft Excel; Python/R/C++/Java; MATLAB; SQL; NoSQL; Tableau
WHAT DOES A DATA SCIENTIST DO?
A data scientist's job is to analyze data for actionable insights by doing the following tasks:
• Identifying and asking the questions to be solved.
• Devising and applying models and algorithms for mining big data in structured and unstructured forms.
• Cleaning and validating data to ensure accuracy, completeness and uniformity.
• Analyzing the data to identify patterns and trends.
• Communicating findings to stakeholders using visualization and other means.
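The cleaning and validation step can be illustrated with a small example. The Python sketch below assumes hypothetical survey records and one possible set of rules: names and states are trimmed and given consistent capitalisation (uniformity), records missing a name or a readable age are rejected (completeness), and ages outside an assumed range of 0 to 120 are rejected (accuracy). A simple count per state then gives a first look for patterns.

```python
from collections import Counter

# Hypothetical raw survey records with typical quality problems.
raw_records = [
    {"name": " Aisyah ", "state": "pahang", "age": "21"},
    {"name": "Daniel", "state": "PAHANG", "age": "not given"},
    {"name": "Mei Ling", "state": "Selangor ", "age": "19"},
    {"name": "", "state": "Johor", "age": "22"},
    {"name": "Ravi", "state": "johor", "age": "250"},
    {"name": "Farah", "state": " Pahang", "age": "23"},
]

def clean(records, min_age=0, max_age=120):
    """Split records into cleaned ones and rejected ones."""
    cleaned, rejected = [], []
    for record in records:
        name = record["name"].strip()
        state = record["state"].strip().title()  # uniformity: same spelling and case
        try:
            age = int(record["age"])
        except ValueError:
            age = None  # the age could not be read as a whole number
        # Completeness: a name and an age must be present.
        # Accuracy: the age must fall inside an assumed plausible range.
        if name and age is not None and min_age <= age <= max_age:
            cleaned.append({"name": name, "state": state, "age": age})
        else:
            rejected.append(record)
    return cleaned, rejected

cleaned, rejected = clean(raw_records)
# A first look for patterns: how many valid respondents come from each state.
state_counts = Counter(r["state"] for r in cleaned)
print(state_counts)
print(f"{len(rejected)} record(s) rejected")
```

Rejected records are kept rather than silently dropped, so they can be inspected and possibly corrected at the source.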
FUTURE DATA SCIENTIST IS BORN IN UMP

Today is the first step...
THANK YOU
