LM #01-Introduction To ML


Machine Learning: Introduction

Dr. Aadam S.O. Olatunji
Associate Professor of Computer Science, IAU, Dammam, KSA.

With some contents selectively adapted from:
Ethem Alpaydin, Introduction to Machine Learning, © The MIT Press, 2014
11/17/2024 1
http://www.cmpe.boun.edu.tr/~ethem/i2ml3e
A Few Quotes
• “A breakthrough in machine learning would be worth
ten Microsofts” (Bill Gates, Chairman, Microsoft)
• “Machine learning is the next Internet”
(Tony Tether, Director, DARPA)
• “Machine learning is the hot new thing”
(John Hennessy, President, Stanford)
• “Web rankings today are mostly a matter of machine
learning” (Prabhakar Raghavan, Dir. Research, Yahoo)
• “Machine learning is going to result in a real revolution”
(Greg Papadopoulos, CTO, Sun)
• “Machine learning is today’s discontinuity”
(Jerry Yang, CEO, Yahoo)
Preamble:
What is Machine Learning?
• Automating automation
• Getting computers to program themselves
• Writing software is the bottleneck
• Let the data do the work instead!

Preamble (contd.)
Traditional Programming
Data + Program → Computer → Output

Machine Learning
Data + Output → Computer → Program
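The contrast between the two diagrams can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the spam-by-link-count task, the function names, and the tiny training set are all hypothetical and not part of the original slides. In the first half a human writes the rule; in the second half the rule (a threshold) is produced from data and desired outputs.

```python
# Traditional programming: a human writes the program (the rule) by hand.
def is_spam_rule(num_links):
    return num_links > 3  # threshold picked by the programmer (hypothetical)


# Machine learning: data + desired outputs go in, a program comes out.
def learn_threshold(examples):
    """Return the threshold t for the rule `x > t` with the fewest training errors."""
    xs = sorted({x for x, _ in examples})
    candidates = [xs[0] - 1] + xs  # one candidate below all data, then each data value

    def errors(t):
        return sum((x > t) != label for x, label in examples)

    return min(candidates, key=errors)


def make_classifier(threshold):
    """Wrap the learned threshold as a callable 'program'."""
    return lambda x: x > threshold


# Hypothetical training data: (number of links in an email, is it spam?)
training_data = [(0, False), (1, False), (2, False), (5, True), (7, True), (9, True)]
threshold = learn_threshold(training_data)
learned_program = make_classifier(threshold)
```

Here `learned_program` plays the role of the "Program" box in the machine learning diagram: nobody typed its threshold in; it was selected because it best reproduces the given outputs.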
Preamble:
Magic?
No, more like gardening

• Seeds = Algorithms
• Nutrients = Data
• Gardener = You
• Plants = Programs

What is Machine Learning?
• “The goal of machine learning is to build
computer systems that can adapt and learn
from their experience.”
– Tom Dietterich

What is Machine Learning? (contd.)
• Adapt to / learn from data
– To optimize a performance function
• Can be used to:
– Extract knowledge from data
– Learn tasks that are difficult to formalise
– Create software that improves over time
What is Machine Learning? (contd.)
• Machine Learning
– Study of algorithms that improve their performance at some task with experience
• Optimize a performance criterion using example data or past experience.
• Role of Statistics: Inference from a sample
• Role of Computer Science: Efficient algorithms to
– Solve the optimization problem
– Represent and evaluate the model for inference
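As a minimal sketch of "optimizing a performance criterion using example data", the snippet below fits a straight line to a handful of points by ordinary least squares. The choice of a linear model, mean squared error as the criterion, and the sample values are assumptions made purely for illustration; the slides do not prescribe any particular model.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = w * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


def mean_squared_error(w, b, xs, ys):
    """The performance criterion: average squared prediction error."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical example data generated from y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
error_fitted = mean_squared_error(w, b, xs, ys)
error_untrained = mean_squared_error(0.0, 0.0, xs, ys)
```

The statistics side is the inference of `w` and `b` from a sample; the computer science side is the algorithm that solves the optimization and then evaluates the fitted model.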

Another Definition of Machine Learning
• Machine Learning algorithms discover the relationships between the variables of a system (input, output and hidden) from direct samples of the system
• These algorithms originate from many fields:
– Statistics, mathematics, theoretical computer science, physics, neuroscience, etc.
A Generic System
x1, x2, ..., xN → System (hidden: h1, h2, ..., hK) → y1, y2, ..., yM

Input Variables: x = (x1, x2, ..., xN)
Hidden Variables: h = (h1, h2, ..., hK)
Output Variables: y = (y1, y2, ..., yM)
Big Data – A unique justification for ML
• Widespread use of personal computers and wireless communication leads to “big data”
• We are both producers and consumers of data
• Data is not random; it has structure, e.g., customer behavior
• We need “big theory” to extract that structure from data for
(a) Understanding the process
(b) Making predictions for the future
What We Talk About When We Talk About “Learning”
• Learning general models from data of particular examples
• Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce.
• Example in retail: from customer transactions to consumer behavior:
People who bought “Blink” also bought “Outliers” (www.amazon.com)
• Build a model that is a good and useful approximation to the data.
Why “Machine Learning”? Or Why “Learn”?
• Machine learning is programming computers to optimize a
performance criterion using example data or past
experience.
• There is no need to “learn” to calculate payroll
• Learning is used when:
– Human expertise does not exist (navigating on Mars),
– Humans are unable to explain their expertise (speech
recognition)
– Solution changes in time (routing on a computer
network)
– Solution needs to be adapted to particular cases (user biometrics, a user's medical vital signs, etc.)
Why Machine Learning? (contd.)
• Economically efficient
• Can consider larger data spaces and hypothesis spaces than people can
• Can formalize the learning problem to explicitly identify/describe goals and criteria
Facts Encouraging ML…
• Learning general models from data of particular examples
• Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce
• Example in retail: from customer transactions to consumer behavior (recommender system):
Customers who bought “Advances in Knowledge Discovery and Data Mining” also bought “Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations” (www.amazon.com)
• Build a model that is a good and useful approximation to the data
Successful Machine Learning Applications
• Speech recognition
– Telephone menu navigation
• Computer vision
– Mail sorting
• Bio-surveillance
– Identifying disease outbreaks
• Robot control
– Autonomous driving
• Empirical science
• Information extraction
• Social networks
• Debugging
• [Your favorite areas]…
Applications (contd.)
• Speech and hand-writing recognition
• Autonomous robot control
• Data mining and bioinformatics: motifs, alignment, …
• Playing games
• Fault detection
• Clinical diagnosis
• Spam email detection
• Retail: Market basket analysis, Customer relationship
management (CRM)
• Finance: Credit scoring, fraud detection
• Manufacturing: Optimization, troubleshooting
• Medicine: Medical diagnosis
• Telecommunications: Quality of service optimization
• Web mining: Search engines
• etc.
Applications are diverse but methods are generic
When are ML algorithms NOT needed?
• When the relationships between all system variables (input, output, and hidden) are completely understood!
• This is NOT the case for almost any real system!
Relevant disciplines
• Algorithms
• Artificial intelligence
• Control
• Statistics
• Information theory
• Dynamical systems
• Neurobiology
• Signal processing
• Linear algebra
• Etc.
Researchers in machine learning come from a variety of backgrounds.
Relevant emerging Disciplines
• Machine learning: features are extracted manually, typically from simple data sets (e.g., text)
• Deep learning: features are extracted from videos or images
Machine Learning Paradigms / Sub-Fields of ML / Application Areas of ML
• Association
– Basket analysis: P(Y | X) is the probability that somebody who buys X also buys Y, where X and Y are products/services.
– Example: P(chips | biscuit) = 0.7
• Supervised Learning
– Classification
– Regression
• Unsupervised Learning
– Clustering / dimensionality reduction
• Semi-supervised Learning
– Co-training (mix a small labeled data set with a large unlabeled one)
– Active learning (interactive supervised learning)
• Reinforcement Learning
– Learning a policy: a sequence of outputs
– Feedback could be: positive, negative, punishment, and extinction.
– Learns from mistakes: the algorithm learns to react to its environment
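The basket-analysis quantity P(Y | X) listed under Association can be estimated directly from transaction data by counting. The sketch below assumes a small hypothetical list of shopping baskets (not taken from the slides, so the estimate differs from the 0.7 used in the example) and a helper name, `conditional_probability`, chosen for illustration.

```python
def conditional_probability(transactions, x, y):
    """Estimate P(y | x): among baskets containing x, the fraction that also contain y."""
    with_x = [basket for basket in transactions if x in basket]
    if not with_x:
        raise ValueError(f"no transaction contains {x!r}")
    return sum(1 for basket in with_x if y in basket) / len(with_x)


# Hypothetical transactions; each set is one customer's basket.
transactions = [
    {"biscuit", "chips"},
    {"biscuit", "chips", "juice"},
    {"biscuit", "milk"},
    {"chips", "juice"},
    {"biscuit", "chips", "milk"},
]

# Four baskets contain biscuit; three of those also contain chips.
p_chips_given_biscuit = conditional_probability(transactions, "biscuit", "chips")
```

With real retail data the same count-and-divide estimate is what sits behind "customers who bought X also bought Y" recommendations, although production systems typically add support and confidence thresholds.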

Tools & Modules
• Implementation:
• For references and bibliography:
Task 1: Software Installation
Deadline: 27th Jan, 2020
• Install Mendeley reference manager
• Brainstorming among team members towards selecting preferred tools
– Here you discuss several options and choose your preferred software
– However, Weka (recommended for beginners) is to be installed by all, for easier and faster application runs
– (Please complete at least Mendeley and Weka by the deadline!)
Task 2: Project Selection Document
Deadline: 30th Jan, 2020
• Suggested title of your proposed project
• Introduction
• Brainstorming
– Here you list at least three topics or datasets and give the brainstorming outcomes that led to the final selection of only one of them.
• The selected topic (the option chosen after discussing at least 3 alternatives during the brainstorming stage)
– Description of the topic and its benefits or advantages (what, why, how, and possible expected outcomes)
– Further justification for this topic based on a brief literature review (Is it a worthwhile topic based on the brief literature search? Or, if someone has already done it and you wish to improve upon it, what improvement are you proposing? Or why is such a project still needed?)
• References
• Please consult your Professor appropriately as you work to complete this task!
Popular Datasets
• UCI Repository (recommended in this course): http://www.ics.uci.edu/~mlearn/MLRepository.html
• UCI KDD Archive: http://kdd.ics.uci.edu/summary.data.application.html
• Statlib: http://lib.stat.cmu.edu/
• Delve: http://www.cs.utoronto.ca/~delve/
• Kaggle: https://www.kaggle.com/datasets
• Open ML: https://www.openml.org/
• AI-Wiki: https://skymind.ai/wiki/open-datasets
Datasets (contd.)
• mldata.org: A public repository for machine learning data
• Wikipedia Database: Webpage for access to complete Wikipedia database dumps
• IMDb Datasets: Webpage for access to IMDb datasets
• Last.fm Datasets: Webpage for access to Last.fm datasets
• Census.gov: US government source of data about the nation's people and economy
• Data.gov: Source of machine-readable datasets generated by the US government
• UK's Office for National Statistics: Source of datasets generated by the UK's Office for National Statistics
• UK's Met Office Data: Climate station records from the UK's National Weather Service
Datasets (contd.)
• CDC Data: Medical data from the Centers for Disease Control and Prevention
• World Bank Catalog: World Bank data
• RealClimate Data: Aggregator for selected sources of code and data related to climate science
• Google Public Data Explorer: Google's public data portal to explore, visualize, and communicate large datasets
• Dataverse Network: Repository for research datasets
• Linked Data: Linkage site for distributed data
• Datamob: Aggregator for public datasets
• Quandl: Search engine for financial, economic, and social datasets
• Data Market: Portal for shared business data
Datasets (contd.)
• CKAN: Open-source data portal platform
• Hilary Mason (bitly) Data Links: Hilary Mason's bookmarked research-quality datasets
• Peter Skomoroch (LinkedIn) Data Links: Peter Skomoroch's bookmarked machine learning data resources
• Jake Hofman Data Links: Jake Hofman's bookmarked computational social science data resources
• Reddit Open Data: Forum on the social news site reddit for open APIs and datasets
• Guardian DataBlog: Data journalism and data visualization from the Guardian
• Free SVG Maps: Website for free geographic maps
• StateMaster: Reference site for data on US states
• Wolfram|Alpha: Computational knowledge engine or answer engine
Resources: Further Datasets
• UCI Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
• Statlib: http://lib.stat.cmu.edu/
• Delve: http://www.cs.utoronto.ca/~delve/
• And others
Resource Journals (samples)
• Journal of Machine Learning Research: www.jmlr.org
– http://www.jmlr.org/mloss/
• Applied Soft Computing: http://www.journals.elsevier.com/applied-soft-computing/
• Informatics in Medicine Unlocked
• International Journal of Medical Informatics
• Expert Systems with Applications
• Applied Computing and Informatics
• Machine Learning
• IEEE Transactions on Neural Networks
• IEEE Transactions on Pattern Analysis and Machine Intelligence
Resource Conferences (samples)
• International Conference on Machine Learning (ICML)
• European Conference on Machine Learning (ECML)
• International Conference on Machine Learning and Data Mining (MLDM)
• Neural Information Processing Systems (NIPS)
• Computational Learning
• International Joint Conference on Artificial Intelligence (IJCAI)
• ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD)
