
Machine Learning & Data Science: Sylvia Unwin Faculty, Program Chair Assistant Dean, iBIT

The document discusses key topics from a Machine Learning and Data Science conference, including: 1) Avoiding common pitfalls like thinking more data alone will solve problems, not clearly defining the problem, and scaling AI technologies too quickly without understanding data or use cases. 2) Understanding which machine learning algorithms like logistic regression, neural networks, and decision trees are best suited for different types of supervised and unsupervised learning problems. 3) The importance of evaluating models, understanding data, and taking action with machine learning and AI rather than just focusing on development.

Uploaded by

Sof Yan

Machine Learning & Data Science
Sylvia Unwin
Faculty, Program Chair
Assistant Dean, iBIT
Machine Learning
• Attended TDWI in Oct 2017
• Focus on Machine Learning, Data
Science, Python, AI
• Started with a catchy opening speech
– “BS-Free AI For Business”
– Top 5 BS List
AI
• What’s the BS?
– AI is first
– According to the speaker, doesn’t solve
a necessary real-world problem
– Startups (investments) in scaling AI
– Doesn’t show ROI without promise of
more, perfect and better data
Avoid
• Big data problem will only provide a
small data solution
– Thinking more data will solve the
problem (if perfect data, will work)
• Not defining the problem
– Be specific (reduce waste by 10x)
• Know who owns the data
• Avoid scaling too quickly
Avoid
• Black boxes
– Trust requires transparency
– No technical explanations (too many
acronyms), no invented scores
• Inaction
– “nothing will happen, if no action is
taken”
Why AI
• Be aware of your focus
• Understand the data (common
theme)
• Scalability
• Take action
Machine Learning using Python
• Machine Learning:
– Continuously improving models
– Cost reduction
– Classification of space data
• Definitions of various models
– Regression - Pattern Recognizer
– Classification
– Clustering
Classification
• Supervised
– Trained with data, fully labeled, user
involved with training
• Unsupervised
– No labeled training data; the computer
groups items with similar attributes
(characteristics) using techniques such
as clustering
• Discrete vs Continuous values
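The difference between the two approaches can be sketched in a few lines of plain Python. This is only an illustration, not material from the talk: the function names (`fit_centroids`, `predict`, `kmeans_1d`) and the toy numbers are made up. The supervised part learns from labeled values; the unsupervised part receives the same values without labels and groups them on its own.

```python
from statistics import mean

def fit_centroids(values, labels):
    """Supervised: average the values seen for each known label."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: mean(vs) for label, vs in groups.items()}

def predict(centroids, value):
    """Assign the label whose average is closest to the new value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

def kmeans_1d(values, k=2, iterations=10):
    """Unsupervised: group unlabeled values around k moving centers."""
    # Hypothetical initialization: pick k evenly spaced sorted values.
    centers = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)),
                          key=lambda i: abs(centers[i] - value))
            clusters[nearest].append(value)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Toy data invented for this sketch.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["low", "low", "low", "high", "high", "high"]

model = fit_centroids(values, labels)   # needs the labels
print(predict(model, 4.0))
print(kmeans_1d(values, k=2))           # never sees the labels
```

In the supervised case the user supplies the labels; in the unsupervised case the groups come out of the data alone and still have to be interpreted by a person.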
Understand Which Algorithm to Use
               Categorical (Discrete)   Continuous
Supervised     Classification           Regression
Unsupervised   Clustering
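The grid can also be read as a simple lookup. The sketch below is an illustration only; the names `TASK_GRID` and `suggest_task` are hypothetical, and the unsupervised/continuous cell is left empty because the slide does not fill it in.

```python
# Grid copied from the slide; the unsupervised/continuous cell is blank
# in the source, so it is left as None rather than filled in.
# Keys are (has_labels, output_is_continuous).
TASK_GRID = {
    (True, False): "classification",
    (True, True): "regression",
    (False, False): "clustering",
    (False, True): None,
}

def suggest_task(has_labels, output_is_continuous):
    """Return the task family for a problem, or None if the slide gives none."""
    return TASK_GRID[(has_labels, output_is_continuous)]

# Predicting a house price: labeled data, continuous output.
print(suggest_task(has_labels=True, output_is_continuous=True))
```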
Algorithms
• Logistic Regression
– Simple, large scale, can be parallelized
• Neural Networks
– Unstructured data, no limit to
complexity, good on large datasets
• Decision Trees
– Easy to interpret, fast prediction, rules
based
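To show why logistic regression is described as simple, here is a minimal single-feature sketch in plain Python, trained by batch gradient descent. It is not code from the talk: the function names, learning rate, epoch count, and the centered toy data are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit weight w and bias b for one feature by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def predict_label(w, b, x):
    """Return 1 when the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

# Toy, centered feature values invented for this sketch:
# negative values belong to class 0, positive values to class 1.
xs = [-4.0, -3.0, -2.0, -1.0, 1.0, 2.0, 3.0, 4.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(xs, ys)
print(predict_label(w, b, -2.5), predict_label(w, b, 2.5))
```

Each training pass is a sum over the data rows, and sums of that kind are what can be split across machines when people say the method parallelizes.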
Evaluate Model
• All data available
– Split to training and testing data
• Run through the model
• Output
– Train model on the training data,
measure performance on the testing data
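The split-train-measure loop above can be sketched as follows. This is a plain-Python illustration with a deliberately trivial model; the helper names (`train_test_split`, `fit_threshold`, `accuracy`), the 25% test fraction, and the data are assumptions, not something shown in the talk.

```python
import random
from statistics import mean

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def fit_threshold(train):
    """Toy model: a threshold halfway between the two class averages."""
    mean_0 = mean(x for x, label in train if label == 0)
    mean_1 = mean(x for x, label in train if label == 1)
    return (mean_0 + mean_1) / 2

def accuracy(threshold, test):
    """Fraction of held-out rows the threshold classifies correctly."""
    correct = sum(1 for x, label in test if (x > threshold) == (label == 1))
    return correct / len(test)

# Invented data: class 0 has small values, class 1 has large values.
rows = [(x, 0) for x in range(10)] + [(x, 1) for x in range(20, 30)]

train, test = train_test_split(rows)
threshold = fit_threshold(train)          # the model sees only training rows
print(len(train), len(test), accuracy(threshold, test))
```

The point of holding out the test rows is that performance is measured on data the model never saw during training.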
Examples
• Predict Price of houses
• Book recommendation
• Petal vs Sepal of Iris
• Walmart – beer & diapers
Other
• Confusion Matrix
– For a binary problem, shows how the
model is wrong (false positives vs.
false negatives)
• Train/Test
– Cross-validation: split data into slices,
assess on a different slice each time,
then average the results
• More data or a more complex model?
– Build a learning curve
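A rough sketch of the first two items in plain Python follows. The function names, the dictionary keys (`tp`, `fp`, `fn`, `tn`), and the choice of contiguous slices are assumptions made for illustration; the `fit` and `score` arguments stand in for whatever model and metric are being assessed.

```python
def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p == 1:
            counts["tp" if a == 1 else "fp"] += 1
        else:
            counts["fn" if a == 1 else "tn"] += 1
    return counts

def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for k contiguous slices of range(n)."""
    # Spread any remainder over the first slices so sizes differ by at most 1.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def cross_validate(xs, ys, k, fit, score):
    """Train on k-1 slices, score on the remaining one, and average."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model,
                            [xs[i] for i in test],
                            [ys[i] for i in test]))
    return sum(scores) / len(scores)

# Invented labels and predictions for a binary problem.
print(confusion_matrix([1, 0, 1, 1, 0], [1, 1, 0, 1, 0]))
print(list(k_fold_indices(5, 2)))
```

The confusion matrix separates the two ways of being wrong, which a single accuracy number hides. The learning curve mentioned in the last bullet is not covered by this sketch.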
Jupyter Navigator
• Jupyter Notebook
• Examples in Python
• Not enough time
Data Visualization
• Know your audience
• Mechanism for feedback
• How to direct the focus
– Charts, images
• Develop a sense of storytelling
• Know your data
– Relationship to user
• Be creative
Data Science
• May be a data artist
– Problem & data = acceptable solution
• Storytelling
– Make the analytics tell a more focused
story
• Don’t undervalue hands-on
experience
• Target something useful
• Analytics is AI
Robotics & AI
• Validated topics introduced
– Statistics
– Data Analytic techniques
– Data visualization
• Not all science, there is some art
• Python programming
• “AI is first”
