Machine Learning & Data Science
Sylvia Unwin
Faculty, Program Chair
Assistant Dean, iBIT
Machine Learning
• Attended TDWI in Oct 2017
• Focus on Machine Learning, Data Science, Python, AI
• Started with a catchy opening speech
– “BS-Free AI For Business”
– Top 5 BS List
AI
• What’s the BS?
– AI is first
– According to the speaker, it doesn’t solve a necessary real-world problem
– Startups (and investments) focused on scaling AI
– Doesn’t show ROI without the promise of more, better, and perfect data
Avoid
• A big data problem will only provide a small data solution
– Thinking more data will solve the problem (“with perfect data, it will work”)
• Not defining the problem
– Be specific (reduce waste by 10x)
• Know who owns the data
• Avoid scaling too quickly
Avoid
• No black boxes
– Trust requires transparency
– No technical explanations (too many acronyms), no invented scores
• Inaction
– “Nothing will happen if no action is taken”
Why AI
• Be aware of your focus
• Understand the data (common theme)
• Scalability
• Take action
Machine Learning using Python
• Machine Learning:
– Continuously improving models
– Cost reduction
– Classification of space data
• Definitions of various models
– Regression - Pattern Recognizer
– Classification
– Clustering
Classification
• Supervised
– Trained on fully labeled data; the user is involved in training
• Unsupervised
– No labeled training data; the computer groups records with similar attributes (characteristics) using techniques such as clustering
• Discrete vs. continuous values
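A minimal pure-Python sketch of the supervised/unsupervised distinction above. The points are hypothetical toy data, and both techniques are stand-ins chosen for brevity: a nearest-centroid classifier (supervised, learns from the labels and predicts a discrete class) and k-means clustering with farthest-first initialization (unsupervised, groups the same kind of points without any labels). The function names are illustrative, not from the session.

```python
import math

# Hypothetical toy data: 2-D points (e.g., two flower measurements).
labeled = [((1.0, 1.2), "A"), ((0.8, 1.0), "A"), ((5.0, 5.2), "B"), ((5.3, 4.9), "B")]
unlabeled = [(1.1, 0.9), (0.9, 1.1), (5.1, 5.0), (4.8, 5.3)]

def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

# Supervised: labels are known, so "training" here means one centroid per label.
def train_nearest_centroid(data):
    by_label = {}
    for point, label in data:
        by_label.setdefault(label, []).append(point)
    return {label: centroid(pts) for label, pts in by_label.items()}

def predict(model, point):
    # The prediction is the label of the closest centroid: a discrete value.
    return min(model, key=lambda label: math.dist(point, model[label]))

# Unsupervised: no labels, so k-means groups points by similarity alone.
def kmeans(points, k, iterations=10):
    # Farthest-first initialization: start at the first point, then keep adding
    # the point that is farthest from every center chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to its cluster mean (keep the old center if the cluster is empty).
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters

model = train_nearest_centroid(labeled)
print(predict(model, (1.0, 1.0)))
print(kmeans(unlabeled, 2))
```

The classifier needs the labels to exist before it can predict anything; k-means only reports which points belong together and leaves naming the groups to a person.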
Understand Which Algorithm to Use
               Categorical (Discrete)   Continuous
Supervised     Classification           Regression
Unsupervised   Clustering
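The table can be read as a two-question lookup. This is a minimal sketch with a hypothetical helper name, `suggest_task`; it only encodes the cells the table fills in, so any unlabeled problem maps to clustering.

```python
def suggest_task(has_labels: bool, target_is_continuous: bool = False) -> str:
    """Hypothetical helper mirroring the table above."""
    if not has_labels:
        # Unsupervised: no target to predict, so group similar records instead.
        return "clustering"
    # Supervised: a continuous target means regression, a categorical one classification.
    return "regression" if target_is_continuous else "classification"

print(suggest_task(True, target_is_continuous=True))   # e.g., predicting house prices
print(suggest_task(True, target_is_continuous=False))  # e.g., predicting an iris species
print(suggest_task(False))
```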
Algorithms
• Logistic Regression
– Simple, large scale, can be parallelized
• Neural Networks
– Unstructured data, no limit to
complexity, good on large datasets
• Decision Trees
– Easy to interpret, fast prediction, rules-based
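One way to try the three algorithms side by side is scikit-learn on the Iris data the session used as an example. This sketch assumes scikit-learn is installed; the hyperparameters (`max_iter`, `hidden_layer_sizes=(16,)`, `max_depth=3`, a 30% test split) are illustrative choices, not values from the talk. The last step prints the fitted tree as if/else thresholds, which is what "easy to interpret, rules-based" looks like in practice.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Illustrative settings; none of these values come from the session.
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "neural network": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on held-out data
    print(f"{name}: {scores[name]:.2f}")

# The fitted tree's rules, printed as readable feature thresholds.
print(export_text(models["decision tree"], feature_names=list(iris.feature_names)))
```

On a dataset this small the accuracies tend to be close; the more visible difference is that only the tree's decision logic can be read directly.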
Evaluate Model
• Start with all available data
– Split it into training and testing data
• Run the training data through the model
• Output
– Train the model, then measure its performance on the test data
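The evaluation steps above can be sketched in pure Python on a toy house-price regression (house prices are one of the session's examples). Everything here is an assumption for illustration: the data is synthetic (generated from a known line plus uniform noise), the model is a one-feature least-squares line, and mean absolute error stands in for "performance". The helper names are hypothetical.

```python
import random

def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows, then hold out the last test_fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit_line(rows):
    """Ordinary least squares for price = intercept + slope * size."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in rows) / sum((x - mean_x) ** 2 for x in xs)
    return mean_y - slope * mean_x, slope

def mean_absolute_error(model, rows):
    """Average absolute gap between actual and predicted prices."""
    intercept, slope = model
    return sum(abs(y - (intercept + slope * x)) for x, y in rows) / len(rows)

# Hypothetical synthetic data: (size in sq ft, price) from a known line plus noise.
rng = random.Random(42)
data = []
for _ in range(100):
    size = rng.uniform(500, 3000)
    data.append((size, 50_000 + 200 * size + rng.uniform(-5_000, 5_000)))

train, test = holdout_split(data)
model = fit_line(train)  # the model only ever sees the training split
print("intercept, slope:", model)
print("test MAE:", mean_absolute_error(model, test))  # performance on unseen rows
```

Measuring on the held-out rows rather than the training rows is the point of the split: it estimates how the model behaves on data it has not seen.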
Examples
• Predicting house prices
• Book recommendation
• Iris: petal vs. sepal measurements
• Walmart – beer & diapers
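The beer-and-diapers story is the classic market-basket analysis anecdote. A minimal sketch of the usual association-rule measures (support, confidence, lift) follows; the baskets are hypothetical and say nothing about any retailer's real data.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"beer", "diapers"},
    {"beer", "diapers", "chips"},
    {"diapers", "milk"},
    {"milk", "bread"},
    {"chips", "bread"},
    {"beer", "diapers", "milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to the consequent's baseline rate; above 1 suggests a positive association."""
    return confidence(antecedent, consequent) / support(consequent)

print(support({"beer", "diapers"}))
print(confidence({"diapers"}, {"beer"}))
print(lift({"diapers"}, {"beer"}))
```

Lift matters because a popular item shows high confidence with almost anything; dividing by its baseline rate separates a real association from plain popularity.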
Other
• Confusion matrix
– For a binary problem, shows how (and how often) the model is wrong
• Train/Test
– Cross-validation: split the data into slices, assess the model on each slice in turn, then average the results
• More data or a more complex model?
– Build a learning curve
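The three tools above can be sketched in pure Python around a deliberately simple model. All of the following are assumptions for illustration: the data is synthetic (two classes drawn from normal distributions centered at 0 and 2), the "model" is a hypothetical one-feature threshold halfway between the class means, and the rows alternate between classes so every slice contains both. scikit-learn offers production versions (`confusion_matrix`, `cross_val_score`, `learning_curve`).

```python
import random

def confusion_matrix(actual, predicted):
    """Counts for a binary problem (labels 0/1): which ways the model is right and wrong."""
    pairs = list(zip(actual, predicted))
    return {
        "TP": sum(a == 1 and p == 1 for a, p in pairs),
        "FP": sum(a == 0 and p == 1 for a, p in pairs),
        "FN": sum(a == 1 and p == 0 for a, p in pairs),
        "TN": sum(a == 0 and p == 0 for a, p in pairs),
    }

def fit_threshold(rows):
    """Toy model (assumption): a cut-off halfway between the two class means."""
    zeros = [x for x, y in rows if y == 0]
    ones = [x for x, y in rows if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def predict(threshold, xs):
    return [1 if x >= threshold else 0 for x in xs]

def accuracy(threshold, rows):
    preds = predict(threshold, [x for x, _ in rows])
    return sum(p == y for p, (_, y) in zip(preds, rows)) / len(rows)

def cross_validate(rows, k=5):
    """k-fold cross-validation: each slice is the test set once; average the k scores."""
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        scores.append(accuracy(fit_threshold(train), folds[i]))
    return sum(scores) / k, scores

def learning_curve(train, test, sizes):
    """Train on growing prefixes of the training data; record train and test accuracy."""
    curve = []
    for n in sizes:
        threshold = fit_threshold(train[:n])
        curve.append((n, accuracy(threshold, train[:n]), accuracy(threshold, test)))
    return curve

# Hypothetical synthetic data, alternating class 0 and class 1.
rng = random.Random(7)
data = [(rng.gauss(2.0 * label, 1.0), label) for _ in range(100) for label in (0, 1)]

train, holdout = data[:150], data[150:]
threshold = fit_threshold(train)
print(confusion_matrix([y for _, y in holdout], predict(threshold, [x for x, _ in holdout])))
mean_score, fold_scores = cross_validate(data)
print("fold scores:", fold_scores, "mean:", mean_score)
print(learning_curve(train, holdout, sizes=[10, 50, 150]))
```

If test accuracy keeps rising as the training set grows, more data is likely to help; if it flattens while train and test accuracy stay close together, a more complex model is the better next step.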
Jupyter Navigator
• Jupyter Notebook
• Examples in Python
• Not enough time
Data Visualization
• Know your audience
• Mechanism for feedback
• How to direct the focus
– Charts, images
• Develop a sense of storytelling
• Know your data
– Relationship to user
• Be creative
Data Science
• May be a data artist
– Problem & data = acceptable solution
• Storytelling
– Make the analytics tell a more focused story
• Don’t undervalue hands-on experience
• Target something useful
• Analytics is AI
Robotics & AI
• Validated topics introduced
– Statistics
– Data Analytic techniques
– Data visualization
• Not all science; there is some art
• Python programming
• “AI is first”