Data Science Python
Zakaria KERKAOU
Zakaria.kerkaou@e-polytechnique.ma
What is Data Science ?
• It's a set of methodologies for taking in thousands of forms of data
that are available to us today, and using them to draw meaningful
conclusions.
• Data is being collected all around us. Every like, click, email, credit
card swipe, or tweet is a new piece of data that can be used to
better describe the present or better predict the future.
What can data do ?
• Data can describe our current state.
• It can help detect anomalous events.
• Data can also diagnose the causes of observed events and
behaviours.
• Finally, Data can predict future events.
Why now ?
• We're collecting more data than ever before.
Data science workflow
• In data science, we generally have four steps to any project.
Data collection
Data Preparation
Data exploration and visualization
Experimentation and predictions on the data.
Application of Data science
• Some areas of data science :
Machine learning (ML).
Internet Of Things (IoT).
Deep Learning.
• Example of machine learning:
Fraud detection
• A data science problem begins with a well-defined question:
What is the probability that this transaction is fraudulent?
• A set of example data:
Transactions (from a database) labelled as “valid” or “fraudulent”.
• A new set of data to use our algorithm on:
A new transaction.
• Another example, this time from IoT:
Monitor and auto-detect different activities.
Your smart watch is part of a fast growing field called "the Internet
of Things", also known as IoT, which is often combined with Data
Science.
IoT refers to gadgets that are not standard computers, but still have
the ability to transmit data.
Smart watches.
Internet connected home security systems.
Electronic toll collection systems.
Building energy management systems.
Much more !
IoT data is a great resource for data science projects
Data science roles.
Generally, there are four categories of jobs in data science:
• Data Engineer,
• Data Analyst,
• Data Scientist,
• Machine Learning Scientist.
Data science roles: Data Engineer.
Data engineers control the flow of data:
• They build custom data pipelines and storage systems.
• They act as information architects.
• They maintain data access.
Data engineer tools:
• SQL.
To store and organize data.
• Java, Scala or Python.
Programming languages to process data.
• Shell.
Command line to automate and run tasks.
• Cloud computing.
AWS, Azure, Google Cloud Platform.
Data science roles: Data Analyst.
Data analysts describe the present via data:
• Perform simpler analyses that describe data.
• Create reports and dashboards to summarize data.
• Clean data for analysis.
Data analyst tools:
• SQL.
Retrieve and aggregate data.
• Spreadsheets (Excel, Google Sheets).
Simple analysis.
• BI tools (Power BI, Tableau, Looker).
Dashboards and visualization.
• Sometimes Python.
Data science roles: Data Scientist.
Data Scientists have a strong background in statistics, enabling
them to find new insights from data.
• Versed in statistical methods.
• Run experiments and analyses for insights.
• Traditional machine learning.
Data scientist tools:
• SQL.
Retrieve and aggregate data.
• Python and/or R.
Data science libraries, for example pandas (Python) and tidyverse (R).
Machine learning libraries such as scikit-learn.
Data science roles: Machine Learning Scientist.
Machine learning scientists are similar to data scientists, but with a
machine learning specialization.
• Predictions and extrapolations.
• Classification.
• Deep learning:
Image processing, computer vision.
Natural language processing.
Machine learning scientist tools:
• Python and/or R.
Data science libraries, for example pandas (Python) and tidyverse (R).
Machine learning and deep learning libraries: scikit-learn, TensorFlow,
Spark.
Introduction to Python
What is Python? And What can Python do?
• Python is an interpreted, high-level, general-purpose programming
language often used to build websites and software, automate tasks,
and conduct data analysis.
• As a general-purpose language, it can be used to create a variety of
different programs and isn't specialized for any specific problem.
• Its versatility and beginner-friendliness have made it one of the
most-used programming languages today.
• The current major version of Python is Python 3.
• Python 2 reached end of life on January 1, 2020 and no longer
receives any updates, including security fixes. It survives mainly in
legacy code.
• Python can be used for:
Web development (server-side).
Software development.
Mathematics.
System scripting.
Game development.
Scientific computing and data science.
AI and machine learning.
Google Colaboratory
• Python code can also be run by creating a file with the .py file
extension and running it from the command line.
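As a minimal sketch of the file-based approach, assume the two lines below are saved in a file called hello.py (a hypothetical name chosen for this example). The file can then be run from the command line with `python hello.py` (on some systems the command is `python3`).

```python
# hello.py (hypothetical file name)
message = "Hello, data science!"
print(message)
```

The same two lines can be pasted unchanged into a Google Colaboratory cell.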
In Python there is no command for
declaring a variable. A variable is created
the moment you first assign a value to it.
Variables do not need to be declared with
any particular type, and can even change
type after they have been set.
However, if you want to specify the data
type of a variable, this can be done with
casting.
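The following short sketch illustrates these three points: creation by assignment, changing type, and casting. The variable names are illustrative only.

```python
# A variable is created by its first assignment; no declaration is needed.
x = 5
first_type = type(x).__name__     # "int"

# The same name can later be bound to a value of a different type.
x = "five"
second_type = type(x).__name__    # "str"

# Casting: build a value of a specific type with a constructor function.
a = str(3)      # the text "3"
b = int("3")    # the integer 3
c = float(3)    # the float 3.0

print(first_type, second_type, a, b, c)
```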
Variable names
In Python, a variable name:
Must start with a letter or the underscore character
Cannot start with a number
Can only contain alphanumeric characters and underscores
(A-Z, a-z, 0-9, and _)
Is case-sensitive (age, Age and AGE are three different
variables)
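One way to check these rules is sketched below, using the built-in str.isidentifier() method together with the standard keyword module. The candidate names are made up for the example. Note that isidentifier() alone returns True for reserved words such as class, which still cannot be used as variable names, hence the extra keyword check.

```python
import keyword

candidates = ["age", "_total", "user_2", "2nd_user", "my-var", "my var", "class"]

# A name is usable if it follows the naming rules and is not a reserved word.
validity = {
    name: name.isidentifier() and not keyword.iskeyword(name)
    for name in candidates
}
for name, ok in validity.items():
    print(f"{name!r}: {'valid' if ok else 'invalid'}")

# Names are case-sensitive: these are three different variables.
age, Age, AGE = 1, 2, 3
print(age, Age, AGE)
```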
Built-in Data Types
Python has the following data types built-in by default, in
these categories:
Text Type: str
Numeric Types: int, float, complex
Sequence Types: list, tuple, range
Mapping Type: dict
Set Types: set, frozenset
Boolean Type: bool
Binary Types: bytes, bytearray, memoryview
To get the data type of any object, use the type() function.
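As a quick illustration, the sketch below builds one sample value for each built-in type listed above and asks type() for its name. The sample values themselves are arbitrary.

```python
samples = [
    "hello",               # str
    42,                    # int
    3.14,                  # float
    2 + 3j,                # complex
    [1, 2],                # list
    (1, 2),                # tuple
    range(3),              # range
    {"a": 1},              # dict
    {1, 2},                # set
    frozenset({1, 2}),     # frozenset
    True,                  # bool
    b"hi",                 # bytes
    bytearray(2),          # bytearray
    memoryview(b"hi"),     # memoryview
]

# type(value) returns the class of the object; __name__ is its short name.
type_names = [type(value).__name__ for value in samples]
print(type_names)
```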
Input / output
Python provides built-in functions for performing I/O tasks.
The input() function reads data entered by the user at run time and always returns it
as a str. The print() function displays output on the screen.
Python 3 uses input(); Python 2 used raw_input() for the same purpose.
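A minimal sketch of reading a value and printing a result is shown below. So that it runs without a keyboard, the text the user would type is replaced by a fixed string. The function name describe_age and the prompt text are assumptions made for this example.

```python
def describe_age(raw_text):
    """Convert the text a user typed into a number and build a reply."""
    age = int(raw_text)   # input() returns a str, so convert before doing math
    return f"Next year you will be {age + 1}."

# In an interactive session this line would be:
#     raw_text = input("Enter your age: ")
raw_text = "20"           # stand-in for what the user would type
message = describe_age(raw_text)
print(message)
```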
Lists
Lists are dynamically sized arrays, similar to std::vector in C++ and ArrayList in
Java.
List items are ordered, changeable, and allow duplicate values.
In Python, a list is created by placing elements inside square brackets [], separated
by commas.
We can use the index operator [] to access an item in a list. Indexing starts at 0.
Python allows negative indexing for its sequences. The index of -1 refers to the last
item.
To determine how many items a list has, use the len() function.
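The list operations described above can be sketched as follows. The fruit names are arbitrary example data.

```python
fruits = ["apple", "banana", "cherry", "banana"]   # duplicates are allowed

first = fruits[0]      # indexing starts at 0
last = fruits[-1]      # -1 refers to the last item

fruits[1] = "blueberry"    # lists are changeable: replace an item in place
fruits.append("mango")     # lists grow dynamically

count = len(fruits)
print(first, last, count, fruits)
```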
Arithmetic operators
+    Add two operands, or unary plus
-    Subtract right operand from the left, or unary minus
*    Multiply two operands
/    Divide left operand by the right one (always results in a float)
%    Modulus - remainder of the division of left operand by the right
//   Floor division - quotient rounded down to the nearest whole number
**   Exponent - left operand raised to the power of right
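A short sketch applying each arithmetic operator to the same pair of numbers is given below. The values 7 and 2 are chosen only for illustration.

```python
a, b = 7, 2

results = {
    "a + b": a + b,     # addition
    "a - b": a - b,     # subtraction
    "a * b": a * b,     # multiplication
    "a / b": a / b,     # true division, always a float
    "a % b": a % b,     # remainder
    "a // b": a // b,   # floor division
    "a ** b": a ** b,   # exponent
}
for expression, value in results.items():
    print(f"{expression:7} = {value}")

# Floor division rounds down, which matters for negative numbers.
negative_floor = -7 // 2
# True division returns a float even when the result is a whole number.
even_division = 6 / 2
print(negative_floor, even_division)
```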
Comparison operators
>    Greater than - True if left operand is greater than the right                               x > y
<    Less than - True if left operand is less than the right                                     x < y
==   Equal to - True if both operands are equal                                                  x == y
!=   Not equal to - True if operands are not equal                                               x != y
>=   Greater than or equal to - True if left operand is greater than or equal to the right       x >= y
<=   Less than or equal to - True if left operand is less than or equal to the right             x <= y
Logical operators
and True if both the operands are true x and y
or True if either of the operands is true x or y
not True if operand is false (complements the operand) not x
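The comparison and logical operators can be tried out together, as in the sketch below. The values x = 10 and y = 4 are assumptions chosen for the example.

```python
x, y = 10, 4

# Every comparison evaluates to a bool.
comparisons = {
    "x > y": x > y,
    "x < y": x < y,
    "x == y": x == y,
    "x != y": x != y,
    "x >= y": x >= y,
    "x <= y": x <= y,
}
for expression, result in comparisons.items():
    print(f"{expression:7} -> {result}")

# Logical operators combine or invert boolean results.
both_positive = x > 0 and y > 0     # True only if both sides are true
either_negative = x < 0 or y < 0    # True if at least one side is true
not_equal = not (x == y)            # inverts the result
print(both_positive, either_negative, not_equal)
```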
Bitwise operators
The examples assume x = 10 (0000 1010) and y = 4 (0000 0100).
&    Bitwise AND           x & y = 0 (0000 0000)
|    Bitwise OR            x | y = 14 (0000 1110)
~    Bitwise NOT           ~x = -11 (1111 0101)
^    Bitwise XOR           x ^ y = 14 (0000 1110)
>>   Bitwise right shift   x >> 2 = 2 (0000 0010)
<<   Bitwise left shift    x << 2 = 40 (0010 1000)
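The results listed above are reproduced by x = 10 and y = 4, as the sketch below checks. The format() calls are only there to show the 8-bit patterns. Masking with 0xFF is an illustrative way to display the two's-complement bits of a negative result.

```python
x, y = 10, 4   # 0000 1010 and 0000 0100

bitwise = {
    "x & y": x & y,
    "x | y": x | y,
    "~x": ~x,
    "x ^ y": x ^ y,
    "x >> 2": x >> 2,
    "x << 2": x << 2,
}
for expression, value in bitwise.items():
    # value & 0xFF keeps the lowest 8 bits, so negative results display
    # as their two's-complement pattern.
    print(f"{expression:7} = {value:4}  ({format(value & 0xFF, '08b')})")

not_x_bits = format(~x & 0xFF, "08b")
```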
Assignment operators
Operator   Example    Equivalent to
=          x = 5      x = 5
+=         x += 5     x = x + 5
-=         x -= 5     x = x - 5
*=         x *= 5     x = x * 5
/=         x /= 5     x = x / 5
%=         x %= 5     x = x % 5
//=        x //= 5    x = x // 5
**=        x **= 5    x = x ** 5
&=         x &= 5     x = x & 5
|=         x |= 5     x = x | 5
^=         x ^= 5     x = x ^ 5
>>=        x >>= 5    x = x >> 5
<<=        x <<= 5    x = x << 5
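As an illustration, the sketch below chains several augmented assignments and records the value after each step. The starting values and right-hand operands are arbitrary.

```python
history = []

x = 5
x += 5;  history.append(x)   # 10
x -= 3;  history.append(x)   # 7
x *= 2;  history.append(x)   # 14
x //= 4; history.append(x)   # 3
x **= 2; history.append(x)   # 9
x %= 5;  history.append(x)   # 4
x /= 8;  history.append(x)   # 0.5 (true division gives a float)

# The bitwise forms work the same way on integers.
n = 6                        # 0110
n &= 3;  history.append(n)   # 2
n |= 8;  history.append(n)   # 10
n ^= 3;  history.append(n)   # 9
n >>= 1; history.append(n)   # 4
n <<= 2; history.append(n)   # 16

print(history)
```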
Identity operators
is       True if the operands are identical (refer to the same object)             x is True
is not   True if the operands are not identical (do not refer to the same object)   x is not True
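The difference between identity (is) and equality (==) is easiest to see with two lists that hold the same values. The sketch below uses made-up variable names.

```python
a = [1, 2, 3]
b = a            # b refers to the same list object as a
c = [1, 2, 3]    # c is a separate list with equal contents

same_object = a is b           # True: one object, two names
equal_values = a == c          # True: the contents match
different_objects = a is not c # True: two distinct objects

# A common use of "is" is checking for None.
value = None
is_missing = value is None

print(same_object, equal_values, different_objects, is_missing)
```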
Membership operators
in True if value/variable is found in the sequence 5 in x
not in True if value/variable is not found in the sequence 5 not in x
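Membership tests work on lists, strings, dictionaries and other containers. The sketch below uses arbitrary example values. For a dictionary, in checks the keys.

```python
x = [1, 2, 3, 4, 5]

five_in_list = 5 in x              # True
nine_missing = 9 not in x          # True
substring = "at" in "data"         # True: substring search in a str
key_present = "name" in {"name": "Ada", "age": 36}   # True: checks keys
value_as_key = "Ada" in {"name": "Ada", "age": 36}   # False: values are not checked

print(five_in_list, nine_missing, substring, key_present, value_as_key)
```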
Conditions and If statements
Conditional statements are required when we want to execute code only if a certain
condition is satisfied.
The if…elif…else statement is used in Python for decision making.
Example: we check whether a number is positive, negative or zero and display an
appropriate message.
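One possible version of this example is sketched below. The function name sign_of and the message texts are assumptions, not taken from the slides.

```python
def sign_of(number):
    """Return a message saying whether number is positive, zero or negative."""
    if number > 0:
        return "Positive number"
    elif number == 0:
        return "Zero"
    else:
        return "Negative number"

for value in (3.4, 0, -7):
    print(value, "->", sign_of(value))
```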
Example: program to add the natural numbers up to n:
sum = 1 + 2 + 3 + ... + n
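A minimal sketch of this program using a while loop is shown below. The function name sum_up_to is an assumption, and the accumulator is called total rather than sum so that Python's built-in sum() function is not shadowed.

```python
def sum_up_to(n):
    """Add the natural numbers 1 + 2 + ... + n with a while loop."""
    total = 0
    i = 1
    while i <= n:
        total += i
        i += 1
    return total

print(sum_up_to(10))

# The same result with a for loop over range(1, n + 1).
for_total = 0
for i in range(1, 11):
    for_total += i
print(for_total)
```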
Python break and continue
In Python, break and continue statements can change the normal flow of a loop.
The break statement terminates the loop containing it.
The continue statement is used to skip the rest of the code inside a loop for the
current iteration only.
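The two statements can be compared on the same loop, as in the sketch below. The word "string" is arbitrary example data.

```python
# break: leave the loop entirely as soon as "i" is reached.
letters_before_i = []
for letter in "string":
    if letter == "i":
        break
    letters_before_i.append(letter)

# continue: skip only the iteration for "i" and carry on with the rest.
letters_without_i = []
for letter in "string":
    if letter == "i":
        continue
    letters_without_i.append(letter)

print(letters_before_i)
print(letters_without_i)
```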