Data Science Workshop - Day 1
(3-Day Workshop)
Types of Data
• Structured
• Unstructured
• Natural language
• Machine-generated
• Graph-based
• Audio, video, and images
• Streaming
Structured Data
Unstructured data
Machine-generated data
Graph-based or network data
Data Science Life Cycle
Phase 1: Business Understanding
Once the model is built, it is ready to be deployed in the real world.
Deployment can happen offline, on the web, in the cloud, or in an Android
or iOS app.
The data science project is then monitored and maintained so that it keeps
working in the long run. If performance degrades, the relevant changes are
made as part of maintenance.
Data Scientist
Python
It is one of the most widely used languages among data scientists for
data science projects and applications.
It is well suited to data analysis, data visualization and machine
learning tasks.
Download
https://www.python.org/downloads/
Install
Download and install Anaconda
Launch Anaconda
Launch Jupyter Notebook
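Once Jupyter Notebook is open, a quick way to confirm that the installation works is to run a small cell that reports which Python interpreter the notebook is using. This is only a minimal sketch; the output depends on the machine and the installed version.

```python
import platform
import sys

# Ask the running interpreter for its version string, e.g. "3.x.y".
python_version = platform.python_version()

# sys.executable is the full path of the interpreter behind this notebook.
print("Python version:", python_version)
print("Interpreter path:", sys.executable)
```

If the cell runs without an error, Python and the notebook are set up correctly.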
Anaconda
num1 = 15
num2 = 12
# adding the two numbers
total = num1 + num2
# printing values
print("Sum of {0} and {1} is {2}".format(num1, num2, total))
Comments
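A short sketch of the comment styles used in Python follows. The function name circle_area and the value of radius are hypothetical, chosen only for illustration.

```python
# A single-line comment starts with a hash sign and runs to the end of the line.
radius = 2  # an inline comment can follow a statement on the same line


def circle_area(r):
    """Return the area of a circle of radius r (this string is a docstring)."""
    return 3.14159 * r * r


area = circle_area(radius)
print(area)
```

Comments are ignored by the interpreter, while a docstring is stored on the function and is shown by help(circle_area).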
name = 'John'
message = f'Hi {name}'
print(message)
Concatenating Python strings
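Two common ways to concatenate strings are the + operator and the str.join method. The sketch below uses made-up example strings.

```python
first = "Data"
second = "Science"

# The + operator joins two strings; any space has to be added explicitly.
joined_plus = first + " " + second

# str.join builds one string from an iterable, using the separator it is called on.
joined_join = " ".join([first, second, "Workshop"])

print(joined_plus)  # Data Science
print(joined_join)  # Data Science Workshop
```

join is usually the better choice when many pieces have to be combined, for example inside a loop.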
>>> 10 == 10
True
>>> 10 == 11
False
>>> "jack" == "jack"
True
>>> "jack" == "jake"
False
inequality
>>> 10 != 10
False
>>> 10 != 11
True
>>> "jack" != "jack"
False
>>> "jack" != "jake"
True
Dictionaries
>>> words={'apple':'red','lemon':'yellow'}
>>> words
{'apple': 'red', 'lemon': 'yellow'}
>>> words['apple']
'red'
>>> words['lemon']
'yellow'
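Beyond lookup by key, a dictionary can be updated and iterated over. The following sketch reuses the words dictionary from above; the extra keys are illustrative only.

```python
words = {'apple': 'red', 'lemon': 'yellow'}

words['grape'] = 'purple'   # assigning to a new key adds an entry
words['apple'] = 'green'    # assigning to an existing key overwrites its value

# get() returns a default instead of raising KeyError for a missing key.
missing = words.get('kiwi', 'unknown')

# items() yields (key, value) pairs in insertion order.
for fruit, colour in words.items():
    print(fruit, colour)
```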
Function
def add(a, b):
    # add the two arguments and return the result
    num3 = a + b
    return num3
# Driver code
num1, num2 = 5, 15
ans = add(num1, num2)
print(f"The addition of {num1} and {num2} results in {ans}.")
Pandas with Python
import numpy as np
import pandas as pd
Series
Series is a one-dimensional labeled array capable of holding any data type (integers,
strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred
to as the index. The basic way to create a Series is:
s = pd.Series(data, index=index)
Here, data can be, among other things:
• a Python dict
• an ndarray
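As a minimal sketch of the ndarray case, the example below builds a Series from a NumPy array with and without an explicit index, and also from a single scalar value. The values and labels are made up for illustration.

```python
import numpy as np
import pandas as pd

values = np.array([0.5, 1.5, 2.5])

# With an explicit index, each label is paired with the value at the same position.
s_array = pd.Series(values, index=["a", "b", "c"])

# Without an index, pandas assigns the default labels 0, 1, 2, ...
s_default = pd.Series(values)

# A scalar is repeated once for every label in the index.
s_scalar = pd.Series(5.0, index=["a", "b", "c"])

print(s_array)
print(s_default)
print(s_scalar)
```

When data is an ndarray, the index must have the same length as the array.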
# Series from a list
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
# Series from a dict (the keys become the index)
d = {"b": 1, "a": 0, "c": 2}
pd.Series(d)
Output
b    1
a    0
c    2
dtype: int64
Pass index
d = {"a": 0.0, "b": 1.0, "c": 2.0}
pd.Series(d)
Output
a    0.0
b    1.0
c    2.0
dtype: float64
pd.Series(d, index=["b", "c", "d", "a"])
Output
b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64
NaN (not a number) is the standard missing data marker used in pandas.
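Missing values can be detected and handled with the Series methods isna, dropna and fillna. The sketch below rebuilds the Series with the missing label "d"; the fill value 0.0 is an arbitrary choice for illustration.

```python
import pandas as pd

d = {"a": 0.0, "b": 1.0, "c": 2.0}
s = pd.Series(d, index=["b", "c", "d", "a"])  # "d" is not in the dict, so it becomes NaN

# isna() marks each missing entry with True.
missing_mask = s.isna()
n_missing = int(missing_mask.sum())

# dropna() returns a copy without the missing entries.
s_dropped = s.dropna()

# fillna() returns a copy with the missing entries replaced.
s_filled = s.fillna(0.0)

print(n_missing)  # 1
print(s_dropped)
print(s_filled)
```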
Pandas Data Frames
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
# load the dict into a DataFrame object
df = pd.DataFrame(data)
print(df)
Locate Row
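Rows can be located by label with the loc attribute. The following sketch assumes the calories and duration DataFrame from the previous example.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data)

# A single label returns that row as a Series.
first_row = df.loc[0]

# A list of labels returns the matching rows as a DataFrame.
first_two = df.loc[[0, 1]]

print(first_row)
print(first_two)
```

Because no index was passed, the row labels here are the default integers 0, 1 and 2.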
# list of strings
lst = ['Geeks', 'For', 'Geeks', 'is',
       'portal', 'for', 'Geeks']
# build a single-column DataFrame from the list
df = pd.DataFrame(lst)
df.head(10)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
ts = pd.Series(np.random.randn(1000),
index=pd.date_range("1/1/2000", periods=1000))
ts = ts.cumsum()
ts.plot();
Plot
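# example data (assumed): four random columns sharing the index of ts
df = pd.DataFrame(np.random.randn(1000, 4),
                  index=ts.index, columns=list("ABCD"))
df = df.cumsum()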
plt.figure();
df.plot();
Bar Plot
plt.figure();
df.iloc[5].plot(kind="bar");
Scatter Matrix
df = pd.DataFrame(np.random.randn(1000, 4),
                  columns=["a", "b", "c", "d"])
pd.plotting.scatter_matrix(df, alpha=0.2, figsize=(6, 6), diagonal="kde");
Density Plot
ser = pd.Series(np.random.randn(1000))
ser.plot.kde();
NumPy
NumPy Operations
Size and Shape
Reshape and Slicing
Minimum, Maximum and Sum
Basic function
import numpy
# example array
arr = numpy.array([1, 2, 3, 4, 5])
print(arr)
NumPy as np
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
Checking NumPy Version
import numpy as np
print(np.__version__)
Create a NumPy ndarray Object
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))
type(): This built-in Python function tells us the type of the object passed to it.
In the code above, it shows that arr is of type numpy.ndarray.
A tuple can also be passed to np.array():
import numpy as np
arr = np.array((1, 2, 3, 4, 5))
print(arr)
Dimensions in Arrays
0-D Arrays
import numpy as np
arr = np.array(42)
print(arr)
1-D Arrays
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
2-D Arrays
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
3-D Arrays
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)
Access Array Elements
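import numpy as np
# example array
arr = np.array([1, 2, 3, 4])
# first element
print(arr[0])
# sum of the third and fourth elements
print(arr[2] + arr[3])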
NumPy Array Slicing
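import numpy as np
# example array
arr = np.array([1, 2, 3, 4, 5, 6, 7])
# elements from index 1 up to, but not including, index 5
print(arr[1:5])
# elements from the start up to, but not including, index 4
print(arr[:4])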
Checking the Data Type of an Array
import numpy as np
# example array
arr = np.array([1, 2, 3, 4])
print(arr.dtype)
Shape of an Array
import numpy as np
# example 2-D array with 2 rows and 4 columns
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape)
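The operations named in the NumPy overview (size and shape, reshape and slicing, minimum, maximum and sum) can be sketched on one small array. The array contents are made up for illustration.

```python
import numpy as np

arr = np.arange(1, 13)  # the integers 1 to 12

print(arr.size)   # 12
print(arr.shape)  # (12,)

# reshape returns the same 12 values arranged as 3 rows and 4 columns.
grid = arr.reshape(3, 4)

# Slicing a 2-D array: rows 0 and 1, columns 1 and 2.
sub = grid[0:2, 1:3]

# Aggregations over the whole array.
smallest = grid.min()
largest = grid.max()
total = grid.sum()

# axis=0 aggregates down each column.
col_sums = grid.sum(axis=0)

print(sub)
print(smallest, largest, total)  # 1 12 78
print(col_sums)
```

reshape only works when the new shape holds exactly as many elements as the original array, here 3 x 4 = 12.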
NumPy Array Iterating
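Iterating means visiting the elements of an array one by one. A minimal sketch with made-up arrays: a for loop over a 1-D array yields scalars, a for loop over a 2-D array yields rows, and np.nditer visits every scalar element regardless of the number of dimensions.

```python
import numpy as np

arr1d = np.array([1, 2, 3])
for x in arr1d:
    print(x)  # each x is a single element

arr2d = np.array([[1, 2, 3], [4, 5, 6]])

# Looping over a 2-D array yields one row (a 1-D array) per step.
rows = [row.tolist() for row in arr2d]

# np.nditer walks over every scalar element of the array.
flat = [int(x) for x in np.nditer(arr2d)]

print(rows)
print(flat)
```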
Joining NumPy Arrays
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)
Splitting NumPy Arrays
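import numpy as np
# example array
arr = np.array([1, 2, 3, 4, 5, 6])
# split into 3 parts
newarr = np.array_split(arr, 3)
print(newarr)
# split into 4 parts; array_split allows parts of unequal length
newarr = np.array_split(arr, 4)
print(newarr)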
7299119900
www.whyglobalservices.com
www.whytap.in
www.abhisoverseas.com