
Commit b8b932c

Update Cap09
1 parent a6bab80 commit b8b932c

17 files changed

+334744 -0 lines changed

Cap09/Mini-Projeto/Mini-Projeto2 - Analise1.ipynb

Lines changed: 205 additions & 0 deletions

Cap09/Mini-Projeto/Mini-Projeto2 - Analise2.ipynb

Lines changed: 168 additions & 0 deletions

Cap09/Mini-Projeto/Mini-Projeto2 - Analise3.ipynb

Lines changed: 170 additions & 0 deletions

Cap09/Mini-Projeto/Mini-Projeto2 - Analise4.ipynb

Lines changed: 217 additions & 0 deletions

Cap09/Mini-Projeto/dataset/autos.csv

Lines changed: 313984 additions & 0 deletions

Cap09/Notebooks/DSA-Python-Cap09-Analise-Exploratoria-de-Dados.ipynb

Lines changed: 2857 additions & 0 deletions

Cap09/Notebooks/DSA-Python-Cap09-Exercicio-Solucao.ipynb

Lines changed: 1164 additions & 0 deletions
Lines changed: 358 additions & 0 deletions
@@ -0,0 +1,358 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# <font color='blue'>Data Science Academy - Python Fundamentos - Chapter 9</font>\n",
"\n",
"## Download: http://github.com/dsacademybr\n",
"\n",
"## Exercise: Exploratory Data Analysis with Python\n",
"\n",
"In this exercise, you will perform an exploratory analysis of one of the best-known Machine Learning datasets, the iris dataset, which contains measurements of 3 types (species) of iris plants. This dataset is commonly used for Machine Learning classification problems, where the goal is to predict the class of each record. Here, that means predicting a plant's species from its sepal and petal measurements.\n",
"\n",
"Each cell describes the task to perform. Complete the whole exercise, then compare your work with the proposed solution.\n",
"\n",
"Dataset (already included in Scikit-Learn): https://archive.ics.uci.edu/ml/datasets/iris"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Imports\n",
"import time\n",
"import numpy as np\n",
"import pandas as pd\n",
"from matplotlib import pyplot as plt\n",
"from sklearn.datasets import load_iris\n",
"%matplotlib inline\n",
"\n",
"fontsize = 14\n",
"ticklabelsize = 14"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"150\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>sepal length (cm)</th>\n",
" <th>sepal width (cm)</th>\n",
" <th>petal length (cm)</th>\n",
" <th>petal width (cm)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>5.1</td>\n",
" <td>3.5</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>4.9</td>\n",
" <td>3.0</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>4.7</td>\n",
" <td>3.2</td>\n",
" <td>1.3</td>\n",
" <td>0.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4.6</td>\n",
" <td>3.1</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5.0</td>\n",
" <td>3.6</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)\n",
"0 5.1 3.5 1.4 0.2\n",
"1 4.9 3.0 1.4 0.2\n",
"2 4.7 3.2 1.3 0.2\n",
"3 4.6 3.1 1.5 0.2\n",
"4 5.0 3.6 1.4 0.2"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Loading the dataset\n",
"iris = load_iris()\n",
"df = pd.DataFrame(iris.data, columns=iris.feature_names)\n",
"print(len(df))\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data Extraction and Transformation"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# Print the class names of the target variable (what we want to predict):\n",
"# one of 3 possible plant categories: setosa, versicolor, or virginica\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# Print the numeric values of the target variable (what we want to predict):\n",
"# one of 3 possible plant categories, encoded as 0, 1, or 2\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# Add a new column to the dataset with the species names, since that is what we will try to predict (target variable)\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# Add a column to the dataset with the numeric values of the target variable\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# Extract the features (attributes) from the dataset and print them\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# Compute the mean of each feature for each of the 3 classes\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data Exploration"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# Print the transpose of the dataset (turn rows into columns and columns into rows)\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"# Use the dataset's info() method to get a summary of the dataset\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"# Produce a statistical summary of the dataset\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"# Check whether the dataset contains null values\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# Count the occurrences of each sepal length value\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plot"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# Create a histogram of sepal length\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"# Create a scatter plot of sepal length versus row number,\n",
"# with markers colored by the target variable\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"# Create a scatter plot of 2 features (attributes)\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"# Create a scatter matrix of the features (attributes)\n"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"# Create a histogram of all features\n"
]
},
311+
{
312+
"cell_type": "markdown",
313+
"metadata": {},
314+
"source": [
315+
"Conheça a Formação Cientista de Dados, um programa completo, 100% online e 100% em português, com 340 horas, mais de 1.200 aulas em vídeos e 26 projetos, que vão ajudá-lo a se tornar um dos profissionais mais cobiçados do mercado de análise de dados. Clique no link abaixo, faça sua inscrição, comece hoje mesmo e aumente sua empregabilidade:\n",
316+
"\n",
317+
"https://www.datascienceacademy.com.br/pages/formacao-cientista-de-dados"
318+
]
319+
},
320+
{
321+
"cell_type": "markdown",
322+
"metadata": {
323+
"collapsed": true
324+
},
325+
"source": [
326+
"# Fim"
327+
]
328+
},
329+
{
330+
"cell_type": "markdown",
331+
"metadata": {},
332+
"source": [
333+
"### Obrigado - Data Science Academy - <a href=http://facebook.com/dsacademy>facebook.com/dsacademybr</a>"
334+
]
335+
}
336+
],
337+
"metadata": {
338+
"kernelspec": {
339+
"display_name": "Python 3",
340+
"language": "python",
341+
"name": "python3"
342+
},
343+
"language_info": {
344+
"codemirror_mode": {
345+
"name": "ipython",
346+
"version": 3
347+
},
348+
"file_extension": ".py",
349+
"mimetype": "text/x-python",
350+
"name": "python",
351+
"nbconvert_exporter": "python",
352+
"pygments_lexer": "ipython3",
353+
"version": "3.6.4"
354+
}
355+
},
356+
"nbformat": 4,
357+
"nbformat_minor": 1
358+
}
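
The exercise cells in the notebook above are intentionally left empty. One possible way to complete the data extraction and transformation tasks is sketched below, assuming pandas and scikit-learn's `load_iris` as imported in the notebook. It is not the contents of `DSA-Python-Cap09-Exercicio-Solucao.ipynb`, which this commit view does not show, and the column names `species` and `target` are hypothetical choices.

```python
# Sketch of possible answers to the extraction/transformation exercise cells.
# Assumption: the new column names "species" and "target" are illustrative only.
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Names of the target classes: setosa, versicolor, virginica
print(iris.target_names)

# Numeric target values: one code (0, 1 or 2) per row
print(iris.target)

# New column with the species name of each row (index target_names by the codes)
df["species"] = iris.target_names[iris.target]

# New column with the numeric target values
df["target"] = iris.target

# The features are the four measurement columns
features = df[iris.feature_names]
print(features)

# Mean of each feature for each of the 3 classes
class_means = df.groupby("species")[iris.feature_names].mean()
print(class_means)
```

Indexing `iris.target_names` with the integer codes keeps the species column as plain strings, so the later `groupby` needs no categorical handling.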

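The data exploration cells can be answered along similar lines. This sketch assumes the same `df` built from `load_iris`, with a hypothetical `target` column added; note that `DataFrame.info()` prints its summary directly and returns `None`.

```python
# Sketch of possible answers to the data exploration exercise cells.
# Assumption: "target" is a hypothetical column name holding the class codes.
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target

# Transpose: rows become columns and columns become rows (same as df.transpose())
transposed = df.T
print(transposed)

# Concise summary: dtypes, non-null counts and memory usage
df.info()

# Statistical summary: count, mean, std, min, quartiles and max per numeric column
summary = df.describe()
print(summary)

# Number of missing values per column
null_counts = df.isnull().sum()
print(null_counts)

# How many times each distinct sepal length value occurs
sepal_length_counts = df["sepal length (cm)"].value_counts()
print(sepal_length_counts)
```

The iris data loaded this way has no missing values, so every entry of `null_counts` is zero.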

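For the plotting cells, here is a possible sketch using matplotlib and pandas' `scatter_matrix`. The pair of features in the two-feature scatter plot (petal length versus petal width), the bin count, and the colormap are illustrative assumptions. The `matplotlib.use("Agg")` line only lets the sketch run as a standalone script; inside Jupyter, drop it and rely on `%matplotlib inline` as the notebook does.

```python
# Sketch of possible answers to the plotting exercise cells.
import matplotlib
matplotlib.use("Agg")  # script-only assumption: non-interactive backend; omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target  # hypothetical column name
sepal_length = df["sepal length (cm)"]

# Histogram of sepal length (bins=20 is an arbitrary choice)
fig1, ax1 = plt.subplots()
ax1.hist(sepal_length, bins=20)
ax1.set_xlabel("sepal length (cm)")
ax1.set_ylabel("frequency")

# Scatter plot of sepal length versus row number, markers colored by target class
fig2, ax2 = plt.subplots()
points = ax2.scatter(df.index, sepal_length, c=df["target"], cmap="viridis")
ax2.set_xlabel("row number")
ax2.set_ylabel("sepal length (cm)")

# Scatter plot of two features against each other (feature pair is an assumption)
fig3, ax3 = plt.subplots()
ax3.scatter(df["petal length (cm)"], df["petal width (cm)"], c=df["target"])
ax3.set_xlabel("petal length (cm)")
ax3.set_ylabel("petal width (cm)")

# Scatter matrix of the four features; returns a 4x4 array of Axes
axes_matrix = scatter_matrix(df[iris.feature_names], figsize=(10, 10), c=df["target"])

# Histograms of all features, one subplot per column; returns an array of Axes
hist_axes = df[iris.feature_names].hist(figsize=(10, 8))
```

Extra keyword arguments such as `c` are forwarded by `scatter_matrix` to each off-diagonal scatter plot, which is why the class coloring carries over to the matrix.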