Commit f45bcbf

Merge pull request dennybritz#123 from BAILOOL/master
Updated link to Sutton's book. Changed Lambda to Gamma in FA
2 parents 74d301c + 9ee6cdd commit f45bcbf
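
Background on the notation behind the rename: in Sutton and Barto's book, γ (gamma) is the per-step discount factor applied to future rewards, whereas λ (lambda) is the trace-decay parameter of TD(λ) and eligibility traces. The `discount_factor` argument in the notebooks below therefore corresponds to γ, as in the discounted return:

```latex
% Discounted return with discount factor gamma, 0 <= gamma <= 1
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots
    = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}
```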

10 files changed: +37 / -59 lines changed


DP/README.md

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@
 
 **Optional:**
 
-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 4: Dynamic Programming
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 4: Dynamic Programming
 
 
 ### Exercises

FA/Q-Learning with Value Function Approximation Solution.ipynb

Lines changed: 12 additions & 28 deletions
@@ -3,9 +3,7 @@
 {
 "cell_type": "code",
 "execution_count": 1,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [],
 "source": [
 "%matplotlib inline\n",
@@ -31,9 +29,7 @@
 {
 "cell_type": "code",
 "execution_count": 2,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [
 {
 "name": "stderr",
@@ -50,9 +46,7 @@
 {
 "cell_type": "code",
 "execution_count": 3,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [
 {
 "data": {
@@ -74,7 +68,7 @@
 "scaler = sklearn.preprocessing.StandardScaler()\n",
 "scaler.fit(observation_examples)\n",
 "\n",
-"# Used to converte a state to a featurizes represenation.\n",
+"# Used to convert a state to a featurizes represenation.\n",
 "# We use RBF kernels with different variances to cover different parts of the space\n",
 "featurizer = sklearn.pipeline.FeatureUnion([\n",
 " (\"rbf1\", RBFSampler(gamma=5.0, n_components=100)),\n",
@@ -88,9 +82,7 @@
 {
 "cell_type": "code",
 "execution_count": 4,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [],
 "source": [
 "class Estimator():\n",
@@ -151,9 +143,7 @@
 {
 "cell_type": "code",
 "execution_count": 5,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [],
 "source": [
 "def make_epsilon_greedy_policy(estimator, epsilon, nA):\n",
@@ -182,9 +172,7 @@
 {
 "cell_type": "code",
 "execution_count": 14,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [],
 "source": [
 "def q_learning(env, estimator, num_episodes, discount_factor=1.0, epsilon=0.1, epsilon_decay=1.0):\n",
@@ -196,7 +184,7 @@
 " env: OpenAI environment.\n",
 " estimator: Action-Value function estimator\n",
 " num_episodes: Number of episodes to run for.\n",
-" discount_factor: Lambda time discount factor.\n",
+" discount_factor: Gamma discount factor.\n",
 " epsilon: Chance the sample a random action. Float betwen 0 and 1.\n",
 " epsilon_decay: Each episode, epsilon is decayed by this factor\n",
 " \n",
@@ -283,9 +271,7 @@
 {
 "cell_type": "code",
 "execution_count": 16,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [
 {
 "name": "stdout",
@@ -305,9 +291,7 @@
 {
 "cell_type": "code",
 "execution_count": 17,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [
 {
 "data": {
@@ -384,9 +368,9 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.5.1"
+"version": "3.5.2"
 }
 },
 "nbformat": 4,
-"nbformat_minor": 0
+"nbformat_minor": 1
 }

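For readers skimming the hunks above: the unchanged notebook code around the "featurizes represenation" comment builds a state featurizer from a StandardScaler plus a FeatureUnion of RBFSamplers. Below is a minimal sketch of that pattern, assuming gym's MountainCar-v0 environment; the helper name `featurize_state`, the sample count, and the extra rbf2–rbf4 components with their gamma values are illustrative assumptions (only rbf1 appears in the diff).

```python
import numpy as np
import gym
import sklearn.pipeline
import sklearn.preprocessing
from sklearn.kernel_approximation import RBFSampler

env = gym.make("MountainCar-v0")

# Sample observations to fit the scaler and the RBF feature maps.
observation_examples = np.array([env.observation_space.sample() for _ in range(10000)])
scaler = sklearn.preprocessing.StandardScaler()
scaler.fit(observation_examples)

# RBF kernels with different variances cover different parts of the state space.
featurizer = sklearn.pipeline.FeatureUnion([
    ("rbf1", RBFSampler(gamma=5.0, n_components=100)),  # shown in the diff
    ("rbf2", RBFSampler(gamma=2.0, n_components=100)),  # assumed for illustration
    ("rbf3", RBFSampler(gamma=1.0, n_components=100)),  # assumed for illustration
    ("rbf4", RBFSampler(gamma=0.5, n_components=100)),  # assumed for illustration
])
featurizer.fit(scaler.transform(observation_examples))

def featurize_state(state):
    """Map a raw state to its RBF feature vector (hypothetical helper name)."""
    scaled = scaler.transform([state])
    return featurizer.transform(scaled)[0]
```

With the four components above, `featurize_state` returns a 400-dimensional feature vector (100 per RBFSampler), which a linear model can then weight per action.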
FA/Q-Learning with Value Function Approximation.ipynb

Lines changed: 11 additions & 19 deletions
@@ -4,7 +4,7 @@
 "cell_type": "code",
 "execution_count": 1,
 "metadata": {
-"collapsed": false
+"collapsed": true
 },
 "outputs": [],
 "source": [
@@ -31,9 +31,7 @@
 {
 "cell_type": "code",
 "execution_count": 2,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [
 {
 "name": "stderr",
@@ -50,9 +48,7 @@
 {
 "cell_type": "code",
 "execution_count": 3,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [
 {
 "data": {
@@ -89,7 +85,7 @@
 "cell_type": "code",
 "execution_count": 4,
 "metadata": {
-"collapsed": false
+"collapsed": true
 },
 "outputs": [],
 "source": [
@@ -149,7 +145,7 @@
 "cell_type": "code",
 "execution_count": 5,
 "metadata": {
-"collapsed": false
+"collapsed": true
 },
 "outputs": [],
 "source": [
@@ -180,7 +176,7 @@
 "cell_type": "code",
 "execution_count": 18,
 "metadata": {
-"collapsed": false
+"collapsed": true
 },
 "outputs": [],
 "source": [
@@ -193,7 +189,7 @@
 " env: OpenAI environment.\n",
 " estimator: Action-Value function estimator\n",
 " num_episodes: Number of episodes to run for.\n",
-" discount_factor: Lambda time discount factor.\n",
+" discount_factor: Gamma discount factor.\n",
 " epsilon: Chance the sample a random action. Float betwen 0 and 1.\n",
 " epsilon_decay: Each episode, epsilon is decayed by this factor\n",
 " \n",
@@ -237,9 +233,7 @@
 {
 "cell_type": "code",
 "execution_count": 20,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [
 {
 "name": "stdout",
@@ -259,9 +253,7 @@
 {
 "cell_type": "code",
 "execution_count": 21,
-"metadata": {
-"collapsed": false
-},
+"metadata": {},
 "outputs": [
 {
 "data": {
@@ -326,9 +318,9 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.4.3"
+"version": "3.5.2"
 }
 },
 "nbformat": 4,
-"nbformat_minor": 0
+"nbformat_minor": 1
 }

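The docstring change ("Gamma discount factor") refers to the gamma used in the TD target inside q_learning. A rough sketch of a single Q-learning step with value function approximation is shown below; the `estimator.predict` / `estimator.update` method names and a `policy` function returning action probabilities are assumptions consistent with the visible signatures, not code taken from the notebook.

```python
import numpy as np

def q_learning_step(env, estimator, policy, state, discount_factor=1.0):
    """One hypothetical Q-learning step with value function approximation."""
    # Choose an action epsilon-greedily from the current value estimates.
    action_probs = policy(state)
    action = np.random.choice(len(action_probs), p=action_probs)

    # Take the action (classic gym API returning a 4-tuple).
    next_state, reward, done, _ = env.step(action)

    # TD target: immediate reward plus the discounted value of the best next action.
    # discount_factor here is the gamma the docstring refers to.
    if done:
        td_target = reward
    else:
        td_target = reward + discount_factor * np.max(estimator.predict(next_state))

    # Move the estimate for (state, action) toward the TD target.
    estimator.update(state, action, td_target)
    return next_state, done
```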
FA/README.md

Lines changed: 4 additions & 2 deletions
@@ -25,8 +25,8 @@
 **Required:**
 
 - David Silver's RL Course Lecture 6 - Value Function Approximation ([video](https://www.youtube.com/watch?v=UoPei5o4fps), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/FA.pdf))
-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 9: On-policy Prediction with Approximation
-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 10: On-policy Control with Approximation
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 9: On-policy Prediction with Approximation
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 10: On-policy Control with Approximation
 
 **Optional:**
 
@@ -35,6 +35,8 @@
 
 ### Exercises
 
+- Get familiar with the [Mountain Car Playground](MountainCar%20Playground.ipynb)
+
 - Solve Mountain Car Problem using Q-Learning with Linear Function Approximation
 - [Exercise](Q-Learning%20with%20Value%20Function%20Approximation.ipynb)
 - [Solution](Q-Learning%20with%20Value%20Function%20Approximation%20Solution.ipynb)

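For the "Q-Learning with Linear Function Approximation" exercise listed in the hunk above, one common pattern is a linear model per action trained incrementally with scikit-learn's SGDRegressor. This is a sketch under assumptions; the class name `LinearQEstimator` and its internals are illustrative and not part of this diff.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

class LinearQEstimator:
    """Hypothetical linear action-value estimator: one SGDRegressor per action."""

    def __init__(self, n_actions, featurize_state, initial_state):
        self.featurize = featurize_state  # e.g. the RBF featurizer sketched earlier
        features = self.featurize(initial_state)
        self.models = []
        for _ in range(n_actions):
            model = SGDRegressor(learning_rate="constant")
            # SGDRegressor must see at least one sample before predict() works.
            model.partial_fit([features], [0.0])
            self.models.append(model)

    def predict(self, state):
        """Return the estimated value of every action in the given state."""
        features = self.featurize(state)
        return np.array([m.predict([features])[0] for m in self.models])

    def update(self, state, action, target):
        """Take one SGD step moving Q(state, action) toward the target."""
        features = self.featurize(state)
        self.models[action].partial_fit([features], [target])
```

Using partial_fit keeps the updates incremental, which matches the online nature of Q-learning.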
Introduction/README.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@
 
 **Required:**
 
-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 1: The Reinforcement Learning Problem
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 1: The Reinforcement Learning Problem
 - David Silver's RL Course Lecture 1 - Introduction to Reinforcement Learning ([video](https://www.youtube.com/watch?v=2pWv7GOvuf0), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/intro_RL.pdf))
 - [OpenAI Gym Tutorial](https://gym.openai.com/docs)

MC/README.md

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@
 
 **Required:**
 
-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 5: Monte Carlo Methods
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 5: Monte Carlo Methods
 
 
 **Optional:**

MDP/README.md

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@
 
 **Required:**
 
-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 3: Finite Markov Decision Processes
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 3: Finite Markov Decision Processes
 - David Silver's RL Course Lecture 2 - Markov Decision Processes ([video](https://www.youtube.com/watch?v=lfHX2hHRMVQ), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MDP.pdf))
 

PolicyGradient/README.md

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@
 
 **Optional:**
 
-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 11: Policy Gradient Methods (Under Construction)
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 11: Policy Gradient Methods (Under Construction)
 - [Deterministic Policy Gradient Algorithms](http://jmlr.org/proceedings/papers/v32/silver14.pdf)
 - [Deterministic Policy Gradient Algorithms (Talk)](http://techtalks.tv/talks/deterministic-policy-gradient-algorithms/61098/)
 - [Continuous control with deep reinforcement learning](https://arxiv.org/abs/1509.02971)

README.md

Lines changed: 2 additions & 2 deletions
@@ -2,7 +2,7 @@
 
 This repository provides code, exercises and solutions for popular Reinforcement Learning algorithms. These are meant to serve as a learning tool to complement the theoretical materials from
 
-- [Reinforcement Learning: An Introduction (2nd Edition)](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf)
+- [Reinforcement Learning: An Introduction (2nd Edition)](http://incompleteideas.net/book/bookdraft2017nov5.pdf)
 - [David Silver's Reinforcement Learning Course](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html)
 
 Each folder in corresponds to one or more chapters of the above textbook and/or course. In addition to exercises and solution, each folder also contains a list of learning goals, a brief concept summary, and links to the relevant readings.
@@ -50,7 +50,7 @@ All code is written in Python 3 and uses RL environments from [OpenAI Gym](https
 
 Textbooks:
 
-- [Reinforcement Learning: An Introduction (2nd Edition)](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf)
+- [Reinforcement Learning: An Introduction (2nd Edition)](http://incompleteideas.net/book/bookdraft2017nov5.pdf)
 
 Classes:

TD/README.md

Lines changed: 3 additions & 3 deletions
@@ -28,14 +28,14 @@
 
 **Required:**
 
-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 6: Temporal-Difference Learning
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 6: Temporal-Difference Learning
 - David Silver's RL Course Lecture 4 - Model-Free Prediction ([video](https://www.youtube.com/watch?v=PnHCvfgC_ZA), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MC-TD.pdf))
 - David Silver's RL Course Lecture 5 - Model-Free Control ([video](https://www.youtube.com/watch?v=0g4j2k_Ggc4), [slides](http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/control.pdf))
 
 **Optional:**
 
-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 7: Multi-Step Bootstrapping
-- [Reinforcement Learning: An Introduction](http://incompleteideas.net/sutton/book/bookdraft2017june.pdf) - Chapter 12: Eligibility Traces
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 7: Multi-Step Bootstrapping
+- [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf) - Chapter 12: Eligibility Traces
 
 
 ### Exercises
