
Commit c562bc6

MC refactor
1 parent 01fac51 commit c562bc6

6 files changed: +19 -13 lines changed

MC/MC Control with Epsilon-Greedy Policies (Solution).ipynb renamed to MC/MC Control with Epsilon-Greedy Policies Solution.ipynb

File renamed without changes.

MC/Off-Policy MC Control with Weighted Importance Sampling (Solution).ipynb renamed to MC/Off-Policy MC Control with Weighted Importance Sampling Solution.ipynb

File renamed without changes.

MC/README.md

Lines changed: 10 additions & 4 deletions
@@ -37,7 +37,13 @@
 
 ### Exercises
 
-1. [Get familar with the Blackjack environment (Blackjack-v0)](Blackjack Playground.ipynb)
-2. Implement the Monte Carlo Prediction to estimate state-action values in Python ([Exercise](MC Prediction.ipynb), [Solution](MC Prediction (Solution).ipynb))
-3. Implement the on-policy first-visit Monte Carlo Control algorithm in Python ([Exercise](MC Control with Epsilon-Greedy Policies.ipynb), [Solution](MC Control with Epsilon-Greedy Policies (Solution).ipynb))
-4. Implement the off-policy every-visit Monte Carlo Control using Weighted Important Sampliing algorithm in Python ([Exercise](Off-Policy MC Control with Weighted Importance Sampling.ipynb), [Solution](Off-Policy MC Control with Weighted Importance Sampling (Solution).ipynb))
+- [Get familiar with the Blackjack environment (Blackjack-v0)](Blackjack Playground.ipynb)
+- Implement Monte Carlo Prediction to estimate state-action values
+  - [Exercise](MC Prediction.ipynb)
+  - [Solution](MC Prediction Solution.ipynb)
+- Implement the on-policy first-visit Monte Carlo Control algorithm
+  - [Exercise](MC Control with Epsilon-Greedy Policies.ipynb)
+  - [Solution](MC Control with Epsilon-Greedy Policies Solution.ipynb)
+- Implement the off-policy every-visit Monte Carlo Control algorithm with Weighted Importance Sampling
+  - [Exercise](Off-Policy MC Control with Weighted Importance Sampling.ipynb)
+  - [Solution](Off-Policy MC Control with Weighted Importance Sampling Solution.ipynb)
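
For orientation, the prediction and on-policy control exercises in the list above come down to two small pieces: an epsilon-greedy policy read off a Q table, and a first-visit return update. Below is a minimal Python sketch of those pieces, not the reference solutions from the linked notebooks; the helper names are made up for illustration.

```python
# Minimal sketch of the ingredients the MC exercises above ask for.
# Illustrative only, not the reference solutions from the notebooks.
import numpy as np
from collections import defaultdict

def make_epsilon_greedy_policy(Q, epsilon, nA):
    """Return a function state -> action probabilities, epsilon-greedy w.r.t. Q."""
    def policy_fn(state):
        probs = np.ones(nA) * epsilon / nA
        probs[np.argmax(Q[state])] += 1.0 - epsilon
        return probs
    return policy_fn

def first_visit_mc_update(Q, visit_counts, episode, discount_factor=1.0):
    """Update Q toward the return following the first visit of each (state, action).

    `episode` is a list of (state, action, reward) tuples from one rollout.
    """
    seen = set()
    for i, (state, action, _) in enumerate(episode):
        if (state, action) in seen:
            continue
        seen.add((state, action))
        # Discounted return from the first visit onward.
        G = sum(r * discount_factor ** t for t, (_, _, r) in enumerate(episode[i:]))
        visit_counts[(state, action)] += 1.0
        # Incremental sample-average update toward the observed return.
        Q[state][action] += (G - Q[state][action]) / visit_counts[(state, action)]

# Usage sketch: Q = defaultdict(lambda: np.zeros(env.action_space.n)),
# sample an episode with make_epsilon_greedy_policy(Q, 0.1, env.action_space.n),
# then call first_visit_mc_update after each episode.
```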

PolicyGradient/Continuous MountainCar Actor Critic Solution.ipynb

Lines changed: 5 additions & 5 deletions
@@ -118,7 +118,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 51,
+"execution_count": 56,
 "metadata": {
 "collapsed": false
 },
@@ -179,7 +179,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 52,
+"execution_count": 57,
 "metadata": {
 "collapsed": false
 },
@@ -224,7 +224,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 53,
+"execution_count": 58,
 "metadata": {
 "collapsed": true
 },
@@ -312,15 +312,15 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"Step 11168 @ Episode 1/50 (0.0)"
+"Step 884 @ Episode 27/50 (59.40512586945543)"
 ]
 }
 ],
 "source": [
 "tf.reset_default_graph()\n",
 "\n",
 "global_step = tf.Variable(0, name=\"global_step\", trainable=false)\n",
 "policy_estimator = PolicyEstimator(learning_rate=0.01)\n",
+"policy_estimator = PolicyEstimator(learning_rate=0.001)\n",
 "value_estimator = ValueEstimator(learning_rate=0.1)\n",
 "\n",
 "with tf.Session() as sess:\n",

README.md

Lines changed: 4 additions & 4 deletions
@@ -14,8 +14,8 @@ All code is written in Python 3 and use RL environments from [OpenAI Gym](https:
 
 - [Introduction to RL problems, OpenAI gym](Introduction/)
 - [MDPs and Bellman Equations](MDP/)
-- [Model-Based RL: Policy and Value Iteration using Dynamic Programming](DP/)
-- [Model-Free Prediction & Control with Monte Carlo (MC)](MC/)
+- [Dynamic Programming: Model-Based RL, Policy Iteration and Value Iteration](DP/)
+- [Monte Carlo Model-Free Prediction & Control](MC/)
 - [Model-Free Prediction & Control with Temporal Difference (TD)](TD/)
 - [Function Approximation](FA/)
 - [Deep Q Learning](DQN/) (WIP)
@@ -34,8 +34,8 @@ All code is written in Python 3 and use RL environments from [OpenAI Gym](https:
 - [Monte Carlo Off-Policy Control with Importance Sampling](MC/Off-Policy MC Control with Weighted Importance Sampling (Solution).ipynb)
 - [SARSA (On Policy TD Learning)](TD/SARSA Solution.ipynb)
 - [Q-Learning (Off Policy TD Learning)](TD/Q-Learning Solution.ipynb)
-- [Deep Q-Learning for Atari Games](DQN/Deep Q Learning Solution.ipynb))
-- [Double Deep-Q Learning for Atari Games](DQN/Double DQN Solution.ipynb))
+- [Deep Q-Learning for Atari Games](DQN/Deep Q Learning Solution.ipynb)
+- [Double Deep-Q Learning for Atari Games](DQN/Double DQN Solution.ipynb)
 - Deep Q-Learning with Prioritized Experience Replay (WIP)
 - [Policy Gradient: REINFORCE with Baseline](PolicyGradient/CliffWalk REINFORCE with Baseline Solution.ipynb))
 - [Policy Gradient: Actor Critic with Baseline](PolicyGradient/CliffWalk Actor Critic Solution.ipynb))
