
Commit c562bc6

MC refactor
1 parent 01fac51 commit c562bc6

6 files changed: +19 -13 lines changed

MC/MC Control with Epsilon-Greedy Policies (Solution).ipynb renamed to MC/MC Control with Epsilon-Greedy Policies Solution.ipynb

File renamed without changes.

MC/Off-Policy MC Control with Weighted Importance Sampling (Solution).ipynb renamed to MC/Off-Policy MC Control with Weighted Importance Sampling Solution.ipynb

File renamed without changes.

MC/README.md

Lines changed: 10 additions & 4 deletions
@@ -37,7 +37,13 @@
 
 ### Exercises
 
-1. [Get familar with the Blackjack environment (Blackjack-v0)](Blackjack Playground.ipynb)
-2. Implement the Monte Carlo Prediction to estimate state-action values in Python ([Exercise](MC Prediction.ipynb), [Solution](MC Prediction (Solution).ipynb))
-3. Implement the on-policy first-visit Monte Carlo Control algorithm in Python ([Exercise](MC Control with Epsilon-Greedy Policies.ipynb), [Solution](MC Control with Epsilon-Greedy Policies (Solution).ipynb))
-4. Implement the off-policy every-visit Monte Carlo Control using Weighted Important Sampliing algorithm in Python ([Exercise](Off-Policy MC Control with Weighted Importance Sampling.ipynb), [Solution](Off-Policy MC Control with Weighted Importance Sampling (Solution).ipynb))
+- [Get familiar with the Blackjack environment (Blackjack-v0)](Blackjack Playground.ipynb)
+- Implement Monte Carlo Prediction to estimate state-action values
+  - [Exercise](MC Prediction.ipynb)
+  - [Solution](MC Prediction Solution.ipynb)
+- Implement the on-policy first-visit Monte Carlo Control algorithm
+  - [Exercise](MC Control with Epsilon-Greedy Policies.ipynb)
+  - [Solution](MC Control with Epsilon-Greedy Policies Solution.ipynb)
+- Implement the off-policy every-visit Monte Carlo Control algorithm with Weighted Importance Sampling
+  - [Exercise](Off-Policy MC Control with Weighted Importance Sampling.ipynb)
+  - [Solution](Off-Policy MC Control with Weighted Importance Sampling Solution.ipynb)
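
For orientation, the prediction and on-policy control exercises in the list above come down to two small pieces: an epsilon-greedy policy read off a Q table, and a first-visit return update. Below is a minimal Python sketch of those pieces, not the reference solutions from the linked notebooks; the helper names are made up for illustration.

```python
# Minimal sketch of the ingredients the MC exercises above ask for.
# Illustrative only, not the reference solutions from the notebooks.
import numpy as np
from collections import defaultdict

def make_epsilon_greedy_policy(Q, epsilon, nA):
    """Return a function state -> action probabilities, epsilon-greedy w.r.t. Q."""
    def policy_fn(state):
        probs = np.ones(nA) * epsilon / nA
        probs[np.argmax(Q[state])] += 1.0 - epsilon
        return probs
    return policy_fn

def first_visit_mc_update(Q, visit_counts, episode, discount_factor=1.0):
    """Update Q toward the return following the first visit of each (state, action).

    `episode` is a list of (state, action, reward) tuples from one rollout.
    """
    seen = set()
    for i, (state, action, _) in enumerate(episode):
        if (state, action) in seen:
            continue
        seen.add((state, action))
        # Discounted return from the first visit onward.
        G = sum(r * discount_factor ** t for t, (_, _, r) in enumerate(episode[i:]))
        visit_counts[(state, action)] += 1.0
        # Incremental sample-average update toward the observed return.
        Q[state][action] += (G - Q[state][action]) / visit_counts[(state, action)]

# Usage sketch: Q = defaultdict(lambda: np.zeros(env.action_space.n)),
# sample an episode with make_epsilon_greedy_policy(Q, 0.1, env.action_space.n),
# then call first_visit_mc_update after each episode.
```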

PolicyGradient/Continuous MountainCar Actor Critic Solution.ipynb

Lines changed: 5 additions & 5 deletions
@@ -118,7 +118,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 51,
+"execution_count": 56,
 "metadata": {
 "collapsed": false
 },
@@ -179,7 +179,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 52,
+"execution_count": 57,
 "metadata": {
 "collapsed": false
 },
@@ -224,7 +224,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 53,
+"execution_count": 58,
 "metadata": {
 "collapsed": true
 },
@@ -312,15 +312,15 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"Step 11168 @ Episode 1/50 (0.0)"
+"Step 884 @ Episode 27/50 (59.40512586945543)"
 ]
 }
 ],
 "source": [
 "tf.reset_default_graph()\n",
 "\n",
 "global_step = tf.Variable(0, name=\"global_step\", trainable=false)\n",
 "policy_estimator = PolicyEstimator(learning_rate=0.01)\n",
+"policy_estimator = PolicyEstimator(learning_rate=0.001)\n",
 "value_estimator = ValueEstimator(learning_rate=0.1)\n",
 "\n",
 "with tf.Session() as sess:\n",

README.md

Lines changed: 4 additions & 4 deletions
@@ -14,8 +14,8 @@ All code is written in Python 3 and use RL environments from [OpenAI Gym](https:
 
 - [Introduction to RL problems, OpenAI gym](Introduction/)
 - [MDPs and Bellman Equations](MDP/)
-- [Model-Based RL: Policy and Value Iteration using Dynamic Programming](DP/)
-- [Model-Free Prediction & Control with Monte Carlo (MC)](MC/)
+- [Dynamic Programming: Model-Based RL, Policy Iteration and Value Iteration](DP/)
+- [Monte Carlo Model-Free Prediction & Control](MC/)
 - [Model-Free Prediction & Control with Temporal Difference (TD)](TD/)
 - [Function Approximation](FA/)
 - [Deep Q Learning](DQN/) (WIP)
@@ -34,8 +34,8 @@ All code is written in Python 3 and use RL environments from [OpenAI Gym](https:
 - [Monte Carlo Off-Policy Control with Importance Sampling](MC/Off-Policy MC Control with Weighted Importance Sampling (Solution).ipynb)
 - [SARSA (On Policy TD Learning)](TD/SARSA Solution.ipynb)
 - [Q-Learning (Off Policy TD Learning)](TD/Q-Learning Solution.ipynb)
-- [Deep Q-Learning for Atari Games](DQN/Deep Q Learning Solution.ipynb))
-- [Double Deep-Q Learning for Atari Games](DQN/Double DQN Solution.ipynb))
+- [Deep Q-Learning for Atari Games](DQN/Deep Q Learning Solution.ipynb)
+- [Double Deep-Q Learning for Atari Games](DQN/Double DQN Solution.ipynb)
 - Deep Q-Learning with Prioritized Experience Replay (WIP)
 - [Policy Gradient: REINFORCE with Baseline](PolicyGradient/CliffWalk REINFORCE with Baseline Solution.ipynb))
 - [Policy Gradient: Actor Critic with Baseline](PolicyGradient/CliffWalk Actor Critic Solution.ipynb))
