Commit 5334a6f

Merge pull request dennybritz#134 from keithmgould/master
update value estimator only after calculating advantage
2 parents 2a6fe49 + 30326df commit 5334a6f

1 file changed, +2 -2 lines changed

PolicyGradient/CliffWalk REINFORCE with Baseline Solution.ipynb

Lines changed: 2 additions & 2 deletions
@@ -196,11 +196,11 @@
 " for t, transition in enumerate(episode):\n",
 " # The return after this timestep\n",
 " total_return = sum(discount_factor**i * t.reward for i, t in enumerate(episode[t:]))\n",
-" # Update our value estimator\n",
-" estimator_value.update(transition.state, total_return)\n",
 " # Calculate baseline/advantage\n",
 " baseline_value = estimator_value.predict(transition.state) \n",
 " advantage = total_return - baseline_value\n",
+" # Update our value estimator\n",
+" estimator_value.update(transition.state, total_return)\n",
 " # Update our policy estimator\n",
 " estimator_policy.update(transition.state, advantage, transition.action)\n",
 " \n",
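
The reordering matters because the advantage is meant to compare the observed return against the baseline as it stood before learning from that return. A minimal sketch of the effect, assuming a simple tabular estimator: `TabularValueEstimator`, its `learning_rate`, and the two helper functions are hypothetical stand-ins for illustration, not the notebook's actual estimator classes, which this diff does not show.

```python
class TabularValueEstimator:
    """Hypothetical stand-in for the notebook's estimator_value."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.values = {}

    def predict(self, state):
        # States not seen yet default to a value of 0.0
        return self.values.get(state, 0.0)

    def update(self, state, target):
        # Move the stored value a fraction of the way toward the target
        current = self.predict(state)
        self.values[state] = current + self.learning_rate * (target - current)


def advantage_update_first(estimator, state, total_return):
    # Order before this commit: the baseline has already moved toward the return
    estimator.update(state, total_return)
    return total_return - estimator.predict(state)


def advantage_predict_first(estimator, state, total_return):
    # Order after this commit: the baseline is read before it is updated
    advantage = total_return - estimator.predict(state)
    estimator.update(state, total_return)
    return advantage


old_order = advantage_update_first(TabularValueEstimator(), "s0", 10.0)
new_order = advantage_predict_first(TabularValueEstimator(), "s0", 10.0)
print(old_order, new_order)
```

In this toy setting, updating first shrinks the advantage because the baseline has already absorbed part of the very return it is being compared with. How large the effect is in the notebook depends on its estimator and learning rate.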
