Here is the code from reinforce.py:

    for action, r in zip(self.saved_actions, rewards):
        action.reinforce(r)
And here is the code from actor-critic.py:

    for (action, value), r in zip(saved_actions, rewards):
        reward = r - value.data[0, 0]
        action.reinforce(reward)
        value_loss += F.smooth_l1_loss(value, Variable(torch.Tensor([r])))
So I consider this to be Asynchronous Advantage Actor-Critic (A3C), not plain actor-critic.
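For readers on newer PyTorch versions, where `action.reinforce()` on stochastic Variables is no longer available, a rough modern equivalent of the two loops above might look like the sketch below. This is only an illustration of the difference being discussed, not the repository's code; the names `log_probs`, `values`, and `returns` are my own.

```python
import torch
import torch.nn.functional as F

# Illustrative inputs (not from the repo):
#   log_probs: list of log pi(a_t | s_t) for the sampled actions
#   values:    list of critic estimates V(s_t)
#   returns:   1-D tensor of discounted returns R_t (the `rewards` list above)

def reinforce_loss(log_probs, returns):
    # reinforce.py: each log-prob is scaled by the raw return,
    # which is what action.reinforce(r) did in the old autograd API.
    return -(torch.stack(log_probs) * returns).sum()

def actor_critic_loss(log_probs, values, returns):
    # actor-critic.py: scale by the advantage (r - V(s)), as in the
    # `reward = r - value.data[0, 0]` line, and regress the critic
    # toward the return with a smooth L1 loss.
    values = torch.cat(values).squeeze(-1)
    advantages = returns - values.detach()   # no gradient through the baseline
    policy_loss = -(torch.stack(log_probs) * advantages).sum()
    value_loss = F.smooth_l1_loss(values, returns, reduction='sum')
    return policy_loss + value_loss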
changed the title from "RuntimeError: Trying to backward through the graph second time, but the buffers have already been freed. Please specify retain_variables=True when calling backward for the first time." to "experience replay of reinforcement_learning/reinforce.py" on Apr 25, 2017
changed the title from "experience replay of reinforcement_learning/reinforce.py" to "A3C instead of actor-critic in reinforcement_learning/reinforce.py" on Apr 26, 2017
jeasinema commented on Oct 31, 2017
Yes, I partly agree with you, but with a small correction: the algorithm implemented should be an offline version of A2C (Advantage Actor-Critic).
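To spell out the correction: "offline" here means the update happens only after the full episode has been collected. A minimal sketch of that per-episode advantage computation, with illustrative names rather than anything from the repository, could look like:

```python
import torch

def episode_advantages(step_rewards, step_values, gamma=0.99):
    """After a finished episode, compute discounted returns R_t and
    subtract the critic's estimates V(s_t) to get advantages, as in A2C.
    `step_rewards` and `step_values` are plain per-step floats."""
    returns, R = [], 0.0
    for r in reversed(step_rewards):      # accumulate R_t = r_t + gamma * R_{t+1}
        R = r + gamma * R
        returns.insert(0, R)
    returns = torch.tensor(returns)
    values = torch.tensor(step_values)
    return returns - values               # advantage A_t = R_t - V(s_t)

# e.g. episode_advantages([1.0, 1.0, 0.0], [0.9, 0.5, 0.1])
```

Since the example runs a single worker with no asynchronous parameter updates, it matches advantage actor-critic (A2C) rather than A3C.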