https://vdeamov.github.io/2019/03/11/%E7%AC%AC%E5%9B%9B%E8%AF%BE-%E6%97%A0%E6%A8%A1%E5%9E%8B%E7%9A%84%E9%A2%84%E6%B5%8B/
---
layout: post
title: "Lecture 4: Model-Free Prediction"
date: 2019-03-11
categories: ReinforceLearning
tags: ["ReinforceLearning", "强化学习"]
---

Lecture 4: Model-Free Prediction

This lecture covers the prediction problem; control is added in Lecture 5. Prediction here centers on two closely related algorithms: Monte-Carlo (MC) and Temporal-Difference (TD). The key difference is that MC must wait until the episode reaches a terminal state before the return is available for updating the value estimate, whereas TD updates in real time, step by step.
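To make that contrast concrete, below is a minimal Python sketch of both prediction methods (my own illustration, not code from the lecture). The environment interface `env.reset()` / `env.step(action) -> (next_state, reward, done)` and the `policy` callable are assumptions made for the example.

```python
from collections import defaultdict

def mc_prediction(env, policy, num_episodes=1000, gamma=1.0):
    """Every-visit Monte-Carlo: V is updated only after the episode terminates."""
    V = defaultdict(float)   # estimated state values
    N = defaultdict(int)     # visit counts
    for _ in range(num_episodes):
        episode, state, done = [], env.reset(), False
        while not done:                          # must run to a terminal state
            next_state, reward, done = env.step(policy(state))
            episode.append((state, reward))
            state = next_state
        G = 0.0
        for s, r in reversed(episode):           # accumulate returns backwards
            G = r + gamma * G
            N[s] += 1
            V[s] += (G - V[s]) / N[s]            # incremental mean of observed returns
    return V

def td0_prediction(env, policy, num_episodes=1000, alpha=0.1, gamma=1.0):
    """TD(0): V is updated after every single step from the bootstrapped target."""
    V = defaultdict(float)
    for _ in range(num_episodes):
        state, done = env.reset(), False
        while not done:
            next_state, reward, done = env.step(policy(state))
            target = reward + (0.0 if done else gamma * V[next_state])
            V[state] += alpha * (target - V[state])   # online, per-step update
            state = next_state
    return V
```

Note how `mc_prediction` cannot touch `V` until `done` is true, while `td0_prediction` updates `V` inside the step loop, which is exactly the MC/TD distinction described above.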