https://vdeamov.github.io/2019/03/11/%E7%AC%AC%E5%9B%9B%E8%AF%BE-%E6%97%A0%E6%A8%A1%E5%9E%8B%E7%9A%84%E9%A2%84%E6%B5%8B/
---
layout: post
title: "Lecture 4: Model-Free Prediction"
date: 2019-03-11
categories: ReinforceLearning
tags: ["ReinforceLearning", "强化学习"]
---

Lecture 4: Model-Free Prediction

This lecture covers the prediction problem; control is added in Lecture 5. Prediction here centers on two closely related algorithms: Monte-Carlo (MC) and Temporal-Difference (TD). The key difference is that MC must wait until the episode reaches a terminal state before the return is available for updating the value estimate, whereas TD updates in real time, step by step.
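To make that contrast concrete, below is a minimal Python sketch of both prediction methods (my own illustration, not code from the lecture). The environment interface `env.reset()` / `env.step(action) -> (next_state, reward, done)` and the `policy` callable are assumptions made for the example.

```python
from collections import defaultdict

def mc_prediction(env, policy, num_episodes=1000, gamma=1.0):
    """Every-visit Monte-Carlo: V is updated only after the episode terminates."""
    V = defaultdict(float)   # estimated state values
    N = defaultdict(int)     # visit counts
    for _ in range(num_episodes):
        episode, state, done = [], env.reset(), False
        while not done:                          # must run to a terminal state
            next_state, reward, done = env.step(policy(state))
            episode.append((state, reward))
            state = next_state
        G = 0.0
        for s, r in reversed(episode):           # accumulate returns backwards
            G = r + gamma * G
            N[s] += 1
            V[s] += (G - V[s]) / N[s]            # incremental mean of observed returns
    return V

def td0_prediction(env, policy, num_episodes=1000, alpha=0.1, gamma=1.0):
    """TD(0): V is updated after every single step from the bootstrapped target."""
    V = defaultdict(float)
    for _ in range(num_episodes):
        state, done = env.reset(), False
        while not done:
            next_state, reward, done = env.step(policy(state))
            target = reward + (0.0 if done else gamma * V[next_state])
            V[state] += alpha * (target - V[state])   # online, per-step update
            state = next_state
    return V
```

Note how `mc_prediction` cannot touch `V` until `done` is true, while `td0_prediction` updates `V` inside the step loop, which is exactly the MC/TD distinction described above.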