|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "# Reinforcement Learning" |
| 8 | + ] |
| 9 | + }, |
| 10 | + { |
| 11 | + "cell_type": "markdown", |
| 12 | + "metadata": {}, |
| 13 | + "source": [ |
| 14 | + "This notebook serves as the supporting material for the chapter **Reinforcement Learning**. Here we'll examine how an agent can learn what to do from rewards and punishments, in the absence of labeled examples of correct behavior. This notebook illustrates the use of the [reinforcement](https://github.com/aimacode/aima-java/tree/AIMA3e/aima-core/src/main/java/aima/core/learning/reinforcement) package of the code repository. So let's begin with the question: \"What is reinforcement?\""
| 15 | + ] |
| 16 | + }, |
| 17 | + { |
| 18 | + "cell_type": "markdown", |
| 19 | + "metadata": {}, |
| 20 | + "source": [ |
| 21 | + "Consider the problem of learning to play chess. A supervised agent would need to be told the correct move for each position it encounters, but such feedback is seldom available. In its absence, the agent at least needs to know that something good has happened when it accidentally checkmates its opponent, and that something bad has happened when it gets checkmated. This kind of feedback is called a **reward**, or **reinforcement**. Reinforcement learning differs from supervised learning in that the training data carries no answer labels: instead of being trained on correct answers, the agent must decide for itself what to do and learn from its own experience. \n",
| 22 | + "\n", |
| 23 | + "In game playing, it is very hard for a human to provide accurate and consistent evaluations of a large number of positions. Instead, the program is told only when it has won or lost, and the agent uses this information to learn a reasonably accurate evaluation function."
| 24 | + ] |
| 25 | + }, |
| 26 | + { |
| 27 | + "cell_type": "code", |
| 28 | + "execution_count": 1, |
| 29 | + "metadata": {}, |
| 30 | + "outputs": [ |
| 31 | + { |
| 32 | + "data": { |
| 33 | + "application/vnd.jupyter.widget-view+json": { |
| 34 | + "model_id": "8dfc4597-806f-4c74-bc22-dcb502bbb568", |
| 35 | + "version_major": 2, |
| 36 | + "version_minor": 0 |
| 37 | + }, |
| 38 | + "method": "display_data" |
| 39 | + }, |
| 40 | + "metadata": {}, |
| 41 | + "output_type": "display_data" |
| 42 | + } |
| 43 | + ], |
| 44 | + "source": [ |
| 45 | + "%classpath add jar ../out/artifacts/aima_core_jar/aima-core.jar" |
| 46 | + ] |
| 47 | + }, |
| 48 | + { |
| 49 | + "cell_type": "code", |
| 50 | + "execution_count": null, |
| 51 | + "metadata": {}, |
| 52 | + "outputs": [], |
| 53 | + "source": [] |
| 54 | + } |
| 55 | + ], |
| 56 | + "metadata": { |
| 57 | + "kernelspec": { |
| 58 | + "display_name": "Groovy", |
| 59 | + "language": "groovy", |
| 60 | + "name": "groovy" |
| 61 | + }, |
| 62 | + "language_info": { |
| 63 | + "codemirror_mode": "groovy", |
| 64 | + "file_extension": ".groovy", |
| 65 | + "mimetype": "", |
| 66 | + "name": "Groovy", |
| 67 | + "nbconverter_exporter": "", |
| 68 | + "version": "2.4.3" |
| 69 | + }, |
| 70 | + "toc": { |
| 71 | + "base_numbering": 1, |
| 72 | + "nav_menu": {}, |
| 73 | + "number_sections": false, |
| 74 | + "sideBar": false, |
| 75 | + "skip_h1_title": false, |
| 76 | + "title_cell": "Table of Contents", |
| 77 | + "title_sidebar": "Contents", |
| 78 | + "toc_cell": false, |
| 79 | + "toc_position": {}, |
| 80 | + "toc_section_display": false, |
| 81 | + "toc_window_display": false |
| 82 | + } |
| 83 | + }, |
| 84 | + "nbformat": 4, |
| 85 | + "nbformat_minor": 2 |
| 86 | +} |