This is a collection of Python (NumPy + TensorFlow) implementations of common reinforcement learning (RL) algorithms.
- MDP Solutions: Value Iteration, Policy Iteration, Fitted Value Iteration (via function approximation), Policy Gradient
- Model-free Solutions: Q-Iteration, Q-Learning, Monte Carlo Policy Iteration, REINFORCE (vanilla policy gradient), SARSA, n-Step SARSA, SARSA(λ), Actor-Critic, Deep Q-Network (DQN)
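To illustrate the kind of algorithm listed under MDP Solutions, here is a minimal tabular value-iteration sketch in NumPy. It is not taken from this repository. The function name `value_iteration`, the array layout (`P` of shape `(S, A, S)` holding transition probabilities, `R` of shape `(S, A)` holding expected rewards), and the two-state toy MDP are all assumptions made for illustration.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Tabular value iteration (illustrative sketch, not the repo's API).

    P: array (S, A, S), P[s, a, s'] = Pr(s' | s, a)   (assumed layout)
    R: array (S, A), expected immediate reward for taking a in s
    Returns optimal state values V (shape S) and a greedy policy (shape S).
    """
    n_states = P.shape[0]
    V = np.zeros(n_states)
    for _ in range(max_iters):
        # Bellman optimality backup: Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=1)
        # Stop once the largest per-state change falls below tol
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Extract the greedy policy from the final value estimate
    Q = R + gamma * (P @ V)
    return V, Q.argmax(axis=1)


# Hypothetical two-state, two-action MDP with deterministic transitions
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0  # state 0, action 0: stay in state 0
P[0, 1, 1] = 1.0  # state 0, action 1: move to state 1
P[1, 0, 1] = 1.0  # state 1, action 0: stay in state 1
P[1, 1, 0] = 1.0  # state 1, action 1: move to state 0
R = np.array([[0.0, 1.0],
              [2.0, 0.0]])

V, policy = value_iteration(P, R, gamma=0.9)
print(V, policy)
```

With gamma = 0.9, staying in state 1 forever is worth 2 / (1 - 0.9) = 20, and moving there from state 0 is worth 1 + 0.9 * 20 = 19, so the sketch should return `V` close to `[19, 20]` and the greedy policy `[1, 0]`. Policy iteration would reach the same answer by alternating full policy evaluation with greedy improvement instead of applying the max backup directly.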