3D environment:

For more details, see the code in ./lib/env/threedmountain_car.py.
Currently the rendering shows only the x and z axes (i.e. the 3D graphics are projected onto the x-z plane, whose normal is (0,1,0)).

action space = Discrete(5)
0 = neutral
1 = push left along x = west
2 = push right along x = east
3 = push left along y = south
4 = push right along y = north
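
The action encoding above can be written as a small lookup table. This is only an illustrative sketch: `ACTION_NAMES`, `ACTION_PUSH`, and `describe_action` are hypothetical names, and the unit push values are an assumption (the actual force magnitudes are defined in the environment file).

```python
# Hypothetical lookup tables for the five discrete actions listed above.
# The (dx, dy) values only encode direction; they are NOT the real force magnitudes.
ACTION_NAMES = {0: "neutral", 1: "west", 2: "east", 3: "south", 4: "north"}
ACTION_PUSH = {
    0: (0, 0),    # neutral
    1: (-1, 0),   # push left along x
    2: (1, 0),    # push right along x
    3: (0, -1),   # push left along y
    4: (0, 1),    # push right along y
}


def describe_action(action):
    """Return a readable description of a discrete action index."""
    dx, dy = ACTION_PUSH[action]
    return f"{action}: {ACTION_NAMES[action]} (push x={dx:+d}, y={dy:+d})"
```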

observation space = Box(4,)
obs[0] = x
obs[1] = y
obs[2] = x_dot
obs[3] = y_dot

done is defined as:
	x and y are both greater than or equal to the goal position (goal position = 0.5)
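
The termination rule can be expressed as a short check on the observation. This is a sketch under the observation layout listed above; `is_done` and `GOAL_POSITION` are hypothetical names and may not match the identifiers used in the environment file.

```python
GOAL_POSITION = 0.5  # goal position stated in this README


def is_done(obs, goal=GOAL_POSITION):
    """Return True when both x and y have reached the goal position.

    obs is assumed to be ordered as (x, y, x_dot, y_dot).
    """
    x, y, _x_dot, _y_dot = obs
    return x >= goal and y >= goal
```

For example, `is_done((0.6, 0.4, 0.0, 0.0))` is `False` because y has not reached the goal yet.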
	
yellow flag: shows x axis projection
cyan flag: shows y axis projection

3D env test:

The test is performed using Q-learning.
The relevant files are:
	3D mountaincar Q learning.ipynb
	3dmountaincar_qlearning.py
	
They contain the same code, just in different formats.

Taylor's MASTER algorithm method 1 (obtaining the mapping)

Algorithm

1. Run Q-learning on the source task (2D mountain car env) and record a replay memory (currently 100000 entries long).
2. Take random actions on the target task (3D mountain car env) and record a replay memory (currently 100000 entries long).
3. Train neural nets to obtain a one-step transition model of the target task.
4. Compute the MSE of the one-step transition model on the source task replay memory.
5. Repeat (4) for every combination of state/action mappings, and use the combination with the lowest MSE as the state/action mapping.
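
Steps 4 and 5 can be sketched in plain Python as follows. This is a minimal sketch, not the code in MASTER.py: `mapping_mse`, `best_mapping`, and the `model` callable are hypothetical names, and the sketch assumes that each target state variable is fed by exactly one source state variable and each source action maps to exactly one target action.

```python
from itertools import product


def mapping_mse(transitions, model, state_map, action_map):
    """MSE of the target model on source transitions viewed through a mapping.

    transitions: list of (state, action, next_state) tuples from the source task.
    model: callable (target_state, target_action) -> predicted next target state.
    state_map: tuple; state_map[i] is the source index feeding target variable i.
    action_map: dict mapping each source action to a target action.
    """
    total, count = 0.0, 0
    for s, a, s_next in transitions:
        mapped_s = [s[j] for j in state_map]
        mapped_next = [s_next[j] for j in state_map]
        predicted = model(mapped_s, action_map[a])
        total += sum((p - q) ** 2 for p, q in zip(predicted, mapped_next))
        count += len(mapped_next)
    return total / count


def best_mapping(transitions, model, n_source_vars, n_target_vars,
                 source_actions, target_actions):
    """Exhaustively try every state/action mapping and keep the lowest MSE.

    Returns a tuple (mse, state_map, action_map).
    """
    best = None
    for state_map in product(range(n_source_vars), repeat=n_target_vars):
        for targets in product(target_actions, repeat=len(source_actions)):
            action_map = dict(zip(source_actions, targets))
            err = mapping_mse(transitions, model, state_map, action_map)
            if best is None or err < best[0]:
                best = (err, state_map, action_map)
    return best
```

With 2 source variables, 4 target variables, 3 source actions, and 5 target actions, this enumerates 2^4 * 5^3 = 2000 candidate mappings, so the exhaustive search stays cheap as long as each MSE evaluation is fast.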

Taylor's MASTER algorithm method 2 (Q-value reuse with the mapping)

Algorithm

1. Get the agent's current state s = (x, y, x_dot, y_dot).
2. Choose a(t) as the current target action to evaluate (3D mountain car env).
3. For each source action a(s), accumulate SUM += 1/MSE(a(t), a(s)).
4. For each source action a(s):
    1. Q(s, a(t)) += Q(x, x_dot, a(s)) * (1/SUM) * (1/MSE(a(t), a(s)))
    2. Q(s, a(t)) += Q(y, y_dot, a(s)) * (1/SUM) * (1/MSE(a(t), a(s)))
5. Q(s, a(t)) += Q(x, y, x_dot, y_dot, a(t))
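
The steps above can be sketched as a single function. This is an illustrative sketch only: `transferred_q`, `q_source`, `q_target`, and the `mse` dictionary layout are hypothetical, and it assumes every MSE value is strictly positive so that the inverse weights are defined.

```python
def transferred_q(obs, target_action, source_actions, mse, q_source, q_target):
    """Combine source Q-values with the target Q-value for one target action.

    obs: target observation ordered as (x, y, x_dot, y_dot).
    mse: dict keyed by (target_action, source_action) -> MSE (assumed > 0).
    q_source: callable ((position, velocity), source_action) -> float.
    q_target: callable (obs, target_action) -> float.
    """
    x, y, x_dot, y_dot = obs

    # Step 3: sum of inverse MSEs, used to normalize the weights.
    inverse = {a_s: 1.0 / mse[(target_action, a_s)] for a_s in source_actions}
    total = sum(inverse.values())

    # Step 4: weight each source Q-value by its normalized inverse MSE,
    # once for the (x, x_dot) pair and once for the (y, y_dot) pair.
    q = 0.0
    for a_s in source_actions:
        weight = inverse[a_s] / total
        q += weight * q_source((x, x_dot), a_s)
        q += weight * q_source((y, y_dot), a_s)

    # Step 5: add the target task's own Q-value estimate.
    q += q_target(obs, target_action)
    return q
```

Source actions whose mapping to a(t) has a lower MSE receive a larger share of the weight, and the weights for one target action always sum to 1.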

Relevant Graphs to compare to non-transfer case

1. Number of episodes vs. average reward
