This slide introduces the Dueling Network, a variant of the deep Q-network (DQN). The Dueling Network is a successor model to DQN and Double DQN (DDQN), and its architecture is easy to understand.
First, the basics of reinforcement learning.
the value of the state-action pair: Q^π(s, a) = E[ R_t | s_t = s, a_t = a, π ]
the value of the state: V^π(s) = E_{a ∼ π(a)}[ Q^π(s, a) ]
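These two definitions can be checked numerically. A minimal sketch for a single state (the tabular Q-values and policy probabilities below are made-up illustrations, not from the slides):

```python
import numpy as np

# Hypothetical Q-values Q^pi(s, a) for one state s with three actions.
q_values = np.array([1.0, 2.0, 4.0])

# Hypothetical policy pi(a | s): probability of choosing each action in s.
policy = np.array([0.2, 0.3, 0.5])

# V^pi(s) = E_{a ~ pi(a)}[ Q^pi(s, a) ]:
# the state value is the policy-weighted average of the action values.
v_value = np.dot(policy, q_values)
print(v_value)  # 0.2*1.0 + 0.3*2.0 + 0.5*4.0 = 2.8
```

So V^π(s) summarizes how good the state is on average, while Q^π(s, a) distinguishes the individual actions.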
[Figure: a tree from state s_t through actions a_t^1, a_t^2, a_t^3 to successor states s_{t+1}, s_{t+2}, labelling the action nodes with Q^π(s, a) and the state node with V^π(s).]
Defining the advantage function.
the value of the state-action pair: Q^π(s, a) = E[ R_t | s_t = s, a_t = a, π ]
the value of the state: V^π(s) = E_{a ∼ π(a)}[ Q^π(s, a) ]
[Figure: the same state-action tree, now annotated with A^π(s, a) alongside Q^π(s, a) and V^π(s).]

the advantage function: A^π(s, a) = Q^π(s, a) − V^π(s)
We are taking a difference: subtracting V^π from Q^π gives A^π.
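The subtraction can be sketched directly; continuing the earlier hypothetical numbers (illustrative values only, not from the slides):

```python
import numpy as np

q_values = np.array([1.0, 2.0, 4.0])   # hypothetical Q^pi(s, a)
policy = np.array([0.2, 0.3, 0.5])     # hypothetical pi(a | s)

# V^pi(s): policy-weighted average of the action values.
v_value = np.dot(policy, q_values)

# A^pi(s, a) = Q^pi(s, a) - V^pi(s):
# how much better (or worse) each action is than the state's average value.
advantage = q_values - v_value
print(advantage)  # [-1.8 -0.8  1.2]

# By construction, the advantage averages to zero under the policy.
print(np.dot(policy, advantage))  # 0.0 (up to floating-point error)
```

This zero-mean property is what lets a Dueling Network split its output into a state-value stream and an advantage stream and recombine them into Q-values.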