This slide introduces the Dueling Network, a variant of the deep Q-network (DQN). The Dueling Network is a successor model to DQN and Double DQN (DDQN), and its architecture is easy to understand.
First, the basics of reinforcement learning.
the value of the state-action pair: Q^π(s, a) = E[ R_t | s_t = s, a_t = a, π ]
the value of the state: V^π(s) = E_{a ∼ π(a)}[ Q^π(s, a) ]
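These two definitions can be checked numerically. A minimal sketch for a single state (the tabular Q-values and policy probabilities below are made-up illustrations, not from the slides):

```python
import numpy as np

# Hypothetical Q-values Q^pi(s, a) for one state s with three actions.
q_values = np.array([1.0, 2.0, 4.0])

# Hypothetical policy pi(a | s): probability of choosing each action in s.
policy = np.array([0.2, 0.3, 0.5])

# V^pi(s) = E_{a ~ pi(a)}[ Q^pi(s, a) ]:
# the state value is the policy-weighted average of the action values.
v_value = np.dot(policy, q_values)
print(v_value)  # 0.2*1.0 + 0.3*2.0 + 0.5*4.0 = 2.8
```

So V^π(s) summarizes how good the state is on average, while Q^π(s, a) distinguishes the individual actions.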
[Figure: a tree from state s_t through actions a_t^1, a_t^2, a_t^3 to successor states s_{t+1}, s_{t+2}, labelling the action nodes with Q^π(s, a) and the state node with V^π(s).]
Defining the advantage function.
the value of the state-action pair: Q^π(s, a) = E[ R_t | s_t = s, a_t = a, π ]
the value of the state: V^π(s) = E_{a ∼ π(a)}[ Q^π(s, a) ]
[Figure: the same state-action tree, now annotated with A^π(s, a) alongside Q^π(s, a) and V^π(s).]

the advantage function: A^π(s, a) = Q^π(s, a) − V^π(s)
We are taking a difference: subtracting V^π from Q^π gives A^π.
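The subtraction can be sketched directly; continuing the earlier hypothetical numbers (illustrative values only, not from the slides):

```python
import numpy as np

q_values = np.array([1.0, 2.0, 4.0])   # hypothetical Q^pi(s, a)
policy = np.array([0.2, 0.3, 0.5])     # hypothetical pi(a | s)

# V^pi(s): policy-weighted average of the action values.
v_value = np.dot(policy, q_values)

# A^pi(s, a) = Q^pi(s, a) - V^pi(s):
# how much better (or worse) each action is than the state's average value.
advantage = q_values - v_value
print(advantage)  # [-1.8 -0.8  1.2]

# By construction, the advantage averages to zero under the policy.
print(np.dot(policy, advantage))  # 0.0 (up to floating-point error)
```

This zero-mean property is what lets a Dueling Network split its output into a state-value stream and an advantage stream and recombine them into Q-values.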