@@ -86,7 +86,7 @@ Extract as much static shape information from a tensor as possible.
 statically-known number of dimensions.
 
 
-### [`categorical_dist_double_qlearning(atoms_tm1, logits_q_tm1, a_tm1, r_t, pcont_t, atoms_t, logits_q_t, q_t_selector, name='CategoricalDistDoubleQLearning')`](https://github.com/deepmind/trfl/blob/master/trfl/dist_value_ops.py?l=149) <!-- RULE: categorical_dist_double_qlearning .code-reference -->
+### [`categorical_dist_double_qlearning(atoms_tm1, logits_q_tm1, a_tm1, r_t, pcont_t, atoms_t, logits_q_t, q_t_selector, name='CategoricalDistDoubleQLearning')`](https://github.com/deepmind/trfl/blob/master/trfl/dist_value_ops.py?l=148) <!-- RULE: categorical_dist_double_qlearning .code-reference -->
 
 Implements Distributional Double Q-learning as TensorFlow ops.
 
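For orientation, here is a minimal usage sketch of the op above. The sizes, the shared support, and the `(loss, extra)` return convention are assumptions drawn from trfl's general conventions, not guarantees made by this change; TF1-style graph mode is assumed.

```python
import tensorflow as tf
import trfl

B, num_actions, num_atoms = 2, 3, 51           # assumed sizes
support = tf.linspace(-10.0, 10.0, num_atoms)  # atom values, shared across t-1 and t

# Online-network logits at t-1, as a variable so the loss has something to train.
logits_q_tm1 = tf.Variable(tf.random.normal([B, num_actions, num_atoms]))
a_tm1 = tf.constant([0, 2])                    # actions taken at t-1
r_t = tf.constant([1.0, 0.0])                  # rewards at t
pcont_t = tf.constant([0.99, 0.0])             # discounts; 0 marks episode end
logits_q_t = tf.random.normal([B, num_actions, num_atoms])  # target net at t
q_t_selector = tf.random.normal([B, num_actions])           # online Q-values at t

# Double Q-learning: the greedy action under `q_t_selector` chooses which
# distribution in `logits_q_t` is projected onto `support` as the target.
loss, extra = trfl.categorical_dist_double_qlearning(
    support, logits_q_tm1, a_tm1, r_t, pcont_t,
    support, logits_q_t, q_t_selector)
train_op = tf.train.AdamOptimizer(1e-4).minimize(tf.reduce_mean(loss))
```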
@@ -132,7 +132,7 @@ Hessel, Modayil, van Hasselt, Schaul et al.
 * `ValueError`: If the tensors do not have the correct rank or compatibility.
 
 
-### [`categorical_dist_qlearning(atoms_tm1, logits_q_tm1, a_tm1, r_t, pcont_t, atoms_t, logits_q_t, name='CategoricalDistQLearning')`](https://github.com/deepmind/trfl/blob/master/trfl/dist_value_ops.py?l=74) <!-- RULE: categorical_dist_qlearning .code-reference -->
+### [`categorical_dist_qlearning(atoms_tm1, logits_q_tm1, a_tm1, r_t, pcont_t, atoms_t, logits_q_t, name='CategoricalDistQLearning')`](https://github.com/deepmind/trfl/blob/master/trfl/dist_value_ops.py?l=73) <!-- RULE: categorical_dist_qlearning .code-reference -->
 
 Implements Distributional Q-learning as TensorFlow ops.
 
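An analogous sketch for the single-estimator op, under the same assumed shapes as above; compared with the Double variant, there is simply no `q_t_selector`.

```python
import tensorflow as tf
import trfl

B, num_actions, num_atoms = 2, 3, 51
support = tf.linspace(-10.0, 10.0, num_atoms)

logits_q_tm1 = tf.Variable(tf.random.normal([B, num_actions, num_atoms]))
a_tm1 = tf.constant([1, 0])
r_t = tf.constant([0.5, 1.0])
pcont_t = tf.constant([0.99, 0.99])
logits_q_t = tf.random.normal([B, num_actions, num_atoms])

# With no selector, the target action is chosen greedily with respect to the
# mean of each value distribution in `logits_q_t` (standard C51-style update).
loss, extra = trfl.categorical_dist_qlearning(
    support, logits_q_tm1, a_tm1, r_t, pcont_t, support, logits_q_t)
```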
@@ -172,7 +172,7 @@ Dabney and Munos. (https://arxiv.org/abs/1707.06887).
 * `ValueError`: If the tensors do not have the correct rank or compatibility.
 
 
-### [`categorical_dist_td_learning(atoms_tm1, logits_v_tm1, r_t, pcont_t, atoms_t, logits_v_t, name='CategoricalDistTDLearning')`](https://github.com/deepmind/trfl/blob/master/trfl/dist_value_ops.py?l=231) <!-- RULE: categorical_dist_td_learning .code-reference -->
+### [`categorical_dist_td_learning(atoms_tm1, logits_v_tm1, r_t, pcont_t, atoms_t, logits_v_t, name='CategoricalDistTDLearning')`](https://github.com/deepmind/trfl/blob/master/trfl/dist_value_ops.py?l=230) <!-- RULE: categorical_dist_td_learning .code-reference -->
 
 Implements Distributional TD-learning as TensorFlow ops.
 
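And the state-value analogue: with no actions involved, the logits drop to rank 2, `[B, num_atoms]` (a shape assumed from the signature, not stated in this diff).

```python
import tensorflow as tf
import trfl

B, num_atoms = 2, 51
support = tf.linspace(-10.0, 10.0, num_atoms)

logits_v_tm1 = tf.Variable(tf.random.normal([B, num_atoms]))  # V-dist at t-1
r_t = tf.constant([0.0, 1.0])
pcont_t = tf.constant([0.99, 0.0])
logits_v_t = tf.random.normal([B, num_atoms])                 # V-dist at t

loss, extra = trfl.categorical_dist_td_learning(
    support, logits_v_tm1, r_t, pcont_t, support, logits_v_t)
```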
@@ -617,7 +617,7 @@ The update rule is:
 An op that periodically updates `target_variables` with `source_variables`.
 
 
-### [`periodically(body, period, name='periodically')`](https://github.com/deepmind/trfl/blob/master/trfl/periodic_ops.py?l=34) <!-- RULE: periodically .code-reference -->
+### [`periodically(body, period, counter=None, name='periodically')`](https://github.com/deepmind/trfl/blob/master/trfl/periodic_ops.py?l=34) <!-- RULE: periodically .code-reference -->
 
 Periodically performs a TensorFlow op.
 
@@ -637,6 +637,10 @@ If `period` is 0 or `None`, it would not perform any op and would return a
 an internal counter is divisible by the period. The op must have no
 output (for example, a tf.group()).
 * `period`: inverse frequency with which to perform the op.
+* `counter`: an optional TensorFlow variable to use as a counter relative to
+  `period`. It is incremented on each call and reset to 1 whenever `body` is
+  run. To ensure that `body` runs on the first call, initialize the counter
+  to a value greater than `period` (see the sketch after this list).
 * `name`: name of the variable_scope.
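A hedged sketch of wiring up the new `counter` argument; the dtype and initializer choices below are assumptions rather than requirements stated by this change.

```python
import tensorflow as tf
import trfl

period = 100
# Start the counter above `period` so `body` runs on the very first call
# (int64 assumed here, matching a typical internal counter).
counter = tf.Variable(period + 1, dtype=tf.int64, trainable=False,
                      name='shared_update_counter')

online_w = tf.Variable([1.0, 2.0], name='online_w')
target_w = tf.Variable([0.0, 0.0], name='target_w')

def body():
  # The op must have no output, e.g. a grouped assignment.
  return tf.group(target_w.assign(online_w))

update_target = trfl.periodically(body, period, counter=counter)
```

Sharing the counter lets several `periodically` ops (or outside bookkeeping) stay in lockstep with the same update schedule.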
 
 ##### Raises:
@@ -685,7 +689,7 @@ by Bellemare, Ostrovski, Guez et al. (https://arxiv.org/abs/1512.04860).
 * `td_error`: batch of temporal difference errors, shape `[B]`.
 
 
-### [`pixel_control_loss(observations, actions, action_values, cell_size, discount_factor, scale, crop_height_dim=(None, None), crop_width_dim=(None, None))`](https://github.com/deepmind/trfl/blob/master/trfl/pixel_control_ops.py?l=92) <!-- RULE: pixel_control_loss .code-reference -->
+### [`pixel_control_loss(observations, actions, action_values, cell_size, discount_factor, scale, crop_height_dim=(None, None), crop_width_dim=(None, None))`](https://github.com/deepmind/trfl/blob/master/trfl/pixel_control_ops.py?l=95) <!-- RULE: pixel_control_loss .code-reference -->
 
 Calculate n-step Q-learning loss for pixel control auxiliary task.
 
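A usage sketch under assumed conventions: sequence-major `[T, B]` layouts and an action-value map per `cell_size`-sized cell, plus trfl's usual `(loss, extra)` namedtuple return. None of these are spelled out by this diff, so treat them as assumptions.

```python
import tensorflow as tf
import trfl

T, B, H, W, C = 5, 2, 84, 84, 3
cell_size, num_actions = 4, 6

observations = tf.random.normal([T + 1, B, H, W, C])
actions = tf.random.uniform([T, B], maxval=num_actions, dtype=tf.int32)
# One Q-value map per action over the cell grid, for T+1 steps.
action_values = tf.random.normal(
    [T + 1, B, H // cell_size, W // cell_size, num_actions])

loss, extra = trfl.pixel_control_loss(
    observations, actions, action_values, cell_size=cell_size,
    discount_factor=0.9, scale=1.0)
```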
@@ -735,7 +739,7 @@ Mnih, Czarnecki et al. (https://arxiv.org/abs/1611.05397).
 the pseudo-rewards derived from the observations.
 
 
-### [`pixel_control_rewards(observations, cell_size)`](https://github.com/deepmind/trfl/blob/master/trfl/pixel_control_ops.py?l=42) <!-- RULE: pixel_control_rewards .code-reference -->
+### [`pixel_control_rewards(observations, cell_size)`](https://github.com/deepmind/trfl/blob/master/trfl/pixel_control_ops.py?l=41) <!-- RULE: pixel_control_rewards .code-reference -->
 
 Calculates pixel control task rewards from observation sequence.
 
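A short sketch; the `[T+1, B, H, W, C]` input layout and `[T, B, H/cell_size, W/cell_size]` output layout are assumptions kept consistent with the `pixel_control_loss` sketch above.

```python
import tensorflow as tf
import trfl

observations = tf.random.normal([6, 2, 84, 84, 3])  # [T+1, B, H, W, C] (assumed)
pseudo_rewards = trfl.pixel_control_rewards(observations, cell_size=4)
# Expected shape (assumed): [T, B, H/cell_size, W/cell_size] = [5, 2, 21, 21]
```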