Multi Brain Training and Recurrent state encoder #166

vincentpierre · 2017-12-09T01:33:47Z

Modifying the trainer class to be more flexible.
Added some TODO elements.
Added inline docs to the buffer.
Enables multi brain training.

Note that work needs to be done for discrete and observations if you want to use recurrent state encoding, pass in --use-recurrent and --sequence-length=<n> in the options of ppo

…process

…d of **python ppo.py**

…r training

improved the curriculum (now reset takes lesson and not progress as input improved the learn.py : now uses a configuration file the graph scope is now displayed after training if there is only one brain, empty graph scope is used improved on CoreInternalBrain so that recurrent_in and now are used

romerocesar

throughout the code use spaces instead of tabs

romerocesar · 2018-01-17T02:06:48Z

python/learn.py

+import json
+from trainers.ppo_models import *
+from trainers.ppo_trainer import Trainer
+from unityagents import UnityEnvironment, UnityEnvironmentException


clean up imports according to pep8 for improved redability:

sort asciibetically

separate stdlib from 3rd party from out code

https://www.python.org/dev/peps/pep-0008/#imports

romerocesar · 2018-01-17T02:07:02Z

python/learn.py

+
+
+
+


remove empty lines

romerocesar · 2018-01-17T02:08:41Z

python/learn.py

+'''
+
+options = docopt(_USAGE)
+print(options)


fixme
don't print, log. Besides, logging all options should probably be log.debug and not sent to stdout every time

all it takes is logging.basicConfig() to add a basic console output to the root logger:

import logging logging.basicConfig(level=logging.DEBUG) options = docopt(_USAGE) logging.debug(options)

romerocesar · 2018-01-17T02:10:29Z

python/learn.py

+
+env = UnityEnvironment(file_name=env_name, worker_id=worker_id, curriculum=curriculum_file)
+env.curriculum.set_lesson_number(lesson)
+print(str(env))


fixme
log > print

romerocesar · 2018-01-17T02:11:14Z

python/learn.py

+    else:
+        return None
+
+try:


don't throw functions in between code - put script logic under if __name__ == '__main__' for readability

romerocesar · 2018-01-17T03:25:10Z

python/learn.py

+# use_recurrent = options['--use-recurrent']
+# sequence_length = int(options['--sequence-length'])
+# summary_freq = int(options['--summary-freq'])
+# run_path = str(options['--run-path'])


fixme is this dead code that should be removed or relevant documentation that should move into the module's docstring?

romerocesar · 2018-01-17T04:09:05Z

python/trainers/buffer.py

+					self[k].reset_field()
+				except:
+					print(k)
+		def __getitem__(self, key):


add empty line between functions

https://www.python.org/dev/peps/pep-0008/#blank-lines

romerocesar · 2018-01-17T04:12:38Z

python/trainers/buffer.py

+			if key not in self.keys():
+				self[key] = self.AgentBufferField()
+			return super(Buffer.AgentBuffer, self).__getitem__(key)
+		def check_length(self, key_list):


blank lines

https://www.python.org/dev/peps/pep-0008/#blank-lines

romerocesar · 2018-01-17T04:12:45Z

python/trainers/buffer.py

+					return False
+				l = len(self[key])
+			return True
+		def shuffle(self, key_list = None):


blank lines

https://www.python.org/dev/peps/pep-0008/#blank-lines

romerocesar · 2018-01-17T04:27:08Z

python/trainers/buffer.py

+			for key in key_list:
+				if key not in self.keys():
+					return False
+				if ((l != None) and (l!=len(self[key]))):


you can omit the left side of the and - it's always safe to compare l to the output of len(self[key]) since that call can never return None

romerocesar

pending ppo_trainer.py

romerocesar · 2018-01-17T04:29:27Z

python/trainers/buffer.py

+				# 	self[key] += l
+				# self[key] = Buffer.AgentBuffer.AgentBufferField([self[key][i] for i in s])
+				# self[key].reorder(s)
+	def __init__(self):


blank lines

https://www.python.org/dev/peps/pep-0008/#blank-lines

romerocesar · 2018-01-17T04:29:38Z

python/trainers/buffer.py

+	def __init__(self):
+		self.global_buffer = self.AgentBuffer() 
+		super(Buffer, self).__init__()
+	def __str__(self):


blank lines

https://www.python.org/dev/peps/pep-0008/#blank-lines

romerocesar · 2018-01-17T04:29:46Z

python/trainers/buffer.py

+	def __str__(self):
+		return "global buffer :\n\t{0}\nlocal_buffers :\n{1}".format(str(self.global_buffer), 
+			'\n'.join(['\tagent {0} :{1}'.format(k, str(self[k])) for k in self.keys()]))
+	def __getitem__(self, key):


blank lines

https://www.python.org/dev/peps/pep-0008/#blank-lines

romerocesar · 2018-01-17T04:29:54Z

python/trainers/buffer.py

+		if key not in self.keys():
+			self[key] = self.AgentBuffer()
+		return super(Buffer, self).__getitem__(key)
+	def append_BrainInfo(self, info):


stick to snake_case and not UpperCamelCase in function names

https://www.python.org/dev/peps/pep-0008/#function-names

blank lines

https://www.python.org/dev/peps/pep-0008/#blank-lines

romerocesar · 2018-01-17T04:31:41Z

python/trainers/buffer.py

+		raise BufferException("This method is not yet implemented")
+		# TODO: Find how useful this would be
+		# TODO: Implementation
+	def reset_global(self):


blank lines

https://www.python.org/dev/peps/pep-0008/#blank-lines

romerocesar · 2018-01-17T04:32:31Z

python/trainers/buffer.py

+		agent_ids = list(self.keys())
+		for k in agent_ids:
+			self[k].reset_agent()
+	def append_global(self, agent_id ,key_list = None,  batch_size = None, training_length = None):


blank lines

https://www.python.org/dev/peps/pep-0008/#blank-lines

romerocesar · 2018-01-17T04:36:02Z

python/trainers/buffer.py

+			key_list = self[agent_id].keys()
+		if not self[agent_id].check_length(key_list):
+			raise BufferException("The length of the fields {0} for agent {1} where not of comparable length"
+				.format(key_list, agent_id))


consider using f-strings in python 3.6+

raise BufferException(f'The length of the fields {key_list} for agent {agent_id} where not of comparable length')

https://www.python.org/dev/peps/pep-0498/#abstract

p.s. that message is ambiguous because lengths can always be compared. what does it mean for those to not be comparable as the exception claims?

romerocesar · 2018-01-17T04:36:31Z

python/trainers/buffer.py

+			self.global_buffer[field_key].extend(
+				self[agent_id][field_key].get_batch(batch_size =batch_size, training_length =training_length)
+			)
+	def append_all_agent_batch_to_global(self, key_list = None,  batch_size = None, training_length = None):


blank lines

https://www.python.org/dev/peps/pep-0008/#blank-lines

romerocesar · 2018-01-17T04:38:08Z

python/trainers/buffer.py

+			self.append_global(agent_id ,key_list,  batch_size, training_length)
+
+
+#TODO: Put these functions into a utils class


actually remove the TODO. Utils class are rarely justified because they are abused and become a soup of ambiguous functions w/o a home. Find a proper home for these functions rather than creating technical debt that utils modules typically are

romerocesar · 2018-01-17T04:38:34Z

python/trainers/ppo_models.py

        self.new_reward = tf.placeholder(shape=[], dtype=tf.float32, name='new_reward')
        self.update_reward = tf.assign(self.last_reward, self.new_reward)

+    def create_recurrent_encoder(self, s_size, input_state):


fixme needs docstring

added docstring

romerocesar

looks like there are many new functions w/o unit tests. We should invest in writing those tests to avoid creating technical debt

romerocesar · 2018-01-17T19:44:14Z