This repository was archived by the owner on Dec 11, 2022. It is now read-only.

TimeDistributed LSTM Middleware #461

@OGordon100

Description

In many real-world situations the task has hidden state or partially observable features, so the Markov assumption only partially holds.

One way around this is frame stacking, which is already possible in Coach with filters.observation.observation_stacking_filter (a rough sketch is below). It may be even better to use an LSTM (or a bidirectional LSTM). Agents for this already exist, the widely cited DRQN being one of them.
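
For reference, a minimal sketch of wiring in the stacking filter. The import paths, the InputFilter.add_observation_filter call, and the ObservationStackingFilter arguments are my assumptions about the Coach API based on the docs/presets, not a verified recipe:

```python
# Sketch only: import paths and signatures are assumed, not verified.
from rl_coach.filters.filter import InputFilter
from rl_coach.filters.observation.observation_stacking_filter import ObservationStackingFilter

# Stack the last 4 observations so the agent sees a short history window
# instead of a single, possibly ambiguous, frame.
input_filter = InputFilter()
input_filter.add_observation_filter('observation', 'stacking',
                                    ObservationStackingFilter(stack_size=4))
```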

Coach currently provides the LSTMMiddleware layer. However, from what I understand of the source code, it runs the LSTM along the observation axis (for inputs such as text). TensorFlow, of course, has the TimeDistributed wrapper (together with return_sequences=True on the LSTM) to run an LSTM along the temporal axis, i.e. across transitions; see the sketch below.
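
To make the requested behaviour concrete, here is a minimal plain-Keras sketch (not Coach middleware code; the layer sizes and the DRQN-style convolutional encoder are illustrative assumptions) of an LSTM running along the temporal axis, with a per-frame encoder wrapped in TimeDistributed:

```python
import tensorflow as tf

def build_drqn_style_network(time_steps, obs_shape, num_actions):
    # Input is a sequence of observations: (batch, time, H, W, C)
    obs_seq = tf.keras.Input(shape=(time_steps,) + obs_shape)

    # Per-frame encoder, applied identically to every time step via TimeDistributed
    frame_encoder = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 8, strides=4, activation='relu'),
        tf.keras.layers.Conv2D(64, 4, strides=2, activation='relu'),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation='relu'),
    ])
    x = tf.keras.layers.TimeDistributed(frame_encoder)(obs_seq)

    # The LSTM runs across time steps (i.e. across transitions),
    # not across the feature/observation axis.
    x = tf.keras.layers.LSTM(256, return_sequences=True)(x)

    # One output vector per time step (Q-values, action logits, etc.)
    outputs = tf.keras.layers.TimeDistributed(
        tf.keras.layers.Dense(num_actions))(x)
    return tf.keras.Model(obs_seq, outputs)

# Example: sequences of 8 greyscale 84x84 observations, 4 actions
model = build_drqn_style_network(8, (84, 84, 1), 4)
```

A middleware version would essentially keep only the TimeDistributed/LSTM part, sitting between Coach's input embedders and output heads.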

Could a time-distributed LSTM be added as a middleware? (Or at the very least "hacked" in; it would be of immense benefit to my current research, in which I am using a simple behavioural cloning agent.)
