Hi,
I was looking at the weight initialization, and it looks like you use xavier_uniform_ for the main LSTM input-to-hidden weights (line 203 in 7f50825):
init.xavier_uniform_(self.linear_ih.weight.data)
The paper, however, specifies that both the input-to-hidden weights and the hidden-to-hidden weights should use orthogonal initialization. From page 20, Section A.2.3:
Orthogonal initialization is applied to the Wh and Wx
I am not sure whether the TensorFlow implementation follows the paper here. Could you elaborate on why you chose Xavier uniform? Was it copied from the TensorFlow implementation of the model, or is it possibly an error in the code?
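
To make the difference concrete, here is a minimal, dependency-free sketch of the two schemes being compared. The function names (`xavier_uniform`, `orthogonal`, `gram`) are my own and are not taken from this repository; the matrix size and seed are arbitrary. In PyTorch the corresponding built-ins are `torch.nn.init.xavier_uniform_` and `torch.nn.init.orthogonal_`, and as far as I know the latter uses a QR decomposition rather than the Gram-Schmidt procedure used below.

```python
import math
import random


def xavier_uniform(rows, cols, rng):
    """Sample a rows x cols matrix from U(-a, a), a = sqrt(6 / (fan_in + fan_out))."""
    bound = math.sqrt(6.0 / (rows + cols))
    return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]


def orthogonal(n, rng):
    """Build an n x n matrix with orthonormal rows via Gram-Schmidt on Gaussian vectors."""
    basis = []
    while len(basis) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Subtract the projections onto the rows accepted so far (modified Gram-Schmidt).
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-8:  # skip the (very unlikely) degenerate draw
            basis.append([x / norm for x in v])
    return basis


def gram(m):
    """Return m @ m^T, which is the identity when the rows are orthonormal."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in m] for r1 in m]


rng = random.Random(0)           # arbitrary seed, for reproducibility only
w_xavier = xavier_uniform(4, 4, rng)
w_orth = orthogonal(4, rng)
```

With orthogonal initialization, `gram(w_orth)` is the identity up to floating-point error, so the matrix preserves the norm of any vector it multiplies. Xavier uniform only bounds the entries and gives no such guarantee, which is why the choice could matter for the recurrent weights.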