Closed

Description
According to the expression in line 95, the KL-divergence term should be calculated as
0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
but I think the code in lines 96-97 actually computes
0.5 * sum(1 + log(sigma^2) - mu^2 - sigma)
i.e. the last term is sigma rather than sigma^2. This might not be essential in practice, because whether or not the last term is squared, the loss still descends during training.
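For reference, here is a minimal sketch of what I believe the corrected computation should look like (PyTorch is used only for illustration; the repository's actual framework, variable names, and sign convention may differ):

```python
import torch

def kl_term(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior,
    # parameterized with logvar = log(sigma^2) as in most VAE implementations:
    #   KL = -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
    # The last term must be exp(logvar) = sigma^2; using
    # exp(0.5 * logvar) = sigma instead reproduces the discrepancy above.
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
```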