Closed

Description
According to the expression in line 95, the KL-divergence term should be calculated as
0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
but I think the code in lines 96-97 actually computes
0.5 * sum(1 + log(sigma^2) - mu^2 - sigma)
i.e. the last term is sigma rather than sigma^2. This might not be essential in practice, because whether or not the last term is squared, the loss still descends during training.
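For reference, here is a minimal sketch of what I believe the corrected computation should look like (PyTorch is used only for illustration; the repository's actual framework, variable names, and sign convention may differ):

```python
import torch

def kl_term(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior,
    # parameterized with logvar = log(sigma^2) as in most VAE implementations:
    #   KL = -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
    # The last term must be exp(logvar) = sigma^2; using
    # exp(0.5 * logvar) = sigma instead reproduces the discrepancy above.
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
```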