I ran into an issue while implementing the code: initially, the probability with a CoT added is always lower than the probability without one, and a good CoT does not necessarily lead to a higher probability. It eventually converges to an empty CoT. What could be the problem?
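
To make the comparison concrete, here is a minimal sketch of one way the two probabilities could be scored. It is an illustration under assumptions, not the implementation in question: the helper `answer_logprob`, the index arguments, and every number are hypothetical. It assumes the score is the sum of log-probabilities over the answer tokens only, and contrasts that with summing over the whole sequence, which charges a cost for every CoT token.

```python
def answer_logprob(token_logprobs, answer_start):
    """Sum log-probabilities over the answer tokens only.

    token_logprobs: per-token log-probs for the full sequence (hypothetical input).
    answer_start:   index of the first answer token.
    """
    return sum(token_logprobs[answer_start:])


# Made-up per-token log-probs, for illustration only.
# Without CoT: [question tokens, answer tokens]
no_cot = [-1.2, -0.8, -2.0, -1.5]  # answer starts at index 2
# With CoT: [question tokens, CoT tokens, answer tokens]
with_cot = [-1.2, -0.8, -3.0, -2.5, -2.2, -1.0, -0.7]  # answer starts at index 5

# Score only the answer tokens in both cases.
lp_no_cot = answer_logprob(no_cot, 2)
lp_with_cot = answer_logprob(with_cot, 5)
gain = lp_with_cot - lp_no_cot

# Scoring the whole sequence also counts the CoT tokens themselves,
# so the longer sequence looks worse even when the answer got more likely.
naive_gain = sum(with_cot) - sum(no_cot)

print(f"answer-only gain: {gain:.2f}")
print(f"whole-sequence gain: {naive_gain:.2f}")
```

In this toy case the answer-only gain is positive while the whole-sequence gain is negative, so a score that includes the CoT tokens would always favor the shorter sequence. Whether that applies here depends on how the probability is actually computed.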