Description
I suggest adding an `alpha` parameter to the `enable_lora` function. Currently, only the rank can be specified.
Typically, LoRA decomposes the weight update into two matrices A and B and then multiplies their product by α/rank to control its magnitude.
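For reference, this is the standard LoRA scaling in a minimal NumPy sketch (the variable names and shapes here are only illustrative, not the library's actual attributes):

```python
import numpy as np

# Toy dimensions; W, A, B follow the usual LoRA notation.
d_in, d_out, rank, alpha = 64, 32, 8, 16

W = np.random.randn(d_in, d_out)        # frozen base weight
A = np.random.randn(d_in, rank) * 0.01  # low-rank factor A
B = np.zeros((rank, d_out))             # low-rank factor B (starts at zero)

# The low-rank update is scaled by alpha / rank before being added to W.
W_adapted = W + (alpha / rank) * (A @ B)
```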
For backward compatibility, we can set the default `alpha = rank`. That way, anyone already using LoRA with a given rank gets the old behavior exactly, since the default makes `alpha / rank == 1`.
I think this could be implemented by multiplying the product of matrices A and B by `(self.alpha / self.rank)` when calling the kernel, as in the sketch below.
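A rough sketch of how that could look, including the backward-compatible default; the class and attribute names (`DenseWithLoRA`, `lora_a`, `lora_b`) are made up for illustration and are not the actual implementation:

```python
import numpy as np

class DenseWithLoRA:
    """Toy layer used only to illustrate the proposed alpha scaling."""

    def __init__(self, d_in, d_out):
        self.kernel = np.random.randn(d_in, d_out)  # frozen base weight

    def enable_lora(self, rank, alpha=None):
        # Defaulting alpha to rank keeps the current behavior: alpha / rank == 1.
        self.rank = rank
        self.alpha = alpha if alpha is not None else rank
        self.lora_a = np.random.randn(self.kernel.shape[0], rank) * 0.01
        self.lora_b = np.zeros((rank, self.kernel.shape[1]))

    def __call__(self, x):
        # Multiply the product of A and B by (self.alpha / self.rank) when applying the kernel.
        kernel = self.kernel + (self.alpha / self.rank) * (self.lora_a @ self.lora_b)
        return x @ kernel

# enable_lora(rank=8) behaves as today; enable_lora(rank=8, alpha=16) doubles the update's scale.
layer = DenseWithLoRA(64, 32)
layer.enable_lora(rank=8, alpha=16)
y = layer(np.random.randn(4, 64))
```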