I am currently learning to do differential expression analysis on bulk and pseudobulk RNA-seq data with limma::voom. The function documentation says:
"Note that edgeR::voomLmFit is now recommended over voom for sparse counts with a medium to high proportion of zeros."
However, edgeR::voomLmFit does not seem to be very often used by the community (yet?), is not included in the standard limma tutorials, and is not an option in the pseudo-bulk differential state analysis function of the muscat package (muscat::pbDS), which makes me a bit hesitant.
Is there a specific reason to not use voomLmFit at least if the data is sparse, if not always, instead of limma::voom + limma::lmFit? The paper and the documentation don't mention any downsides.
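For concreteness, the two workflows being compared can be sketched as follows. This is only a minimal illustration on simulated counts: the object names (counts, group, design, fit_classic, fit_new) and the simulation settings are assumptions made for the example, not anything taken from the documentation or from muscat.

```r
library(limma)
library(edgeR)

# Simulated, fairly sparse counts: 500 genes, 2 groups of 3 samples (illustrative only)
set.seed(1)
n_genes <- 500
group <- factor(rep(c("A", "B"), each = 3))
counts <- matrix(
  rnbinom(n_genes * 6, mu = 2, size = 2),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), paste0("s", 1:6))
)
design <- model.matrix(~ group)

# Build a DGEList and compute normalization factors
dge <- DGEList(counts = counts, group = group)
dge <- calcNormFactors(dge)

# Classic two-step workflow: voom() followed by lmFit()
v <- voom(dge, design)
fit_classic <- eBayes(lmFit(v, design))

# Single-call alternative: voomLmFit() returns a fitted model object directly
fit_new <- eBayes(voomLmFit(dge, design))

# Top genes for the group effect (second coefficient of the design matrix)
topTable(fit_new, coef = 2, number = 5)
```

In both cases the downstream steps (eBayes, topTable) are the same; only the voom + lmFit pair is replaced by one call.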
Reiterating James' answer, there is no reason not to use voomLmFit if you were planning to follow voom by lmFit. The reason why voomLmFit isn't more widely used is because it is relatively new, because we haven't yet publicized it in a publication and because it is in the edgeR package rather than limma and therefore a bit hidden. The muscat package and the limma tutorials were all written before voomLmFit existed.
Just wondering: why was this function put into edgeR rather than limma? Maybe a bit technical, but it is a little confusing from an outside perspective. Also wondering why it isn't mentioned in the edgeR v4 publication in Nucleic Acids Research (I suppose because it's more part of the limma workflow than edgeR?)
voomLmFit() is in edgeR instead of limma because it needs to call edgeR::glmFit() internally to determine which of the zero observations also correspond to zero fitted values. If we put the function in limma with an import of glmFit(), then it would make limma dependent on edgeR, meaning that everyone who used limma for any purpose would have to install edgeR as well. I have always tried to keep the limma load as lightweight as possible, so that people installing it for a particular purpose don't have to install lots of dependencies unrelated to their application.
An alternative strategy would have been to put voomLmFit() in limma but with an internal check and load of edgeR::glmFit availability, which would make edgeR a suggested package for limma. I am still seriously considering doing it that way. In this approach, the voomLmFit() function would do the load of edgeR::glmFit() itself, and would give an error message if edgeR was not an installed package. (limma used to treat the statmod package in this way, although I converted that to a preemptive import a few years ago.)
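The "suggested package" pattern described above is commonly written in R with requireNamespace(). The following is only a minimal sketch of that general pattern, not limma's or edgeR's actual code; the helper name get_optional_function is hypothetical and introduced purely for illustration.

```r
# Hypothetical helper (not part of limma or edgeR): look up a function from an
# optional package at call time, failing with a clear message if it is missing.
get_optional_function <- function(pkg, fun) {
  # requireNamespace() loads the package namespace if available and returns
  # FALSE (rather than throwing an error) when the package is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is required for this function but is not installed.",
         call. = FALSE)
  }
  # Retrieve the exported function without attaching the package
  getExportedValue(pkg, fun)
}

# A function in the lightweight package would resolve its optional dependency
# only when it is actually called, for example:
#   glmFit <- get_optional_function("edgeR", "glmFit")
```

With this design the optional package is listed under Suggests rather than Imports, so users who never call the function are not forced to install it.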
voomLmFit() is not mentioned in the edgeR v4 paper because it is not part of any edgeR pipeline.
See also this paper, which is the first to describe voomLmFit explicitly: https://www.biorxiv.org/content/10.1101/2025.04.07.647659v1
Thank you for the context and the preprint!