-
Notifications
You must be signed in to change notification settings - Fork 19
Description
Hi, thanks a lot for releasing this repo and the description of LPLB!
From the README, my understanding is that LPLB does a linear-programming (LP) optimization over a graph where:
- each redundant expert is linked to an original expert, forming edges between GPUs, and
- each edge has a capacity equal to the number of tokens assigned to the redundant expert in the current batch,
- and LPLB redistributes tokens along these edges while respecting capacities, to minimize load imbalance within an EP group.
This sounds very similar to a network-flow / min-cost-max-flow style formulation: variables as flows on edges, capacity constraints, and a linear objective for load balancing.
I have two questions:
-
Conceptually, do you see LPLB as essentially a network-flow formulation implemented via a general LP solver, or are there key constraints/objectives that make it fundamentally more general than standard min-cost-max-flow?
-
Are there specific reasons (algorithmic or engineering) you chose to present/implement it as a general LP instead of explicitly as a network-flow problem (e.g., easier to add extra linear constraints, use off-the-shelf LP solvers, etc.)?
For context, I’ve been working on a related idea where MoE load balancing is explicitly modeled as a network-flow / min-cost-max-flow problem. The paper is:
- Maximum Score Routing For Mixture-of-Experts (https://aclanthology.org/2025.findings-acl.653/).
The formulation is not identical to LPLB, but conceptually it also uses capacities on edges between experts and a flow-based objective to minimize imbalance.
If you find it relevant, I’d be very happy if you consider it as related work in future revisions of your paper, and I’d also really appreciate any comments on how you view the relationship between LPLB and a “pure” network-flow formulation.
Thanks again for the great work and for open-sourcing the implementation!
(Chinese summary) 简单说明一下:我在自己的工作中也尝试用网络流 / 最小费用流的方式来建模 MoE 的负载均衡,因此很好奇 LPLB 与网络流建模在本质上有哪些异同。如果您觉得有参考价值,也非常欢迎在后续版本中将这篇工作加入相关工作部分。再次感谢!