Are there any plans to support true tensor parallelism in the future? #13013

lingyezhixing · 2025-04-18T19:42:37Z

Multi-GPU inference currently carries a substantial performance overhead. What optimization methods are available to reduce it?