
Commit 8b7f6dd

update

1 parent 41c900e

1 file changed: +4 -2 lines changed

_posts/2025-04-18-openrlhf-vllm.md

Lines changed: 4 additions & 2 deletions
@@ -30,6 +30,7 @@ As illustrated above, OpenRLHF uses [Ray’s Placement Group API](https://docs.r
 OpenRLHF and vLLM provide a clean and efficient set of APIs to simplify interaction within RLHF pipelines. By implementing a custom `WorkerExtension` class, users can handle weight synchronization between training and inference components. The environment variables `VLLM_RAY_PER_WORKER_GPUS` and `VLLM_RAY_BUNDLE_INDICES` allow fine-grained GPU resource allocation per worker, enabling hybrid engine configurations where multiple components share a GPU group:

 ```python
+# rlhf_utils.py
 class ColocateWorkerExtension:
     """
     Extension class for vLLM workers to handle weight synchronization.
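The two environment variables named in the paragraph above are how that fine-grained allocation is expressed: `VLLM_RAY_PER_WORKER_GPUS` gives each of an engine's Ray workers a fractional GPU share, and `VLLM_RAY_BUNDLE_INDICES` pins those workers to specific placement-group bundles. A minimal sketch of a wrapper that sets them per engine instance, assuming a 0.4 GPU share and a caller-supplied bundle list (illustrative only, not necessarily identical to the post's `MyLLM`):

```python
# Sketch: setting vLLM's Ray GPU-sharing knobs per engine instance.
# The 0.4 share and the `bundle_indices` keyword are assumptions for illustration.
import os

from vllm import LLM


class MyLLM(LLM):
    def __init__(self, *args, bundle_indices: list, **kwargs):
        # Each Ray worker of this engine claims only 0.4 of a GPU, so the
        # remaining capacity on the same device can host other components.
        os.environ["VLLM_RAY_PER_WORKER_GPUS"] = "0.4"
        # Restrict this engine's workers to the given placement-group bundles.
        os.environ["VLLM_RAY_BUNDLE_INDICES"] = ",".join(map(str, bundle_indices))
        super().__init__(*args, **kwargs)
```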
@@ -55,6 +56,7 @@ class ColocateWorkerExtension:
         self.model_runner.model.load_weights(weights=weights)
         torch.cuda.synchronize()

+# main.py
 class MyLLM(LLM):
     """
     Custom LLM class to handle GPU resource allocation and bundle indices.
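The first two context lines of this hunk are the tail of the worker-side hook: once updated weights reach a vLLM worker, they are loaded into the in-memory model and the CUDA device is synchronized. The commit does not show the driver side, but vLLM's `collective_rpc` lets the training process invoke an extension method on every worker of every engine. A hedged sketch, assuming the extension exposes a method called `update_weights_from_ipc_handles` (name illustrative, not taken from this commit) and that the engines are Ray actors as in the loop further down:

```python
# Sketch: driver-side trigger for the worker extension's weight sync.
# `update_weights_from_ipc_handles` and `ipc_handles` are illustrative names.
import ray


def sync_weights(inference_engines, ipc_handles):
    # Ask every worker of every colocated engine to reload weights from the
    # handles exported by the training process, then wait for completion.
    futures = [
        engine.collective_rpc.remote(
            "update_weights_from_ipc_handles", args=(ipc_handles,)
        )
        for engine in inference_engines
    ]
    ray.get(futures)
```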
@@ -69,7 +71,7 @@ class MyLLM(LLM):
         super().__init__(*args, **kwargs)


-# Create placement group for GPU allocation
+# Create Ray's placement group for GPU allocation
 pg = placement_group([{"GPU": 1, "CPU": 0}] * 4)
 ray.get(pg.ready())

@@ -86,7 +88,7 @@ for bundle_indices in [[0, 1], [2, 3]]:
         tensor_parallel_size=2,
         distributed_executor_backend="ray",
         gpu_memory_utilization=0.4,
-        worker_extension_cls="__main__.ColocateWorkerExtension",
+        worker_extension_cls="rlhf_utils.ColocateWorkerExtension",
         bundle_indices=bundle_indices
     )
     inference_engines.append(llm)
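This last hunk elides the actor creation around those keyword arguments. A sketch of how the four-GPU placement group from the previous hunk and the two bundle pairs could be wired to two tensor-parallel-2 engines via Ray's `PlacementGroupSchedulingStrategy`; the model name is a placeholder and `MyLLM` is assumed to be the wrapper class defined in the post's `main.py` (or the earlier sketch):

```python
# Sketch: pinning each engine to two bundles of the four-GPU placement group.
# Model name and the MyLLM wrapper (defined elsewhere) are illustrative.
import ray
from ray.util.placement_group import placement_group
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

pg = placement_group([{"GPU": 1, "CPU": 0}] * 4)
ray.get(pg.ready())

inference_engines = []
for bundle_indices in [[0, 1], [2, 3]]:
    llm = ray.remote(
        num_cpus=0,
        num_gpus=0,  # GPUs are claimed by the engine's workers, not this actor
        scheduling_strategy=PlacementGroupSchedulingStrategy(placement_group=pg),
    )(MyLLM).remote(
        model="facebook/opt-125m",  # placeholder model
        tensor_parallel_size=2,
        distributed_executor_backend="ray",
        gpu_memory_utilization=0.4,
        worker_extension_cls="rlhf_utils.ColocateWorkerExtension",
        bundle_indices=bundle_indices,
    )
    inference_engines.append(llm)
```

Training actors can then be scheduled onto the same bundles (for example with `placement_group_bundle_index` in the scheduling strategy and a fractional `num_gpus`), which is what enables the hybrid, GPU-sharing configuration described in the post.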
