Description
I tried running your code with the default setup. I get the following results:

- Model: Llama-3.2-1B
- Reward model: RLHF Llama model
- search_batch_size: 25
- n: 4
- Hardware: 4x V100 32 GB GPUs
- seed: 42
- search: BoN
| n | acc_naive | acc_weighted | acc_maj |
|---|-----------|--------------|---------|
| 1 | 24.6 | 24.6 | 24.6 |
| 2 | 31.4 | 31.4 | 24.6 |
| 4 | 33.4 | 35.0 | 31.0 |
These show the same increasing trend with test-time compute (TTC). But why are they slightly different from the ones you report for BoN? Does the seed have an impact on the results?

I also see this comment in your docs: `# Repeat for seeds 0-4`. Why do you suggest this? Looking at the results you report, the scores change noticeably from one seed to another. Is there a particular seed we should choose?
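If the point of `# Repeat for seeds 0-4` is to average out seed variance rather than to pick a "good" seed, here is a minimal sketch of the aggregation I assumed (the `summarize_over_seeds` helper is mine, not from your repo; `acc_by_seed` would map each seed to one metric, e.g. acc_weighted at n=4):

```python
import statistics

def summarize_over_seeds(acc_by_seed: dict[int, float]) -> str:
    """Report mean +/- sample std of one accuracy metric across seeds."""
    scores = list(acc_by_seed.values())
    return f"{statistics.mean(scores):.1f} +/- {statistics.stdev(scores):.1f}"
```

Is reporting a mean and spread over the five seeds what you intended, rather than any single seed's number?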
Note: I made the following change to get the model to run on my V100s. I believe it should not cause a large difference.
```python
from vllm import LLM

llm = LLM(
    model=config.model_path,
    gpu_memory_utilization=config.gpu_memory_utilization,
    enable_prefix_caching=True,
    seed=config.seed,
    tensor_parallel_size=num_gpus,
    # V100 change: V100 does not support bfloat16, so fall back to float32
    dtype="float",
)
```
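For context on the change: V100s (compute capability 7.0) lack bfloat16 support, which is why the default dtype fails there. In vLLM, `dtype="float"` selects float32; `dtype="float16"` would roughly halve weight memory but, as I understand it, half precision can shift scores slightly, so I went with float32.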