- In `_dvts`, the finalization step appends all `beam_width` candidates from the last iteration to the outputs for each beam, without checking whether each candidate is actually completed (e.g., EOS-terminated). As a result, the final result set contains N completions per beam, some of which are not guaranteed to be finished.
Where in code
```python
# end of _dvts
output: list[Beam] = []
for beam in beams:
    for i in range(config.beam_width):
        output.append(
            Beam(
                prompt=beam.prompt,
                index=beam.index,
                current_text=beam.previous_text + beam.next_texts[i],
                next_texts=None,
                lookahead_texts=None,
                stop_reasons=None,
                best_scores=beam.all_scores[i],
                all_scores=beam.all_scores,
                previous_text=beam.current_text,
                pruned=beam.pruned,
                history=beam.history,
            )
        )
```
This also runs when the selected path for a beam ended on EOS and the beam was consequently marked as pruned, even though the other candidates may not have terminated.
Why it matters
- Returning N last-step candidates "as is" mixes completed and incomplete continuations, which increases parsing failures downstream.
I ran more than 40 DVTS experiments with 3B, 8B, and 70B generative models and various RMs, using a custom DeepSpeed DVTS version of your code that contains the same issue.
- Unparsable completions (%):
- DVTS: 21.9±15.8
- Best-of-N: 10.4±20.4
Note: the high standard deviation is likely due to testing models ranging from very small to large on different datasets.
Of course, at large N this is less noticeable, since the final selection will almost always pick something parsable.
Suggested fix
- Only return completed candidates, or at least return only the selected candidate when a beam is EOS-pruned. Even better, make this step adaptive, so generation can continue for unfinished candidates and each beam yields N genuinely complete solutions. A sketch of the first option follows.
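A minimal sketch of the filtering variant, assuming each beam carries per-candidate stop reasons in `beam.stop_reasons` and that `"EOS"` marks a properly terminated candidate (the field name follows the snippet above; the sentinel value is an assumption, not confirmed against the codebase):

```python
# Sketch: keep only EOS-terminated candidates in the final output.
# Assumes beam.stop_reasons[i] is the stop reason for candidate i and
# that "EOS" marks proper termination (an assumption).
output: list[Beam] = []
for beam in beams:
    for i in range(config.beam_width):
        if beam.stop_reasons is None or beam.stop_reasons[i] != "EOS":
            continue  # skip candidates that did not finish
        output.append(
            Beam(
                prompt=beam.prompt,
                index=beam.index,
                current_text=beam.previous_text + beam.next_texts[i],
                next_texts=None,
                lookahead_texts=None,
                stop_reasons=None,
                best_scores=beam.all_scores[i],
                all_scores=beam.all_scores,
                previous_text=beam.current_text,
                pruned=beam.pruned,
                history=beam.history,
            )
        )
```

A fallback (e.g., always keeping at least the selected candidate) would be needed so a beam never contributes zero outputs; the adaptive variant would instead re-queue unfinished candidates for further generation until they terminate or hit the length limit.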
Please let me know if this is an actual problem or I am mistaken somehow :)