DVTS finalization adds N last-step candidates regardless of completion status #57

@davromano

Description

  • In _dvts, the finalization step appends all beam_width candidates from the last iteration to the output for each beam, without checking whether each candidate has actually completed (e.g., terminated on EOS). As a result, the final result set contains beam_width completions per beam, some of which are not guaranteed to be finished.

Where in code

# end of _dvts
# every last-step candidate is appended; its stop reason is never checked
output: list[Beam] = []
for beam in beams:
    for i in range(config.beam_width):
        output.append(
            Beam(
                prompt=beam.prompt,
                index=beam.index,
                current_text=beam.previous_text + beam.next_texts[i],
                next_texts=None,
                lookahead_texts=None,
                stop_reasons=None,
                best_scores=beam.all_scores[i],
                all_scores=beam.all_scores,
                previous_text=beam.current_text,
                pruned=beam.pruned,
                history=beam.history,
            )
        )

This runs after the selected path for a beam has ended on EOS and the beam has consequently been marked as pruned, but the other candidates from that step may not have terminated.

Why it matters

  • Returning N last-step candidates “as is” mixes completed and incomplete continuations, which increases parsing failures downstream.

I ran more than 40 DVTS experiments with 3B, 8B, and 70B generative models and various reward models (RMs), using a custom DeepSpeed version of your DVTS code that contains the same issue.

  • Unparsable completions (%):
    • DVTS: 21.9±15.8
    • Best-of-N: 10.4±20.4

Note: the high standard deviation is probably due to the range of models I tested, from very small to large, across different datasets.

Of course, at large N this is not noticeable, since the final selection will always pick something parsable.

Suggested fix

  • Only return completed candidates, or at least return only the selected candidate when a beam is EOS-pruned. Even better would be to make the search adapt to this case, so that generation continues for the unfinished candidates and we end up with N real solutions.
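
The first option (return only completed candidates) could look roughly like the sketch below. It is not the repository's code: the Beam class here is a simplified, hypothetical stand-in with one score per candidate, FinalCandidate and finalize_completed are names made up for illustration, and it assumes that stop_reasons holds one entry per candidate with the string "EOS" marking a finished one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Beam:
    # Simplified, hypothetical stand-in for the repository's Beam class;
    # only the fields needed for finalization are kept.
    prompt: str
    index: int
    previous_text: str                  # text before the last step
    next_texts: list[str]               # one continuation per candidate
    stop_reasons: list[Optional[str]]   # assumption: "EOS" marks a finished candidate
    all_scores: list[float]             # simplification: one score per candidate


@dataclass
class FinalCandidate:
    # Hypothetical output record, used here instead of rebuilding a full Beam.
    prompt: str
    index: int
    text: str
    score: float


def finalize_completed(beams: list[Beam], beam_width: int) -> list[FinalCandidate]:
    """Return only the last-step candidates that stopped on EOS."""
    output: list[FinalCandidate] = []
    for beam in beams:
        for i in range(beam_width):
            if beam.stop_reasons[i] != "EOS":
                continue  # unfinished continuation: leave it out
            output.append(
                FinalCandidate(
                    prompt=beam.prompt,
                    index=beam.index,
                    text=beam.previous_text + beam.next_texts[i],
                    score=beam.all_scores[i],
                )
            )
    return output


# Toy example: one beam whose first candidate finished and whose second did not.
beam = Beam(
    prompt="2+2?",
    index=0,
    previous_text="Step 1: add. ",
    next_texts=["Answer: 4", "Step 2: so we"],
    stop_reasons=["EOS", "length"],
    all_scores=[0.9, 0.4],
)
results = finalize_completed([beam], beam_width=2)
```

With this filter the result set can contain fewer than N completions, which is the trade-off against the current behavior; continuing generation for the unfinished candidates would be needed to get N finished ones.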

Please let me know whether this is an actual problem or whether I am mistaken somehow :)
