Description
The original shape of this API assumed we would always generate messages in the sequence of: user, assistant, user, assistant, ...
Various changes since then have complicated the situation:
- The addition of `initialPrompts`, which includes multiple messages at once with no response.
- Allowing multiple consecutive messages with the same role, as shown in this example.
- Multimodal input, including the complexities around one message with multiple parts vs. multiple messages discussed in Add multiple modalities in a single message #89.
At this point, we are assuming a model architecture that allows an arbitrary sequence of user + assistant messages, in any order. In such cases, it might be useful to allow appending messages without immediately asking the model to generate a response. This allows different parts of the application to prepare the session by sending some messages, before another part of the application is finally ready to use them to generate a response.
There are two main API proposals for this. A new option to `prompt()`, such as
await session.prompt(messages, { delayResponse: true });
vs. a new method, e.g.
await session.append(messages);
I like the second option slightly more:
- This new feature doesn't make sense for `promptStreaming()`, so we'd only include a new option for `prompt()`. It's a bit strange to have options that apply to `prompt()` but not `promptStreaming()`.
- Because web platform boolean options are strongly encouraged to default to false, for the new-option version we have to pick a slightly strange name like my above `{ delayResponse: true }`, instead of something more natural like `{ respond: false }`.
- It's shorter and clearer that you're doing something different.
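
To make the comparison concrete, here is a minimal sketch of how the two shapes might behave side by side. `MockSession` is a hypothetical in-memory stand-in written only for illustration, not the real session object; the string-to-user-message normalization, the `history` and `generationCount` fields, and the canned reply text are all assumptions. Only `prompt()`, `append()`, and `{ delayResponse: true }` come from the proposals above.

```javascript
// Hypothetical in-memory stand-in for a session. It records messages and
// fakes a reply so the two proposed API shapes can be compared.
class MockSession {
  constructor() {
    this.history = [];        // every message seen so far, in order
    this.generationCount = 0; // how many times a response was requested
  }

  // Assumption: a bare string is treated as a single user message.
  static normalize(messages) {
    return typeof messages === "string"
      ? [{ role: "user", content: messages }]
      : messages.map((m) => ({ role: m.role, content: m.content }));
  }

  // Proposal 2: a dedicated method that adds messages without responding.
  async append(messages) {
    this.history.push(...MockSession.normalize(messages));
  }

  // Proposal 1: an option on prompt(); it defaults to false, so the
  // existing behavior (respond immediately) is unchanged.
  async prompt(messages, { delayResponse = false } = {}) {
    await this.append(messages);
    if (delayResponse) return undefined;

    this.generationCount += 1;
    const reply = `reply #${this.generationCount} after ${this.history.length} messages`;
    this.history.push({ role: "assistant", content: reply });
    return reply;
  }
}

async function demo() {
  const session = new MockSession();

  // One part of the application prepares the session...
  await session.append([{ role: "user", content: "Here is some background." }]);
  await session.prompt("More background.", { delayResponse: true });

  // ...and another part later asks for the actual response.
  const reply = await session.prompt("Now summarize.");
  return { session, reply };
}
```

In this sketch both shapes leave the session in the same state, so the choice between them is mostly about naming and how the option would interact with `promptStreaming()`.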