Allow appending messages without receiving a response #92

Closed
domenic opened this issue Mar 28, 2025 · 15 comments · Fixed by #95
Labels
enhancement New feature or request

Comments

@domenic
Collaborator

domenic commented Mar 28, 2025

The original shape of this API assumed we would always generate messages in the sequence of: user, assistant, user, assistant, ...

Various changes since then have complicated the situation:

  • The addition of initialPrompts, which includes multiple messages at once with no response
  • Allowing multiple consecutive messages with the same role, as shown in this example
  • Multimodal input, including the complexities around one message with multiple parts vs. multiple messages discussed in Add multiple modalities in a single message #89.

At this point, we are assuming a model architecture that allows an arbitrary sequence of user + assistant messages, in any order. In such cases, it might be useful to allow appending messages without immediately asking the model to generate a response. This allows different parts of the application to prepare the session by sending some messages, before another part of the application is finally ready to use them to generate a response.

There are two main API proposals for this. A new option to prompt(), such as

await session.prompt(messages, { delayResponse: true });

vs. a new method, e.g.

await session.append(messages);

I like the second option slightly more:

  • This new feature doesn't make sense for promptStreaming(), so we'd only include a new option for prompt(). It's a bit strange to have options that apply to prompt() but not promptStreaming().
  • Because web platform boolean options are strongly encouraged to default to false, for the new-option version we have to pick a slightly strange name like my above { delayResponse: true }, instead of something more natural like { respond: false }.
  • It's shorter and clearer that you're doing something different.
@domenic domenic added the enhancement New feature or request label Mar 28, 2025
domenic added a commit that referenced this issue Apr 2, 2025
@clarkduvall

If we end up going with the separate-method approach, which I also slightly prefer, a more extreme option is to remove the ability for prompt() to add input at all, so that the only way to add input is through append(). (We may want to rename prompt() to generate() or something like that at that point.) A normal session would then be:

// Add some context.
session.append(messages);
...
// Add some more context.
session.append(moreMessages);
// Now get the output.
await session.generate();

@domenic
Collaborator Author

domenic commented Apr 8, 2025

I can see the appeal of that, although I'd keep prompt() as syntactic sugar around append() + generate(), for the simple case.
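
To make that relationship concrete, here is a minimal sketch of prompt() as sugar over append() + generate(). MockSession and its canned reply are hypothetical stand-ins for illustration only, not the real session object, and generate() is just the name floated above.

```javascript
// Hypothetical mock; a real session would talk to a language model.
class MockSession {
  constructor() {
    this.history = []; // messages committed to the session, in order
  }

  // Commit messages to the session without producing a response.
  async append(messages) {
    for (const message of messages) {
      this.history.push({ ...message });
    }
  }

  // Produce a (canned) response from whatever is in the history so far.
  async generate() {
    const userCount = this.history.filter((m) => m.role === "user").length;
    const reply = { role: "assistant", content: `echo of ${userCount} user message(s)` };
    this.history.push(reply);
    return reply.content;
  }

  // Sugar for the simple case: append, then generate.
  async prompt(messages) {
    await this.append(messages);
    return this.generate();
  }
}

// Runs the same conversation both ways and returns the two replies.
async function demo() {
  const a = new MockSession();
  await a.append([{ role: "user", content: "context" }]);
  await a.append([{ role: "user", content: "question" }]);
  const stepwise = await a.generate();

  const b = new MockSession();
  await b.append([{ role: "user", content: "context" }]);
  const sugared = await b.prompt([{ role: "user", content: "question" }]);

  return { stepwise, sugared };
}
```

In this mock, both paths leave the session with the same history and return the same reply, which is the property the sugar would need to preserve.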

@lozy219
Contributor

lozy219 commented Apr 17, 2025

For the abort signal, the spec says

Since append()ed prompts are not responded to immediately, they can be aborted until a subsequent call to prompt() or promptStreaming() happens and that response has been finished.

Does it mean that we can abort a specific appended prompt any time before the response is generated?

For example, if the user makes these calls one by one:

session.append("1");
session.append("2", signal);
session.append("3");

signal.abort();
session.prompt();

Which of the following happens to the prompt()?
a. the prompt is sent with input "123"
b. the prompt is sent with input "13"
c. the prompt itself is aborted

I assume it's b. But what if context overflow happens when appending "2"?


I think this leads to another question: are these appended inputs added to the prompt history one by one or as a whole? In other words, assuming the session can only keep 2 characters, if the following calls are made

session.append("1");
session.append("2");
session.append("3"); // context overflow here

session.prompt();

Is it equivalent to sending a prompt with "23", or would it throw a QuotaExceeded error?

@domenic
Collaborator Author

domenic commented Apr 17, 2025

Does it mean that we can abort a specific appended prompt any time before the response is generated?

Yes, that's what the paragraph was intending to convey! Let me know if there's some phrasing you think would be clearer.

Which of the following happens to the prompt()?

b was my intention

But what if context overflow happens when appending "2"?

I guess in that case we have to send just 3.

Is it equivalent to sending a prompt with "23", or would it throw a QuotaExceeded error?

Good question. I think it should be equivalent to sending "23". That maps to the natural implementation strategy of each append call mapping to a call into the language model. Otherwise we'd have to do some sort of batching on the frontend, which seems fragile and unpredictable.
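
As a rough illustration of the "each append is its own call" strategy, here is a sketch with a context window measured in characters. TinyContext, its eviction rule (drop the oldest entries first), and the error for an input that can never fit are all assumptions made for the example, not spec behavior.

```javascript
// Hypothetical model of a session whose context holds a fixed number of characters.
class TinyContext {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = []; // one entry per append() call, oldest first
  }

  used() {
    return this.entries.reduce((total, entry) => total + entry.length, 0);
  }

  // Each append is processed on its own; overflow evicts the oldest entries.
  append(text) {
    if (text.length > this.capacity) {
      // Assumption: an input that cannot fit at all is the case that errors.
      throw new RangeError("input is larger than the whole context");
    }
    this.entries.push(text);
    while (this.used() > this.capacity) {
      this.entries.shift();
    }
  }

  contents() {
    return this.entries.join("");
  }
}

const context = new TinyContext(2);
context.append("1");
context.append("2");
context.append("3"); // overflow: "1" is evicted
console.log(context.contents()); // prints 23
```

No batching is needed on the frontend in this model: the third append overflows on its own and the session ends up holding "23" before prompt() is ever called.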

@lozy219
Contributor

lozy219 commented Apr 18, 2025

Thanks for the clarification.

Let me know if there's some phrasing you think would be clearer.

I think the current one is quite clear, I just wanted to confirm on some details.

There is another question regarding the Promise<undefined> returned by append(): when will it be resolved? Is it 1) when the prompt is appended or 2) when it's actually sent to the model? From the sample code it seems to be 1).

@clarkduvall

I think it makes the most sense for the promise to resolve when the prompt has been fully processed by the model. So the request can be cancelled up to the point when the promise resolves, and once the promise resolves the prompt is fully committed to the session.

@lozy219
Contributor

lozy219 commented Apr 18, 2025

One concern is the caller may not know if the request is actually appended or not since it's asynchronous.

session.append("1");
await session.prompt("2"); // at this point the previous request might not be appended yet?

Even after the promise is resolved, the abort controller should still be there for us to cancel the request.

@clarkduvall

Is this how the abort controller for the prompt/promptStreaming methods behaves? I didn't think it retroactively removes the prompts from the session if it has already completed. As a user of the API, I think of abort as "cancel this operation if it has not finished" and not "rewrite the history of the session so this never happened".

One way we might be able to remove the ambiguity of some of the cancel logic is have append() behave like the current prompt() impl which I believe cancels any previous prompt calls if they haven't finished.

@lozy219
Contributor

lozy219 commented Apr 21, 2025

For append() there are three states:

  1. prompt appended
  2. prompt sent for processing
  3. response is sent back

For prompt()/promptStreaming() it's just 2 and 3, so it's clear that the promise is resolved at 3, and the user can abort the prompt before that (while the state is 2)

If we want similar behavior, we should also let the promise returned by append() be resolved at 3, and the request can be aborted anytime before that.

But looking at this example:

session.append("1");
session.append("2", signal);
session.append("3");

session.prompt("4");
signal.abort();

Should the signal.abort() cancel the prompt request, and resend a new one with "134"? It's more intuitive to me that when the prompt() happens, the append() is already completed, so it can no longer be aborted. Like mentioned in #92 (comment) we could treat the prompt() method as append() + generate(), so aborting prompt() is actually aborting the generate() part, but that cannot be done when aborting the append().

@domenic
Collaborator Author

domenic commented Apr 21, 2025

Good questions...

For append() there are three states:

  1. prompt appended
  2. prompt sent for processing
  3. response is sent back

For prompt()/promptStreaming() it's just 2 and 3, so it's clear that the promise is resolved at 3, and the user can abort the prompt before that (while the state is 2)

This is basically right, but there is also the "prompt is queued behind other prompts" state.


There are two issues we're discussing here: when should the append() promise resolve, and for how long should you be able to abort an append.

It is somewhat natural to assume these are tied together: that is, to assume you can abort an append up until the promise resolves, but not after. However, this tie is not necessary. For example, the AbortSignal passed to LanguageModel.create() can be used to destroy the language model, even after the promise returned from LanguageModel.create() resolves.

For when the append() promise should resolve, I think we want to resolve it as soon as possible, and not wait until the generate step. The reason is that it will be natural for web developers to write code like this:

await session.append("1");
await session.append("2");
await session.append("3");
const result = await session.prompt("4");

If we waited until the generate step to resolve, this code would be stuck forever on line one, awaiting the first append.

Telling web developers to omit the await is bad, because then they have no clear error channel. Any errors appending will just be unhandled.
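
A small sketch of why the awaited form matters: when each append() promise settles promptly, the caller can put every append inside an ordinary try/catch. The append() function below is a hypothetical stand-in that rejects on empty input purely to simulate an append-time error.

```javascript
// Hypothetical append(): rejects for empty input to stand in for any append-time failure.
async function append(history, text) {
  if (text === "") {
    throw new Error("cannot append empty input");
  }
  history.push(text);
}

// Appends inputs one by one, collecting errors instead of leaving them unhandled.
async function prepare(history, inputs) {
  const failures = [];
  for (const input of inputs) {
    try {
      // Resolves as soon as the input is appended, so the loop keeps making progress.
      await append(history, input);
    } catch (error) {
      failures.push(error.message);
    }
  }
  return failures;
}
```

Calling `prepare([], ["1", "", "3"])` in this sketch reports one failure and still appends "1" and "3". Without the await, the same failure would surface only as an unhandled rejection.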


Now, regarding how long the AbortSignal passed to append() should work for. My original vision was that until the generate step completes, nothing is locked into the session's history, and everything can be unwound by the appropriate AbortSignal. This would mean you could abort the append even after the append() promise resolves. That would even include after calling prompt(), like in @lozy219's last example.

In this version, the AbortSignal for append() switches between a few modes:

  • While queued behind other appends: dequeues the append and rejects the append() promise
  • While performing the append: aborts the append operation and rejects the append() promise
  • After appending is complete, before generate step started: removes the appended prompt
  • After generate step started, before generate step completed: cancels the generate step, removes the appended prompt, and then restarts the generate step with the modified prompt history
  • After generate step completed: does nothing

This is similar to the current spec for the AbortSignal for prompt():

  • While queued behind other prompts, before generate step started: dequeues the prompt and rejects the prompt() promise
  • After generate step started: aborts the prompt operation, dequeues the prompt, and rejects the prompt() promise
  • After generate step completed: does nothing

However, all this discussion makes me feel this is unnecessarily complicated. Do we actually have use cases for such fine-grained management of the prompt history?

So I guess we have three options:

  1. Remove the AbortSignal from the append() call.
  2. Make the AbortSignal for the append() call only work in some of the states, e.g. "while queued behind other appends" and "while performing the append"
  3. Make the AbortSignal for the append() call work in all cases, per the above.

I feel like (1) is maybe best. We can always move to (2) or (3) later if we find a strong use case?

@lozy219
Contributor

lozy219 commented Apr 21, 2025

I agree we should start with (1) first. I haven't checked the usage but I believe that even for prompt(), the abort signal is rarely used.

However, from the implementation perspective, the choice among those options might make more of a difference. The main question is how we handle those appended requests: are they part of the prompt history that's added into the context, or part of the new prompt? Maybe this issue is not the right place for this discussion, but we would start with the easiest way of implementation if the explainer is updated with option (1).

@domenic
Collaborator Author

domenic commented Apr 21, 2025

I agree we should start with (1) first. I haven't checked the usage but I believe that even for prompt(), the abort signal is rarely used.

Great. I will update the explainer shortly.

However, from the implementation perspective, the choice among those options might make more of a difference. The main question is how we handle those appended requests: are they part of the prompt history that's added into the context, or part of the new prompt? Maybe this issue is not the right place for this discussion, but we would start with the easiest way of implementation if the explainer is updated with option (1).

I am not sure I understand the difference, personally :). But starting with the easiest way makes sense to me!

@clarkduvall

I think we may be looking at the purpose of abort from two different perspectives. To me, abort should not be used primarily to manage prompt history, but instead clone() should be used for cases where a user may want to "rewrite" history of the session by creating checkpoints at various stages in the session that can then be used. In my view, abort should be used to cancel expensive operations that may not be needed due to some change in state of the app or user input.

If append() is called with thousands of tokens, it will take a non-trivial amount of time to process even if there is no queueing involved (tens of seconds on some machines). We need to have a way to cancel this operation if it is no longer needed, otherwise all other calls need to wait on this to finish before beginning processing.

In addition, rewriting history is very inefficient in the backend. For example, imagine this example:

session.append("starter text");
session.append("some small text", signal);
await session.append("some super huge text"); // May take 10+ seconds to process.

// Oh no, "some small text" is no longer valid!
signal.abort();
session.prompt("more text");

If in this case we allow removing "some small text" from the session, this requires re-processing "some super huge text" before getting any output from the prompt() call. If we don't allow this behavior, it prevents users from accidentally shooting themselves in the foot performance wise. If a user wants to do something similar to this, I would recommend the pattern:

session.append("starter text");
const clone = await session.clone(); // checkpoint taken before the text that may become invalid
session.append("some small text", signal);
await session.append("some super huge text"); // May take 10+ seconds to process.

// Oh no, "some small text" is no longer valid!
session = clone;
session.append("some super huge text");
session.prompt("more text");

Then it's explicit to the user that "some super huge text" must be reprocessed and may take a while.

Due to these reasons, I think it practically makes the most sense to allow aborting until the append() promise resolves.

For the original example:

session.append("1");
session.append("2", signal);
session.append("3");

signal.abort();
session.prompt();

I would say the end result of the session will depend on when abort() was called in reference to when the promise ended up resolving. So the code as given would not know exactly what the session would look like (either "13" if promise hadn't resolved or "123" if it had). Like I mentioned earlier, I think using clone() is the better way to handle this type of session management rather than using abort, as it makes it much more clear what the intent is.
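
To illustrate that timing dependence, here is a minimal sketch of "abortable until the append() promise resolves". QueuedSession, simulateProcessing, the 5 ms delay, and the `{ signal }` options shape are all assumptions made for the example; a real implementation would process tokens rather than wait on a timer.

```javascript
// Rejection value for an aborted append; falls back to a plain Error on older runtimes.
const abortReason = (signal) => signal.reason ?? new Error("append aborted");

// Simulated model work that can be interrupted by an AbortSignal.
function simulateProcessing(ms, signal) {
  return new Promise((resolve, reject) => {
    const onAbort = () => {
      clearTimeout(timer);
      reject(abortReason(signal));
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}

// Hypothetical session that processes appends strictly one at a time.
class QueuedSession {
  constructor() {
    this.history = [];
    this.tail = Promise.resolve(); // end of the queue of pending appends
  }

  append(text, { signal } = {}) {
    const run = async () => {
      // Aborted while queued: the append is dropped before any work starts.
      if (signal?.aborted) throw abortReason(signal);
      // Aborted while processing: the work is interrupted.
      await simulateProcessing(5, signal);
      // Committed: from here on the signal has no effect.
      this.history.push(text);
    };
    const result = this.tail.then(run);
    this.tail = result.catch(() => {}); // a failed append must not block later ones
    return result;
  }
}

// Replays the "1", "2", "3" example with an early or a late abort.
async function scenario(abortEarly) {
  const session = new QueuedSession();
  const controller = new AbortController();
  const one = session.append("1");
  const two = session.append("2", { signal: controller.signal });
  const three = session.append("3");
  if (!abortEarly) await two; // wait until "2" is committed
  controller.abort();
  await Promise.allSettled([one, two, three]);
  return session.history.join("");
}
```

In this sketch `scenario(true)` ends with "13" and `scenario(false)` ends with "123". The same calls produce different sessions depending only on when the abort lands.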

@domenic
Collaborator Author

domenic commented Apr 22, 2025

Thanks for the detailed comment. I guess you have convinced me to move from (1) to (2) immediately. It sounds like we definitely don't want to support (3), and (2) is strictly more useful than (1). So I will update #104.

@lozy219
Contributor

lozy219 commented Apr 22, 2025

If append() is called with thousands of tokens, it will take a non-trivial amount of time to process even if there is no queueing involved (tens of seconds on some machines).

FYI the current implementation in Chromium manages the session history from the API level, so append() will not send any data to the backend and it probably won't be that expensive (also there is no queueing from the model side so it's not going to affect other sessions). All the heavy work only happens after the user runs prompt(). Most likely the append() call will be fast enough and leave no time for the abort() to happen :)
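
For illustration only, the general shape of that strategy might look like the sketch below, where append() just records input and the expensive call happens in prompt(). LazySession and fakeModel are hypothetical names and this is not the Chromium code.

```javascript
// Hypothetical session that keeps history at the API level.
class LazySession {
  constructor(model) {
    this.model = model; // async function taking the full history, returning a reply
    this.history = [];
  }

  // Cheap: only records the input, nothing is sent to the model.
  async append(text) {
    this.history.push(text);
  }

  // Expensive: the whole history is handed to the model here.
  async prompt(text) {
    this.history.push(text);
    const reply = await this.model([...this.history]);
    this.history.push(reply);
    return reply;
  }
}

// Stand-in model that counts how often it is invoked.
let modelCalls = 0;
async function fakeModel(history) {
  modelCalls += 1;
  return `saw ${history.length} input(s)`;
}

async function lazyDemo() {
  const session = new LazySession(fakeModel);
  await session.append("1");
  await session.append("2");
  await session.append("3");
  const callsBeforePrompt = modelCalls;
  const reply = await session.prompt("4");
  return { callsBeforePrompt, callsAfterPrompt: modelCalls, reply };
}
```

In this model the three appends cost no model calls, so there is almost no window in which an abort could interrupt them.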

Anyway, those are implementation details; at the spec level it's always good to support abort functionality.

domenic added a commit that referenced this issue Apr 24, 2025