Allow appending messages without receiving a response #92
If we end up going with the separate-method approach, which I also slightly prefer, another more extreme option is to remove the ability for prompt() to add input at all, so that the only way to add input is through append() (we may want to rename prompt() to generate() or something like that at that point). A normal session would then be:
// Add some context.
session.append(messages);
...
// Add some more context.
session.append(moreMessages);
// Now get the output.
await session.generate(); |
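To make the append-then-generate flow above concrete, here is a minimal sketch against a toy session. MockSession and its echo-style reply are assumptions made up for illustration; they are not the real API, and generate() is only the name floated in this comment.

```javascript
// Toy stand-in for a language model session; NOT the real Prompt API.
class MockSession {
  constructor() {
    this.history = []; // committed messages, in order
  }

  // Adds context without asking for a response.
  async append(messages) {
    // Accepts either one message or an array of messages.
    this.history.push(...[].concat(messages));
  }

  // Produces output from everything appended so far; takes no input.
  async generate() {
    const reply = `echo(${this.history.join(" | ")})`;
    this.history.push(reply);
    return reply;
  }
}

async function demoAppendThenGenerate() {
  const session = new MockSession();
  await session.append(["system: be brief", "user: hello"]);
  await session.append("user: what is 2 + 2?");
  return session.generate();
}
```

With this shape, input and output are fully separated: nothing reaches the session except through append(), and generate() only reads what is already there.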
I can see the appeal of that, although I'd keep |
For the abort signal, the spec says
Does this mean that we can abort a specific appended prompt at any time before the response is generated? For example, if the user makes these calls one by one:
Which of the following happens to the I assume it's b. But what if a context overflow happens when appending "2"? I think this leads to another question: are these appended inputs added to the prompt history one by one or as a whole? In other words, assuming the session can only keep 2 characters, if the following calls are made
Is it equivalent to sending a prompt with "23", or would it throw a QuotaExceeded error? |
Yes, that's what the paragraph was intending to convey! Let me know if there's some phrasing you think would be clearer.
b was my intention
I guess in that case we have to send just 3.
Good question. I think it should be equivalent to sending "23". That maps to the natural implementation strategy of each append call mapping to a call into the language model. Otherwise we'd have to do some sort of batching on the frontend, which seems fragile and unpredictable. |
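A toy model of the "one append, one call into the model" behavior described above. It assumes the elided calls appended "1", "2" and "3" in turn, and it measures the window in characters rather than tokens; TinyContext is a made-up name, not part of any real API.

```javascript
// Toy context window measured in characters; real sessions count tokens.
class TinyContext {
  constructor(maxChars) {
    this.maxChars = maxChars;
    this.text = "";
  }

  // Each append is handled on its own: add the input, then evict the
  // oldest characters if the window overflows.
  append(input) {
    this.text = (this.text + input).slice(-this.maxChars);
  }
}

const ctx = new TinyContext(2);
for (const piece of ["1", "2", "3"]) {
  ctx.append(piece);
}
// ctx.text now holds the same state as sending "23" in a single prompt.
```

No batching is needed on the caller's side: overflow is resolved at each append, so the final state does not depend on how the input was split.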
Thanks for the clarification.
I think the current one is quite clear; I just wanted to confirm some details. There is another question regarding the |
I think it makes the most sense for the promise to resolve when the prompt has been fully processed by the model. So the request can be cancelled up to the point when the promise resolves, and once the promise resolves the prompt is fully committed to the session. |
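The rule proposed in this comment (abortable until the promise resolves, committed afterwards) can be sketched with a toy session. SlowSession, the ms option and the options-bag form of the signal are assumptions for illustration only.

```javascript
// Toy session where processing an append takes `ms` milliseconds.
// NOT the real API; it only illustrates the abort window.
class SlowSession {
  constructor() {
    this.history = [];
  }

  append(text, { signal, ms = 20 } = {}) {
    return new Promise((resolve, reject) => {
      if (signal?.aborted) {
        reject(signal.reason);
        return;
      }
      const onAbort = () => {
        clearTimeout(timer); // stop processing; nothing gets committed
        reject(signal.reason);
      };
      const timer = setTimeout(() => {
        signal?.removeEventListener("abort", onAbort);
        this.history.push(text); // committed only once processing finishes
        resolve();
      }, ms);
      signal?.addEventListener("abort", onAbort, { once: true });
    });
  }
}

async function demoAbortWindow() {
  const session = new SlowSession();

  // Abort while the append is still being processed: "1" is dropped.
  const early = new AbortController();
  const pending = session.append("1", { signal: early.signal });
  early.abort();
  await pending.catch(() => {});

  // Abort after the promise has resolved: too late, "2" stays committed.
  const late = new AbortController();
  await session.append("2", { signal: late.signal });
  late.abort();

  return session.history;
}
```

In this sketch the resolved promise is the caller's signal that the input is part of the session and can no longer be cancelled.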
One concern is that the caller may not know whether the request has actually been appended, since it's asynchronous.
session.append("1");
await session.prompt("2"); // at this point the previous request might not be appended yet?
Even after the promise is resolved, the abort controller should still be there for us to cancel the request. |
Is this how the abort controller for the prompt/promptStreaming methods behaves? I didn't think it retroactively removes the prompts from the session if it has already completed. As a user of the API, I think of abort as "cancel this operation if it has not finished" and not "rewrite the history of the session so this never happened". One way we might be able to remove the ambiguity of some of the cancel logic is to have append() behave like the current prompt() impl, which I believe cancels any previous prompt calls if they haven't finished. |
For
For If we want similar behavior, we should also let the promise returned by append() be resolved at 3, and the request can be aborted anytime before that. But looking at this example:
session.append("1");
session.append("2", signal);
session.append("3");
session.prompt("4");
signal.abort();
Should the |
Good questions...
This is basically right, but there is also the "prompt is queued behind other prompts" state. There are two issues we're discussing here: when should the It is somewhat natural to assume these are tied together: that is, to assume you can abort an append up until the promise resolves, but not after. However, this tie is not necessary. For example, the For when the
await session.append("1");
await session.append("2");
await session.append("3");
const result = await session.prompt("4");
If we waited until the generate step to resolve, this code would be stuck forever on line one, awaiting the first append. Telling web developers to omit the Now, regarding how long the In this version, the
This is similar to the current spec for the
However, all this discussion makes me feel this is unnecessarily complicated. Do we actually have use cases for such fine-grained management of the prompt history? So I guess we have three options:
I feel like (1) is maybe best. We can always move to (2) or (3) later if we find a strong use case? |
I agree we should start with (1). I haven't checked the usage but I believe that even for However, from the implementation perspective, the options may differ a bit more. The main question is how we handle those appended requests: are they part of the prompt history that's added to the context, or part of the new prompt? Maybe this issue is not the right place for this discussion, but we would start with the easiest implementation if the explainer is updated with option (1). |
Great. I will update the explainer shortly.
I am not sure I understand the difference, personally :). But starting with the easiest way makes sense to me! |
See discussions in #92 (comment) onward.
I think we may be looking at the purpose of abort from two different perspectives. To me, abort should not be used primarily to manage prompt history; instead, clone() should be used for cases where a user may want to "rewrite" the history of the session, by creating checkpoints at various stages in the session that can then be used. In my view, abort should be used to cancel expensive operations that may not be needed due to some change in state of the app or user input. If append() is called with thousands of tokens, it will take a non-trivial amount of time to process even if there is no queueing involved (tens of seconds on some machines). We need to have a way to cancel this operation if it is no longer needed; otherwise all other calls need to wait for it to finish before they begin processing. In addition, rewriting history is very inefficient in the backend. For example, consider:
If in this case we allow removing "some small text" from the session, this requires re-processing "some super huge text" before getting any output from the prompt() call. If we don't allow this behavior, it prevents users from accidentally shooting themselves in the foot performance-wise. If a user wants to do something similar to this, I would recommend the pattern:
Then it's explicit to the user that "some super huge text" must be reprocessed and may take a while. For these reasons, I think it makes the most practical sense to allow aborting until the append() promise resolves. For the original example:
I would say the end result of the session will depend on when abort() was called in reference to when the promise ended up resolving. So the code as given would not know exactly what the session would look like (either "13" if promise hadn't resolved or "123" if it had). Like I mentioned earlier, I think using clone() is the better way to handle this type of session management rather than using abort, as it makes it much more clear what the intent is. |
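The code for the recommended clone() pattern did not survive in this thread, so the following is only a guess at its shape: take a checkpoint before the small text, and to drop the small text later, go back to the checkpoint and append the huge text again explicitly. CheckpointSession and the processedChars counter are made up for illustration.

```javascript
// Toy session that counts how many characters it has had to process.
// A stand-in for illustration only, NOT the real API.
let processedChars = 0;

class CheckpointSession {
  constructor(history = []) {
    this.history = history;
  }

  async append(text) {
    processedChars += text.length; // processing cost grows with input size
    this.history.push(text);
  }

  async clone() {
    // Copies the committed state; nothing is reprocessed.
    return new CheckpointSession([...this.history]);
  }
}

async function demoCheckpoint() {
  const huge = "x".repeat(10000); // stands in for "some super huge text"
  let session = new CheckpointSession();
  const checkpoint = await session.clone(); // state before the small text

  await session.append("some small text");
  await session.append(huge);

  // Later: drop the small text by returning to the checkpoint. The huge
  // text is appended again, so the reprocessing cost is visible in the code.
  session = await checkpoint.clone();
  await session.append(huge);

  return { history: session.history, processedChars };
}
```

The point of the pattern is that the expensive second pass over the huge text is something the caller writes out, rather than a hidden side effect of abort().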
Thanks for the detailed comment. I guess you have convinced me to move from (1) to (2) immediately. It sounds like we definitely don't want to support (3), and (2) is strictly more useful than (1). So I will update #104. |
FYI, the current implementation in Chromium manages the session history from the API level, so Anyway, those are implementation details; at the spec level it's always good to support the abort functionality. |
The original shape of this API assumed we would always generate messages in the sequence of: user, assistant, user, assistant, ...
Various changes since then have complicated the situation:
initialPrompts, which includes multiple messages at once with no response
At this point, we are assuming a model architecture that allows an arbitrary sequence of user + assistant messages, in any order. In such cases, it might be useful to allow appending messages without immediately asking the model to generate a response. This allows different parts of the application to prepare the session by sending some messages, before another part of the application is finally ready to use them to generate a response.
There are two main API proposals for this: a new option to prompt(), vs. a new method.
I like the second option slightly more:
A no-response option does not make sense for promptStreaming(), so we'd only include a new option for prompt(). It's a bit strange to have options that apply to prompt() but not promptStreaming().
The option would likely be spelled { delayResponse: true }, instead of something more natural like { respond: false }.
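For comparison, here is a rough sketch of the two shapes on a toy session. The respond option is the hypothetical spelling mentioned above and append() is the separate-method proposal; ShapeDemoSession and its canned reply are illustrative only and not the real API.

```javascript
// Toy session exposing both proposed shapes; NOT the real API.
class ShapeDemoSession {
  constructor() {
    this.history = [];
  }

  // Proposal 2: a dedicated method that never generates a response.
  async append(message) {
    this.history.push(message);
  }

  // Proposal 1: an option on prompt() that suppresses the response.
  async prompt(message, { respond = true } = {}) {
    await this.append(message);
    if (!respond) return undefined;
    const reply = `reply #${this.history.length}`;
    this.history.push(reply);
    return reply;
  }
}

async function demoShapes() {
  const viaOption = new ShapeDemoSession();
  await viaOption.prompt("context", { respond: false });

  const viaMethod = new ShapeDemoSession();
  await viaMethod.append("context");

  return [viaOption.history, viaMethod.history];
}
```

Both calls leave the session in the same state; the difference is only in how the intent reads at the call site and in whether prompt() and promptStreaming() keep matching option sets.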