Use cases for parallel requests #59
Comments
Big +1 on making this work. Anything that requires responsiveness and more than a single LLM call (map-reduce-style summarization, agents, RAG over multiple sources) benefits massively from parallelization. At the very least, the burden of running things serially shouldn't fall on the user; the model should implement its own robust queuing strategy.
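For context, a minimal sketch of the serial workaround this comment argues users shouldn't have to write: a tiny FIFO queue that chains prompts one after another. It assumes the explainer's `LanguageModel.create()` / `session.prompt()` shape (which has varied across drafts); `chunks` is a hypothetical input array.

```js
// Client-side serializer: each prompt waits for the previous one to settle.
let queue = Promise.resolve();

function promptSerially(session, text) {
  const result = queue.then(() => session.prompt(text));
  queue = result.catch(() => {}); // keep the chain alive after a rejection
  return result;
}

const chunks = ["First part of the document…", "Second part…"]; // example input
const session = await LanguageModel.create();
const summaries = await Promise.all(
  chunks.map((chunk) => promptSerially(session, `Summarize: ${chunk}`))
);
console.log(summaries);
```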
When it comes to developing extensions for Chrome (which I think will be the main use case for the Prompt API), any real-time content analysis is not gonna work without parallelization (or a context window increase, but I guess that's not something that can be done). To give you a specific example, I was looking into building an extension that would leverage an existing tool to get an a11y report, feed the HTML elements with detected a11y violations, along with the corresponding violation descriptions, into the Prompt API to get the elements modified, and then replace them in the DOM. The bottleneck was the lack of parallelization.
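A rough sketch of what the parallel step of that pipeline could look like, under the same assumed `LanguageModel` API shape. `violations` stands in for the a11y tool's output (element + description pairs) and is hypothetical; `Promise.allSettled` keeps one failed fix from sinking the whole batch.

```js
// One prompt per detected violation, all issued concurrently.
const results = await Promise.allSettled(
  violations.map(async ({ element, description }) => {
    // One session per request, since parallelism within a single session
    // is exactly the open question in this thread.
    const session = await LanguageModel.create();
    const fixedHtml = await session.prompt(
      `Fix this accessibility violation and return only the corrected HTML.\n` +
        `Violation: ${description}\nHTML: ${element.outerHTML}`
    );
    session.destroy();
    return { element, fixedHtml };
  })
);

for (const r of results) {
  if (r.status === "fulfilled") {
    r.value.element.outerHTML = r.value.fixedHtml; // replace in the DOM
  }
}
```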
This is very common in other LLM use (see the name "chain" in LangChain). There are heterogeneous activities you might want to perform that benefit from separate requests, and you want to show users results as they come in. For instance, for a single paragraph you might want to run several tasks in parallel, such as summarization and sentiment classification, and then present the results to the user as they arrive. In addition, you might do further work based on, say, the sentiment classification (e.g., extract a list of complaints from a negative review), and you wouldn't want to block on or await the other activities before beginning that task.
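A sketch of that fan-out pattern: fire each task independently and update the UI as each result lands, so nothing waits for the slowest task. `paragraph` and `render` are hypothetical stand-ins, and one session per task follows the "multiple sessions" suggestion in the reply below.

```js
const tasks = {
  summary: "Summarize this paragraph:",
  sentiment: "Classify the sentiment of this paragraph:",
};

for (const [name, instruction] of Object.entries(tasks)) {
  (async () => {
    const session = await LanguageModel.create();
    try {
      // Surface each result as soon as it arrives; no joint await.
      render(name, await session.prompt(`${instruction}\n\n${paragraph}`));
    } catch (err) {
      render(name, `failed: ${err.message}`);
    } finally {
      session.destroy();
    }
  })();
}
```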
The API described here supports parallel requests (e.g. via multiple sessions), so I think this is more of a Chromium implementation issue. Please file it at https://crbug.new :)
Sorry, I didn't get that. The API described where?
The Prompt API currently doesn't support parallel requests, but there are use cases for this feature: for example, analyzing `n` (say, 10) post items in an RSS feed to see if any of them is about a given topic that the user isn't interested in. While you could process the post items one by one, ideally you would process them in parallel, which raises the question of what an ideal maximum number `m` (say, 5) of concurrent requests would be. This issue is for collecting other use cases or examples.