
Use cases for parallel requests #59


Closed
tomayac opened this issue Nov 12, 2024 · 6 comments

@tomayac (Contributor) commented Nov 12, 2024

The Prompt API currently doesn't support parallel requests, but there are use cases for this feature. For example, analyzing n (say, 10) post items in an RSS feed to see whether any of them covers a topic the user isn't interested in. While you could process the post items one by one, ideally you would process them in parallel, which raises the question of what an ideal maximum number m (say, 5) of concurrent requests would be. This issue is for collecting other use cases and examples.
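
For concreteness, here is a minimal sketch of such a bounded fan-out. The `ai.languageModel.create()` / `session.prompt()` shape is an assumption based on the explainer as it stood at the time (the entry points have since been renamed), `mapWithLimit` is a hand-rolled helper, and one session is created per in-flight request, on the assumption that parallelism comes from using multiple sessions (as the maintainer comment further down suggests).

```ts
// Sketch only: the global API shape is an assumption based on the explainer
// as of late 2024; the entry points have been renamed since.
declare const ai: {
  languageModel: {
    create(): Promise<{ prompt(input: string): Promise<string> }>;
  };
};

// Run task() over all items with at most `limit` tasks in flight.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  const worker = async () => {
    while (next < items.length) {
      const i = next++; // JS is single-threaded, so this is race-free
      results[i] = await task(items[i]);
    }
  };
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker),
  );
  return results;
}

// Classify n feed items with at most m = 5 concurrent prompts,
// using one session per in-flight request.
async function flagUninteresting(posts: string[], topic: string) {
  const answers = await mapWithLimit(posts, 5, async (post) => {
    const session = await ai.languageModel.create();
    return session.prompt(
      `Answer "yes" or "no": is this post about ${topic}?\n\n${post}`,
    );
  });
  return answers.map((a) => /yes/i.test(a));
}
```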

@jacoblee93

Big +1 on making this work. Anything that requires responsiveness and more than a single LLM call (map-reduce style summarization, agents, RAG over multiple sources) benefits massively from parallelization.

At the very least, the burden shouldn't be on the user to run things serially; the model should implement its own robust queuing strategy.

@kowalczyk-krzysztof

When it comes to developing extensions for Chrome (which I think will be the main use case for the Prompt API), any real-time content analysis is not going to work without parallelization (or a context-window increase, but I guess that's not something that can be done).

To give you a specific example, I was looking into building an extension that would leverage an existing tool to get an a11y report, feed the HTML elements with detected a11y violations, along with the corresponding violation descriptions, into the Prompt API to get the elements fixed, and then replace them in the DOM. The bottleneck was the lack of parallelization: being able to have 10 API calls running in parallel would be more than enough to make this work in real time.
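
A sketch of that pipeline, reusing `mapWithLimit` and the assumed `ai.languageModel` shape from the earlier sketch; `getViolations()` is a hypothetical helper standing in for the existing audit tool (e.g. axe-core):

```ts
// Hypothetical helper wrapping an existing a11y audit tool: returns each
// offending element plus a description of its violation.
type Violation = { element: Element; description: string };
declare function getViolations(root: Document): Promise<Violation[]>;

async function fixViolations(doc: Document): Promise<void> {
  const violations = await getViolations(doc);
  // Ten prompts in flight at a time, per the comment above.
  await mapWithLimit(violations, 10, async ({ element, description }) => {
    const session = await ai.languageModel.create();
    const fixed = await session.prompt(
      `This HTML has an accessibility violation: ${description}\n` +
        `Return only the corrected HTML, nothing else.\n\n${element.outerHTML}`,
    );
    element.outerHTML = fixed; // replace the element in the DOM
  });
}
```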

@alecf commented Nov 12, 2024

This is very common in other LLM use (see the name "chain" in LangChain).

There are heterogeneous activities that benefit from separate requests, and you want to show users results as they come in. For instance, for a single paragraph you might want to do all of these in parallel:

  1. extract a list of entity names
  2. create a 1 sentence summary
  3. do sentiment classification

and then present the results to the user as they arrive. In addition, you might do follow-up work based on, say, the sentiment classification (e.g. get a list of complaints from a negative review), and you wouldn't want to block on or await the other activities before beginning that task.
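
A sketch of that heterogeneous fan-out, under the same assumed `ai.languageModel` shape as above, with a hypothetical `render()` callback that updates the UI as each result resolves:

```ts
// Hypothetical UI hook: paints one result panel as soon as it is ready.
declare function render(kind: string, text: string): void;

async function analyzeParagraph(paragraph: string): Promise<void> {
  const tasks: Array<[kind: string, prompt: string]> = [
    ["entities", `Extract a list of entity names from:\n\n${paragraph}`],
    ["summary", `Write a one-sentence summary of:\n\n${paragraph}`],
    ["sentiment", `Classify the sentiment (positive/negative/neutral) of:\n\n${paragraph}`],
  ];
  await Promise.all(
    tasks.map(async ([kind, prompt]) => {
      const session = await ai.languageModel.create();
      const result = await session.prompt(prompt);
      render(kind, result); // no waiting on sibling tasks
      // Follow-up work can chain here without blocking the others, e.g.
      // on a negative sentiment, prompt for a list of complaints.
    }),
  );
}
```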

@domenic (Collaborator) commented Nov 13, 2024

The API described here supports parallel requests (e.g. via multiple sessions), so I think this is more of a Chromium implementation issue. Please file it at https://crbug.new :)

domenic closed this as completed Nov 13, 2024
@nilinswap

> here

Sorry, I did not get it. API described where?

@domenic (Collaborator) commented Nov 14, 2024
