Error on chat openai with litellm api for self hosted llm #1308


Open
Christhian16 opened this issue Apr 7, 2025 · 8 comments
Labels
bug Bugs reported by users

Comments

@Christhian16

Description

I'm currently using a self-hosted LLM through the LiteLLM API.
When I use the OpenAI library with "openai.OpenAI.completions.create" in a notebook it works, but when I try to configure the jupyter-ai chat extension (filling in api_url, api_key, and model_id) I get the following error:
"openai.InternalServerError: Error code: 500 - {'error': {'message': 'litellm.APIError: APIError: Hosted_vllmException - Internal Server Error\nReceived Model Group=google/gemma2-27b\nAvailable Model Group Fallbacks=None', 'type': None, 'param': None, 'code': '500'}}"

Reproduce

  1. Go to the configuration zone for the chat
  2. Choose "OpenAI (general interface)" as the completion model
  3. Fill in your model id ("google/gemma2-27b" in my case)
  4. Fill in the base api url
  5. Fill in the api keys
  6. Save changes
  7. Go back to the chat and send a message
  8. See the error "openai.InternalServerError: Error code: 500 - {'error': {'message': 'litellm.APIError: APIError: Hosted_vllmException - Internal Server Error\nReceived Model Group=google/gemma2-27b\nAvailable Model Group Fallbacks=None', 'type': None, 'param': None, 'code': '500'}}"
  9. Note that for model id "meta-llama/Llama-3.1-70B" I don't get this error; I get a good answer, but afterwards it reports another error we can discuss later: "openai.InternalServerError: Error code: 500 - {'error': {'message': 'litellm.APIError: APIError: Hosted_vllmException - Internal Server Error\nReceived Model Group=google/gemma2-27b\nAvailable Model Group Fallbacks=None', 'type': None, 'param': None, 'code': '500'}}"
  10. Note that using the jupyter_ai_magics like below works for me:
    %%ai openai-chat-custom:google/gemma2-27b
    Write a short poem

Expected behavior

I was expecting the chat to return an answer without an error

Context

  • Operating System and version: Ubuntu 22.04.2 LTS
  • JupyterLab version: 4.1.6
  • Python version: 3.10
  • Jupyter_ai version: 2.31.1
@Christhian16 Christhian16 added the bug Bugs reported by users label Apr 7, 2025
@dlqqq
Member

dlqqq commented Apr 7, 2025

@Christhian16 Thanks for reporting this issue. From the error logs you provided, it doesn't seem like this is an issue with Jupyter AI. The error logs suggest that this is an issue with the LiteLLM server that you are connecting to. Specifically, meta-llama/Llama-3.1-70B is available on that server, but google/gemma2-27b is not.

To remedy this, I recommend looking through the LiteLLM docs to see if google/gemma2-27b is available. They share a list of model providers here: https://docs.litellm.ai/docs/providers. If they do not support google/gemma2-27b, then this is an issue with LiteLLM, not Jupyter AI.
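If it helps, a quick check (a sketch, assuming your LiteLLM proxy exposes the usual OpenAI-compatible /v1/models route; the URL and key are placeholders) is to list the model groups the server actually registers:

    # Sketch: list the models the LiteLLM proxy reports. google/gemma2-27b should
    # appear here if the server actually serves that model group.
    import openai

    client = openai.OpenAI(
        base_url="http://localhost:4000/v1",  # assumed proxy address
        api_key="sk-your-litellm-key",        # assumed key
    )
    for model in client.models.list():
        print(model.id)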

Hope this helps! Let me know if you have other questions.

@Christhian16
Author

Christhian16 commented Apr 8, 2025

@dlqqq Thank you for your quick response.
As I mentioned in my first message, my model is installed correctly: when I launch it through LiteLLM I can use it like a standard LLM API, and it also works with the jupyter_ai_magics that you developed, but I get an error in the chat:

  9. Note that for model id "meta-llama/Llama-3.1-70B" I don't get this error; I get a good answer, but afterwards it reports another error we can discuss later: "openai.InternalServerError: Error code: 500 - {'error': {'message': 'litellm.APIError: APIError: Hosted_vllmException - Internal Server Error\nReceived Model Group=google/gemma2-27b\nAvailable Model Group Fallbacks=None', 'type': None, 'param': None, 'code': '500'}}"
  10. Note that using the jupyter_ai_magics like below works for me:
    %%ai openai-chat-custom:google/gemma2-27b
    Write a short poem

See the full error below

Traceback (most recent call last):
File "/opt/conda/lib/python3.12/site-packages/jupyter_ai/chat_handlers/base.py", line 229, in on_message
await self.process_message(message)
File "/opt/conda/lib/python3.12/site-packages/jupyter_ai/chat_handlers/default.py", line 72, in process_message
await self.stream_reply(inputs, message)
File "/opt/conda/lib/python3.12/site-packages/jupyter_ai/chat_handlers/base.py", line 567, in stream_reply
async for chunk in chunk_generator:
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 5630, in astream
async for item in self.bound.astream(
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 5630, in astream
async for item in self.bound.astream(
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 3467, in astream
async for chunk in self.atransform(input_aiter(), config, **kwargs):
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 3449, in atransform
async for chunk in self._atransform_stream_with_config(
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 2319, in _atransform_stream_with_config
chunk: Output = await asyncio.create_task( # type: ignore[call-arg]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 3416, in _atransform
async for output in final_pipeline:
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 5669, in atransform
async for item in self.bound.atransform(
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 5012, in atransform
async for output in self._atransform_stream_with_config(
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 2319, in _atransform_stream_with_config
chunk: Output = await asyncio.create_task( # type: ignore[call-arg]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 4992, in _atransform
async for chunk in output.astream(
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 5630, in astream
async for item in self.bound.astream(
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 3467, in astream
async for chunk in self.atransform(input_aiter(), config, **kwargs):
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 3449, in atransform
async for chunk in self._atransform_stream_with_config(
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 2319, in _atransform_stream_with_config
chunk: Output = await asyncio.create_task( # type: ignore[call-arg]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 3416, in _atransform
async for output in final_pipeline:
File "/opt/conda/lib/python3.12/site-packages/langchain_core/output_parsers/transform.py", line 87, in atransform
async for chunk in self._atransform_stream_with_config(
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 2278, in _atransform_stream_with_config
final_input: Optional[Input] = await py_anext(input_for_tracing, None)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/langchain_core/utils/aiter.py", line 78, in anext_impl
return await anext(iterator)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/langchain_core/utils/aiter.py", line 128, in tee_peer
item = await iterator.anext()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 1478, in atransform
async for output in self.astream(final, config, **kwargs):
File "/opt/conda/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 540, in astream
async for chunk in self._astream(
File "/opt/conda/lib/python3.12/site-packages/langchain_openai/chat_models/base.py", line 2376, in _astream
async for chunk in super()._astream(*args, **kwargs):
File "/opt/conda/lib/python3.12/site-packages/langchain_openai/chat_models/base.py", line 1069, in _astream
response = await self.async_client.create(**payload)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/openai/resources/chat/completions/completions.py", line 2000, in create
return await self._post(
^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/openai/_base_client.py", line 1767, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/openai/_base_client.py", line 1461, in request
return await self._request(
^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/openai/_base_client.py", line 1547, in _request
return await self._retry_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/openai/_base_client.py", line 1594, in _retry_request
return await self._request(
^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/openai/_base_client.py", line 1547, in _request
return await self._retry_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/openai/_base_client.py", line 1594, in _retry_request
return await self._request(
^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/openai/_base_client.py", line 1562, in _request
raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500 - {'error': {'message': 'litellm.APIError: APIError: Hosted_vllmException - Internal Server Error\nReceived Model Group=google/gemma2-27b\nAvailable Model Group Fallbacks=None', 'type': None, 'param': None, 'code': '500'}}

@Christhian16
Author

Hello @dlqqq,

It only works with "meta-llama/Llama-3.1-70B" through the LiteLLM API. Now I get this error.

See below:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/chat_handlers/base.py", line 229, in on_message
await self.process_message(message)
File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/chat_handlers/default.py", line 72, in process_message
await self.stream_reply(inputs, message)
File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/chat_handlers/base.py", line 567, in stream_reply
async for chunk in chunk_generator:
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 5657, in astream
async for item in self.bound.astream(
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 5657, in astream
async for item in self.bound.astream(
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 3481, in astream
async for chunk in self.atransform(input_aiter(), config, **kwargs):
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 3463, in atransform
async for chunk in self._atransform_stream_with_config(
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 2331, in _atransform_stream_with_config
chunk = cast("Output", await py_anext(iterator))
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 3430, in _atransform
async for output in final_pipeline:
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 5696, in atransform
async for item in self.bound.atransform(
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 5039, in atransform
async for output in self._atransform_stream_with_config(
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 2331, in _atransform_stream_with_config
chunk = cast("Output", await py_anext(iterator))
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 5019, in _atransform
async for chunk in output.astream(
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 5657, in astream
async for item in self.bound.astream(
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 3481, in astream
async for chunk in self.atransform(input_aiter(), config, **kwargs):
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 3463, in atransform
async for chunk in self._atransform_stream_with_config(
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 2331, in _atransform_stream_with_config
chunk = cast("Output", await py_anext(iterator))
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 3430, in _atransform
async for output in final_pipeline:
File "/opt/conda/lib/python3.10/site-packages/langchain_core/output_parsers/transform.py", line 87, in atransform
async for chunk in self._atransform_stream_with_config(
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 2331, in _atransform_stream_with_config
chunk = cast("Output", await py_anext(iterator))
File "/opt/conda/lib/python3.10/site-packages/langchain_core/output_parsers/transform.py", line 41, in _atransform
async for chunk in input:
File "/opt/conda/lib/python3.10/site-packages/langchain_core/utils/aiter.py", line 128, in tee_peer
item = await iterator.anext()
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 1481, in atransform
async for output in self.astream(final, config, **kwargs):
File "/opt/conda/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 519, in astream
async for chunk in self._astream(
File "/opt/conda/lib/python3.10/site-packages/langchain_openai/chat_models/base.py", line 2376, in _astream
async for chunk in super()._astream(*args, **kwargs):
File "/opt/conda/lib/python3.10/site-packages/langchain_openai/chat_models/base.py", line 1074, in _astream
async for chunk in response:
File "/opt/conda/lib/python3.10/site-packages/openai/_streaming.py", line 147, in aiter
async for item in self._iterator:
File "/opt/conda/lib/python3.10/site-packages/openai/_streaming.py", line 174, in stream
raise APIError(
openai.APIError: litellm.APIError: Error building chunks for logging/streaming usage calculation

Thank you in advance for your response.

Kind regards

@srdas
Collaborator

srdas commented Apr 14, 2025

@Christhian16 @dlqqq Let me look into it this week. I am working on another PR and should be able to rotate to this one after that.

@Christhian16
Author

@srdas I'd like to follow up on my request regarding this bug.

@srdas
Collaborator

srdas commented Apr 23, 2025

@Christhian16 Can you briefly explain how you have deployed your self-hosted LiteLLM server (OS, etc., or a link to the instructions you are using)? I'll try to replicate it, as I currently do not have a working local LiteLLM installation.

@Christhian16
Author

@srdas
Please find the information below:

Thank you in advance for your help with this.

@srdas
Collaborator

srdas commented Apr 25, 2025

I have been trying to install LiteLLM using Docker on an old Intel machine running Ubuntu, but the BIOS does not support it. However, I am able to install a simple proxy server after installing LiteLLM, i.e., pip install 'litellm[proxy]'. This works fine when started for the ollama/llama3.2 model:

[Screenshot: LiteLLM proxy started for the ollama/llama3.2 model]

After running a local Ollama instance, LiteLLM is able to access the model.

[Screenshot: LiteLLM accessing the local Ollama model]

The setup uses the LiteLLM port 4000 as required; note the settings below:

[Screenshot: settings pointing at the LiteLLM proxy on port 4000]

I retested this on a Mac and it works the same. Other models, such as OpenAI models, are also accessed with no error through port 4000. I suspect the litellm.APIError is on the server side, but to test, could you first see if it works without the Docker install, using the proxy as above? Is the same thing happening, i.e., it works for the llama3.1 model but not for the gemma2 one? I tried it and it works for gemma also:

[Screenshots: gemma2 also working through the proxy]

Note the OK response from the LiteLLM proxy server:

[Screenshot: OK response logged by the LiteLLM proxy server]
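Since the second traceback fails while streaming ("Error building chunks for logging/streaming usage calculation"), another useful isolation test is a direct streaming call against the proxy, outside Jupyter AI. A minimal sketch, with the proxy address, key, and model name as placeholders:

    # Sketch: stream a chat completion straight from the LiteLLM proxy.
    # If this raises the same "Error building chunks" APIError, the problem is
    # in the LiteLLM/vLLM stack rather than in Jupyter AI.
    import openai

    client = openai.OpenAI(
        base_url="http://localhost:4000/v1",  # assumed proxy address
        api_key="sk-your-litellm-key",        # assumed key
    )
    stream = client.chat.completions.create(
        model="google/gemma2-27b",
        messages=[{"role": "user", "content": "Write a short poem"}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)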
