Error on chat openai with litellm api for self hosted llm #1308


Open
Christhian16 opened this issue Apr 7, 2025 · 8 comments
Labels
bug Bugs reported by users

Comments

@Christhian16

Description

I'm currently using a self-hosted LLM through the LiteLLM API.
When I use the OpenAI library with "openai.OpenAI.completions.create" in a notebook it works, but when I try to configure the jupyter-ai chat extension (filling in api_url, api_key, and model_id) I get the following error:
"openai.InternalServerError: Error code: 500 - {'error': {'message': 'litellm.APIError: APIError: Hosted_vllmException - Internal Server Error\nReceived Model Group=google/gemma2-27b\nAvailable Model Group Fallbacks=None', 'type': None, 'param': None, 'code': '500'}}"

Reproduce

  1. Go to the configuration zone for the chat
  2. Choose "OpenAI (general interface)" as the completion model
  3. Fill in your model id ("google/gemma2-27b" in my case)
  4. Fill in the base api url
  5. Fill in the api keys
  6. Save changes
  7. Go back to the chat and send a message
  8. See the error "openai.InternalServerError: Error code: 500 - {'error': {'message': 'litellm.APIError: APIError: Hosted_vllmException - Internal Server Error\nReceived Model Group=google/gemma2-27b\nAvailable Model Group Fallbacks=None', 'type': None, 'param': None, 'code': '500'}}"
  9. Note that for model id "meta-llama/Llama-3.1-70B" I don't get this error; I get a good answer, but afterwards it reports another error we can discuss later: "openai.InternalServerError: Error code: 500 - {'error': {'message': 'litellm.APIError: APIError: Hosted_vllmException - Internal Server Error\nReceived Model Group=google/gemma2-27b\nAvailable Model Group Fallbacks=None', 'type': None, 'param': None, 'code': '500'}}"
  10. Note that using the jupyter_ai_magics like below works for me:
    %%ai openai-chat-custom:google/gemma2-27b
    Write a short poem

Expected behavior

I was expecting the chat to return an answer without an error

Context

  • Operating System and version: Ubuntu 22.04.2 LTS
  • JupyterLab version: 4.1.6
  • Python version: 3.10
  • Jupyter_ai version: 2.31.1
@Christhian16 Christhian16 added the bug Bugs reported by users label Apr 7, 2025
@dlqqq
Member

dlqqq commented Apr 7, 2025

@Christhian16 Thanks for reporting this issue. From the error logs you provided, it doesn't seem like this is an issue with Jupyter AI. The error logs suggest that this is an issue with the LiteLLM server that you are connecting to. Specifically, meta-llama/Llama-3.1-70B is available on that server, but google/gemma2-27b is not.

To remedy this, I recommend looking through the LiteLLM docs to see if google/gemma2-27b is available. They share a list of model providers here: https://docs.litellm.ai/docs/providers. If they do not support google/gemma2-27b, then this is an issue with LiteLLM, not Jupyter AI.
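If it helps, a quick check (a sketch, assuming your LiteLLM proxy exposes the usual OpenAI-compatible /v1/models route; the URL and key are placeholders) is to list the model groups the server actually registers:

    # Sketch: list the models the LiteLLM proxy reports. google/gemma2-27b should
    # appear here if the server actually serves that model group.
    import openai

    client = openai.OpenAI(
        base_url="http://localhost:4000/v1",  # assumed proxy address
        api_key="sk-your-litellm-key",        # assumed key
    )
    for model in client.models.list():
        print(model.id)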

Hope this helps! Let me know if you have other questions.

@Christhian16
Author

Christhian16 commented Apr 8, 2025

@dlqqq Thank you for your quick response.
As I mentioned in my first message, my model is installed correctly: when I launch it through LiteLLM I can use it like a standard LLM API, and it also works with the jupyter_ai_magics that you developed, but I get an error in the chat:

  9. Note that for model id "meta-llama/Llama-3.1-70B" I don't get this error; I get a good answer, but afterwards it reports another error we can discuss later: "openai.InternalServerError: Error code: 500 - {'error': {'message': 'litellm.APIError: APIError: Hosted_vllmException - Internal Server Error\nReceived Model Group=google/gemma2-27b\nAvailable Model Group Fallbacks=None', 'type': None, 'param': None, 'code': '500'}}"
  10. Note that using the jupyter_ai_magics like below works for me:
    %%ai openai-chat-custom:google/gemma2-27b
    Write a short poem

See the full error below

Traceback (most recent call last):
File "/opt/conda/lib/python3.12/site-packages/jupyter_ai/chat_handlers/base.py", line 229, in on_message
await self.process_message(message)
File "/opt/conda/lib/python3.12/site-packages/jupyter_ai/chat_handlers/default.py", line 72, in process_message
await self.stream_reply(inputs, message)
File "/opt/conda/lib/python3.12/site-packages/jupyter_ai/chat_handlers/base.py", line 567, in stream_reply
async for chunk in chunk_generator:
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 5630, in astream
async for item in self.bound.astream(
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 5630, in astream
async for item in self.bound.astream(
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 3467, in astream
async for chunk in self.atransform(input_aiter(), config, **kwargs):
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 3449, in atransform
async for chunk in self._atransform_stream_with_config(
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 2319, in _atransform_stream_with_config
chunk: Output = await asyncio.create_task( # type: ignore[call-arg]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 3416, in _atransform
async for output in final_pipeline:
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 5669, in atransform
async for item in self.bound.atransform(
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 5012, in atransform
async for output in self._atransform_stream_with_config(
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 2319, in _atransform_stream_with_config
chunk: Output = await asyncio.create_task( # type: ignore[call-arg]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 4992, in _atransform
async for chunk in output.astream(
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 5630, in astream
async for item in self.bound.astream(
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 3467, in astream
async for chunk in self.atransform(input_aiter(), config, **kwargs):
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 3449, in atransform
async for chunk in self._atransform_stream_with_config(
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 2319, in _atransform_stream_with_config
chunk: Output = await asyncio.create_task( # type: ignore[call-arg]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 3416, in _atransform
async for output in final_pipeline:
File "/opt/conda/lib/python3.12/site-packages/langchain_core/output_parsers/transform.py", line 87, in atransform
async for chunk in self._atransform_stream_with_config(
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 2278, in _atransform_stream_with_config
final_input: Optional[Input] = await py_anext(input_for_tracing, None)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/langchain_core/utils/aiter.py", line 78, in anext_impl
return await anext(iterator)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/langchain_core/utils/aiter.py", line 128, in tee_peer
item = await iterator.anext()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/langchain_core/runnables/base.py", line 1478, in atransform
async for output in self.astream(final, config, **kwargs):
File "/opt/conda/lib/python3.12/site-packages/langchain_core/language_models/chat_models.py", line 540, in astream
async for chunk in self._astream(
File "/opt/conda/lib/python3.12/site-packages/langchain_openai/chat_models/base.py", line 2376, in _astream
async for chunk in super()._astream(*args, **kwargs):
File "/opt/conda/lib/python3.12/site-packages/langchain_openai/chat_models/base.py", line 1069, in _astream
response = await self.async_client.create(**payload)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/openai/resources/chat/completions/completions.py", line 2000, in create
return await self._post(
^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/openai/_base_client.py", line 1767, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/openai/_base_client.py", line 1461, in request
return await self._request(
^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/openai/_base_client.py", line 1547, in _request
return await self._retry_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/openai/_base_client.py", line 1594, in _retry_request
return await self._request(
^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/openai/_base_client.py", line 1547, in _request
return await self._retry_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/openai/_base_client.py", line 1594, in _retry_request
return await self._request(
^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/openai/_base_client.py", line 1562, in _request
raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500 - {'error': {'message': 'litellm.APIError: APIError: Hosted_vllmException - Internal Server Error\nReceived Model Group=google/gemma2-27b\nAvailable Model Group Fallbacks=None', 'type': None, 'param': None, 'code': '500'}}

@Christhian16
Author

Hello @dlqqq,

It only works with "meta-llama/Llama-3.1-70B" through the LiteLLM API. Now I get this error.

See below:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/chat_handlers/base.py", line 229, in on_message
await self.process_message(message)
File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/chat_handlers/default.py", line 72, in process_message
await self.stream_reply(inputs, message)
File "/opt/conda/lib/python3.10/site-packages/jupyter_ai/chat_handlers/base.py", line 567, in stream_reply
async for chunk in chunk_generator:
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 5657, in astream
async for item in self.bound.astream(
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 5657, in astream
async for item in self.bound.astream(
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 3481, in astream
async for chunk in self.atransform(input_aiter(), config, **kwargs):
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 3463, in atransform
async for chunk in self._atransform_stream_with_config(
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 2331, in _atransform_stream_with_config
chunk = cast("Output", await py_anext(iterator))
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 3430, in _atransform
async for output in final_pipeline:
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 5696, in atransform
async for item in self.bound.atransform(
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 5039, in atransform
async for output in self._atransform_stream_with_config(
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 2331, in _atransform_stream_with_config
chunk = cast("Output", await py_anext(iterator))
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 5019, in _atransform
async for chunk in output.astream(
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 5657, in astream
async for item in self.bound.astream(
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 3481, in astream
async for chunk in self.atransform(input_aiter(), config, **kwargs):
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 3463, in atransform
async for chunk in self._atransform_stream_with_config(
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 2331, in _atransform_stream_with_config
chunk = cast("Output", await py_anext(iterator))
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 3430, in _atransform
async for output in final_pipeline:
File "/opt/conda/lib/python3.10/site-packages/langchain_core/output_parsers/transform.py", line 87, in atransform
async for chunk in self._atransform_stream_with_config(
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 2331, in _atransform_stream_with_config
chunk = cast("Output", await py_anext(iterator))
File "/opt/conda/lib/python3.10/site-packages/langchain_core/output_parsers/transform.py", line 41, in _atransform
async for chunk in input:
File "/opt/conda/lib/python3.10/site-packages/langchain_core/utils/aiter.py", line 128, in tee_peer
item = await iterator.anext()
File "/opt/conda/lib/python3.10/site-packages/langchain_core/runnables/base.py", line 1481, in atransform
async for output in self.astream(final, config, **kwargs):
File "/opt/conda/lib/python3.10/site-packages/langchain_core/language_models/chat_models.py", line 519, in astream
async for chunk in self._astream(
File "/opt/conda/lib/python3.10/site-packages/langchain_openai/chat_models/base.py", line 2376, in _astream
async for chunk in super()._astream(*args, **kwargs):
File "/opt/conda/lib/python3.10/site-packages/langchain_openai/chat_models/base.py", line 1074, in _astream
async for chunk in response:
File "/opt/conda/lib/python3.10/site-packages/openai/_streaming.py", line 147, in aiter
async for item in self._iterator:
File "/opt/conda/lib/python3.10/site-packages/openai/_streaming.py", line 174, in stream
raise APIError(
openai.APIError: litellm.APIError: Error building chunks for logging/streaming usage calculation

Thank you in advance for your response.

Kind regards

@srdas
Collaborator

srdas commented Apr 14, 2025

@Christhian16 @dlqqq Let me look into it this week. I am working on another PR and should be able to rotate to this one after that.

@Christhian16
Author

@srdas I'd like to follow up on my request regarding this bug.

@srdas
Collaborator

srdas commented Apr 23, 2025

@Christhian16 Can you briefly explain how you have deployed your self-hosted LiteLLM server (OS, etc., or a link to the instructions you are using)? I'll try to replicate it, as I currently do not have a working local LiteLLM installation.

@Christhian16
Author

@srdas
Please find the information below:

Thank you in advance for your help with this.

@srdas
Collaborator

srdas commented Apr 25, 2025

I have been trying to install LiteLLM using Docker on an old Intel machine running Ubuntu, but the BIOS does not support it. However, I am able to install a simple proxy server after installing LiteLLM, i.e., pip install 'litellm[proxy]'. This works fine when started for the ollama/llama3.2 model:

[Screenshot: LiteLLM proxy started for the ollama/llama3.2 model]

After running a local Ollama instance, LiteLLM is able to access the model.

[Screenshot: LiteLLM accessing the local Ollama model]

The setup uses the LiteLLM port 4000 as required; note the settings below:

[Screenshot: settings pointing at the LiteLLM proxy on port 4000]

I retested this on a Mac and it works the same. Other models, such as OpenAI models, are also accessed with no error through port 4000. I suspect the litellm.APIError is on the server side, but to test, could you first see if it works without the Docker install, using the proxy as above? Is the same thing happening, i.e., it works for the llama3.1 model but not for the gemma2 one? I tried it and it works for gemma also:

[Screenshots: gemma2 also working through the proxy]

Note the OK response from the LiteLLM proxy server:

[Screenshot: OK response logged by the LiteLLM proxy server]
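Since the second traceback fails while streaming ("Error building chunks for logging/streaming usage calculation"), another useful isolation test is a direct streaming call against the proxy, outside Jupyter AI. A minimal sketch, with the proxy address, key, and model name as placeholders:

    # Sketch: stream a chat completion straight from the LiteLLM proxy.
    # If this raises the same "Error building chunks" APIError, the problem is
    # in the LiteLLM/vLLM stack rather than in Jupyter AI.
    import openai

    client = openai.OpenAI(
        base_url="http://localhost:4000/v1",  # assumed proxy address
        api_key="sk-your-litellm-key",        # assumed key
    )
    stream = client.chat.completions.create(
        model="google/gemma2-27b",
        messages=[{"role": "user", "content": "Write a short poem"}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)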
