Description
Hi @dolfim-ibm, I'm trying to use do_picture_description with a locally downloaded model (HuggingFaceTB--SmolVLM-256M-Instruct) via the /v1alpha/convert/file/async endpoint.
I'm passing the parameter as:
"do_picture_description": true,
"picture_description_local": {
"repo_id": "/docling-models/HuggingFaceTB--SmolVLM-256M-Instruct"
}
However, I consistently receive a 422 error:
Input should be a valid dictionary or object to extract fields from
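For reference, here is a minimal sketch of how the form fields for such a request could be assembled in Python. It rests on an assumption I have not confirmed against docling-serve: since multipart form fields are flat strings, the nested picture_description_local object is serialized to a JSON string before sending. The helper name build_form_fields is hypothetical and only for illustration.

```python
import json


def build_form_fields(repo_id: str) -> dict[str, str]:
    """Build flat form fields for a conversion request.

    Assumption (not confirmed for docling-serve): nested options are
    passed as a JSON-encoded string rather than as a raw object.
    """
    return {
        "do_picture_description": "true",
        # Serialize the nested object so it fits in a single form field.
        "picture_description_local": json.dumps({"repo_id": repo_id}),
    }


fields = build_form_fields("/docling-models/HuggingFaceTB--SmolVLM-256M-Instruct")
print(fields)
```

These fields would then be sent as form data alongside the uploaded file. If the server still answers with the 422 above, the field encoding is probably not the cause.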
Before calling the endpoint, I downloaded all required models using docling-tools models download -o /docling-models, and I'm launching the API with:
docling-serve run --artifacts-path /docling-models
Here are the model folders I have under /docling-models:
I saw in another issue that do_picture_description might not be fully supported yet in /v1alpha/convert/file/async.
Could you please confirm:
Whether do_picture_description is supported for this endpoint?
Whether a local model passed through picture_description_local should work in this context?
If not, is there an alternative endpoint or method you recommend for image captioning with local models?
Thanks in advance for the clarification!