do_picture_description doesn't work with local HuggingFaceTB--SmolVLM-256M-Instruct

Hi @dolfim-ibm, I'm trying to use `do_picture_description` with a locally downloaded model (`HuggingFaceTB--SmolVLM-256M-Instruct`) via the `/v1alpha/convert/file/async` endpoint.

I'm passing the parameter as: 

```
"do_picture_description": true,
"picture_description_local": {
  "repo_id": "/docling-models/HuggingFaceTB--SmolVLM-256M-Instruct"
}
```

However, I consistently receive a 422 error:

`Input should be a valid dictionary or object to extract fields from`
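For anyone trying to reproduce this, here is a minimal sketch of how such a request could be assembled from Python. It rests on several assumptions that are not confirmed above: the server listens on `http://localhost:5001`, the upload field is named `files`, and nested options are sent as JSON strings in form fields. The helpers `build_form_fields` and `submit` are hypothetical names used only for illustration. Whether the server accepts a JSON string for `picture_description_local` is exactly the open question, so this is a reproduction aid rather than a fix.

```python
import json


def build_form_fields(options: dict) -> dict:
    """Flatten options into string form fields (hypothetical helper).

    Multipart form fields are plain strings, so nested objects are
    serialized with json.dumps. Assumption: the server parses them back.
    """
    fields = {}
    for key, value in options.items():
        if isinstance(value, bool):
            fields[key] = "true" if value else "false"
        elif isinstance(value, (dict, list)):
            fields[key] = json.dumps(value)
        else:
            fields[key] = str(value)
    return fields


options = {
    "do_picture_description": True,
    "picture_description_local": {
        "repo_id": "/docling-models/HuggingFaceTB--SmolVLM-256M-Instruct"
    },
}
form_fields = build_form_fields(options)


def submit(pdf_path: str, base_url: str = "http://localhost:5001"):
    """Send the file and options to the async endpoint (not called here).

    Assumptions: base_url and the "files" field name are guesses;
    requires the third-party `requests` package.
    """
    import requests

    with open(pdf_path, "rb") as fh:
        response = requests.post(
            f"{base_url}/v1alpha/convert/file/async",
            files={"files": (pdf_path, fh, "application/pdf")},
            data=form_fields,
            timeout=60,
        )
    return response.status_code, response.text
```

Printing `form_fields` before sending shows that `picture_description_local` travels as a string, which may be relevant to an error that asks for a dictionary or object.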

Before calling the endpoint, I downloaded all required models using `docling-tools models download -o /docling-models`, and I’m launching the API with:

`docling-serve run --artifacts-path /docling-models`

Here are the model folders I have under `/docling-models`:

![Image](https://github.com/user-attachments/assets/126d1e13-913c-48f3-97b1-eebe26944310)
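Since the folder listing above is only a screenshot, a small sketch like the following could confirm the same thing in text form. It only checks that a folder with the expected name exists under the artifacts path. It says nothing about how the server resolves `repo_id`, and the helper names are hypothetical.

```python
from pathlib import Path


def list_model_dirs(artifacts_path: str) -> list:
    """Return the sorted names of the subfolders under artifacts_path."""
    root = Path(artifacts_path)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def has_model(artifacts_path: str, folder_name: str) -> bool:
    """Check whether a model folder with the given name is present."""
    return folder_name in list_model_dirs(artifacts_path)


if __name__ == "__main__":
    # Paths taken from the issue text above.
    print(list_model_dirs("/docling-models"))
    print(has_model("/docling-models", "HuggingFaceTB--SmolVLM-256M-Instruct"))
```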

I saw in another [issue](https://github.com/docling-project/docling-serve/issues/196#issuecomment-2904856033) that `do_picture_description` might not yet be fully supported in `/v1alpha/convert/file/async`.

Could you please confirm:

1. Whether `do_picture_description` is supported for this endpoint?
2. Whether using a local model for `picture_description_local` should work in this context?
3. If not, is there an alternative endpoint or method recommended for image captioning with local models?

Thanks in advance for the clarification!
