Description
I pulled the docling-serve-cu124 image, deployed it to an Azure Container App with GPU enabled, and tried to convert a file using the /v1alpha/convert/source endpoint.
But I got a 404 error with this response body:
{
"detail": "Task result not found. Please wait for a completion status."
}
To get more error information, I cloned the repo, added some debug messages, built another Docker image, and used it for the container app.
These are the errors I got:
--- Processing url endpoint...
Sources being passed to convert_documents: ['https://arxiv.org/pdf/2408.09869']
ERROR:docling_serve.engines.async_local.worker:Worker 1 failed to process job f9891b2b-56b0-407f-880c-e7a84205abb3: 500: [digital envelope routines] unsupported
ERROR:docling_serve.engines.async_local.worker:Traceback (most recent call last):
File "/opt/app-root/src/docling_serve/response_preparation.py", line 136, in process_results
conv_results = list(conv_results)
^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.12/site-packages/docling/document_converter.py", line 243, in convert_all
for conv_res in conv_res_iter:
^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.12/site-packages/docling/document_converter.py", line 265, in _convert
for input_batch in chunkify(
^^^^^^^^^
File "/opt/app-root/lib64/python3.12/site-packages/docling/utils/utils.py", line 15, in chunkify
for first in iterator: # Take the first element from the iterator
^^^^^^^^
File "/opt/app-root/lib64/python3.12/site-packages/docling/datamodel/document.py", line 264, in docs
yield InputDocument(
^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.12/site-packages/docling/datamodel/document.py", line 147, in __init__
self._init_doc(backend, path_or_stream)
File "/opt/app-root/lib64/python3.12/site-packages/docling/datamodel/document.py", line 183, in _init_doc
self._backend = backend(self, path_or_stream=path_or_stream)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.12/site-packages/docling/backend/docling_parse_v4_backend.py", line 152, in __init__
self.dp_doc: PdfDocument = self.parser.load(path_or_stream=self.path_or_stream)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.12/site-packages/docling_parse/pdf_parser.py", line 458, in load
hasher = hashlib.md5()
^^^^^^^^^^^^^
_hashlib.UnsupportedDigestmodError: [digital envelope routines] unsupported
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/app-root/src/docling_serve/engines/async_local/worker.py", line 103, in loop
response = await asyncio.to_thread(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.12/asyncio/threads.py", line 25, in to_thread
return await loop.run_in_executor(None, func_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.12/concurrent/futures/thread.py", line 59, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/src/docling_serve/engines/async_local/worker.py", line 74, in run_conversion
response = process_results(
^^^^^^^^^^^^^^^^
File "/opt/app-root/src/docling_serve/response_preparation.py", line 145, in process_results
raise HTTPException(status_code=500, detail=str(e))
fastapi.exceptions.HTTPException: 500: [digital envelope routines] unsupported
INFO: 100.100.0.45:54190 - "POST /v1alpha/convert/source HTTP/1.1" 404 Not Found
The issue seems to be the hashlib.md5() call in docling_parse/pdf_parser.py, which fails when the image has GPU support.
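To check whether MD5 is blocked inside a given container, a small probe like the one below can be run with the image's Python interpreter. This is only a sketch: `probe_md5` is a hypothetical helper name, not part of docling or docling-serve. It relies on the standard `usedforsecurity` keyword that `hashlib` constructors accept since Python 3.9; whether passing `usedforsecurity=False` actually succeeds depends on how OpenSSL is built and configured in the image, so treat the result as a diagnostic rather than a guaranteed fix.

```python
import hashlib
import ssl


def probe_md5() -> dict:
    """Report whether MD5 can be constructed, with and without the security flag.

    Hypothetical diagnostic helper; not part of docling or docling-serve.
    """
    report = {"openssl": ssl.OPENSSL_VERSION}
    attempts = (
        ("default", {}),
        ("usedforsecurity_false", {"usedforsecurity": False}),
    )
    for label, kwargs in attempts:
        try:
            hashlib.md5(b"probe", **kwargs)
            report[label] = "ok"
        except ValueError as exc:
            # _hashlib.UnsupportedDigestmodError is a subclass of ValueError.
            report[label] = f"failed: {exc}"
    return report


if __name__ == "__main__":
    for key, value in probe_md5().items():
        print(f"{key}: {value}")
```

If `default` fails while `usedforsecurity_false` reports `ok`, that would suggest the digest is being rejected by the crypto policy of the image rather than being missing entirely.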
I ended up rebuilding the image with OpenSSL and no-fips, which worked.
However, this is not ideal, because it complicates the workflow.
I have a Mac, so I had to build the image on a VM and then push it to the correct container registry.
Any ideas about what might be wrong in the first place and how to resolve it?