Description
Self Checks
- I have searched for existing issues, including closed ones.
- I confirm that I am using English to submit this report (Language Policy).
- Non-English title submissions will be closed directly (Language Policy).
- Please do not modify this template :) and fill in all the required fields.
RAGFlow workspace code commit ID
ge97fd2b5
RAGFlow image version
v0.19.x-40-ge97fd2b5 full
Other environment information
RTX A2000, Running on Docker
Actual behavior
Begin at:
Mon, 02 Jun 2025 18:14:20 GMT
Duration:
14728.60 s
Progress:
18:14:22 Task has been received.
18:15:27 Page(1~13): OCR started
18:15:45 Page(1~13): OCR finished (17.60s)
18:16:14 Page(1~13): Layout analysis (28.97s)
18:16:14 Page(1~13): Text extraction (29.04s)
18:16:28 Page(1~13): Start to generate keywords for every chunk ...
18:18:40 Page(1~13): Keywords generation 195 chunks completed in 132.48s
18:18:40 Page(1~13): Start to generate questions for every chunk ...
18:22:41 Page(1~13): Question generation 195 chunks completed in 240.64s
18:22:41 Page(1~13): Generate 195 chunks
18:22:44 Page(1~13): Embedding chunks (3.47s)
18:22:55 Page(1~13): Indexing done (10.29s). Task done (512.39s)
18:14:41 Task has been received.
18:14:44 Page(13~25): OCR started
18:15:00 Page(13~25): OCR finished (16.73s)
18:15:23 Page(13~25): Layout analysis (22.43s)
18:15:23 Page(13~25): Text extraction (22.79s)
18:15:35 Page(13~25): Start to generate keywords for every chunk ...
18:16:17 Page(13~25): Keywords generation 230 chunks completed in 41.94s
18:16:17 Page(13~25): Start to generate questions for every chunk ...
18:18:09 Page(13~25): Question generation 230 chunks completed in 112.58s
18:18:10 Page(13~25): Generate 230 chunks
18:18:14 Page(13~25): Embedding chunks (4.85s)
18:18:37 Page(13~25): Indexing done (22.64s). Task done (236.46s)
18:14:41 Task has been received.
18:16:17 Page(61~73): OCR started
18:16:27 Page(61~73): OCR finished (10.33s)
18:16:54 Page(61~73): Layout analysis (26.22s)
18:16:54 Page(61~73): Text extraction (26.31s)
18:17:10 Page(61~73): Start to generate keywords for every chunk ...
18:19:31 Page(61~73): Keywords generation 331 chunks completed in 140.61s
18:19:31 Page(61~73): Start to generate questions for every chunk ...
18:26:06 Page(61~73): Question generation 331 chunks completed in 395.17s
18:26:06 Page(61~73): Generate 331 chunks
18:26:12 Page(61~73): Embedding chunks (5.52s)
18:26:28 Page(61~73): Indexing done (16.69s). Task done (707.03s)
18:14:41 Task has been received.
18:16:59 Page(37~49): OCR started
18:17:09 Page(37~49): OCR finished (10.06s)
18:17:35 Page(37~49): Layout analysis (25.91s)
18:17:35 Page(37~49): Text extraction (25.96s)
18:17:51 Page(37~49): Start to generate keywords for every chunk ...
18:20:24 Page(37~49): Keywords generation 375 chunks completed in 152.80s
18:20:24 Page(37~49): Start to generate questions for every chunk ...
18:29:48 Page(37~49): Question generation 375 chunks completed in 563.96s
18:29:49 Page(37~49): Generate 375 chunks
18:29:55 Page(37~49): Embedding chunks (6.57s)
18:30:51 Page(37~49): Indexing done (55.81s). Task done (970.12s)
18:14:41 Task has been received.
18:17:40 Page(25~37): OCR started
18:17:49 Page(25~37): OCR finished (8.79s)
18:18:13 Page(25~37): Layout analysis (24.25s)
18:18:13 Page(25~37): Text extraction (24.38s)
18:18:31 Page(25~37): Start to generate keywords for every chunk ...
18:21:09 Page(25~37): Keywords generation 261 chunks completed in 158.01s
18:21:09 Page(25~37): Start to generate questions for every chunk ...
18:33:44 Page(25~37): Question generation 261 chunks completed in 754.62s
18:33:44 Page(25~37): Generate 261 chunks
18:33:49 Page(25~37): Embedding chunks (4.59s)
18:34:02 Page(25~37): Indexing done (13.68s). Task done (1161.46s)
18:14:41 Task has been received.
18:18:18 Page(49~61): OCR started
18:18:29 Page(49~61): OCR finished (10.13s)
18:18:55 Page(49~61): Layout analysis (26.05s)
18:18:55 Page(49~61): Text extraction (26.34s)
18:19:10 Page(49~61): Start to generate keywords for every chunk ...
18:23:30 Page(49~61): Keywords generation 323 chunks completed in 260.04s
18:23:30 Page(49~61): Start to generate questions for every chunk ...
18:36:55 Page(49~61): Question generation 323 chunks completed in 805.44s
18:36:55 Page(49~61): Generate 323 chunks
18:37:01 Page(49~61): Embedding chunks (5.55s)
18:37:16 Page(49~61): Indexing done (15.19s). Task done (1354.89s)
18:14:41 Task has been received.
18:18:59 Page(73~85): OCR started
18:19:10 Page(73~85): OCR finished (10.63s)
18:19:34 Page(73~85): Layout analysis (23.97s)
18:19:34 Page(73~85): Text extraction (24.24s)
18:19:50 Page(73~85): Start to generate keywords for every chunk ...
18:27:07 Page(73~85): Keywords generation 330 chunks completed in 437.78s
18:27:08 Page(73~85): Start to generate questions for every chunk ...
18:40:08 Page(73~85): Question generation 330 chunks completed in 780.08s
18:40:08 Page(73~85): Generate 330 chunks
18:40:13 Page(73~85): Embedding chunks (5.45s)
18:40:29 Page(73~85): Indexing done (15.30s). Task done (1547.09s)
18:14:42 Task has been received.
18:19:39 Page(97~109): OCR started
18:19:48 Page(97~109): OCR finished (9.00s)
18:20:13 Page(97~109): Layout analysis (24.64s)
18:20:13 Page(97~109): Text extraction (24.89s)
18:20:27 Page(97~109): Start to generate keywords for every chunk ...
18:30:48 Page(97~109): Keywords generation 295 chunks completed in 620.71s
18:30:48 Page(97~109): Start to generate questions for every chunk ...
18:44:47 Page(97~109): Question generation 295 chunks completed in 839.32s
18:44:47 Page(97~109): Generate 295 chunks
18:44:53 Page(97~109): Embedding chunks (5.15s)
18:45:09 Page(97~109): Indexing done (16.60s). Task done (1827.16s)
18:14:42 Task has been received.
18:20:17 Page(85~97): OCR started
18:20:27 Page(85~97): OCR finished (9.62s)
18:20:53 Page(85~97): Layout analysis (25.65s)
18:20:53 Page(85~97): Text extraction (25.72s)
18:21:04 Page(85~97): Start to generate keywords for every chunk ...
18:31:33 Page(85~97): Keywords generation 264 chunks completed in 628.17s
18:31:33 Page(85~97): Start to generate questions for every chunk ...
18:46:55 Page(85~97): Question generation 264 chunks completed in 922.41s
18:46:55 Page(85~97): Generate 264 chunks
18:47:00 Page(85~97): Embedding chunks (4.25s)
18:47:14 Page(85~97): Indexing done (14.50s). Task done (1952.58s)
18:14:42 Task has been received.
18:20:57 Page(109~121): OCR started
18:21:14 Page(109~121): OCR finished (17.02s)
18:21:37 Page(109~121): Layout analysis (22.94s)
18:21:37 Page(109~121): Text extraction (23.25s)
18:21:49 Page(109~121): Start to generate keywords for every chunk ...
18:34:24 Page(109~121): Keywords generation 321 chunks completed in 755.37s
18:34:24 Page(109~121): Start to generate questions for every chunk ...
18:51:43 Page(109~121): Question generation 321 chunks completed in 1038.71s
18:51:43 Page(109~121): Generate 321 chunks
18:51:48 Page(109~121): Embedding chunks (4.87s)
18:52:03 Page(109~121): Indexing done (14.66s). Task done (2240.08s)
18:18:37 Task has been received.
18:21:40 Page(121~133): OCR started
18:21:54 Page(121~133): OCR finished (14.25s)
18:22:18 Page(121~133): Layout analysis (23.78s)
18:22:18 Page(121~133): Text extraction (23.86s)
18:22:21 Page(121~133): Start to generate keywords for every chunk ...
18:34:33 Page(121~133): Keywords generation 37 chunks completed in 731.82s
18:34:33 Page(121~133): Start to generate questions for every chunk ...
18:52:06 Page(121~133): Question generation 37 chunks completed in 1053.28s
18:52:07 Page(121~133): Generate 37 chunks
18:52:09 Page(121~133): Embedding chunks (1.13s)
18:52:11 Page(121~133): Indexing done (1.96s). Task done (2013.42s)
18:22:55 Task has been received.
18:22:55 Page(133~145): OCR started
18:23:03 Page(133~145): OCR finished (7.54s)
18:23:28 Page(133~145): Layout analysis (24.81s)
18:23:28 Page(133~145): Text extraction (25.19s)
18:23:38 Page(133~145): Start to generate keywords for every chunk ...
18:37:28 Page(133~145): Keywords generation 259 chunks completed in 829.47s
18:37:28 Page(133~145): Start to generate questions for every chunk ...
18:54:17 Page(133~145): Question generation 259 chunks completed in 1008.87s
18:54:17 Page(133~145): Generate 259 chunks
18:54:23 Page(133~145): Embedding chunks (5.61s)
18:54:35 Page(133~145): Indexing done (12.14s). Task done (1900.16s)
18:26:29 Task has been received.
18:26:29 Page(145~157): OCR started
18:27:00 Page(145~157): OCR finished (31.00s)
18:27:24 Page(145~157): Layout analysis (24.42s)
18:27:24 Page(145~157): Text extraction (24.55s)
18:27:45 Page(145~157): Start to generate keywords for every chunk ...
18:42:28 Page(145~157): Keywords generation 874 chunks completed in 882.74s
18:42:28 Page(145~157): Start to generate questions for every chunk ...
19:02:22 Page(145~157): Question generation 874 chunks completed in 1193.92s
19:02:22 Page(145~157): Generate 874 chunks
19:02:36 Page(145~157): Embedding chunks (13.75s)
19:03:18 Page(145~157): Indexing done (41.56s). Task done (2209.15s)
18:30:51 Task has been received.
18:30:52 Page(157~167): OCR started
18:31:07 Page(157~167): OCR finished (15.19s)
18:31:26 Page(157~167): Layout analysis (18.98s)
18:31:26 Page(157~167): Text extraction (19.08s)
18:31:54 Page(157~167): Start to generate keywords for every chunk ...
18:49:47 Page(157~167): Keywords generation 1376 chunks completed in 1072.87s
18:49:48 Page(157~167): Start to generate questions for every chunk ...
19:12:30 Page(157~167): Question generation 1376 chunks completed in 1362.05s
19:12:30 Page(157~167): Generate 1376 chunks
19:12:50 Page(157~167): Embedding chunks (19.77s)
19:13:46 Page(157~167): Indexing done (56.12s). Task done (2574.18s)
19:13:48 created task raptor
21:52:11 Task has been received.
22:19:42 [ERROR][Exception]: Exceptions from Trio nursery (1 sub-exception) -- ERROR: POST predict: Post "http://127.0.0.1:33395/completion": EOF
Parsing itself seemed to work OK, but the RAPTOR task apparently couldn't send its request to the completion endpoint. Some Docker setups don't accept 127.0.0.1 because of container networking, so I guess this address should be changed to the ragflow-server container name?
What is the payload for Parse completion so I can mark these as complete manually via curl?
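One way to test the 127.0.0.1 guess would be a plain TCP check run from inside the container that logged the error. This is only a minimal sketch using the Python standard library. The helper name `can_connect` is made up for illustration. The host and port come from the error message, and I'm assuming the port is still the same on the next run, which may not hold.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it only checks reachability and
    does not speak HTTP or call any RAGFlow API.
    """
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `can_connect("127.0.0.1", 33395)` and `can_connect("ragflow-server", 33395)` could be compared from a Python shell inside the container. If the first returns False while the second returns True, that would support the networking guess. If both return False, the process behind that port may simply not be running any more.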
Expected behavior
Parsing should mark the document as complete.
Steps to reproduce
Simple PDF Parse with RAPTOR only
Additional information
No response