docs/developer-manual/architecture.md
Lines changed: 39 additions & 1 deletion
@@ -318,7 +318,7 @@ This is a huge task—and that’s exactly what the **Path Template Worker** is

 ---

-## S3 Worker
+## S3 Worker – Part 1

 **{{ extra.project }}** supports S3-compatible storage systems. When S3 is enabled, all documents are uploaded to the S3 bucket.

@@ -337,6 +337,44 @@ In deployments:

 ---

+## S3 Worker – Part 2
+
+The example from *S3 Worker – Part 1* was intentionally simple. However, its simplicity may obscure the real value of using S3 storage. S3 serves two key purposes:
+In this scenario, there are two Kubernetes nodes, each running two app instances. Since nodes are isolated, `app1` and `app2` have access to different local storage than `app3` and `app4`. Now imagine `app4` receives a request to "merge two documents," but those documents are only present in node 1’s local storage. This is where S3 becomes essential: because **all documents are stored centrally in the S3 bucket**, `app4` can simply **download the necessary documents from S3** and proceed with the task.
+
+This setup allows {{ extra.project }} containers to be **stateless**: they don’t need attached persistent storage. Local storage may be ephemeral, as pods move between nodes, but S3 remains the **single source of truth** for document storage. As a result, **any app can access any document at any time**, regardless of where it runs. This pattern enables horizontal scalability: you can add more apps and more nodes, and every instance still sees the same documents through S3.
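The fetch-on-miss behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: `ensure_local`, `cache_dir`, and the `fetch` callable are hypothetical names, and the in-memory `bucket` dictionary stands in for a real S3 bucket. With boto3, `fetch` could plausibly wrap `client.get_object(Bucket=..., Key=...)["Body"].read()`.

```python
from pathlib import Path
from typing import Callable


def ensure_local(key: str, cache_dir: Path, fetch: Callable[[str], bytes]) -> Path:
    """Return a local path for `key`, downloading it on a cache miss.

    `fetch` is a hypothetical callable that returns the object's bytes
    from the central bucket (the single source of truth).
    """
    local_path = cache_dir / key
    if not local_path.exists():
        # Cache miss: this node has never seen the document,
        # so pull it from the bucket and keep an ephemeral copy.
        local_path.parent.mkdir(parents=True, exist_ok=True)
        local_path.write_bytes(fetch(key))
    return local_path


if __name__ == "__main__":
    import tempfile

    # Stand-in for the S3 bucket; keys and contents are made up.
    bucket = {"docs/invoice.pdf": b"%PDF-1.7 example"}

    with tempfile.TemporaryDirectory() as tmp:
        # An app on node 2 asks for a document that was uploaded via node 1.
        path = ensure_local("docs/invoice.pdf", Path(tmp), bucket.__getitem__)
        print(path.read_bytes() == bucket["docs/invoice.pdf"])
```

Because the local copy is only a cache, losing it when a pod is rescheduled costs one extra download and nothing else.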
+
+---
+
+### What About Performance?
+
+You may wonder: how is performance related to S3?
+
+Until now, it was implicitly assumed that app containers serve documents directly to end-users. This might work fine for low-traffic scenarios (e.g. 5–10 users), but it quickly becomes a bottleneck in high-load situations (e.g. 1000+ users). Here’s why:
+
+> Serving PDF files is **very slow** compared to typical HTTP requests.
+> (Think: 5 seconds to download a PDF vs. 200ms for a JSON API call)
+
+In practical terms: in the time it spends **serving one large file**, the app could have handled 25–50 lightweight API requests (at the figures above, 5 s ÷ 200 ms = 25 requests per download). Add to this the potential distance between user and server (e.g. user in the US, app running in Europe), and the situation worsens: users may wait **10+ seconds** just to download a 2–3 MB PDF.
+
+S3 solves this by **offloading document delivery** from the app container:
+
+* Users can download documents **directly from S3**
+* App containers stay free to handle fast, critical API calls
+* With **CDN integration**, files are served even faster and closer to the user
+
+In fact, {{ extra.project }} integrates seamlessly with S3 and AWS CloudFront (CDN). The [demo instance](https://demo.papermerge.com) uses exactly this setup. With a bit of effort, you can adapt {{ extra.project }} to work with **any S3-compatible storage** and **any CDN provider**.
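One common way to offload delivery is for the app to answer a download request with an HTTP redirect instead of streaming the file itself. The sketch below shows that idea only; it is not taken from the project's code. The function name `download_redirect`, the `cdn_base_url` parameter, and the example host are all assumptions. A private bucket would also need signed URLs (for S3, boto3's `generate_presigned_url` is one option), which this sketch leaves out.

```python
from __future__ import annotations

from urllib.parse import quote


def download_redirect(key: str, cdn_base_url: str) -> tuple[int, dict[str, str]]:
    """Build a 302 response pointing the client at the CDN copy of `key`.

    Hypothetical helper: the app replies in milliseconds and the
    CDN (or S3) carries the actual file transfer.
    """
    # Percent-encode the object key but keep "/" so the path stays intact.
    location = f"{cdn_base_url.rstrip('/')}/{quote(key, safe='/')}"
    return 302, {"Location": location}


if __name__ == "__main__":
    # The host below is a placeholder, not a real deployment.
    status, headers = download_redirect("docs/my file.pdf", "https://cdn.example.com/")
    print(status, headers["Location"])
    # prints: 302 https://cdn.example.com/docs/my%20file.pdf
```

The app container only builds a short response, so it returns to serving API calls straight away.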