Commit afe7c70

s3 part 2
1 parent 6671c8c commit afe7c70

3 files changed: 563 additions & 5 deletions

docs/developer-manual/architecture.md

Lines changed: 39 additions & 1 deletion
@@ -318,7 +318,7 @@ This is a huge task—and that’s exactly what the **Path Template Worker** is
318318

319319
---
320320

321-
## S3 Worker
321+
## S3 Worker – Part 1
322322

323323
**{{ extra.project }}** supports S3-compatible storage systems. When S3 is enabled, all documents are uploaded to the S3 bucket.
324324

@@ -337,6 +337,44 @@ In deployments:
337337

338338
---
339339

340+
## S3 Worker – Part 2
341+
342+
The example from *S3 Worker – Part 1* was intentionally simple. However, its simplicity may obscure the real value of using S3 storage. S3 serves two key purposes:
343+
344+
1. **Scalability**
345+
2. **Performance**
346+
347+
The diagram below illustrates the first point:
348+
349+
![S3 Worker](./architecture/6-s3-worker-multiple-apps.svg)
350+
351+
In this scenario, there are two Kubernetes nodes, each running two app instances. Since nodes are isolated, `app1` and `app2` have access to different local storage than `app3` and `app4`. Now imagine `app4` receives a request to "merge two documents," but those documents are only present in node 1’s local storage. This is where S3 becomes essential: because **all documents are stored centrally in the S3 bucket**, `app4` can simply **download the necessary documents from S3** and proceed with the task.
352+
353+
This setup allows {{ extra.project }} containers to be **stateless**: they don’t need attached persistent storage. Local storage can be ephemeral, because pods may be rescheduled onto other nodes, while S3 remains the **single source of truth** for document storage. As a result, **any app can access any document at any time**, regardless of where it runs. This pattern enables horizontal scalability: new apps and new nodes need no local copy of existing documents, because every instance reads from the same bucket.
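
The "download what is missing, then proceed" behavior described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the {{ extra.project }} implementation: `ensure_local`, `FakeBucket`, and `merge_on_fresh_node` are hypothetical names, the bucket is an in-memory stand-in, and byte concatenation stands in for a real document merge.

```python
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical signature: download(key, destination) copies one object to disk.
Downloader = Callable[[str, Path], None]


def ensure_local(key: str, cache_dir: Path, download: Downloader) -> Path:
    """Return a local path for `key`, downloading it only if it is missing."""
    local_path = cache_dir / key
    if not local_path.exists():
        local_path.parent.mkdir(parents=True, exist_ok=True)
        download(key, local_path)
    return local_path


class FakeBucket:
    """In-memory stand-in for the S3 bucket (assumption, for illustration)."""

    def __init__(self, objects: dict[str, bytes]) -> None:
        self.objects = objects
        self.downloads = 0  # counts how often the "bucket" was hit

    def download(self, key: str, destination: Path) -> None:
        self.downloads += 1
        destination.write_bytes(self.objects[key])


def merge_on_fresh_node(bucket: FakeBucket, keys: list[str]) -> bytes:
    """Simulate an app whose local storage is empty, like app4 above."""
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        paths = [ensure_local(k, cache, bucket.download) for k in keys]
        # Byte concatenation stands in for the real merge operation.
        return b"".join(p.read_bytes() for p in paths)
```

With a real bucket, the `download` callable could plausibly wrap boto3's `download_file(Bucket, Key, Filename)`; that wiring is an assumption and is not taken from this manual.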
354+
355+
---
356+
357+
### What About Performance?
358+
359+
You may wonder: how is performance related to S3?
360+
361+
Until now, it was implicitly assumed that app containers serve documents directly to end-users. This might work fine for low-traffic scenarios (e.g. 5–10 users), but it quickly becomes a bottleneck in high-load situations (e.g. 1000+ users). Here’s why:
362+
363+
> Serving PDF files is **very slow** compared to typical API requests.
364+
> (Think: 5 seconds to download a PDF vs. 200 ms for a JSON API call)
365+
366+
In practical terms: in the time it takes to serve one large file, the app could have handled 25–50 lightweight API requests. Add the potential distance between user and server (e.g. user in the US, app running in Europe) and the situation worsens: users may wait **10+ seconds** just to download a 2–3 MB PDF.
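
One way to read the 25–50 figure is as simple division of the numbers quoted above: a 5-second download occupies the time of 25 API calls at 200 ms each, and a 10-second download occupies the time of 50. The sketch below works through that arithmetic. It assumes, as a simplification, that a worker handles one request at a time; `requests_displaced` is a hypothetical helper name.

```python
def requests_displaced(file_ms: int, api_ms: int) -> float:
    """How many API calls fit into the time spent serving one file.

    Simplifying assumption: the worker handles one request at a time.
    """
    return file_ms / api_ms


# Figures taken from the text: 200 ms per JSON API call,
# 5 s for a nearby download, 10 s for a distant one.
nearby = requests_displaced(5_000, 200)
distant = requests_displaced(10_000, 200)
print(nearby, distant)  # 25.0 50.0
```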
367+
368+
S3 solves this by **offloading document delivery** from the app container:
369+
370+
* Users can download documents **directly from S3**
371+
* App containers stay free to handle fast, critical API calls
372+
* With **CDN integration**, files are served even faster and closer to the user
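
A common way to offload delivery like this is for the app to answer a download request with a redirect instead of the file bytes. The sketch below shows that idea under stated assumptions: the manual does not say which mechanism {{ extra.project }} uses, `download_redirect` and `cdn_url` are hypothetical names, and `cdn.example.com` is a placeholder host.

```python
from typing import Callable
from urllib.parse import quote

# Hypothetical: maps a document key to a URL the browser can fetch
# without going through the app container.
UrlFactory = Callable[[str], str]


def download_redirect(key: str, make_url: UrlFactory) -> tuple[int, dict[str, str]]:
    """Build a redirect response so file bytes never pass through the app."""
    return 302, {"Location": make_url(key)}


def cdn_url(key: str) -> str:
    """Placeholder URL factory; the host is not a real endpoint."""
    # quote() percent-encodes unsafe characters but keeps "/" separators.
    return f"https://cdn.example.com/{quote(key)}"


status, headers = download_redirect("docs/my file.pdf", cdn_url)
print(status, headers["Location"])
```

In a real deployment the URL factory would more likely return a signed, short-lived URL (for example via boto3's `generate_presigned_url`) than a plain public one. That choice is an assumption here and is not described in this manual.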
373+
374+
{{ extra.project }} integrates with S3 and Amazon CloudFront (CDN); the [demo instance](https://demo.papermerge.com) uses exactly this setup. With a bit of effort, you can adapt {{ extra.project }} to work with **any S3-compatible storage** and **any CDN provider**.
375+
376+
---
377+
340378
## OCR Worker
341379

342380
📌 *Coming soon*

docs/developer-manual/architecture/5-s3-worker-simple.svg

Lines changed: 4 additions & 4 deletions
