
[Help] Wanted to know the volume of data nginx-s3-gateway can handle #173


Closed
akhilputhiry opened this issue Sep 26, 2023 · 2 comments
@akhilputhiry

We have buckets with several terabytes of data. We wanted to know whether anyone has done any benchmarking of nginx-s3-gateway.

@4141done
Collaborator

Hello,
We haven't done any formal benchmarking on this project as far as I know (my co-maintainer may have done some earlier in the project and can correct me when they get back from vacation).

There will be a small amount of overhead from the library on each request for a non-cached file. However, once the authorization steps are taken care of, the request is simply routed to your S3 bucket using NGINX's proxy_pass directive, which has a well-understood performance profile.
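To make that concrete, here is a minimal sketch of the proxying step. This is not the gateway's actual configuration: the bucket hostname is a placeholder, and the `$s3_auth_header` variable is a stand-in for the signing logic the gateway performs before proxying.

```nginx
# Hedged sketch only: the real gateway computes AWS signature headers
# before proxying. The hostname and the $s3_auth_header variable are
# placeholders, not part of the gateway's actual config.
location / {
    # Attach the (already computed) AWS authorization header.
    proxy_set_header Authorization $s3_auth_header;

    # Hand the request off to the bucket; from here on, the performance
    # profile is plain NGINX proxying.
    proxy_pass https://your-bucket.s3.us-east-1.amazonaws.com;
}
```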

Here's a link to an open-source benchmark that uses the directive as part of its tests. I have not researched the quality of that particular benchmark, but it should give you some idea.

After that, standard S3 access characteristics should apply. If you configure caching in NGINX, requests for cached files should be much faster.

Unless you're attempting to list out all your files using the index page feature, the size of the bucket should not matter from the perspective of this project.

Configuring NGINX for caching and performance will be specific to your workload and hardware. Here's a basic guide we published a while back that I found helpful in understanding NGINX tuning.
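As a starting point, a basic proxy-cache setup might look like the sketch below. Every path, zone name, size, and timeout here is a placeholder to tune for your own workload, not a recommendation.

```nginx
# Hedged example of NGINX proxy caching in front of S3; all values are
# placeholders to tune for your workload and hardware.
proxy_cache_path /var/cache/nginx/s3 levels=1:2 keys_zone=s3_cache:10m
                 max_size=10g inactive=60m use_temp_path=off;

server {
    listen 80;

    location / {
        proxy_cache s3_cache;
        proxy_cache_valid 200 302 1h;                  # cache successful responses
        proxy_cache_use_stale error timeout updating;  # serve stale on upstream trouble
        proxy_pass https://your-bucket.s3.us-east-1.amazonaws.com;
    }
}
```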

@dekobon
Collaborator

dekobon commented Oct 2, 2023

Hi there,

The data size of the bucket should not present a problem for performance. The biggest consumer of resources will be the number of times an AWS HTTP signature is recalculated. This means that a system with many small objects, accessed in a pattern where the same objects are rarely requested more than once, would see the biggest performance impact. Even then, for most use cases I do not imagine it being a big problem. Moreover, you can scale out by running multiple instances of NGINX.
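For example, scaling out could be as simple as putting a plain NGINX load balancer in front of several gateway instances. The sketch below is illustrative only; the hostnames and ports are placeholders.

```nginx
# Hedged sketch: round-robin load balancing across multiple gateway
# instances. Hostnames and ports are placeholders.
upstream s3_gateway {
    server gateway-1.internal:80;
    server gateway-2.internal:80;
    server gateway-3.internal:80;
}

server {
    listen 80;

    location / {
        proxy_pass http://s3_gateway;
    }
}
```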

If you wanted to run a proper performance benchmark, I would suggest using COSBench and writing a custom adapter that extends its S3 adapter. You should be able to reuse the write portion of the S3 adapter, but you will need to rewrite the portion that accesses the object so that it points at the NGINX S3 Gateway.

@nginx nginx locked and limited conversation to collaborators Oct 2, 2023
@dekobon dekobon converted this issue into discussion #178 Oct 2, 2023

