Skip to content

Add conj-reaper management command #323

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

nicwolff
Copy link
Contributor

@nicwolff nicwolff commented May 2, 2019

We're running cacheops in a Django Rest Framework application that serves about 20,000 requests per minute, and our biggest conj sets – for which all filters were excluded from the DNF, and the table is rarely updated – had been growing to contain over 20 million keys.

On invalidation, this blocked Redis (and our entire application!) for many seconds while the invalidate EVALSHA looped over the keys in the conj set to delete query keys. (Even using UNLINK for the deletion of the conj set itself only helped halfway.)

So we've added this Django management command, which loops continually to scan cacheops' conj keys for already-expired cache keys and removes them. It can be configured with command-line options or environment variables to run as fast as your CPUs can accommodate, and to ignore conj sets under some reasonable size.

Some other users may find it valuable, so I submit it for inclusion with your excellent library.

@Suor
Copy link
Owner

Suor commented May 21, 2019

I suspect if you are using ENV here then that's what you are using in your app. There is, however, a major way in Django - use settings everywhere, if you need to set something from ENV then assign a setting from env var in settings.py:

CACHEOPS_REAPER_SLEEP = os.environ.get('CACHEOPS_REAPER_SLEEP', 0.01)

There are many libs to support this like django-environ or dj_database_url. This way every library doesn't need to deal with it. And you can use:

from cacheops.redis import redis_client

redis_client.sscan(...)

While using it with:

CACHEOPS_REDIS="redis://redis/7" CACHEOPS_REAPER_SLEEP=0.05 ./manage.py reap-conjs ...

@bharling
Copy link

bharling commented Sep 1, 2020

I'd prefer to see this implemented as a command in the library as with the invalidate_foo commands already available? or even just have it handled automatically without the developer needing to do anything.

if it was a library-level command it could easily be dropped into a periodic celery task or something?

@codeasashu
Copy link

I have memory issues with this library and I badly need this merged. Do we have any timeline for this?

@Suor
Copy link
Owner

Suor commented Mar 29, 2022

if it was a library-level command it could easily be dropped into a periodic celery task or something?

When it needs to be implemented as a celery task. Or maybe a function, which would be easy to wrap into celery or any other bg task. Anyway not a function.

I have memory issues with this library and I badly need this merged. Do we have any timeline for this?

For this to be merged it needs to be cleaned up. It might be used as is though.

browniebroke added a commit to browniebroke/django-cacheops that referenced this pull request Jul 8, 2022
Rework Suor#323 into a reusable function that can be wrapped in a Celery task by
browniebroke added a commit to browniebroke/django-cacheops that referenced this pull request Jul 8, 2022
Rework Suor#323 into a reusable function that can be wrapped in a Celery task by
Suor added a commit that referenced this pull request Jul 11, 2022
* Add conj-reaper management command

* Move main command logic into a function and call it from the command

Rework #323 into a reusable function that can be wrapped in a Celery task by

* Rename back to reaper and set default values in logic

Co-Authored-By: Alexander Schepanovski <[email protected]>

* Don't hardcode default DB alias

* Expose reap_conjs on the package's top level API

* Update documentation

* Remove f-string as we're still supporting Python 3.5

* Fix comments in documentation

* Pluralize management command

Co-authored-by: Nic Wolff <[email protected]>
Co-authored-by: Alexander Schepanovski <[email protected]>
@Suor
Copy link
Owner

Suor commented Jul 11, 2022

Implemented in #434.

@Suor Suor closed this Jul 11, 2022
Suor added a commit that referenced this pull request Feb 25, 2023
The idea is that instead of saving all dependent cache keys in conj set
we put a simple random stamp in those and store a checksum of all
related conj stamps along with cache data.

This makes cache reads more complicated - `MGET` key + conj keys,
validate stamps checksum. However, we no longer need to store
potentially big conj sets and invalidation becomes faster, including
model level invalidation.

It also removes strong link between conj and cache keys, i.e. loss of
conj keys no longer leads to a stale cache, instead we will simply drop
the key on next read. This opens easier way for maxmemory and cluster.

So:
- more friendly to `maxmemory`, even assumes that, see #143
- eliminates issues with big conj sets and long invalidation, see #340,
- `reapconjs` is not needed with it, see #323, #434

Followups:
- docs
- remove `CACHEOPS_LRU` as it's superseeded by this generally
- make insideout default or even drop the old ways?
Suor added a commit that referenced this pull request Feb 25, 2023
The idea is that instead of saving all dependent cache keys in conj set
we put a simple random stamp in those and store a checksum of all
related conj stamps along with cache data.

This makes cache reads more complicated - `MGET` key + conj keys,
validate stamps checksum. However, we no longer need to store
potentially big conj sets and invalidation becomes faster, including
model level invalidation.

It also removes strong link between conj and cache keys, i.e. loss of
conj keys no longer leads to a stale cache, instead we will simply drop
the key on next read. This opens easier way for maxmemory and cluster.

So:
- more friendly to `maxmemory`, even assumes that, see #143
- eliminates issues with big conj sets and long invalidation, see #340,
- `reapconjs` is not needed with it, see #323, #434

Followups:
- docs
- remove `CACHEOPS_LRU` as it's superseeded by this generally
- make insideout default or even drop the old ways?
Suor added a commit that referenced this pull request Feb 25, 2023
The idea is that instead of saving all dependent cache keys in conj set
we put a simple random stamp in those and store a checksum of all
related conj stamps along with cache data.

This makes cache reads more complicated - `MGET` key + conj keys,
validate stamps checksum. However, we no longer need to store
potentially big conj sets and invalidation becomes faster, including
model level invalidation.

It also removes strong link between conj and cache keys, i.e. loss of
conj keys no longer leads to a stale cache, instead we will simply drop
the key on next read. This opens easier way for maxmemory and cluster.

So:
- more friendly to `maxmemory`, even assumes that, see #143
- eliminates issues with big conj sets and long invalidation, see #340,
  #350, #444
- `reapconjs` is not needed with it, see #323, #434

Followups:
- docs
- remove `CACHEOPS_LRU` as it's superseeded by this generally
- make insideout default or even drop the old ways?
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants