-
Notifications
You must be signed in to change notification settings - Fork 232
Add conj-reaper management command #323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I suspect if you are using ENV here then that's what you are using in your app. There is, however, a major way in Django - use settings everywhere, if you need to set something from ENV then assign a setting from env var in CACHEOPS_REAPER_SLEEP = os.environ.get('CACHEOPS_REAPER_SLEEP', 0.01) There are many libs to support this like django-environ or dj_database_url. This way every library doesn't need to deal with it. And you can use:
While using it with: CACHEOPS_REDIS="redis://redis/7" CACHEOPS_REAPER_SLEEP=0.05 ./manage.py reap-conjs ... |
I'd prefer to see this implemented as a command in the library as with the if it was a library-level command it could easily be dropped into a periodic celery task or something? |
I have memory issues with this library and I badly need this merged. Do we have any timeline for this? |
When it needs to be implemented as a celery task. Or maybe a function, which would be easy to wrap into celery or any other bg task. Anyway not a function.
For this to be merged it needs to be cleaned up. It might be used as is though. |
Rework Suor#323 into a reusable function that can be wrapped in a Celery task by
Rework Suor#323 into a reusable function that can be wrapped in a Celery task by
* Add conj-reaper management command * Move main command logic into a function and call it from the command Rework #323 into a reusable function that can be wrapped in a Celery task by * Rename back to reaper and set default values in logic Co-Authored-By: Alexander Schepanovski <[email protected]> * Don't hardcode default DB alias * Expose reap_conjs on the package's top level API * Update documentation * Remove f-string as we're still supporting Python 3.5 * Fix comments in documentation * Pluralize management command Co-authored-by: Nic Wolff <[email protected]> Co-authored-by: Alexander Schepanovski <[email protected]>
Implemented in #434. |
The idea is that instead of saving all dependent cache keys in conj set we put a simple random stamp in those and store a checksum of all related conj stamps along with cache data. This makes cache reads more complicated - `MGET` key + conj keys, validate stamps checksum. However, we no longer need to store potentially big conj sets and invalidation becomes faster, including model level invalidation. It also removes strong link between conj and cache keys, i.e. loss of conj keys no longer leads to a stale cache, instead we will simply drop the key on next read. This opens easier way for maxmemory and cluster. So: - more friendly to `maxmemory`, even assumes that, see #143 - eliminates issues with big conj sets and long invalidation, see #340, - `reapconjs` is not needed with it, see #323, #434 Followups: - docs - remove `CACHEOPS_LRU` as it's superseeded by this generally - make insideout default or even drop the old ways?
The idea is that instead of saving all dependent cache keys in conj set we put a simple random stamp in those and store a checksum of all related conj stamps along with cache data. This makes cache reads more complicated - `MGET` key + conj keys, validate stamps checksum. However, we no longer need to store potentially big conj sets and invalidation becomes faster, including model level invalidation. It also removes strong link between conj and cache keys, i.e. loss of conj keys no longer leads to a stale cache, instead we will simply drop the key on next read. This opens easier way for maxmemory and cluster. So: - more friendly to `maxmemory`, even assumes that, see #143 - eliminates issues with big conj sets and long invalidation, see #340, - `reapconjs` is not needed with it, see #323, #434 Followups: - docs - remove `CACHEOPS_LRU` as it's superseeded by this generally - make insideout default or even drop the old ways?
The idea is that instead of saving all dependent cache keys in conj set we put a simple random stamp in those and store a checksum of all related conj stamps along with cache data. This makes cache reads more complicated - `MGET` key + conj keys, validate stamps checksum. However, we no longer need to store potentially big conj sets and invalidation becomes faster, including model level invalidation. It also removes strong link between conj and cache keys, i.e. loss of conj keys no longer leads to a stale cache, instead we will simply drop the key on next read. This opens easier way for maxmemory and cluster. So: - more friendly to `maxmemory`, even assumes that, see #143 - eliminates issues with big conj sets and long invalidation, see #340, #350, #444 - `reapconjs` is not needed with it, see #323, #434 Followups: - docs - remove `CACHEOPS_LRU` as it's superseeded by this generally - make insideout default or even drop the old ways?
We're running cacheops in a Django Rest Framework application that serves about 20,000 requests per minute, and our biggest conj sets – for which all filters were excluded from the DNF, and the table is rarely updated – had been growing to contain over 20 million keys.
On invalidation, this blocked Redis (and our entire application!) for many seconds while the
invalidate
EVALSHA
looped over the keys in the conj set to delete query keys. (Even usingUNLINK
for the deletion of the conj set itself only helped halfway.)So we've added this Django management command, which loops continually to scan cacheops' conj keys for already-expired cache keys and removes them. It can be configured with command-line options or environment variables to run as fast as your CPUs can accommodate, and to ignore conj sets under some reasonable size.
Some other users may find it valuable, so I submit it for inclusion with your excellent library.