-
-
Notifications
You must be signed in to change notification settings - Fork 766
Prune by either int or interval for all retention policies #8775
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
All reactions
-
👍 1 reaction -
🚀 1 reaction
Accepts either int or interval, first tries parsing int then tries parsing as interval if that fails. Returns a timedelta for easy date math later. Now allows intervals of length 0 as a 0-length timedelta is perfectly fine to work with.
Support is added for setting prune retention with either an int (keep n archives) or an interval (keep within). This works much like --keep-within currently does, but extends support to all retention filters. Additionally adds a generic --keep flag to take over (or live alongside) both --keep-last and --keep-within. --keep-last is no longer an alias of --keep-secondly, now keeps archives made on the same second. Timestamp comparison is made inclusive, 'within 5 seconds' now means 'current second and the preceding 5 seconds' to match intuitive understanding of intervals in the past. Comparisons against archive timestamp are made to use local timezone instead of UTC. Should be equal result in practice, but allows for easier testing with frozen local time.
Default with tzinfo=None is local timezone anyway, no need to set it manually.
No more fetching new now() timestamp for every check. Hard to test for this, but could have once in a blue moon caused a missed archive due to the timestamp changing during iteration.
9beb1b7
to
9b828f5
Compare
This is loosely related to #8715 as well. I like the idea very much (I didn't and can't review the code, but I'm happy to help with the docs if desired). Borg's current retention policy IMHO is rather hard to understand for beginners (but very safe, because it usually keeps more than what users expect) and an interval-based approach feels more intuitive to me. However, special care is necessary to not cause unexpected behaviour, especially with frequent backups (e.g. daily), combined with overlapping rules (e.g. including all of within, daily, weekly, monthly, and yearly), and some missing backups (and thus the need to sometimes keep the "next best" to later fulfil less frequent rules). As far as I remember there were some issues with this in the early days of Borg that at least caused multiple major docs overhauls (not sure whether the behaviour actually changed substantially, but the docs definitely did because many users understood things wrong). Maybe Thomas can give some insights about the challenges back then? |
All reactions
Sorry, something went wrong.
Additionally simplifies the break-out logic and moves it up front for readability.
Additionally changes name of default func to be more descriptive.
82c9e4e
to
da77550
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
Some minor stuff I found...
Sorry, something went wrong.
All reactions
Puts imports back on just two lines.
Uses itertools counter for more descriptive code.
Since timestamp is not used when given just int to retain, can pass None for simpler code.
Sets UTC timezone on MockArchives in helpers_test Is more true to how real archive timestamps are instantiated, as per `parse_timestamp` in time.py, which defaults to overriding tzinfo with timezone.utc.
Can simply replace tzinfo with timezone.utc to create zoned datetime.min without underflow issues.
64c48ed
to
38fbd43
Compare
It seems the GHA test runner did not like the previous |
All reactions
Sorry, something went wrong.
I'd vote for removing
IMHO this must be consistent with #8776. Question is what we agree on. I honestly believe differently: If it is |
All reactions
Sorry, something went wrong.
If you want to be able to backport this to 1.4-maint, you must not break compatibility in this PR, but deprecating some options that can be replaced by a better new option can be done here. |
All reactions
Sorry, something went wrong.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #8775 +/- ##
==========================================
+ Coverage 81.89% 81.94% +0.04%
==========================================
Files 74 74
Lines 13324 13351 +27
Branches 1968 1971 +3
==========================================
+ Hits 10912 10940 +28
+ Misses 1750 1746 -4
- Partials 662 665 +3 ☔ View full report in Codecov by Sentry. |
All reactions
Sorry, something went wrong.
How do you do separation of functionality like this when a breaking 2.0 change is to be backported with deprecations instead? The code might turn out very different, especially with changes introduced on master. |
All reactions
Sorry, something went wrong.
Fixes bug where changed is-atleast-one-flag-given check didn't work. Adds a test to verify.
I don't think #8776 is relevant, as it matches absolute timestamps based on a pattern (like how the I had a whole comment here written out defending the inclusive check based on comparing timestamps only using seconds-granularity. In that scenario this makes sense. Having taken a closer look I see that |
All reactions
Sorry, something went wrong.
Adds optional interval support for all prune retention flags Support is added for setting prune retention with either an int (keep n archives) or an interval (keep within). This works much like --keep-within currently does, but extends support to all retention filters. Additionally adds a generic --keep flag to take over (or live alongside) both --keep-last and --keep-within. --keep-last is no longer an alias of --keep-secondly, now keeps archives made on the same second. Comparisons against archive timestamp are made to use local timezone instead of UTC. Should be equal result in practice, but allows for easier testing with frozen local time.
Comments fixed, inclusive timestamp change reverted with tests made slightly more robust. If that's it for implementation comments I'll extend the documentation with an involved example like the retention policy I have described in the PR description and write a test for it. I am still not sure how to go about doing breaking change on |
All reactions
Sorry, something went wrong.
Easiest is to backport a non-breaking change and then do the breaking change in a separate PR afterwards (and not backport that). There are some code structure differences between master and 1.4-maint though, so even backporting the first PR might not be trivial. As 1.4.x is a stable release, we need to be very careful to not break anything there. That sometimes can mean "no backport" if the change is too big or too risky. |
All reactions
Sorry, something went wrong.
Last explicit default removed.
The changiest change introduced here is allowing intervals to be of length Am I correct in assuming I just put entries for |
All reactions
Sorry, something went wrong.
In regards to backporting this to I kinda feel like I accidentally caused some confusion, so: Do we even have breaking changes right now? The only "breaking" change is that |
All reactions
Sorry, something went wrong.
Ah I forgot there is one more thing: But if you want, it's pretty easy to just not introduce that change now, and do it in the breaking PR instead. |
All reactions
Sorry, something went wrong.
There's only a difference if someone actually managed to create multiple archives within the same second, right? I'd say that's similar to Anyway, I don't think we really have a problem here. If we decide on dropping Not for me to decide, but I suggest dealing with that stuff in the backport PR.
It does: It can also match all archives within e.g. a full month relative to "now" (and added just recently, also relative to arbitrary other timestamps). However, I agree that there's no practical difference due to the microseconds scale. I didn't consider that. Yet I'd still vote for consistency. I just skimmed through @c-herz's work in #8776 and noticed that matching exclusively causes some problems there as well (minor, yet there are some). IMHO the consistency question still somewhat matters, so how about we simply decide on always matching inclusively? I'm thus also tagging @c-herz, WDYT? |
All reactions
Sorry, something went wrong.
After some consideration I think this new retention interval should also account for the oldest rule that was implemented for the retention count flags. I'll get back to this. |
All reactions
Sorry, something went wrong.
before doing further changes, please rebase on current master branch to get the cython workaround. otherwise, builds will fail. |
All reactions
-
👍 1 reaction
Sorry, something went wrong.
Guess I'ld like to merge this for next beta, can we finish this until then? |
All reactions
Sorry, something went wrong.
I'll endeavor to get something done soon then. Factorio has been a distraction 😄 |
All reactions
-
😄 1 reaction
Sorry, something went wrong.
btw, i dissected the |
All reactions
Sorry, something went wrong.
ThomasWaldmann
jdchristensen
Successfully merging this pull request may close these issues.
None yet
This PR adds optional interval handling for all retention filter flags of the
prune
command, previously only available on--keep-within
. E.g.prune --keep-hourly=7d
will keep hourly archives for the last 7 days regardless of count. Adds--keep
which acts as a combination of--keep-last
and--keep-within
.Implements #8637.
I opted to make the existing flags handle both ints and intervals instead of adding new flags as there are already so many flags for this command. Simplified some prune filtering: With the default filter function now also handling intervals,
prune_within
is no longer needed as a special case.I added a library to freeze time in testing, let me know if that's not wanted and I'll figure out something else. It's such a hassle to deal with timestamps relative to
now()
in test. The tests should be fairly comprehensive in checking both their timely filter (hourly/yearly, etc.) and the new inclusive timestamp check. I did not add new helper tests forprune_split
as this function is not used anywhere other thanprune_cmd
and isn't really a helper.TODOs:
TODOs from comments:
Notes:
--keep
: Merges functionality of--keep-within
and--keep-last
with new int-or-interval handling.--keep-last
is no longer an alias of--keep-secondly
and thus keeps archives made on the same second. It now fits together with--keep-within
and new flag--keep
.int_or_interval
and creatingtimedelta
objects it seemed weird to restrict input like this. Not extremely useful unless you want to prune on archives in the same second as they were created, but it seemed logical when setting up tests to verify new--keep-last
behavior.. Technically breaking, but will likely not meaningfully affect any real scenarios.xx:xx:10
and there's an archive atxx:xx:05
then--keep-within 5S
should cover that archive. Easy change to make when already altering the filtering logic. Technically breaking, but will likely not meaningfully affect any real scenarios.This has been a pet peeve of mine in the pruning command for a long time. In my mind the most clean backup regime keeps all backups for a short time (allows catching small errors quickly), then hourly backups for a reasonable time (say, 7 days), then daily backups for a little longer and then finally weekly/monthly as storage permits. This was not previously possible, requiring for example
--keep-within 7d --keep-hourly 168 --keep-daily 14 --keep-weekly -1
for an approximation. But keeping around 168 archives for a machine that's only running a few hours a day seems mighty excessive. So here we are :)With this implemented my ideal retention for my primary laptop with archives every 15m looks something like
--keep 3d --keep-hourly 7d --keep-daily 30d --keep-weekly -1
.