This repository was archived by the owner on Jul 19, 2023. It is now read-only.

Add support for global size-based retention policy #628

Conversation

kolesnikovae
Contributor

@kolesnikovae kolesnikovae commented Apr 14, 2023

Resolves #612.

One drawback is that mount points and symlinks are not honoured: if a separate file system is used for a specific tenant (or a group of tenants), this is completely ignored. The ultimate solution would be to track each file system independently. As a workaround, we could allow users to exclude tenants from the global policy.

This change also makes it easy to implement per-tenant retention policies later.

Comment on lines +155 to +157
case b := <-f.evictCh:
b.evicted, b.err = f.blockQuerier.evict(b.blockID)
close(b.done)
Contributor Author

@kolesnikovae kolesnikovae Apr 14, 2023

Not related to the PR, but it's not entirely clear to me whether the head block is visible to queries between Flush and runBlockQuerierSync, which should pick up the new block:

		case <-f.Head().flushCh:
			if err := f.Flush(ctx); err != nil {
				level.Error(f.logger).Log("msg", "flushing head block failed", "err", err)
				continue
			}
			f.runBlockQuerierSync(ctx)

@kolesnikovae kolesnikovae force-pushed the feat/size-based-retention-policy-enforcement branch from d0dc191 to 0d8b636 Compare May 9, 2023 16:10
@kolesnikovae kolesnikovae marked this pull request as ready for review May 9, 2023 16:26
@kolesnikovae kolesnikovae requested a review from simonswine May 9, 2023 16:26
Comment on lines +24 to +26
defaultMinFreeDisk = 10 * 1024 * 1024 * 1024 // 10Gi
defaultMinDiskAvailablePercentage = 0.05
defaultRetentionPolicyEnforcementInterval = 5 * time.Minute
Contributor Author

@kolesnikovae kolesnikovae May 9, 2023

I think it's worth making these parameters configurable

Collaborator

I tend to agree. It will always be a balance between flexibility and keeping Pyroscope simple to operate, but being able to change these values or disable deletion altogether is more than a nice-to-have. That said, I wouldn't mind whether you do this here or in a follow-up PR.

Collaborator

@simonswine simonswine left a comment

LGTM. Great work! I have tested this extensively locally and it did behave exactly as I expected it to. Also good effort on covering it all with tests.

// cleanup there to avoid deleting all blocks when disk usage reporting
// is delayed.
if volumeStatsPrev != nil && volumeStatsPrev.BytesAvailable >= volumeStatsCurrent.BytesAvailable {
level.Warn(e.logger).Log("msg", "disk utilization is not lowered by deletion of a block, pausing until next cycle")
Collaborator

@simonswine simonswine May 10, 2023

	defaultRetentionPolicyEnforcementInterval = 5 * time.Minute

Just a note here: on my own laptop, the btrfs filesystem is fairly slow to reflect freed disk space (or doesn't reflect it at all because of snapshots). That basically means we would delete one block every 5 minutes, which I think is okay, and better than deleting all blocks because the free disk space doesn't go down.

Contributor Author

Oh, that's a great point, which is worth mentioning in the docs. As a small optimization, I'll implement batch removal that relies on the actual block size on disk.

@kolesnikovae
Contributor Author

Thank you so much for the review and help with testing @simonswine!

@kolesnikovae kolesnikovae changed the title feat: support for global size-based retention policy Add support for global size-based retention policy May 11, 2023
@kolesnikovae kolesnikovae merged commit 86b115f into grafana:main May 12, 2023
simonswine pushed a commit to simonswine/pyroscope that referenced this pull request Jun 30, 2023
* feat: support for global size-based retention policy

* feat: use manager for ingester subservices

* add querier block eviction test

* decouple retention enforcer and ingester
Successfully merging this pull request may close these issues.

Cleanup blocks has unexpected behaviour when using multiple tenants