Skip to content

feat(flags): add a flag_fallback_cache that tracks feature flag evaluation results and uses them as fallback values whenever the /flags API isn't available #275

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jul 7, 2025

Conversation

dmarticus
Copy link
Contributor

@dmarticus dmarticus commented Jun 27, 2025

what it says on the tin.

Somewhere between a prototype and an actual thing that I can ship, maybe a bit closer to the latter 😉

The idea here is laid out in this thread: https://posthog.slack.com/archives/C07Q2U4BH4L/p1751050758806129; the TL;DR is that rather than encourage our customers to build around any sort of downtime of our /flags API, we should have the SDKs handle this for them. It does introduce a new change to the SDK paradigm in that the SDK will no longer be stateless w.r.t. flags – part of the initialization process will involve maintaining a cache of flag evaluations by key for a given token.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

Added native feature flag caching mechanism to provide fallback values when the /flags API is unavailable, improving reliability and performance.

  • Implemented FlagCache in posthog/utils.py with versioning, TTL support, and LRU eviction strategy
  • Added client configuration options in posthog/client.py for cache control (enable_flag_cache, flag_cache_size, flag_cache_ttl)
  • Added automatic cache invalidation when flag definitions change through version tracking
  • Implemented stale cache access as fallback when remote evaluation fails
  • Configured cache clearing on quota limit hits for efficient resource management

4 files reviewed, 1 comment
Edit PR Review Bot Settings | Greptile

@dmarticus dmarticus changed the title feat(flags): add a native flag evaluation that tracks the status of feature flags and uses them as fallback values whenever the /flags API isn't available feat(flags): add a native flag evaluation cache that tracks the status of feature flags and uses them as fallback values whenever the /flags API isn't available Jun 27, 2025
@dmarticus dmarticus changed the title feat(flags): add a native flag evaluation cache that tracks the status of feature flags and uses them as fallback values whenever the /flags API isn't available feat(flags): add a native flag evaluation cache that tracks feature flag evaluation results and uses them as fallback values whenever the /flags API isn't available Jun 27, 2025
@dmarticus dmarticus requested a review from a team June 27, 2025 23:16
@dmarticus dmarticus moved this to In Review in Feature Flags Jun 27, 2025
@dmarticus dmarticus changed the title feat(flags): add a native flag evaluation cache that tracks feature flag evaluation results and uses them as fallback values whenever the /flags API isn't available feat(flags): add a flag_fallback_cache that tracks feature flag evaluation results and uses them as fallback values whenever the /flags API isn't available Jul 1, 2025
Comment on lines 109 to 115
With Redis fallback cache for high-scale applications:
>>> client = Client(
... "your-api-key",
... enable_flag_fallback_cache=True,
... flag_fallback_cache_backend="redis",
... flag_fallback_cache_redis_url="redis://localhost:6379/0"
... )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd propose a slightly different API, similar to how we do configs on posthog-js

# In-memory
client = Client("api-key", flag_fallback_cache=True)

# Redis URL
client = Client("api-key", flag_fallback_cache=FlagFallbackCache(backend="redis", url="redis://localhost:6379/0")

# Existing redis
client = Client("api-key", flag_fallback_cache=FlagFallbackCache(backend="redis", client=redis_client)

This avoids us flooding the main args list and offloads all of the config-handling to a separate class. That class can also handle instantiating the cache, for example. I believe the code would look much more modular that way.

Copy link
Contributor

@oliverb123 oliverb123 Jul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why put the service name in the parameters at all, it's right there in the URL schema (I would just pass fallback_cache_url='redis://...', and then list the schema's we support as we extend support to further backend systems)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because someone might pass a custom client we dont recognize (custom redis wrapper) and we need to know how to use that. Im in favor of specifying the service. OR we make it optional, and fail in case we cant use it (or just dont use it, log a warning)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to specifically prevent people passing clients here, though - for reasons outlined here - letting them pass a custom client is the same as defining an interface, except now we'd need to define one for every cache service rather than an abstraction.

I also can't find places we do this custom wrapper stuff in posthog-js?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @oliverb123 for this first implementation. The less surface area we commit to, the better. Down the road, when our caching approach is mature, we could always add an interface for custom cache providers.

@@ -126,6 +154,12 @@ def __init__(
project_root=None,
privacy_mode=False,
before_send=None,
enable_flag_fallback_cache=False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This param seems overkill - if the URL isn't None, it's enabled

posthog/utils.py Outdated
import logging
import numbers
import pickle
Copy link
Contributor

@oliverb123 oliverb123 Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to use json for this - keep in mind more than just python SDKs could be talking to the same cache at the same time here. Pickle has other-platform implementations, but it's python-specific enough to be very off the beaten track for lots of them

@@ -126,6 +154,12 @@ def __init__(
project_root=None,
privacy_mode=False,
before_send=None,
enable_flag_fallback_cache=False,
flag_fallback_cache_size=10000,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't tell what the unit is here, and don't think a user will be able to easily either? See note below, seems backend-specific so I'd put it in a url param.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good callout (the unit is entries and that's clearly not obvious). Agreed about your note with URL params, makes the abstraction even cleaner. Was trying to get too cute here.

enable_flag_fallback_cache=False,
flag_fallback_cache_size=10000,
flag_fallback_cache_ttl=300,
flag_fallback_cache_backend="memory",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above - you'll need to accept URLs for this anyway, and URLs are fundamentally a serialisation format to describe an access protocol and a location. I'd put these as query parameters into the URL, where they're relevant, like:

  • memory://local/?ttl=300&size=1000
  • redis://username:password@host:port/?some_param=some_value.

This has the advantage of being universal across languages, so we can expose it in a consistent manner (avoiding interface skew a bit), and letting you take vastly different arguments depending on the backend without an endlessly growing list of parameters or wrapper/enum types exposed

Comment on lines +100 to +123
"""Create a new PostHog client.

Examples:
Basic usage:
>>> client = Client("your-api-key")

With memory-based feature flag fallback cache:
>>> client = Client(
... "your-api-key",
... flag_fallback_cache_url="memory://local/?ttl=300&size=10000"
... )

With Redis fallback cache for high-scale applications:
>>> client = Client(
... "your-api-key",
... flag_fallback_cache_url="redis://localhost:6379/0/?ttl=300"
... )

With Redis authentication:
>>> client = Client(
... "your-api-key",
... flag_fallback_cache_url="redis://username:password@localhost:6379/0/?ttl=300"
... )
"""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this overkill to include in the docstring? makes it seem like this is the only thing worth passing to the Client, but I didn't know where else to put the docs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine, for now at least. Documenting the behaviour of these args on posthog.com seems like the generally accepted approach

Comment on lines +1420 to +1429
except ImportError:
self.log.warning(
"[FEATURE FLAGS] Redis not available, flag caching disabled"
)
return None
except Exception as e:
self.log.warning(
f"[FEATURE FLAGS] Redis connection failed: {e}, flag caching disabled"
)
return None
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

originally i was falling back to "memory" if they couldn't connect to redis, but that was a bad idea because I don't want to silently create an in-memory cache if they can't initialize a redis connection. Better to just warn and move on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense I think - do what's asked, or not at all.

Copy link
Contributor

@oliverb123 oliverb123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this feels like a good interface to me

Comment on lines +100 to +123
"""Create a new PostHog client.

Examples:
Basic usage:
>>> client = Client("your-api-key")

With memory-based feature flag fallback cache:
>>> client = Client(
... "your-api-key",
... flag_fallback_cache_url="memory://local/?ttl=300&size=10000"
... )

With Redis fallback cache for high-scale applications:
>>> client = Client(
... "your-api-key",
... flag_fallback_cache_url="redis://localhost:6379/0/?ttl=300"
... )

With Redis authentication:
>>> client = Client(
... "your-api-key",
... flag_fallback_cache_url="redis://username:password@localhost:6379/0/?ttl=300"
... )
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine, for now at least. Documenting the behaviour of these args on posthog.com seems like the generally accepted approach

Comment on lines +1420 to +1429
except ImportError:
self.log.warning(
"[FEATURE FLAGS] Redis not available, flag caching disabled"
)
return None
except Exception as e:
self.log.warning(
f"[FEATURE FLAGS] Redis connection failed: {e}, flag caching disabled"
)
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense I think - do what's asked, or not at all.

@github-project-automation github-project-automation bot moved this from In Review to Approved in Feature Flags Jul 4, 2025
@dmarticus dmarticus moved this from Approved to In Review in Feature Flags Jul 7, 2025
@dmarticus dmarticus enabled auto-merge (squash) July 7, 2025 07:13
@dmarticus dmarticus merged commit b965332 into master Jul 7, 2025
7 checks passed
@dmarticus dmarticus deleted the feat/flag-evaluation-cache branch July 7, 2025 07:13
@github-project-automation github-project-automation bot moved this from In Review to Done in Feature Flags Jul 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants