
Conversation

@csowden-atlassian csowden-atlassian commented Jun 26, 2025

Motivation

Our build cache system was invalidating more often than necessary due to changes in options that don't actually affect build output. For example:

  • Changes to instanceId would invalidate the cache even though it doesn't affect builds
  • Reading Object.keys() on option objects would trigger invalidation
  • Changes to nested feature flag values would invalidate entire feature flag object
  • No way to blocklist non-impactful options from triggering invalidation

This resulted in unnecessary rebuilds and reduced development performance, especially in watch mode where options might change frequently.

Changes

Implemented a more granular option invalidation system with the following improvements:

  1. Path-based tracking: Track exact paths of accessed options (e.g., ['featureFlags', 'granularOptionInvalidation']) instead of just top-level keys
    // Before: Invalidates all of featureFlags
    options.featureFlags.someFlag
    // After: Only invalidates specific flag
    invalidateOnOptionChange(['featureFlags', 'someFlag'])
  2. Option blocklist: Added configuration to prevent certain options from triggering invalidation:
optionInvalidation: {
  blocklist: ['instanceId', 'metrics.*'],
  trackMetrics: true
}
  3. Smarter enumeration handling: Skip invalidation for Object.keys() operations when they don't affect build output
  4. Invalidation metrics: Added detailed tracking of which options cause the most invalidations
  5. Feature flag controlled: All changes are behind the granularOptionInvalidation flag for safe rollout
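
The path-based tracking in point 1 can be sketched with a Proxy that records the full path of each leaf read. This is a minimal illustration only; the names (trackOptionAccess, onAccess) are assumptions, not the actual Atlaspack API.

```javascript
// Sketch: wrap an options object so every leaf read reports its full path,
// e.g. ['featureFlags', 'someFlag'] rather than just 'featureFlags'.
function trackOptionAccess(options, onAccess, path = []) {
  return new Proxy(options, {
    get(target, prop, receiver) {
      const value = Reflect.get(target, prop, receiver);
      const nextPath = [...path, String(prop)];
      if (value !== null && typeof value === 'object') {
        // Recurse so nested reads record their complete path.
        return trackOptionAccess(value, onAccess, nextPath);
      }
      onAccess(nextPath);
      return value;
    },
  });
}

// Usage: reading options.featureFlags.someFlag records the exact path.
const accessed = [];
const options = trackOptionAccess(
  {featureFlags: {someFlag: true}, instanceId: 'abc'},
  (p) => accessed.push(p.join('.')),
);
options.featureFlags.someFlag;
```

With this shape, a change to instanceId never matches a recorded path unless instanceId was actually read as a leaf.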

The changes maintain backward compatibility while significantly reducing unnecessary cache invalidations.

Results

Initial testing shows:

  • Reduced cache invalidations by preventing non-impactful options from triggering rebuilds
  • More precise invalidation of nested options
  • Better visibility into what causes cache invalidations through metrics

Checklist

  • Extensive test suite added covering all new functionality
  • Added integration tests for cache behavior
  • Added documentation for option invalidation configuration
  • Maintained backward compatibility
  • Feature flag controlled for safe rollout

changeset-bot bot commented Jun 26, 2025

🦋 Changeset detected

Latest commit: e835b22

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 91 packages
Name Type
@atlaspack/types Minor
@atlaspack/core Minor
@atlaspack/bundler-experimental Patch
atlaspack Patch
@atlaspack/package-manager Patch
@atlaspack/plugin Patch
@atlaspack/bundle-stats Patch
@atlaspack/optimizer-inline-requires Patch
@atlaspack/packager-html Patch
@atlaspack/packager-js Patch
@atlaspack/packager-svg Patch
@atlaspack/packager-xml Patch
@atlaspack/reporter-bundle-analyzer Patch
@atlaspack/reporter-bundle-stats Patch
@atlaspack/reporter-cli Patch
@atlaspack/reporter-dev-server-sw Patch
@atlaspack/reporter-dev-server Patch
@atlaspack/reporter-json Patch
@atlaspack/reporter-sourcemap-visualiser Patch
@atlaspack/transformer-babel Patch
@atlaspack/transformer-jsonld Patch
@atlaspack/validator-typescript Patch
@atlaspack/cli Patch
@atlaspack/register Patch
@atlaspack/test-utils Patch
@atlaspack/query Patch
@atlaspack/transformer-html Patch
@atlaspack/bundler-default Patch
@atlaspack/bundler-library Patch
@atlaspack/compressor-brotli Patch
@atlaspack/compressor-gzip Patch
@atlaspack/compressor-raw Patch
@atlaspack/namer-default Patch
@atlaspack/optimizer-blob-url Patch
@atlaspack/optimizer-css Patch
@atlaspack/optimizer-cssnano Patch
@atlaspack/optimizer-data-url Patch
@atlaspack/optimizer-htmlnano Patch
@atlaspack/optimizer-image Patch
@atlaspack/optimizer-svgo Patch
@atlaspack/optimizer-swc Patch
@atlaspack/optimizer-terser Patch
@atlaspack/packager-css Patch
@atlaspack/packager-raw-url Patch
@atlaspack/packager-raw Patch
@atlaspack/packager-ts Patch
@atlaspack/packager-wasm Patch
@atlaspack/packager-webextension Patch
@atlaspack/reporter-build-metrics Patch
@atlaspack/reporter-bundle-buddy Patch
@atlaspack/reporter-conditional-manifest Patch
@atlaspack/reporter-lsp Patch
@atlaspack/reporter-tracer Patch
@atlaspack/resolver-default Patch
@atlaspack/resolver-glob Patch
@atlaspack/resolver-repl-runtimes Patch
@atlaspack/runtime-browser-hmr Patch
@atlaspack/runtime-js Patch
@atlaspack/runtime-react-refresh Patch
@atlaspack/runtime-service-worker Patch
@atlaspack/runtime-webextension Patch
@atlaspack/transformer-css Patch
@atlaspack/transformer-glsl Patch
@atlaspack/transformer-graphql Patch
@atlaspack/transformer-image Patch
@atlaspack/transformer-inline-string Patch
@atlaspack/transformer-inline Patch
@atlaspack/transformer-js Patch
@atlaspack/transformer-json Patch
@atlaspack/transformer-less Patch
@atlaspack/transformer-mdx Patch
@atlaspack/transformer-postcss Patch
@atlaspack/transformer-posthtml Patch
@atlaspack/transformer-pug Patch
@atlaspack/transformer-raw Patch
@atlaspack/transformer-react-refresh-wrap Patch
@atlaspack/transformer-sass Patch
@atlaspack/transformer-svg-react Patch
@atlaspack/transformer-svg Patch
@atlaspack/transformer-toml Patch
@atlaspack/transformer-typescript-tsc Patch
@atlaspack/transformer-typescript-types Patch
@atlaspack/transformer-webextension Patch
@atlaspack/transformer-webmanifest Patch
@atlaspack/transformer-worklet Patch
@atlaspack/transformer-xml Patch
@atlaspack/transformer-yaml Patch
@atlaspack/validator-eslint Patch
@atlaspack/config-default Patch
@atlaspack/config-repl Patch
@atlaspack/config-webextension Patch

return String(value);
}

if (typeof value === 'object') {
Author

Instead of hashing every element (which is expensive for large collections), it creates a representative hash
using only the array's length and three sample elements (first, middle, last). This provides a reliable hash
signature that changes when significant array properties change, while avoiding the performance penalty of
processing every element.
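
The sampling scheme described here can be sketched as follows. This is illustrative only: hashString stands in for the real hash function, and as the replies below note, the approach was ultimately dropped.

```javascript
// Simple stand-in string hash (not the hash used in the codebase).
function hashString(s) {
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    h = (h * 31 + s.charCodeAt(i)) | 0;
  }
  return h.toString(16);
}

// Representative array hash: length plus first, middle, and last elements.
function sampleHashArray(arr) {
  if (arr.length === 0) return hashString('empty');
  const first = arr[0];
  const middle = arr[Math.floor(arr.length / 2)];
  const last = arr[arr.length - 1];
  return hashString([arr.length, first, middle, last].join('|'));
}
```

Note the trade-off: two arrays that differ only in an unsampled element produce the same hash, which is exactly the weakness raised in the review.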

Contributor

How does that work when an element that's not in the hash changes that doesn't affect the length? Are these arrays of options from config getting passed here? Do we actually have any large collections in practice?

We hash multi-megabyte bundles, I'd argue that hashing a string with hundreds of options concatenated can't be too expensive.

Author

How does that work when an element that's not in the hash changes that doesn't affect the length

Great point, TBH this didn't cross my mind at the time, but it seems quite obvious now you mention it 😬.

Since we don't really have large collections in practice either, I'll opt to remove this 😄

Contributor

I also think we should hash everything

return invalidatedKeys;
}

// New behavior with granular path tracking and blocklist
Author

The option invalidation blocklist allows users to specify configuration options that shouldn't trigger rebuilds when changed. This improves performance by preventing unnecessary invalidations for options that rarely affect build output.

The implementation:

  1. Uses a default blocklist for options known to be non-impactful (e.g., 'instanceId')
  2. Supports user-configured blocklists through options.optionInvalidation.blocklist
  3. Supports prefix-based blocking with wildcard notation (e.g., 'metrics.*')
  4. Optionally tracks skipped invalidations for reporting and diagnostics
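
The prefix/wildcard matching described above could be sketched like this. The function name and shape are assumptions for illustration, not the PR's actual implementation.

```javascript
// Check whether an option path (string or array form) is blocklisted.
// Entries ending in '.*' block any path under that prefix.
function isBlocklisted(optionPath, blocklist) {
  const key = Array.isArray(optionPath) ? optionPath.join('.') : optionPath;
  return blocklist.some((entry) => {
    if (entry.endsWith('.*')) {
      // 'metrics.*' matches 'metrics.sampleRate' and any deeper path;
      // slice keeps the trailing '.' so 'metricsFoo' is not matched.
      return key.startsWith(entry.slice(0, -1));
    }
    return key === entry;
  });
}
```

One design note on this sketch: 'metrics.*' does not match the bare key 'metrics' itself, only paths beneath it.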

@csowden-atlassian csowden-atlassian changed the title Csowden/improve granular option invalidation Improve option invalidation granularity and reduce unnecessary cache invalidations Jun 27, 2025
@csowden-atlassian csowden-atlassian marked this pull request as ready for review June 27, 2025 06:04
@csowden-atlassian csowden-atlassian requested a review from a team as a code owner June 27, 2025 06:04
Contributor

@marcins marcins left a comment

Nice one! Some minor feedback, but mostly looks good! (TBF I didn't go too deep on the hundreds of lines of new tests... but appreciate they're there!)

// Check if we should track object enumeration operations
// This is controlled by a feature flag - the previous behavior was to track these
// but it can cause over-invalidation
const granularOptionInvalidationEnabled = getFeatureFlag(
Contributor

Unnecessary to assign this to a variable? (i.e. just stick the getFeatureFlag call inside the if - makes it easier to clean up)

assert.ok(onRead.calledOnce);
});

it('handles Object.keys() operations based on the granularOptionInvalidation feature flag', () => {
Contributor

Should the test instead run with granularOptionInvalidation set to true and again set to false? (FWIW with Mocha I've been doing this lately for some tests:)

// intentionally verbose so it's obvious rather than just `[true, false].forEach`
[{myFlag: true}, {myFlag: false}].forEach(({myFlag}) => {
  it(`does something (myFlag: ${myFlag.toString()})`, () => {
    // set the feature flag, whether in package.json or via `setFeatureFlags`
  });
});

Example:

await inputFS.mkdirp(inputDir);
});

async function createSimpleProject() {
Contributor

You can use fsFixture for inline test code in integration tests, which makes this kind of setup easier to reason about. Just search the codebase, plenty of examples. It avoids the need for physical fixture files as well.

value: mixed,
): RequestGraphNode => {
// Normalize option to string format for node ID
const optionKey = Array.isArray(option) ? option.join('.') : option;
Contributor

@yamadapc yamadapc Jun 29, 2025

I think we should JSON.stringify this to handle keys with . etc.

Author

specific commit

Updated to serialise array-based option paths using JSON.stringify(option) instead of joining with dots, which properly handles keys containing dots, special characters, etc.

I've used an array: prefix to distinguish string-based and array-based keys

e.g.

// Without prefix - ambiguous cases:
JSON.stringify(['a.b']) // → '["a.b"]'
'["a.b"]'               // → could be a string key or a serialized array

// With prefix - unambiguous:
`array:${JSON.stringify(['a.b'])}` // → 'array:["a.b"]' (clearly an array)
'["a.b"]'                          // → clearly a plain string key
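
Put together, the prefixed serialisation might look like the following sketch (makeOptionKey is an illustrative name, not necessarily the function in the PR):

```javascript
// Serialise an option path to a node-ID key. Array paths get an 'array:'
// prefix so they can never collide with a literal string key.
function makeOptionKey(option) {
  return Array.isArray(option)
    ? `array:${JSON.stringify(option)}`
    : option;
}
```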

* Other reads work normally.
* NOTE: By default, we DO track Object.keys()/ownKeys operations for backward compatibility,
* but this can be disabled with the 'granularOptionInvalidation' feature flag. When disabled,
* property enumeration won't trigger cache invalidation, which prevents unnecessary invalidations
Contributor

@yamadapc yamadapc Jun 29, 2025

Enumerating a field should trigger invalidations when keys are added/removed.

Because output from code listing a property might change if there are new items or keys.

Is there code that was triggering this? Maybe we can try to rework it to avoid invalidation.

Author

specific commit - was a bit of a tricky one to figure out, but I'm fairly confident this addresses the root cause
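
One possible way to reconcile both concerns (an assumption for illustration, not necessarily the fix in this PR): record a signature of the key set in the ownKeys trap, so invalidation can fire when keys are added or removed but not when only values change.

```javascript
// Sketch: report a sorted key-set signature whenever the object is enumerated.
// A cache could invalidate only when this signature changes between builds.
function trackKeyEnumeration(obj, onKeysRead) {
  return new Proxy(obj, {
    ownKeys(target) {
      const keys = Reflect.ownKeys(target);
      onKeysRead(keys.map(String).sort().join(','));
      return keys;
    },
  });
}

// Usage: Object.keys() records the key-set signature.
const signatures = [];
const tracked = trackKeyEnumeration({a: 1, b: 2}, (sig) => signatures.push(sig));
Object.keys(tracked);
```

Changing a value leaves the signature untouched, while adding or removing a key changes it, which matches the reviewer's point that enumeration output depends on the key set rather than the values.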

// Maximum number of parent nodes to invalidate per option change
maxInvalidationsPerOption?: number,
// Batch option invalidations together
batchInvalidations?: boolean,
Contributor

What do this and the maxInvalidationsPerOption keys do?

Author

This isn't actually used - just needed cleaning up
