handle migrated kind nodes in diff and merge logic #6401

ajtmccarty · 2025-05-02T23:24:55Z

IFC-1452

TODOs

add test for node migrated and then deleted
add test for two node kind migrations on the same nodes on a branch
add test for inheritance migration

updates to the diff and merge logic to support nodes that have had their kind or namespace updated on a branch
the problematic workflow is

make a branch
update the name or namespace of a schema on the branch. this runs a migration that creates new nodes with the same UUID as existing nodes, but with a new kind
merge the branch

previously, the diff calculation queries completely ignored these duplicated nodes, so they were not part of the diff or the merge logic at all. the only reason no one recognized the problem was that the schema migrations run during the merge operation would create a third node with the same UUID on the default branch. this kind of worked, but the database was in an unexpected state and would only get more unexpected as time continued

the basic solution I've implemented in this PR is to track which nodes are part of a kind migration as we calculate the diff and then merge the migrated-kind nodes using their own special query at the end of the diff merge logic. we cannot keep the migrated-kind nodes completely separate from the regular updates, b/c a node can have both kinds of changes (eg I update a relationship on a node, then I update the node schema's name, then I update an attribute on that node).

codspeed-hq · 2025-05-02T23:29:19Z

CodSpeed Performance Report

Merging #6401 will not alter performance

Comparing ajtm-05022024-diff-kind-migrations (33a6f93) with stable (f16e7c8)

Summary

✅ 10 untouched benchmarks

ajtmccarty · 2025-05-09T21:56:42Z

backend/infrahub/core/diff/calculator.py

@@ -69,6 +70,56 @@ async def _run_diff_calculation_query(
                has_more_data = last_result.get_as_type("has_more_data", bool)
            offset += limit

+    async def _apply_kind_migrated_nodes(


runs a new cypher query to determine which nodes are part of a node kind/inheritance migration and sets is_node_kind_migration = True on all of those nodes
this will be run any time we generate or update a diff and is_node_kind_migration will stay True on a given node across updates once it is set
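
A rough sketch of the batching pattern described above, for readers who don't have the diff open. This is a simplified, synchronous illustration and not the code in this PR: `DiffNode`, `fetch_migrated_batch` and the in-memory list of UUIDs are hypothetical stand-ins for the diff model and the paginated cypher query.

```python
from dataclasses import dataclass


@dataclass
class DiffNode:
    # hypothetical, reduced stand-in for a node in the diff
    uuid: str
    kind: str
    is_node_kind_migration: bool = False


def fetch_migrated_batch(migrated_uuids: list[str], offset: int, limit: int) -> tuple[list[str], bool]:
    """Hypothetical stand-in for one paginated query: returns (batch, has_more_data)."""
    batch = migrated_uuids[offset : offset + limit]
    has_more_data = offset + limit < len(migrated_uuids)
    return batch, has_more_data


def apply_kind_migrated_nodes(nodes: list[DiffNode], migrated_uuids: list[str], limit: int) -> None:
    has_more_data = True
    offset = 0
    while has_more_data:
        batch, has_more_data = fetch_migrated_batch(migrated_uuids, offset, limit)
        batch_set = set(batch)
        for node in nodes:
            if node.uuid in batch_set:
                # the flag is only ever set to True here, never reset to False
                node.is_node_kind_migration = True
        offset += limit
```

Because the flag is only ever switched on, re-running this on a later diff update leaves previously flagged nodes flagged.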

ajtmccarty · 2025-05-09T21:57:49Z

backend/infrahub/core/diff/combiner.py

these are all updates to use NodeIdentifier for uniqueness instead of the node's UUID b/c the UUID will not actually be unique in the case of a node kind/inheritance migration
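
To illustrate why the UUID alone is not enough, here is a minimal sketch. It assumes a reduced `NodeIdentifier` with `uuid`, `kind` and `db_id` fields; the kinds and database IDs are made-up values for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeIdentifier:
    # simplified; frozen so instances are hashable and usable as dict keys
    uuid: str
    kind: str
    db_id: str


# after a kind migration on a branch, two nodes share one UUID (illustrative values)
old = NodeIdentifier(uuid="1234", kind="TestCar", db_id="db-1")
new = NodeIdentifier(uuid="1234", kind="TestVehicle", db_id="db-2")

by_uuid = {old.uuid: "old node", new.uuid: "new node"}
by_identifier = {old: "old node", new: "new node"}

print(len(by_uuid))        # the second entry overwrote the first
print(len(by_identifier))  # both nodes are kept apart
```

Keying on the UUID silently collapses the two nodes into one entry, which is the failure mode the combiner changes are meant to avoid.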

ajtmccarty · 2025-05-09T21:58:26Z

backend/infrahub/core/diff/enricher/hierarchy.py

changes to support using the db_id instead of labels for node uniqueness

ajtmccarty · 2025-05-09T21:59:37Z

backend/infrahub/core/diff/merger/merger.py

@@ -53,6 +59,11 @@ async def merge_graph(self, at: Timestamp) -> EnrichedDiffRoot:
        )
        log.info(f"Diff {latest_diff.uuid} retrieved")
        batch_num = 0
+        migrated_kinds_id_map = {


this map tells the node merge queries which node with a given UUID is the new one and which nodes are part of a node kind migration on the branch

ajtmccarty · 2025-05-09T22:00:20Z

backend/infrahub/core/diff/merger/merger.py

                )
                await merge_properties_query.execute(db=self.db)
            log.info(f"Batch #{batch_num} merged")
            batch_num += 1
+        migrated_kind_uuids = {n.identifier.uuid for n in enriched_diff.nodes if n.is_node_kind_migration}
+        if migrated_kind_uuids:
+            migrated_merge_query = await DiffMergeMigratedKindsQuery.init(


the new specialized query to merge nodes with migrated kinds
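
For context, the set comprehension in the snippet above behaves like this on toy data. `EnrichedDiffNode` and `NodeIdentifier` here are hypothetical, heavily reduced versions of the real classes, and the sketch assumes both the old and the new node of a migration carry the flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeIdentifier:
    uuid: str
    kind: str
    db_id: str


@dataclass
class EnrichedDiffNode:
    # hypothetical, reduced version of the enriched diff node
    identifier: NodeIdentifier
    is_node_kind_migration: bool = False


def collect_migrated_kind_uuids(nodes: list[EnrichedDiffNode]) -> set[str]:
    # same comprehension as in the merger snippet, wrapped in a helper for illustration
    return {n.identifier.uuid for n in nodes if n.is_node_kind_migration}


nodes = [
    EnrichedDiffNode(NodeIdentifier("u1", "TestCar", "db-1"), is_node_kind_migration=True),
    EnrichedDiffNode(NodeIdentifier("u1", "TestVehicle", "db-2"), is_node_kind_migration=True),
    EnrichedDiffNode(NodeIdentifier("u2", "TestPerson", "db-3")),
]
migrated_kind_uuids = collect_migrated_kind_uuids(nodes)
if migrated_kind_uuids:
    # in the PR, this is where DiffMergeMigratedKindsQuery is initialised and executed
    print(sorted(migrated_kind_uuids))
```

Since the result is a set, a UUID shared by the old and new node appears once, and the extra query is skipped entirely when no node is flagged.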

ajtmccarty · 2025-05-09T22:08:14Z

backend/infrahub/core/diff/query/merge.py

        self.add_to_query(query=query)


+class DiffMergeMigratedKindsQuery(Query):


new query to merge nodes with a kind migration
when a node's kind or inheritance is migrated, we create a new node with the same UUID and an updated kind property and/or labels, then add "deleted" edges for all edges linked to the old node and "active" edges to the new node
this query just copies the latest edges linked to any node with a given UUID from the source to the target branch, unless they already exist, in which case they are ignored
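
The copy-unless-present behaviour described above can be sketched in plain Python. This is an in-memory model under stated assumptions, not the cypher in `DiffMergeMigratedKindsQuery`: the `Edge` shape, both helper functions, and the choice of what counts as "already exists" are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    # hypothetical flattened edge: the node it touches, its peer, type, status and time
    node_uuid: str
    node_db_id: str
    peer: str
    edge_type: str
    status: str
    from_time: int


def latest_edges(edges: list[Edge]) -> list[Edge]:
    """Keep only the most recent edge per (node, peer, type)."""
    latest: dict[tuple[str, str, str], Edge] = {}
    for edge in edges:
        key = (edge.node_db_id, edge.peer, edge.edge_type)
        if key not in latest or edge.from_time > latest[key].from_time:
            latest[key] = edge
    return list(latest.values())


def merge_migrated_kind_edges(source: list[Edge], target: list[Edge], uuids: set[str]) -> list[Edge]:
    """Copy the latest source edges for the given UUIDs, ignoring ones already on the target."""
    existing = {(e.node_db_id, e.peer, e.edge_type, e.status) for e in target}
    merged = list(target)
    for edge in latest_edges([e for e in source if e.node_uuid in uuids]):
        key = (edge.node_db_id, edge.peer, edge.edge_type, edge.status)
        if key in existing:
            continue  # already present on the target branch
        merged.append(edge)
        existing.add(key)
    return merged
```

In this model, running the merge a second time against its own output adds nothing, which mirrors the "unless they already exist" clause.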

ajtmccarty · 2025-05-09T22:08:53Z

backend/infrahub/core/diff/query/save.py

changes to save is_node_kind_migration property and uniquely identify nodes using UUID/database ID instead of just UUID

ajtmccarty · 2025-05-09T22:09:34Z

backend/infrahub/core/diff/query_parser.py

-            diff_node.identifier.kind = database_path.node_kind
-            diff_node.db_id = database_path.node_db_id
-            diff_node.from_time = database_path.node_changed_at
-            diff_node.status = database_path.node_status


this was necessary when we were combining diffs for a node with an updated kind into 1, but we don't do that any longer

ajtmccarty · 2025-05-09T22:10:19Z

backend/infrahub/core/query/diff.py

-    AND ALL(
-        r in [r_node, r_prop]
-        WHERE r.from < $to_time AND r.branch = top_diff_rel.branch
-    )


apparently using ALL() is less efficient than just repeating the same condition for each relationship, when that is possible

ajtmccarty · 2025-05-09T22:11:27Z

backend/infrahub/core/query/diff.py

+    has_more_data: bool
+
+
+class DiffMigratedKindNodesQuery(DiffCalculationQuery):


new lightweight (well, pretty lightweight) query to identify which nodes are part of a kind migration

ogenstad

Do we need to run any performance tests against this to see if there are any regressions?

ogenstad · 2025-05-12T11:52:23Z

backend/infrahub/core/diff/calculator.py

+    ) -> None:
+        has_more_data = True
+        offset = 0
+        limit = int(config.SETTINGS.database.query_size_limit)


config.SETTINGS.database.query_size_limit already is an int

removed int() here and in one other place above for the same config parameter

ogenstad · 2025-05-12T11:54:50Z

backend/infrahub/core/diff/calculator.py

+            )
+            log.info(f"Getting one batch of migrated kind nodes {limit=}, {offset=}")
+            await diff_query.execute(db=self.db)
+            log.info("Migrated kind nodes query complete")


It might be confusing to have this show up multiple times as it's just within this iteration, perhaps also include the offset to indicate that the query is complete for a given offset?

I added the {limit=}, {offset=} to the query complete log message here and above in _run_diff_calculation_query
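
For anyone unfamiliar with the `=` specifier in f-strings (Python 3.8+), a quick illustration of what such a log line renders to. The message wording and the values are only examples; the exact text in the merged code may differ.

```python
limit = 5000
offset = 10000

# {name=} expands to "name=<repr of value>"
message = f"Migrated kind nodes query complete {limit=}, {offset=}"
print(message)  # Migrated kind nodes query complete limit=5000, offset=10000
```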

ogenstad · 2025-05-12T12:02:48Z

backend/infrahub/core/diff/model/path.py

+            self.path_to_node.get("branch"),
+            self.path_to_attribute.get("branch"),
+            self.path_to_property.get("branch"),
+        )


When using this property would the order of these tuples matter in any way? I.e. would the caller have to know how the first str differs from the other two?

actually, nothing uses this anymore, so just going to delete it

LucasG0 · 2025-05-12T13:27:09Z

backend/infrahub/core/query/diff.py

+    ORDER BY diff_rel.from ASC
+    WITH collect(diff_rel.status) AS statuses
+    RETURN statuses = ["active", "deleted"] AS intra_branch_update
+}


Could we have a comment explaining the above CALL block?

added a comment
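
As a reading aid, the last three lines of that CALL block are roughly equivalent to the Python below. Only the ordering and the status comparison are modelled; which relationships get matched is decided by the part of the query not shown here, so the dict shape is an assumption.

```python
def is_intra_branch_update(diff_rels: list[dict]) -> bool:
    """True only if the relationships, ordered by `from`, are exactly active then deleted."""
    ordered = sorted(diff_rels, key=lambda rel: rel["from"])
    statuses = [rel["status"] for rel in ordered]
    return statuses == ["active", "deleted"]


created_then_deleted = [
    {"from": "2025-05-02T10:00:00Z", "status": "active"},
    {"from": "2025-05-02T11:00:00Z", "status": "deleted"},
]
only_created = [{"from": "2025-05-02T10:00:00Z", "status": "active"}]
```

The list comparison is order-sensitive, so a "deleted" followed by an "active" does not count.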

ajtmccarty

thanks for the review

ajtmccarty · 2025-05-12T17:53:40Z

backend/infrahub/core/query/diff.py

+    ORDER BY diff_rel.from ASC
+    WITH collect(diff_rel.status) AS statuses
+    RETURN statuses = ["active", "deleted"] AS intra_branch_update
+}


added a comment

ajtmccarty · 2025-05-12T17:54:24Z

backend/infrahub/core/diff/model/path.py

+            self.path_to_node.get("branch"),
+            self.path_to_attribute.get("branch"),
+            self.path_to_property.get("branch"),
+        )


actually, nothing uses this anymore, so just going to delete it

ajtmccarty · 2025-05-12T17:58:26Z

backend/infrahub/core/diff/calculator.py

+            )
+            log.info(f"Getting one batch of migrated kind nodes {limit=}, {offset=}")
+            await diff_query.execute(db=self.db)
+            log.info("Migrated kind nodes query complete")


I added the {limit=}, {offset=} to the query complete log message here and above in _run_diff_calculation_query

ajtmccarty · 2025-05-12T17:59:27Z

backend/infrahub/core/diff/calculator.py

+    ) -> None:
+        has_more_data = True
+        offset = 0
+        limit = int(config.SETTINGS.database.query_size_limit)


removed int() here and in one other place above for the same config parameter

ajtmccarty · 2025-05-12T18:03:11Z

backend/infrahub/core/diff/model/path.py

@@ -87,13 +87,13 @@ class NodeIdentifier:

    uuid: str
    kind: str
-    labels: frozenset[str]
+    db_id: str


I think this should be fine because we don't allow deleting nodes from the database on the default branch and the only way that a node would be deleted on a branch is if the whole branch was deleted, in which case, there would be no use for the diff
this is a good question though. I needed to think about it for a minute

github-actions bot added the group/backend label (Issue related to the backend: API Server, Git Agent) May 2, 2025

Base automatically changed from ajtm-04242025-migrated-kind-diff-bug to stable May 6, 2025 13:38

ajtmccarty force-pushed the ajtm-05022024-diff-kind-migrations branch from 8e81395 to 2e7329d Compare May 6, 2025 13:44

ajtmccarty added 5 commits May 6, 2025 10:54

fix bug in rel create for migrated kind nodes

46cdeb0

handle agnostic rels correctly

a93fafd

WIP on including migrated kinds in diff

36499e1

small changes to diff calculation queries for properties of nodes with migrated kind/inheritance

bf01fdb

expand unit test for migrated kind diff... big time

8ece551

ajtmccarty force-pushed the ajtm-05022024-diff-kind-migrations branch from 2e7329d to 8ece551 Compare May 6, 2025 20:28

ajtmccarty changed the base branch from stable to ajtm-05062025-rel-create-migrated-kind May 6, 2025 20:29

ajtmccarty added 6 commits May 6, 2025 14:29

finish up giant unit test

12f9647

use db_id instead of labels, will need it for merge updates

0ec811c

move is_node_kind_migration outside of existing diff queries

895e989

format

d2bac2d

fix broken tests/query

127ec33

update DiffCombiner with identifiers

e7e8fb8

Base automatically changed from ajtm-05062025-rel-create-migrated-kind to stable May 8, 2025 06:08

ajtmccarty added 2 commits May 7, 2025 23:09

Merge branch 'stable' into ajtm-05022024-diff-kind-migrations

0ce777d

Merge branch 'ajtm-05082025-rel-query-updates' into ajtm-05022024-diff-kind-migrations

eebfb77

ajtmccarty changed the base branch from stable to ajtm-05082025-rel-query-updates May 9, 2025 16:13

ajtmccarty added 5 commits May 9, 2025 09:24

undo some changes in DiffQueryParser that are no longer needed

c8c37a8

comment placement fix

49beff0

test and support for merging branch with migrated kind node

e59d7ea

update unit test for is_node_kind_migration

8d129ba

add changelog

15c67bb

ajtmccarty commented May 9, 2025

remove commented out code

545fa09

ajtmccarty marked this pull request as ready for review May 9, 2025 22:18

ajtmccarty requested a review from a team as a code owner May 9, 2025 22:18

ajtmccarty changed the title ~~WIP on including migrated kinds in diff~~ handle migrated kind nodes in diff and merge logic May 9, 2025

handle migrated node that is later deleted

4638dea

ogenstad approved these changes May 12, 2025

LucasG0 reviewed May 12, 2025

LucasG0 approved these changes May 12, 2025

ajtmccarty added 3 commits May 12, 2025 10:52

add cypher comment

fe791b8

remove unused property

9e63513

update comment, remove unnecessary int() casting

e88d946

Base automatically changed from ajtm-05082025-rel-query-updates to stable May 12, 2025 20:47

ajtmccarty added 2 commits May 12, 2025 13:48

Merge branch 'stable' into ajtm-05022024-diff-kind-migrations

5a215ec

linting

b95380e

ajtmccarty commented May 12, 2025

ajtmccarty added 3 commits May 12, 2025 15:54

one more test

1688c1f

add migration to delete diffs b/c they must be recalculated

3edc685

actually include the migration

33a6f93

ajtmccarty merged commit 4afed39 into stable May 12, 2025
55 of 56 checks passed

ajtmccarty deleted the ajtm-05022024-diff-kind-migrations branch May 12, 2025 23:41
