Skip to content

patching command framework infrahub db patch plan/apply/revert #6311

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Apr 24, 2025

Conversation

ajtmccarty
Copy link
Contributor

@ajtmccarty ajtmccarty commented Apr 16, 2025

new patching related commands and framework that allow applying and reverting patches in an idempotent manner. these are the new commands

  • infrahub db patch plan takes in a python-style path (infrahub.patches.this_patch) and writes a "patch plan" to the given directory (default infrahub-patches). the plan is a series of files that just save the exact changes to be made as JSON
  • infrahub db patch apply takes in a path to a directory where a patch plan exists and runs the patch plan against the database. the patch plan is run using idempotent code that ensures no duplicate nodes/edges are created. this command also updates some of the files in the patch plan directory with the IDs of created/deleted nodes so that they can be used in a revert and so that we know what has been added/deleted in case the command fails partway through
  • infrahub db patch revert takes in a path to a directory where a patch plan exists and reverts the database to its pre-patch state. this code is also idempotent. this command updates the same files as the apply command, removing the previously added/deleted elements as they are un-added and un-deleted

follow-up PR(s) will have code for actual patches for removing duplicated edges and nodes

@github-actions github-actions bot added the group/backend Issue related to the backend (API Server, Git Agent) label Apr 16, 2025
Copy link

codspeed-hq bot commented Apr 18, 2025

CodSpeed Performance Report

Merging #6311 will not alter performance

Comparing ajtm-04152025-duplication-patch (6261a39) with stable (a987255)

Summary

✅ 10 untouched benchmarks

@ajtmccarty ajtmccarty force-pushed the ajtm-04152025-duplication-patch branch from 603e562 to 7e66f61 Compare April 21, 2025 20:32
@github-actions github-actions bot added the type/documentation Improvements or additions to documentation label Apr 21, 2025
@ajtmccarty ajtmccarty changed the title WIP patch for removing dup kind-migrated nodes following a merge patching command framework infrahub db patch plan/apply/revert Apr 21, 2025
for e_to_add in edges_to_add:
all_prop_keys |= set(e_to_add.after_props.keys())

cypher_variable_map = "{" + ",".join([f"{p}: edge_to_add.after_props.{p}" for p in all_prop_keys]) + "}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is necessary so that we can correctly use MERGE in the cypher query below to avoid adding duplicate edges.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a comment in the code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

upon further review, this workaround is no longer necessary now that the Adder components use CREATE instead of MERGE in their cypher queries. So I've deleted it from here and the PatchPlanVertexAdder

abstract_id = result.get("abstract_id")
concrete_id = result.get("db_id")
abstract_to_concrete_id_map[abstract_id] = concrete_id
return abstract_to_concrete_id_map
Copy link
Contributor Author

@ajtmccarty ajtmccarty Apr 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this dictionary is to track the database IDs of edges added as part of this patch so that we know which edges to remove if the patch is reverted and can track which edges have been added already if the patch fails partway through

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment could be part of the function docstring contains so the reader knows what/why dict[str, str] is returned

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good suggestion. Added docstrings for PatchPlanEdgeAdder.execute and PatchPlanVertexAdder.execute

edges_to_add: list[EdgeToAdd] = field(default_factory=list)
edges_to_update: list[EdgeToUpdate] = field(default_factory=list)
edges_to_delete: list[EdgeToDelete] = field(default_factory=list)
added_node_db_id_map: dict[str, str] = field(default_factory=dict)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a dictionary for tracking the database IDs of added vertices/edges in case the patch is reverted and the entities must be deleted

from ..models import PatchPlan


class PatchQuery(ABC):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the class that we must implement when creating a new patch. The actual work will be in the plan method that must return a PatchPlan instance to record exactly what changes to make and how to reverse them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put this class here b/c it seems like the kind of thing that might be useful for other tests

@ajtmccarty ajtmccarty marked this pull request as ready for review April 21, 2025 22:16
@ajtmccarty ajtmccarty requested review from a team as code owners April 21, 2025 22:16
@ajtmccarty ajtmccarty marked this pull request as draft April 22, 2025 00:21
@ajtmccarty ajtmccarty marked this pull request as ready for review April 22, 2025 23:02
db_id: str
from_db_id: str
to_db_id: str
edge_type: set
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should edge_type be str here? It makes me wonder, is mypy enabled on these test files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typing is disabled on all of the other directories under backend.tests, but not this file, so I'm not sure why mypy didn't catch it. I tried deleting all the [[tool.mypy.overrides]] entries for backend.tests paths and running mypy backend/tests/db_snapshot.py still didn't pick anything up. either way, I fixed this type hint. good catch

await patch_runner.revert(patch_plan_directory=patch_plan_dir)
rprint(f"{SUCCESS_BADGE} Patch plan successfully reverted")

await db.close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be nice to have a plan_and_apply_patch command to make it simpler to use?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good suggestion. I added an optional --apply flag to the infrahub db patch plan command, which seems more efficient than adding another command that combines them

@dataclass
class VertexToAdd:
labels: list[str]
after_props: dict[str, str | int | bool | None]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe an alias like PropertyType could be used instead of multiple str | int | bool | None. I am wondering if float should be there too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made this change and included float

plan_reader = PatchPlanReader()
return PatchRunner(
plan_writer=plan_writer,
plan_reader=plan_reader,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really an issue but it would feel more consistent to not assign the plan_writer and plan_reader variables before the return as we don't use them and don't assign anything else in this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. made this change

for _, cls in inspect.getmembers(patch_module, inspect.isclass):
if issubclass(cls, PatchQuery) and cls is not PatchQuery:
patch_query_class = cls
break
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this always be from files that we control? De we need to raise an error if there are two classes inheriting from PatchQuery?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. added handling for this case and improved the error message

logging.getLogger("infrahub").setLevel(logging.WARNING)
logging.getLogger("neo4j").setLevel(logging.ERROR)
logging.getLogger("prefect").setLevel(logging.ERROR)
config.load_and_exit(config_file_name=config_file)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps only highlighted by this PR but we have this same setup in a number of the cli commands, might be that we should use a function for these lines or look at what we're doing in infrahubctl with CONFIG_PARAM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that sound like a good idea. I'm not going to do it right now

for result in results:
abstract_id = result.get("abstract_id")
concrete_id = result.get("db_id")
abstract_to_concrete_id_map[abstract_id] = concrete_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of .get() here will it always return a value or can abstract_id ever be None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should always have a value b/c the EdgeToAdd object has a required identifier field that uses a string UUID by default

async def execute(self, edges_to_delete: list[EdgeToDelete]) -> AsyncGenerator[set[str], None]:
for i in range(0, len(edges_to_delete), self.batch_size_limit):
edges_slice = edges_to_delete[i : i + self.batch_size_limit]
ids_to_delete = [e.db_id for e in edges_slice]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to avoid creating the extra list and just have:

ids_to_delete = [e.db_id for e in edges_to_delete[i : i + self.batch_size_limit]]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. made the change here and in PatchPlanVertexDeleter

"cypher_variable_map": cypher_variable_map,
"id_func_name": self.db.get_id_function_name(),
}
results = await self.db.execute_query(
Copy link
Contributor

@LucasG0 LucasG0 Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this query is executed with 100 vertices to add, if for some reason it fails while only 60 vertices were created, would this function raise an error, implying these 60 created vertices ids would not later be added to the list of created vertices? If so, running this query in a transaction that would be rollback in case of failure would fix this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very good question. thanks for reading through and really understanding what is going on here. I think that you are right and a transaction will be helpful here, so I've added it (and in the same place in PatchPlanEdgeAdder). I think there is still a possible failure point where the data written to the database will not match what we write to the file (say if the commit part of the transaction fails), but that failure space is much smaller now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the transaction commit fails, my understanding is that an error would be raised so nothing would be written to the file?

await self._revert_added_edges(patch_plan=patch_plan, patch_plan_directory=patch_plan_directory)
await self._revert_added_vertices(patch_plan=patch_plan, patch_plan_directory=patch_plan_directory)
vertices_to_update = []
for vertex_update_to_revert in patch_plan.vertices_to_update:
Copy link
Contributor

@LucasG0 LucasG0 Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list comprehension could be used here, and on similar patterns below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good suggestion. I've made the change and it is more readable this way

@LucasG0
Copy link
Contributor

LucasG0 commented Apr 24, 2025

Nice one!

@ajtmccarty ajtmccarty merged commit debb8a9 into stable Apr 24, 2025
35 checks passed
@ajtmccarty ajtmccarty deleted the ajtm-04152025-duplication-patch branch April 24, 2025 14:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
group/backend Issue related to the backend (API Server, Git Agent) type/documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants