-
Notifications
You must be signed in to change notification settings - Fork 25
patching command framework infrahub db patch plan/apply/revert
#6311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
CodSpeed Performance ReportMerging #6311 will not alter performanceComparing Summary
|
603e562
to
7e66f61
Compare
infrahub db patch plan/apply/revert
backend/infrahub/patch/edge_adder.py
Outdated
for e_to_add in edges_to_add: | ||
all_prop_keys |= set(e_to_add.after_props.keys()) | ||
|
||
cypher_variable_map = "{" + ",".join([f"{p}: edge_to_add.after_props.{p}" for p in all_prop_keys]) + "}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is necessary so that we can correctly use MERGE
in the cypher query below to avoid adding duplicate edges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a comment in the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
upon further review, this workaround is no longer necessary now that the Adder
components use CREATE
instead of MERGE
in their cypher queries. So I've deleted it from here and the PatchPlanVertexAdder
abstract_id = result.get("abstract_id") | ||
concrete_id = result.get("db_id") | ||
abstract_to_concrete_id_map[abstract_id] = concrete_id | ||
return abstract_to_concrete_id_map |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this dictionary is to track the database IDs of edges added as part of this patch so that we know which edges to remove if the patch is reverted and can track which edges have been added already if the patch fails partway through
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment could be part of the function docstring contains so the reader knows what/why dict[str, str]
is returned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good suggestion. Added docstrings for PatchPlanEdgeAdder.execute
and PatchPlanVertexAdder.execute
backend/infrahub/patch/models.py
Outdated
edges_to_add: list[EdgeToAdd] = field(default_factory=list) | ||
edges_to_update: list[EdgeToUpdate] = field(default_factory=list) | ||
edges_to_delete: list[EdgeToDelete] = field(default_factory=list) | ||
added_node_db_id_map: dict[str, str] = field(default_factory=dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a dictionary for tracking the database IDs of added vertices/edges in case the patch is reverted and the entities must be deleted
from ..models import PatchPlan | ||
|
||
|
||
class PatchQuery(ABC): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the class that we must implement when creating a new patch. The actual work will be in the plan
method that must return a PatchPlan
instance to record exactly what changes to make and how to reverse them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put this class here b/c it seems like the kind of thing that might be useful for other tests
backend/tests/db_snapshot.py
Outdated
db_id: str | ||
from_db_id: str | ||
to_db_id: str | ||
edge_type: set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should edge_type
be str
here? It makes me wonder, is mypy enabled on these test files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typing is disabled on all of the other directories under backend.tests
, but not this file, so I'm not sure why mypy didn't catch it. I tried deleting all the [[tool.mypy.overrides]]
entries for backend.tests
paths and running mypy backend/tests/db_snapshot.py
still didn't pick anything up. either way, I fixed this type hint. good catch
await patch_runner.revert(patch_plan_directory=patch_plan_dir) | ||
rprint(f"{SUCCESS_BADGE} Patch plan successfully reverted") | ||
|
||
await db.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be nice to have a plan_and_apply_patch
command to make it simpler to use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good suggestion. I added an optional --apply
flag to the infrahub db patch plan
command, which seems more efficient than adding another command that combines them
backend/infrahub/patch/models.py
Outdated
@dataclass | ||
class VertexToAdd: | ||
labels: list[str] | ||
after_props: dict[str, str | int | bool | None] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe an alias like PropertyType
could be used instead of multiple str | int | bool | None
. I am wondering if float
should be there too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made this change and included float
backend/infrahub/cli/patch.py
Outdated
plan_reader = PatchPlanReader() | ||
return PatchRunner( | ||
plan_writer=plan_writer, | ||
plan_reader=plan_reader, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really an issue but it would feel more consistent to not assign the plan_writer and plan_reader variables before the return as we don't use them and don't assign anything else in this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point. made this change
backend/infrahub/cli/patch.py
Outdated
for _, cls in inspect.getmembers(patch_module, inspect.isclass): | ||
if issubclass(cls, PatchQuery) and cls is not PatchQuery: | ||
patch_query_class = cls | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this always be from files that we control? De we need to raise an error if there are two classes inheriting from PatchQuery
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point. added handling for this case and improved the error message
logging.getLogger("infrahub").setLevel(logging.WARNING) | ||
logging.getLogger("neo4j").setLevel(logging.ERROR) | ||
logging.getLogger("prefect").setLevel(logging.ERROR) | ||
config.load_and_exit(config_file_name=config_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps only highlighted by this PR but we have this same setup in a number of the cli commands, might be that we should use a function for these lines or look at what we're doing in infrahubctl with CONFIG_PARAM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that sound like a good idea. I'm not going to do it right now
backend/infrahub/patch/edge_adder.py
Outdated
for result in results: | ||
abstract_id = result.get("abstract_id") | ||
concrete_id = result.get("db_id") | ||
abstract_to_concrete_id_map[abstract_id] = concrete_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of .get() here will it always return a value or can abstract_id ever be None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should always have a value b/c the EdgeToAdd
object has a required identifier
field that uses a string UUID by default
async def execute(self, edges_to_delete: list[EdgeToDelete]) -> AsyncGenerator[set[str], None]: | ||
for i in range(0, len(edges_to_delete), self.batch_size_limit): | ||
edges_slice = edges_to_delete[i : i + self.batch_size_limit] | ||
ids_to_delete = [e.db_id for e in edges_slice] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to avoid creating the extra list and just have:
ids_to_delete = [e.db_id for e in edges_to_delete[i : i + self.batch_size_limit]]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point. made the change here and in PatchPlanVertexDeleter
"cypher_variable_map": cypher_variable_map, | ||
"id_func_name": self.db.get_id_function_name(), | ||
} | ||
results = await self.db.execute_query( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this query is executed with 100 vertices to add, if for some reason it fails while only 60 vertices were created, would this function raise an error, implying these 60 created vertices ids would not later be added to the list of created vertices? If so, running this query in a transaction that would be rollback in case of failure would fix this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very good question. thanks for reading through and really understanding what is going on here. I think that you are right and a transaction will be helpful here, so I've added it (and in the same place in PatchPlanEdgeAdder
). I think there is still a possible failure point where the data written to the database will not match what we write to the file (say if the commit part of the transaction fails), but that failure space is much smaller now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the transaction commit fails, my understanding is that an error would be raised so nothing would be written to the file?
backend/infrahub/patch/runner.py
Outdated
await self._revert_added_edges(patch_plan=patch_plan, patch_plan_directory=patch_plan_directory) | ||
await self._revert_added_vertices(patch_plan=patch_plan, patch_plan_directory=patch_plan_directory) | ||
vertices_to_update = [] | ||
for vertex_update_to_revert in patch_plan.vertices_to_update: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
list comprehension could be used here, and on similar patterns below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good suggestion. I've made the change and it is more readable this way
Nice one! |
new patching related commands and framework that allow applying and reverting patches in an idempotent manner. these are the new commands
infrahub db patch plan
takes in a python-style path (infrahub.patches.this_patch
) and writes a "patch plan" to the given directory (defaultinfrahub-patches
). the plan is a series of files that just save the exact changes to be made as JSONinfrahub db patch apply
takes in a path to a directory where a patch plan exists and runs the patch plan against the database. the patch plan is run using idempotent code that ensures no duplicate nodes/edges are created. this command also updates some of the files in the patch plan directory with the IDs of created/deleted nodes so that they can be used in a revert and so that we know what has been added/deleted in case the command fails partway throughinfrahub db patch revert
takes in a path to a directory where a patch plan exists and reverts the database to its pre-patch state. this code is also idempotent. this command updates the same files as theapply
command, removing the previously added/deleted elements as they are un-added and un-deletedfollow-up PR(s) will have code for actual patches for removing duplicated edges and nodes
PatchQuery
class from the next PR