Add affected, inserted, updated, deleted row to DatabricksAdapterResponse #883
base: main
Conversation
Commit: Add affected, inserted, updated, deleted row to DatabricksAdapterResponse (Signed-off-by: Malo Jaffré <[email protected]>)
By the way, a much cleaner (and potentially more efficient) approach than the caching of `fetchone` would be … Likewise, I don't believe this would work if the table is not Delta. It would not fail either, but it would be nice to avoid trying. I found no way in …
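(As an aside on that Delta guard idea, a minimal sketch assuming a plain DB-API cursor and a hypothetical `relation_name`; in Databricks SQL, `DESCRIBE DETAIL` reports a `format` column that is `delta` for Delta tables.)

```python
# Hedged sketch, not the PR's code: probe the table format before
# attempting to read DML statistics, so non-Delta tables are skipped.
def is_delta_table(cursor, relation_name: str) -> bool:
    cursor.execute(f"DESCRIBE DETAIL {relation_name}")
    row = cursor.fetchone()
    if row is None:
        return False
    columns = [col[0] for col in cursor.description]
    # DESCRIBE DETAIL exposes a `format` column ("delta" for Delta tables).
    return dict(zip(columns, row)).get("format") == "delta"
```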
```python
return self._cursor.fetchall()

def fetchone(self) -> Optional[tuple]:

def query_impact(self) -> DatabricksQueryImpact:
```
A little nit on the name: maybe you can call it `get_query_statistics`. I'm not a native English speaker, but in other query engines the term "statistics" is used for this type of information.
If you like this name more, you should consider replacing "impact" with "statistics" everywhere for consistency.
Another great issue to tackle, thanks.
I like the feature. Can we add some unit tests?
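(For illustration, a self-contained sketch of what such a unit test could look like; `QueryStatistics` and `parse_statistics` are hypothetical stand-ins, not the PR's actual names.)

```python
from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class QueryStatistics:  # hypothetical stand-in for the PR's dataclass
    num_affected_rows: int = 0
    num_updated_rows: int = 0
    num_deleted_rows: int = 0
    num_inserted_rows: int = 0

def parse_statistics(row: Dict[str, Any]) -> QueryStatistics:
    # Keep only the known counter columns; ignore anything unexpected.
    known = QueryStatistics.__dataclass_fields__
    return QueryStatistics(**{k: int(v) for k, v in row.items() if k in known})

def test_parse_merge_statistics() -> None:
    row = {"num_affected_rows": 3, "num_updated_rows": 2,
           "num_deleted_rows": 0, "num_inserted_rows": 1}
    stats = parse_statistics(row)
    assert stats.num_affected_rows == 3
    assert stats.num_inserted_rows == 1
```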
```python
@dataclass
class DatabricksQueryImpact:
```
As mentioned by the other commenter, "statistics" is used elsewhere for a similar concept, so it could be used here.
```python
def fetchone(self) -> Optional[Row]:
    if self._cache_fetchone:
        # If `fetchone` result was cached by `query_metadata`, return it and invalidate it
        row = self._cache_fetchone
```
Why do we invalidate the cache, but in the callsite reassign the cache to the same value?
Feels like you would just want:

```python
def fetchone(self) -> Optional[Row]:
    if not self._cache_fetchone:
        self._cache_fetchone = self._cursor.fetchone()
    return self._cache_fetchone
```
@ghjklw I'll be on vacation the next two weeks. Just wanted to give you a heads up that I still want your two PRs, even though you won't see any activity from me until the new year.
Thanks for the heads up and no worries! That will give me some time to polish these 😊 If you have a couple of minutes, just to circle back to my message above, do you know if there's any chance within …
I don't think there is any metadata that conveys this client-side, short of inspecting the SQL. Do you have any idea of the overhead of performing this for every operation? If it's small enough, we can ignore it; if it's large, we probably want to give a config hook to opt into the behavior, since I don't think most users care about the data, though for some scenarios it is very nice to have. It might be possible to get the information from the cursor, but the type information in the databricks python sql connector (https://github.com/databricks/databricks-sql-python) is a little sparse.
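(To make the opt-in concrete, a sketch only: `fetch_statistics` is a hypothetical flag, not an existing dbt-databricks setting.)

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Credentials:  # hypothetical subset of the adapter's credentials
    fetch_statistics: bool = False  # users opt in explicitly

def maybe_read_statistics(cursor, creds: Credentials) -> Optional[Dict[str, int]]:
    # Default path: no extra fetch, so users who don't care pay nothing.
    if not creds.fetch_statistics:
        return None
    row = cursor.fetchone()
    if row is None:
        return None
    columns = [col[0] for col in cursor.description]
    return {c: int(v) for c, v in zip(columns, row)}
```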
Just a short update to let you know that I'm not forgetting this (and the other PR)! Things have just been quite busy lately, but I absolutely intend to pick it up. In the meantime, if anyone feels like this is moving too slowly, feel free to take over ;)
Thanks for the update. Things have been quite busy on my side as well, so I am content to wait until you free up :). Appreciate the effort and the communication.
Hi! I have had this PR in the back of my head for quite some time, and the thing is that I really don't like my approach using "caching"; I feel like there are many ways this can break `fetchone`/`fetchmany`/`fetchall`. This PR also needs some pretty serious rebasing, as the implementation of the connection/cursor has changed significantly since. I'm therefore thinking of creating an entirely new PR with a new approach: treating the cursor as an iterator, and using …
The point is to make sure that fetching the statistics doesn't impact the cursor used by `fetch*`. I have done some research and there is unfortunately no way to reset it. I'd love to hear your thoughts before doing something that you wouldn't be comfortable with 😉
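(A minimal sketch of the iterator idea, assuming the rows can be exposed as a plain Python iterator; nothing here is the final design.)

```python
import itertools
from typing import Any, Iterator, Optional, Tuple

RowT = Tuple[Any, ...]

def peek_first_row(rows: Iterator[RowT]) -> Tuple[Optional[RowT], Iterator[RowT]]:
    """Read one row for statistics parsing without consuming it."""
    first = next(rows, None)
    if first is None:
        return None, iter(())
    # Chain the peeked row back so fetchone/fetchmany/fetchall still see it.
    return first, itertools.chain([first], rows)
```

The wrapper's `fetch*` methods would then drain the returned iterator rather than the raw cursor, so reading the statistics stays invisible to callers.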
@ghjklw At this point I would recommend storing the stats on the `CursorWrapper` object. Since we don't execute async, I'm reasonably certain that at the point we create the cursor wrapper, we have completed execution but not fetched any results. Could we just store the stats in this wrapper as part of `CursorWrapper` construction? Do we even need to use …
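(A sketch of that suggestion: the counter names match Delta DML output, but the class body is illustrative only, not the repo's actual `CursorWrapper`.)

```python
from typing import Any, Dict, Optional

STAT_COLUMNS = {"num_affected_rows", "num_updated_rows",
                "num_deleted_rows", "num_inserted_rows"}

class CursorWrapper:  # illustrative only
    def __init__(self, cursor) -> None:
        self._cursor = cursor
        self.statistics: Optional[Dict[str, int]] = None
        columns = [col[0] for col in (cursor.description or [])]
        # Only DML result sets consist purely of counter columns,
        # so an ordinary SELECT is never consumed here.
        if columns and set(columns) <= STAT_COLUMNS:
            row = cursor.fetchone()
            if row is not None:
                self.statistics = {c: int(v) for c, v in zip(columns, row)}

    def fetchone(self) -> Optional[Any]:
        return self._cursor.fetchone()
```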
Any updates on this?
Resolves #351
Description
Try to parse the result of the Databricks query to extract the following variables and add them to the `DatabricksAdapterResponse`:

- `num_affected_rows`
- `num_inserted_rows`
- `num_deleted_rows`
- `num_updated_rows`

I have only quickly tested it and have been able to retrieve results like this in `run_results.json`: …

I have not yet written any test for it as I would first like to get your feedback on the approach used. It also certainly requires some more testing to make sure that it returns these metrics consistently and did not have a side effect on other methods like `fetchone`, `fetchmany` and `fetchall` (the last 2 in particular are likely to have been impacted).
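(To make the approach concrete, a rough sketch under the assumption that Delta DML statements return a single row of counters; `Response` is a hypothetical stand-in for `DatabricksAdapterResponse`.)

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Response:  # hypothetical stand-in for DatabricksAdapterResponse
    num_affected_rows: Optional[int] = None
    num_inserted_rows: Optional[int] = None
    num_deleted_rows: Optional[int] = None
    num_updated_rows: Optional[int] = None

def response_from_cursor(cursor) -> Response:
    response = Response()
    columns = [col[0] for col in (cursor.description or [])]
    # Delta DML (MERGE/UPDATE/DELETE/INSERT) returns one row of counters;
    # anything else is left untouched so fetch* behavior is unchanged.
    if columns and set(columns) <= Response.__dataclass_fields__.keys():
        row = cursor.fetchone()
        if row is not None:
            for name, value in zip(columns, row):
                setattr(response, name, int(value))
    return response
```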
Checklist

- I have updated `CHANGELOG.md` and added information about my change to the "dbt-databricks next" section.