Commit 3ee60a0

Merge branch 'main' into polars_semi

2 parents 9d17286 + 7a83224

File tree

39 files changed: +2208 −281 lines

CHANGELOG.md

Lines changed: 25 additions & 0 deletions

@@ -4,6 +4,31 @@
 [1]: https://pypi.org/project/bigframes/#history

+## [2.8.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v2.7.0...v2.8.0) (2025-06-23)
+
+### ⚠ BREAKING CHANGES
+
+* add required param 'engine' to multimodal functions ([#1834](https://github.com/googleapis/python-bigquery-dataframes/issues/1834))
+
+### Features
+
+* Add `bpd.options.compute.maximum_result_rows` option to limit client data download ([#1829](https://github.com/googleapis/python-bigquery-dataframes/issues/1829)) ([e22a3f6](https://github.com/googleapis/python-bigquery-dataframes/commit/e22a3f61a02cc1b7a5155556e5a07a1a2fea1d82))
+* Add `bpd.options.display.repr_mode = "anywidget"` to create an interactive display of the results ([#1820](https://github.com/googleapis/python-bigquery-dataframes/issues/1820)) ([be0a3cf](https://github.com/googleapis/python-bigquery-dataframes/commit/be0a3cf7711dadc68d8366ea90b99855773e2a2e))
+* Add DataFrame.ai.forecast() support ([#1828](https://github.com/googleapis/python-bigquery-dataframes/issues/1828)) ([7bc7f36](https://github.com/googleapis/python-bigquery-dataframes/commit/7bc7f36fc20d233f4cf5ed688cc5dcaf100ce4fb))
+* Add describe() method to Series ([#1827](https://github.com/googleapis/python-bigquery-dataframes/issues/1827)) ([a4205f8](https://github.com/googleapis/python-bigquery-dataframes/commit/a4205f882012820c034cb15d73b2768ec4ad3ac8))
+* Add required param 'engine' to multimodal functions ([#1834](https://github.com/googleapis/python-bigquery-dataframes/issues/1834)) ([37666e4](https://github.com/googleapis/python-bigquery-dataframes/commit/37666e4c137d52c28ab13477dfbcc6e92b913334))
+
+### Performance Improvements
+
+* Produce simpler sql ([#1836](https://github.com/googleapis/python-bigquery-dataframes/issues/1836)) ([cf9c22a](https://github.com/googleapis/python-bigquery-dataframes/commit/cf9c22a09c4e668a598fa1dad0f6a07b59bc6524))
+
+### Documentation
+
+* Add ai.forecast notebook ([#1840](https://github.com/googleapis/python-bigquery-dataframes/issues/1840)) ([2430497](https://github.com/googleapis/python-bigquery-dataframes/commit/24304972fdbdfd12c25c7f4ef5a7b280f334801a))
+
 ## [2.7.0](https://github.com/googleapis/python-bigquery-dataframes/compare/v2.6.0...v2.7.0) (2025-06-16)
bigframes/_config/compute_options.py

Lines changed: 39 additions & 30 deletions

@@ -55,29 +55,7 @@ class ComputeOptions:
     {'test2': 'abc', 'test3': False}

     Attributes:
-        maximum_bytes_billed (int, Options):
-            Limits the bytes billed for query jobs. Queries that will have
-            bytes billed beyond this limit will fail (without incurring a
-            charge). If unspecified, this will be set to your project default.
-            See `maximum_bytes_billed`: https://cloud.google.com/python/docs/reference/bigquery/latest/google.cloud.bigquery.job.QueryJobConfig#google_cloud_bigquery_job_QueryJobConfig_maximum_bytes_billed.
-
-        enable_multi_query_execution (bool, Options):
-            If enabled, large queries may be factored into multiple smaller queries
-            in order to avoid generating queries that are too complex for the query
-            engine to handle. However this comes at the cost of increase cost and latency.
-
-        extra_query_labels (Dict[str, Any], Options):
-            Stores additional custom labels for query configuration.
-
-        semantic_ops_confirmation_threshold (int, optional):
-            .. deprecated:: 1.42.0
-                Semantic operators are deprecated. Please use AI operators instead
-
-        semantic_ops_threshold_autofail (bool):
-            .. deprecated:: 1.42.0
-                Semantic operators are deprecated. Please use AI operators instead
-
-        ai_ops_confirmation_threshold (int, optional):
+        ai_ops_confirmation_threshold (int | None):
             Guards against unexpected processing of large amount of rows by semantic operators.
             If the number of rows exceeds the threshold, the user will be asked to confirm
             their operations to resume. The default value is 0. Set the value to None

@@ -87,26 +65,57 @@ class ComputeOptions:
             Guards against unexpected processing of large amount of rows by semantic operators.
             When set to True, the operation automatically fails without asking for user inputs.

-        allow_large_results (bool):
+        allow_large_results (bool | None):
             Specifies whether query results can exceed 10 GB. Defaults to False. Setting this
             to False (the default) restricts results to 10 GB for potentially faster execution;
             BigQuery will raise an error if this limit is exceeded. Setting to True removes
             this result size limit.
+
+        enable_multi_query_execution (bool | None):
+            If enabled, large queries may be factored into multiple smaller queries
+            in order to avoid generating queries that are too complex for the query
+            engine to handle. However this comes at the cost of increase cost and latency.
+
+        extra_query_labels (Dict[str, Any] | None):
+            Stores additional custom labels for query configuration.
+
+        maximum_bytes_billed (int | None):
+            Limits the bytes billed for query jobs. Queries that will have
+            bytes billed beyond this limit will fail (without incurring a
+            charge). If unspecified, this will be set to your project default.
+            See `maximum_bytes_billed`: https://cloud.google.com/python/docs/reference/bigquery/latest/google.cloud.bigquery.job.QueryJobConfig#google_cloud_bigquery_job_QueryJobConfig_maximum_bytes_billed.
+
+        maximum_result_rows (int | None):
+            Limits the number of rows in an execution result. When converting
+            a BigQuery DataFrames object to a pandas DataFrame or Series (e.g.,
+            using ``.to_pandas()``, ``.peek()``, ``.__repr__()``, direct
+            iteration), the data is downloaded from BigQuery to the client
+            machine. This option restricts the number of rows that can be
+            downloaded. If the number of rows to be downloaded exceeds this
+            limit, a ``bigframes.exceptions.MaximumResultRowsExceeded``
+            exception is raised.
+
+        semantic_ops_confirmation_threshold (int | None):
+            .. deprecated:: 1.42.0
+                Semantic operators are deprecated. Please use AI operators instead
+
+        semantic_ops_threshold_autofail (bool):
+            .. deprecated:: 1.42.0
+                Semantic operators are deprecated. Please use AI operators instead
     """

-    maximum_bytes_billed: Optional[int] = None
+    ai_ops_confirmation_threshold: Optional[int] = 0
+    ai_ops_threshold_autofail: bool = False
+    allow_large_results: Optional[bool] = None
     enable_multi_query_execution: bool = False
     extra_query_labels: Dict[str, Any] = dataclasses.field(
         default_factory=dict, init=False
     )
+    maximum_bytes_billed: Optional[int] = None
+    maximum_result_rows: Optional[int] = None
     semantic_ops_confirmation_threshold: Optional[int] = 0
     semantic_ops_threshold_autofail = False

-    ai_ops_confirmation_threshold: Optional[int] = 0
-    ai_ops_threshold_autofail: bool = False
-
-    allow_large_results: Optional[bool] = None
-
     def assign_extra_query_labels(self, **kwargs: Any) -> None:
         """
         Assigns additional custom labels for query configuration. The method updates the

bigframes/core/compile/compiler.py

Lines changed: 1 addition & 0 deletions

@@ -65,6 +65,7 @@ def compile_sql(request: configs.CompileRequest) -> configs.CompileResult:
     ordering: Optional[bf_ordering.RowOrdering] = result_node.order_by
     result_node = dataclasses.replace(result_node, order_by=None)
     result_node = cast(nodes.ResultNode, rewrites.column_pruning(result_node))
+    result_node = cast(nodes.ResultNode, rewrites.defer_selection(result_node))
     sql = compile_result_node(result_node)
     # Return the ordering iff no extra columns are needed to define the row order
     if ordering is not None:

bigframes/core/compile/googlesql/query.py

Lines changed: 1 addition & 1 deletion

@@ -125,7 +125,7 @@ def sql(self) -> str:
         return "\n".join(text)


-@dataclasses.dataclass
+@dataclasses.dataclass(frozen=True)
 class SelectExpression(abc.SQLSyntax):
     """This class represents `select_expression`."""
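Marking `SelectExpression` with `frozen=True` makes instances immutable and hashable, so they can be deduplicated in sets or used as dict keys. A small illustration with a stand-in class (not the bigframes type itself):

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class SelectExpr:  # stand-in for SelectExpression
    sql: str


# frozen=True generates __hash__ from the field values and blocks mutation,
# so equal expressions collapse to one entry in a set.
unique = {SelectExpr("a"), SelectExpr("a"), SelectExpr("b")}
assert len(unique) == 2

try:
    SelectExpr("a").sql = "c"
except dataclasses.FrozenInstanceError:
    pass  # attribute assignment raises on frozen dataclasses
```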

bigframes/core/compile/sqlglot/compiler.py

Lines changed: 13 additions & 0 deletions

@@ -87,6 +87,9 @@ def _compile_sql(self, request: configs.CompileRequest) -> configs.CompileResult
             nodes.ResultNode, rewrite.column_pruning(result_node)
         )
         result_node = self._remap_variables(result_node)
+        result_node = typing.cast(
+            nodes.ResultNode, rewrite.defer_selection(result_node)
+        )
         sql = self._compile_result_node(result_node)
         return configs.CompileResult(
             sql, result_node.schema.to_bigquery(), result_node.order_by

@@ -97,6 +100,9 @@ def _compile_sql(self, request: configs.CompileRequest) -> configs.CompileResult
         result_node = typing.cast(nodes.ResultNode, rewrite.column_pruning(result_node))

         result_node = self._remap_variables(result_node)
+        result_node = typing.cast(
+            nodes.ResultNode, rewrite.defer_selection(result_node)
+        )
         sql = self._compile_result_node(result_node)
         # Return the ordering iff no extra columns are needed to define the row order
         if ordering is not None:

@@ -205,6 +211,13 @@ def compile_projection(
         )
         return child.project(projected_cols)

+    @_compile_node.register
+    def compile_filter(
+        self, node: nodes.FilterNode, child: ir.SQLGlotIR
+    ) -> ir.SQLGlotIR:
+        condition = scalar_compiler.compile_scalar_expression(node.predicate)
+        return child.filter(condition)
+
     @_compile_node.register
     def compile_concat(
         self, node: nodes.ConcatNode, *children: ir.SQLGlotIR

bigframes/core/compile/sqlglot/scalar_compiler.py

Lines changed: 7 additions & 0 deletions

@@ -99,3 +99,10 @@ def compile_addop(op: ops.AddOp, left: TypedExpr, right: TypedExpr) -> sge.Expre

     # Numerical addition
     return sge.Add(this=left.expr, expression=right.expr)
+
+
+def compile_ge(
+    op: ops.ge_op, left: TypedExpr, right: TypedExpr  # type: ignore[valid-type]
+) -> sge.Expression:
+    return sge.GTE(this=left.expr, expression=right.expr)
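The scalar compiler maps each operator type (`AddOp`, the new `>=` op, …) to a corresponding sqlglot expression node. A toy version of the same dispatch pattern, emitting SQL strings instead of sqlglot objects (class and function names here are illustrative, not the bigframes API):

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class AddOp:
    pass


@dataclass
class GeOp:
    pass


@singledispatch
def compile_op(op, left: str, right: str) -> str:
    # Fallback for operators without a registered compiler.
    raise NotImplementedError(f"no compiler registered for {type(op).__name__}")


@compile_op.register
def _(op: AddOp, left: str, right: str) -> str:
    # Mirrors compile_addop returning sge.Add(this=..., expression=...)
    return f"({left} + {right})"


@compile_op.register
def _(op: GeOp, left: str, right: str) -> str:
    # Mirrors compile_ge returning sge.GTE(this=..., expression=...)
    return f"({left} >= {right})"
```

Each new operator only needs one registered handler; unhandled operators fail loudly at compile time rather than producing wrong SQL.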

bigframes/core/compile/sqlglot/sqlglot_ir.py

Lines changed: 10 additions & 0 deletions

@@ -250,6 +250,16 @@ def project(
         new_expr = self._encapsulate_as_cte().select(*projected_cols_expr, append=True)
         return SQLGlotIR(expr=new_expr, uid_gen=self.uid_gen)

+    def filter(
+        self,
+        condition: sge.Expression,
+    ) -> SQLGlotIR:
+        """Filters the query with the given condition."""
+        new_expr = self._encapsulate_as_cte()
+        return SQLGlotIR(
+            expr=new_expr.where(condition, append=False), uid_gen=self.uid_gen
+        )
+
     def insert(
         self,
         destination: bigquery.TableReference,
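The new `filter` method first wraps the current query in a CTE and only then applies `WHERE`, so the predicate runs against the query's output columns rather than interfering with inner aliases. A toy string-based version of the same encapsulate-then-filter pattern, checked against SQLite (these helpers are illustrative, not the sqlglot API):

```python
import sqlite3


def encapsulate_as_cte(sql: str, alias: str = "t0") -> str:
    # Wrap the existing query so later clauses see its output schema.
    return f"WITH {alias} AS ({sql}) SELECT * FROM {alias}"


def filter_query(sql: str, condition: str) -> str:
    # Apply the predicate on top of the wrapped query.
    return f"{encapsulate_as_cte(sql)} WHERE {condition}"


conn = sqlite3.connect(":memory:")
query = filter_query("SELECT 1 AS x UNION SELECT 5", "x >= 2")
rows = conn.execute(query).fetchall()
```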

bigframes/core/nodes.py

Lines changed: 9 additions & 1 deletion

@@ -75,7 +75,7 @@ def additive_base(self) -> BigFrameNode:
         ...

     @abc.abstractmethod
-    def replace_additive_base(self, BigFrameNode):
+    def replace_additive_base(self, BigFrameNode) -> BigFrameNode:
         ...


@@ -1568,6 +1568,10 @@ class ExplodeNode(UnaryNode):
     # Offsets are generated only if this is non-null
     offsets_col: Optional[identifiers.ColumnId] = None

+    def _validate(self):
+        for col in self.column_ids:
+            assert col.id in self.child.ids
+
     @property
     def row_preserving(self) -> bool:
         return False

@@ -1646,6 +1650,10 @@ class ResultNode(UnaryNode):
     limit: Optional[int] = None
     # TODO: CTE definitions

+    def _validate(self):
+        for ref, name in self.output_cols:
+            assert ref.id in self.child.ids
+
     @property
     def node_defined_ids(self) -> Tuple[identifiers.ColumnId, ...]:
         return ()

bigframes/core/rewrite/__init__.py

Lines changed: 2 additions & 0 deletions

@@ -22,6 +22,7 @@
     try_reduce_to_local_scan,
     try_reduce_to_table_scan,
 )
+from bigframes.core.rewrite.select_pullup import defer_selection
 from bigframes.core.rewrite.slices import pull_out_limit, pull_up_limits, rewrite_slice
 from bigframes.core.rewrite.timedeltas import rewrite_timedelta_expressions
 from bigframes.core.rewrite.windows import pull_out_window_order, rewrite_range_rolling

@@ -42,4 +43,5 @@
     "try_reduce_to_local_scan",
     "fold_row_counts",
     "pull_out_window_order",
+    "defer_selection",
 ]
bigframes/core/rewrite/select_pullup.py (new file)

Lines changed: 144 additions & 0 deletions

@@ -0,0 +1,144 @@
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import dataclasses
from typing import cast

from bigframes.core import expression, nodes


def defer_selection(
    root: nodes.BigFrameNode,
) -> nodes.BigFrameNode:
    """
    Defers SelectionNode operations in the tree, pulling them up.

    In many cases, these nodes will be merged or eliminated entirely, simplifying the overall tree.
    """
    return nodes.bottom_up(root, pull_up_select)


def pull_up_select(node: nodes.BigFrameNode) -> nodes.BigFrameNode:
    if isinstance(node, nodes.LeafNode):
        return node
    if isinstance(node, nodes.JoinNode):
        return pull_up_selects_under_join(node)
    if isinstance(node, nodes.ConcatNode):
        return handle_selects_under_concat(node)
    if isinstance(node, nodes.UnaryNode):
        return pull_up_select_unary(node)
    # shouldn't hit this, but not worth crashing over
    return node


def pull_up_select_unary(node: nodes.UnaryNode) -> nodes.BigFrameNode:
    child = node.child
    if not isinstance(child, nodes.SelectionNode):
        return node

    # Schema-preserving nodes
    if isinstance(
        node,
        (
            nodes.ReversedNode,
            nodes.OrderByNode,
            nodes.SliceNode,
            nodes.FilterNode,
            nodes.RandomSampleNode,
        ),
    ):
        pushed_down_node: nodes.BigFrameNode = node.remap_refs(
            {id: ref.id for ref, id in child.input_output_pairs}
        ).replace_child(child.child)
        pulled_up_select = cast(
            nodes.SelectionNode, child.replace_child(pushed_down_node)
        )
        return pulled_up_select
    elif isinstance(
        node,
        (
            nodes.SelectionNode,
            nodes.ResultNode,
        ),
    ):
        return node.remap_refs(
            {id: ref.id for ref, id in child.input_output_pairs}
        ).replace_child(child.child)
    elif isinstance(node, nodes.AggregateNode):
        pushed_down_agg = node.remap_refs(
            {id: ref.id for ref, id in child.input_output_pairs}
        ).replace_child(child.child)
        new_selection = tuple(
            nodes.AliasedRef.identity(id).remap_refs(
                {id: ref.id for ref, id in child.input_output_pairs}
            )
            for id in node.ids
        )
        return nodes.SelectionNode(pushed_down_agg, new_selection)
    elif isinstance(node, nodes.ExplodeNode):
        pushed_down_node = node.remap_refs(
            {id: ref.id for ref, id in child.input_output_pairs}
        ).replace_child(child.child)
        pulled_up_select = cast(
            nodes.SelectionNode, child.replace_child(pushed_down_node)
        )
        if node.offsets_col:
            pulled_up_select = dataclasses.replace(
                pulled_up_select,
                input_output_pairs=(
                    *pulled_up_select.input_output_pairs,
                    nodes.AliasedRef(
                        expression.DerefOp(node.offsets_col), node.offsets_col
                    ),
                ),
            )
        return pulled_up_select
    elif isinstance(node, nodes.AdditiveNode):
        pushed_down_node = node.replace_additive_base(child.child).remap_refs(
            {id: ref.id for ref, id in child.input_output_pairs}
        )
        new_selection = (
            *child.input_output_pairs,
            *(
                nodes.AliasedRef(expression.DerefOp(col.id), col.id)
                for col in node.added_fields
            ),
        )
        pulled_up_select = dataclasses.replace(
            child, child=pushed_down_node, input_output_pairs=new_selection
        )
        return pulled_up_select
    # shouldn't hit this, but not worth crashing over
    return node


def pull_up_selects_under_join(node: nodes.JoinNode) -> nodes.JoinNode:
    # Can in theory pull up selects here, but it is a bit dangerous, in particular for self-joins, when there are more transforms to do.
    # TODO: Safely pull up selects above join
    return node


def handle_selects_under_concat(node: nodes.ConcatNode) -> nodes.ConcatNode:
    new_children = []
    for child in node.child_nodes:
        # remove select if no-op
        if not isinstance(child, nodes.SelectionNode):
            new_children.append(child)
        else:
            inputs = tuple(ref.id for ref in child.input_output_pairs)
            if inputs == tuple(child.child.ids):
                new_children.append(child.child)
            else:
                new_children.append(child)
    return dataclasses.replace(node, children=tuple(new_children))
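A much-simplified sketch of the `FilterNode` branch of `pull_up_select_unary`: a rename (`Select`) sitting below a `Filter` is hoisted above it by rewriting the predicate's column reference through the rename mapping. The toy classes below are illustrative stand-ins, not bigframes nodes:

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Scan:
    """Leaf node producing physical columns."""


@dataclass
class Select:
    """Projection/rename: maps output name -> input column name."""
    child: Any
    mapping: Dict[str, str]


@dataclass
class Filter:
    """Keeps rows where the named column passes some predicate."""
    child: Any
    predicate_col: str


def pull_up_select(node: Any) -> Any:
    # Filter is schema-preserving, so a Select below it can be hoisted:
    # remap the predicate's reference through the rename, push the
    # Filter down, and re-apply the Select on top.
    if isinstance(node, Filter) and isinstance(node.child, Select):
        select = node.child
        pushed = Filter(select.child, select.mapping[node.predicate_col])
        return Select(pushed, select.mapping)
    return node


tree = Filter(Select(Scan(), {"a": "x"}), predicate_col="a")
out = pull_up_select(tree)
# The Select now sits above the Filter, and the Filter references the
# original column name "x".
```

Applied bottom-up over the whole tree, adjacent selections merge or disappear, which is what lets the compiler emit simpler SQL.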
