Skip to content

Rolling Window Calculations with Multiple Dimensions #9353

Open
@kraft6

Description

@kraft6

Describe the bug
Topic: Rolling Window Calculations with Multiple Dimensions
Product Version:
• Cube: 1.1.16
• Source: Trino/Starburst
Configuration:
• CUBESQL_SQL_PUSH_DOWN=false
• CUBEJS_ALLOW_UNGROUPED_WITHOUT_PRIMARY_KEY=true

I am experiencing an issue with rolling window calculations when using multiple dimensions. Here is the relevant configuration:

Rolling month calculations

- name: current_month_sum
  sql: totalprice
  type: sum
  rolling_window:
    trailing: 1 month
    offset: end

- name: previous_month_sum
  sql: totalprice
  type: sum
  rolling_window:
    trailing: 1 month
    offset: start

- name: month_over_month_ratio
  sql: "{total_price} / {order_cost}"

The dynamically generated query executed in Trino/Starburst uses an "INNER JOIN" between the current and previous month results:

(select …) as q_0
INNER JOIN 
(select …) as q_1 
ON (q_0."orders__clerk" = q_1."m__orders__clerk" OR 
    (q_0."m__orders__clerk" IS NULL AND q_1."m__orders__clerk" IS NULL))
AND 
(q_0."m__orders__orderdate_year" = q_1."m__orders__orderdate_year" OR 
    (q_0."m__orders__orderdate_year" IS NULL AND q_1."m__orders__orderdate_year" IS NULL))

The use of "INNER JOIN" results in output only when there is data for both the current and previous months. This becomes particularly problematic when additional dimensions are added, as it restricts the results to cases where data is present for both periods.

Expected behavior
I believe that using a "LEFT JOIN" or "FULL JOIN" instead of an "INNER JOIN" could resolve this issue. This would ensure that results are displayed even when one period has data, and the other does not, allowing for more accurate rolling calculations.

Example of Improved Query:

(select …) as q_0
LEFT JOIN 
(select …) as q_1 
ON (q_0."orders__clerk" = q_1."m__orders__clerk" OR 
    (q_0."m__orders__clerk" IS NULL AND q_1."m__orders__clerk" IS NULL))
AND 
(q_0."m__orders__orderdate_year" = q_1."m__orders__orderdate_year" OR 
    (q_0."m__orders__orderdate_year" IS NULL AND q_1."m__orders__orderdate_year" IS NULL))

By making this adjustment, we can ensure that rolling calculations are displayed even when one dimension has null values, providing a more complete and accurate representation of the data.

Request for Feedback:
Could you please review this approach and provide your thoughts on whether using a "LEFT JOIN" or "FULL JOIN" would be a suitable solution for this issue? Your guidance on this matter would be greatly appreciated.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions