Skip to content

Rolling Window Calculations with Multiple Dimensions #9353

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
kraft6 opened this issue Mar 16, 2025 · 0 comments · May be fixed by #9502
Open

Rolling Window Calculations with Multiple Dimensions #9353

kraft6 opened this issue Mar 16, 2025 · 0 comments · May be fixed by #9502
Assignees

Comments

@kraft6
Copy link

kraft6 commented Mar 16, 2025

Describe the bug
Topic: Rolling Window Calculations with Multiple Dimensions
Product Version:
• Cube: 1.1.16
• Source: Trino/Starburst
Configuration:
• CUBESQL_SQL_PUSH_DOWN=false
• CUBEJS_ALLOW_UNGROUPED_WITHOUT_PRIMARY_KEY=true

I am experiencing an issue with rolling window calculations when using multiple dimensions. Here is the relevant configuration:

Rolling month calculations

- name: current_month_sum
  sql: totalprice
  type: sum
  rolling_window:
    trailing: 1 month
    offset: end

- name: previous_month_sum
  sql: totalprice
  type: sum
  rolling_window:
    trailing: 1 month
    offset: start

- name: month_over_month_ratio
  sql: "{total_price} / {order_cost}"

The dynamically generated query executed in Trino/Starburst uses an "INNER JOIN" between the current and previous month results:

(select …) as q_0
INNER JOIN 
(select …) as q_1 
ON (q_0."orders__clerk" = q_1."m__orders__clerk" OR 
    (q_0."m__orders__clerk" IS NULL AND q_1."m__orders__clerk" IS NULL))
AND 
(q_0."m__orders__orderdate_year" = q_1."m__orders__orderdate_year" OR 
    (q_0."m__orders__orderdate_year" IS NULL AND q_1."m__orders__orderdate_year" IS NULL))

The use of "INNER JOIN" results in output only when there is data for both the current and previous months. This becomes particularly problematic when additional dimensions are added, as it restricts the results to cases where data is present for both periods.

Expected behavior
I believe that using a "LEFT JOIN" or "FULL JOIN" instead of an "INNER JOIN" could resolve this issue. This would ensure that results are displayed even when one period has data, and the other does not, allowing for more accurate rolling calculations.

Example of Improved Query:

(select …) as q_0
LEFT JOIN 
(select …) as q_1 
ON (q_0."orders__clerk" = q_1."m__orders__clerk" OR 
    (q_0."m__orders__clerk" IS NULL AND q_1."m__orders__clerk" IS NULL))
AND 
(q_0."m__orders__orderdate_year" = q_1."m__orders__orderdate_year" OR 
    (q_0."m__orders__orderdate_year" IS NULL AND q_1."m__orders__orderdate_year" IS NULL))

By making this adjustment, we can ensure that rolling calculations are displayed even when one dimension has null values, providing a more complete and accurate representation of the data.

Request for Feedback:
Could you please review this approach and provide your thoughts on whether using a "LEFT JOIN" or "FULL JOIN" would be a suitable solution for this issue? Your guidance on this matter would be greatly appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants