docs: add the first sample for the Single time-series forecasting from Google Analytics data tutorial #623

Merged
tswast merged 17 commits into googleapis:main on Apr 23, 2024

Conversation

SalemJorden
Contributor

@SalemJorden SalemJorden commented Apr 19, 2024

Adds a BigQuery DataFrames sample for the "Single time-series forecasting from Google Analytics data" tutorial, Step two (optional): Visualize the time series you want to forecast.

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@product-auto-label product-auto-label bot added the "size: s", "api: bigquery", and "samples" labels Apr 19, 2024

# Start by selecting the data you'll use for training. `read_gbq` accepts
# either a SQL query or a table ID. Since this example selects from multiple
# tables via a wildcard, use SQL to define this data. Watch issue
# https://github.com/googleapis/python-bigquery-dataframes/issues/169
Collaborator

Wildcard tables are now supported. You aren't using SQL here.

# [START bigquery_dataframes_single_timeseries_forecasting_model_tutorial]
import bigframes.pandas as bpd

# Start by selecting the data you'll use for training. `read_gbq` accepts
Collaborator

In "Step two" (https://cloud.google.com/bigquery/docs/arima-single-time-series-forecasting-tutorial#step_two_optional_visualize_the_time_series_you_want_to_forecast) we aren't doing any training yet.

Instead, this sentence from the SQL version seems more applicable:

> The FROM bigquery-public-data.google_analytics_sample.ga_sessions_* clause indicates that you are querying the ga_sessions_* tables in the google_analytics_sample dataset.

Please rephrase that to apply to what you're doing here.

'bigquery-public-data.google_analytics_sample.ga_sessions_*'
)
parsed_date = bpd.to_datetime(df.date, format= "%Y%m%d", utc = True)
total_visits = df.groupby(["date"])["parsed_date"].sum()
Collaborator

In our 1:1 we did a series groupby to calculate the number of visits per day. It is possible to do the same with a DataFrame groupby, but if so, you'll need to select just the "visits" field here before calling sum(), which is slightly more convoluted since visits is a subfield of a struct.
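
The Series groupby described above can be sketched locally as follows. This is only an illustration under stated assumptions: it uses plain pandas with invented rows as a stand-in for the BigQuery DataFrames objects so that it runs without BigQuery access, and the flat `visits` column stands in for the struct subfield in the real `ga_sessions_*` tables.

```python
import pandas as pd

# Hypothetical rows standing in for the ga_sessions_* data; the values are invented.
df = pd.DataFrame(
    {
        "date": ["20160801", "20160801", "20160802"],
        "visits": [1, 2, 1],
    }
)

# Parse the YYYYMMDD strings into timezone-aware (UTC) timestamps.
parsed_date = pd.to_datetime(df["date"], format="%Y%m%d", utc=True)

# Series groupby: group the visits Series by the parsed dates, then sum per day.
total_visits = df["visits"].groupby(parsed_date).sum()

# DataFrame groupby alternative: attach the parsed dates as a column, group on it,
# and select only the visits column before calling sum().
total_visits_df = (
    df.assign(parsed_date=parsed_date).groupby("parsed_date")["visits"].sum()
)
```

With BigQuery DataFrames, the same shape would presumably apply with `bigframes.pandas` in place of `pandas`, after first extracting the visits subfield from its struct column.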

@tswast tswast mentioned this pull request Apr 22, 2024
4 tasks
@tswast tswast marked this pull request as ready for review April 22, 2024 20:35
@tswast tswast requested review from a team as code owners April 22, 2024 20:35

snippet-bot bot commented Apr 22, 2024

Here is the summary of changes.

You are about to add 1 region tag.

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.
To update this comment, add snippet-bot:force-run label or use the checkbox below:

  • Refresh this comment

@tswast tswast added the owlbot:run label Apr 22, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run label Apr 22, 2024
@tswast tswast added the owlbot:run label Apr 22, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run label Apr 22, 2024
@tswast
Collaborator

tswast commented Apr 23, 2024

Looks like the e2e tests passed! 🎉 But there was a flake in the Kokoro presubmit tests that is unrelated to this change. Re-running the tests should hopefully let us merge.

@tswast tswast changed the title from "Docs: Single Time Series Forecasting Code Sample Step 2" to "docs: add the first sample for the Single time-series forecasting from Google Analytics data tutorial" Apr 23, 2024
@tswast tswast enabled auto-merge (squash) April 23, 2024 14:56
@tswast tswast added the owlbot:run label Apr 23, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run label Apr 23, 2024
@tswast tswast merged commit 2b84c4f into googleapis:main Apr 23, 2024
16 checks passed
Labels: api: bigquery, samples, size: s
3 participants