docs: add the first sample for the Single time-series forecasting from Google Analytics data tutorial #623
…uery-dataframes into salem_timeseriessample
…es into salem_timeseriessample
# Start by selecting the data you'll use for training. `read_gbq` accepts
# either a SQL query or a table ID. Since this example selects from multiple
# tables via a wildcard, use SQL to define this data. Watch issue
# https://github.com/googleapis/python-bigquery-dataframes/issues/169
Wildcard tables are now supported. You aren't using SQL here.
# [START bigquery_dataframes_single_timeseries_forecasting_model_tutorial]
import bigframes.pandas as bpd

# Start by selecting the data you'll use for training. `read_gbq` accepts
In "Step two" (https://cloud.google.com/bigquery/docs/arima-single-time-series-forecasting-tutorial#step_two_optional_visualize_the_time_series_you_want_to_forecast) we aren't doing any training yet.
Instead, this sentence from the SQL version seems more applicable:
The `FROM bigquery-public-data.google_analytics_sample.ga_sessions_*` clause indicates that you are querying the `ga_sessions_*` tables in the `google_analytics_sample` dataset.
Please rephrase that to apply to what you're doing here.
'bigquery-public-data.google_analytics_sample.ga_sessions_*'
)
parsed_date = bpd.to_datetime(df.date, format= "%Y%m%d", utc = True)
total_visits = df.groupby(["date"])["parsed_date"].sum()
In our 1:1 we did a series groupby to calculate the number of visits per day. It is possible to do the same with a DataFrame groupby, but if so, you'll need to select just the "visits" field here before calling sum(), which is slightly more convoluted since visits is a subfield of a struct.
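For reference, here is a rough sketch of the series groupby shape described above. It uses plain pandas on a few made-up rows so it runs without BigQuery access. The sample data, the `map` call used to pull `visits` out of `totals`, and the `total_visits` name are all assumptions for illustration, not the code from this PR. In BigQuery DataFrames the subfield would presumably be read through the Series struct accessor rather than `map`.

```python
import pandas as pd

# Hypothetical stand-in rows shaped like ga_sessions_*: `date` is a YYYYMMDD
# string and `totals` is a struct-like value holding a `visits` subfield.
df = pd.DataFrame(
    {
        "date": ["20160801", "20160801", "20160802"],
        "totals": [{"visits": 1}, {"visits": 1}, {"visits": 1}],
    }
)

# Parse the string dates into timezone-aware timestamps.
parsed_date = pd.to_datetime(df["date"], format="%Y%m%d", utc=True)
parsed_date.name = "parsed_date"

# Select just the `visits` subfield before aggregating.
visits = df["totals"].map(lambda totals: totals["visits"])
visits.name = "total_visits"

# Series groupby: group the visits Series by the parsed dates, then sum
# to get the number of visits per day.
total_visits = visits.groupby(parsed_date).sum()
print(total_visits)
```

The point of grouping a Series by another Series is that `sum()` only ever sees the visit counts, so there is no need to select a column out of a DataFrame groupby afterwards.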
Here is the summary of changes. You are about to add 1 region tag.
This comment is generated by snippet-bot.
Looks like the |
BigQuery DataFrames sample for Single time-series forecasting from Google Analytics data, Step two (optional): Visualize the time series you want to forecast.
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕