docs: add sample for getting started with BQML #141
Original file line number | Diff line number | Diff line change | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -14,14 +14,23 @@ | |||||||||||||||
|
||||||||||||||||
|
||||||||||||||||
def test_bqml_getting_started(): | ||||||||||||||||
<<<<<<< HEAD | ||||||||||||||||
# [START bigquery_getting_started_bqml_tutorial] | ||||||||||||||||
|
||||||||||||||||
import bigframes.pandas as bpd | ||||||||||||||||
======= | ||||||||||||||||
# [START bigquery_getting_started_bqml_tutorial] | ||||||||||||||||
|
||||||||||||||||
>>>>>>> 9ec139e7e275e8082379022e3eb1ce06e9664c2e | ||||||||||||||||
from bigframes.ml.linear_model import LogisticRegression | ||||||||||||||||
import bigframes.pandas as bpd | ||||||||||||||||
|
||||||||||||||||
<<<<<<< HEAD | ||||||||||||||||
# read_gbq loads a DataFrame from BigQuery and gives an unordered, | ||||||||||||||||
# unindexed data source. The default DataFrame will have an arbitrary | ||||||||||||||||
# index and ordering. | ||||||||||||||||
|
||||||||||||||||
======= | ||||||||||||||||
# EXPLANATION - REFERENCE GBQ DOCS! | ||||||||||||||||
|
||||||||||||||||
>>>>>>> 9ec139e7e275e8082379022e3eb1ce06e9664c2e | ||||||||||||||||
df = bpd.read_gbq( | ||||||||||||||||
# GENERATE_UUID produces a random universally unique identifier | ||||||||||||||||
# as a STRING value. | ||||||||||||||||
|
@@ -34,18 +43,18 @@ def test_bqml_getting_started(): | |||||||||||||||
""", | ||||||||||||||||
index_col="rowindex", | ||||||||||||||||
) | ||||||||||||||||
|
||||||||||||||||
# Extract the total number of transactions within | ||||||||||||||||
# the Google Analytics session. | ||||||||||||||||
# | ||||||||||||||||
# Because the totals column is a STRUCT data type, we need to call | ||||||||||||||||
|
||||||||||||||||
# Series.struct.field("transactions") to extract the transactions field. | ||||||||||||||||
# See the reference documentation below: | ||||||||||||||||
# Series.struct.field("transactions") to extract the transactions field. | ||||||||||||||||
# See the reference documentation below: | ||||||||||||||||
# https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.operations.structs.StructAccessor#bigframes_operations_structs_StructAccessor_field | ||||||||||||||||
transactions = df['totals'].struct.field("transactions") | ||||||||||||||||
transactions = df["totals"].struct.field("transactions") | ||||||||||||||||
|
||||||||||||||||
# If the number of transactions is NULL, the value in the label | ||||||||||||||||
# column is set to 0. Otherwise, it is set to 1. These values | ||||||||||||||||
# If the number of transactions is NULL, the value in the label | ||||||||||||||||
# column is set to 0. Otherwise, it is set to 1. These values | ||||||||||||||||
# represent the possible outcomes. | ||||||||||||||||
label = transactions.notnull().map({True: 1, False: 0}) | ||||||||||||||||
|
||||||||||||||||
|
@@ -59,14 +68,20 @@ def test_bqml_getting_started(): | |||||||||||||||
# Extract the visitor's country of origin. | ||||||||||||||||
country = df["geoNetwork"].struct.field("country").fillna("") | ||||||||||||||||
|
||||||||||||||||
<<<<<<< HEAD | ||||||||||||||||
# Extract the total pageviews from the totals column. | ||||||||||||||||
pageviews = df['totals'].struct.field("pageviews").fillna(0) | ||||||||||||||||
======= | ||||||||||||||||
# Total number of pageviews within the session. | ||||||||||||||||
|
||||||||||||||||
pageviews = df["totals"].struct.field("pageviews").fillna(0) | ||||||||||||||||
>>>>>>> 9ec139e7e275e8082379022e3eb1ce06e9664c2e | ||||||||||||||||
|
||||||||||||||||
# Selecting values to represent data in columns in DataFrames. | ||||||||||||||||
|
||||||||||||||||
features = bpd.DataFrame( | ||||||||||||||||
{"os": operatingSystem, "is_mobile": isMobile, "pageviews": pageviews} | ||||||||||||||||
) | ||||||||||||||||
|
||||||||||||||||
<<<<<<< HEAD | ||||||||||||||||
# Logistic Regression model splits data into two classes, giving the | ||||||||||||||||
# probability the data is in one of the classes. | ||||||||||||||||
model = LogisticRegression() | ||||||||||||||||
|
@@ -76,3 +91,13 @@ def test_bqml_getting_started(): | |||||||||||||||
# | ||||||||||||||||
model.to_gbq("bqml_tutorial.sample_model", replace=True) | ||||||||||||||||
# [END bigquery_getting_started_bqml_tutorial] | ||||||||||||||||
======= | ||||||||||||||||
# Logistic Regression model splits data into two classes, giving the | ||||||||||||||||
# probability the data is in one of the classes. | ||||||||||||||||
|
||||||||||||||||
model = LogisticRegression() | ||||||||||||||||
model.fit(features, label) | ||||||||||||||||
|
||||||||||||||||
# When writing a DataFrame to a BigQuery table, include destination table | ||||||||||||||||
# and parameters, index defaults to "True". | ||||||||||||||||
This comment has nothing to do with BigQuery ML models. Please fix. Note: The important thing here is that we're taking our trained model and writing it to a permanent location.
Agreed, corrected.
FYI: The comment is still talking about tables, not models. I'll make a comment with a suggested edit.
|
||||||||||||||||
model.to_gbq("bqml_tutorial.sample_model", replace=True) | ||||||||||||||||
FYI: We're getting
failure in our test suite: https://fusion2.corp.google.com/invocations/8a7513c8-e7c9-4b5b-82fe-9a83c176fbc1/targets/bigframes%2Fpresubmit%2Fe2e/log I think we'll need a test fixture for this to create a temporary place for the model and clean it up when the test finishes.
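One possible shape for the fixture suggested above, as a rough sketch rather than the code this PR ended up with. The helper name `temporary_dataset`, the `project_id` argument, and the `bqml_tutorial_test` prefix are all hypothetical. `client` is assumed to behave like `google.cloud.bigquery.Client`, whose `create_dataset` and `delete_dataset` methods are the only calls used:

```python
import contextlib
import uuid


@contextlib.contextmanager
def temporary_dataset(client, project_id, prefix="bqml_tutorial_test"):
    """Create a uniquely named dataset, yield its ID, then delete it.

    Hypothetical helper: `client` is assumed to expose create_dataset()
    and delete_dataset() like google.cloud.bigquery.Client does.
    """
    # A random suffix keeps concurrent test runs from colliding.
    dataset_id = f"{project_id}.{prefix}_{uuid.uuid4().hex}"
    client.create_dataset(dataset_id)
    try:
        yield dataset_id
    finally:
        # delete_contents=True asks BigQuery to drop the dataset even when
        # it is not empty; not_found_ok=True tolerates an earlier deletion.
        client.delete_dataset(
            dataset_id, delete_contents=True, not_found_ok=True
        )
```

In a test suite this could be wrapped in a `pytest` fixture that yields `dataset_id`, so the sample would call `model.to_gbq(f"{dataset_id}.sample_model", replace=True)` instead of writing to the shared `bqml_tutorial` dataset. Whether `to_gbq` accepts a fully qualified name in that form is an assumption to verify against the bigframes reference.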
|
||||||||||||||||
>>>>>>> 9ec139e7e275e8082379022e3eb1ce06e9664c2e |