docs: add sample for getting started with BQML #141

Merged 35 commits on Dec 12, 2023. Showing changes from 2 commits.

Commits
c1573f0
docs: add sample for getting started with BQML
DevStephanie Oct 25, 2023
0b69c57
🦉 Updates from OwlBot post-processor
gcf-owl-bot[bot] Oct 25, 2023
4e7d81c
Creating clarifying comments
DevStephanie Oct 25, 2023
7e2094f
Merging comments with this branch
DevStephanie Oct 25, 2023
a22640b
Correcting comments, merging with first branch bqml_tutorial from bqm…
DevStephanie Oct 25, 2023
0fc8d09
🦉 Updates from OwlBot post-processor
gcf-owl-bot[bot] Oct 25, 2023
8751f50
corrections on comments
DevStephanie Oct 25, 2023
a67068e
Correcting code comments from BQ docs
DevStephanie Oct 26, 2023
9ec139e
🦉 Updates from OwlBot post-processor
gcf-owl-bot[bot] Oct 26, 2023
ec651b1
Fixing code comments to reflect BQML documentation
DevStephanie Nov 1, 2023
be95c76
Correcting comments to reflect BQML documentation
DevStephanie Nov 1, 2023
fbbe32b
Correcting code comments
DevStephanie Nov 6, 2023
d4591c8
Merge branch 'main' into bqml_tutorial
DevStephanie Nov 6, 2023
a2b7f2f
Correcting documentation code
DevStephanie Nov 7, 2023
c899565
Correcting documentation errors
DevStephanie Nov 7, 2023
8364454
🦉 Updates from OwlBot post-processor
gcf-owl-bot[bot] Nov 7, 2023
509c1f4
Correcting documentation comments and correcting features
DevStephanie Nov 7, 2023
16d4f18
🦉 Updates from OwlBot post-processor
gcf-owl-bot[bot] Nov 7, 2023
34eb65a
Correcting documention comments for code samples
DevStephanie Nov 7, 2023
9aa6e7a
Merge branch 'bqml_tutorial' of https://github.com/googleapis/python-…
DevStephanie Nov 7, 2023
0a9f06d
Merge branch 'main' into bqml_tutorial
DevStephanie Nov 10, 2023
c4a3b55
Merge branch 'bqml_tutorial' of https://github.com/googleapis/python-…
DevStephanie Nov 10, 2023
16c6fb0
Apply suggestions from code review
DevStephanie Nov 10, 2023
77c22b9
Correcting documentation comments
DevStephanie Nov 13, 2023
bcdc9e2
Merge branch 'bqml_tutorial' of https://github.com/googleapis/python-…
DevStephanie Nov 13, 2023
93f911d
Correcting documentation comments
DevStephanie Nov 13, 2023
f3aee5d
Apply suggestions from code review
tswast Nov 16, 2023
1ac855d
Apply suggestions from code review
tswast Nov 16, 2023
b25bb26
Merge branch 'main' into bqml_tutorial
tswast Nov 16, 2023
1f0910a
Merge branch 'main' into bqml_tutorial
tswast Dec 11, 2023
5494f46
Fixtures for temporary resources
DevStephanie Dec 12, 2023
a47a777
Merge remote-tracking branch 'origin' into bqml_tutorial
DevStephanie Dec 12, 2023
016e81c
Merge remote-tracking branch 'origin/bqml_tutorial' into bqml_tutorial
DevStephanie Dec 12, 2023
7a04299
Deleting files
DevStephanie Dec 12, 2023
fbf2527
Merge branch 'main' into bqml_tutorial
tswast Dec 12, 2023
37 changes: 31 additions & 6 deletions samples/snippets/bqml_getting_started_test.py
@@ -14,14 +14,23 @@


def test_bqml_getting_started():
<<<<<<< HEAD
# [start bigquery_getting_Started_bqml_tutorial]
import bigframes.pandas as bpd
=======
# [START bigquery_getting_Started_bqml_tutorial]
>>>>>>> 9ec139e7e275e8082379022e3eb1ce06e9664c2e
from bigframes.ml.linear_model import LogisticRegression
import bigframes.pandas as bpd

<<<<<<< HEAD
# Read_gbq loads a DataFrame from BigQuery and gives an unordered,
# unindexed data source. The default DataFrame will have an arbitrary
# index and ordering.

=======
# EXPLANATION - REFERENCE GBQ DOCS!
>>>>>>> 9ec139e7e275e8082379022e3eb1ce06e9664c2e
df = bpd.read_gbq(
# Generate_UUID produces a random universally unique identifier
# as a STRING value.
@@ -34,18 +43,18 @@ def test_bqml_getting_started():
""",
index_col="rowindex",
)

# Extract the total number of transactions within
# the Google Analytics session.
#
# Because the totals column is a STRUCT data type, we need to call
# Series.struct.field("transactions") to extract the transactions field.
# See the reference documentation below:
# Series.struct.field("transactions") to extract the transactions field.
# See the reference documentation below:
# https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.operations.structs.StructAccessor#bigframes_operations_structs_StructAccessor_field
transactions = df['totals'].struct.field("transactions")
transactions = df["totals"].struct.field("transactions")

# If the number of transactions is NULL, the value in the label
# column is set to 0. Otherwise, it is set to 1. These values
# If the number of transactions is NULL, the value in the label
# column is set to 0. Otherwise, it is set to 1. These values
# represent the possible outcomes.
label = transactions.notnull().map({True: 1, False: 0})

@@ -59,14 +68,20 @@ def test_bqml_getting_started():
# Extract the visitor's country of origin.
country = df["geoNetwork"].struct.field("country").fillna("")

<<<<<<< HEAD
# Extract the total pageviews from the totals column.
pageviews = df['totals'].struct.field("pageviews").fillna(0)
=======
# Total number of pageviews within the session.
pageviews = df["totals"].struct.field("pageviews").fillna(0)
>>>>>>> 9ec139e7e275e8082379022e3eb1ce06e9664c2e

# Selecting values to represent data in columns in DataFrames.
features = bpd.DataFrame(
{"os": operatingSystem, "is_mobile": isMobile, "pageviews": pageviews}
)

<<<<<<< HEAD
# Logistic Regression model splits data into two classes,giving the
# probability the data is in one of the classes.
model = LogisticRegression()
@@ -76,3 +91,13 @@ def test_bqml_getting_started():
#
model.to_gbq("bqml_tutorial.sample_model", replace=True)
# [END bigquery_getting_started_bqml_tutorial]
=======
# Logistic Regression model splits data into two classes, giving the
# probability the data is in one of the classes.
model = LogisticRegression()
model.fit(features, label)

# When writing a DataFrame to a BigQuery table, include destination table
# and parameters, index defaults to "True".
Collaborator

This comment has nothing to do with BigQuery ML models. Please fix.

Note: The important thing here is that we're taking our trained model and writing it to a permanent location.

Contributor Author

Agreed, corrected.

Collaborator

FYI: The comment is still talking about tables not models. I'll make a comment with a suggested edit.

model.to_gbq("bqml_tutorial.sample_model", replace=True)
Collaborator

FYI: We're getting

E           google.api_core.exceptions.BadRequest: 400 Concurrent update on same model: bigframes-dev:bqml_tutorial.sample_model is not supported. Share your usecase with the BigQuery DataFrames team at the https://bit.ly/bigframes-feedback survey.

failure in our test suite: https://fusion2.corp.google.com/invocations/8a7513c8-e7c9-4b5b-82fe-9a83c176fbc1/targets/bigframes%2Fpresubmit%2Fe2e/log

I think we'll need a test fixture for this to create a temporary place for the model and clean it up when the test finishes.

  1. Create a file called samples/snippets/conftest.py.

  2. In the conftest.py file you create, add a fixture called random_model_id, similar to this one: https://github.com/googleapis/python-bigquery/blob/f804d639fe95bef5d083afe1246d756321128b05/samples/snippets/conftest.py#L101-L111 except it'll call delete_model(...) instead of delete_table(...).

    You'll also need to add "prefixer" https://github.com/googleapis/python-bigquery/blob/f804d639fe95bef5d083afe1246d756321128b05/samples/snippets/conftest.py#L21 and bigquery_client fixture https://github.com/googleapis/python-bigquery/blob/f804d639fe95bef5d083afe1246d756321128b05/samples/snippets/conftest.py#L33-L36

  3. Update your code sample to use the new random_model_id fixture.

    Look how we do it in the remote functions test:

        your_gcp_project_id = project_id
        # [START bigquery_dataframes_remote_function]
        import bigframes.pandas as bpd
        # Set BigQuery DataFrames options
        bpd.options.bigquery.project = your_gcp_project_id

    but instead you'll be setting your_model_id = random_model_id and calling

        model.to_gbq(
            your_model_id,  # "project.dataset.model_id" or "dataset.model_id"
            replace=True,
        )
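
For illustration only, the temporary-model idea from the steps above could be sketched roughly like this. This is not the conftest.py from the linked repository: the names make_model_id and temporary_model_id are hypothetical, a uuid suffix stands in for the "prefixer" helper, and client is assumed to be a google.cloud.bigquery.Client (whose delete_model accepts not_found_ok). The sketch uses only the standard library so it can be read and run without a Google Cloud project.

```python
import contextlib
import uuid


def make_model_id(dataset_id: str, prefix: str = "bqml_sample") -> str:
    """Return a unique "dataset.model" ID (hypothetical helper).

    A random suffix means two test runs never write to the same model,
    which is what triggers the "Concurrent update on same model" error.
    """
    return f"{dataset_id}.{prefix}_{uuid.uuid4().hex}"


@contextlib.contextmanager
def temporary_model_id(client, dataset_id: str):
    """Yield a unique model ID and delete that model when the block exits.

    Assumption: `client` behaves like google.cloud.bigquery.Client, i.e. it
    has delete_model(model_id, not_found_ok=...).
    """
    model_id = make_model_id(dataset_id)
    try:
        yield model_id
    finally:
        # not_found_ok=True: the test may fail before the model is created.
        client.delete_model(model_id, not_found_ok=True)
```

In a conftest.py, a random_model_id pytest fixture could plausibly wrap this by entering temporary_model_id(bigquery_client, dataset_id) and yielding the ID to the test, which then passes it to model.to_gbq(...) in place of the hard-coded "bqml_tutorial.sample_model".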
    

>>>>>>> 9ec139e7e275e8082379022e3eb1ce06e9664c2e