Skip to content

use the latest release apache-datasketches-bigquery-1.1.1 #492

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

AlexanderSaydakov
Copy link
Contributor

@danieldeleo
Copy link
Collaborator

/gcbrun

@danieldeleo
Copy link
Collaborator

@AlexanderSaydakov apache/datasketches-bigquery#142 introduced breaking changes to the build process for this repo. Can you revert that PR and instead specify that JS_BUCKET values must not end in forward slash? You can have JS_BUCKET specify a path to a folder but you just need to leave out the last forward slash.

@AlexanderSaydakov
Copy link
Contributor Author

I think I tried that and it did not work, but the modified Makefiles worked for both cases (bucket and folder). I will revisit this.

@danieldeleo
Copy link
Collaborator

I think I tried that and it did not work, but the modified Makefiles worked for both cases (bucket and folder). I will revisit this.

The build pipeline for datasketches in bigquery-utils repo passes a folder to the JS_BUCKET variable so I'm certain it works with folders (leaving out the trailing forward slash). Let me know if you can reproduce the issue you saw that led to the PR, I can help debug.

@AlexanderSaydakov
Copy link
Contributor Author

I tested the makefiles when I made the change and re-tested now with JS_BUCKET not having the trailing slash in both cases: gs://bucket and gs://bucket/folder. Works for me.

@danieldeleo
Copy link
Collaborator

I tested the makefiles when I made the change and re-tested now with JS_BUCKET not having the trailing slash in both cases: gs://bucket and gs://bucket/folder. Works for me.

You'll need to now revert the changes from apache/datasketches-bigquery#142 in order to publish via this repo

@AlexanderSaydakov
Copy link
Contributor Author

AlexanderSaydakov commented May 9, 2025

Could you clarify why do you think we need to revert the change? I have just re-tested the release 1.1.0 setting JS_BUCKET=gs://bucketname and JS_BUCKET=gs://bucketname/foldername (without slash at the end). Both ways work.

@danieldeleo
Copy link
Collaborator

danieldeleo commented May 9, 2025

Could you clarify why do you think we need to revert the change? I have just re-tested the release 1.1.0 setting JS_BUCKET=gs://bucketname and JS_BUCKET=gs://bucketname/foldername (without slash at the end). Both ways work.

PR#142 removed the forward slash from the following command:

gcloud storage cp $$file $(JS_BUCKET)/

which results in cp command thinking that the gs:// path is to a file instead of a folder.
If I pass JS_BUCKET=gs://my_bucket/my_folder, then every file will be written to a file named "my_folder" (overwriting the previous file too), instead of every file being written into a folder named "my_folder".

Here is an excerpt of the build logs from this repo (note how the makefiles all write/overwrite the same filename "2eaa735":

make[1]: Entering directory '/workspace/theta'
Copying file://theta_sketch.mjs to gs://bqutil-lib-test/datasketches/2eaa735
  
..
Copying file://theta_sketch.js to gs://bqutil-lib-test/datasketches/2eaa735
..
Copying file://theta_sketch.wasm to gs://bqutil-lib-test/datasketches/2eaa735
  
..
make[1]: Leaving directory '/workspace/theta'
make -C tuple upload
make[1]: Entering directory '/workspace/tuple'
Copying file://tuple_sketch_int64.mjs to gs://bqutil-lib-test/datasketches/2eaa735
..
Copying file://tuple_sketch_int64.js to gs://bqutil-lib-test/datasketches/2eaa735
  
..
Copying file://tuple_sketch_int64.wasm to gs://bqutil-lib-test/datasketches/2eaa735
....
make[1]: Leaving directory '/workspace/tuple'
make -C cpc upload
make[1]: Entering directory '/workspace/cpc'
Copying file://cpc_sketch.mjs to gs://bqutil-lib-test/datasketches/2eaa735
...
Copying file://cpc_sketch.js to gs://bqutil-lib-test/datasketches/2eaa735
..
Copying file://cpc_sketch.wasm to gs://bqutil-lib-test/datasketches/2eaa735
  
..
make[1]: Leaving directory '/workspace/cpc'
make -C hll upload
make[1]: Entering directory '/workspace/hll'
Copying file://hll_sketch.mjs to gs://bqutil-lib-test/datasketches/2eaa735
..
Copying file://hll_sketch.js to gs://bqutil-lib-test/datasketches/2eaa735
  
..
Copying file://hll_sketch.wasm to gs://bqutil-lib-test/datasketches/2eaa735
...
make[1]: Leaving directory '/workspace/hll'
make -C kll upload
make[1]: Entering directory '/workspace/kll'
Copying file://kll_sketch_float.mjs to gs://bqutil-lib-test/datasketches/2eaa735
  
..
Copying file://kll_sketch_float.js to gs://bqutil-lib-test/datasketches/2eaa735
  
..
Copying file://kll_sketch_float.wasm to gs://bqutil-lib-test/datasketches/2eaa735
..
make[1]: Leaving directory '/workspace/kll'
make -C fi upload
make[1]: Entering directory '/workspace/fi'
Copying file://fs_sketch.mjs to gs://bqutil-lib-test/datasketches/2eaa735
  
..
Copying file://fs_sketch.js to gs://bqutil-lib-test/datasketches/2eaa735
..
Copying file://fs_sketch.wasm to gs://bqutil-lib-test/datasketches/2eaa735
  
..
make[1]: Leaving directory '/workspace/fi'
make -C tdigest upload
make[1]: Entering directory '/workspace/tdigest'
Copying file://tdigest_double.mjs to gs://bqutil-lib-test/datasketches/2eaa735
..
Copying file://tdigest_double.js to gs://bqutil-lib-test/datasketches/2eaa735
  
..
Copying file://tdigest_double.wasm to gs://bqutil-lib-test/datasketches/2eaa735
..
make[1]: Leaving directory '/workspace/tdigest'
make -C req upload
make[1]: Entering directory '/workspace/req'
Copying file://req_sketch_float.mjs to gs://bqutil-lib-test/datasketches/2eaa735
..
Copying file://req_sketch_float.js to gs://bqutil-lib-test/datasketches/2eaa735
  
..
Copying file://req_sketch_float.wasm to gs://bqutil-lib-test/datasketches/2eaa735
  
..
make[1]: Leaving directory '/workspace/req'
(cd cicd && ./init_dataform.sh)
dataform run --tags "udfs"
Compiling...
Compiled successfully.

@AlexanderSaydakov
Copy link
Contributor Author

AlexanderSaydakov commented May 9, 2025

I guess that in your case the specified folder "2eaa735" does not exist. Is that right? Do you expect the copy command to create the folder?
I tested deploying into existing folder, and it works for me.

@danieldeleo
Copy link
Collaborator

I guess that in your case the specified folder "2eaa735" does not exist. Is that right? Do you expect the copy command to create the folder? I tested deploying into existing folder, and it works for me.

Correct, the specified folder doesn't exist nor should it need to since folders aren't actually real objects by default in Cloud Storage (see https://cloud.google.com/storage/docs/objects#flat-namespace for more details).

@danieldeleo
Copy link
Collaborator

/gcbrun

@AlexanderSaydakov AlexanderSaydakov changed the title use the latest release apache-datasketches-bigquery-1.1.0 use the latest release apache-datasketches-bigquery-1.1.1 May 31, 2025
@danieldeleo
Copy link
Collaborator

@AlexanderSaydakov The unit tests are not passing. There are 117 failures (I've attached a file with error details).

Were you able to make the unit tests pass in your own BigQuery project or is this the first time the unit tests are run in a BigQuery project?

bquxjob_2c49b85f_19730f101bb.json

@AlexanderSaydakov
Copy link
Contributor Author

Tests pass for me and for several others who voted on the release.

@AlexanderSaydakov
Copy link
Contributor Author

It seems to me that you did not rerun the tests after the version change to 1.1.1

@danieldeleo
Copy link
Collaborator

Tests pass for me and for several others who voted on the release.

How are you running the tests?

@danieldeleo
Copy link
Collaborator

It seems to me that you did not rerun the tests after the version change to 1.1.1

https://github.com/GoogleCloudPlatform/bigquery-utils/pull/492/checks?check_run_id=43226504649

@AlexanderSaydakov
Copy link
Contributor Author

I am running 'make', 'make install' and 'make test'

@danieldeleo
Copy link
Collaborator

danieldeleo commented Jun 2, 2025

I am running 'make', 'make install' and 'make test'

I ran these exact commands in a clean cloud shell instance for datasketches tag v1.1.1 and got the same failures.

@AlexanderSaydakov
Copy link
Contributor Author

AlexanderSaydakov commented Jun 2, 2025

I used emscripten 4.0.7 (installed by brew on MacOS).
In your log I see 4.0.9, so I upgraded to this version (brew upgrade emscripten), and now I see the same errors.
I will look into this when I have time.
In the meantime, would it be possible to pin emscripten to version 4.0.7 in your deployment?

@danieldeleo
Copy link
Collaborator

I used emscripten 4.0.7 (installed by brew on MacOS). In your log I see 4.0.9, so I upgraded to this version (brew upgrade emscripten), and now I see the same errors. I will look into this when I have time. In the meantime, would it be possible to pin emscripten to version 4.0.7 in your deployment?

I've added a not-so-pretty shim in this repo to pin emsdk to version 4.0.7 but only as a temporary workaround until apache/datasketches-bigquery#156 makes its way to the next release.

Thanks for the help debugging and finding the version issue!

@danieldeleo
Copy link
Collaborator

/gcbrun

@danieldeleo danieldeleo merged commit 0aad41a into GoogleCloudPlatform:master Jun 3, 2025
87 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants