Skip to content

chore(devcontainer): use debian's protobuf-compiler package #16687

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

fvj
Copy link
Contributor

@fvj fvj commented Jul 4, 2025

Which issue does this PR close?

None. I didn't find any issue for this problem but it's too small of a change for an issue, in my opinion. Happy to retroactively create one, though.

Rationale for this change

unfortunately, the current Dockerfile does not set the PATH variable correctly. the RUN directive is evaluated at build time. thus, the $HOME variable is resolved correctly to /root/ when installing the protoc binary; the binary ends up in /root/.local/bin/protoc. in contrast, an ENV directive is evaluated at a container's runtime, so if $HOME is not correctly set at runtime, the PATH addition $HOME/.local/bin resolves to /.local/bin, which does not exist.

What changes are included in this PR?

  • change the way the devcontainer's Docker image installs protoc from "from-source" to using the base image's (debian bookworm) package repository, in line with [1].
  • downgrade from protoc v3.25.1 to v3.21.12. the docs [1] say any version above v3.15 is fine.

[1] https://datafusion.apache.org/contributor-guide/development_environment.html#protoc-installation

Are these changes tested?

Yes. I tested the container to check if the protoc binary is found:

$ docker build . -t datafusion-devcontainer:latest &>/dev/null && \ 
  docker run --rm -it datafusion-devcontainer:latest protoc --version
libprotoc 3.21.12

Additionally, I used the devcontainer to build DataFusion inside the devcontainer.

Are there any user-facing changes?

No, none.

unfortunately, the current `Dockerfile` does not set the `PATH` variable
correctly. the `RUN` directive is evaluated at build time. thus, the
`$HOME` variable is resolved correctly to `/root/` when
installing the `protoc` binary; the binary ends up in
`/root/.local/bin/protoc`.
in contrast, an `ENV` directive is evaluated at a container's runtime,
so if `$HOME` is not correctly set at runtime, the `PATH` addition
`$HOME/.local/bin` resolves to `/.local/bin`, which does not exist.
since the linked docs now also recommend to use a package manager
to install the `protoc` compiler, the `Dockerfile` does the same now.

at the time of writing, bookworm's version is `libprotoc 3.21.12`,
which is higher than the required version of `3.15`.
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this @fvj

I verified that trying to build datafusion in the current dev contianer did not work. For anyone who tries this, here is the error I got:

Caused by:
  process didn't exit successfully: `/datafusion/target/debug/build/substrait-d198aa99ad943137/build-script-build` (exit status: 1)
  --- stdout
  cargo:rerun-if-env-changed=FORCE_REBUILD
  cargo:rerun-if-changed=substrait
  cargo:rerun-if-changed=substrait/text/simple_extensions_schema.yaml
  cargo:rerun-if-changed=substrait/proto/substrait/type.proto
  cargo:rerun-if-changed=substrait/proto/substrait/parameterized_types.proto
  cargo:rerun-if-changed=substrait/proto/substrait/extensions/extensions.proto
  cargo:rerun-if-changed=substrait/proto/substrait/type_expressions.proto
  cargo:rerun-if-changed=substrait/proto/substrait/plan.proto
  cargo:rerun-if-changed=substrait/proto/substrait/capabilities.proto
  cargo:rerun-if-changed=substrait/proto/substrait/extended_expression.proto
  cargo:rerun-if-changed=substrait/proto/substrait/function.proto
  cargo:rerun-if-changed=substrait/proto/substrait/algebra.proto

  --- stderr
  Error: Custom { kind: NotFound, error: "Could not find `protoc`. If `protoc` is installed, try setting the `PROTOC` environment variable to the path of the `protoc` binary. To install it on Debian, run `apt-get install protobuf-compiler`. It is also available at https://github.com/protocolbuffers/protobuf/releases  For more information: https://docs.rs/prost-build/#sourcing-protoc" }

However, even with your change the build still fails for me like this

error: failed to run custom build command for `substrait v0.58.0`

Caused by:
  process didn't exit successfully: `/datafusion/target/docker2/debug/build/substrait-3971c227a7ff9527/build-script-build` (exit status: 1)
  --- stdout
  cargo:rerun-if-env-changed=FORCE_REBUILD
  cargo:rerun-if-changed=substrait
  cargo:rerun-if-changed=substrait/text/simple_extensions_schema.yaml
  cargo:rerun-if-changed=substrait/proto/substrait/type.proto
  cargo:rerun-if-changed=substrait/proto/substrait/parameterized_types.proto
  cargo:rerun-if-changed=substrait/proto/substrait/extensions/extensions.proto
  cargo:rerun-if-changed=substrait/proto/substrait/type_expressions.proto
  cargo:rerun-if-changed=substrait/proto/substrait/plan.proto
  cargo:rerun-if-changed=substrait/proto/substrait/capabilities.proto
  cargo:rerun-if-changed=substrait/proto/substrait/extended_expression.proto
  cargo:rerun-if-changed=substrait/proto/substrait/function.proto
  cargo:rerun-if-changed=substrait/proto/substrait/algebra.proto

  --- stderr
  Error: Custom { kind: Other, error: "protoc failed: google/protobuf/empty.proto: File not found.\nsubstrait/type.proto:6:1: Import \"google/protobuf/empty.proto\" was not found or had errors.\nsubstrait/type.proto:234:7: \"google.protobuf.Empty\" is not defined.\n" }
warning: build failed, waiting for other jobs to finish...

did you find some way to work around that?

@fvj
Copy link
Contributor Author

fvj commented Jul 5, 2025

Weird, it worked on my machine (tm). Let me investigate! Should I close the PR for now and reopen when I fixed?

edit: I think I had a dirty devcontainer running when testing. will ping again once confirmed it works on a clean one.

@fvj
Copy link
Contributor Author

fvj commented Jul 5, 2025

latest patch should include the missing files:

$ docker build . -t datafusion-devcontainer:latest &>/dev/null && docker run --rm -it datafusion-devcontainer:latest bash -c "ls /usr/include/google/protobuf/*.proto"
/usr/include/google/protobuf/any.proto	      /usr/include/google/protobuf/empty.proto		 /usr/include/google/protobuf/timestamp.proto
/usr/include/google/protobuf/api.proto	      /usr/include/google/protobuf/field_mask.proto	 /usr/include/google/protobuf/type.proto
/usr/include/google/protobuf/descriptor.proto  /usr/include/google/protobuf/source_context.proto  /usr/include/google/protobuf/wrappers.proto
/usr/include/google/protobuf/duration.proto    /usr/include/google/protobuf/struct.proto

the bookworm package protobuf-compiler, in contrast to ubuntu's, does not include google's common protobuf files. I've added libprotobuf-dev, which contains them (in addition to prebuilt header files?) as a dependency.

@fvj fvj requested a review from alamb July 5, 2025 13:23
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @fvj -- I tried it locally now and everything seems to work great

cd .devcontainer/
docker build . -t datafusion-devcontainer:latest
docker run -v `pwd`/..:/datafusion --rm -it datafusion-devcontainer:latest bash

And then in the container

root@c32993dd1926:/# cd datafusion/
root@c32993dd1926:/datafusion# CARGO_TARGET_DIR=`pwd`/target/linux cargo run -p datafusion-cli

@alamb
Copy link
Contributor

alamb commented Jul 8, 2025

🚀

@alamb alamb merged commit a17292d into apache:main Jul 8, 2025
27 checks passed
@alamb
Copy link
Contributor

alamb commented Jul 8, 2025

Thanks again @fvj

@fvj fvj deleted the chore/jvf/devcontainer-fix-protoc-installation branch July 8, 2025 22:10
@fvj
Copy link
Contributor Author

fvj commented Jul 8, 2025

you're very welcome. I hope I can contribute more valuable stuff in the future :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants