Use response files for zipper arguments to prevent "Argument list too long" errors #2958

Open
@ka-higa

🚀 feature request

Relevant Rules

py_binary, py_test (from @rules_python)

Description

When building Python binaries or tests with Bazel (specifically with py_binary and py_test), we encounter an "Argument list too long" error during the build.
This happens when our projects depend on a very large number of files, particularly those from large Python libraries managed by pip_parse (e.g., boto3, msgraph-sdk-python).

The root cause seems to be that Bazel calls the zipper tool by passing all file paths to be included in the package directly as command-line arguments.

def _create_zip_file(ctx, *, output, original_nonzip_executable, zip_main, runfiles):
    """Create a Python zipapp (zip with __main__.py entry point)."""
    workspace_name = ctx.workspace_name
    legacy_external_runfiles = _py_builtins.get_legacy_external_runfiles(ctx)
    manifest = ctx.actions.args()
    manifest.use_param_file("@%s", use_always = True)
    manifest.set_param_file_format("multiline")
    manifest.add("__main__.py={}".format(zip_main.path))
    manifest.add("__init__.py=")
    manifest.add(
        "{}=".format(
            _get_zip_runfiles_path("__init__.py", workspace_name, legacy_external_runfiles),
        ),
    )
    for path in runfiles.empty_filenames.to_list():
        manifest.add("{}=".format(_get_zip_runfiles_path(path, workspace_name, legacy_external_runfiles)))

    def map_zip_runfiles(file):
        if file != original_nonzip_executable and file != output:
            return "{}={}".format(
                _get_zip_runfiles_path(file.short_path, workspace_name, legacy_external_runfiles),
                file.path,
            )
        else:
            return None

    manifest.add_all(runfiles.files, map_each = map_zip_runfiles, allow_closure = True)
    inputs = [zip_main]
    if _py_builtins.is_bzlmod_enabled(ctx):
        zip_repo_mapping_manifest = ctx.actions.declare_file(
            output.basename + ".repo_mapping",
            sibling = output,
        )
        _py_builtins.create_repo_mapping_manifest(
            ctx = ctx,
            runfiles = runfiles,
            output = zip_repo_mapping_manifest,
        )
        manifest.add("{}/_repo_mapping={}".format(
            _ZIP_RUNFILES_DIRECTORY_NAME,
            zip_repo_mapping_manifest.path,
        ))
        inputs.append(zip_repo_mapping_manifest)
    for artifact in runfiles.files.to_list():
        # Don't include the original executable because it isn't used by the
        # zip file, so no need to build it for the action.
        # Don't include the zipfile itself because it's an output.
        if artifact != original_nonzip_executable and artifact != output:
            inputs.append(artifact)
    zip_cli_args = ctx.actions.args()
    zip_cli_args.add("cC")
    zip_cli_args.add(output)
    ctx.actions.run(
        executable = ctx.executable._zipper,
        arguments = [zip_cli_args, manifest],
        inputs = depset(inputs),
        outputs = [output],
        use_default_shell_env = True,
        mnemonic = "PythonZipper",
        progress_message = "Building Python zip: %{label}",
    )

This leads to the argument list exceeding the operating system's ARG_MAX limit.
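
To make the size problem concrete, here is a rough Python sketch (not part of rules_python) that estimates how many bytes a list of zipper-style `dest=src` entries would occupy on a command line and compares that with the limit the OS reports. The entry paths and the count of 200,000 files are made up for illustration, and the estimate ignores environment variables and pointer overhead, which also count against the limit on most systems.

```python
import os


def estimated_argv_bytes(args):
    """Bytes needed for the argument strings, each followed by a NUL terminator."""
    return sum(len(arg.encode("utf-8")) + 1 for arg in args)


def arg_max():
    """Return the limit reported by the OS, or None if it cannot be queried."""
    try:
        limit = os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is unavailable on some platforms (e.g. Windows).
        return None
    return limit if limit > 0 else None


# Hypothetical manifest entries in the "path_in_zip=path_on_disk" form.
entries = [
    "runfiles/ws/pkg/module_{:06d}.py=bazel-out/bin/pkg/module_{:06d}.py".format(i, i)
    for i in range(200000)
]

size = estimated_argv_bytes(entries)
limit = arg_max()
if limit is None:
    print("argument bytes:", size, "(ARG_MAX not available on this platform)")
else:
    print("argument bytes:", size, "ARG_MAX:", limit, "exceeds:", size > limit)
```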

This error makes our Bazel builds unstable and undermines the reliability of our CI/CD pipelines for large Python projects.

Describe the solution you'd like

We propose enhancing rules_python to always pass arguments to the zipper tool via a temporary response file, rather than directly on the command line.

From inspecting zipper's zip_main.cc source code, the tool appears to support reading arguments from a file using the @ syntax: it contains logic that processes arguments starting with @.

By consistently utilizing this response file capability, rules_python can entirely bypass OS ARG_MAX limitations.
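
The response-file idea can be sketched in plain Python as follows. This is only an illustration of the general technique under stated assumptions: `write_response_file` and `expand_response_files` are hypothetical helpers, the one-entry-per-line layout mirrors the "multiline" param file format, and the expansion logic is a simplified stand-in for whatever zipper actually does when it sees an `@` argument.

```python
import os
import tempfile


def write_response_file(entries, directory):
    """Write one entry per line and return the path of the file."""
    fd, path = tempfile.mkstemp(suffix=".params", dir=directory, text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(entry + "\n")
    return path


def expand_response_files(argv):
    """Replace each '@file' argument with the non-empty lines of that file."""
    expanded = []
    for arg in argv:
        if arg.startswith("@"):
            with open(arg[1:], encoding="utf-8") as f:
                expanded.extend(line.rstrip("\n") for line in f if line.strip())
        else:
            expanded.append(arg)
    return expanded


# Hypothetical manifest entries; a real build could have many thousands.
entries = [
    "__main__.py=pkg/main.py",
    "__init__.py=",
    "runfiles/ws/lib.py=pkg/lib.py",
]

with tempfile.TemporaryDirectory() as tmp:
    params = write_response_file(entries, tmp)
    # The command line stays three arguments long regardless of len(entries).
    argv = ["cC", "out.zip", "@" + params]
    roundtrip = expand_response_files(argv)

print(len(argv), "arguments on the command line,", len(roundtrip), "after expansion")
```

The point of the design is that the command line length becomes independent of the number of files: only the path of the response file is passed, and the tool reads the entries itself.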
