[DO NOT MERGE!] Fix unicode entrypoint issue #254

Open
wants to merge 15 commits into
base: main
27 changes: 10 additions & 17 deletions src/installer/utils.py
@@ -65,16 +65,6 @@
"WheelFilename", ["distribution", "version", "build_tag", "tag"]
)

# Adapted from https://github.com/python/importlib_metadata/blob/v3.4.0/importlib_metadata/__init__.py#L90
_ENTRYPOINT_REGEX = re.compile(
r"""
(?P<module>[\w.]+)\s*
(:\s*(?P<attrs>[\w.]+))\s*
(?P<extras>\[.*\])?\s*$
""",
re.VERBOSE | re.UNICODE,
)

# According to https://www.python.org/dev/peps/pep-0427/#id7
SCHEME_NAMES = cast(AllSchemes, ("purelib", "platlib", "headers", "scripts", "data"))

@@ -244,16 +234,19 @@ def parse_entrypoints(text: str) -> Iterable[tuple[str, str, str, "ScriptSection

for name, value in config.items(section):
assert isinstance(name, str)
match = _ENTRYPOINT_REGEX.match(value)
assert match
assert ":" in value

module, attrs = [x.strip() for x in value.split(":", 1)]
assert all(x.isidentifier() for x in module.split(".")), (
f"{module} are not all valid identifiers"
)

module = match.group("module")
assert isinstance(module, str)
if "[" in attrs and "]" in attrs:
attrs, extras = [x.strip() for x in attrs.split("[", 1)]

attrs = match.group("attrs")
# TODO: make this a proper error, which can be caught.
assert attrs is not None
assert isinstance(attrs, str)
assert len(attrs), "Attributes are empty"
assert attrs.isidentifier(), f"{attrs} is not a valid identifier"
Member

We should raise proper exceptions, instead of assertion errors, for issues related to user-provided inputs. (The older asserts were there to guide type checkers.)
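
A minimal sketch of what this suggestion could look like, not the project's actual implementation. The names `InvalidEntryPointError` and `parse_entrypoint_value` are hypothetical, and allowing dotted attribute paths is an assumption carried over from the `[\w.]+` pattern in the removed regex.

```python
from __future__ import annotations


class InvalidEntryPointError(ValueError):
    """Hypothetical exception for malformed entry point values."""


def parse_entrypoint_value(value: str) -> tuple[str, str]:
    """Split ``module:attrs[extras]`` into ``(module, attrs)``, dropping extras."""
    module, sep, attrs = value.partition(":")
    if not sep:
        raise InvalidEntryPointError(f"missing ':' in entry point {value!r}")

    module = module.strip()
    # Extras, if present, follow the object reference in square brackets.
    attrs = attrs.split("[", 1)[0].strip()

    # str.isidentifier() follows Python's own identifier rules, so
    # non-ASCII names are accepted when they are valid identifiers.
    if not all(part.isidentifier() for part in module.split(".")):
        raise InvalidEntryPointError(f"invalid module path {module!r}")
    if not all(part.isidentifier() for part in attrs.split(".")):
        raise InvalidEntryPointError(f"invalid attribute path {attrs!r}")
    return module, attrs
```

Because the hypothetical exception subclasses `ValueError`, callers can catch it specifically or fall back to catching `ValueError`, and the checks still run under `python -O`, which strips `assert` statements.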

Author

I can do that, @pradyunsg.
By the way, is the discussion here satisfactory? In brief, they confirmed that the change proposed here is standards-compliant (more so than the original code) and noted that other tools in the ecosystem may need a similar patch for Unicode to work fully.
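
For context, a short sketch of why the removed regex rejects the Tamil entry point used in the new tests while the `isidentifier()` check accepts it. The regex is copied from the deleted lines of this diff. The explanation in the comments is my reading of Python's documented `\w` behaviour (alphanumerics plus underscore), not something stated in this thread.

```python
import re
import unicodedata

# Pattern removed by this PR.
OLD_REGEX = re.compile(
    r"""
    (?P<module>[\w.]+)\s*
    (:\s*(?P<attrs>[\w.]+))\s*
    (?P<extras>\[.*\])?\s*$
    """,
    re.VERBOSE | re.UNICODE,
)

value = "ஓர்.ஒருங்குறி:கட்டளை"
module, attrs = (part.strip() for part in value.split(":", 1))

# TAMIL SIGN VIRAMA is a combining mark (category Mn). It is not
# alphanumeric, so \w does not match it, but it may appear after the
# first character of a Python identifier.
virama = "\u0bcd"

regex_accepts = OLD_REGEX.match(value) is not None
identifier_accepts = (
    all(part.isidentifier() for part in module.split("."))
    and attrs.isidentifier()
)

print(unicodedata.category(virama))       # Mn
print(regex_accepts, identifier_accepts)  # False True
```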

Author

> We should raise proper exceptions, instead of assertion errors, for issues related to user-provided inputs. (The older asserts were there to guide type checkers.)

Is there a particular Exception class you would recommend?


script_section = cast("ScriptSection", section[: -len("_scripts")])

20 changes: 20 additions & 0 deletions tests/test_utils.py
@@ -259,6 +259,26 @@ class TestParseEntryPoints:
],
id="cli-and-gui",
),
pytest.param(
"""
[console_scripts]
நான் = ஓர்.ஒருங்குறி:கட்டளை
""",
[
("நான்", "ஓர்.ஒருங்குறி", "கட்டளை", "console"),
],
id="unicode",
),
pytest.param(
"""
[console_scripts]
நான் = ஓர்.ஒருங்குறி:கட்டளை[கூடுதல்]
""",
[
("நான்", "ஓர்.ஒருங்குறி", "கட்டளை", "console"),
],
id="unicode with extras",
),
],
)
def test_valid(self, script, expected):