
properly define what "canonical" means in os.path.realpath #134639


Open
calestyo opened this issue May 24, 2025 · 2 comments · May be fixed by #134755
Labels
docs Documentation in the Doc dir

Comments

@calestyo
Contributor

calestyo commented May 24, 2025

Documentation

The documentation says:

Return the canonical path of the specified filename, eliminating any symbolic
links encountered in the path (if they are supported by the operating
system). On Windows, this function will also resolve MS-DOS (also called 8.3)
style names such as ``C:\\PROGRA~1`` to ``C:\\Program Files``.

So it only mentions symlink resolution, but not what else "canonical" means (if anything).

It does, however, mention that whatever it does is OS-dependent:

This function emulates the operating system's procedure for making a path
canonical, which differs slightly between Windows and UNIX with respect
to how links and subsequent path components interact.
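On POSIX, the "how links and subsequent path components interact" part can be seen with a `..` that follows a symlink: `realpath()` follows the link first and then goes up from the link's target, whereas the purely lexical `normpath()` simply drops the `link/..` pair. A minimal sketch, assuming a POSIX system where symlinks can be created in a temporary directory; the names `a`, `b`, `target`, `link` and `compare_link_parent` are made up for illustration, and the Windows side is not covered.

```python
import os
import tempfile


def compare_link_parent():
    """Return (normpath, realpath) of <tmp>/link/.., each relative to <tmp>."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = os.path.realpath(tmp)  # the temp dir itself may sit behind a symlink
        target = os.path.join(tmp, "a", "b", "target")
        os.makedirs(target)
        link = os.path.join(tmp, "link")
        os.symlink(target, link)  # <tmp>/link -> <tmp>/a/b/target

        p = os.path.join(link, "..")
        lexical = os.path.normpath(p)   # "link/.." simply cancels out
        resolved = os.path.realpath(p)  # follows the link first, then goes up
        return os.path.relpath(lexical, tmp), os.path.relpath(resolved, tmp)


print(compare_link_parent())  # ('.', 'a/b') on POSIX
```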

IMO, it should be clearly documented what it does, or at least:

  • what it guarantees at the least (for example, will the pathname be absolute? Will it be normalised?)
    and/or:
  • which OS function it is equivalent to (on POSIX that would probably be realpath()), and whether there are any differences from it
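One way to probe the second question on a given system is to compare `os.path.realpath()` with the C library's `realpath(3)` directly. The sketch below does that via `ctypes`; it assumes a POSIX system (e.g. Linux with glibc) where `ctypes.CDLL(None)` exposes libc's `realpath()` and `free()`, and `libc_realpath` is a hypothetical helper name. Agreement on a few paths is only evidence, not a guarantee; for instance, with the default `strict=False`, `os.path.realpath()` accepts non-existent paths, whereas `realpath(3)` fails on them.

```python
import ctypes
import os
import tempfile

# Assumption: POSIX system where CDLL(None) gives access to libc symbols.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.realpath.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
_libc.realpath.restype = ctypes.c_void_p  # raw pointer, so it can be passed to free()
_libc.free.argtypes = [ctypes.c_void_p]
_libc.free.restype = None


def libc_realpath(path):
    """Hypothetical helper: call realpath(path, NULL) and decode the result."""
    ptr = _libc.realpath(os.fsencode(path), None)
    if not ptr:  # NULL means failure; errno says why (e.g. ENOENT)
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    try:
        return os.fsdecode(ctypes.string_at(ptr))
    finally:
        _libc.free(ptr)  # realpath(path, NULL) returns malloc()ed memory


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "target")
    os.mkdir(target)
    link = os.path.join(tmp, "link")
    os.symlink(target, link)
    for p in (tmp, link, tmp + "//target/.", os.path.join(link, "..")):
        # True wherever os.path.realpath() agrees with realpath(3)
        print(p, os.path.realpath(p) == libc_realpath(p))
```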

Especially since, IMO, a "canonical" pathname (though I think the term has no formal definition; or is there one in POSIX?) merely means that no symlinks are left, i.e. that the file is "reached" by its true (= canonical) name.

But that could still be a relative pathname, and perhaps even one that is not normalised.

Cheers,
Chris.


@calestyo calestyo added the docs Documentation in the Doc dir label May 24, 2025
@terryjreedy
Member

abspath returns "a normalized absolutized version of the pathname", implying that there can be more than one for a given file. I understand realpath to return the canonical abspath, so that path-a and path-b are the same file iff realpath(path-a) == realpath(path-b).
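The property described above can be illustrated with a symlink: `os.path.samefile()` compares the two files by device and inode number, while comparing `realpath()` results maps both names to the same string. A minimal sketch, assuming a POSIX system where symlinks can be created; the file names and the helper `same_by_realpath` are hypothetical.

```python
import os
import tempfile


def same_by_realpath(a, b):
    """Hypothetical helper: compare two paths by their canonical form."""
    return os.path.realpath(a) == os.path.realpath(b)


with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "data.txt")
    open(real, "w").close()
    alias = os.path.join(tmp, "alias.txt")
    os.symlink(real, alias)  # alias.txt -> data.txt
    other = os.path.join(tmp, "other.txt")
    open(other, "w").close()

    print(same_by_realpath(real, alias), os.path.samefile(real, alias))  # True True
    print(same_by_realpath(real, other), os.path.samefile(real, other))  # False False
```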

@calestyo
Contributor Author

calestyo commented May 24, 2025

Hmm. I think even abspath will always return the same result (for the same string), so the "a" is perhaps a bit misleading too, as there is only one possible result.

It does however not resolve symbolic links, so the result may simply point to another pathname (as documented).
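The difference between the two functions shows up as soon as a symlink is involved: `abspath()` only joins the path with the current directory and normalises the string, so the link's own name is kept, while `realpath()` replaces it with the target's path. A minimal sketch, assuming a POSIX system; the file names and the helper `abspath_vs_realpath` are illustrative.

```python
import os
import tempfile


def abspath_vs_realpath():
    """Return (abspath, realpath) of ./link.txt, each relative to a temp dir."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = os.path.realpath(tmp)  # the temp dir itself may sit behind a symlink
        target = os.path.join(tmp, "target.txt")
        open(target, "w").close()
        os.symlink(target, os.path.join(tmp, "link.txt"))  # link.txt -> target.txt

        old_cwd = os.getcwd()
        os.chdir(tmp)
        try:
            # abspath(): join with the cwd and normalise the string; links are kept
            a = os.path.abspath("./link.txt")
            # realpath(): additionally replaces the symlink by its target
            r = os.path.realpath("./link.txt")
            return os.path.relpath(a, tmp), os.path.relpath(r, tmp)
        finally:
            os.chdir(old_cwd)  # leave the directory before it is deleted


print(abspath_vs_realpath())  # ('link.txt', 'target.txt')
```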

Your example holds, though IMO "canonical" does not say anything about whether the resulting paths are relative (I would, however, say the word implies normalised).

In your example, assuming a cwd of /home, a path-a of /tmp and a path-b of ../tmp, the equality would still hold if realpath returned ../tmp or /../tmp.

From my testing, realpath always returns absolute and normalised pathnames (in addition to resolving any symlinks, which is already documented).
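These observed properties can be spot-checked mechanically: an absolute result satisfies `os.path.isabs()`, and a normalised one is a fixed point of `os.path.normpath()`. This is only a sketch of such a test on POSIX with a handful of made-up inputs (most of which need not exist, since `strict=False` is the default), not a proof of the guarantee; `looks_canonical` is a hypothetical helper.

```python
import os


def looks_canonical(p):
    """Hypothetical check: absolute, and unchanged by lexical normalisation."""
    return os.path.isabs(p) and os.path.normpath(p) == p


samples = [".", "..", "a/../b", "./x//y/", "/usr/./bin/../lib", "no/such/dir/.."]
for s in samples:
    r = os.path.realpath(s)
    print(f"{s!r:24} -> {r!r}  canonical-looking: {looks_canonical(r)}")
```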


What I'm a bit unsure about how to deal with is, that it also normalises absolute pathnames with exactly two leading /:

>>> os.path.realpath("//tmp",strict=True)
'/tmp'

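In pure POSIX, this is IMO wrong, as a pathname beginning with exactly two slashes may be interpreted in an implementation-defined manner (though Linux, and perhaps all other systems, treats it like a plain /).

Some parts of the Python standard library do in fact account for this, e.g. pathlib: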
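Within `os.path` itself, the treatment of a leading `//` is not uniform on POSIX: `normpath()` (and therefore `abspath()`) keeps exactly two leading slashes, while `realpath()` collapses them. A small sketch using a made-up, non-existent directory name so the result does not depend on the local file system (it relies on the default `strict=False`).

```python
import os

p = "//no-such-dir-for-demo"
print(os.path.normpath(p))        # //no-such-dir-for-demo  (exactly two slashes are kept)
print(os.path.normpath("/" + p))  # /no-such-dir-for-demo   (three or more collapse to one)
print(os.path.abspath(p))         # //no-such-dir-for-demo
print(os.path.realpath(p))        # /no-such-dir-for-demo
```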

>>> import pathlib
>>> pathlib.Path("//tmp").parts
('//', 'tmp')

>>> pathlib.Path("/tmp").parts
('/', 'tmp')

>>> pathlib.Path("///tmp").parts
('/', 'tmp')

So I'm tempted to say that os.path.realpath("//tmp",strict=True) should yield '//tmp'.

Unless anyone strongly disagrees, I'd open a separate issue for that particular aspect.

ashm-dev added a commit to ashm-dev/cpython that referenced this issue May 26, 2025
Improve documentation for os.path.realpath by clearly defining what a
"canonical path" means. The updated documentation now explicitly states
that a canonical path:
- Is an absolute path
- Has all symbolic links resolved
- Is normalized (redundant separators, '.' and '..' components removed)

Also clarify platform-specific behavior:
- On Windows: resolves MS-DOS (8.3) style names and junction points
- On POSIX: roughly equivalent to the system's realpath() function

Closes python#134639
Projects
Status: Todo

2 participants