-
Notifications
You must be signed in to change notification settings - Fork 382
.glob("**/filename")
returns incorrect results
#1380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
This should be fixed as explained above. However, I was thinking of changing the URL scheme (or adding an alternative) as /user-or-org/repo/branch-tag-or-commit/path/file rather than using special characters (:, @) and/or kwargs. This is long-winded, but given dirFS, we can always make a sub filesystem at a specific repo/version without dealing with complex URL deconstruction. It would also fix this specific issue for githubFS. |
I thought glob.glob doesn't support "**" at all? |
I used githubFS for the sake of example - this bug was initially spotted here. Since there are many more such filesystems (I assume), fixing the conversion in
It does with |
Actually, no. Most (all?) filesystems not requiring an initial "/" are the remote key-value stores (s3fs, gcsfs, abfs), where you are not permitted to do a glob/find at the top level, because that would imply considering all files in all buckets. |
But this is allowed for the archive filesystems such as |
Opened a PR that relies on |
Thanks, will look. |
Reproducer (requires
fsspec>=2023.9.0
):**/
is converted to.*
in theAbstactFileSystem.glob
, so the result contains filenames that don't start witheval
, making it incorrect.This bug can only be reproduced on filesystems that don't resolve the glob path internally, as prefixing the path with
<base_path>/
leads to/**
taking precedence over**/
in the "glob to regex" conversion, which stops**/
from being replaced by.*
.Maybe the most robust solution would be to align the implementation with
glob.glob
(behaves as expected in this instance) to stop such inconsistencies from happening in the future. The non-posix behavior is error-prone, so dropping it should be fine, no? I'm happy to submit a PR if this sounds good to you (based on python/cpython#106703).The text was updated successfully, but these errors were encountered: