-
Notifications
You must be signed in to change notification settings - Fork 344
Eng/pr 84049330 #3482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: experimental/cas/main
Are you sure you want to change the base?
Eng/pr 84049330 #3482
Conversation
llvm/tools/llvm-cas/llvm-cas.cpp
Outdated
Optional<PrefixMapper> PM; | ||
if (!Mappings.empty()) { | ||
PM.emplace(Saver); | ||
PM->addRange(Mappings); | ||
PM->sort(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PrefixMapper won't handle symlinks correctly, and it's also necessarily safe to sort. Can you add a FIXME to use RealPathPrefixMapper, and also explain in the FIXME why it didn't work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know why PrefixMapper doesn't work on symlinks correctly with CachingOnDiskFileSystem? The testcase shows it works fine so it would be good if you have some counter example on your mind. Looking at the clang depscan implementation, it is basically using PrefixMapper, not RealPathPrefixMapper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why RealPathPrefixMapper
doesn't work. Imagine following test case:
/symlink -> /dir
/dir/file
When building the tree in CAS, realPath(symlink)
give you dir, so you try to write a symlink node at path dir
, then you cannot write /dir/file to the tree since /dir is node now, not a tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the clang depscan implementation, it is basically using PrefixMapper, not RealPathPrefixMapper.
I copied this logic from updateCompilerInvocation() in MappingHelper.h (still need to go back and refactor that to use RealPathPrefixMapper).
In updateCompilerInvocation(), getRealPath()
is called in the remap
lambda. The remap
lambda is used by all the other remap logic.
Is RealPathPrefixMapper doing something different from that? If so, maybe there was a bug when I extracted the logic.
Do you know why PrefixMapper doesn't work on symlinks correctly with CachingOnDiskFileSystem?
No, I don't remember. We should probably work through some specific cases and figure out desired outcomes. E.g.,
symlink/=new/
wheresymlink -> old
; shouldold/X
get remapped? (I think so...)old/=new/
wheresymlink -> old
; shouldsymlink
be left alone, or be updated to targetnew
?old/symlink -> ../escape
andold/=new/
; what shouldold/symlink/file
get remapped to?old/symlink -> file
andold/=new/
; what shouldold/symlink
get remapped to?
Why
RealPathPrefixMapper
doesn't work. Imagine following test case:/symlink -> /dir /dir/file
When building the tree in CAS,
realPath(symlink)
give you dir, so you try to write a symlink node at pathdir
, then you cannot write /dir/file to the tree since /dir is node now, not a tree.
What are the prefix maps in your test case? (Without the missing input I'm not sure how to think through the algorithm.)
I'm not doubting that there's a problem BTW; but if this is using different logic from tablegen (which uses RealPathPrefixMapper) and clang (which uses equivalent logic) because the logic there is buggy, then I'd like to figure out exactly what the bug is and fix it (or add clear/precise FIXMEs), not just work around the problem in llvm-cas
.
It could be the interaction between the remapper and the CachingOnDiskFileSystem is insufficient right now. Or maybe the getRealPath
logic should be deleted from clang and tablegen. I just don't feel comfortable diverging with no FIXMEs documenting why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I remember the primary purpose of the getRealPath()
. I think that without that, it's not really safe to sort the prefixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can simply reproduce it with the test case I added in this commit (replacing with RealPathPrefixMapper). I guess it is probably clang/tablegen do not have a direct access to the symlink that is a directory so it was never a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I remember the primary purpose of the
getRealPath()
. I think that without that, it's not really safe to sort the prefixes.
Sorting the prefixes allows both of these to work:
--prefix-map=path/to/source/build=/^build --prefix-map=path/to/source=/^source
(build nested in source)--prefix-map=path/to/build=/^build --prefix-map=path/to/build/source=/^source
(source nested in build)
I.e., build and source can be nested inside each other, and the tool passing--prefix-map
doesn't need to know about it and can specify in either order.
More specifically, the original motivation for calling getRealPath()
in remap
in MappingHelper.h was to support the implementation of -DLLVM_EXPERIMENTAL_ENABLE_DEPSCAN=ON
, which passes -fdepscan-prefix-map
in a fixed order:
append_if(SUPPORTS_DEPSCAN "-fdepscan-prefix-map-sdk=/^sdk" CMAKE_C_FLAGS CMAKE_CXX_FLAGS)
append_if(SUPPORTS_DEPSCAN "-fdepscan-prefix-map-toolchain=/^toolchain" CMAKE_C_FLAGS CMAKE_CXX_FLAGS)
append_if(SUPPORTS_DEPSCAN "-fdepscan-prefix-map=${source_root}=/^source" CMAKE_C_FLAGS CMAKE_CXX_FLAGS)
append_if(SUPPORTS_DEPSCAN "-fdepscan-prefix-map=${CMAKE_BINARY_DIR}=/^build" CMAKE_C_FLAGS CMAKE_CXX_FLAGS)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking in more detail at Clang:
computeFullMapping()
(which interprets-fdepscan-prefix-map*
) usesgetRealPath()
to set up the prefixes (like RealPathPrefixMapper).cc1depscand_main.cpp
's call togetDependencyTreeFromCompilerInvocation()
passes the plain-oldremapPath()
toCachingOnDiskFileSystem::createTreeFromNewAccesses()
, but:CachingOnDiskFileSystem::createTreeFromNewAccesses()
callsgetRealPath()
on all the entries. It uses the RemapPath callback on the result ofgetRealPath()
before pushing them to the builder. This probably meansRealPathPrefixMapper
andPrefixMapper
would be equivalent here, and the only difference is in the setup of the prefixes.
updateCompileInvocation()
(which updates the-cc1
) usesgetRealPath()
, as I described above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more clarification after looking deeper:
CachingOnDiskFileSystem::createTreeFromNewAccesses()
callsgetRealPath()
on all the entries.
It calls DirectoryEntry::getRealPath()
, which is a no-follow-symlink version of getRealPath()
.
- This is the actual path in the filesystem graph to this
DirectoryEntry
. - A better name would be
getDirectPath()
orgetActualPath()
, maybe. - Below, I'll call this
getDirectPath()
. - Outside of CachingOnDiskFileSystem, this could be computed by calling
FS.getRealPath()
on the parent directory of an object, then re-appending the object.
Here's an idea I have for a consistent model:
- To set up the prefix map, each prefix goes through
getDirectPath()
logic. Looking at examples where we have/path/to/symlink -> target
:/path/to/symlink/file=/new
should be stored as/path/to/target/file=/new
(remap symlink's target)/path/to/symlink/=/new
should be stored as/path/to/target=/new
(symlink dereferenced, so remap its target)/path/to/symlink=/new
should be stored as/path/to/symlink=/new
(mapping the symlink itself!)
- Improve prefix-mapping to remap relative paths as relative paths (when possible), taking both a path and a context-dir. E.g., assuming input path is "target" and context-dir is "/path/to", then:
- For the prefix map
/path/to/target=/new
, mapped path is/new
- For the prefix map
/path/to/target=/path/to/new
, mapped path isnew
. - For the prefix map
/path/to=/new
, mapped path istarget
.
- For the prefix map
- In
createTreeFromNewAccesses()
.- Use
getDirectPath()
(as now) to compare against the prefixes. - Symlink targets should be remapped, using the new relative-path aware remapping. E.g., assuming:
/path/to/symlink -> target
- For the prefix map
/path/to/target=/new
, the remapped symlink should be/path/to/symlink -> /new
- For the prefix map
/path/to=/new
, the remapped symlink should be/new/symlink -> target
- For the prefix map
- Use
- Other uses of the prefix map should the same logic.
- Compare against prefixes using the "direct path".
- Spit out a relative path when possible.
WDYT of the model?
In terms of implementation, probably we need some help from CachingOnDiskFileSystem to make the "relative path" adjustment and "getdirectpath" bits efficient. I am also not sure the "relative path" bits are necessary; could just be an aesthetic choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can simply reproduce it with the test case I added in this commit (replacing with RealPathPrefixMapper). I guess it is probably clang/tablegen do not have a direct access to the symlink that is a directory so it was never a problem?
Well, clang doesn't use it yet, and tablegen never sees symlinks I think.
BTW, in case it wasn't clear above, happy to approve this once there are good FIXMEs describing why RealPathPrefixMapper doesn't work. I suggest:
- Add a FIXME to the
RealPathPrefixMapper
header docs explaining in precise language what goes wrong. - Add a FIXME to here, your use of
PrefixMapper
, explaining that RealPathPrefixMapper is broken for symlinks and pointing at it.
@cachemeifyoucan , I just sent you a PR for a different utility that I'm hoping will work: #3496 Once you commit, I can hopefully resolve the FIXMEs using that. Or, I can push first and you can rebase on top. |
Teach `llvm-cas` to canonicalize the file path using a prefix map. rdar://84049330
a5c7cb4
to
07d464a
Compare
Add
--prefix-map
option tollvm-cas
tool for file system ingestion.