Skip to content

Eng/pr 84049330 #3482

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: experimental/cas/main
Choose a base branch
from

Conversation

cachemeifyoucan
Copy link

@cachemeifyoucan cachemeifyoucan commented Oct 28, 2021

Add --prefix-map option to llvm-cas tool for file system ingestion.

Comment on lines 462 to 451
Optional<PrefixMapper> PM;
if (!Mappings.empty()) {
PM.emplace(Saver);
PM->addRange(Mappings);
PM->sort();
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PrefixMapper won't handle symlinks correctly, and it's also necessarily safe to sort. Can you add a FIXME to use RealPathPrefixMapper, and also explain in the FIXME why it didn't work?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know why PrefixMapper doesn't work on symlinks correctly with CachingOnDiskFileSystem? The testcase shows it works fine so it would be good if you have some counter example on your mind. Looking at the clang depscan implementation, it is basically using PrefixMapper, not RealPathPrefixMapper.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why RealPathPrefixMapper doesn't work. Imagine following test case:

/symlink -> /dir
/dir/file

When building the tree in CAS, realPath(symlink) give you dir, so you try to write a symlink node at path dir, then you cannot write /dir/file to the tree since /dir is node now, not a tree.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the clang depscan implementation, it is basically using PrefixMapper, not RealPathPrefixMapper.

I copied this logic from updateCompilerInvocation() in MappingHelper.h (still need to go back and refactor that to use RealPathPrefixMapper).

In updateCompilerInvocation(), getRealPath() is called in the remap lambda. The remap lambda is used by all the other remap logic.

Is RealPathPrefixMapper doing something different from that? If so, maybe there was a bug when I extracted the logic.

Do you know why PrefixMapper doesn't work on symlinks correctly with CachingOnDiskFileSystem?

No, I don't remember. We should probably work through some specific cases and figure out desired outcomes. E.g.,

  • symlink/=new/ where symlink -> old; should old/X get remapped? (I think so...)
  • old/=new/ where symlink -> old; should symlink be left alone, or be updated to target new?
  • old/symlink -> ../escape and old/=new/; what should old/symlink/file get remapped to?
  • old/symlink -> file and old/=new/; what should old/symlink get remapped to?

Why RealPathPrefixMapper doesn't work. Imagine following test case:

/symlink -> /dir
/dir/file

When building the tree in CAS, realPath(symlink) give you dir, so you try to write a symlink node at path dir, then you cannot write /dir/file to the tree since /dir is node now, not a tree.

What are the prefix maps in your test case? (Without the missing input I'm not sure how to think through the algorithm.)

I'm not doubting that there's a problem BTW; but if this is using different logic from tablegen (which uses RealPathPrefixMapper) and clang (which uses equivalent logic) because the logic there is buggy, then I'd like to figure out exactly what the bug is and fix it (or add clear/precise FIXMEs), not just work around the problem in llvm-cas.

It could be the interaction between the remapper and the CachingOnDiskFileSystem is insufficient right now. Or maybe the getRealPath logic should be deleted from clang and tablegen. I just don't feel comfortable diverging with no FIXMEs documenting why.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I remember the primary purpose of the getRealPath(). I think that without that, it's not really safe to sort the prefixes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simply reproduce it with the test case I added in this commit (replacing with RealPathPrefixMapper). I guess it is probably clang/tablegen do not have a direct access to the symlink that is a directory so it was never a problem?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I remember the primary purpose of the getRealPath(). I think that without that, it's not really safe to sort the prefixes.

Sorting the prefixes allows both of these to work:

  • --prefix-map=path/to/source/build=/^build --prefix-map=path/to/source=/^source (build nested in source)
  • --prefix-map=path/to/build=/^build --prefix-map=path/to/build/source=/^source (source nested in build)
    I.e., build and source can be nested inside each other, and the tool passing --prefix-map doesn't need to know about it and can specify in either order.

More specifically, the original motivation for calling getRealPath() in remap in MappingHelper.h was to support the implementation of -DLLVM_EXPERIMENTAL_ENABLE_DEPSCAN=ON, which passes -fdepscan-prefix-map in a fixed order:

append_if(SUPPORTS_DEPSCAN "-fdepscan-prefix-map-sdk=/^sdk" CMAKE_C_FLAGS CMAKE_CXX_FLAGS)
append_if(SUPPORTS_DEPSCAN "-fdepscan-prefix-map-toolchain=/^toolchain" CMAKE_C_FLAGS CMAKE_CXX_FLAGS)
append_if(SUPPORTS_DEPSCAN "-fdepscan-prefix-map=${source_root}=/^source" CMAKE_C_FLAGS CMAKE_CXX_FLAGS)
append_if(SUPPORTS_DEPSCAN "-fdepscan-prefix-map=${CMAKE_BINARY_DIR}=/^build" CMAKE_C_FLAGS CMAKE_CXX_FLAGS)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking in more detail at Clang:

  • computeFullMapping() (which interprets -fdepscan-prefix-map*) uses getRealPath() to set up the prefixes (like RealPathPrefixMapper).
  • cc1depscand_main.cpp's call to getDependencyTreeFromCompilerInvocation() passes the plain-old remapPath() to CachingOnDiskFileSystem::createTreeFromNewAccesses(), but:
    • CachingOnDiskFileSystem::createTreeFromNewAccesses() calls getRealPath() on all the entries. It uses the RemapPath callback on the result of getRealPath() before pushing them to the builder. This probably means RealPathPrefixMapper and PrefixMapper would be equivalent here, and the only difference is in the setup of the prefixes.
  • updateCompileInvocation() (which updates the -cc1) uses getRealPath(), as I described above.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more clarification after looking deeper:

CachingOnDiskFileSystem::createTreeFromNewAccesses() calls getRealPath() on all the entries.

It calls DirectoryEntry::getRealPath(), which is a no-follow-symlink version of getRealPath().

  • This is the actual path in the filesystem graph to this DirectoryEntry.
  • A better name would be getDirectPath() or getActualPath(), maybe.
  • Below, I'll call this getDirectPath().
  • Outside of CachingOnDiskFileSystem, this could be computed by calling FS.getRealPath() on the parent directory of an object, then re-appending the object.

Here's an idea I have for a consistent model:

  1. To set up the prefix map, each prefix goes through getDirectPath() logic. Looking at examples where we have /path/to/symlink -> target:
    • /path/to/symlink/file=/new should be stored as /path/to/target/file=/new (remap symlink's target)
    • /path/to/symlink/=/new should be stored as /path/to/target=/new (symlink dereferenced, so remap its target)
    • /path/to/symlink=/new should be stored as /path/to/symlink=/new (mapping the symlink itself!)
  2. Improve prefix-mapping to remap relative paths as relative paths (when possible), taking both a path and a context-dir. E.g., assuming input path is "target" and context-dir is "/path/to", then:
    • For the prefix map /path/to/target=/new, mapped path is /new
    • For the prefix map /path/to/target=/path/to/new, mapped path is new.
    • For the prefix map /path/to=/new, mapped path is target.
  3. In createTreeFromNewAccesses().
    • Use getDirectPath() (as now) to compare against the prefixes.
    • Symlink targets should be remapped, using the new relative-path aware remapping. E.g., assuming: /path/to/symlink -> target
      • For the prefix map /path/to/target=/new, the remapped symlink should be /path/to/symlink -> /new
      • For the prefix map /path/to=/new, the remapped symlink should be /new/symlink -> target
  4. Other uses of the prefix map should the same logic.
    • Compare against prefixes using the "direct path".
    • Spit out a relative path when possible.

WDYT of the model?

In terms of implementation, probably we need some help from CachingOnDiskFileSystem to make the "relative path" adjustment and "getdirectpath" bits efficient. I am also not sure the "relative path" bits are necessary; could just be an aesthetic choice.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simply reproduce it with the test case I added in this commit (replacing with RealPathPrefixMapper). I guess it is probably clang/tablegen do not have a direct access to the symlink that is a directory so it was never a problem?

Well, clang doesn't use it yet, and tablegen never sees symlinks I think.

BTW, in case it wasn't clear above, happy to approve this once there are good FIXMEs describing why RealPathPrefixMapper doesn't work. I suggest:

  • Add a FIXME to the RealPathPrefixMapper header docs explaining in precise language what goes wrong.
  • Add a FIXME to here, your use of PrefixMapper, explaining that RealPathPrefixMapper is broken for symlinks and pointing at it.

@dexonsmith
Copy link

@cachemeifyoucan , I just sent you a PR for a different utility that I'm hoping will work: #3496

Once you commit, I can hopefully resolve the FIXMEs using that. Or, I can push first and you can rebase on top.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants