Skip to content

[Clang] [Diagnostics] Simplify filenames that contain '..' #143520

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jul 7, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -12,8 +12,9 @@
// RUN: clang-tidy -checks='-*,google-explicit-constructor' -header-filter='header\.h' %s -- -I %t 2>&1 | FileCheck --check-prefix=CHECK_HEADER %s
// RUN: clang-tidy -checks='-*,google-explicit-constructor' -header-filter='header\.h' -quiet %s -- -I %t 2>&1 | FileCheck --check-prefix=CHECK_HEADER %s

// Check that `-header-filter` operates on the same file paths as paths in
// diagnostics printed by ClangTidy.
// `-header-filter` operates on the actual file path that the user provided in
// the #include directive; however, Clang's path name simplification causes the
// path to be printed in canonicalised form here.
#include "dir1/dir2/../header_alias.h"
// CHECK_HEADER_ALIAS: dir1/dir2/../header_alias.h:1:11: warning: single-argument constructors
// CHECK_HEADER_ALIAS: dir1/header.h:1:11: warning: single-argument constructors
// CHECK_HEADER-NOT: warning:
16 changes: 16 additions & 0 deletions clang/include/clang/Basic/SourceManager.h
Original file line number Diff line number Diff line change
Expand Up @@ -824,6 +824,12 @@ class SourceManager : public RefCountedBase<SourceManager> {

mutable std::unique_ptr<SrcMgr::SLocEntry> FakeSLocEntryForRecovery;

/// Cache for filenames used in diagnostics. See 'getNameForDiagnostic()'.
mutable llvm::StringMap<StringRef> DiagNames;

/// Allocator for absolute/short names.
mutable llvm::BumpPtrAllocator DiagNameAlloc;

/// Lazily computed map of macro argument chunks to their expanded
/// source location.
using MacroArgsMap = std::map<unsigned, SourceLocation>;
Expand Down Expand Up @@ -1848,6 +1854,16 @@ class SourceManager : public RefCountedBase<SourceManager> {
/// \return Location of the top-level macro caller.
SourceLocation getTopMacroCallerLoc(SourceLocation Loc) const;

/// Retrieve the name of a file suitable for diagnostics.
// FIXME: Passing in the DiagnosticOptions here is a workaround for the
// fact that installapi does some weird things with DiagnosticsEngines,
// which causes the 'Diag' member of SourceManager (or at least the
// DiagnosticsOptions member thereof) to be a dangling reference
// sometimes. We should probably fix that or decouple the two classes
// to avoid this issue entirely.
StringRef getNameForDiagnostic(StringRef Filename,
const DiagnosticOptions &Opts) const;

private:
friend class ASTReader;
friend class ASTWriter;
Expand Down
72 changes: 72 additions & 0 deletions clang/lib/Basic/SourceManager.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2390,3 +2390,75 @@ SourceManagerForFile::SourceManagerForFile(StringRef FileName,
assert(ID.isValid());
SourceMgr->setMainFileID(ID);
}

StringRef
SourceManager::getNameForDiagnostic(StringRef Filename,
const DiagnosticOptions &Opts) const {
OptionalFileEntryRef File = getFileManager().getOptionalFileRef(Filename);
if (!File)
return Filename;

bool SimplifyPath = [&] {
if (Opts.AbsolutePath)
return true;

// Try to simplify paths that contain '..' in any case since paths to
// standard library headers especially tend to get quite long otherwise.
// Only do that for local filesystems though to avoid slowing down
// compilation too much.
if (!File->getName().contains(".."))
return false;

// If we're not on Windows, check if we're on a network file system and
// avoid simplifying the path in that case since that can be slow. On
// Windows, the check for a local filesystem is already slow, so skip it.
#ifndef _WIN32
if (!llvm::sys::fs::is_local(File->getName()))
return false;
#endif

return true;
}();

if (!SimplifyPath)
return Filename;
Comment on lines +2423 to +2424
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would be more efficient to check the map first

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, not sure. This is a perf/ branch anyway so when I’m done fixing the tests (I think it’s just a clang-tidy test that’s left at this point?) we can just check if there are any regressions and try doing this instead if so.


// This may involve computing canonical names, so cache the result.
StringRef &CacheEntry = DiagNames[Filename];
if (!CacheEntry.empty())
return CacheEntry;

// We want to print a simplified absolute path, i. e. without "dots".
//
// The hardest part here are the paths like "<part1>/<link>/../<part2>".
// On Unix-like systems, we cannot just collapse "<link>/..", because
// paths are resolved sequentially, and, thereby, the path
// "<part1>/<part2>" may point to a different location. That is why
// we use FileManager::getCanonicalName(), which expands all indirections
// with llvm::sys::fs::real_path() and caches the result.
//
// On the other hand, it would be better to preserve as much of the
// original path as possible, because that helps a user to recognize it.
// real_path() expands all links, which sometimes too much. Luckily,
// on Windows we can just use llvm::sys::path::remove_dots(), because,
// on that system, both aforementioned paths point to the same place.
SmallString<256> TempBuf;
#ifdef _WIN32
TempBuf = File->getName();
llvm::sys::fs::make_absolute(TempBuf);
llvm::sys::path::native(TempBuf);
llvm::sys::path::remove_dots(TempBuf, /* remove_dot_dot */ true);
#else
TempBuf = getFileManager().getCanonicalName(*File);
#endif
Comment on lines +2446 to +2453
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort of pre-existing, but I am not sure doing something different for windows actually make sense.
Symlinks on Windows exist, they are just very rare.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I was interpreting the comment above this to mean that Windows itself doesn’t care about symlinks when resolving .. and actually just deletes preceding path segments, but I’m not much of a Windows person so I don’t know to be fair...


// In some cases, the resolved path may actually end up being longer (e.g.
// if it was originally a relative path), so just retain whichever one
// ends up being shorter.
if (!Opts.AbsolutePath && TempBuf.size() > Filename.size())
CacheEntry = Filename;
else
CacheEntry = TempBuf.str().copy(DiagNameAlloc);

return CacheEntry;
}
31 changes: 1 addition & 30 deletions clang/lib/Frontend/SARIFDiagnostic.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -163,36 +163,7 @@ SARIFDiagnostic::addDiagnosticLevelToRule(SarifRule Rule,

llvm::StringRef SARIFDiagnostic::emitFilename(StringRef Filename,
const SourceManager &SM) {
if (DiagOpts.AbsolutePath) {
auto File = SM.getFileManager().getOptionalFileRef(Filename);
if (File) {
// We want to print a simplified absolute path, i. e. without "dots".
//
// The hardest part here are the paths like "<part1>/<link>/../<part2>".
// On Unix-like systems, we cannot just collapse "<link>/..", because
// paths are resolved sequentially, and, thereby, the path
// "<part1>/<part2>" may point to a different location. That is why
// we use FileManager::getCanonicalName(), which expands all indirections
// with llvm::sys::fs::real_path() and caches the result.
//
// On the other hand, it would be better to preserve as much of the
// original path as possible, because that helps a user to recognize it.
// real_path() expands all links, which is sometimes too much. Luckily,
// on Windows we can just use llvm::sys::path::remove_dots(), because,
// on that system, both aforementioned paths point to the same place.
#ifdef _WIN32
SmallString<256> TmpFilename = File->getName();
llvm::sys::fs::make_absolute(TmpFilename);
llvm::sys::path::native(TmpFilename);
llvm::sys::path::remove_dots(TmpFilename, /* remove_dot_dot */ true);
Filename = StringRef(TmpFilename.data(), TmpFilename.size());
#else
Filename = SM.getFileManager().getCanonicalName(*File);
#endif
}
}

return Filename;
return SM.getNameForDiagnostic(Filename, DiagOpts);
}

/// Print out the file/line/column information and include trace.
Expand Down
34 changes: 1 addition & 33 deletions clang/lib/Frontend/TextDiagnostic.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -738,39 +738,7 @@ void TextDiagnostic::printDiagnosticMessage(raw_ostream &OS,
}

void TextDiagnostic::emitFilename(StringRef Filename, const SourceManager &SM) {
#ifdef _WIN32
SmallString<4096> TmpFilename;
#endif
if (DiagOpts.AbsolutePath) {
auto File = SM.getFileManager().getOptionalFileRef(Filename);
if (File) {
// We want to print a simplified absolute path, i. e. without "dots".
//
// The hardest part here are the paths like "<part1>/<link>/../<part2>".
// On Unix-like systems, we cannot just collapse "<link>/..", because
// paths are resolved sequentially, and, thereby, the path
// "<part1>/<part2>" may point to a different location. That is why
// we use FileManager::getCanonicalName(), which expands all indirections
// with llvm::sys::fs::real_path() and caches the result.
//
// On the other hand, it would be better to preserve as much of the
// original path as possible, because that helps a user to recognize it.
// real_path() expands all links, which sometimes too much. Luckily,
// on Windows we can just use llvm::sys::path::remove_dots(), because,
// on that system, both aforementioned paths point to the same place.
#ifdef _WIN32
TmpFilename = File->getName();
llvm::sys::fs::make_absolute(TmpFilename);
llvm::sys::path::native(TmpFilename);
llvm::sys::path::remove_dots(TmpFilename, /* remove_dot_dot */ true);
Filename = StringRef(TmpFilename.data(), TmpFilename.size());
#else
Filename = SM.getFileManager().getCanonicalName(*File);
#endif
}
}

OS << Filename;
OS << SM.getNameForDiagnostic(Filename, DiagOpts);
}

/// Print out the file/line/column information and include trace.
Expand Down
6 changes: 3 additions & 3 deletions clang/test/Frontend/absolute-paths.c
Original file line number Diff line number Diff line change
Expand Up @@ -8,9 +8,9 @@

#include "absolute-paths.h"

// Check whether the diagnostic from the header above includes the dummy
// directory in the path.
// NORMAL: SystemHeaderPrefix
// Check that the bogus prefix we added is stripped out even if absolute paths
// are disabled.
// NORMAL-NOT: SystemHeaderPrefix
// ABSOLUTE-NOT: SystemHeaderPrefix
// CHECK: warning: non-void function does not return a value

Expand Down
18 changes: 18 additions & 0 deletions clang/test/Frontend/simplify-paths.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
// REQUIRES: shell

// RUN: rm -rf %t
// RUN: mkdir -p %t/a/b/
// RUN: echo 'void x;' > %t/test.h
// RUN: echo 'const void x;' > %t/header_with_a_really_long_name.h
// RUN: ln -s %t/header_with_a_really_long_name.h %t/a/shorter_name.h
//
// RUN: %clang_cc1 -fsyntax-only -I %t %s 2> %t/output.txt || true
// RUN: cat %t/output.txt | FileCheck %s

// Check that we strip '..' by canonicalising the path...
#include "a/b/../../test.h"
// CHECK: simplify-paths.c.tmp/test.h:1:6: error: variable has incomplete type 'void'

// ... but only if the resulting path is actually shorter.
#include "a/b/../shorter_name.h"
// CHECK: simplify-paths.c.tmp/a/b/../shorter_name.h:1:12: error: variable has incomplete type 'const void'
Loading